Importing Data
You can load the following data formats into Dolt:
- CSV, JSON, Parquet
- MySQL databases
- Postgres
- Spreadsheets
- Pandas dataframes
We support several commands for this, including `dolt table import`, `LOAD DATA`, and `dolt sql`.
The easiest sources of data to work with are CSV, JSON, and Parquet files. These pair best with the custom `dolt table import` command.

1. Importing with no schema
Dolt supports importing CSV files without a defined SQL schema. Consider the following CSV file:

pk,val
1,2
2,3
We can import and create a table as follows:
dolt table import -c --pk=pk mytable file.csv
Rows Processed: 2, Additions: 2, Modifications: 0, Had No Effect: 0
Import completed successfully.
We can query the table and see the new schema and data:
> dolt sql -q "describe mytable"
+-------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| pk | int unsigned | NO | PRI | | |
| val | int unsigned | NO | | | |
+-------+--------------+------+-----+---------+-------+
> dolt sql -q "select * from mytable"
+----+-----+
| pk | val |
+----+-----+
| 1 | 2 |
| 2 | 3 |
+----+-----+
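`dolt table import` can also modify a table in place. For instance, if a revised version of the file arrives later (`file_v2.csv` is a hypothetical name), a sketch using the `-u` update flag:

dolt table import -u mytable file_v2.csv

This merges the file's rows into the existing table rather than creating a new one.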
You can reference the `dolt table import` documentation for additional ways to modify your database, such as updating or replacing your existing data.

2. Importing with a schema
In the case of JSON or Parquet files, we require that you provide a schema in the form of a `CREATE TABLE` SQL statement. You can also specify a schema for a CSV file. Let's walk through the following file:

{
"rows": [
{
"id": 0,
"first name": "tim",
"last name": "sehn",
"title": "ceo"
},
{
"id": 1,
"first name": "aaron",
"last name": "son",
"title": "founder"
},
{
"id": 2,
"first name": "brian",
"last name": "hendricks",
"title": "founder"
}
]
}
Our SQL schema will look like this:
CREATE TABLE employees (
`id` LONGTEXT NOT NULL,
`first name` LONGTEXT,
`last name` LONGTEXT,
`title` LONGTEXT,
PRIMARY KEY (`id`)
);
Putting it all together:
> dolt table import -c -s schema.sql employees file.json
Rows Processed: 3, Additions: 3, Modifications: 0, Had No Effect: 0
Import completed successfully.
> dolt sql -q "select * from employees"
+----+------------+-----------+---------+
| id | first name | last name | title |
+----+------------+-----------+---------+
| 0 | tim | sehn | ceo |
| 1 | aaron | son | founder |
| 2 | brian | hendricks | founder |
+----+------------+-----------+---------+
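The `-s` flag isn't limited to JSON and Parquet; the same schema can drive a CSV import. A sketch, assuming a hypothetical `employees.csv` whose header matches the columns above:

dolt table import -c -s schema.sql employees employees.csv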
3. You can also use the MySQL `LOAD DATA` command to work with data that is compatible with the `LOAD DATA` API. For example, you can load the above `file.csv` as follows:
create table test(pk int, val int);
LOAD DATA INFILE '/Users/vinairachakonda/misc/test/file.csv' INTO table test FIELDS TERMINATED BY ',' IGNORE 1 LINES;
Selecting from the table shows the imported rows:
test> select * from test;
+----+-----+
| pk | val |
+----+-----+
| 2 | 3 |
| 1 | 2 |
+----+-----+
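Because the syntax follows MySQL's `LOAD DATA` API, standard clauses such as `ENCLOSED BY` should apply as well. A sketch for a CSV with quoted fields (`quoted.csv` and its path are hypothetical):

LOAD DATA INFILE '/path/to/quoted.csv' INTO TABLE test
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;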
Migrating your MySQL database to Dolt is easy. We recommend the `mysqldump` program. Let's say you have a MySQL server with the following configuration:

user: root
password:
host: 0.0.0.0
port: 3306
database: test
You can dump the database `test` as follows:

mysqldump --databases test -P 3306 -h 0.0.0.0 -u root -p > dump.sql
Note: Using the `--databases` flag will cause `mysqldump` to include a command to create the database for you if it doesn't exist yet. Without this flag, you will need to ensure you have already created your database (e.g. `create database test;`) before you can load in the `mysqldump` file.

To load the dump into Dolt:
dolt sql < dump.sql
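End to end, a typical migration might look like this sketch (the database directory name is up to you):

mkdir test && cd test
dolt init
dolt sql < ../dump.sql
dolt add .
dolt commit -m "Import MySQL dump"

The `dolt add` and `dolt commit` steps snapshot the imported data in Dolt's version history.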
Importing data into a Hosted Dolt deployment can be done using the deployment's default configuration. Let's say you've dumped an existing MySQL database using `mysqldump` and want to import that dump into your Hosted Dolt deployment.

Navigate to the "Configuration" tab of your Hosted deployment dashboard and make sure that the box `behavior_auto_commit` is checked. This will automatically issue a SQL `COMMIT` after every statement is executed, ensuring the data is persisted after each import statement.

Also, make sure that the box `behavior_disable_multistatements` is unchecked. This is required for importing dumps, as they often combine multiple statements into single strings.
Hosted Dolt Importing Configuration
New Hosted deployments will have these correct server configuration settings by default. Please note that altering them prior to performing an import has been known to cause some imports to fail.
With the above settings in place, you can import the SQL dump using a standard MySQL client or database tool:
mysql -h my-deployment.dbs.hosted.doltdb.com -u <user name> -p<password> database < dump.sql
With a Postgres dump file named `file.pgdump`, you can convert it into a MySQL dump as follows:

./pg2mysql.pl < file.pgdump > mysql.sql

Finally, you can load the MySQL file into Dolt:
dolt sql < mysql.sql
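If you still need to produce the dump itself, a plain-format `pg_dump` is the usual starting point. A sketch, assuming a database named `mydb`:

pg_dump --no-owner --no-acl mydb > file.pgdump

The `--no-owner` and `--no-acl` flags strip Postgres-specific ownership and permission statements that have no MySQL equivalent.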
There are multiple ways to get spreadsheet data into Dolt. The first is with `dolt table import`. Consider the following Excel file, `employees.xlsx`:

0,tim,sehn,ceo
1,aaron,son,founder
2,brian,hendricks,founder

Just like a CSV file, we run the command

dolt table import -c --pk=id employees employees.xlsx

to load the Excel file into Dolt. Be sure to give the table the same name as the spreadsheet.

The other way to work with spreadsheet data is DoltHub's "edit like a spreadsheet" feature. You can create a SQL table from scratch by just typing values into a spreadsheet.

Spreadsheet
We recommend standard MySQL + Python techniques when integrating Dolt with Pandas. First, you want to connect to your Dolt database with the SQLAlchemy library. Here's some sample code:
from sqlalchemy import create_engine
import pymysql
import pandas as pd
# Define a data frame
employees = {
    "id": [0, 1, 2],
    "FirstName": ["Tim", "Aaron", "Brian"],
    "LastName": ["Sehn", "Son", "Hendriks"],
    "Title": ["CEO", "Founder", "Founder"],
}
tableName = "employees"
dataFrame = pd.DataFrame(data=employees)
# Create an engine that connects to our dolt sql-server.
sqlEngine = create_engine("mysql+pymysql://root:@127.0.0.1/test", pool_recycle=3600)
dbConnection = sqlEngine.connect()
try:
    # to_sql writes the dataframe as a new table; if_exists="fail" raises if it already exists
    dataFrame.to_sql(tableName, dbConnection, if_exists="fail")
    print("Table %s created successfully." % tableName)
except Exception as ex:
    print(ex)
finally:
    dbConnection.close()
In the above example we create a dataframe of employees and write it to our Dolt database with the `to_sql` function. We can then use the `read_sql` function to read data back from Dolt into a dataframe (note that the `finally` block above closed the connection, so reconnect first):

frame = pd.read_sql('SELECT * from employees', dbConnection)
print(frame)
#    index  id FirstName  LastName    Title
# 0      0   0       Tim      Sehn      CEO
# 1      2   2     Brian  Hendriks  Founder
# 2      1   1     Aaron       Son  Founder
The second way to do this is by exporting your Pandas dataframe as a CSV file, which can then be imported with `dolt table import`:

>>> import pandas as pd
>>> df = pd.DataFrame({'name': ['Raphael', 'Donatello'],
...                    'mask': ['red', 'purple'],
...                    'weapon': ['sai', 'bo staff']})
>>> df.to_csv('out.csv', index=False)
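From there, the file can be loaded with the same import flow shown earlier. A sketch (the table name `turtles` is just an illustration):

dolt table import -c --pk=name turtles out.csv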
There are some best practices to keep in mind in order to make importing external data into Dolt as fast as possible. These performance differences are especially relevant with large databases (~50GB+).
- Avoid adding foreign keys or unique indexes until after the import is completed. These substantially increase import time.
- Minimize your use of blob types. These are expensive to create.
- If running multiple import jobs back to back, be sure to garbage collect the database with `dolt gc`. Imports can generate a substantial amount of garbage.
- If you're writing your own `.sql` file instead of using `mysqldump`, consider the following (see the sketch after this list):
  - Import one table at a time.
  - Prefer fewer INSERT statements that each insert multiple values over many statements that each insert a single value.
  - Sort your inserts by primary key. If a table doesn't have a primary key, consider adding one.
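As a hypothetical illustration of those last three points (table and column names are invented), a hand-written dump might look like:

CREATE TABLE employees (
  id INT PRIMARY KEY,
  name VARCHAR(100)
);
-- One multi-row INSERT instead of three single-row statements, pre-sorted by primary key
INSERT INTO employees (id, name) VALUES
  (0, 'tim'),
  (1, 'aaron'),
  (2, 'brian');
-- Secondary indexes and constraints are added only after the bulk load completes
ALTER TABLE employees ADD INDEX name_idx (name);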