Importing Data
You can load the following data formats into Dolt:
  • CSV, JSON, Parquet
  • MySQL databases
  • Postgres
  • Spreadsheets
  • Pandas dataframes
We support several commands for this, including:
  1. dolt table import
  2. LOAD DATA
  3. dolt sql

CSV, JSON, and Parquet Files

The easiest sources of data to work with are CSV, JSON, and Parquet files. These pair best with the custom dolt table import command.
Importing with no schema
Dolt supports importing CSV files without a defined SQL schema. Consider the following CSV file:
pk,val
1,2
2,3
We can import and create a table as follows:
dolt table import -c --pk=pk mytable file.csv

Rows Processed: 2, Additions: 2, Modifications: 0, Had No Effect: 0
Import completed successfully.
We can query the table and see the new schema and data:
> dolt sql -q "describe mytable"
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| pk    | int unsigned | NO   | PRI |         |       |
| val   | int unsigned | NO   |     |         |       |
+-------+--------------+------+-----+---------+-------+

> dolt sql -q "select * from mytable"
+----+-----+
| pk | val |
+----+-----+
| 1  | 2   |
| 2  | 3   |
+----+-----+
You can reference the dolt table import documentation for additional ways to modify your database, such as updating or replacing your existing data.
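Because the import defines the primary key with --pk=pk, duplicate key values in the file will cause rows to collide. If you want to catch that before running the import, a short script using only the Python standard library can check the header and key uniqueness. This is an illustrative sketch; it recreates the sample file.csv from above:

```python
import csv
from collections import Counter

# Recreate the sample file.csv from the example above
with open("file.csv", "w", newline="") as f:
    f.write("pk,val\n1,2\n2,3\n")

# Check the header and primary-key uniqueness before running
# `dolt table import -c --pk=pk mytable file.csv`
with open("file.csv", newline="") as f:
    reader = csv.DictReader(f)
    rows = list(reader)
    header = reader.fieldnames

duplicates = [pk for pk, n in Counter(r["pk"] for r in rows).items() if n > 1]
print(header, len(rows), duplicates)  # ['pk', 'val'] 2 []
```

An empty duplicates list means every row will import as an addition rather than a modification.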
Importing with a schema
For JSON and Parquet files, you must provide a schema in the form of a CREATE TABLE SQL statement. You can also specify a schema for a CSV file. Let's walk through the following file:
{
  "rows": [
    {
      "id": 0,
      "first name": "tim",
      "last name": "sehn",
      "title": "ceo"
    },
    {
      "id": 1,
      "first name": "aaron",
      "last name": "son",
      "title": "founder"
    },
    {
      "id": 2,
      "first name": "brian",
      "last name": "hendricks",
      "title": "founder"
    }
  ]
}
Our SQL schema will look like this:
CREATE TABLE employees (
  `id` LONGTEXT NOT NULL,
  `first name` LONGTEXT,
  `last name` LONGTEXT,
  `title` LONGTEXT,
  PRIMARY KEY (`id`)
);
Putting it all together:

> dolt table import -c -s schema.sql employees file.json
Rows Processed: 3, Additions: 3, Modifications: 0, Had No Effect: 0
Import completed successfully.

> dolt sql -q "select * from employees"
+----+------------+-----------+---------+
| id | first name | last name | title   |
+----+------------+-----------+---------+
| 0  | tim        | sehn      | ceo     |
| 1  | aaron      | son       | founder |
| 2  | brian      | hendricks | founder |
+----+------------+-----------+---------+
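If you would rather not maintain a separate schema file, another option is to flatten the JSON document into a CSV first and import it without a schema, as in the earlier CSV example. A sketch using only the Python standard library (the file name employees.csv is illustrative):

```python
import csv

# The same document as file.json above
doc = {
    "rows": [
        {"id": 0, "first name": "tim", "last name": "sehn", "title": "ceo"},
        {"id": 1, "first name": "aaron", "last name": "son", "title": "founder"},
        {"id": 2, "first name": "brian", "last name": "hendricks", "title": "founder"},
    ]
}

# Flatten the "rows" array into a CSV that `dolt table import -c --pk=id` can consume
fieldnames = ["id", "first name", "last name", "title"]
with open("employees.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(doc["rows"])

with open("employees.csv", newline="") as f:
    first_line = f.readline().strip()

print(first_line)  # id,first name,last name,title
```

The trade-off is that the inferred column types may differ from the ones a hand-written CREATE TABLE statement would give you.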
You can also use the MySQL LOAD DATA command to import files that are compatible with the LOAD DATA API. For example, you can load the above file.csv as follows:
create table test(pk int, val int);
LOAD DATA INFILE '/Users/vinairachakonda/misc/test/file.csv' INTO TABLE test FIELDS TERMINATED BY ',' IGNORE 1 LINES;
Selecting from the table, you get the following (note that rows may come back in any order unless you add an ORDER BY clause):
test> select * from test;
+----+-----+
| pk | val |
+----+-----+
| 2  | 3   |
| 1  | 2   |
+----+-----+

MySQL Databases

Migrating your MySQL database to Dolt is easy. We recommend the mysqldump program. Let's say you have a MySQL server with the following configuration:
user: root
password:
host: 0.0.0.0
port: 3306
database: test
You can dump the database test as follows:
mysqldump test -P 3306 -h 0.0.0.0 -u root -p > dump.sql
To load it into Dolt:

dolt sql < dump.sql

Postgres

You can load a Postgres database into Dolt with our custom pg2mysql tool. If you have a Postgres database, you can export a dump file with the pg_dump utility.
Given a Postgres dump file named file.pgdump, you can convert it into a MySQL dump as follows:
./pg2mysql.pl < file.pgdump > mysql.sql
Finally, you can load the MySQL dump file into Dolt:

dolt sql < mysql.sql

Spreadsheets

There are multiple ways to get spreadsheet data into Dolt. The first is with dolt table import. Consider the following Excel file, employees.xlsx:
0,tim,sehn,ceo
1,aaron,son,founder
2,brian,hendricks,founder
Just like a CSV file, we run the command dolt table import -c --pk=id employees employees.xlsx to load the Excel file into Dolt. Be sure to give the table the same name as the spreadsheet.
The other way to work with spreadsheet data is DoltHub's "edit like a spreadsheet" feature. You can create a SQL table from scratch just by typing values into a spreadsheet.

Pandas Dataframe

We recommend standard MySQL + Python techniques when integrating Dolt with Pandas. First, connect to your Dolt database with SQLAlchemy. Here's some sample code:
from sqlalchemy import create_engine
import pymysql
import pandas as pd

# Define a data frame
employees = {
    "id": [0, 1, 2],
    "FirstName": ["Tim", "Aaron", "Brian"],
    "LastName": ["Sehn", "Son", "Hendriks"],
    "Title": ["CEO", "Founder", "Founder"],
}

tableName = "employees"
dataFrame = pd.DataFrame(data=employees)

# Create an engine that connects to our dolt sql-server.
sqlEngine = create_engine("mysql+pymysql://root:@127.0.0.1/test", pool_recycle=3600)
dbConnection = sqlEngine.connect()

try:
    frame = dataFrame.to_sql(tableName, dbConnection, if_exists="fail")
    print("Table %s created successfully." % tableName)
except Exception as ex:
    print(ex)
finally:
    dbConnection.close()
In the above example we create a data frame of employees and write it to our Dolt database with the to_sql function. We can then use the read_sql function to read the data back from Dolt into a data frame.
frame = pd.read_sql('SELECT * from employees', dbConnection)
print(frame)

#    index  id FirstName  LastName    Title
# 0      0   0       Tim      Sehn      CEO
# 1      2   2     Brian  Hendriks  Founder
# 2      1   1     Aaron       Son  Founder
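The to_sql/read_sql round trip above can also be exercised without a running Dolt sql-server by pointing pandas at an in-memory SQLite database; the flow is identical and only the connection changes. A sketch, with SQLite standing in for the Dolt server:

```python
import sqlite3
import pandas as pd

employees = pd.DataFrame({
    "id": [0, 1, 2],
    "FirstName": ["Tim", "Aaron", "Brian"],
    "LastName": ["Sehn", "Son", "Hendriks"],
    "Title": ["CEO", "Founder", "Founder"],
})

# pandas accepts a DB-API connection directly for SQLite
conn = sqlite3.connect(":memory:")
employees.to_sql("employees", conn, index=False)

# Read the rows back into a new data frame
frame = pd.read_sql("SELECT * FROM employees", conn)
print(frame.shape)  # (3, 4)
conn.close()
```

This is handy for unit-testing the data-frame logic before wiring the code up to a live server.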
The second way is to export your Pandas data frame as a CSV file, which can then be imported with dolt table import.
>>> import pandas as pd
>>> df = pd.DataFrame({'name': ['Raphael', 'Donatello'],
...                    'mask': ['red', 'purple'],
...                    'weapon': ['sai', 'bo staff']})
>>> df.to_csv('out.csv', index=False)
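Before running dolt table import on the exported file, it can be worth confirming the CSV looks the way the importer will see it: the first row becomes the column names, so it must contain the column you plan to pass to --pk. A quick check of out.csv from the example above:

```python
import csv
import pandas as pd

# Recreate out.csv as in the example above
df = pd.DataFrame({'name': ['Raphael', 'Donatello'],
                   'mask': ['red', 'purple'],
                   'weapon': ['sai', 'bo staff']})
df.to_csv('out.csv', index=False)

# Read the file back the way a CSV importer would
with open('out.csv', newline='') as f:
    rows = list(csv.reader(f))

print(rows[0])       # ['name', 'mask', 'weapon']
print(len(rows) - 1) # 2 data rows
```

Note that index=False matters here; without it pandas writes an unnamed index column that would end up in your table schema.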

Import Best Practices

There are some best practices to keep in mind in order to make importing external data into Dolt as fast as possible. These performance differences are especially relevant with large databases (~50GB+).
  • Avoid adding foreign keys or unique indexes until after the import is completed. These substantially increase import time.
  • Minimize your use of blob types. These are expensive to create.
  • If running multiple import jobs back to back, be sure to garbage collect the database. Imports can generate a substantial amount of garbage.