
Dolt

Introduction

What Is Dolt?

Dolt is a SQL database you can fork, clone, branch, merge, push and pull just like a Git repository. Connect to Dolt just like any MySQL database to run SQL queries. Use the command line interface to import CSV files, commit your changes, push them to a remote, or merge your teammate's changes.

All the commands you know from Git work exactly the same in Dolt. Git versions files, Dolt versions tables. It's like Git and MySQL had a baby.

Dolt is a version controlled database. Dolt is Git for Data. Dolt is a Versioned MySQL Replica.

Version Controlled Database

Dolt is a version controlled SQL database. Connect to Dolt just like any MySQL database to run SQL queries. Use Dolt system tables, functions, or procedures to access version control information and features.

Git for Data

Dolt is Git for data. Dolt matches the Git CLI exactly. When you would have run git add, you run dolt add. When you would have run git commit, you run dolt commit.

Versioned MySQL Replica

Dolt can be deployed as a Versioned MySQL Replica. Because Dolt is MySQL compatible, Dolt can be configured just like any other MySQL replica. A Dolt replica gives you features of a version controlled database without migrating from MySQL.

Hosted Dolt is a cloud-deployed Dolt database. Choose the type of server and disk you need and we'll provision the resources and run Dolt for you. Connect with any MySQL client. Hosted Dolt is perfect for teams who want to build a Dolt-powered application.

We also built DoltHub, a place to share Dolt databases. We host public data for free! DoltHub adds a modern, secure, always-on database management web GUI to the Dolt ecosystem. Edit your database on the web, have another person review it via a pull request, and have the production database pull it to deploy.

Not ready to put your databases on the internet, no matter the permissions? We have a self-hosted version of DoltHub we call DoltLab. DoltLab gives you all the features of DoltHub, wherever you want them, in your own network or on your development machine.

Installation

Dolt is extremely simple to install. Dolt is a single ~100 megabyte program. To install it, you download or compile that program and put it on your PATH. For each operating system, we created simpler, more familiar methods of installation.

Linux
Windows
Mac

Dolt is free and open source. You always have the option to build from source. This is also the best option if you want to use unreleased features or bug fixes.

Build from Source

If you want to run Dolt as a MySQL compatible server, we have additional instructions on how to do that on a Linux host or with Docker.

Application Server
Docker

Linux

For Linux users we provide an installation script that will detect your architecture, download the appropriate binary, and place it in /usr/local/bin:

sudo bash -c 'curl -L https://github.com/dolthub/dolt/releases/latest/download/install.sh | sudo bash'

The use of sudo is required to ensure the binary lands in your path. You can examine the script before executing it if you have any concerns.

Windows

winget

winget install dolt

Chocolatey

choco install dolt

If you prefer not to use a package manager, we also provide both .msi files and .zip files, described below.

Scoop

scoop install dolt

MSI Files

The easiest way to install Dolt on Windows is to use the MSI file provided with each release. You can find it in the Assets section of the release. Grab the latest here.

`.zip` Archive

For those who prefer to install Dolt manually, a zipped archive with the requisite executables is provided. It can be found in the Assets section of our latest release.

Mac

Install Script

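The download script for Linux can also be used on macOS, since macOS is a *nix system. It will download the appropriate binary and place it in /usr/local/bin:

sudo bash -c 'curl -L https://github.com/dolthub/dolt/releases/latest/download/install.sh | sudo bash'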

Homebrew

We publish a Homebrew formula with every release, so Mac users who use Homebrew for package management can install Dolt with a single command:

brew install dolt

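MacPorts

On macOS, Dolt can also be installed via a community-managed port in MacPorts:

sudo port install dolt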

Build from Source

For those interested in building from source, clone the Dolt repo from GitHub and use go install. Note that you must have Go installed on the machine you are compiling on.

$ git clone git@github.com:dolthub/dolt.git
Cloning into 'dolt'...
remote: Enumerating objects: 25, done.
remote: Counting objects: 100% (25/25), done.
remote: Compressing objects: 100% (25/25), done.
remote: Total 87117 (delta 4), reused 6 (delta 0), pack-reused 87092
Receiving objects: 100% (87117/87117), 93.77 MiB | 13.94 MiB/s, done.
Resolving deltas: 100% (57066/57066), done.
$ cd dolt/go && go install ./cmd/dolt

This will create a binary named dolt in ~/go/bin, Go's default install directory. If you have set $GOBIN, the binary is written there instead; otherwise, if you have set $GOPATH, it goes to $GOPATH/bin.

Application Server

These instructions are for bootstrapping dolt as an application database server. They assume you are starting from scratch on a Linux machine without dolt installed or running.

Package manager support (.deb and .rpm distributions) is coming soon, but for now this set of manual setup work is necessary.

We have the instructions below packaged in a script here.

Installation

Install dolt. Run the following command:

sudo bash -c 'curl -L https://github.com/dolthub/dolt/releases/latest/download/install.sh | sudo bash'

This script puts the dolt binary in /usr/local/bin, which is probably on your $PATH. If it isn't, add it there or use the absolute path of the dolt binary for the next steps.

Configuration

Create a system account for the dolt user to run the server.

sudo useradd -r -m -d /var/lib/doltdb dolt

Before running the server, you need to give this user a name and email, which it will use to create its commits. Choose a dolt system account for your product or company.

$ cd /var/lib/doltdb
$ sudo -u dolt dolt config --global --add user.email doltServer@company.com
$ sudo -u dolt dolt config --global --add user.name "Dolt Server Account"

You can override this user for future commits with the --author flag, but this will be the default author of every commit in the server.

Database creation

Before running the dolt server for the first time, you need to create a database. Choose a directory within /var/lib/doltdb/databases where you want your dolt data to live. Name the directory the same as the name of your database.

cd /var/lib/doltdb
sudo -u dolt mkdir -p databases/my_db
cd databases/my_db
sudo -u dolt dolt init

You should see output indicating that the database has been initialized:

Successfully initialized dolt data repository.

Start the server

Assuming you want your dolt server to always be running when the machine is up, you should configure it as a systemd service, managed with systemctl. Some Linux distributions do not use systemd; consult their documentation for configuration instructions.

Write the server's systemd unit file in your home directory, then move it to /etc/systemd/system, where systemd expects it to live.

cd ~
cat > doltdb.service <<EOF
[Unit]
Description=dolt SQL server
After=network.target

[Install]
WantedBy=multi-user.target

[Service]
User=dolt
Group=dolt
ExecStart=/usr/local/bin/dolt sql-server
WorkingDirectory=/var/lib/doltdb/databases/my_db
KillSignal=SIGTERM
SendSIGKILL=no
EOF

sudo chown root:root doltdb.service
sudo chmod 644 doltdb.service
sudo mv doltdb.service /etc/systemd/system

Finally, start the server as a system daemon.

sudo systemctl daemon-reload
sudo systemctl enable doltdb.service
sudo systemctl start doltdb

The dolt sql server will now be running as a daemon. Test connecting to it with any SQL shell. Here we are using the mysql shell to connect.

mysql -h 127.0.0.1 -u root -p''

Note that by default, Dolt runs on the same port as MySQL (3306). If you have MySQL installed on the same host, choose a different port for the server with the -P argument.

Users and passwords

By default, when starting dolt sql-server, Dolt will automatically initialize the default root@localhost superuser, which is accessible only from localhost and without a password. To change this account or add any additional accounts, you can use the standard CREATE USER, ALTER USER, and GRANT SQL statements.

Other Linux distributions

These instructions should work for Debian, Ubuntu, Amazon Linux, and many other common distributions. If you find they don't work for yours and you would like your distribution documented, come chat with us on Discord or submit a PR to update the docs.

Docker

Docker Image for Dolt CLI

Running this image is equivalent to running the dolt command. You can get the latest version with the latest tag, or you can get a specific, older version by using the Dolt version you want as the image's tag (e.g. 0.50.8).

Docker Image for Dolt SQL-Server

To check out the supported options for dolt sql-server, you can run the image with the --help flag.

Connect to the server in the container from the host system

From the host system, to connect to a server running in a container, we need to map a port on the host system to the port our sql-server is running on in the container.

We also need a user account that has permission to connect to the server from the host system's address. By default, as of Dolt version 1.46.0, the root superuser is limited to connections from localhost. This is a security feature to prevent unauthorized access to the server. If you don't want to log in to the container and then connect to your sql-server, you can set the DOLT_ROOT_HOST and DOLT_ROOT_PASSWORD environment variables to control how the root superuser is initialized. When the Dolt sql-server container is started, it will ensure the root superuser is configured according to those environment variables.

In our example below, we're using DOLT_ROOT_HOST to override the host of the root superuser account to % in order to allow any host to connect to our server and log in as root. We're also using DOLT_ROOT_PASSWORD to override the default, empty password to specify a password for the root account. This is strongly advised for security when allowing the root account to connect from any host.

If we run the command above with -d or switch to a separate window we can connect with MySQL:

Define configuration for the server

Or, we can mount a local directory to specific directories in the container. The special directory for server configuration is /etc/dolt/servercfg.d/. You can only have one .yaml configuration file in this directory; if there are multiple, the default configuration will be used. For example, if the configuration file is at /Users/jennifer/docker/server/config.yaml, use the -v flag to mount the local /Users/jennifer/docker/server/ directory to the /etc/dolt/servercfg.d/ directory in the container.

The Dolt configuration and data directories can be configured similarly:

The dolt configuration directory is /etc/dolt/doltcfg.d/. There should be one .json dolt configuration file in it, which will replace the global dolt configuration file in the container.
The data directory defaults to /var/lib/dolt/ in the container. It does not need to be defined in the server configuration, but to store the data on the host system, you can mount a host directory to this default location.

The container also has a directory called /docker-entrypoint-initdb.d. Any .sh or .sql files placed in it are run after the server has started. This is useful for tasks such as setting up your database by importing a SQL dump file.

Let's look at an example

Here is how I set up my directories to be mounted. I have three directories to mount in a directory called shared:

databases is empty and is used for storing my data,
dolt has a single .json file that stores my dolt configuration
server has a single .yaml file that stores my server configuration
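One way to create that layout and start a container from it is sketched below. The file names config.json and config.yaml, the configuration values, the port mapping, and the root settings are illustrative assumptions; the container paths are the ones described above. The JSON file uses the same flat key/value shape as Dolt's global config file, and the server listens on 0.0.0.0 so the published port is reachable from outside the container.

```shell
# Create the three directories that will be mounted into the container.
mkdir -p shared/databases shared/dolt shared/server

# Dolt configuration: same key/value shape as ~/.dolt/config_global.json.
cat > shared/dolt/config.json <<'EOF'
{
  "user.name": "Dolt Server Account",
  "user.email": "doltServer@company.com"
}
EOF

# Server configuration: listen on all interfaces inside the container.
cat > shared/server/config.yaml <<'EOF'
log_level: debug

listener:
  host: 0.0.0.0
  port: 3306
EOF

# Mount each directory where the image expects it (bind mounts need absolute paths)
# and run the server in the background.
docker run -d -p 3307:3306 \
  -e DOLT_ROOT_HOST=% -e DOLT_ROOT_PASSWORD=secret2 \
  -v "$PWD/shared/server/":/etc/dolt/servercfg.d/ \
  -v "$PWD/shared/dolt/":/etc/dolt/doltcfg.d/ \
  -v "$PWD/shared/databases/":/var/lib/dolt/ \
  dolthub/dolt-sql-server:latest
```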

We can see both config files were used successfully.

We can verify that we have the data we create through the server in our local directory we mounted.

We can check for directory doltdb created in our local /shared/databases directory.

You can verify it has the data we created by using Dolt CLI Docker image if you do not have Dolt installed locally.

Server liveness and readiness checks

When running dolthub/dolt-sql-server in an environment like Kubernetes, liveness and readiness checks can be configured with something like:

This above configuration uses the dolt client within the server container to execute queries against the live server.

Upgrading

Dolt is constantly evolving. We release a new Dolt approximately once a week.

To upgrade, download the latest Dolt binary for your platform and replace the Dolt binary on your PATH with the downloaded one. Running the install process on most platforms again will do this for you.

If you are running a dolt sql-server you must restart the server to start using the new binary.

Getting Started

Dolt has three primary uses, each with a different way to get started.

1. Version Controlled Database

Run Dolt like you would MySQL or Postgres.

2. Git for Data

Use the Dolt Command Line Interface like you would the Git Command Line Interface.

3. Versioned MySQL Replica

Use Dolt as a replica to your primary MySQL server to get version control features without migrating.

Version Controlled Database

Dolt is a MySQL compatible database server.

This document walks you step by step through getting Dolt running as a MySQL compatible server on your host. You will set up a schema, insert data, and compose read queries using SQL. The document also covers a number of unique Git-like Dolt features like commits, logs, as of queries, rollback, branches, and merges.

Navigate to the directory where you would like your data stored

Dolt needs a place to store your databases. I'm going to put my databases in ~/dolt.

Start a MySQL-compatible database server

Dolt ships with a MySQL compatible database server built in. To start it you use the command dolt sql-server. Running this command starts the server on port 3306.
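Putting those two steps together, assuming ~/dolt as the data directory:

```shell
# Create the directory that will hold your databases, then start the server from it.
mkdir -p ~/dolt
cd ~/dolt
dolt sql-server
```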

Your terminal will just hang there. This means the server is running. Any errors will be printed in this terminal. Just leave it there and open a new terminal.

Connect with any MySQL client

In the new terminal, we will now connect to the running database server using a client.

MySQL comes with a MySQL server called mysqld and a MySQL client called mysql. You're only interested in the client. After following the instructions from MySQL's documentation, make sure you have a copy of the mysql client on your path:

Now, to connect the mysql client to Dolt, you are going to force the MySQL client through the TCP interface by passing in a host and port. The default is the socket interface which Dolt supports, but is only available on localhost. So, it's better to show off the TCP interface. The MySQL client also requires you specify a user, in this case root.

To ensure the client actually connected, you should see the following in the dolt sql-server terminal

As you can see, Dolt supports any MySQL-compatible client.

Create a schema

Now we're actually ready to do something interesting. I'll stay in the mysql client and execute the following SQL statements to create a database called getting_started. The getting_started database will have three tables: employees, teams, and employees_teams.

Dolt supports foreign keys, secondary indexes, triggers, check constraints, and stored procedures. It's a modern, feature-rich SQL database.

Make a Dolt commit

The naming of the system tables and stored procedures follows the dolt_<command> pattern. So dolt add on the CLI becomes dolt_add as a stored procedure. Passing options also follows the command line model. For instance, to specify tables to add, send the table names in as options to the dolt_add procedure. For named arguments like sending a message into the dolt_commit command use two arguments in sequence like ('-m', 'This is a message'). If you know Git, the version control procedures and system tables should feel familiar.

So, we add and commit our new schema like so.

There you have it. Your schema is created and you have a Dolt commit tracking the creation, as seen in the dolt_log system table.

Insert some data

Now, I'm going to populate the database with a few employees here at DoltHub. Then, I'll assign the employees to two teams: engineering and sales. The CEO wears many hats at a startup, so he'll be assigned to multiple teams.

Oops, I violated a constraint. It looks like I created the employees_teams table with the team column before the employee column. You should always specify your columns when you insert, not rely on natural ordering. Serves me right! Dolt comes with the full power of a modern SQL relational database to ensure data integrity.

Looks like everything is inserted and correct. I was able to list the members of the engineering team using that three table JOIN. Dolt supports up to twelve table JOINs. Again, Dolt is a modern SQL relational database paired with Git-style version control.

Examine the diff

Now, what if you want to see what changed in your working set before you make a commit? You use the dolt_status and dolt_diff_<tablename> system tables.

As you can see from the diff I've added the correct values to the employees table. The values were previously NULL and now they are populated.

Let's finish off with another Dolt commit this time adding all modified tables using -am.

You can inspect the log using dolt_log and see which tables changed in each commit using an unscoped dolt_diff. Unscoped dolt_diff tells you whether schema, data, or both changed in that particular commit for the table.

Oh no! I made a mistake.

Dolt supports undoing changes via call dolt_reset(). Let's imagine I accidentally drop a table.

In a traditional database, this could be disastrous. In Dolt, you're one command away from getting your table back.

See the data in a SQL Workbench

Now, to connect you must select MySQL as the connection type. Then enter a name for your connection, getting_started as your database, and root as your user.

Click connect and you'll be presented with a familiar database workbench GUI.

Make changes on a branch

To make changes on a branch, I use the dolt_checkout() stored procedure. Using the -b option creates a branch, just like in Git.

TablePlus gives me the ability to enter a multiple-line SQL script on the SQL tab. I entered the following SQL to check out a branch, update, insert, delete, and finally Dolt commit my changes.

Here's the result in TablePlus.

Back in my terminal, I cannot see the table modifications made in TablePlus because they happened on a different branch than the one I have checked out in my session.

I can query the branch no matter what I have checked out using SQL as of syntax.

If I'd like to see the diff between the two branches, I can use the dolt_diff() table function. It takes two branches and the table name as arguments.

As you can see, you have the full power of Git-style branches and diffs in a SQL database with Dolt.

Make a schema change on another branch

I can also make schema changes on branches for isolated testing of new schema. I'm going to add a start_date column on a new branch and populate it.

Changing schema on a branch gives you a new method for doing isolated integration testing of new schema changes.

Merge it all together

Let's assume all the testing of the new schema on the schema_changes branch and data on the modifications branch completed flawlessly. It's time to merge all our edits together onto main. This is done using the dolt_merge stored procedure.

Schema change successful. We now have start dates. Data changes are next.

Data changes successful as well. As you can see, I am now "Timothy" instead of "Tim", Daylon is added, and we all have start dates except for Daylon who was added on a different branch.

I'm also gone from the Sales Team. Engineering is life.

Now, we have a database with all the schema and data changes merged and ready for use.

Audit Cell Lineage

Which commit changed my first name? With Dolt you have lineage for every cell in your database. Let's use the dolt_history_<tablename> and dolt_diff_<tablename> to explore the lineage features in Dolt.

dolt_history_<tablename> shows you the state of the row at every commit.

dolt_diff_<tablename> allows you to filter the history down to only commits when the cell in question changed. In this case, I'm interested in the commits that are changing my first name. Note, there are two commits that changed my name because one is the original change and the second is the merge commit.

Dolt provides powerful data audit capabilities down to individual cells. When, how, and why has each cell in your database changed over time?

Conclusion

That should be enough to get you started. We covered installation, starting a SQL server, connecting with various clients, creating a database and schema, inserting and updating data on main, using branches for change isolation, rollback, diffs and logs, merge, and cell lineage. You had the grand tour. Hopefully you are starting to imagine the possibilities for your Dolt-backed applications.

Want to dive even deeper? Here are some links to advanced topics:

Git For Data

Dolt is Git for Data. You can use Dolt's command line interface to version control data like you version control files with Git. Git versions files, Dolt versions tables.

Once you have Dolt installed, type dolt and you'll start to feel the git vibes immediately.

$ dolt
Valid commands for dolt are
                init - Create an empty Dolt data repository.
              status - Show the working tree status.
                 add - Add table changes to the list of staged table changes.
                diff - Diff a table.
               reset - Remove table changes from the list of staged table changes.
               clean - Remove untracked tables from working set.
              commit - Record changes to the repository.
                 sql - Run a SQL query against tables in repository.
          sql-server - Start a MySQL-compatible server.
                 log - Show commit logs.
              branch - Create, list, edit, delete branches.
            checkout - Checkout a branch or overwrite a table from HEAD.
               merge - Merge a branch.
           conflicts - Commands for viewing and resolving merge conflicts.
         cherry-pick - Apply the changes introduced by an existing commit.
              revert - Undo the changes introduced in a commit.
               clone - Clone from a remote data repository.
               fetch - Update the database from a remote data repository.
                pull - Fetch from a dolt remote data repository and merge.
                push - Push to a dolt remote.
              config - Dolt configuration.
              remote - Manage set of tracked repositories.
              backup - Manage a set of server backups.
               login - Login to a dolt remote host.
               creds - Commands for managing credentials.
                  ls - List tables in the working set.
              schema - Commands for showing and importing table schemas.
               table - Commands for copying, renaming, deleting, and exporting tables.
                 tag - Create, list, delete tags.
               blame - Show what revision and author last modified each row of a table.
         constraints - Commands for handling constraints.
             migrate - Executes a database migration to use the latest Dolt data format.
         read-tables - Fetch table(s) at a specific commit into a new dolt repo
                  gc - Cleans up unreferenced data from the repository.
       filter-branch - Edits the commit history using the provided query.
          merge-base - Find the common ancestor of two commits.
             version - Displays the current Dolt cli version.
                dump - Export all tables in the working set into a file.
                docs - Commands for working with Dolt documents.

That's right: all the Git commands you're used to, like checkout, diff, and merge, are implemented on top of SQL tables instead of files. Dolt really is Git for Data.

Configure Dolt

After installing Dolt, the first thing you must do is set the user.name and user.email config variables. This information will be used to attribute each Dolt commit to you. Git requires you to define the equivalent variables as well.

$ dolt config --global --add user.name "Tim Sehn"
$ dolt config --global --add user.email "tim@dolthub.com"

After running these commands you can see a file with them in your ~/.dolt directory.

$ ls ~/.dolt/config_global.json 
/Users/timsehn/.dolt/config_global.json

Navigate to the directory where you would like your data stored

Dolt needs a place to store your databases. I'm going to put my databases in ~/dolt.

$ cd ~
$ mkdir dolt
$ cd dolt

Initialize a database

Like Git, Dolt relies on directories to store your databases. The directories will have a hidden .dolt directory where your database is stored after you run dolt init. So, let's make a directory called git_for_data that will house our dolt database, cd to it, and run dolt init. The database name will be git_for_data, the same as the directory name.

$ mkdir git_for_data
$ cd git_for_data
$ dolt init
Successfully initialized dolt data repository.

You now have a fresh Dolt database. It has a single entry in dolt log.

$ dolt log
commit f06jtfp6fqaak6dkm0olmv175atkbhl3 (HEAD -> main) 
Author: timsehn <tim@dolthub.com>
Date:  Wed Jan 18 17:02:38 -0800 2023

        Initialize data repository

Make a table

Git versions files. Dolt versions tables.

In Git, you usually use a text editor to make files. In Dolt, there are a few ways to make tables. You can import a file, like a CSV. You can run SQL offline via the command line. Or you can start a SQL server and run SQL online. I'll walk through examples of each in this document as we go.

Let's make our table initially from a CSV. Dolt supports creating tables via the dolt table import command. In Dolt, tables have schema and data. With dolt table import, Dolt automatically infers the schema from the data, making it easier to version CSVs without having to worry about types.

Here's our CSV file. We're going to use a simple list of employees here at DoltHub.

$ cat employees.csv  
id,first_name,last_name
0,Tim,Sehn
1,Brian,Hendriks
2,Aaron,Son
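If you are following along, one way to create that file in the git_for_data directory is with a heredoc:

```shell
# Write the sample CSV used in this walkthrough into the current directory.
cat > employees.csv <<'EOF'
id,first_name,last_name
0,Tim,Sehn
1,Brian,Hendriks
2,Aaron,Son
EOF
```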

dolt table import is fairly simple. You pass in a table name and the file path as well as the --create-table, --replace-table or --update-table option. We're going to pass in --create-table because we're making a new table.

We're going to pass in the id column as a primary key as well. Primary keys in Dolt make for better diffs, because Dolt can identify rows across versions by primary key. I'm trying to limit the database talk here, this being "Getting Started: Git for Data" and all, but I'll need to introduce a couple of other database concepts as well. Dolt is like Git and MySQL had a baby.

$ dolt table import --create-table --pk id employees employees.csv
Rows Processed: 3, Additions: 3, Modifications: 0, Had No Effect: 0
Import completed successfully.

We can make sure it's there using the dolt status command. Dolt has a staging area just like Git so right now it is in the working set but not staged.

$ dolt status
On branch main
Untracked files:
  (use "dolt add <table>" to include in what will be committed)
	new table:      employees

We can inspect the table using SQL on the command line. Dolt allows you to run queries from the command line using dolt sql -q. This is often more convenient, especially in the Git for Data use case, than starting a server and opening a MySQL client. Dolt supports the MySQL flavor of SQL.

$ dolt sql -q "show tables" 
+------------------------+
| Tables_in_git_for_data |
+------------------------+
| employees              |
+------------------------+

$ dolt sql -q "describe employees"
+------------+----------------+------+-----+---------+-------+
| Field      | Type           | Null | Key | Default | Extra |
+------------+----------------+------+-----+---------+-------+
| id         | int            | NO   | PRI | NULL    |       |
| first_name | varchar(16383) | YES  |     | NULL    |       |
| last_name  | varchar(16383) | YES  |     | NULL    |       |
+------------+----------------+------+-----+---------+-------+

$ dolt sql -q "select * from employees"
+----+------------+-----------+
| id | first_name | last_name |
+----+------------+-----------+
| 0  | Tim        | Sehn      |
| 1  | Brian      | Hendriks  |
| 2  | Aaron      | Son       |
+----+------------+-----------+

Make a Dolt commit

Everything looks good so it's time to add and commit our new employees table. This is just like adding and committing a new file in Git. Tables start off untracked so you must explicitly add them, just like new files in Git.

$ dolt add employees
$ dolt status
On branch main
Changes to be committed:
  (use "dolt reset <table>..." to unstage)
	new table:      employees
$ dolt commit -m "Added new employees table containing the founders of DoltHub"
commit aq86v87h1g05i5cdht6v6tptp70eibms (HEAD -> main) 
Author: timsehn <tim@dolthub.com>
Date:  Thu Jan 19 14:56:13 -0800 2023

        Added new employees table containing the founders of DoltHub

$ dolt status
On branch main
nothing to commit, working tree clean
$ dolt log
commit aq86v87h1g05i5cdht6v6tptp70eibms (HEAD -> main) 
Author: timsehn <tim@dolthub.com>
Date:  Thu Jan 19 14:56:13 -0800 2023

        Added new employees table containing the founders of DoltHub

commit f06jtfp6fqaak6dkm0olmv175atkbhl3 
Author: timsehn <tim@dolthub.com>
Date:  Wed Jan 18 17:02:38 -0800 2023

        Initialize data repository

And inspecting the log it looks like we're good! As you can see, Dolt takes "Git for Data" very literally.

Examine a diff

Now, I want to add an employee and change my name from "Tim" to "Timothy", you know, to be professional. I'm going to do that through the command line SQL interface and show you the diff.

$ dolt sql -q "insert into employees values (3, 'Daylon', 'Wilkins')"
Query OK, 1 row affected (0.00 sec)
$ dolt sql -q "update employees set first_name='Timothy' where last_name like 'S%'" 
Query OK, 2 rows affected (0.00 sec)
Rows matched: 2  Changed: 2  Warnings: 0
$ dolt diff
diff --dolt a/employees b/employees
--- a/employees @ m3qr6lhb8ad6fc5puvsaiv5ladajfi9r
+++ b/employees @ uvrbmnv52n2m25gpmom92qf4723bn9og
+---+----+------------+-----------+
|   | id | first_name | last_name |
+---+----+------------+-----------+
| < | 0  | Tim        | Sehn      |
| > | 0  | Timothy    | Sehn      |
| < | 2  | Aaron      | Son       |
| > | 2  | Timothy    | Son       |
| + | 3  | Daylon     | Wilkins   |
+---+----+------------+-----------+

That's not right! My where clause matched both Sehn and Son, so Aaron became Timothy too. Diffs in Dolt are a powerful way to ensure you changed exactly what you think you changed, ensuring data quality.

Oh no! I made a mistake.

Just like with Git, in Dolt I can roll back in a number of ways. I can check out the table or run reset --hard. Let's check out the table.

$ dolt checkout employees
$ dolt diff 
$ dolt sql -q "select * from employees"
+----+------------+-----------+
| id | first_name | last_name |
+----+------------+-----------+
| 0  | Tim        | Sehn      |
| 1  | Brian      | Hendriks  |
| 2  | Aaron      | Son       |
+----+------------+-----------+

Now, I'll re-run the correct queries and check the diff to make sure I did it right this time.

$ dolt sql -q "insert into employees values (3, 'Daylon', 'Wilkins')"
Query OK, 1 row affected (0.00 sec)
$ dolt sql -q "update employees set first_name='Timothy' where first_name='Tim'"
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
$ dolt diff                                                          
diff --dolt a/employees b/employees
--- a/employees @ m3qr6lhb8ad6fc5puvsaiv5ladajfi9r
+++ b/employees @ 72aq85jbhr83v4gmh73v550gupk4mr3k
+---+----+------------+-----------+
|   | id | first_name | last_name |
+---+----+------------+-----------+
| < | 0  | Tim        | Sehn      |
| > | 0  | Timothy    | Sehn      |
| + | 3  | Daylon     | Wilkins   |
+---+----+------------+-----------+

Looks like I got it right this time. I'll make a commit.

$ dolt commit -am "Added Daylon. Make Tim Timothy."
commit envoh3j93s47idjmrn16r2tka3ap8s0d (HEAD -> main) 
Author: timsehn <tim@dolthub.com>
Date:  Thu Jan 19 16:55:14 -0800 2023

        Added Daylon. Make Tim Timothy.

Create a branch

Dolt is also a drop-in replacement for MySQL. So, if you prefer working in a SQL workbench like TablePlus or DataGrip instead of the command line, here's how. This is the closest you will get to using something like Visual Studio Code with Git.

In your terminal, run:

$ dolt sql-server
Starting server with Config HP="localhost:3306"|T="28800000"|R="false"|L="info"|S="/tmp/mysql.sock"

Your terminal will just hang there. This means the server is running. Any errors will be printed in this terminal. Just leave it there.

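Now we can connect with TablePlus. Download and open TablePlus. Click "Create a new connection...". Select MySQL and click "Create". You'll be presented with a set of connection options. Fill them in to point at the running server: in this example, host 127.0.0.1, port 3306, and user root with no password.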

Click connect and you'll be presented with a familiar database workbench GUI.

Now we want to make some changes on a branch. You can do this by running the following SQL.

call dolt_checkout('-b','modifications');
insert INTO employees values (5,'Taylor', 'Bantle');
call dolt_commit('-am', 'Modifications on a branch');

Notice how the Git command line is exposed in SQL: write operations like checkout and commit are implemented as stored procedures, and read operations like diff and log are implemented as system tables.

In TablePlus, click SQL, enter the SQL above, and then click "Run Current" to execute it.

Alright, now that we've shown you that you can work in server mode, let's get back to the command line like true Gits. Hit Ctrl-C on the server terminal to kill the server. You'll notice we have two branches:

$ dolt branch
* main
  modifications

Let's checkout the branch and see that Taylor is on it.

$ dolt checkout modifications
Switched to branch 'modifications'
$ dolt sql -q "select * from employees"
+----+------------+-----------+
| id | first_name | last_name |
+----+------------+-----------+
| 0  | Timothy    | Sehn      |
| 1  | Brian      | Hendriks  |
| 2  | Aaron      | Son       |
| 3  | Daylon     | Wilkins   |
| 5  | Taylor     | Bantle    |
+----+------------+-----------+

$ dolt diff main
diff --dolt a/employees b/employees
--- a/employees @ 72aq85jbhr83v4gmh73v550gupk4mr3k
+++ b/employees @ pacpigp52ubvo5gcrl29h61310kt9p3s
+---+----+------------+-----------+
|   | id | first_name | last_name |
+---+----+------------+-----------+
| + | 5  | Taylor     | Bantle    |
+---+----+------------+-----------+

Branches work exactly the same way as in Git. Make a branch so that your changes don't affect other people.

Merge to Main

Finally, let's merge it all to main and delete our branch.

$ dolt checkout main
Switched to branch 'main'
$ dolt merge modifications
Updating envoh3j93s47idjmrn16r2tka3ap8s0d..74m09obaaae0am5n7iucupt2od1lhi4v
Fast-forward
$ dolt sql -q "select * from employees"
+----+------------+-----------+
| id | first_name | last_name |
+----+------------+-----------+
| 0  | Timothy    | Sehn      |
| 1  | Brian      | Hendriks  |
| 2  | Aaron      | Son       |
| 3  | Daylon     | Wilkins   |
| 5  | Taylor     | Bantle    |
+----+------------+-----------+
$ dolt branch -d modifications
$ dolt branch
* main

I got a fast-forward merge, just like Git, since there were no other changes on main.
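In the Git model that Dolt follows, a branch is a named pointer to a commit, and a merge can fast-forward when the current branch's commit is an ancestor of the commit being merged: the pointer simply moves forward and no merge commit is needed. The Python sketch below models that decision on a toy commit graph. The short commit IDs (`c0`, `c1`, `c2`) and the functions `is_ancestor` and `merge` are hypothetical stand-ins, not Dolt's implementation.

```python
from typing import Dict, List

# Hypothetical commit graph: commit id -> list of parent ids.
parents: Dict[str, List[str]] = {
    "c0": [],        # initial commit
    "c1": ["c0"],    # "Added Daylon. Make Tim Timothy."
    "c2": ["c1"],    # "Modifications on a branch"
}
# A branch is just a named pointer to a commit.
branches = {"main": "c1", "modifications": "c2"}

def is_ancestor(ancestor: str, descendant: str) -> bool:
    """Depth-first walk from descendant back through its parents."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return False

def merge(current: str, other: str) -> str:
    ours, theirs = branches[current], branches[other]
    if is_ancestor(theirs, ours):
        return "Already up to date"
    if is_ancestor(ours, theirs):
        branches[current] = theirs  # fast-forward: just move the pointer
        return "Fast-forward"
    return "Needs a three-way merge"  # histories diverged: a merge commit is required

print(merge("main", "modifications"))  # Fast-forward
print(branches["main"])                # c2
```

If main had gained its own commit after the branch was created, neither commit would be an ancestor of the other, and the sketch would report that a real three-way merge is needed.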

Conclusion

As you can see, Dolt is Git For Data. The Dolt command line works exactly like the Git command line except the versioning target is tables instead of files.

Use Cases

We built Dolt as a better way to share data. Along the way, customers wanted an OLTP SQL database with Git features, so that is what Dolt became. Dolt is still a great way to share data but it's also a great SQL database.

Anything you can build with MySQL or Postgres you can build with Dolt.

Dolt really shines when your database can benefit from branches, merges, diffs, or clones. We've written about customers who use Dolt to build better cancer cell simulations, power an application with branches, or add a versioning layer to important spreadsheets. These are just the customers who allowed us to write about their use case.

Other customers use Dolt to manage video game configuration, get an immutable audit log of changes to their database, build reproducibility into machine learning models, ensure data quality using a pull request workflow, and much more.

Data Sharing
Data and Model Quality Control
Manual Data Curation
Version Control for your Application
Versioned MySQL Replica
Audit
Configuration Management
Offline First

Data Sharing

Problem

Do you share data with customers?
Do they ask you what changed between versions you share?
Do they want to actively switch versions instead of having data change out from under them?
Or, are customers or vendors sharing data with you?
Are you having trouble maintaining quality of scraped data?
When new data is shared or scraped, do downstream systems break?
Would you like to see exactly what changed between data versions?
Do you want to add automated testing to data shared with you?
Would you like to instantly rollback to the previous version if tests fail?

Dolt solves this by…

Dolt was built for sharing. The Git model of code sharing has scaled to thousands of contributors for open source software. We believe the same model can work for data.

Dolt is the world's first version controlled SQL database. Git-style version control allows for decentralized, asynchronous collaboration. Every person gets their own copy of the database to read and write. DoltHub allows you to coordinate collaboration over the internet with permissions, human review, forks and all the other distributed collaboration tools you are used to from GitHub.

Dolt and DoltHub are the best way to share data with customers. Use versions to satisfy both slow- and fast-upgrading consumers. Let your customers help make your data better. Versions also make debugging easier: version X works but version Y doesn't. Your customers can even make changes and submit data patches for your review, much like open source.

Dolt and DoltHub are also great if vendors share data with you. When you receive data from a vendor, import it into Dolt on a branch. Examine the diff, either by eye or programmatically, before putting the data into production. You can now build integration tests for vendor data. If there's a problem, don't merge the import branch into main; if a bug is discovered in production, roll the change back. Use the problematic diff to debug with your vendor. The same tools you have for software dependencies, you now have for data dependencies.
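Integration tests for vendor data can start as simple assertions over the diff between main and the import branch. The Python sketch below assumes the diff has already been reduced to counts of added, modified, and deleted rows (how you obtain those counts from Dolt is left out). The `DiffSummary` shape, the `check_import` function, and the thresholds are hypothetical choices you would tune for your own data.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DiffSummary:
    """Row counts between main and an import branch (hypothetical shape)."""
    rows_before: int
    added: int
    modified: int
    deleted: int

def check_import(diff: DiffSummary,
                 max_deleted_fraction: float = 0.01,
                 max_modified_fraction: float = 0.10) -> List[str]:
    """Return a list of problems; an empty list means the import may be merged."""
    problems = []
    base = max(diff.rows_before, 1)  # avoid dividing by zero on an empty table
    if diff.deleted / base > max_deleted_fraction:
        problems.append(f"too many deletions: {diff.deleted} of {diff.rows_before} rows")
    if diff.modified / base > max_modified_fraction:
        problems.append(f"too many modifications: {diff.modified} of {diff.rows_before} rows")
    if diff.added == diff.modified == diff.deleted == 0:
        problems.append("import changed nothing; is the vendor feed stale?")
    return problems

ok = DiffSummary(rows_before=10_000, added=120, modified=40, deleted=3)
bad = DiffSummary(rows_before=10_000, added=0, modified=0, deleted=2_500)
print(check_import(ok))   # []
print(check_import(bad))  # ['too many deletions: 2500 of 10000 rows']
```

If the check reports problems, the import branch simply stays unmerged and the offending diff becomes the starting point for the conversation with your vendor.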

Dolt replaces...

Exchanging Files

Dolt replaces exchanging flat data files like CSVs via email, FTP servers, or other file transfer techniques. Dolt allows data to keep its schema on exchange, including constraints, triggers, and views. This richer exchange format reduces transfer errors. Dolt also allows you to change the data to fit your needs and still get updates from your source. Dolt will notify you if your changes conflict with the source.

External APIs

Dolt is ideal for sharing data that does not have an API. But even for data with an API, Dolt is often more convenient. With Dolt, you get all the data and its history. With APIs you often have to assemble the data with multiple API calls. With APIs, the data can change out from under you, whereas with Dolt you can read a version of the data until you are ready to upgrade. DoltHub ships with a SQL API so you can choose the data sharing solution that is right for your use case.

Companies Doing This

Case Studies

Let us know if you would like us to feature your use of Dolt for data sharing here.

Data and Model Quality Control

Problem

Are you in the business of creating data and models?
Do you want to institute human or automated review on data changes for data quality assurance?
Are you worried about model reproducibility?
Do different people or teams want to work on slightly different versions of the data?
Are long running projects hard to pull off because of parallel data changes?
Would data branches help?
Do you want the ability to query or roll back to a previous version of the data instantly?

Dolt solves this by…

Traditional databases were built for a world of transactions and reports. Modern data science tools use data to create models that behave more like software than reports. Models produce user visible outputs and define application behavior. Tuning data to get the right model can be a lot like writing code.

The version control tools we use to build software apply to modern data science. Version control for data did not exist until Dolt, the first and only database you can branch, diff, and merge just like a Git repository.

Modern data science applications require model reproducibility, data quality, and multiple versions of data to perform at their best. Dolt allows for these capabilities directly in your database, in a Git-style version control model most developers understand.

Dolt is used for model reproducibility. If you build a model from a version of the data, make a tag at that commit and refer to that tag in the model metadata. Some of our data and model quality control customers use Dolt only for this simple feature. Dolt shares storage between versions, so you can store many more copies of the data with Dolt than you could by, say, storing full copies of the data in S3.

Dolt allows for human or automated review of data changes, increasing data quality. If a bad change makes it through review, simply roll the data back to a previous version. DoltHub, DoltLab, and the Hosted Dolt Workbench all implement a Pull Request workflow, the standard for human review of code changes. Extend that model to your data changes.

Dolt is the only database with branch and merge functionality. Branches allow for long-running data projects. Want to add a feature to a model but don't want the new feature affecting the production model build? Make a branch and run the project on that branch. Occasionally merge production data into that branch so you stay in sync with changes there. Companies use Dolt branches to increase the number of parallel data projects by an order of magnitude.

Lastly, commits, logs, and diffs can be used for model insights. Did Thursday's model perform better than Tuesday's despite having the same model weights? Inspect the data diff to see what changed. Inspect the commit log to see where that new data came from.

Dolt replaces...

Unstructured files in cloud storage

It is common practice to store copies of training data or database backups in cloud storage for model reproducibility. A full copy of the data is stored for every training run. This can become quite expensive and limit the number of models you can reproduce. Dolt stores only the differences between versions, decreasing the cost of data storage. Additionally, Dolt can produce diffs between versions of training data, yielding novel model insights.
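The saving comes from sharing unchanged data between versions. The Python sketch below shows the general idea with content-addressed chunks: a version is split into groups of rows, each chunk is stored once under the SHA-256 hash of its contents, and a version is just the list of its chunk hashes. Fixed-size chunks are a simplifying assumption for illustration; Dolt's real storage format is more sophisticated (for one thing, fixed-size chunks would all shift after an insert in the middle of a table).

```python
import hashlib
import json
from typing import Dict, List, Tuple

Row = Tuple[int, str, str]
CHUNK_ROWS = 2  # rows per chunk; tiny so the example stays readable

store: Dict[str, bytes] = {}  # content-addressed chunk store: hash -> bytes

def put_version(rows: List[Row]) -> List[str]:
    """Store a table version as chunks and return its list of chunk hashes."""
    hashes = []
    for i in range(0, len(rows), CHUNK_ROWS):
        data = json.dumps(rows[i:i + CHUNK_ROWS]).encode()
        h = hashlib.sha256(data).hexdigest()
        store.setdefault(h, data)  # identical chunks are stored only once
        hashes.append(h)
    return hashes

v1 = [(0, "Tim", "Sehn"), (1, "Brian", "Hendriks"),
      (2, "Aaron", "Son"), (3, "Daylon", "Wilkins")]
v2 = [(0, "Timothy", "Sehn")] + v1[1:]  # one cell changed

h1, h2 = put_version(v1), put_version(v2)
print(len(h1) + len(h2), "chunk references,", len(store), "chunks stored")
# 4 chunk references, 3 chunks stored
```

Changing one cell adds one new chunk; the chunk holding the untouched rows is shared by both versions instead of being copied.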

MySQL, Postgres, or other databases

Dolt can replace any database used to store and query data. Many of our customers switch from other OLTP databases like MySQL or Postgres to improve data and model quality through versioning. Customers have also switched to Dolt from document databases like MongoDB. Dolt's additional unique features like branches, diffs, and merges allow for human review of data changes and multiple parallel data projects.

Companies Doing This

Case Studies

Turbine

Manual Data Curation

Problem

Are you using spreadsheets to curate production data?
Is the process of merging and reviewing everyone’s changes getting out of hand?
Are bad data changes causing production issues?
Would human review of cell-level data changes help?

Dolt solves this by…

Dolt allows you to treat your spreadsheet like code. DoltHub and DoltLab implement a Pull Request workflow on tables, the standard for reviewing code changes. Extend that model to your data changes. Make changes on branches and then have the changes human reviewed. Data diffs are easily consumed by a human reviewer. Add continuous integration tests to data changes. Have dozens or hundreds of changes in flight at one time.

DoltHub and DoltLab support SQL, File Upload (CSV), and a spreadsheet editor for data modification. These interfaces are simple enough that non-technical users can make and review data changes.

Dolt is a MySQL compatible database so exporting the manually created data to production can be as simple as cloning a copy and starting a server for your developers to connect to.

Dolt replaces...

Spreadsheets

Dolt replaces Excel or Google Sheets for manual data curation. Versioning features allow for more efficient asynchronous collaboration and human review of data changes. The DoltHub interface is still easy enough for non-technical users to contribute and review data changes.

Companies Doing This

Case Studies

Aktify

Version Control for your Application

Problem

Do your customers want branches and merges in your application?
Do your customers want to review changes in your application before they go live?
Do you want to add a pull request workflow to your application?
Do you want to expose audit log functionality in your application?
Do you want to expose rollback functionality in your application?

Dolt solves this by…

Dolt replaces

Soft Deletes

Slowly Changing Dimension

Companies Doing This

Case Studies

Versioned MySQL Replica

Problem

Is your production MySQL vulnerable to data loss?
If an operator runs a bad query, script, or deployment, could your production MySQL be down for hours or days while you recover data from backups or logs?
Are you worried your backups aren't working?
Does internal audit want an immutable log of what changes on your MySQL instance?
Do you want the ability to copy and sync your production MySQL database for analytics, development, or debugging?

Dolt solves this by…

Because Dolt is MySQL-compatible, you can set Dolt up as a versioned replica of your MySQL primary. Every transaction commit on your primary becomes a Dolt commit on the Dolt replica.

On your Dolt replica, you get a full, immutable, queryable audit log of every cell in your database. If an auditor wants guarantees that a cell in your database has not been modified, you can use Dolt to prove it. Diffs can be produced for every transaction.

If an operator makes a bad query, runs a bad script, or makes a bad deployment, you have an additional tool beyond backups and logs to restore production data. Find the bad transactions using Dolt's audit capabilities. Roll back the individual bad transactions. Produce a SQL patch and apply it back to your primary. If there are conflicting writes, Dolt will surface them, and you can decide how to proceed. A Dolt replica becomes an essential part of your disaster recovery plan, shortening some outages by hours or days, or recovering production data that would otherwise be lost.

Moreover, Dolt can be added to your serving path as a read-only MySQL replica, so you know that it is always in sync with your primary. Your disaster recovery instance can serve production traffic so you always know it's working.

Additionally, a Dolt replica can easily be cloned (i.e., copied) to a developer's machine for debugging purposes. See a data issue in production? Debug it safely on your laptop.
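To make the "produce a SQL patch" step described above more concrete, here is what such a patch contains: statements that restore the before state of every row a bad transaction touched. The Python sketch below derives those statements from (before, after) row pairs for a table keyed by `id`. It is illustrative only: the function names, the example rows, and the naive literal quoting are assumptions, and real code should prefer parameterized statements over string formatting.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]

def sql_literal(value: object) -> str:
    """Naive SQL literal rendering for this example: NULL, ints, and strings only."""
    if value is None:
        return "NULL"
    if isinstance(value, int):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"  # double any single quotes

def revert_patch(table: str,
                 changes: List[Tuple[Optional[Row], Optional[Row]]]) -> List[str]:
    """Given (before, after) pairs for rows keyed by 'id', emit SQL that undoes them."""
    stmts = []
    for before, after in changes:
        if before is None:  # the bad transaction inserted this row: delete it
            stmts.append(f"DELETE FROM {table} WHERE id = {sql_literal(after['id'])};")
        elif after is None:  # the bad transaction deleted this row: re-insert it
            cols = ", ".join(before)
            vals = ", ".join(sql_literal(v) for v in before.values())
            stmts.append(f"INSERT INTO {table} ({cols}) VALUES ({vals});")
        else:  # the bad transaction updated this row: restore the changed columns
            sets = ", ".join(f"{col} = {sql_literal(val)}"
                             for col, val in before.items() if after.get(col) != val)
            if sets:  # skip rows whose values did not actually change
                stmts.append(f"UPDATE {table} SET {sets} WHERE id = {sql_literal(before['id'])};")
    return stmts

# Hypothetical bad transaction: renamed Aaron and deleted Brian.
bad_txn = [
    ({"id": 2, "first_name": "Aaron", "last_name": "Son"},
     {"id": 2, "first_name": "Timothy", "last_name": "Son"}),
    ({"id": 1, "first_name": "Brian", "last_name": "Hendriks"}, None),
]
for stmt in revert_patch("employees", bad_txn):
    print(stmt)
# UPDATE employees SET first_name = 'Aaron' WHERE id = 2;
# INSERT INTO employees (id, first_name, last_name) VALUES (1, 'Brian', 'Hendriks');
```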

Dolt replaces...

Backups and Transaction Logs

Dolt as a versioned replica becomes your first line of defense against a bad operator query, script, or deployment. Dolt is online and contains the full history of your database. In a disaster you can use diffs to find a bad query and roll it back. Then you can produce a database patch and apply it to production. You do not need to reinstall from a backup and play the transaction log back to the point of the failure, an extremely time consuming process.

Change Data Capture

Change Data Capture is a way to add a history of data changes to an existing database. Modern change data capture tools consume replication logs to produce database changes in a consumable stream. Dolt can consume the same logs producing a simpler change data capture solution.

Companies Doing This

NoCD

Case Studies

Let us know if you would like us to feature your use of Dolt as a versioned MySQL replica here.

Audit

Problem

Do you need to know who changed what, when, why in your SQL database?
Do you want an immutable record of changes going back to the inception of your database?
Is an audit team asking for this information for compliance purposes?
Do you want to be able to query this audit log like any other table in your database?
Do you want the data to be efficiently stored so you can trace changes back to inception?

Dolt solves this by…

Dolt replaces...

Soft Deletes

Change Data Capture

Moreover, if Dolt is your production database, there is no need for an additional change data capture system. The audit capability is a built-in feature of the production Dolt database.

Companies Doing This

Case Studies

Let us know if you would like us to feature your use of Dolt for audit here.

Configuration Management

Problem

Is your configuration too big and complex for files?
Is your configuration more like code than configuration?
Does configuration have a large production impact?
Are configuration changes hard to review?
Are multiple configuration changes hard to merge together when it’s time to ship?
Are you building a game with lots of assets and configuration?

Dolt solves this by…

Configuration is generally structured and managed as large text files. YAML and JSON formatted configuration is very popular. These formats are unordered, meaning standard version control solutions like Git cannot reliably produce diffs and merges. Moreover, configuration can get quite large, running up against the file size limits of tools like Git.

Some configuration is better modeled as tables. Tables are unordered by design, and Dolt diffs and merges them row by row, so row order never produces spurious changes. Tables can even contain JSON columns for parts of your configuration you want to remain loosely typed.

Dolt is an ideal solution for version controlling tabular configuration. Dolt allows for all the version control features you came to know and love when your data was small like branches, diffs, and human review via pull requests.

This use case is particularly popular in video games where much of the game functionality is modeled as configuration. Store the likelihood of an item drop or the strength of a particular enemy in Dolt tables. Review and manage changes. When the configuration is ready, use a build process to create whatever format your game needs.
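Here is why a table helps with unordered formats: two JSON versions of a game configuration can differ in one value yet list their keys in a different order, which makes a line-based text diff noisy. Flattening each document into (path, value) rows, the shape a configuration table keyed by path might take, makes the comparison order-independent. The Python sketch below is illustrative; the `flatten` helper and the path-keyed layout are assumptions, not a schema Dolt requires.

```python
import json
from typing import Any, Dict

def flatten(node: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested JSON objects into {"a.b.c": value} rows keyed by path."""
    if isinstance(node, dict):
        rows: Dict[str, Any] = {}
        for key, child in node.items():
            rows.update(flatten(child, f"{prefix}.{key}" if prefix else key))
        return rows
    return {prefix: node}  # leaf value (lists are kept whole for simplicity)

# Same configuration, different key order, one real change (goblin hp 30 -> 35).
old_cfg = json.loads('{"enemy": {"goblin": {"hp": 30, "damage": 4}}, "drop_rate": 0.05}')
new_cfg = json.loads('{"drop_rate": 0.05, "enemy": {"goblin": {"damage": 4, "hp": 35}}}')

old_rows, new_rows = flatten(old_cfg), flatten(new_cfg)
changed = {path: (old_rows.get(path), new_rows.get(path))
           for path in set(old_rows) | set(new_rows)
           if old_rows.get(path) != new_rows.get(path)}
print(changed)  # {'enemy.goblin.hp': (30, 35)}
```

Only the one real change survives the comparison; the reordered keys disappear because rows are matched by path rather than by position.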

Dolt replaces...

Files in Git

Most large configuration files are stored and versioned in Git. If the files get too large, they are stored in cloud storage and linked to Git using git-lfs. Once files are stored in git-lfs, you lose the ability to diff their contents. Dolt improves the experience by adding query capabilities and fine-grained diffs, even on large data, to what would otherwise live in configuration files. The diff and merge experience will be greatly improved in Dolt for this type of data.

Companies Doing This

Case Studies

Let us know if you would like us to feature your use of Dolt for configuration management here.

Offline First

Problem

Are you expecting your application to make writes locally while offline?
Do these writes need to be synced to a central server or other nodes?
How are you going to detect conflicting writes?
What are you going to do if you detect them?
Would the Git model of clone, push, and pull on your data help?

Dolt solves this by…

Dolt brings Git-style decentralization to the SQL database. Just as Git is ideal in no-connectivity environments when dealing with files, Dolt is ideal in low-connectivity environments when dealing with tables. Most large-scale data is stored in tables.

With Dolt you write to the database disconnected. You can have a fully functioning offline application that uses the exact same software and models it would use if it were a standard centralized SQL database.

When it is safe to connect to the internet, Dolt computes the difference between what you have and what a peer database has, and sends only those differences in both directions. This synchronization process is very efficient, effectively allowing you to get the most information possible in and out in the shortest amount of time. Once the synchronization is complete, go back to being disconnected. You and the peer now share a synchronized view with a complete, auditable edit history.

Conflicting writes are surfaced quickly, and an operator or software can take additional action to resolve them.
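Conflict detection in a Git-style sync is usually a three-way comparison against the common ancestor: if only one side changed a cell, take that change; if both sides changed it to different values, it's a conflict. The Python sketch below applies that rule cell by cell to rows keyed by primary key. It is a simplified model assumed for illustration (it ignores row inserts, row deletes, and schema changes), not Dolt's merge implementation; the table contents and the `merge_cells` name are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
Table = Dict[int, Row]  # primary key -> row (column -> value)

def merge_cells(base: Table, ours: Table, theirs: Table) -> Tuple[Table, List[Tuple[int, str]]]:
    """Three-way merge of rows present in all three versions; returns (merged, conflicts)."""
    merged: Table = {}
    conflicts: List[Tuple[int, str]] = []
    for key in sorted(set(base) & set(ours) & set(theirs)):
        row: Row = {}
        for col, b in base[key].items():
            o, t = ours[key][col], theirs[key][col]
            if o == t or t == b:   # same value on both sides, or only we changed it
                row[col] = o
            elif o == b:           # only they changed it
                row[col] = t
            else:                  # both changed it differently: conflict, keep ours for now
                row[col] = o
                conflicts.append((key, col))
        merged[key] = row
    return merged, conflicts

base   = {1: {"item": "sword", "drop_rate": 0.05, "price": 100}}
ours   = {1: {"item": "sword", "drop_rate": 0.10, "price": 100}}  # edited offline
theirs = {1: {"item": "sword", "drop_rate": 0.02, "price": 120}}  # edited on the peer

merged, conflicts = merge_cells(base, ours, theirs)
print(merged[1]["price"], conflicts)  # 120 [(1, 'drop_rate')]
```

The peer's price change merges cleanly because only one side touched it, while the drop rate is flagged for an operator or software to resolve.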

Dolt replaces

Custom syncing processes

Companies Doing This

Be the first

Case Studies

Let us know if you would like us to feature your use of Dolt for offline-first applications here.

Concepts

Dolt

Dolt brings the features of Git-style distributed version control to the SQL database.

Git-style Distributed Version Control allowed the world to collaborate on open source software in a beautiful way. Dolt aspires to bring that distributed collaboration model to data.

SQL is the worldwide standard for data description and querying. SQL has been popular for 50 years. By combining schema and data, SQL gives data a powerful language for data practitioners to communicate with.

Before Dolt, to share a SQL database with a fellow data practitioner, you both needed to share the same view of the data. Only one write could happen at a time. Making a copy implied creating a point in time backup and restoring on a separate running server. Once that copy was made, the two databases could change independently. There was no tractable way to compare the two copies of the database to see what changed. Moreover, there was no easy way to merge the two copies back together. In source code parlance, the copy was a hard fork of the database.

The inability to copy and merge forced databases into a specific model of usage. Data was hard to move and share. As an industry, we built complicated pipelines to move and transform data between databases. We built APIs to allow programmatic, controlled access to data.

Here at DoltHub, we looked at all these systems and thought there must be a better way. What if you could copy a database, make changes, compare the database to any other copy, and merge the changes whenever you wanted? What if thousands of people could do this at the same time? What if you could use Git workflows on databases?

A database with these properties would allow thousands of users to read and write at the same time. If someone made a mistake, no big deal, just roll back the change. Need a copy of the data to run a metrics job on? No problem, just make a clone. Bug in production? Clone the database to your laptop, start your services against it, and change that copy of the production data to speed up debugging. Want to open your data up to the world? Push it up to a remote that's accessible via the internet.

Concepts

In order to achieve the above mission, Dolt needed to implement Git concepts in a SQL database. We tried to keep those concepts as similar to Git as possible.

We built Dolt using the following axioms:

Git versions files. Dolt versions table schema and table data.
Dolt will copy the Git command line exactly.
Dolt will be MySQL compatible.
Git features in SQL will extend MySQL SQL. Write operations will be procedures. Read operations will be system tables.

Git

Dolt implements Git-style version control on tables instead of files.

On the command-line, these concepts are exposed as a replica of the Git command line. Where you would type git log, you now type dolt log. Where you would type git add, you type dolt add. The replication extends to the command arguments.

In this section we explore the following Git concepts and explain how they work in Dolt:

Log

What is a log?

The Dolt log is a way to visualize the Dolt commit graph in an intuitive way. When viewing the log, you are seeing a topologically sorted commit order that led to the commit you have checked out. The log is an audit trail of commits.

In Dolt, you can visualize the log of a database, a table, a row, or even a cell.

Log is usually filtered by branch. Any commits not reachable in the graph from the current commit will be omitted from the log.
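The log described above can be sketched in a few steps: collect every commit reachable from the checked-out commit through parent links, then sort them topologically so each commit appears before its parents. The Python sketch below does this with Kahn's algorithm on a made-up commit graph, breaking ties newest first by timestamp. The graph, the timestamps, and the tie-breaking rule are assumptions, and Dolt's own ordering may differ in details.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical commit graph: id -> (timestamp, parent ids). "m" merges "b" and "c".
commits: Dict[str, Tuple[int, List[str]]] = {
    "a": (1, []),
    "b": (2, ["a"]),
    "c": (3, ["a"]),
    "m": (4, ["b", "c"]),
    "x": (5, ["a"]),  # on another branch, never merged into "m"
}

def log(head: str) -> List[str]:
    """Return commits reachable from head, each listed before its parents."""
    # 1. Collect every commit reachable from head through parent links.
    reachable, stack = set(), [head]
    while stack:
        cid = stack.pop()
        if cid not in reachable:
            reachable.add(cid)
            stack.extend(commits[cid][1])
    # 2. For each reachable commit, count the reachable children pointing at it.
    pending_children = {cid: 0 for cid in reachable}
    for cid in reachable:
        for parent in commits[cid][1]:
            pending_children[parent] += 1
    # 3. Kahn's algorithm: a commit is ready once all its children are listed;
    #    among ready commits, take the newest (largest timestamp) first.
    ready = [(-commits[cid][0], cid) for cid in reachable if pending_children[cid] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, cid = heapq.heappop(ready)
        order.append(cid)
        for parent in commits[cid][1]:
            pending_children[parent] -= 1
            if pending_children[parent] == 0:
                heapq.heappush(ready, (-commits[parent][0], parent))
    return order

print(log("m"))  # ['m', 'c', 'b', 'a']
```

Commit `x` is on a branch that was never merged into `m`, so it is omitted, matching the filtering described above.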

How to use logs

Logs are useful in reverting the database to a previous state. You determine the state of the database you want via log and then use other Dolt commands to change the database to a different state.

Logs are useful when trying to track down why the database is in a particular state. You use log to find the commits in question and usually follow up with diffs (i.e. differences) between two commits you found in the log.

Logs are useful in audit. If you would like to ensure a particular value in the database has not changed since the last time you read it, log is useful in verifying this.

Difference between Git log and Dolt log

Conceptually and practically log on the command line is very similar between Git and Dolt. A table is akin to a file in Git.

Dolt has additional log functionality beyond Git. You can produce a log of any cell (i.e. row, column pair) in the database using a SQL query against the dolt_history_<tablename> system table.
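Conceptually, a cell's log falls out of the per-commit history of its row: walk the row's versions from oldest to newest and keep only the commits where the cell's value changed. The Python sketch below does that over rows shaped like query results from `dolt_history_employees` for a single `id`. The column names `commit_hash` and `commit_date`, the assumption of one history row per commit in which the row exists, and the short hashes are all illustrative assumptions; check the system table reference for the exact layout.

```python
from typing import Dict, List

# Hypothetical history of the row with id=0, oldest first, one entry per commit
# in which the row exists (the assumed shape of dolt_history_employees results).
history: List[Dict[str, str]] = [
    {"commit_hash": "aaa1", "commit_date": "2023-01-18", "first_name": "Tim"},
    {"commit_hash": "bbb2", "commit_date": "2023-01-19", "first_name": "Tim"},
    {"commit_hash": "ccc3", "commit_date": "2023-01-19", "first_name": "Timothy"},
    {"commit_hash": "ddd4", "commit_date": "2023-01-20", "first_name": "Timothy"},
]

def cell_log(rows: List[Dict[str, str]], column: str) -> List[Dict[str, str]]:
    """Keep only the commits where the given column's value changed."""
    changes, previous = [], object()  # sentinel, so the first version always counts
    for row in rows:
        if row[column] != previous:
            changes.append(row)
            previous = row[column]
    return changes

for row in cell_log(history, "first_name"):
    print(row["commit_hash"], row["first_name"])
# aaa1 Tim
# ccc3 Timothy
```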

Example

Commit Log

Cell History

SQL Reference

CLI Reference

Architecture

Version Controlled Database

Dolt is a MySQL compatible database server.

Navigate to the directory where you would like your data stored

Dolt needs a place to store your databases. I'm going to put my databases in ~/dolt.

% cd ~
% mkdir dolt
% cd dolt

Any databases you create will be stored in this directory. So, for this example, a directory named getting_started will be created here later in this walkthrough, after you run create database getting_started; in a SQL shell (see the Create a schema section below). Navigating to ~/dolt/getting_started will then allow you to access this database using the Dolt command line.

Start a MySQL-compatible database server

Dolt ships with a MySQL compatible database server built in. To start it you use the command dolt sql-server. Running this command starts the server on port 3306.

dolt sql-server
Starting server with Config HP="localhost:3306"|T="28800000"|R="false"|L="info"

Your terminal will just hang there. This means the server is running. Any errors will be printed in this terminal. Just leave it there and open a new terminal.

Connect with any MySQL client

In the new terminal, we will now connect to the running database server using a client.

Let's grab a copy of MySQL so we can connect with that client. Head over to the MySQL documentation and install MySQL on your machine. I used Homebrew to install MySQL on my Mac.

% mysql --version
mysql  Ver 8.0.29 for macos12.2 on x86_64 (Homebrew)

% mysql --host 127.0.0.1 --port 3306 -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.9-Vitess

Copyright (c) 2000, 2022, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

To confirm the client actually connected, look for a line like the following in the dolt sql-server terminal:

2022-06-06T13:26:55-07:00 INFO [conn 2] NewConnection {DisableClientMultiStatements=false}

As you can see, Dolt supports any MySQL-compatible client.

Create a schema

mysql> create database getting_started;
Query OK, 1 row affected (0.04 sec)

mysql> use getting_started;
Database changed
mysql> create table employees (
    id int,
    last_name varchar(255),
    first_name varchar(255),
    primary key(id));
Query OK, 0 rows affected (0.01 sec)

mysql> create table teams (
    id int,
    team_name varchar(255),
    primary key(id));
Query OK, 0 rows affected (0.00 sec)

mysql> create table employees_teams(
    team_id int,
    employee_id int,
    primary key(team_id, employee_id),
    foreign key (team_id) references teams(id),
    foreign key (employee_id) references employees(id));
Query OK, 0 rows affected (0.01 sec)

mysql> show tables;
+---------------------------+
| Tables_in_getting_started |
+---------------------------+
| employees                 |
| employees_teams           |
| teams                     |
+---------------------------+
3 rows in set (0.00 sec)

Dolt supports foreign keys, secondary indexes, triggers, check constraints, and stored procedures. It's a modern, feature-rich SQL database.

Make a Dolt commit

It's time to use your first Dolt feature. We're going to make a Dolt commit. A Dolt commit allows you to time travel and see lineage. Make a Dolt commit whenever you want to be able to restore or compare to this point in time.

Dolt exposes version control functionality through a Git-style interface. On the command line, Dolt commands map exactly to their Git equivalents, with the targets being tables instead of files. In SQL, Dolt exposes version control read operations as system tables and version control write operations as stored procedures.

So, we add and commit our new schema like so.

mysql> call dolt_add('teams', 'employees', 'employees_teams');
+--------+
| status |
+--------+
|      0 |
+--------+
1 row in set (0.03 sec)

mysql> call dolt_commit('-m', 'Created initial schema');
+----------------------------------+
| hash                             |
+----------------------------------+
| ne182jemgrlm8jnjmoubfqsstlfi1s98 |
+----------------------------------+
1 row in set (0.02 sec)

mysql> select * from dolt_log;
+----------------------------------+-----------+-----------------+-------------------------+----------------------------+
| commit_hash                      | committer | email           | date                    | message                    |
+----------------------------------+-----------+-----------------+-------------------------+----------------------------+
| ne182jemgrlm8jnjmoubfqsstlfi1s98 | Tim Sehn  | tim@dolthub.com | 2022-06-07 16:35:49.277 | Created initial schema     |
| vluuhvd0bn59598utedt77ed9q5okbcb | Tim Sehn  | tim@dolthub.com | 2022-06-07 16:33:59.531 | Initialize data repository |
+----------------------------------+-----------+-----------------+-------------------------+----------------------------+
2 rows in set (0.01 sec)

There you have it. Your schema is created and you have a Dolt commit tracking the creation, as seen in the dolt_log system table.

Note that a Dolt commit is different from a standard SQL transaction COMMIT. In this case, I am running the database with autocommit on, so each SQL statement automatically generates a transaction COMMIT. If you want the system to generate a Dolt commit for every transaction, use the system variable .

Insert some data

mysql> insert into employees values
    (0, 'Sehn', 'Tim'),
    (1, 'Hendriks', 'Brian'),
    (2, 'Son','Aaron'),
    (3, 'Fitzgerald', 'Brian');
Query OK, 4 rows affected (0.01 sec)

mysql> select * from employees where first_name='Brian';
+------+------------+------------+
| id   | last_name  | first_name |
+------+------------+------------+
|    1 | Hendriks   | Brian      |
|    3 | Fitzgerald | Brian      |
+------+------------+------------+
2 rows in set (0.00 sec)

mysql> insert into teams values
    (0, 'Engineering'),
    (1, 'Sales');
Query OK, 2 rows affected (0.00 sec)

mysql> insert into employees_teams values
    (0,0),
    (1,0),
    (2,0),
    (0,1),
    (3,1);
ERROR 1452 (HY000): cannot add or update a child row - Foreign key violation on fk: `rv9ek7ft`, table: `employees_teams`, referenced table: `teams`, key: `[2]`

Oops, that violates a foreign key constraint. The employees_teams table lists team_id before employee_id, so the positional values put 2 into team_id, and there is no team with id 2. Always name your columns when you insert rather than relying on their order:

mysql> insert into employees_teams(employee_id, team_id) values
    (0,0),
    (1,0),
    (2,0),
    (0,1),
    (3,1);
Query OK, 5 rows affected (0.01 sec)

mysql> select first_name, last_name, team_name from employees
    join employees_teams on (employees.id=employees_teams.employee_id)
    join teams on (teams.id=employees_teams.team_id)
    where team_name='Engineering';
+------------+-----------+-------------+
| first_name | last_name | team_name   |
+------------+-----------+-------------+
| Tim        | Sehn      | Engineering |
| Brian      | Hendriks  | Engineering |
| Aaron      | Son       | Engineering |
+------------+-----------+-------------+
3 rows in set (0.00 sec)

Examine the diff

Now, what if you want to see what changed in your working set before you make a commit? You use the dolt_status and dolt_diff_<tablename> system tables.

mysql> select * from dolt_status;
+-----------------+--------+----------+
| table_name      | staged | status   |
+-----------------+--------+----------+
| teams           |      0 | modified |
| employees       |      0 | modified |
| employees_teams |      0 | modified |
+-----------------+--------+----------+
3 rows in set (0.01 sec)

mysql> select * from dolt_diff_employees;
+--------------+---------------+-------+-----------+----------------+----------------+-----------------+---------+----------------------------------+-------------------------+-----------+
| to_last_name | to_first_name | to_id | to_commit | to_commit_date | from_last_name | from_first_name | from_id | from_commit                      | from_commit_date        | diff_type |
+--------------+---------------+-------+-----------+----------------+----------------+-----------------+---------+----------------------------------+-------------------------+-----------+
| Sehn         | Tim           |     0 | WORKING   | NULL           | NULL           | NULL            |    NULL | ne182jemgrlm8jnjmoubfqsstlfi1s98 | 2022-06-07 16:35:49.277 | added     |
| Hendriks     | Brian         |     1 | WORKING   | NULL           | NULL           | NULL            |    NULL | ne182jemgrlm8jnjmoubfqsstlfi1s98 | 2022-06-07 16:35:49.277 | added     |
| Son          | Aaron         |     2 | WORKING   | NULL           | NULL           | NULL            |    NULL | ne182jemgrlm8jnjmoubfqsstlfi1s98 | 2022-06-07 16:35:49.277 | added     |
| Fitzgerald   | Brian         |     3 | WORKING   | NULL           | NULL           | NULL            |    NULL | ne182jemgrlm8jnjmoubfqsstlfi1s98 | 2022-06-07 16:35:49.277 | added     |
+--------------+---------------+-------+-----------+----------------+----------------+-----------------+---------+----------------------------------+-------------------------+-----------+
4 rows in set (0.00 sec)

As you can see from the diff, I've added the correct rows to the employees table. The from_ columns are NULL because these rows did not exist before, and the to_ columns show the values I inserted.
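To make that check repeatable before committing, you could count the uncommitted changes for each diff type. Here is a minimal sketch. It assumes a DB-API cursor on the Dolt server. `working_diff_summary` is a hypothetical helper, and it filters on `to_commit = 'WORKING'` as seen in the output above.

```python
import re
from typing import Any, Dict

# Conservative identifier check: table names cannot be passed as query parameters.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def working_diff_summary(cursor: Any, table: str) -> Dict[str, int]:
    """Count uncommitted row changes in `table`, grouped by diff_type.

    Hypothetical helper; assumes a DB-API cursor on a Dolt sql-server session.
    """
    if not _IDENT.fullmatch(table):
        raise ValueError(f"unexpected table name: {table!r}")
    # to_commit = 'WORKING' keeps only changes that are not yet in a Dolt commit.
    cursor.execute(
        f"SELECT diff_type, COUNT(*) FROM dolt_diff_{table} "
        "WHERE to_commit = 'WORKING' GROUP BY diff_type"
    )
    return {diff_type: int(count) for diff_type, count in cursor.fetchall()}
```

Before the dolt_commit call that follows, a check like `working_diff_summary(cur, "employees") == {"added": 4}` would express "I only inserted four rows."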

Let's finish off with another Dolt commit, this time adding all modified tables using -am.

mysql> call dolt_commit('-am', 'Populated tables with data');
+----------------------------------+
| hash                             |
+----------------------------------+
| 13qfqa5rojq18j84d1n2htjkm6fletg4 |
+----------------------------------+
1 row in set (0.02 sec)

mysql> select * from dolt_log;
+----------------------------------+-----------+-----------------+-------------------------+----------------------------+
| commit_hash                      | committer | email           | date                    | message                    |
+----------------------------------+-----------+-----------------+-------------------------+----------------------------+
| 13qfqa5rojq18j84d1n2htjkm6fletg4 | Tim Sehn  | tim@dolthub.com | 2022-06-07 16:39:32.066 | Populated tables with data |
| ne182jemgrlm8jnjmoubfqsstlfi1s98 | Tim Sehn  | tim@dolthub.com | 2022-06-07 16:35:49.277 | Created initial schema     |
| vluuhvd0bn59598utedt77ed9q5okbcb | Tim Sehn  | tim@dolthub.com | 2022-06-07 16:33:59.531 | Initialize data repository |
+----------------------------------+-----------+-----------------+-------------------------+----------------------------+
3 rows in set (0.00 sec)

mysql> select * from dolt_diff;
+----------------------------------+-----------------+-----------+-----------------+-------------------------+----------------------------+-------------+---------------+
| commit_hash                      | table_name      | committer | email           | date                    | message                    | data_change | schema_change |
+----------------------------------+-----------------+-----------+-----------------+-------------------------+----------------------------+-------------+---------------+
| 13qfqa5rojq18j84d1n2htjkm6fletg4 | teams           | Tim Sehn  | tim@dolthub.com | 2022-06-07 16:39:32.066 | Populated tables with data |           1 |             0 |
| 13qfqa5rojq18j84d1n2htjkm6fletg4 | employees       | Tim Sehn  | tim@dolthub.com | 2022-06-07 16:39:32.066 | Populated tables with data |           1 |             0 |
| 13qfqa5rojq18j84d1n2htjkm6fletg4 | employees_teams | Tim Sehn  | tim@dolthub.com | 2022-06-07 16:39:32.066 | Populated tables with data |           1 |             0 |
| ne182jemgrlm8jnjmoubfqsstlfi1s98 | employees       | Tim Sehn  | tim@dolthub.com | 2022-06-07 16:35:49.277 | Created initial schema     |           0 |             1 |
| ne182jemgrlm8jnjmoubfqsstlfi1s98 | employees_teams | Tim Sehn  | tim@dolthub.com | 2022-06-07 16:35:49.277 | Created initial schema     |           0 |             1 |
| ne182jemgrlm8jnjmoubfqsstlfi1s98 | teams           | Tim Sehn  | tim@dolthub.com | 2022-06-07 16:35:49.277 | Created initial schema     |           0 |             1 |
+----------------------------------+-----------------+-----------+-----------------+-------------------------+----------------------------+-------------+---------------+
6 rows in set (0.00 sec)

Oh no! I made a mistake.

Dolt supports undoing changes via call dolt_reset(). Let's imagine I accidentally drop a table.

mysql> drop table employees_teams;
Query OK, 0 rows affected (0.01 sec)

mysql> show tables;
+---------------------------+
| Tables_in_getting_started |
+---------------------------+
| employees                 |
| teams                     |
+---------------------------+
2 rows in set (0.00 sec)

In a traditional database, this could be disastrous. In Dolt, you're one command away from getting your table back.

mysql> call dolt_reset('--hard');
+--------+
| status |
+--------+
|      0 |
+--------+
1 row in set (0.01 sec)

mysql> show tables;
+---------------------------+
| Tables_in_getting_started |
+---------------------------+
| employees                 |
| employees_teams           |
| teams                     |
+---------------------------+
3 rows in set (0.01 sec)

Dolt makes operating databases less error-prone. You can always back out changes you have in progress or rewind to a known good state. You can also undo specific commits.

Note that undoing a drop database statement requires a special SQL procedure.
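The reset also works as a safety net in scripts: try a change, check the result, and throw the working set away if the check fails. Here is a minimal sketch. It assumes a DB-API cursor on the Dolt server. `run_or_reset` and its `is_ok` callback are hypothetical names.

```python
from typing import Any, Callable


def run_or_reset(cursor: Any, sql: str, is_ok: Callable[[Any], bool]) -> bool:
    """Run a statement, then keep it only if `is_ok(cursor)` approves.

    Hypothetical helper. dolt_reset('--hard') discards every uncommitted
    change in the session's working set, not just this statement, so start
    from a clean working set (nothing listed in dolt_status).
    """
    cursor.execute(sql)
    if is_ok(cursor):
        return True
    # Roll the working set back to the last Dolt commit (HEAD).
    cursor.execute("CALL dolt_reset('--hard')")
    return False
```

For example, `is_ok` could run `show tables` and confirm that employees_teams is still listed.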

See the data in a SQL Workbench

Hate the command line? Let's use TablePlus to make some modifications. TablePlus is a free SQL Workbench. Follow the installation instructions from their website.

Now, to connect you must select MySQL as the connection type. Then enter a name for your connection, getting_started as your database, and root as your user.

Click connect and you'll be presented with a familiar database workbench GUI.

Make changes on a branch

To make changes on a branch, I use the dolt_checkout() stored procedure. Using the -b option creates a branch, just like in Git.

TablePlus lets me enter a multi-line SQL script on the SQL tab. I entered the following SQL to check out a branch, update, insert, delete, and finally Dolt commit my changes.

call dolt_checkout('-b','modifications');
update employees set first_name='Timothy' where first_name='Tim';
insert into employees (id, first_name, last_name) values (4,'Daylon', 'Wilkins');
insert into employees_teams(team_id, employee_id) values (0,4);
delete from employees_teams where employee_id=0 and team_id=1;
call dolt_commit('-am', 'Modifications on a branch');

Here's the result in TablePlus.

Back in my terminal, I cannot see the table modifications made in TablePlus because they happened on a different branch from the one checked out in my session.

mysql> select * from dolt_branches;
+---------------+----------------------------------+------------------+------------------------+-------------------------+----------------------------+
| name          | hash                             | latest_committer | latest_committer_email | latest_commit_date      | latest_commit_message      |
+---------------+----------------------------------+------------------+------------------------+-------------------------+----------------------------+
| main          | 13qfqa5rojq18j84d1n2htjkm6fletg4 | Tim Sehn         | tim@dolthub.com        | 2022-06-07 16:39:32.066 | Populated tables with data |
| modifications | uhkv57j4bp2v16vcnmev9lshgkqq8ppb | Tim Sehn         | tim@dolthub.com        | 2022-06-07 16:41:49.847 | Modifications on a branch  |
+---------------+----------------------------------+------------------+------------------------+-------------------------+----------------------------+
2 rows in set (0.00 sec)

mysql> select active_branch();
+-----------------+
| active_branch() |
+-----------------+
| main            |
+-----------------+
1 row in set (0.00 sec)

mysql> select * from employees;
+------+------------+------------+
| id   | last_name  | first_name |
+------+------------+------------+
|    0 | Sehn       | Tim        |
|    1 | Hendriks   | Brian      |
|    2 | Son        | Aaron      |
|    3 | Fitzgerald | Brian      |
+------+------------+------------+
4 rows in set (0.00 sec)

I can query the branch no matter what I have checked out by using the SQL AS OF syntax.

mysql> select * from employees as of 'modifications';
+------+------------+------------+
| id   | last_name  | first_name |
+------+------------+------------+
|    0 | Sehn       | Timothy    |
|    1 | Hendriks   | Brian      |
|    2 | Son        | Aaron      |
|    3 | Fitzgerald | Brian      |
|    4 | Wilkins    | Daylon     |
+------+------------+------------+
5 rows in set (0.01 sec)
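From code, AS OF is just part of the query text, so a branch or commit can be passed in like any other value. Here is a minimal sketch. It assumes PyMySQL-style client-side parameter substitution. `rows_as_of` is a hypothetical helper.

```python
import re
from typing import Any, Dict, List

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def rows_as_of(cursor: Any, table: str, revision: str) -> List[Dict[str, Any]]:
    """Read `table` as it exists at `revision` (a branch name or commit hash).

    Hypothetical helper. Relies on the driver substituting %s client-side
    (as PyMySQL does), which produces e.g. AS OF 'modifications'.
    """
    if not _IDENT.fullmatch(table):
        raise ValueError(f"unexpected table name: {table!r}")
    cursor.execute(f"SELECT * FROM {table} AS OF %s", (revision,))
    columns = [col[0] for col in cursor.description]  # DB-API: item 0 is the column name
    return [dict(zip(columns, row)) for row in cursor.fetchall()]
```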

If I'd like to see the diff between the two branches, I can use the dolt_diff() table function. It takes two branches and the table name as arguments.

mysql> select * from dolt_diff('main', 'modifications', 'employees');
+--------------+---------------+-------+---------------+-------------------------+----------------+-----------------+---------+-------------+-------------------------+-----------+
| to_last_name | to_first_name | to_id | to_commit     | to_commit_date          | from_last_name | from_first_name | from_id | from_commit | from_commit_date        | diff_type |
+--------------+---------------+-------+---------------+-------------------------+----------------+-----------------+---------+-------------+-------------------------+-----------+
| Sehn         | Timothy       |     0 | modifications | 2022-06-07 16:41:49.847 | Sehn           | Tim             |       0 | main        | 2022-06-07 16:39:32.066 | modified  |
| Wilkins      | Daylon        |     4 | modifications | 2022-06-07 16:41:49.847 | NULL           | NULL            |    NULL | main        | 2022-06-07 16:39:32.066 | added     |
+--------------+---------------+-------+---------------+-------------------------+----------------+-----------------+---------+-------------+-------------------------+-----------+
2 rows in set (0.00 sec)
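The diff_type column comes from matching rows by primary key. The toy model below makes that concrete in plain Python. It is an illustration, not how Dolt computes diffs internally, and it assumes rows keyed by id, as in the employees table. Dolt labels deleted rows removed.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str]  # (last_name, first_name)


def row_diff(from_rows: Dict[int, Row], to_rows: Dict[int, Row]) -> List[Tuple[int, str]]:
    """Toy model of a row-level diff: rows are matched by primary key."""
    changes: List[Tuple[int, str]] = []
    for pk in sorted(from_rows.keys() | to_rows.keys()):
        if pk not in from_rows:
            changes.append((pk, "added"))
        elif pk not in to_rows:
            changes.append((pk, "removed"))
        elif from_rows[pk] != to_rows[pk]:
            changes.append((pk, "modified"))
        # Identical rows are not part of the diff.
    return changes
```

With main's four rows and the modifications branch as inputs, it yields the same two changes as the dolt_diff() output above: id 0 modified and id 4 added.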

As you can see, you have the full power of Git-style branches and diffs in a SQL database with Dolt.

Make a schema change on another branch

I can also make schema changes on branches for isolated testing of new schema. I'm going to add a start_date column on a new branch and populate it.

mysql> call dolt_checkout('-b', 'schema_changes');
+--------+
| status |
+--------+
|      0 |
+--------+
1 row in set (0.01 sec)

mysql> alter table employees add column start_date date;
Query OK, 0 rows affected (0.02 sec)

mysql> update employees set start_date='2018-09-08';
Query OK, 4 rows affected (0.01 sec)
Rows matched: 4  Changed: 4  Warnings: 0

mysql> update employees set start_date='2021-04-19' where last_name='Fitzgerald';
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from employees;
+------+------------+------------+------------+
| id   | last_name  | first_name | start_date |
+------+------------+------------+------------+
|    0 | Sehn       | Tim        | 2018-09-08 |
|    1 | Hendriks   | Brian      | 2018-09-08 |
|    2 | Son        | Aaron      | 2018-09-08 |
|    3 | Fitzgerald | Brian      | 2021-04-19 |
+------+------------+------------+------------+
4 rows in set (0.00 sec)

mysql> call dolt_commit('-am', 'Added start_date column to employees');
+----------------------------------+
| hash                             |
+----------------------------------+
| pg3nfi0j1dpc5pf1rfgckpmlteaufdrt |
+----------------------------------+
1 row in set (0.01 sec)

Changing schema on a branch gives you a new method for doing isolated integration testing of new schema changes.

Merge it all together

mysql> call dolt_checkout('main');
+--------+
| status |
+--------+
|      0 |
+--------+
1 row in set (0.01 sec)

mysql> select * from dolt_status;
Empty set (0.00 sec)

mysql> call dolt_merge('schema_changes');
+--------------+
| no_conflicts |
+--------------+
|            1 |
+--------------+
1 row in set (0.01 sec)

mysql> select * from employees;
+------+------------+------------+------------+
| id   | last_name  | first_name | start_date |
+------+------------+------------+------------+
|    0 | Sehn       | Tim        | 2018-09-08 |
|    1 | Hendriks   | Brian      | 2018-09-08 |
|    2 | Son        | Aaron      | 2018-09-08 |
|    3 | Fitzgerald | Brian      | 2021-04-19 |
+------+------------+------------+------------+
4 rows in set (0.00 sec)

Schema change successful. We now have start dates. Data changes are next.

mysql> call dolt_merge('modifications');
+--------------+
| no_conflicts |
+--------------+
|            1 |
+--------------+
1 row in set (0.02 sec)

mysql> select * from employees;
+------+------------+------------+------------+
| id   | last_name  | first_name | start_date |
+------+------------+------------+------------+
|    0 | Sehn       | Timothy    | 2018-09-08 |
|    1 | Hendriks   | Brian      | 2018-09-08 |
|    2 | Son        | Aaron      | 2018-09-08 |
|    3 | Fitzgerald | Brian      | 2021-04-19 |
|    4 | Wilkins    | Daylon     | NULL       |
+------+------------+------------+------------+
5 rows in set (0.00 sec)

Data changes successful as well. As you can see, I am now "Timothy" instead of "Tim", Daylon is added, and we all have start dates except for Daylon, who was added on a different branch.

mysql> select first_name, last_name, team_name from employees
    join employees_teams on (employees.id=employees_teams.employee_id)
    join teams on (teams.id=employees_teams.team_id)
    where team_name='Sales';
+------------+------------+-----------+
| first_name | last_name  | team_name |
+------------+------------+-----------+
| Brian      | Fitzgerald | Sales     |
+------------+------------+-----------+
1 row in set (0.01 sec)

I'm also gone from the Sales Team. Engineering is life.

Now, we have a database with all the schema and data changes merged and ready for use.

mysql> select * from dolt_log;
+----------------------------------+-----------+-----------------+-------------------------+----------------------------------------+
| commit_hash                      | committer | email           | date                    | message                                |
+----------------------------------+-----------+-----------------+-------------------------+----------------------------------------+
| vn9b0qcematsj2f6ka0hfoflhr5s6p0b | Tim Sehn  | tim@dolthub.com | 2022-06-07 17:10:02.07  | Merge branch 'modifications' into main |
| pg3nfi0j1dpc5pf1rfgckpmlteaufdrt | Tim Sehn  | tim@dolthub.com | 2022-06-07 16:44:37.513 | Added start_date column to employees   |
| uhkv57j4bp2v16vcnmev9lshgkqq8ppb | Tim Sehn  | tim@dolthub.com | 2022-06-07 16:41:49.847 | Modifications on a branch              |
| 13qfqa5rojq18j84d1n2htjkm6fletg4 | Tim Sehn  | tim@dolthub.com | 2022-06-07 16:39:32.066 | Populated tables with data             |
| ne182jemgrlm8jnjmoubfqsstlfi1s98 | Tim Sehn  | tim@dolthub.com | 2022-06-07 16:35:49.277 | Created initial schema                 |
| vluuhvd0bn59598utedt77ed9q5okbcb | Tim Sehn  | tim@dolthub.com | 2022-06-07 16:33:59.531 | Initialize data repository             |
+----------------------------------+-----------+-----------------+-------------------------+----------------------------------------+
6 rows in set (0.00 sec)

Audit Cell Lineage

dolt_history_<tablename> shows you the state of the row at every commit.

mysql> select * from dolt_history_employees where id=0 order by commit_date;
+------+-----------+------------+------------+----------------------------------+-----------+-------------------------+
| id   | last_name | first_name | start_date | commit_hash                      | committer | commit_date             |
+------+-----------+------------+------------+----------------------------------+-----------+-------------------------+
|    0 | Sehn      | Tim        | NULL       | 13qfqa5rojq18j84d1n2htjkm6fletg4 | Tim Sehn  | 2022-06-07 16:39:32.066 |
|    0 | Sehn      | Timothy    | NULL       | uhkv57j4bp2v16vcnmev9lshgkqq8ppb | Tim Sehn  | 2022-06-07 16:41:49.847 |
|    0 | Sehn      | Tim        | 2018-09-08 | pg3nfi0j1dpc5pf1rfgckpmlteaufdrt | Tim Sehn  | 2022-06-07 16:44:37.513 |
|    0 | Sehn      | Timothy    | 2018-09-08 | vn9b0qcematsj2f6ka0hfoflhr5s6p0b | Tim Sehn  | 2022-06-07 17:10:02.07  |
+------+-----------+------------+------------+----------------------------------+-----------+-------------------------+
4 rows in set (0.00 sec)

mysql> select to_commit,from_first_name,to_first_name from dolt_diff_employees
    where (from_id=0 or to_id=0) and (from_first_name <> to_first_name or from_first_name is NULL)
    order by to_commit_date;
+----------------------------------+-----------------+---------------+
| to_commit                        | from_first_name | to_first_name |
+----------------------------------+-----------------+---------------+
| 13qfqa5rojq18j84d1n2htjkm6fletg4 | NULL            | Tim           |
| uhkv57j4bp2v16vcnmev9lshgkqq8ppb | Tim             | Timothy       |
| vn9b0qcematsj2f6ka0hfoflhr5s6p0b | Tim             | Timothy       |
+----------------------------------+-----------------+---------------+
3 rows in set (0.01 sec)

Dolt provides powerful data audit capabilities down to individual cells. When, how, and why has each cell in your database changed over time?
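The three rows from dolt_diff_employees make sense once you remember that history is a graph, not a line. Ordering dolt_history_employees by date makes Tim appear to come back at pg3nfi0j1dpc5pf1rfgckpmlteaufdrt. That commit was branched from 13qfqa5rojq18j84d1n2htjkm6fletg4, however, and never changed the name. The toy model below compares each commit with its first parent. The parent links are reconstructed from this walkthrough and are an assumption, as is the helper name `cell_changes`.

```python
from typing import Dict, List, Optional, Tuple

# commit hash -> (first parent hash, first_name of employee 0 at that commit)
History = Dict[str, Tuple[Optional[str], Optional[str]]]


def cell_changes(history: History, order: List[str]) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Toy model: report each commit whose cell value differs from its first parent's."""
    changes = []
    for commit in order:
        parent, value = history[commit]
        before = history[parent][1] if parent is not None else None
        if before != value:  # None -> 'Tim' counts as a change (the row was created)
            changes.append((commit, before, value))
    return changes
```

Fed the six commits from dolt_log, it reports the same three changes as the query above: the row's creation, the rename on the branch, and the merge commit bringing the rename into main.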

Conclusion

Want to dive even deeper? Here are some links to advanced topics:

Git For Data

Dolt is Git for Data. You can use Dolt's command line interface to version control data like you version control files with Git. Git versions files, Dolt versions tables.

Once you have Dolt installed, type dolt and you'll start to feel the git vibes immediately.

$ dolt
Valid commands for dolt are
                init - Create an empty Dolt data repository.
              status - Show the working tree status.
                 add - Add table changes to the list of staged table changes.
                diff - Diff a table.
               reset - Remove table changes from the list of staged table changes.
               clean - Remove untracked tables from working set.
              commit - Record changes to the repository.
                 sql - Run a SQL query against tables in repository.
          sql-server - Start a MySQL-compatible server.
                 log - Show commit logs.
              branch - Create, list, edit, delete branches.
            checkout - Checkout a branch or overwrite a table from HEAD.
               merge - Merge a branch.
           conflicts - Commands for viewing and resolving merge conflicts.
         cherry-pick - Apply the changes introduced by an existing commit.
              revert - Undo the changes introduced in a commit.
               clone - Clone from a remote data repository.
               fetch - Update the database from a remote data repository.
                pull - Fetch from a dolt remote data repository and merge.
                push - Push to a dolt remote.
              config - Dolt configuration.
              remote - Manage set of tracked repositories.
              backup - Manage a set of server backups.
               login - Login to a dolt remote host.
               creds - Commands for managing credentials.
                  ls - List tables in the working set.
              schema - Commands for showing and importing table schemas.
               table - Commands for copying, renaming, deleting, and exporting tables.
                 tag - Create, list, delete tags.
               blame - Show what revision and author last modified each row of a table.
         constraints - Commands for handling constraints.
             migrate - Executes a database migration to use the latest Dolt data format.
         read-tables - Fetch table(s) at a specific commit into a new dolt repo
                  gc - Cleans up unreferenced data from the repository.
       filter-branch - Edits the commit history using the provided query.
          merge-base - Find the common ancestor of two commits.
             version - Displays the current Dolt cli version.
                dump - Export all tables in the working set into a file.
                docs - Commands for working with Dolt documents.

That's right: all the Git commands you're used to, like checkout, diff, and merge, are implemented on top of SQL tables instead of files. Dolt really is Git for Data.

Configure Dolt

$ dolt config --global --add user.name "Tim Sehn"
$ dolt config --global --add user.email "tim@dolthub.com"

After running these commands, you can see them in a file in your ~/.dolt directory.

$ ls ~/.dolt/config_global.json 
/Users/timsehn/.dolt/config_global.json

Navigate to the directory where you would like your data stored

Dolt needs a place to store your databases. I'm going to put my databases in ~/dolt.

$ cd ~
$ mkdir dolt
$ cd dolt

Initialize a database

$ mkdir git_for_data
$ cd git_for_data
$ dolt init
Successfully initialized dolt data repository.

You now have a fresh Dolt database. It has a single entry in dolt log.

$ dolt log
commit f06jtfp6fqaak6dkm0olmv175atkbhl3 (HEAD -> main) 
Author: timsehn <tim@dolthub.com>
Date:  Wed Jan 18 17:02:38 -0800 2023

        Initialize data repository

Make a table

Git versions files. Dolt versions tables.

Here's our CSV file. We're going to use a simple list of employees here at DoltHub.

$ cat employees.csv  
id,first_name,last_name
0,Tim,Sehn
1,Brian,Hendriks
2,Aaron,Son

$ dolt table import --create-table --pk id employees employees.csv
Rows Processed: 3, Additions: 3, Modifications: 0, Had No Effect: 0
Import completed successfully.

We can make sure it's there using the dolt status command. Dolt has a staging area just like Git, so right now the table is in the working set but not staged.

$ dolt status
On branch main
Untracked files:
  (use "dolt add <table>" to include in what will be committed)
	new table:      employees

$ dolt sql -q "show tables" 
+------------------------+
| Tables_in_git_for_data |
+------------------------+
| employees              |
+------------------------+

$ dolt sql -q "describe employees"
+------------+----------------+------+-----+---------+-------+
| Field      | Type           | Null | Key | Default | Extra |
+------------+----------------+------+-----+---------+-------+
| id         | int            | NO   | PRI | NULL    |       |
| first_name | varchar(16383) | YES  |     | NULL    |       |
| last_name  | varchar(16383) | YES  |     | NULL    |       |
+------------+----------------+------+-----+---------+-------+

$ dolt sql -q "select * from employees"
+----+------------+-----------+
| id | first_name | last_name |
+----+------------+-----------+
| 0  | Tim        | Sehn      |
| 1  | Brian      | Hendriks  |
| 2  | Aaron      | Son       |
+----+------------+-----------+
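Because the CLI is scriptable, query results can feed other tools. Here is a minimal Python sketch. It assumes dolt is on your PATH and that the script runs inside git_for_data. It relies on the -r (result format) option of dolt sql with csv output. `run_dolt` and `query_rows` are hypothetical helper names, and the `runner` parameter makes the sketch testable without a real repository.

```python
import csv
import io
import subprocess
from typing import Callable, Dict, List, Sequence

Runner = Callable[[Sequence[str]], str]


def run_dolt(args: Sequence[str]) -> str:
    """Run the dolt CLI in the current directory and return its stdout."""
    result = subprocess.run(["dolt", *args], check=True, capture_output=True, text=True)
    return result.stdout


def query_rows(sql: str, runner: Runner = run_dolt) -> List[Dict[str, str]]:
    """Run `dolt sql -q` with CSV output and parse the rows (values come back as strings)."""
    output = runner(["sql", "-q", sql, "-r", "csv"])
    return list(csv.DictReader(io.StringIO(output)))
```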

Make a Dolt commit

$ dolt add employees
$ dolt status
On branch main
Changes to be committed:
  (use "dolt reset <table>..." to unstage)
	new table:      employees
$ dolt commit -m "Added new employees table containing the founders of DoltHub"
commit aq86v87h1g05i5cdht6v6tptp70eibms (HEAD -> main) 
Author: timsehn <tim@dolthub.com>
Date:  Thu Jan 19 14:56:13 -0800 2023

        Added new employees table containing the founders of DoltHub

$ dolt status
On branch main
nothing to commit, working tree clean
$ dolt log
commit aq86v87h1g05i5cdht6v6tptp70eibms (HEAD -> main) 
Author: timsehn <tim@dolthub.com>
Date:  Thu Jan 19 14:56:13 -0800 2023

        Added new employees table containing the founders of DoltHub

commit f06jtfp6fqaak6dkm0olmv175atkbhl3 
Author: timsehn <tim@dolthub.com>
Date:  Wed Jan 18 17:02:38 -0800 2023

        Initialize data repository

And inspecting the log it looks like we're good! As you can see, Dolt takes "Git for Data" very literally.

Examine a diff

Now, I want to add an employee and change my name from "Tim" to "Timothy", you know, to be professional. I'm going to do that through the command line SQL interface and show you the diff.

$ dolt sql -q "insert into employees values (3, 'Daylon', 'Wilkins')"
Query OK, 1 row affected (0.00 sec)
$ dolt sql -q "update employees set first_name='Timothy' where last_name like 'S%'" 
Query OK, 2 rows affected (0.00 sec)
Rows matched: 2  Changed: 2  Warnings: 0
$ dolt diff
diff --dolt a/employees b/employees
--- a/employees @ m3qr6lhb8ad6fc5puvsaiv5ladajfi9r
+++ b/employees @ uvrbmnv52n2m25gpmom92qf4723bn9og
+---+----+------------+-----------+
|   | id | first_name | last_name |
+---+----+------------+-----------+
| < | 0  | Tim        | Sehn      |
| > | 0  | Timothy    | Sehn      |
| < | 2  | Aaron      | Son       |
| > | 2  | Timothy    | Son       |
| + | 3  | Daylon     | Wilkins   |
+---+----+------------+-----------+

That's not right! The pattern 'S%' matches both Sehn and Son, so Aaron was renamed too. Diffs in Dolt are a powerful way to confirm you changed exactly what you meant to change, which protects data quality.

Oh no! I made a mistake.

Just like with Git, I can roll back in a number of ways in Dolt: I can check out the table or run reset --hard. Let's check out the table.

$ dolt checkout employees
$ dolt diff 
$ dolt sql -q "select * from employees"
+----+------------+-----------+
| id | first_name | last_name |
+----+------------+-----------+
| 0  | Tim        | Sehn      |
| 1  | Brian      | Hendriks  |
| 2  | Aaron      | Son       |
+----+------------+-----------+

Now, I'll re-run the correct queries and check the diff to make sure I did it right this time.

$ dolt sql -q "insert into employees values (3, 'Daylon', 'Wilkins')"
Query OK, 1 row affected (0.00 sec)
$ dolt sql -q "update employees set first_name='Timothy' where first_name='Tim'"
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
$ dolt diff                                                          
diff --dolt a/employees b/employees
--- a/employees @ m3qr6lhb8ad6fc5puvsaiv5ladajfi9r
+++ b/employees @ 72aq85jbhr83v4gmh73v550gupk4mr3k
+---+----+------------+-----------+
|   | id | first_name | last_name |
+---+----+------------+-----------+
| < | 0  | Tim        | Sehn      |
| > | 0  | Timothy    | Sehn      |
| + | 3  | Daylon     | Wilkins   |
+---+----+------------+-----------+

Looks like I got it right this time. I'll make a commit.

$ dolt commit -am "Added Daylon. Make Tim Timothy."
commit envoh3j93s47idjmrn16r2tka3ap8s0d (HEAD -> main) 
Author: timsehn <tim@dolthub.com>
Date:  Thu Jan 19 16:55:14 -0800 2023

        Added Daylon. Make Tim Timothy.

Create a branch

In your terminal, run:

$ dolt sql-server
Starting server with Config HP="localhost:3306"|T="28800000"|R="false"|L="info"|S="/tmp/mysql.sock"

Your terminal will just hang there. This means the server is running. Any errors will be printed in this terminal. Just leave it there.

Now we can connect with TablePlus. Download and open TablePlus. Click "Create a new connection...". Select MySQL and click "Create". You'll be presented with a set of options. Fill them in like so.

Click connect and you'll be presented with a familiar database workbench GUI.

Now we want to make some changes on a branch. You can do this by running the following SQL.

call dolt_checkout('-b','modifications');
insert INTO employees values (5,'Taylor', 'Bantle');
call dolt_commit('-am', 'Modifications on a branch');

In TablePlus, click SQL, enter the SQL, and then click "Run Current". Afterwards, running dolt branch should show output like the following.

$ dolt branch
* main                                          	
  modifications

Let's checkout the branch and see that Taylor is on it.

$ dolt checkout modifications
Switched to branch 'modifications'
$ dolt sql -q "select * from employees"
+----+------------+-----------+
| id | first_name | last_name |
+----+------------+-----------+
| 0  | Timothy    | Sehn      |
| 1  | Brian      | Hendriks  |
| 2  | Aaron      | Son       |
| 3  | Daylon     | Wilkins   |
| 5  | Taylor     | Bantle    |
+----+------------+-----------+

$ dolt diff main
diff --dolt a/employees b/employees
--- a/employees @ 72aq85jbhr83v4gmh73v550gupk4mr3k
+++ b/employees @ pacpigp52ubvo5gcrl29h61310kt9p3s
+---+----+------------+-----------+
|   | id | first_name | last_name |
+---+----+------------+-----------+
| + | 5  | Taylor     | Bantle    |
+---+----+------------+-----------+

Branches work exactly the same way as in Git. Make a branch so that your changes don't affect other people.

Merge to Main

Finally, let's merge it all to main and delete our branch.

$ dolt checkout main
Switched to branch 'main'
$ dolt merge modifications
Updating envoh3j93s47idjmrn16r2tka3ap8s0d..74m09obaaae0am5n7iucupt2od1lhi4v
Fast-forward
$ dolt sql -q "select * from employees"
+----+------------+-----------+
| id | first_name | last_name |
+----+------------+-----------+
| 0  | Timothy    | Sehn      |
| 1  | Brian      | Hendriks  |
| 2  | Aaron      | Son       |
| 3  | Daylon     | Wilkins   |
| 5  | Taylor     | Bantle    |
+----+------------+-----------+
$ dolt branch -d modifications
$ dolt branch
* main

I got a fast-forward merge, just like Git, since there were no other changes on main.
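The fast-forward rule is easy to model. If main's head is already in the incoming branch's history, there is nothing to combine, and main's pointer simply moves forward. Here is a toy sketch in plain Python, not Dolt's code. The parent links are reconstructed from the commit hashes in this walkthrough, assuming modifications held a single commit on top of main.

```python
from typing import Dict, List


def is_ancestor(parents: Dict[str, List[str]], ancestor: str, commit: str) -> bool:
    """Walk parent links from `commit`; True if `ancestor` is reachable (or equal)."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False


def can_fast_forward(parents: Dict[str, List[str]], current_head: str, incoming_head: str) -> bool:
    """A merge can fast-forward when the current head is already in the incoming branch's history."""
    return is_ancestor(parents, current_head, incoming_head)
```

With main at envoh3j93s47idjmrn16r2tka3ap8s0d and modifications at 74m09obaaae0am5n7iucupt2od1lhi4v, the check succeeds. If main had gained its own commit in the meantime, it would fail and Dolt would need a real merge.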

Conclusion

As you can see, Dolt is Git For Data. The Dolt command line works exactly like the Git command line except the versioning target is tables instead of files.
