We provide both .msi files and .zip files.
The easiest way to install Dolt on Windows is to use the MSI files provided with each release. They can be found in the Assets section of the release. Grab the latest here.
.zip Archive
For those preferring to install Dolt manually, a zipped archive is provided with the requisite executables. It can be found in assets along with our latest release.
For Linux users, we provide an installation script that will detect your architecture, download the appropriate binary, and place it in /usr/local/bin:
The use of sudo is required to ensure the binary lands in your path. The script can be examined before executing should you have any concerns.
Dolt is extremely simple to install. Dolt is a single ~100 megabyte program. To install it, you download or compile that program and put it on your PATH. For each operating system, we created simpler, more familiar methods of installation.
Dolt is free and open source. You always have the option to build from source. This is also the best option if you want to use unreleased features or bug fixes.
If you want to run Dolt as a MySQL compatible server, we have additional instructions on how to do that on a Linux host or with Docker.
For those interested in building from source, clone the Dolt repo from GitHub and use go install. Note that you must have Go installed on the machine you are compiling on.
This will create a binary named dolt at ~/go/bin/dolt, unless you have $GOPATH set to something other than ~/go (or have set $GOBIN).
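As a rough sketch of the steps above, the commands might look like the following. The repository URL and the `./cmd/dolt` package path under `go/` are assumptions about the repo layout rather than something stated here, and `build_dolt_from_source` and `dolt_install_path` are hypothetical helper names. The path helper only mirrors Go's usual lookup order (`GOBIN`, then the first `GOPATH` entry, then `~/go`).

```shell
# Hypothetical helper: clone the Dolt repo and build the binary with `go install`.
# Assumes git and Go are installed; the URL and package path are assumptions.
build_dolt_from_source() {
  git clone https://github.com/dolthub/dolt.git &&
    cd dolt/go &&
    go install ./cmd/dolt
}

# Hypothetical helper: print where `go install` is expected to place the binary.
dolt_install_path() {
  gopath="${GOPATH:-$HOME/go}"
  # GOBIN wins if set; otherwise use the first entry of GOPATH.
  printf '%s/dolt\n' "${GOBIN:-${gopath%%:*}/bin}"
}

dolt_install_path
```

Call `build_dolt_from_source` from an empty working directory, then confirm the printed path is on your PATH.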
Dolt is a SQL database you can fork, clone, branch, merge, push and pull just like a Git repository. Connect to Dolt just like any MySQL database to run SQL queries. Use the command line interface to import CSV files, commit your changes, push them to a remote, or merge your teammate's changes.
All the commands you know from Git work exactly the same in Dolt. Git versions files, Dolt versions tables. It's like Git and MySQL had a baby.
Dolt is a version controlled database. Dolt is Git for Data. Dolt is a Versioned MySQL Replica.
Dolt is a version controlled SQL database. Connect to Dolt just like any MySQL database to run SQL queries. Use Dolt system tables, functions, or procedures to access version control information and features.
Dolt is Git for data. Dolt matches the Git CLI exactly. When you would have run git add, you run dolt add. When you would have run git commit, you run dolt commit.
Dolt can be deployed as a Versioned MySQL Replica. Because Dolt is MySQL compatible, Dolt can be configured just like any other MySQL replica. A Dolt replica gives you features of a version controlled database without migrating from MySQL.
Hosted Dolt is a cloud-deployed Dolt database. Choose the type of server and disk you need and we'll provision the resources and run Dolt for you. Connect with any MySQL client. Hosted Dolt is perfect for teams who want to build a Dolt-powered application.
We also built DoltHub, a place to share Dolt databases. We host public data for free! DoltHub adds a modern, secure, always on database management web GUI to the Dolt ecosystem. Edit your database on the web, have another person review it via a pull request, and have the production database pull it to deploy.
Not ready to put your databases on the internet, no matter the permissions? We have a self-hosted version of DoltHub we call DoltLab. DoltLab gives you all the features of DoltHub, wherever you want them, in your own network or on your development machine.
The download script for Linux can be used, as OSX is a *nix system. It will download the appropriate binary and place it in /usr/local/bin:
We publish a Homebrew formula with every release, so Mac users using Homebrew for package management can build Dolt from source with a single command:
This will install Dolt as follows:
On macOS, Dolt can also be installed via MacPorts:
These instructions are for bootstrapping dolt as an application database server. They assume you are starting from scratch on a Linux machine without dolt installed or running.
Package manager support (.deb and .rpm distributions) is coming soon, but for now this set of manual setup work is necessary.
We have the instructions below packaged in a script here.
Install dolt. Run the following command:
This script puts the dolt binary in /usr/local/bin, which is probably on your $PATH. If it isn't, add it there or use the absolute path of the dolt binary for the next steps.
Create a system account for the dolt user to run the server.
Before running the server, you need to give this user a name and email, which it will use to create its commits. Choose a dolt system account for your product or company.
You can override this user for future commits with the --author flag, but this will be the default author of every commit in the server.
Before running the dolt server for the first time, you need to create a database. Choose a directory within /var/lib/doltdb/databases where you want your dolt data to live. Name the directory the same as the name of your database.
You should see output indicating that the database has been initialized:
Assuming you want your dolt server to always be running when the machine is alive, you should configure it to run through the Linux service management tool, systemctl. Some distributions of Linux do not support this tool; consult their documentation for configuration instructions.
Write the server's config file in your home directory, then move it to where systemctl needs it to live.
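A minimal sketch of such a service definition is shown below. The unit name `doltdb.service` and the database directory `my_db` are placeholders, and the `dolt` user and group are assumed to be the system account created earlier. The sketch writes to a scratch directory rather than your home directory so the file can be reviewed before it is moved.

```shell
# Write a systemd unit file to a scratch directory for review.
# The unit name (doltdb.service) and database directory (my_db) are placeholders.
unit_dir="$(mktemp -d)"
cat > "$unit_dir/doltdb.service" <<'EOF'
[Unit]
Description=dolt SQL server
After=network.target

[Install]
WantedBy=multi-user.target

[Service]
User=dolt
Group=dolt
ExecStart=/usr/local/bin/dolt sql-server
WorkingDirectory=/var/lib/doltdb/databases/my_db
KillSignal=SIGTERM
SendSIGKILL=no
EOF
cat "$unit_dir/doltdb.service"

# After reviewing, move it where systemd looks for unit files, for example:
#   sudo mv "$unit_dir/doltdb.service" /etc/systemd/system/doltdb.service
#   sudo systemctl daemon-reload
#   sudo systemctl enable doltdb.service
```

`WorkingDirectory` points the server at the database directory, so `dolt sql-server` needs no extra arguments in `ExecStart`.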
Finally, start the server as a system daemon.
The dolt sql server will now be running as a daemon. Test connecting to it with any SQL shell. Here we are using the mysql shell to connect.
Note that by default, Dolt runs on the same port as MySQL (3306). If you have MySQL installed on the same host, choose a different port for the server with the -P argument.
By default, when starting dolt sql-server, Dolt will automatically initialize the default root@localhost superuser, which is accessible only from localhost and has no password. To change this account or add any additional accounts, you can use the standard CREATE USER, ALTER USER, and GRANT SQL statements.
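For illustration, account changes of this kind could be scripted as follows. The account name `app_user`, the database name `my_db`, and both passwords are placeholders, and the statements use standard MySQL syntax on the assumption that the server accepts it unchanged.

```shell
# Save the statements to a scratch file and echo them back for review.
sql_file="$(mktemp)"
cat > "$sql_file" <<'SQL'
-- Give the default superuser a password (placeholder value).
ALTER USER 'root'@'localhost' IDENTIFIED BY 'change-me';
-- Create an application account that may connect from any host.
CREATE USER 'app_user'@'%' IDENTIFIED BY 'another-placeholder';
-- Limit the account to a single database (my_db is a placeholder name).
GRANT ALL PRIVILEGES ON my_db.* TO 'app_user'@'%';
SQL
cat "$sql_file"

# With the server running locally, one way to execute the script is:
#   mysql --host 127.0.0.1 --port 3306 -uroot < "$sql_file"
```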
These instructions should work for Debian, Ubuntu, Amazon Linux, and many other common distributions. If you find they don't work for yours and you would like your distribution documented, come chat with us on Discord or submit a PR to update the docs.
Dolt is constantly evolving. We release a new Dolt approximately once a week.
To upgrade, download the latest Dolt binary for your platform and replace the Dolt binary on your PATH with the downloaded one. Running the install process again will do this for you on most platforms.
If you are running a dolt sql-server, you must restart the server to start using the new binary.
Running this image is equivalent to running the dolt command. You can get the latest version with the latest tag, or you can get a specific, older version by using the Dolt version you want as the image's tag (e.g. 0.50.8).
To check out supported options for dolt sql-server, you can run the image with the --help flag.
From the host system, to connect to a server running in a container, we need to map a port on the host system to the port our sql-server is running on in the container.
We also need a user account that has permission to connect to the server from the host system's address. By default, as of Dolt version 1.46.0, the root superuser is limited to connections from localhost. This is a security feature to prevent unauthorized access to the server. If you don't want to log in to the container and then connect to your sql-server, you can set the DOLT_ROOT_HOST and DOLT_ROOT_PASSWORD environment variables to control how the root superuser is initialized. When the Dolt sql-server container is started, it will ensure the root superuser is configured according to those environment variables.
In our example below, we're using DOLT_ROOT_HOST to override the host of the root superuser account to % in order to allow any host to connect to our server and log in as root. We're also using DOLT_ROOT_PASSWORD to override the default, empty password to specify a password for the root account. This is strongly advised for security when allowing the root account to connect from any host.
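A sketch of such an invocation follows. The environment variable names, the image name, the latest tag, and container port 3306 come from this page; the host port 3307 and the password are placeholder assumptions. The sketch only prints the command so it can be checked before anything is started.

```shell
# Assumed values for illustration: host port 3307 and a placeholder password.
host_port=3307
root_password='change-me'

# Build the command as positional parameters so quoting stays intact.
set -- docker run \
  -e DOLT_ROOT_HOST=% \
  -e DOLT_ROOT_PASSWORD="$root_password" \
  -p "$host_port:3306" \
  dolthub/dolt-sql-server:latest

# Print the command for review; run it with "$@" once it looks right.
cmd_preview="$*"
echo "$cmd_preview"
```

Mapping a host port other than 3306 avoids a clash with a MySQL server that may already be running on the host.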
If we run the command above with -d or switch to a separate window we can connect with MySQL:
Or, we can mount a local directory to specific directories in the container. The special directory for server configuration is /etc/dolt/servercfg.d/. You can only have one .yaml configuration file in this directory. If there are multiple, the default configuration will be used. If the configuration file is located at /Users/jennifer/docker/server/config.yaml, this is how to use the -v flag to mount the local /Users/jennifer/docker/server/ directory to the /etc/dolt/servercfg.d/ directory in the container.
The Dolt configuration and data directories can be configured similarly:
The dolt configuration directory is /etc/dolt/doltcfg.d/. There should be one .json dolt configuration file. It will replace the global dolt configuration file in the container.
In the container, data is stored at the default location, /var/lib/dolt/. The data directory does not need to be defined in the server configuration for the container, but to store the data on the host system, a local directory can be mounted to this default location.
The container has a directory called /docker-entrypoint-initdb.d. Any appropriate files placed in it, including .sh and .sql files, will be run after the server has started. This is useful for tasks such as setting up your database by importing data from a SQL dump file.
Here is how I set up my directories to be mounted. I have three directories to mount in a directory called shared:
databases is empty and is used for storing my data,
dolt has a single .json file that stores my dolt configuration,
server has a single .yaml file that stores my server configuration.
We can see both config files were used successfully.
We can verify that the data we created through the server is in the local directory we mounted.
We can check for the doltdb directory created in our local /shared/databases directory.
You can verify it has the data we created by using Dolt CLI Docker image if you do not have Dolt installed locally.
When running dolthub/dolt-sql-server in an environment like Kubernetes, liveness and readiness checks can be configured with something like:
The above configuration uses the dolt client within the server container to execute queries against the live server.
To check which version you have installed, run dolt version on the command line or select dolt_version() against a running SQL server. Make sure the version matches the latest release.
You can get a Dolt Docker container using our Docker images. Both images support the linux/amd64 and linux/arm64 platforms and are updated on every Dolt release. Older versions are also available, tagged with the Dolt version they contain. The source of the Dockerfiles is also available.
The Dolt CLI image is useful if you need a container that already has the Dolt CLI installed on it. For example, this image is a good fit if you are performing data analysis and want to work in a containerized environment, or if you are building an application that needs to invoke Dolt from the command line and also needs to run in a container.
The dolthub/dolt-sql-server image creates a container with Dolt installed and starts a Dolt SQL server when the container runs. It is similar to MySQL's Docker image. Running this image without any arguments is equivalent to running the dolt sql-server --host 0.0.0.0 --port 3306 command locally; these are the default settings for the server in the container.
You can define server configuration either as command line arguments or in a YAML configuration file. For command line arguments, simply append them to the end of the docker command.
Dolt has three primary functions all with different ways to get started.
Run Dolt like you would MySQL or Postgres.
Use the Dolt Command Line Interface like you would the Git Command Line Interface.
Use Dolt as a replica to your primary MySQL server to get version control features without migrating.
We built Dolt as a better way to share data. Along the way, customers wanted an OLTP SQL database with Git features, so that is what Dolt became. Dolt is still a great way to share data but it's also a great SQL database.
Anything you can build with MySQL or Postgres you can build with Dolt.
Dolt really shines when your database can benefit from branches, merges, diffs, or clones. We've written about customers who use Dolt to build better cancer cell simulations, power an application with branches, or add a versioning layer to important spreadsheets. These are just the customers who allowed us to write about their use case.
Other customers use Dolt to manage video game configuration, get an immutable audit log of changes to their database, build reproducibility into machine learning models, ensure data quality using a pull request workflow, and much more.
Dolt is a MySQL compatible database server.
This document will walk you through step-by-step on how to get Dolt running as a MySQL compatible server on your host. You will set up a schema, insert data, and compose read queries using SQL. The document will also cover a number of unique Git-like Dolt features like commits, logs, as of queries, rollback, branches, and merges.
Dolt needs a place to store your databases. I'm going to put my databases in ~/dolt.
Dolt ships with a MySQL compatible database server built in. To start it, you use the command dolt sql-server. Running this command starts the server on port 3306.
Your terminal will just hang there. This means the server is running. Any errors will be printed in this terminal. Just leave it there and open a new terminal.
In the new terminal, we will now connect to the running database server using a client.
MySQL comes with a MySQL server called mysqld and a MySQL client called mysql. You're only interested in the client. After following the instructions from MySQL's documentation, make sure you have a copy of the mysql client on your path:
Now, to connect the mysql client to Dolt, you are going to force the MySQL client through the TCP interface by passing in a host and port. The default is the socket interface, which Dolt supports but which is only available on localhost. So, it's better to show off the TCP interface. The MySQL client also requires that you specify a user, in this case root.
To ensure the client actually connected, you should see the following in the dolt sql-server terminal:
As you can see, Dolt supports any MySQL-compatible client.
Now we're actually ready to do something interesting. I'll stay in the mysql client and execute the following SQL statements to create a database called getting_started. The getting_started database will have three tables: employees, teams, and employees_teams.
Dolt supports foreign keys, secondary indexes, triggers, check constraints, and stored procedures. It's a modern, feature-rich SQL database.
The naming of the system tables and stored procedures follows the dolt_<command> pattern. So dolt add on the CLI becomes dolt_add as a stored procedure. Passing options also follows the command line model. For instance, to specify tables to add, send the table names in as options to the dolt_add procedure. For named arguments, like sending a message into the dolt_commit command, use two arguments in sequence like ('-m', 'This is a message'). If you know Git, the version control procedures and system tables should feel familiar.
So, we add and commit our new schema like so.
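To make the CLI-to-SQL mapping concrete, here is a small sketch that renders CLI-style arguments as the matching stored procedure call. `dolt_call` is a hypothetical helper written for this illustration, not part of Dolt, and the commit message is a placeholder.

```shell
# Hypothetical helper: render CLI-style arguments as the matching stored procedure call.
dolt_call() {
  proc="dolt_$1"
  shift
  args=""
  for arg in "$@"; do
    # Double any single quotes so the value stays a valid SQL string literal.
    escaped=$(printf '%s' "$arg" | sed "s/'/''/g")
    args="${args:+$args, }'$escaped'"
  done
  printf 'CALL %s(%s);\n' "$proc" "$args"
}

# `dolt add teams employees employees_teams` on the CLI becomes:
dolt_call add teams employees employees_teams
# `dolt commit -m 'Created initial schema'` on the CLI becomes:
dolt_call commit -m 'Created initial schema'
```

The printed statements can be pasted into the mysql client session used in this walkthrough.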
There you have it. Your schema is created and you have a Dolt commit tracking the creation, as seen in the dolt_log system table.
Now, I'm going to populate the database with a few employees here at DoltHub. Then, I'll assign the employees to two teams: engineering and sales. The CEO wears many hats at a start up so he'll be assigned to multiple teams.
Oops, I violated a constraint. It looks like I created the table with teams before employees. You should always specify your columns when you insert, not rely on natural ordering. Serves me right! Dolt comes with the full power of a modern SQL relational database to ensure data integrity.
Looks like everything is inserted and correct. I was able to list the members of the engineering team using that three table JOIN. Dolt supports up to twelve table JOINs. Again, Dolt is a modern SQL relational database paired with Git-style version control.
Now, what if you want to see what changed in your working set before you make a commit? You use the dolt_status and dolt_diff_<tablename> system tables.
As you can see from the diff, I've added the correct values to the employees table. The values were previously NULL and now they are populated.
Let's finish off with another Dolt commit, this time adding all modified tables using -am.
You can inspect the log using dolt_log and see which tables changed in each commit using an unscoped dolt_diff. Unscoped dolt_diff tells you whether schema, data, or both changed in that particular commit for the table.
Dolt supports undoing changes via call dolt_reset(). Let's imagine I accidentally drop a table.
In a traditional database, this could be disastrous. In Dolt, you're one command away from getting your table back.
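As a sketch of that recovery, the statements below drop a table and then bring it back. It assumes the employees_teams table from this walkthrough was already included in a Dolt commit and that discarding every uncommitted change with `--hard` is acceptable.

```shell
# Save the statements to a scratch file and echo them back for review.
sql_file="$(mktemp)"
cat > "$sql_file" <<'SQL'
-- Simulate the accident: drop a table in the working set.
DROP TABLE employees_teams;
SHOW TABLES;
-- Discard all uncommitted changes and return to the last Dolt commit.
CALL dolt_reset('--hard');
SHOW TABLES;
SQL
cat "$sql_file"

# With the server from this walkthrough running, one way to execute the script is:
#   mysql --host 127.0.0.1 --port 3306 -uroot getting_started < "$sql_file"
```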
Now, to connect, you must select MySQL as the connection type. Then enter a name for your connection, getting_started as your database, and root as your user.
Click connect and you'll be presented with a familiar database workbench GUI.
To make changes on a branch, I use the dolt_checkout() stored procedure. Using the -b option creates a branch, just like in Git.
Tableplus gives me the ability to enter a multiple line SQL script on the SQL tab. I entered the following SQL to checkout a branch, update, insert, delete, and finally Dolt commit my changes.
Here's the result in Tableplus.
Back in my terminal, I cannot see the table modifications made in Tableplus because they happened on a different branch than the one I have checked out in my session.
I can query the branch no matter what I have checked out using SQL as of syntax.
If I'd like to see the diff between the two branches, I can use the dolt_diff() table function. It takes two branches and the table name as arguments.
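The two queries described above might look like the following sketch. It assumes the branches are named main and modifications and that the table of interest is employees, as elsewhere in this walkthrough.

```shell
# Save the statements to a scratch file and echo them back for review.
sql_file="$(mktemp)"
cat > "$sql_file" <<'SQL'
-- Read the employees table as it exists on another branch.
SELECT * FROM employees AS OF 'modifications';
-- Row-level differences between the two branches for one table.
SELECT * FROM dolt_diff('main', 'modifications', 'employees');
SQL
cat "$sql_file"

# With the server from this walkthrough running, one way to execute the script is:
#   mysql --host 127.0.0.1 --port 3306 -uroot getting_started < "$sql_file"
```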
As you can see, you have the full power of Git-style branches and diffs in a SQL database with Dolt.
I can also make schema changes on branches for isolated testing of new schema. I'm going to add a start_date column on a new branch and populate it.
Changing schema on a branch gives you a new method for doing isolated integration testing of new schema changes.
Let's assume all the testing of the new schema on the schema_changes branch and data on the modifications branch completed flawlessly. It's time to merge all our edits together onto main. This is done using the dolt_merge stored procedure.
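A sketch of that merge sequence is shown below, assuming both branches have committed their changes and that no conflicts arise. The final query is only there to confirm the result in the log.

```shell
# Save the statements to a scratch file and echo them back for review.
sql_file="$(mktemp)"
cat > "$sql_file" <<'SQL'
-- Switch the session to the branch that should receive the changes.
CALL dolt_checkout('main');
-- Merge the schema change first, then the data changes.
CALL dolt_merge('schema_changes');
CALL dolt_merge('modifications');
-- Inspect the resulting history.
SELECT * FROM dolt_log;
SQL
cat "$sql_file"

# With the server from this walkthrough running, one way to execute the script is:
#   mysql --host 127.0.0.1 --port 3306 -uroot getting_started < "$sql_file"
```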
Schema change successful. We now have start dates. Data changes are next.
Data changes successful as well. As you can see, I am now "Timothy" instead of "Tim", Daylon is added, and we all have start dates except for Daylon who was added on a different branch.
I'm also gone from the Sales Team. Engineering is life.
Now, we have a database with all the schema and data changes merged and ready for use.
Which commit changed my first name? With Dolt you have lineage for every cell in your database. Let's use the dolt_history_<tablename> and dolt_diff_<tablename> system tables to explore the lineage features in Dolt.
dolt_history_<tablename> shows you the state of the row at every commit.
dolt_diff_<tablename> allows you to filter the history down to only the commits where the cell in question changed. In this case, I'm interested in the commits that changed my first name. Note, there are two commits that changed my name because one is the original change and the second is the merge commit.
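The lineage queries could be sketched as follows. The column names id and first_name, and the assumption that my row has id 0, are illustrative guesses about the employees schema rather than details given here.

```shell
# Save the statements to a scratch file and echo them back for review.
sql_file="$(mktemp)"
cat > "$sql_file" <<'SQL'
-- Every version of one row, one result row per commit (assumes my row has id 0).
SELECT * FROM dolt_history_employees WHERE id = 0 ORDER BY commit_date;
-- Only the commits where the first_name cell changed.
SELECT to_commit, from_first_name, to_first_name
FROM dolt_diff_employees
WHERE (from_id = 0 OR to_id = 0)
  AND (from_first_name <> to_first_name OR from_first_name IS NULL)
ORDER BY to_commit_date;
SQL
cat "$sql_file"

# With the server from this walkthrough running, one way to execute the script is:
#   mysql --host 127.0.0.1 --port 3306 -uroot getting_started < "$sql_file"
```

The `IS NULL` branch keeps the commit that first inserted the row, where there is no earlier value to compare against.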
Dolt provides powerful data audit capabilities down to individual cells. When, how, and why has each cell in your database changed over time?
That should be enough to get you started. We covered installation, starting a SQL server, connecting with various clients, creating a database and schema, inserting and updating data on main, using branches for change isolation, rollback, diffs and logs, merge, and cell lineage. You had the grand tour. Hopefully you are starting to imagine the possibilities for your Dolt-backed applications.
Want to dive even deeper? Here are some links to advanced topics:
Any databases you create will be stored in this directory. So, for this example, a directory named getting_started will be created here later in this walkthrough, after you run create database getting_started; in a SQL shell. Navigating to ~/dolt/getting_started will then allow you to access this database using the Dolt command line.
Let's grab a copy of MySQL so we can connect with that client. Head over to the MySQL documentation and install MySQL on your machine. I used Homebrew to install MySQL on my Mac.
It's time to use your first Dolt feature. We're going to make a Dolt commit. A Dolt commit allows you to time travel and see lineage. Make a Dolt commit whenever you want to restore or compare to this point in time.
Dolt exposes version control functionality through a Git-style interface. On the command line, Dolt commands map exactly to their Git equivalent with the targets being tables instead of files. In SQL, Dolt exposes version control read operations as system tables and version control write operations as stored procedures.
Note, a Dolt commit is different than a standard SQL transaction COMMIT. In this case, I am running the database with AUTOCOMMIT on, so each SQL statement is automatically generating a transaction COMMIT. If you want the system to generate a Dolt commit for every transaction, Dolt provides a system variable for that.
Dolt makes operating databases less error prone. You can always back out changes you have in progress or rewind to a known good state. You also have the ability to undo specific commits.
Note, undoing changes from a drop database statement requires a separate, special SQL procedure.
Hate the command line? Let's use Tableplus to make some modifications. Tableplus is a free SQL Workbench. Follow the installation instructions from their website.
Are you using spreadsheets to curate production data?
Is the process of merging and reviewing everyone’s changes getting out of hand?
Are bad data changes causing production issues?
Would human review of cell-level data changes help?
Dolt allows you to treat your spreadsheet like code. DoltHub and DoltLab implement a Pull Request workflow on tables, the standard for reviewing code changes. Extend that model to your data changes. Make changes on branches and then have the changes human reviewed. Data diffs are easily consumed by a human reviewer. Add continuous integration tests to data changes. Have dozens or hundreds of changes in flight at one time.
DoltHub and DoltLab support SQL, File Upload (CSV), and a spreadsheet editor for data modification. These interfaces are simple enough that non-technical users can make and review data changes.
Dolt is a MySQL compatible database so exporting the manually created data to production can be as simple as cloning a copy and starting a server for your developers to connect to.
Dolt replaces Excel or Google Sheets for manual data curation. Versioning features allow for more efficient asynchronous collaboration and human review of data changes. The DoltHub interface is still easy enough for non-technical users to contribute and review data changes.
Do you share data with customers?
Do they ask you what changed between versions you share?
Do they want to actively switch versions instead of having data change out from under them?
Or, are customers or vendors sharing data with you?
Are you having trouble maintaining quality of scraped data?
When new data is shared or scraped, do downstream systems break?
Would you like to see exactly what changed between data versions?
Do you want to add automated testing to data shared with you?
Would you like to instantly roll back to the previous version if tests fail?
Dolt was built for sharing. The Git model of code sharing has scaled to thousands of contributors for open source software. We believe the same model can work for data.
Dolt is the world's first version controlled SQL database. Git-style version control allows for decentralized, asynchronous collaboration. Every person gets their own copy of the database to read and write. DoltHub allows you to coordinate collaboration over the internet with permissions, human review, forks and all the other distributed collaboration tools you are used to from GitHub.
Dolt and DoltHub are the best way to share data with customers. Use versions to satisfy both slow and fast upgrading consumers. Let your customers help make your data better. Versions offer better debugging information. Version X works but version Y doesn't. Your customers can even make changes and submit data patches for your review, much like open source.
Dolt and DoltHub are also great if vendors share data with you. When you receive data from a vendor, import the data into Dolt. Examine the diff, either with the human eye or programmatically, before putting the data into production. You can now build integration tests for vendor data. If there's a problem, never merge the import branch into main or roll the change back if a bug was discovered in production. Use the problematic diff to debug with your vendor. The same tools you have for software dependencies, you now have for data dependencies.
Dolt replaces exchanging flat data files like CSVs via email, FTP servers, or other file transfer techniques. Dolt allows data to maintain schema on exchange including constraints, triggers, and views. This more rich format of exchange reduces transfer errors. Dolt also allows you to change the data to fit your needs and still get updates from your source. Dolt will notify you if your changes conflict with the source.
Dolt is ideal for sharing data that does not have an API. But even for data with an API, Dolt is often more convenient. With Dolt, you get all the data and its history. With APIs you often have to assemble the data with multiple API calls. With APIs, the data can change out from under you, whereas with Dolt you can read a version of the data until you are ready to upgrade. DoltHub ships with a SQL API so you can choose the data sharing solution that is right for your use case.
Let us know if you would like us to feature your use of Dolt for data sharing here.
Are you in the business of creating data and models?
Do you want to institute human or automated review on data changes for data quality assurance?
Are you worried about model reproducibility?
Do different people or teams want to work on slightly different versions of the data?
Are long running projects hard to pull off because of parallel data changes?
Would data branches help?
Do you want the ability to query or roll back to a previous version of the data instantly?
Traditional databases were built for a world of transactions and reports. Modern data science tools use data to create models that behave more like software than reports. Models produce user visible outputs and define application behavior. Tuning data to get the right model can be a lot like writing code.
The version control tools we use to build software apply to modern data science. Version control for data did not exist until Dolt, the first and only database you can branch, diff, and merge just like a Git repository.
Modern data science applications require model reproducibility, data quality, and multiple versions of data to perform at their best. Dolt allows for these capabilities directly in your database, in a Git-style version control model most developers understand.
Dolt is used for model reproducibility. If you build a model from a version of the data, make a tag at that commit and refer to that tag in the model metadata. Some of our data and model quality control customers only use Dolt for this simple feature. Dolt shares storage between versions so you can store many more copies of the data using Dolt than say storing copies of the data in S3.
Dolt allows for human or automated review of data changes, increasing data quality. If a bad change makes it through review, simply roll the data back to a previous version. DoltHub, DoltLab, and the Hosted Dolt Workbench all implement a Pull Request workflow, the standard for human review of code changes. Extend that model to your data changes.
Dolt is the only database with branch and merge functionality. Branches allow for long running data projects. Want to add an additional feature to a model but don't want the new feature affecting the production model build? Make a branch and run the project on that branch. Occasionally merge production data into that branch so you can stay in touch with changes there. Companies use Dolt branches to increase the number of parallel data projects by an order of magnitude.
Lastly, commits, logs, and diffs can be used for model insights. Did Thursday's model perform better than Tuesday's but had the same model weights? Inspect the data diff to see what changed. Inspect the commit log to see where that new data came from.
It is common practice to store copies of training data or database backups in cloud storage for model reproducibility. A full copy of the data is stored for every training run. This can become quite expensive and limit the number of models you can reproduce. Dolt stores only the differences between stored versions, decreasing the cost of data storage. Additionally, Dolt can produce diffs between versions of training data, producing novel model insights.
Dolt can replace any database used to store and query data. Many of our customers switch from other OLTP databases like MySQL or Postgres to improve data and model quality through versioning. Customers have also switched to Dolt from document databases like MongoDB. Dolt's additional unique features like branches, diffs, and merges allow for human review of data changes and multiple parallel data projects.
Is your production MySQL vulnerable to data loss?
If an operator runs a bad query, script, or deployment, can your production MySQL be down for hours or days as you recover data from backups or logs?
Are you worried your backups aren't working?
Does internal audit want an immutable log of what changes on your MySQL instance?
Do you want the ability to copy and sync your production MySQL database for analytics, development, or debugging?
Because Dolt is MySQL-compatible, you can set Dolt up as a versioned replica of your MySQL primary. Every transaction commit on your primary becomes a Dolt commit on the Dolt replica.
On your Dolt replica, you get a full, immutable, queryable audit log of every cell in your database. If an auditor wants guarantees that a cell in your database has not been modified, you can use Dolt to prove it. Diffs can be produced for every transaction.
If an operator makes a bad query, runs a bad script, or makes a bad deployment, you have an additional tool beyond backups and logs to restore production data. Find the bad transactions using Dolt's audit capabilities. Roll back the individual bad transactions. Produce a SQL patch and apply that back to your primary. If there are conflicting writes, Dolt will surface those for you and you can decide how to proceed. A Dolt replica becomes an essential part of your disaster recovery plan, shortening some outages by hours or days or recovering lost production data.
Moreover, Dolt can be added to your serving path as a read-only MySQL replica, so you know that it is always in sync with your primary. Your disaster recovery instance can serve production traffic so you always know it's working.
Additionally, a Dolt replica can be easily cloned (i.e., copied) to a developer's machine for debugging purposes. See a data issue in production? Debug locally on your laptop safely.
Dolt as a versioned replica becomes your first line of defense against a bad operator query, script, or deployment. Dolt is online and contains the full history of your database. In a disaster you can use diffs to find a bad query and roll it back. Then you can produce a database patch and apply it to production. You do not need to reinstall from a backup and play the transaction log back to the point of the failure, an extremely time consuming process.
Change Data Capture is a way to add a history of data changes to an existing database. Modern change data capture tools consume replication logs to produce database changes in a consumable stream. Dolt can consume the same logs, producing a simpler change data capture solution.
Let us know if you would like us to feature your use of Dolt as a versioned MySQL replica here.
Is your configuration too big and complex for files?
Is your configuration more like code than configuration?
Does configuration have a large production impact?
Are configuration changes hard to review?
Are multiple configuration changes hard to merge together when it’s time to ship?
Are you building a game with lots of assets and configuration?
Configuration is generally structured and managed as large text files. YAML and JSON formatted configuration is very popular. These formats are unordered, meaning standard version control solutions like Git cannot reliably produce diffs and merges. Moreover, configuration can get quite large, running up against the file size limits of tools like Git.
Some configuration is better modeled as tables. Tables by design are unordered. Tables can even contain JSON columns for parts of your configuration you want to remain loosely typed.
Dolt is an ideal solution for version controlling tabular configuration. Dolt allows for all the version control features you came to know and love when your data was small like branches, diffs, and human review via pull requests.
This use case is particularly popular in video games where much of the game functionality is modeled as configuration. Store the likelihood of an item drop or the strength of a particular enemy in Dolt tables. Review and manage changes. When the configuration is ready, use a build process to create whatever format your game needs.
Most large configuration files are stored and versioned in Git. If the files get too large, they are stored in cloud storage and linked to Git using git-lfs. If the files are stored in git-lfs, you lose the ability to diff the contents of the files. Dolt improves the experience by adding query capabilities and fine-grained diffs, even on large data, to the data stored in configuration files. The diff and merge experience will be greatly improved in Dolt for this type of data.
Let us know if you would like us to feature your use of Dolt for configuration management here.
Dolt is Git for Data. You can use Dolt's command line interface to version control data like you version control files with Git. Git versions files, Dolt versions tables.
Once you have Dolt installed, type dolt and you'll start to feel the git vibes immediately.
That's right, all the git commands you're used to, like checkout, diff, and merge, are implemented on top of SQL tables instead of files. Dolt really is Git for Data.
After installing Dolt, the first thing you must do is set the user.name and user.email config variables. This information will be used to attribute each Dolt commit to you. Git requires you to define the equivalent variables as well.
After running these commands you can see a file with them in your ~/.dolt directory.
Dolt needs a place to store your databases. I'm going to put my databases in ~/dolt.
Like Git, Dolt relies on directories to store your databases. The directories will have a hidden .dolt directory where your database is stored after you run dolt init. So, let's make a directory called git_for_data that will house our dolt database, cd to it, and run dolt init. The database name will be git_for_data, the same as the directory name.
You now have a fresh Dolt database. It has a single entry in dolt log.
Git versions files. Dolt versions tables.
In Git, you usually use a text editor to make files. In Dolt, there are a few ways to make tables. You can import a file, like a CSV. You can run SQL offline via the command line. Or you can start a SQL server and run SQL online. I'll walk through examples of each in this document as we go.
Let's make our table initially from a CSV. Dolt supports creating tables via the dolt table import command. In Dolt, tables have schema and data. With dolt table import, Dolt automatically infers the schema from the data, making it easier to version CSVs without having to worry about types.
Here's our CSV file. We're going to use a simple list of employees here at DoltHub.
dolt table import is fairly simple. You pass in a table name and the file path as well as the --create-table, --replace-table, or --update-table option. We're going to pass in --create-table because we're making a new table.
We're going to pass in the id column as a primary key as well. Primary keys in Dolt make for better diffs. Dolt can identify rows across versions by primary key. I'm trying to limit the database talk here, this being "Getting Started: Git for Data" and all, but I'll need to introduce a couple of other database concepts as well. Dolt is like Git and MySQL had a baby.
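To see why a primary key makes for better diffs, here is a minimal sketch of a row diff that matches rows across two versions of a table by key. It illustrates the idea only and is not how Dolt computes diffs internally. The function name `diff_by_primary_key` and the sample rows are hypothetical.

```python
def diff_by_primary_key(old_rows, new_rows, pk="id"):
    """Compare two versions of a table, matching rows by primary key."""
    old = {row[pk]: row for row in old_rows}
    new = {row[pk]: row for row in new_rows}

    added = [new[k] for k in sorted(new.keys() - old.keys())]
    removed = [old[k] for k in sorted(old.keys() - new.keys())]
    modified = [
        (old[k], new[k])
        for k in sorted(old.keys() & new.keys())
        if old[k] != new[k]
    ]
    return added, removed, modified


# Hypothetical employees table before and after an edit.
before = [
    {"id": 0, "first_name": "Tim"},
    {"id": 1, "first_name": "Brian"},
]
after = [
    {"id": 0, "first_name": "Timothy"},
    {"id": 1, "first_name": "Brian"},
    {"id": 2, "first_name": "Taylor"},
]

added, removed, modified = diff_by_primary_key(before, after)
print("added:", added)
print("removed:", removed)
print("modified:", modified)
```

Because row 0 keeps the same key in both versions, the change from "Tim" to "Timothy" shows up as one modified row. Without a key to match on, the same change could only be reported as one row removed and another added.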
We can make sure it's there using the dolt status command. Dolt has a staging area just like Git, so right now the table is in the working set but not staged.
We can inspect the table using SQL on the command line. Dolt allows you to run queries from the command line using dolt sql -q. This is often more convenient, especially in the Git for Data use case, than starting a server and opening a MySQL client. Dolt supports the MySQL flavor of SQL.
Everything looks good, so it's time to add and commit our new employees table. This is just like adding and committing a new file in Git. Tables start off untracked so you must explicitly add them, just like new files in Git.
And inspecting the log it looks like we're good! As you can see, Dolt takes "Git for Data" very literally.
Now, I want to add an employee and change my name from "Tim" to "Timothy", you know, to be professional. I'm going to do that through the command line SQL interface and show you the diff.
That's not right! Diffs in Dolt are a powerful way to ensure you changed exactly what you thought you changed, ensuring data quality.
Just like with Git, in Dolt I can roll back a number of ways. I can checkout the table or reset --hard. Let's checkout the table.
Now, I'll re-run the correct queries and check the diff to make sure I did it right this time.
Looks like I got it right this time. I'll make a commit.
Dolt is also a drop-in replacement for MySQL. So, if you like working in a SQL workbench like TablePlus or DataGrip instead of the command line, I will show you how now. This is the closest you will get to using something like Visual Studio Code with Git.
In your terminal, run:
Your terminal will just hang there. This means the server is running. Any errors will be printed in this terminal. Just leave it there.
Now we can connect with TablePlus. Download and open TablePlus. Click "Create a new connection...". Select MySQL and click "Create". You'll be greeted with a set of options. Fill it in like so.
Click connect and you'll be presented with a familiar database workbench GUI.
Now we want to make some changes on a branch. You can do this by running the following SQL.
Notice how the Git command line is implemented as SQL stored procedures. Write operations like checkout and commit are implemented as stored procedures and read operations like diff and log are implemented as system tables.
In TablePlus, click SQL, enter the SQL, and click "Run Current", which should produce something like the following output.
Alright, now that we've shown you that you can work in server mode, let's get back to the command line like true Gits. Hit Ctrl-C on the server terminal to kill the server. You'll notice we have two branches:
Let's checkout the branch and see that Taylor is on it.
Branches work the exact same way as in Git. Make a branch so that your changes don't affect other people.
Finally, let's merge it all to main and delete our branch.
I got a fast-forward merge, just like Git, since there were no other changes on main.
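The fast-forward condition can be sketched in a few lines: a merge can fast-forward when the target branch's head is an ancestor of the branch being merged in, so the target has nothing of its own to reconcile. This is a toy model of a Git-style commit graph, not Dolt's code. The commit ids and the helper names `ancestors` and `can_fast_forward` are hypothetical.

```python
def ancestors(graph, commit):
    """Return the commit itself plus everything reachable through its parents."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


def can_fast_forward(graph, target_head, source_head):
    """True when the target has no commits the source does not already contain."""
    return target_head in ancestors(graph, source_head)


# Hypothetical history: commit id -> list of parent ids.
graph = {
    "init": [],
    "add_employees": ["init"],
    "fix_names": ["add_employees"],   # last commit on main
    "add_taylor": ["fix_names"],      # commit made on the branch
    "other_change": ["fix_names"],    # a commit made on main afterwards
}

# main is still at fix_names, so merging the branch only moves the pointer.
print(can_fast_forward(graph, "fix_names", "add_taylor"))     # True

# If main had moved on to other_change, a real merge would be needed.
print(can_fast_forward(graph, "other_change", "add_taylor"))  # False
```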
As you can see, Dolt is Git For Data. The Dolt command line works exactly like the Git command line except the versioning target is tables instead of files.
Do you need to know who changed what, when, why in your SQL database?
Do you want an immutable record of changes going back to the inception of your database?
Is an audit team asking for this information for compliance purposes?
Do you want to be able to query this audit log like any other table in your database?
Do you want the data to be efficiently stored so you can trace changes back to inception?
Moreover, if Dolt is your production database, there is no need for an additional change data capture system. The audit capability is a built-in feature of the production Dolt database.
Let us know if you would like us to feature your use of Dolt for audit here.
The Dolt log is a way to visualize the Dolt commit graph in an intuitive way. When viewing the log, you are seeing a topologically sorted commit order that led to the commit you have checked out. The log is an audit trail of commits.
In Dolt, you can visualize the log of a database, a table, a row, or even a cell.
Log is usually filtered by branch. Any commits not reachable in the graph from the current commit will be omitted from the log.
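As a rough illustration of the reachability rule, the sketch below walks a small commit graph from a chosen head and lists only the commits reachable from it, newest first. It is a toy model rather than Dolt's implementation: the commit ids, the integer timestamps, and the `log` function are hypothetical, and it assumes every parent has an earlier timestamp than its children, which is what makes the output a valid topological order.

```python
import heapq


def log(commits, head):
    """Return the ids of commits reachable from head, newest first."""
    seen = {head}
    heap = [(-commits[head]["time"], head)]  # max-heap on commit time
    ordered = []
    while heap:
        _, commit_id = heapq.heappop(heap)
        ordered.append(commit_id)
        for parent in commits[commit_id]["parents"]:
            if parent not in seen:
                seen.add(parent)
                heapq.heappush(heap, (-commits[parent]["time"], parent))
    return ordered


# Hypothetical history with two branches that share commits a and b.
commits = {
    "a": {"time": 1, "parents": []},
    "b": {"time": 2, "parents": ["a"]},
    "c": {"time": 3, "parents": ["b"]},  # head of main
    "d": {"time": 4, "parents": ["b"]},  # head of another branch
}

print(log(commits, "c"))  # ['c', 'b', 'a']
print(log(commits, "d"))  # ['d', 'b', 'a']
```

Commit d never appears in the log of main because it is not reachable from c, even though it is the newest commit in the database.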
Logs are useful in reverting the database to a previous state. You determine the state of the database you want via log and then use other Dolt commands to change the database to a different state.
Logs are useful when trying to track down why the database is in a particular state. You use log to find the commits in question and usually follow up with diffs (i.e. differences) between two commits you found in the log.
Logs are useful in audit. If you would like to ensure a particular value in the database has not changed since the last time you read it, log is useful in verifying this.
Conceptually and practically, log on the command line is very similar between Git and Dolt. A table is akin to a file in Git.
Dolt has additional log functionality beyond Git. You can produce a log of any cell (i.e. row, column pair) in the database using a SQL query against the dolt_history_<tablename> system table.
Dolt implements Git-style version control on tables instead of files.
On the command line, these concepts are exposed as a replica of the Git command line. Where you would type git log, you now type dolt log. Where you would type git add, you type dolt add. The replication extends to the command arguments.
In this section we explore the following Git concepts and explain how they work in Dolt:
Dolt brings the features of Git-style distributed version control to the SQL database.
Git-style Distributed Version Control allowed the world to collaborate on open source software in a beautiful way. Dolt aspires to bring that distributed collaboration model to data.
SQL is the worldwide standard for data description and querying. SQL has been popular for 50 years. By combining schema and data, SQL gives data a powerful language for data practitioners to communicate with.
Before Dolt, to share a SQL database with a fellow data practitioner, you both needed to share the same view of the data. Only one write could happen at a time. Making a copy implied creating a point in time backup and restoring on a separate running server. Once that copy was made, the two databases could change independently. There was no tractable way to compare the two copies of the database to see what changed. Moreover, there was no easy way to merge the two copies back together. In source code parlance, the copy was a hard fork of the database.
The inability to copy and merge forced databases into a specific model of usage. Data was hard to move and share. As an industry, we built complicated pipelines to move and transform data between databases. We built APIs to allow programmatic, controlled access to data.
Here at DoltHub, we looked at all these systems and thought there must be a better way. What if you could copy a database, make changes, compare the database to any other copy, and merge the changes whenever you wanted? What if thousands of people could do this at the same time? What if you could use Git workflows on databases?
A database with these properties would allow thousands of users to read and write at the same time. If someone made a mistake, no big deal, just roll back the change. Need a copy of the data to run a metrics job on? No problem, just make a clone. Bug in production? Create a copy of the database on your laptop, start your services, change the production data to speed debugging. Want to open your data up to the world? Push it up to a remote that's accessible via the internet.
In order to achieve the above mission, Dolt needed to implement Git concepts in a SQL database. As best we could, we tried to keep things as similar as possible.
We built Dolt using the following axioms:
Git versions files. Dolt versions table schema and table data.
Dolt will copy the Git command line exactly.
Dolt will be MySQL compatible.
Git features in SQL will extend MySQL SQL. Write operations will be procedures. Read operations will be system tables.
Are you expecting your application to make writes locally while offline?
Do these writes need to be synced to a central server or other nodes?
How are you going to detect conflicting writes?
What are you going to do if you detect them?
Would the Git model of clone, push, and pull on your data help?
Dolt brings Git-style decentralization to the SQL database. Just as Git is ideal in no-connectivity environments when dealing with files, Dolt is ideal in low-connectivity environments when dealing with tables. Most large-scale data is stored in tables.
With Dolt you write to the database disconnected. You can have a fully functioning offline application that uses the exact same software and models it would use if it were a standard centralized SQL database.
When it is safe to connect to the internet, Dolt computes the difference between what you have and what a peer database has and only sends these differences both ways. This synchronization process is very efficient, effectively allowing you to get the most information possible in and out in the shortest amount of time. Once the synchronization is complete, go back to disconnected. You and the peer now share a synchronized view with complete, auditable edit history.
Conflicting writes are surfaced quickly and an operator or software can take additional action to resolve.
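What a conflicting write means can be sketched as a cell-level three-way merge: compare each side against the common ancestor, accept a change made on only one side, and flag a cell that both sides changed to different values. This is a simplified illustration under stated assumptions, not Dolt's merge algorithm. The `merge_tables` function, the fixed set of columns, and the sample rows are hypothetical.

```python
def merge_tables(base, ours, theirs):
    """Three-way merge of tables shaped as {primary_key: {column: value}}.

    Returns (merged_table, conflicts). Assumes all rows share the same columns.
    """
    merged, conflicts = {}, []
    for pk in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(pk), ours.get(pk), theirs.get(pk)
        if o == t or t == b:
            row = o  # both sides agree, or only our side changed the row
        elif o == b:
            row = t  # only their side changed the row
        elif b is not None and o is not None and t is not None:
            row = {}
            for col in b:  # both sides edited the row: merge cell by cell
                if o[col] == t[col] or t[col] == b[col]:
                    row[col] = o[col]
                elif o[col] == b[col]:
                    row[col] = t[col]
                else:
                    conflicts.append((pk, col, o[col], t[col]))
                    row[col] = o[col]  # keep ours until someone resolves it
        else:
            conflicts.append((pk, None, o, t))  # e.g. deleted vs. modified
            row = o
        if row is not None:
            merged[pk] = row
    return merged, conflicts


# Hypothetical data: the last synchronized state and two offline copies.
base = {1: {"name": "Tim", "team": "docs"}}
ours = {1: {"name": "Timothy", "team": "docs"}}
theirs = {
    1: {"name": "Tim", "team": "storage"},
    2: {"name": "Taylor", "team": "web"},
}

merged, conflicts = merge_tables(base, ours, theirs)
print(merged)
print(conflicts)

# Now both sides change the same cell to different values.
theirs_conflicting = {1: {"name": "Tim S.", "team": "docs"}}
_, conflicts_2 = merge_tables(base, ours, theirs_conflicting)
print(conflicts_2)
```

In the first merge the two sides touched different cells, so both edits and the new row are kept and nothing is flagged. In the second, the name cell was changed on both sides, so it is reported for an operator or software to resolve.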
Be the first
Let us know if you would like us to feature your use of Dolt for data sharing here.
Do your customers want branches and merges in your application?
Do your customers want to review changes in your application before they go live?
Do you want to add a pull request workflow to your application?
Do you want to expose audit log functionality in your application?
Do you want to expose rollback functionality in your application?
Dolt provides a built-in, queryable audit log of every cell in your database. Whenever a Dolt commit is created, the user, time, and optional commit message are recorded along with the data that changed. These commits form an immutable log of changes to every cell in your database going back to inception.
Dolt stores these changes efficiently by sharing data between versions. Effectively, only the differences are stored between versions of the data.
The audit log created between commits is queryable via standard SQL using Dolt's custom system tables. The results can be filtered and joined using other data in your database.
If you're not ready to switch your primary database to Dolt to get its audit capabilities, you can run MySQL as your primary and set Dolt up as a versioned replica. You lose users and commit messages but you still get a queryable log of every cell in your database.
A technique to add audit capability to an existing database is to add soft deletes. Soft delete is the use of various techniques to mark data as inactive instead of deleting it. This is strictly worse than a version controlled database for audit purposes. With soft deletes, an operator can still modify data or the application can make mistakes. In Dolt, every write is part of the audit log. It is far more difficult for an operator to change Dolt history.
Change data capture is another way to add audit capability to an existing database. Some change data capture techniques are similar to soft delete strategies. Modern change data capture tools consume replication logs to audit database changes. Dolt can consume the same logs as a versioned replica, producing a simpler and thus more audit-friendly change data capture solution.
Dolt adopts the Git interface to version control. There are commits, branches, merges, and all the other Git concepts you are familiar with. If you know Git, Dolt will feel very familiar because conceptually, Dolt is modeled on Git.
In SQL, Dolt becomes a bit more complicated because SQL has no Git equivalent. Git read operations are modeled as system tables. Git write operations are modeled as stored procedures. But conceptually, all the Git concepts you are familiar with extend to SQL.
In order to achieve the above at scale, we needed to start at the bottom: the storage engine of the database. Dolt is built from the storage engine up to offer you the Git experience in a SQL database.
In this section of the documentation, we will explain the underlying concepts and how we applied them in Dolt using the above axioms.
Dolt replaces custom code to synchronize your client and server. This code is complicated and hard to get right. The Git model of clone, fetch, push, and pull is a proven synchronization model. Dolt brings this model to the database allowing you to remove most of your synchronization code.
If you have an application that would benefit from version control features such as branches, diffs, merges, and human review of changes, you can use Dolt to power that application. Dolt gives you branch, diff, and merge at the database layer.
Programmatically access Git functionality via SQL stored procedures and system tables. Programmatic control of Git operations creates the ideal foundation to add version control to your application.
Dolt ships with standard database tools such as replication. Run Dolt with a hot standby and failover just like MySQL or Postgres.
Hosted Dolt is a hosted version of Dolt that works like AWS RDS. Let us worry about operating Dolt in the cloud. Write your application against a cloud endpoint.
In the past, applications that needed these features required soft deletes or slowly changing dimension. These approaches are cumbersome and do not support merge. Dolt gives applications the full development power of Git.
A common technique to version your database is to use soft deletes. When your application would make an update or a delete, your application instead makes an insert and marks the old row invalid. Dolt obviates the need for this technique. You can keep your existing database schema and Dolt ensures every write is non-destructive. Queries against soft deleted rows become Dolt history queries against system tables.
A more advanced technique for versioning databases is slowly changing dimension. Slowly changing dimension is similar to soft deletes. Additional database columns are added to tables to manage versioning. Dolt is slowly changing dimension on every table by default. Queries involving the slowly changing dimension become Dolt history queries against system tables. Moreover, complicated processes like merges can happen at the database layer in Dolt. With slowly changing dimension, merges must be handled by custom code at the application layer.