Replication
Dolt can replicate data between two or more Dolt servers, or between Dolt and a MySQL server. This page describes the two supported replication modes between a Dolt primary server and Dolt replica servers. See the MySQL to Dolt Replication guide for more information on setting up a Dolt server as a read-replica for a MySQL server, and our blog post on Dolt-to-MySQL replication for instructions on running a MySQL server as a replica of a Dolt primary.
In Remote-Based Replication, Dolt uses a remote as a middleman to facilitate replication between the primary and read replicas. In this mode, Dolt replication triggers on a Dolt commit.
This is the simplest form of replication to configure and administer. Use this form of replication when you do not need the hot-standby support of Direct-to-Standby Replication. See Direct vs Remote Replication for more details on the differences between Remote-Based Replication and Hot Standby Replication.
In Direct-to-Standby Replication, the primary dolt sql-server instance replicates all writes to a set of configured standby servers. In this mode, there is no intermediate remote and all SQL transaction commits are replicated, not just Dolt commits.
Use this form of replication when you have high-availability requirements, and need a hot standby server ready to swap in for the primary. See Direct vs Remote Replication for more details on the differences between Remote-Based Replication and Hot Standby Replication.
The rest of this page describes configuration and considerations for both types of replication, starting with replication through a remote.
Replication Through a Remote
Configuration
Dolt relies on system variables to configure replication. The following system variables affect replication:
@@dolt_replicate_to_remote. Required for a primary. The primary will push to the named remote on any branch or tag update. If more than one database is being served, each must have a remote with the given name.
@@dolt_read_replica_remote. Required for a replica. The replica will pull from the named remote at transaction start.
@@dolt_replicate_heads. Either this variable or @@dolt_replicate_all_heads must be set on a replica. Used to configure specific branches (i.e. HEADs) to pull. Set to a comma-separated list of branches to be replicated. The wildcard * may be used to match zero or more characters in a branch name and is useful for selecting multiple branches. Has no effect on a primary.
@@dolt_replicate_all_heads. Either this variable or @@dolt_replicate_heads must be set on a replica. Pull all branches and tags (i.e. HEADs) on a read replica. Defaults to 0. Has no effect on a primary.
@@dolt_replication_remote_url_template. Optional. Set to a URL template to configure the replication remote for newly created databases. Without this variable set, only databases that existed at server start time will be replicated.
@@dolt_skip_replication_errors. Makes replication errors warnings instead of errors. Defaults to 0.
@@dolt_transaction_commit. Makes every transaction COMMIT a Dolt commit to force all writes to replicate. Defaults to 0.
@@dolt_async_replication. Set to 1 to make replication pushes asynchronous, which means that read replicas will be eventually consistent with the primary. Defaults to 0.
Configuring a Primary
To set up a primary, you use the @@dolt_replicate_to_remote system variable. You set that variable to the name of the remote you would like to use for replication.
In this example I am going to use a DoltHub remote to facilitate replication. I created an empty database on DoltHub and configured the appropriate read and write credentials on this host.
Then set the appropriate server variables:
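A minimal sketch, assuming the DoltHub database has already been added locally as a remote named origin (the name is illustrative):

```sql
-- Persisting the variable keeps it set across server restarts.
SET @@persist.dolt_replicate_to_remote = 'origin';
```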
The next time you create a Dolt commit, Dolt will attempt to push the changes to the remote.
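For example (the commit message and staged changes here are placeholders):

```sql
CALL dolt_commit('-Am', 'Changes to replicate');
```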
And we can see the changes are pushed to the remote.
Note
Replication pushes can be triggered by running commands on the CLI even when no Dolt SQL server is running. CLI commands like dolt commit, dolt merge, or other command line invocations on a database configured to be a primary will cause any updated branches or tags to be pushed to the remote specified. On a replica, such commands will cause the replica to pull from the remote before execution.
Stopping Replication
To stop replication, unset the configuration variable.
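One way to do this, assuming the setting was persisted as above, is to clear it:

```sql
SET @@persist.dolt_replicate_to_remote = '';
```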
Note, if you have a running SQL server you must restart it after changing replication configuration.
Making every Transaction Commit a Dolt Commit
Often, a primary would like to replicate all transaction COMMITs, not just Dolt commits. You can make every transaction COMMIT a Dolt commit by setting the system variable @@dolt_transaction_commit. With this setting, you lose the ability to enter commit messages.
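A sketch, continuing the primary configuration above:

```sql
SET @@persist.dolt_transaction_commit = 1;
-- Every transaction COMMIT (including autocommit) now produces a Dolt commit
-- with a generated message, which in turn is pushed to the configured remote.
```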
And now on the remote.
Asynchronous replication
By default, replication is synchronous. The push must complete before the transaction commits. You can enable asynchronous replication using the @@dolt_async_replication system variable. This setting will increase the speed of Dolt commits, but make read replicas eventually consistent.
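For example:

```sql
SET @@persist.dolt_async_replication = 1;
```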
Configuring a Replica
To start a replica, you first need a clone. I'm going to call my clone read_replica.
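A sketch of the clone step; the my-org/my-database path is a placeholder for the remote database used above:

```sh
dolt clone my-org/my-database read_replica
cd read_replica
```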
Now, I'm going to configure my read replica to "pull on read" from origin. To do that I use the @@dolt_read_replica_remote system variable. I also must configure which branches (i.e. HEADs) I would like to replicate, using either @@dolt_replicate_heads to pick specific branches or @@dolt_replicate_all_heads to replicate all branches.
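A minimal sketch that replicates only the main branch from the origin remote:

```sql
SET @@persist.dolt_read_replica_remote = 'origin';
SET @@persist.dolt_replicate_heads = 'main';
```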
Now on the primary.
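For illustration, a hypothetical write on the primary (the table and values are placeholders):

```sql
CREATE TABLE inventory (id INT PRIMARY KEY, quantity INT);
INSERT INTO inventory VALUES (1, 10);
CALL dolt_commit('-Am', 'Add inventory table');
```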
And back to the replica.
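On the replica, the next transaction start pulls the new commit, so the hypothetical table above becomes visible:

```sql
SELECT * FROM inventory;
```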
Replicate all branches
Only one of @@dolt_replicate_heads or @@dolt_replicate_all_heads can be set at a time. So I unset @@dolt_replicate_heads and set @@dolt_replicate_all_heads.
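A sketch of that change, again persisting the settings:

```sql
SET @@persist.dolt_replicate_heads = '';
SET @@persist.dolt_replicate_all_heads = 1;
```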
Now I'm going to make a new branch on the primary and insert a new value on it.
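A sketch, continuing with the hypothetical inventory table:

```sql
CALL dolt_checkout('-b', 'new-feature');
INSERT INTO inventory VALUES (2, 5);
CALL dolt_commit('-am', 'Insert a row on the new branch');
```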
The read replica now has the change when I try and read the new branch.
Replicating multiple databases
By running the SQL server with the --data-dir option, you can manage multiple Dolt databases in the same server environment. If replication is enabled, all databases are replicated. A remote with the name given by @@dolt_read_replica_remote (for replicas) or @@dolt_replicate_to_remote (for primaries) must exist for every database in the server.
Whenever working with more than one database in a server with replication enabled, it's recommended to set @@dolt_replication_remote_url_template so that newly created databases are replicated as well. Without this setting, newly created databases won't begin replicating until they have an individual remote configured and the server is restarted. With this setting, newly created databases on a primary automatically get a remote configured using the URL template provided and begin pushing to it.
@@dolt_replication_remote_url_template must be a valid Dolt remote URL, with the replacement token {database} in it. Some examples:
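A few illustrative templates; the bucket, table, and path names are placeholders:

```
file:///var/shared/doltRemotes/{database}
gs://my-bucket/dolt-remotes/{database}
aws://[my-dynamodb-table:my-s3-bucket]/{database}
```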
For some remotes, additional configuration for authorization may be required in your environment. Note: not all remote types support automatic database creation yet. In particular, DoltHub remotes do not yet support automatically creating remotes for new databases.
On read replicas, setting @@dolt_replication_remote_url_template will cause new databases created on the primary to be cloned to the replica when they are first used.
Deleting branches
Branches deleted on a primary database will also be deleted on any read replicas.
Failover
No automated failover is possible using remote-based replicas, because there is no way to promote a read replica into a primary without a restart. To configure a database cluster that supports automated failover, please use direct-to-standby replication instead.
Multi-Primary
We do not have specific solutions or documentation to run Dolt as an OLTP database with multiple primaries. It is possible to connect several write targets with a common remote middleman, but they would need to reconcile merge conflicts in the same way an offline Dolt database does. Providing a transactional layer to enforce multi-primary (to avoid merge conflicts) or a way to automatically resolve merge conflicts is necessary to run Dolt as a multi-primary database effectively.
Direct to Standby Replication
Configuration
Replication direct to a standby is configured through the sql-server's YAML configuration. For example, if we have two servers, dolt-1.db and dolt-2.db, and we want to configure them for high availability, we will typically configure them to replicate writes to each other.
On dolt-1.db, we will have a config.yaml like:
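(A sketch limited to the cluster: stanza; host names and the port 50051 are placeholders, and other sql-server settings are omitted.)

```yaml
cluster:
  standby_remotes:
    - name: standby
      remote_url_template: http://dolt-2.db:50051/{database}
  bootstrap_role: primary
  bootstrap_epoch: 1
  remotesapi:
    port: 50051
```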
On dolt-2.db, we will have:
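(A corresponding sketch for the second server, with the remote URL pointing back at dolt-1.db and the bootstrap role flipped.)

```yaml
cluster:
  standby_remotes:
    - name: standby
      remote_url_template: http://dolt-1.db:50051/{database}
  bootstrap_role: standby
  bootstrap_epoch: 1
  remotesapi:
    port: 50051
```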
Some important things to note:
On each server, the standby remote URL points to the other server in the cluster.
cluster.remotesapi.port configures the port that the sql-server will listen on to receive replicated writes. It should match the port appearing in the remote_url_template.
The cluster.bootstrap_role between the two servers is different. This configuration says that when the dolt-1.db server comes up, it will behave as the primary and will be enabled for writes. dolt-2.db, on the other hand, will be a standby replica; it will accept read requests and writes replicated from the primary.
The bootstrap_role and bootstrap_epoch only apply to a newly run server. Once the server has been running on a host, it will persist its current role and role epoch, and those will take priority over anything configured in the bootstrap configuration.
Bootstrap Remotes
If databases already exist when the dolt sql-server instance is started, they will need to have corresponding remotes as configured in the cluster.standby_remotes configuration. Any database created through SQL CREATE DATABASE will automatically have remotes created corresponding to the remote_url_templates. The recommended way to run dolt sql-server in cluster mode is in a new, empty directory with:
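(A sketch; the directory name and config file location are placeholders.)

```sh
mkdir dolt-cluster-data && cd dolt-cluster-data
dolt sql-server --config ../config.yaml
```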
and then to create databases through the SQL interface.
If you want to create databases beforehand, you should create corresponding remotes as well. For example, on dolt-1.db above, I could run:
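(A sketch; the remote name standby matches the standby remote name used in the configuration sketch above.)

```sh
mkdir appdb && cd appdb
dolt init
dolt remote add standby 'http://dolt-2.db:50051/appdb'
cd ..
```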
to initialize a dolt directory with a default database of appdb, and then I could run sql-server with:
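(A sketch; run from the directory containing the appdb database directory, or set data_dir in config.yaml to point at it.)

```sh
dolt sql-server --config config.yaml
```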
Replication Behavior
All SQL transactions and branch HEAD updates for all dolt databases are replicated. Newly created databases are replicated to the standby remotes. Branch deletes are replicated.
Currently, the following things are not replicated:
DROP DATABASE. To drop a database, you will need to run the DROP DATABASE command on both the primary and on all standbys.
Users and grants. To create or alter a user or a grant, you need to run the corresponding SQL user and grant statements on both the primary and all the standbys.
Replication Role and Epoch
When running with direct replication, each sql-server instance has a configured role it is playing in the cluster. It is either the primary, which means it accepts writes through SQL and replicates those writes to every configured standby_remote, or it is a standby, which means it does not accept writes over SQL but it does accept replication writes from other sql-servers. When a server is configured to be a primary, it will not accept replication writes. When a server is configured to be a standby, it will not attempt to replicate its databases to its configured standby_remotes.
Every time a server assumes a role, it assumes it at a particular configuration epoch. This configuration epoch can only increase; attempting to assume a role at a lower configuration epoch, or a different role at the current configuration epoch, will fail.
A server's configured role can be manually changed by calling a stored procedure, dolt_assume_cluster_role. This can be used for controlled and lossless failover from a primary to a standby. It can also be used to promote a standby to a primary when a primary is lost, although in that case the failover is not guaranteed to be lossless.
To make the current primary become a standby, run the following:
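(Using 2 as an example epoch, as described below.)

```sql
CALL dolt_assume_cluster_role('standby', 2);
```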
where 2 is the new configuration epoch and must be higher than the current epoch. The behavior will be the following:
The server will be put into read-only mode.
Every running SQL connection, except for the CALL dolt_assume_cluster_role call itself, will be canceled and the connection for the queries will be terminated.
The call will block until replication of every database to each standby_replica is completed.
If the final replications complete successfully, the new role and new configuration epoch are applied. If the final replications time out or fail, the new role is not assumed; the database is placed back into read-write mode and remains a primary at the old epoch.
If the call is successful, the connection over which it was made will be tainted. It will fail all queries with an error asking the user to reconnect.
To make a current standby become a primary, run the following:
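(Again with an example epoch of 2.)

```sql
CALL dolt_assume_cluster_role('primary', 2);
```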
where 2 is the new configuration epoch and must be higher than the current epoch. The behavior will be the following:
The server will be put into read-write mode.
Every running SQL connection, except for the CALL dolt_assume_cluster_role call itself, will be canceled and the connection for the queries will be terminated.
The new role and epoch will be applied on the server.
The connection over which the call was made will be tainted. It will fail all queries with an error asking the user to reconnect.
In the configured example, if you run the first statement on dolt-1.db and the second statement on dolt-2.db, you will have performed an orderly failover from dolt-1.db to make dolt-2.db the new primary.
Automatic Role Transitions
It can happen that server instances learn about new role configuration from their peers as they attempt to replicate writes or when they receive a replication request. In some of these cases, the server can automatically transition to a new role based on the incoming traffic or based on what it learns from its standby remote when it attempts to replicate. In particular, the following can happen:
When a primary is replicating to a standby_remote, if it learns that the standby_remote is itself currently configured to be a primary at a configuration epoch which is higher than the replicating server's, the replicating server will immediately transition to be a standby at the same epoch as the epoch of the standby_remote.
When a server receives a replication request, and the incoming request is from a configured primary which is at a higher configuration epoch than the server itself, the server will immediately transition to be a standby at the same configuration epoch as the server which is making the incoming replication request.
In both cases, the transition will cause all existing queries and connections to the sql-server to be killed.
One further case can arise. If at any point two servers communicate and see that they are both configured as role primary at the same configuration epoch, that represents a fundamental misconfiguration of the cluster. The servers will transition to a special role, detected_broken_config. In this role, the servers will serve read-only traffic, will reject SQL writes, and will reject replication requests from other servers which may think they are primary at the same epoch. The role is somewhat viral: if other servers communicate with these servers and see the detected_broken_config role at the same epoch, those servers will also transition to detected_broken_config. A server which is in detected_broken_config will become a standby if it receives a replication request from a primary at a higher configuration epoch. It can also change its role based on a call to dolt_assume_cluster_role.
A server never automatically transitions to be a primary.
Monitoring Replication Status
When configured with a cluster: stanza in its YAML configuration, the sql-server instance will expose replication status through the SQL interface.
The current role and configuration epoch can be accessed through global session variables.
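A sketch, assuming the role and epoch are exposed as dolt_cluster_role and dolt_cluster_role_epoch global variables:

```sql
SELECT @@global.dolt_cluster_role, @@global.dolt_cluster_role_epoch;
```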
The current status of replication to the standby can be queried in a system table of a system database, dolt_cluster.dolt_cluster_status.
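For example:

```sql
-- Columns include fields such as replication_lag_millis and current_error,
-- referenced in the monitoring recommendations below.
SELECT * FROM dolt_cluster.dolt_cluster_status;
```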
For monitoring the health of replication, we recommend alerting on:
No configured primary in the cluster.
NULL or growing replication_lag_millis on the primary.
Non-NULL current_error.
Any server in the cluster in role detected_broken_config.
A Note on Security
Enabling cluster replication on a dolt sql-server exposes a remotesapi port on the sql-server instance. Attempts are made to authenticate and authorize the traffic on this port so that only servers which are configured to replicate to each other can communicate over it.
On startup, the sql-server instance creates an ephemeral asymmetric encryption key and publishes its public key as a JWKS at an unauthenticated endpoint on the remotesapi.port. For outgoing requests to any standby remote, the server signs its requests with its private key. For incoming requests, the server trusts any inbound request signed by a private key corresponding to a public key it was able to fetch from the published JWKSes of any of its configured standby remote peers.
The security of this scheme relies on the security of the network used to fetch the trusted keys from the JWKSes of the configured peers, and on the authentication credentials signed with the private keys not being intercepted. The authentication credentials used are relatively short-lived, but they are not secure against things like replay attacks.
If the network between the standby replicas is not entirely trusted, server-side TLS can be used to improve the security posture of the communication. In this case, the URLs of the remotes used for standby replication should have scheme https. The remotesapi: fragment of the cluster: configuration is able to configure some server-side and client-side settings for its TLS communication:
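A sketch of what that might look like, limited to the settings discussed below plus placeholder file paths; the exact set of available keys may differ in your version:

```yaml
cluster:
  remotesapi:
    port: 50051
    tls_ca: /etc/dolt/standby_ca_chain.pem   # trusted roots used to validate peer certificates
    server_name_dns:                          # expected DNS identities in peer certificates
      - dolt-1.db
      - dolt-2.db
    # server_name_urls can be used instead of server_name_dns for URL-based identities.
```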
Typically only one of server_name_urls or server_name_dns will be set. URLs are commonly used for certificates issued as part of SPIFFE, for example, while DNS names are commonly used in WebPKI and typically supported by browsers. If neither is set, but tls_ca is set, the presented certificate chains of peers will be validated against things like isCA, key usage, validity windows, and the signatures chaining to a trusted root, but no assertions will be made against the identity presented in the certificates themselves.
Direct vs. Remote Replication
The above presents two different ways of achieving replication and running read-replicas with dolt sql-server. For some use cases, either one might meet your needs, but they are somewhat different architecturally. Here are some things to consider when choosing how to configure replication.
Direct replication is designed to allow for controlled failover from a primary to a standby. Some in-flight requests will fail, but all committed writes will be present on the standby after CALL dolt_assume_cluster_role('standby', ...) succeeds on the primary. After that the standby can be promoted to primary. On the other hand, replication through a remote does not currently have a way to promote a read replica in a way that makes it look exactly like the primary which was replicating to it. Replication through a remote is good for scaling out read performance but it is not currently as good for high availability.
Direct replication requires distinct configuration on each server in the cluster, and it requires tight coupling and deployment of new configuration for any changes to cluster topology. Replication through a remote is much more decoupled. There is no need to change any configuration in order to add a new read replica, for example.
Direct replication may experience lower write latency in certain deployments, since replicating new writes directly to the running sql-server instance on the standby server may be expected to be faster than pushing new files to a remote storage solution and having the read replica download the files from there. On the other hand, direct replication may be less scalable in the number of read replicas it can gracefully handle, since the primary itself is responsible for pushing each write to each standby server. For read replicas, read latency with direct replication is always lower, since no communication with a remote must take place. You can expect increased read latency on every transaction start of a remote-based read replica.
The ability to replicate writes with direct replication is not coupled with the creation of dolt commits on a dolt branch. This may make it more appropriate for your use case, depending on how your application creates and manages dolt commits.
As mentioned above, the default security postures of replication through a remote and of direct replication are currently quite different. While the configuration shown on this page for direct replication is relatively straightforward, deploying it in the real world requires bringing some form of external authentication and authorization, possibly in the form of PKI, certificates and a sidecar, or externally configured firewall rules.
Lastly, depending on your use case, it may be appropriate to utilize both forms of replication at the same time. You might do so, for example, if you need scalable and decoupled read replicas along with hot standbys for high availability. To do so, deploy a small cluster of servers with direct replication between them. Configure those servers to replicate their writes to a single remote. Then, deploy your read replicas as read replicas against that remote, the same as you would have if you had only one primary. When configured in this mode, only the primary replicates its writes to the configured remote — standby servers in the cluster will be available to become primary and take over write responsibilities, at which point the new primary will start replicating new writes to the remote.
MySQL to Dolt Replication
If you have an existing MySQL or MariaDB server, you can configure Dolt as a read-replica. As the Dolt read-replica consumes data changes from the primary server, it creates Dolt commits, giving you a read-replica with a versioned history of your data changes. See the MySQL to Dolt Replication guide for more details on how to configure this.
Dolt to MySQL Replication
If you want to run Dolt as your primary database server, you can still replicate a single branch to a MySQL database. This is often useful for change data capture (CDC) or data warehouse use cases that require pulling data via MySQL's replication protocol. For example, you can use Debezium to monitor changes in a Dolt database for CDC use cases. See our blog post on Dolt-to-MySQL replication for more details on setting up Dolt to MySQL replication.