LogoLogo
DoltHubBlogDiscordGitHubDolt
  • Introduction
    • What Is Dolt?
    • Installation
      • Linux
      • Windows
      • Mac
      • Build from Source
      • Application Server
      • Docker
      • Upgrading
    • Getting Started
      • Version Controlled Database
      • Git For Data
      • Versioned MySQL Replica
    • Use Cases
      • Data Sharing
      • Data and Model Quality Control
      • Manual Data Curation
      • Version Control for your Application
      • Versioned MySQL Replica
      • Audit
      • Configuration Management
      • Offline First
  • Concepts
    • Dolt
      • Git
        • Commits
        • Log
        • Diff
        • Branch
        • Merge
        • Conflicts
        • Remotes
        • Working Set
      • SQL
        • Databases
        • Schema
        • Tables
        • Primary Keys
        • Types
        • Indexes
        • Views
        • Constraints
        • Triggers
        • Procedures
        • Users/Grants
        • Transactions
        • System Variables
      • RDBMS
        • Server
        • Backups
        • Replication
    • DoltHub/DoltLab
      • Permissions
      • Pull Requests
      • Issues
      • Forks
  • SQL Reference
    • Running the Server
      • Configuration
      • Access Management
      • Branch Permissions
      • Backups
      • Garbage Collection
      • Metrics
      • Replication
      • Troubleshooting
    • Version Control Features
      • Using Branches
      • Merges
      • Querying History
      • Using Remotes
      • Procedures
      • Functions
      • System Tables
      • System Variables
      • Saved Queries
    • SQL Language Support
      • Data Description
      • Expressions, Functions, Operators
      • Supported Statements
      • MySQL Information Schema
      • Collations and Character Sets
      • System Variables
      • Miscellaneous
    • Supported Clients
      • Programmatic
      • SQL Editors
    • Benchmarks and Metrics
      • Correctness
      • Latency
      • Import
  • CLI Reference
    • Commands
    • Git Comparison
  • Architecture
    • Overview
    • Storage Engine
      • Commit Graph
      • Prolly Trees
      • Block Store
    • SQL
      • Go MySQL Server
      • Vitess
  • Guides
    • Cheat Sheet
    • Contributing
      • dolt
      • go-mysql-server
    • MySQL to Dolt Replication
    • Importing Data
    • Integrations
  • Other
    • FAQ
    • Roadmap
    • Versioning
  • Products
    • Hosted Dolt
      • Getting Started
      • Notable Features
      • SQL Workbench
      • Cloning a Hosted Database
      • Using DoltHub as a Remote
      • Infrastructure
    • DoltHub
      • Data Sharing
      • API
        • Authentication
        • SQL
        • CSV
        • Database
        • Hooks
      • Continuous Integration
        • Getting Started
        • Workflow Reference
      • Transform File Uploads
      • Workspaces
    • DoltLab
    • Dolt Workbench
    • DoltgreSQL
Powered by GitBook
On this page
  • What is a Diff?
  • How to use diffs
  • Difference between Git diffs and Dolt diffs
  • Example
  • Schema
  • Data with Primary Key
  • Data without primary key
  • SQL

Was this helpful?

Edit on GitHub
Export as PDF
  1. Concepts
  2. Dolt
  3. Git

Diff

PreviousLogNextBranch

Last updated 1 year ago

Was this helpful?

What is a Diff?

Diff, short for difference, is used to display the differences between two references, usually commits. Dolt produces diffs of schema and data.

Dolt produces diffs of table schema.

Schema diffs appear as raw textual differences between the CREATE TABLE statements used to define the table schema at each commit. The can be produced on the command line using dolt diff --schema. No way to generate schema diffs in SQL exists yet.

Dolt produces cell-wise diffs for table data. If a primary key exists, rows are identified across commits using the primary key. Changes to columns that are not in the primary key will appear as updates. Changes to primary key columns will appear as inserts and corresponding deletes.

If no primary key exists, all changes look like inserts and deletes. Effectively, for diff purposes, the keys of the table with no primary keys are the entire form.

Dolt can produce diffs on the command line, as tables, or as a SQL patch.

Dolt can produce diffs at scale because the Dolt storage engine breaks the rows in the database down into chunks. Each chunk is content-addressed and stored in a tree called . Thus, to calculate data diffs, Dolt walks the trees at both commits, exposing the chunks that are different. For instance, if nothing has changed, the content address of the root of the table is unchanged.

How to use diffs

Diffs are an invaluable tool for data debugging.

In human readable form, seeing what cells in your database changed can help you instantly spot problems in the data that may have gone overlooked. You can see diffs in human readable form via the or through a SQL query of the .

For instance, are you expecting no NULL cells but have some? This indicates a bug in your data creation process. Simply looking at a summary of how many rows were added, modified, and deleted in a specific change can be fruitful. Expecting only row additions in a change but got some modifications? A deeper dive into that import job may be required.

Programmatically, you can use SQL to explore very large diffs using the .

Difference between Git diffs and Dolt diffs

Git and Dolt diffs are conceptually the same. Display the differences between two sets of files in Git's case and tables in Dolt's case.

The Git diff command supports many more file specific options. Dolt diffs can be queried using SQL. Dolt diffs produce diffs for schema and data. There is no schema diff equivalent in Git.

Example

Schema

docs $ dolt sql -q "alter table docs add column c1 int"
docs $ dolt diff
diff --dolt a/docs b/docs
--- a/docs @ 90tss7r2gfraa2cjugganbbtg5j6kjfc
+++ b/docs @ nt808mhhienne2dss4mjdcj8jrdig6ml
 CREATE TABLE `docs` (
   `pk` int NOT NULL,
+  `c1` int,
   PRIMARY KEY (`pk`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
+-----+----+----+
|  <  | pk |    |
|  >  | pk | c1 |
+-----+----+----+
+-----+----+----+

Data with Primary Key

Addition/deletion

docs $ dolt sql -q "insert into docs values (1,0),(2,1)"
Query OK, 2 rows affected
docs $ dolt diff
diff --dolt a/docs b/docs
--- a/docs @ cj9ln2kg7u6aiprr7i24rf48es4tk1eg
+++ b/docs @ qfiv5iankh0gltpov71h0mcvaqvftvlh
+-----+----+----+
|     | pk | c1 |
+-----+----+----+
|  +  | 1  | 0  |
|  +  | 2  | 1  |
+-----+----+----+
docs $ dolt sql -q "delete from docs where pk=0"
Query OK, 1 row affected
docs $ dolt diff
diff --dolt a/docs b/docs
--- a/docs @ cj9ln2kg7u6aiprr7i24rf48es4tk1eg
+++ b/docs @ 8aca41vfss9kcqkrdhos25be87nlu3b9
+-----+----+----+
|     | pk | c1 |
+-----+----+----+
|  -  | 0  | 0  |
|  +  | 1  | 0  |
|  +  | 2  | 1  |
+-----+----+----+

Update

docs $ dolt sql -q "update docs set c1=1 where pk=1"
Query OK, 1 row affected
Rows matched: 1  Changed: 1  Warnings: 0
docs $ dolt diff
diff --dolt a/docs b/docs
--- a/docs @ 8aca41vfss9kcqkrdhos25be87nlu3b9
+++ b/docs @ 2lcu9e49ia08icjonmt3l0s7ph2cdb5s
+-----+----+----+
|     | pk | c1 |
+-----+----+----+
|  <  | 1  | 0  |
|  >  | 1  | 1  |
+-----+----+----+

Data without primary key

Addition/Deletion

docs $ dolt sql -q "insert into no_pk values (0,0,0),(1,1,1),(2,2,2)"
Query OK, 3 rows affected
docs $ dolt diff
diff --dolt a/no_pk b/no_pk
--- a/no_pk @ df9bd3mf77t2gicphep87nvuobqjood7
+++ b/no_pk @ 7s8jhc9nlnouhai8kdtsssrm1hpegpf0
+-----+----+----+----+
|     | c1 | c2 | c3 |
+-----+----+----+----+
|  +  | 1  | 1  | 1  |
|  +  | 2  | 2  | 2  |
|  +  | 0  | 0  | 0  |
+-----+----+----+----+
docs $ dolt commit -am "Added data to no_pk table" 
commit mjbtf27jidi86jrm32lvop7mmlpgplbg
Author: Tim Sehn <tim@dolthub.com>
Date:   Mon Dec 06 13:38:19 -0800 2021

	Added data to no_pk table

docs $ dolt sql -q "delete from no_pk where c1=0"
Query OK, 1 row affected
docs $ dolt diff
diff --dolt a/no_pk b/no_pk
--- a/no_pk @ 7s8jhc9nlnouhai8kdtsssrm1hpegpf0
+++ b/no_pk @ s23c851fomfcjaiufm25mi9mlnhurh2c
+-----+----+----+----+
|     | c1 | c2 | c3 |
+-----+----+----+----+
|  -  | 0  | 0  | 0  |
+-----+----+----+----+

Update

docs $ dolt sql -q "update no_pk set c1=0 where c1=1"
Query OK, 1 row affected
Rows matched: 1  Changed: 1  Warnings: 0
docs $ dolt diff
diff --dolt a/no_pk b/no_pk
--- a/no_pk @ s23c851fomfcjaiufm25mi9mlnhurh2c
+++ b/no_pk @ 18k2q3pav9a2v8mkk26nhhhss4eda86k
+-----+----+----+----+
|     | c1 | c2 | c3 |
+-----+----+----+----+
|  -  | 1  | 1  | 1  |
|  +  | 0  | 1  | 1  |
+-----+----+----+----+

SQL

docs $ dolt sql -q "select * from dolt_diff_docs"
+-------+-------+----------------------------------+-----------------------------------+---------+---------+----------------------------------+-----------------------------------+-----------+
| to_c1 | to_pk | to_commit                        | to_commit_date                    | from_c1 | from_pk | from_commit                      | from_commit_date                  | diff_type |
+-------+-------+----------------------------------+-----------------------------------+---------+---------+----------------------------------+-----------------------------------+-----------+
| 0     | 0     | dmmkbaiq6g6mm0vruc07utpns47sjkv7 | 2021-12-06 21:31:54.041 +0000 UTC | NULL    | NULL    | v42og53ru3k3hak3decm23crp5p6kd2f | 2021-12-06 21:27:53.886 +0000 UTC | added     |
| 1     | 1     | 5emu36fgedeurr6qk5uq6mj96k5j53j9 | 2021-12-06 21:36:02.076 +0000 UTC | 0       | 1       | ne14m8g2trlunju5a2mu735kjioocmll | 2021-12-06 21:34:12.585 +0000 UTC | modified  |
| NULL  | NULL  | ne14m8g2trlunju5a2mu735kjioocmll | 2021-12-06 21:34:12.585 +0000 UTC | 0       | 0       | dmmkbaiq6g6mm0vruc07utpns47sjkv7 | 2021-12-06 21:31:54.041 +0000 UTC | removed   |
| 0     | 1     | ne14m8g2trlunju5a2mu735kjioocmll | 2021-12-06 21:34:12.585 +0000 UTC | NULL    | NULL    | dmmkbaiq6g6mm0vruc07utpns47sjkv7 | 2021-12-06 21:31:54.041 +0000 UTC | added     |
| 1     | 2     | ne14m8g2trlunju5a2mu735kjioocmll | 2021-12-06 21:34:12.585 +0000 UTC | NULL    | NULL    | dmmkbaiq6g6mm0vruc07utpns47sjkv7 | 2021-12-06 21:31:54.041 +0000 UTC | added     |
+-------+-------+----------------------------------+-----------------------------------+---------+---------+----------------------------------+-----------------------------------+-----------+
docs $ dolt sql -q "select * from dolt_diff_docs where from_pk=0"
+-------+-------+----------------------------------+-----------------------------------+---------+---------+----------------------------------+-----------------------------------+-----------+
| to_c1 | to_pk | to_commit                        | to_commit_date                    | from_c1 | from_pk | from_commit                      | from_commit_date                  | diff_type |
+-------+-------+----------------------------------+-----------------------------------+---------+---------+----------------------------------+-----------------------------------+-----------+
| NULL  | NULL  | ne14m8g2trlunju5a2mu735kjioocmll | 2021-12-06 21:34:12.585 +0000 UTC | 0       | 0       | dmmkbaiq6g6mm0vruc07utpns47sjkv7 | 2021-12-06 21:31:54.041 +0000 UTC | removed   |
+-------+-------+----------------------------------+-----------------------------------+---------+---------+----------------------------------+-----------------------------------+-----------+
a Prolly Tree
Dolt CLI
dolt_diff_<tablename> system table
dolt_diff_<tablename> system tables