Miscellaneous

Misc features

Component

Supported

Notes and limitations

Information schema

✅

Views

✅

Window functions

🟠

Some functions not supported, see window function docs

Common table expressions (CTEs)

✅

Stored procedures

🟠

Only a few statements are not yet supported, see compound statements

Cursors

✅

Triggers

✅

Client Compatibility

Some MySQL features are client features, not server features. Dolt ships with a client (ie. dolt sql) and a server (dolt sql-server). The Dolt client is not as sophisticated as the mysql client. To access these features you can use the mysql client that ships with MySQL.

Feature

Supported

Notes and limitations

SOURCE

❌

Works with Dolt via the mysql client

LOAD DATA LOCAL INFILE

❌

LOAD DATA INFILE works with the Dolt client. The LOCAL option only works with Dolt via the mysql client

Join hints

Dolt supports the following join hints:

name

supported

detail

JOIN_ORDER(,...)

✅

Join tree in scope should use the following join execution order. Must include all table names.

LOOKUP_JOIN(,)

✅

Use LOOKUP strategy joining two tables.

MERGE_JOIN(,)

✅

Use MERGE strategy joining two tables.

HASH_JOIN(,)

✅

Use HASH strategy joining two tables.

INNER_JOIN(,)

✅

Use INNER strategy joining two tables.

SEMI_JOIN(,)

✅

Use SEMI strategy joining two tables (for EXISTS or IN queries).

ANTI_JOIN(,)

✅

Use ANTI strategy joining two tables (for NOT EXISTS or NOT IN queries).

JOIN_FIXED_ORDER

❌

Join tree uses in-place table order for execution.

NO_ICP

❌

Disable indexed range scans on index using filters.

Join hints are indicated immediately after a SELECT token in a special comment format /*+ */. Multiple hints should be separated by spaces:

SELECT /*+ JOIN_ORDER(arg1,arg2) */ 1
SELECT /*+ JOIN_ORDER(arg1,arg2) NO_ICP */ 1

Join hints currently require a full set of valid hints for all to be applied. For example, if we have a three table join we can enforce JOIN_ORDER on its own, join strategies on their own, or both order and strategy:

SELECT /*+ JOIN_ORDER(xy,uv,ab) LOOKUP_JOIN(xy,uv) HASH_JOIN(uv,ab) */ 1
FROM xy
JOIN uv on x = u
JOIN ab on a = u;

Additional notes:

If one hint is invalid given the execution options, no hints are applied and the engine falls back to default costing.
Join operator hints are order-insensitive
Join operator hints apply as long as the indicated tables are subsets of the join left/right.

Table Statistics

ANALYZE table

Dolt currently supports table statistics for index and join costing.

Statistics are auto-collected by default for servers, but can be manually collected by running ANALYZE TABLE <table, ...>.

Here is an example of how to initialize and observe statistics:

CREATE TABLE xy (x int primary key, y int);
INSERT INTO xy values (1,1), (2,2);
ANALYZE TABLE xy;
SELECT * from information_schema.tables;
+-------------+------------+-------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| SCHEMA_NAME | TABLE_NAME | COLUMN_NAME | HISTOGRAM                                                                                                                                                                                                                                                                                                                                                      |
+-------------+------------+-------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tmp4        | xy         | x           | {"statistic": {"avg_size": 0, "buckets": [{"bound_count": 1, "distinct_count": 2, "mcv_counts": [1,1], "mcvs": [[1],[2]], "null_count": 0, "row_count": 2, "upper_bound": [2]}], "columns": ["x"], "created_at": "2023-11-14T11:33:32.250178-08:00", "distinct_count": 2, "null_count": 2, "qualifier": "tmp4.xy.PRIMARY", "row_count": 2, "types:": ["int"]}} |
+-------------+------------+-------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

create table horses (id int primary key, name varchar(10), key(name));
insert into horses select x, 'Steve' from (with recursive inputs(x) as (select 1 union select x+1 from inputs where x < 1000) select * from inputs) dt;
analyze table horses;
select `index`, `position`, row_count, distinct_count, columns, upper_bound, upper_bound_cnt, mcv1 from dolt_statistics;
+---------+----------+-----------+----------------+----------+-------------+-----------------+-----------+
| index   | position | row_count | distinct_count | columns  | upper_bound | upper_bound_cnt | mcv1      |
+---------+----------+-----------+----------------+----------+-------------+-----------------+-----------+
| primary | 0        | 344       | 344            | ["id"]   | [344]       | 1               | [344]     |
| primary | 1        | 125       | 125            | ["id"]   | [469]       | 1               | [469]     |
| primary | 2        | 249       | 249            | ["id"]   | [718]       | 1               | [718]     |
| primary | 3        | 112       | 112            | ["id"]   | [830]       | 1               | [830]     |
| primary | 4        | 170       | 170            | ["id"]   | [1000]      | 1               | [1000]    |
| name    | 5        | 260       | 1              | ["name"] | ["Steve"]   | 260             | ["Steve"] |
| name    | 6        | 237       | 1              | ["name"] | ["Steve"]   | 237             | ["Steve"] |
| name    | 7        | 137       | 1              | ["name"] | ["Steve"]   | 137             | ["Steve"] |
| name    | 8        | 188       | 1              | ["name"] | ["Steve"]   | 188             | ["Steve"] |
| name    | 9        | 178       | 1              | ["name"] | ["Steve"]   | 178             | ["Steve"] |
+---------+----------+-----------+----------------+----------+-------------+-----------------+-----------+

Disable

Some workloads, like batch imports, perform strictly better without the overhead of statistics collection. In these cases, we can explicitly stop or purge (stop + delete) statistics on a running server:

call dolt_stats_stop();
call dolt_stats_purge();

A stopped-stats server can be restarted, or have a single collection cycle performed by an operator:

call dolt_stats_restart();
call dolt_stats_once();

An environment variable can disable statistics on server reboots:

— on version 1.51.0 or higher
SET @@PERSIST.dolt_stats_enabled = 0;

— up to 1.50.x
SET @@PERSIST.dolt_stats_auto_refresh_enabled = 0;

A rebooted server with stats turned off has no reversal mechanism at the moment. All stats operations are no-ops if a server starts with the above variables set.

Auto-Refresh

Statistics automatically update for servers by default. Stats are stored in a database in .dolt/stats separate from user data. This folder can safely be deleted offline.

Stats throughput can be lowered by raising the the dolt_stats_job_interval variable, which indicates the milliseconds of delay between processing steps. The higher the delay and more branches in a database, the longer it will take for statistic updates to materialize. High delays reduce the fraction of runtime resources diverted to managing background statistics.

Stats can be disabled with the dolt_stats_enabled=0 variable.

Stats persistence can be disabled with the dolt_stats_memory_only=1 variable.

Stats Garbage Collection

The stats in-memory cache accumulates new histograms proportionally to the write rate and stats update rate. Periodically, an update cycle will swap the currently active histogram buckets to a new in-memory map and clear the old set.

Stats garbage collection can be disabled with the dolt_stats_gc_enabled=0 variable.

Garbage collection frequency can be tuned with the dolt_stats_gc_interval variable (default 1 hour).

Stats Controller Functions

Dolt exposes a set of helper procedures for managing statistics collection and use:

dolt_stats_stop: clear queue and disable thread
dolt_stats_restart: clear queue, refresh queue, start thread
dolt_stats_purge: clear queue, refresh queue, clear cache disable thread
dolt_stats_once: collect statistics once, ex: in sql-shell
dolt_stats_wait: block on a full queue cycle
dolt_stats_gc: block waiting for a GC signal
dolt_stats_flush: block waiting for a flush signal
dolt_stats_info: print the current state of the stats provider (optional '-short' flag)

PreviousSystem Variables NextSupported Clients

Last updated 5 days ago

Was this helpful?