Miscellaneous
Misc features
Information schema
✅
Views
✅
Window functions
🟠
Common table expressions (CTEs)
✅
Stored procedures
🟠
Cursors
✅
Triggers
✅
Client Compatibility
Some MySQL features are client features, not server features. Dolt ships with a client (ie. dolt sql
) and a server (dolt sql-server
). The Dolt client is not as sophisticated as the mysql
client. To access these features you can use the mysql
client that ships with MySQL.
SOURCE
❌
Works with Dolt via the mysql
client
LOAD DATA LOCAL INFILE
❌
LOAD DATA INFILE works with the Dolt client. The LOCAL option only works with Dolt via the mysql
client
Join hints
Dolt supports the following join hints:
JOIN_ORDER(,...)
✅
Join tree in scope should use the following join execution order. Must include all table names.
LOOKUP_JOIN(,)
✅
Use LOOKUP strategy joining two tables.
MERGE_JOIN(,)
✅
Use MERGE strategy joining two tables.
HASH_JOIN(,)
✅
Use HASH strategy joining two tables.
INNER_JOIN(,)
✅
Use INNER strategy joining two tables.
SEMI_JOIN(,)
✅
Use SEMI strategy joining two tables (for EXISTS
or IN
queries).
ANTI_JOIN(,)
✅
Use ANTI strategy joining two tables (for NOT EXISTS
or NOT IN
queries).
JOIN_FIXED_ORDER
❌
Join tree uses in-place table order for execution.
NO_ICP
❌
Disable indexed range scans on index using filters.
Join hints are indicated immediately after a SELECT
token in a special
comment format /*+ */
. Multiple hints should be separated by spaces:
SELECT /*+ JOIN_ORDER(arg1,arg2) */ 1
SELECT /*+ JOIN_ORDER(arg1,arg2) NO_ICP */ 1
Join hints currently require a full set of valid hints for all to be applied. For example, if we have a three table join we can enforce JOIN_ORDER on its own, join strategies on their own, or both order and strategy:
SELECT /*+ JOIN_ORDER(xy,uv,ab) LOOKUP_JOIN(xy,uv) HASH_JOIN(uv,ab) */ 1
FROM xy
JOIN uv on x = u
JOIN ab on a = u;
Additional notes:
If one hint is invalid given the execution options, no hints are applied and the engine falls back to default costing.
Join operator hints are order-insensitive
Join operator hints apply as long as the indicated tables are subsets of the join left/right.
Table Statistics
ANALYZE table
Dolt currently supports table statistics for index and join costing.
Statistics are auto-collected by default for servers, but cab be manually collected by running ANALYZE TABLE <table, ...>
.
Here is an example of how to initialize and observe statistics:
CREATE TABLE xy (x int primary key, y int);
INSERT INTO xy values (1,1), (2,2);
ANALYZE TABLE xy;
SELECT * from information_schema.tables;
+-------------+------------+-------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| SCHEMA_NAME | TABLE_NAME | COLUMN_NAME | HISTOGRAM |
+-------------+------------+-------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tmp4 | xy | x | {"statistic": {"avg_size": 0, "buckets": [{"bound_count": 1, "distinct_count": 2, "mcv_counts": [1,1], "mcvs": [[1],[2]], "null_count": 0, "row_count": 2, "upper_bound": [2]}], "columns": ["x"], "created_at": "2023-11-14T11:33:32.250178-08:00", "distinct_count": 2, "null_count": 2, "qualifier": "tmp4.xy.PRIMARY", "row_count": 2, "types:": ["int"]}} |
+-------------+------------+-------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
create table horses (id int primary key, name varchar(10), key(name));
insert into horses select x, 'Steve' from (with recursive inputs(x) as (select 1 union select x+1 from inputs where x < 1000) select * from inputs) dt;
analyze table horses;
select `index`, `position`, row_count, distinct_count, columns, upper_bound, upper_bound_cnt, mcv1 from dolt_statistics;
+---------+----------+-----------+----------------+----------+-------------+-----------------+-----------+
| index | position | row_count | distinct_count | columns | upper_bound | upper_bound_cnt | mcv1 |
+---------+----------+-----------+----------------+----------+-------------+-----------------+-----------+
| primary | 0 | 344 | 344 | ["id"] | [344] | 1 | [344] |
| primary | 1 | 125 | 125 | ["id"] | [469] | 1 | [469] |
| primary | 2 | 249 | 249 | ["id"] | [718] | 1 | [718] |
| primary | 3 | 112 | 112 | ["id"] | [830] | 1 | [830] |
| primary | 4 | 170 | 170 | ["id"] | [1000] | 1 | [1000] |
| name | 5 | 260 | 1 | ["name"] | ["Steve"] | 260 | ["Steve"] |
| name | 6 | 237 | 1 | ["name"] | ["Steve"] | 237 | ["Steve"] |
| name | 7 | 137 | 1 | ["name"] | ["Steve"] | 137 | ["Steve"] |
| name | 8 | 188 | 1 | ["name"] | ["Steve"] | 188 | ["Steve"] |
| name | 9 | 178 | 1 | ["name"] | ["Steve"] | 178 | ["Steve"] |
+---------+----------+-----------+----------------+----------+-------------+-----------------+-----------+
Disable
Some workloads, like batch imports, perform strictly better without the overhead of statistics collection. In these cases, we can explicitly stop or purge (stop + delete) statistics on a running server:
call dolt_stats_stop();
call dolt_stats_purge();
A stopped-stats server can be restarted, or have a single collection cycle performed by an operator:
call dolt_stats_starts();
call dolt_stats_once();
An environment variable can disable statistics on server reboots:
— on version 1.51.0 or higher
SET @@PERSIST.dolt_stats_enabled = 0;
— up to 1.50.x
SET @@PERSIST.dolt_stats_auto_refresh_enabled = 0;
A rebooted server with stats turned off has no reversal mechanism at the moment. All stats operations are no-ops if a server starts with the above variables set.
Auto-Refresh
Statistics automatically update for servers by default. Stats are stored in a database in .dolt/stats
separate from user data. This folder can safely be deleted offline.
Stats throughput can be lowered by raising the the dolt_stats_job_interval
variable, which indicates the milliseconds of delay between processing steps. The higher the delay and more branches in a database, the longer it will take for statistic updates to materialize. High delays reduce the fraction of runtime resources diverted to managing background statistics.
Stats can be disabled with the dolt_stats_enabled=0
variable.
Stats persistence can be disabled with the dolt_stats_memory_only=1
variable.
Stats Garbage Collection
The stats in-memory cache accumulates new histograms proportionally to the write rate and stats update rate. Periodically, an update cycle will swap the currently active histogram buckets to a new in-memory map and clear the old set.
Stats garbage collection can be disabled with the dolt_stats_gc_enabled=0
variable.
Garbage collection frequency can be tuned with the dolt_stats_gc_interval
variable (default 1 hour).
Stats Controller Functions
Dolt exposes a set of helper procedures for managing statistics collection and use:
dolt_stats_stop
: clear queue and disable threaddolt_stats_restart
: clear queue, refresh queue, start threaddolt_stats_purge
: clear queue, refresh queue, clear cache disable threaddolt_stats_once
: collect statistics once, ex: in sql-shelldolt_stats_wait
: block on a full queue cycledolt_stats_gc
: block waiting for a GC signaldolt_stats_flush
: block waiting for a flush signaldolt_stats_info
: print the current state of the stats provider (optional'-short'
flag)
Last updated
Was this helpful?