Doltpy is distributed via pip on PyPI. You can install it by running pip install doltpy. See more detailed installation instructions here.
The doltpy.cli module contains tools for creating, cloning, and publishing Dolt datasets. We will illustrate using a CSV to create a simple database, pushing it to DoltHub, then cloning it, making a change, and pushing the change back to the remote. Here is the data we are going to work with:
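As a stand-in for that data, the sketch below uses a few hypothetical EUR-based rates and walks the create/commit/publish flow. The doltpy.cli method names (Dolt.init, sql, add, commit, remote, push) are from doltpy 2 as best recalled, and the table name and remote URL are placeholders; the Dolt-touching steps sit behind the main guard because they need the dolt binary installed.

```python
# Hypothetical FX rates against a EUR base (placeholder data).
ROWS = [
    ("USD", 1.18),
    ("GBP", 0.85),
    ("JPY", 127.5),
]

def rows_to_insert_sql(rows) -> str:
    """Render the sample rows as a single INSERT statement."""
    values = ", ".join(f"('{cur}', {rate})" for cur, rate in rows)
    return f"INSERT INTO rates VALUES {values}"

def main() -> None:
    # Deferred import so the helper above runs without doltpy installed.
    from doltpy.cli import Dolt

    dolt = Dolt.init("fx_rates")  # new Dolt repo in ./fx_rates
    dolt.sql(query="CREATE TABLE rates (currency VARCHAR(3) PRIMARY KEY, rate DOUBLE)")
    dolt.sql(query=rows_to_insert_sql(ROWS))
    dolt.add("rates")
    dolt.commit("Add initial FX rates")
    # Publish to DoltHub (placeholder remote); anyone can then Dolt.clone it,
    # make an edit, and push the change back with the same add/commit/push cycle.
    dolt.remote(add=True, name="origin", url="<org>/fx-rates")
    dolt.push("origin", "master")

if __name__ == "__main__":
    main()
```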
The doltpy.sql module is built around DoltSQLContext, which has two concrete subclasses. DoltSQLServerContext handles standing up the Dolt SQL server subprocess and shutting it down; it uses a ServerConfig to pass server parameters such as username and password. DoltSQLEngineContext assumes the server is already running on a host specified in the ServerConfig it is passed.
We will use doltpy.sql to create a Dolt database from a Pandas DataFrame, push it to DoltHub, clone it elsewhere, make modifications via SQL, and then commit and push the result back.
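Concretely, a doltpy.sql round trip might look like the sketch below. The DoltSQLServerContext method names (write_pandas, read_pandas, execute) and their keyword arguments are recalled from doltpy 2 and should be treated as assumptions; the server steps need the dolt binary, so they sit behind the main guard.

```python
def update_rate_sql(currency: str, rate: float) -> str:
    """SQL for the modification we make before pushing back (illustrative)."""
    return f"UPDATE rates SET rate = {rate} WHERE currency = '{currency}'"

def main() -> None:
    # Deferred imports so the SQL helper runs without doltpy installed.
    import pandas as pd
    from doltpy.cli import Dolt
    from doltpy.sql import DoltSQLServerContext, ServerConfig

    df = pd.DataFrame({"currency": ["USD", "GBP"], "rate": [1.18, 0.85]})

    dolt = Dolt.clone("<org>/fx-rates")  # clone the published database
    with DoltSQLServerContext(dolt, ServerConfig()) as ctx:
        # Write the DataFrame and commit in one call (assumed keyword names).
        ctx.write_pandas("rates", df, create_if_not_exists=True,
                         primary_key=["currency"], commit=True)
        ctx.execute(update_rate_sql("USD", 1.19))  # modify via SQL
        latest = ctx.read_pandas("rates")          # round-trip check
    dolt.push("origin", "master")                  # push the result back

if __name__ == "__main__":
    main()
```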
The doltpy.etl module contains tooling for ETL workflows that write or transform data in Dolt. We will show a simple example of pulling from a free API for FX rates, then transforming the data into moving averages. This code is taken almost verbatim from an ETL job that actually writes to this database.
The first step is to turn the raw JSON response into a pandas.DataFrame with the JSON flattened into a table structure:
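The flattening itself is plain pandas. The payload shape below (a base currency plus a currency-to-rate mapping) is a guess at a typical FX-rates response, not the exact API used in the job:

```python
import pandas as pd

# Hypothetical FX-rates payload: base currency plus a currency -> rate mapping.
raw = {
    "base": "EUR",
    "date": "2020-04-27",
    "rates": {"USD": 1.18, "GBP": 0.85, "JPY": 127.5},
}

def flatten(payload: dict) -> pd.DataFrame:
    """Flatten the nested rates mapping into one row per currency."""
    return pd.DataFrame(
        [
            {"base": payload["base"], "date": payload["date"],
             "currency": currency, "rate": rate}
            for currency, rate in payload["rates"].items()
        ]
    )

df = flatten(raw)  # columns: base, date, currency, rate; one row per currency
```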
doltpy.etl provides code to do this:
This uses get_df_table_writer, which itself returns a function. The motivation is that when creating ETL workflows one often wants to import code before it is executed, so we build a function that will perform the load but delay executing it. If we wanted to execute it immediately, we could just do:
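The pattern here is an ordinary closure: a factory captures its arguments and returns a zero-argument function that does the work later. A pure-Python stand-in (not get_df_table_writer's real signature) makes the timing visible:

```python
def get_table_writer(table, get_df):
    """Return a no-argument function that performs the load when called."""
    def writer():
        df = get_df()      # the read happens now, not when the writer was built
        return table, df   # stand-in for writing df to the named Dolt table
    return writer

calls = []

def get_df():
    calls.append("read")   # record when the read actually executes
    return [{"currency": "USD", "rate": 1.18}]

writer = get_table_writer("rates", get_df)
assert calls == []         # building the writer executed nothing
table, df = writer()       # invoking it performs the read
assert calls == ["read"] and table == "rates"
```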
A loader takes a list of DoltTableWriter functions and executes them sequentially. Define one as follows:
Computing the moving averages requires a get_data function to read from the table that raw_data_writer is configured to write to. To ensure that raw_data_writer executes first, we can define our loaders as follows:
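Putting the pieces together might look like the sketch below. The doltpy.etl names (get_df_table_writer, get_dolt_loader, load_to_dolthub) and their signatures are recalled from doltpy 1.x and should be treated as assumptions; the moving_average helper is plain pandas, and the Dolt-touching part sits behind the main guard since it needs the dolt binary.

```python
import pandas as pd

def moving_average(rates: pd.Series, window: int = 3) -> pd.Series:
    """Trailing moving average of a rate series."""
    return rates.rolling(window=window).mean()

def main() -> None:
    # Deferred imports so moving_average runs without doltpy installed.
    from doltpy.etl import get_df_table_writer, get_dolt_loader, load_to_dolthub

    def get_raw_rates() -> pd.DataFrame:
        # Placeholder for the API pull plus the flattening step shown earlier.
        return pd.DataFrame({
            "date": ["2020-04-25", "2020-04-26", "2020-04-27"],
            "currency": ["USD"] * 3,
            "rate": [1.16, 1.17, 1.18],
        })

    def get_data() -> pd.DataFrame:
        # In the real job this reads the table raw_data_writer wrote to;
        # stubbed here so the transformation stays visible.
        df = get_raw_rates()
        df["rate_ma"] = moving_average(df["rate"])
        return df

    raw_data_writer = get_df_table_writer(
        "fx_rates", get_raw_rates, ["date", "currency"])
    averages_writer = get_df_table_writer(
        "fx_rate_averages", get_data, ["date", "currency"])

    # Two loaders, ordered so raw_data_writer executes before get_data reads.
    loaders = [
        get_dolt_loader([raw_data_writer], True, "Load raw FX rates"),
        get_dolt_loader([averages_writer], True, "Compute moving averages"),
    ]
    load_to_dolthub(loaders, clone=True, push=True,
                    remote_name="origin", remote_url="<org>/fx-rates")

if __name__ == "__main__":
    main()
```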