Transform File Uploads
DoltHub and DoltLab >= v0.8.1
allow users to optionally transform files before they're imported into a database. To do this, users can provide a Transform Url at the time they're uploading a file.
If a Transform Url is provided, DoltHub/DoltLab will send an http POST
request containing the uploaded file contents. In the http response to this request, the service receiving the request can return new file contents that will be imported into the database in place of the original file contents.
Additionally, on the Database Settings page of DoltHub/DoltLab, users can provide Transform Url along with request headers, which, will automatically be used to transform all file uploads across a database.
Pull requests are automatically created against a database after successful file uploads and imports.
Receiving the POST request
The transform file server can expect a http POST
request containing the contents of an uploaded file along with the following minimum http headers:
content-type
: The type of file contents.content-length
: The size of the file.x-dolthub-owner
: The database owner name.x-dolthub-database
: The database name.x-dolthub-from-branch
: The branch created to import the data.x-dolthub-to-branch
: The target branch of the pull request.x-import-filename
: The name of the uploaded file.x-import-md5
: The md5 of the uploaded file.
If the uploaded file has a content-type
of text/csv
, text/psv
, application/json
or application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
, the following headers may also be included on the request:
x-import-table
: The name of the table the import affects.x-import-primary-keys
: The primary keys of the table.x-import-operation
: The type of import operation to perform. Possible values are:overwrite
: Used to create or overwrite a table with contents of file.update
: Used to update a table with contents of file using table's existing schema.replace
: Used to replace a table with contents of file using table's existing schema.
x-import-force
: Ifx-import-operation=overwrite
and data already exists in the destination table, a value oftrue
here allows the target table to be overwritten.
Sending the POST response
In the http response to this POST
, transform file servers should provide the transformed file contents in the response body. Additionally, depending on the type of import desired, the response should include some combination of the following http headers:
content-type
: Required. The type of file contents to be imported into the database. Supported values are:text/csv
application/sql
application/json
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
content-length
: Required. The size of the file.x-import-filename
: Required. The name of the file.x-import-md5
: Required. The md5 of the file.x-import-table
: Required ifcontent-type != application/sql
. The name of the table the import affects.x-import-operation
: Required ifcontent-type != application/sql
. The type of import operation to perform. Possible values are:overwrite
: Used to create or overwrite a table with contents of file.update
: Used to update a table with contents of file using table's existing schema.replace
: Used to replace a table with contents of file using table's existing schema.
x-import-primary-keys
: Optional. The primary keys of the table.x-import-force
: Optional. Ifx-import-operation=overwrite
and data already exists in the destination table, a value oftrue
here allows the target table to be overwritten.
Example Transform Server
We've created a simple example golang transform server to demonstrate how you can run one yourself.
This example server receives the POST
request and prints the request headers like so:
Running the example server its default mode will always respond to a transform request with static CSV content and attach the following headers to the http response:
content-type=text/csv
x-import-md5=<static csv md5>
x-import-filename=transformed.csv
x-import-table=csv_table
x-import-operation=overwrite
x-import-primary-keys=pk,col1
The pull request opened on DoltHub/DoltLab for this import will reflect this static CSV content, regardless of the contents of the file uploaded originally.
Create Multiple Imports from a Single Transform Request
DoltHub and DoltLab's automated file import process will create pull requests between new branches created during the database import process and the target branch chosen during file upload.
As a result, you can create N imports, and corresponding pull requests, from a single transform request by responding with SQL that creates a new branch for each import and pull request you want.
Let's look at how this works in the example transform server. When this server is run with the --sql
flag, it will always respond to transform requests with the following static sql content:
This content will run against a checkout of the DoltHub/DoltLab database's x-dolthub-to-branch
. For this example the x-dolthub-to-branch=main
.
When this SQL file runs in the database, first, it will create and checkout a new branch import-branch-1
, create table t1
, and insert rows into that table. Then, it will add and commit those changes.
Next, it checks out the x-dolthub-to-branch
branch again, before creating another new branch and adding a table and some rows to that branch. It also adds and commits these changes.
DoltHub and DoltLab's import process will detect these new branches and open pull request with these changes against the x-dolthub-to-branch
.
Finally, the above content shows that the x-dolthub-to-branch
is checked out once again and table t3
is created directly on this branch, but these changes are not added or committed in this SQL content.
Instead, the automated import process will detect any outstanding changes made to the x-dolthub-to-branch
during import and commit them to the x-dolthub-from-branch
. It will also create a pull request from the x-dolthub-from-branch
to the x-dolthub-to-branch
.
Last updated