Machine Learning
In Machine Learning use cases, data is code. Andrej Karpathy calls this Software 2.0. What if you treated data like code?
Using Dolt to manage and share your Machine Learning data among your data analysts, engineers, and scientists makes collaboration easy. Dolt gives you human- and machine-readable diffs. Diffs are useful for providing data-oriented insights into your ML models. Why is this model performing better than that one? What changed in the data? Dolt makes data lineage a first-class entity in your Machine Learning pipelines. Dolt provides model reproducibility by storing each version of the data you use to train a model. Dolt is especially useful in Natural Language Processing (NLP), where the data is mostly text.
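To make the idea of a row-level diff concrete, here is a minimal sketch in plain Python. It is not how Dolt computes diffs; it only illustrates the kind of information a diff between two versions of a training set carries. The function name `diff_rows`, the `id` key, and the sample rows are all hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def diff_rows(old: List[Row], new: List[Row], key: str) -> Dict[str, list]:
    """Compare two versions of a table, matching rows on a primary key."""
    old_by_key = {row[key]: row for row in old}
    new_by_key = {row[key]: row for row in new}

    # Rows whose key exists only in the new version.
    added = [new_by_key[k] for k in new_by_key if k not in old_by_key]
    # Rows whose key exists only in the old version.
    removed = [old_by_key[k] for k in old_by_key if k not in new_by_key]
    # Rows present in both versions whose contents differ, as (before, after).
    modified = [
        (old_by_key[k], new_by_key[k])
        for k in old_by_key
        if k in new_by_key and old_by_key[k] != new_by_key[k]
    ]
    return {"added": added, "removed": removed, "modified": modified}


# Hypothetical sentiment dataset: version 2 fixes one label, drops one row,
# and adds one new example.
v1 = [
    {"id": 1, "text": "great movie", "label": "pos"},
    {"id": 2, "text": "awful plot", "label": "pos"},
    {"id": 3, "text": "meh", "label": "neg"},
]
v2 = [
    {"id": 1, "text": "great movie", "label": "pos"},
    {"id": 2, "text": "awful plot", "label": "neg"},
    {"id": 4, "text": "loved it", "label": "pos"},
]

changes = diff_rows(v1, v2, key="id")
```

A result like this answers "what in the data changed?" directly: if a model trained on the second version scores better, the corrected label on row 2 is an obvious place to look.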
Dolt is fully MySQL compatible, so any MySQL client library can read Dolt tables into pandas DataFrames. Dolt also integrates easily with other machine learning tools. We've partnered on integrations with Kedro, Flyte, and Metaflow.
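As a rough sketch of what the pandas integration can look like, the snippet below reads one version of a table into a DataFrame over the MySQL protocol. It assumes a Dolt SQL server is already running and that `pandas`, `SQLAlchemy`, and the `pymysql` driver are installed. The connection URL, the database name `training_data`, the table name `reviews`, and both function names are hypothetical. The query uses Dolt's `AS OF` clause to pin the read to a branch, tag, or commit.

```python
import re

# Conservative patterns so user input is never spliced into SQL unchecked.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_REVISION = re.compile(r"^[A-Za-z0-9_./-]+$")


def versioned_query(table: str, revision: str) -> str:
    """Build a SELECT that reads `table` as it existed at `revision`."""
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not _REVISION.match(revision):
        raise ValueError(f"unexpected revision: {revision!r}")
    return f"SELECT * FROM `{table}` AS OF '{revision}'"


def load_training_frame(
    table: str,
    revision: str,
    # Hypothetical connection URL; adjust user, host, port, and database.
    url: str = "mysql+pymysql://root@127.0.0.1:3306/training_data",
):
    """Load one version of a table into a pandas DataFrame."""
    # Imported lazily so the query builder works without these packages.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine(url)
    return pd.read_sql(versioned_query(table, revision), engine)


# Usage, assuming a running server and a `reviews` table (both hypothetical):
#   train_df = load_training_frame("reviews", "main")
```

Recording the revision string alongside each trained model is one simple way to get the reproducibility described above, since the same query returns the same rows later.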
Dolt is the database for machine learning.