    Technology

    DVC (Data Version Control)

    Updated: 2/10/2026

    Open-source tool for data and model versioning that extends Git workflows to ML artifacts.

    Quick Summary

    DVC extends Git with data and model versioning for ML projects, adding pipeline tracking, experiment comparison, and cloud storage integration.

    Explanation

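    DVC keeps large files (datasets, models) out of the Git repository. Git tracks only small metafiles (`.dvc` files and `dvc.lock`) that record content hashes, while the file contents live in a local cache and a configurable remote. DVC also manages ML pipelines as DAGs of stages defined in `dvc.yaml` and supports comparing experiments. Supported remote storage backends include Amazon S3, Google Cloud Storage, and Azure Blob Storage, among others such as SSH servers and local directories.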
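To illustrate how a pipeline forms a DAG, the following is a minimal sketch and not DVC's own code or API. It assumes each stage is a plain dict loosely mirroring the `cmd`, `deps`, and `outs` fields of a `dvc.yaml` stage; the stage names and file paths are hypothetical. A stage depends on another stage when one of its inputs is that stage's output, and a topological sort yields a valid run order.

```python
"""Sketch: deriving a run order from DVC-style stage definitions.

Assumption: stages are plain dicts loosely mirroring dvc.yaml's
cmd/deps/outs fields. This is an illustration, not DVC's API.
"""
from graphlib import TopologicalSorter

# Hypothetical pipeline: each stage lists the files it reads and writes.
STAGES = {
    "prepare": {"cmd": "python prepare.py", "deps": ["data/raw.csv"], "outs": ["data/prepared.csv"]},
    "train": {"cmd": "python train.py", "deps": ["data/prepared.csv"], "outs": ["model.pkl"]},
    "evaluate": {"cmd": "python evaluate.py", "deps": ["model.pkl", "data/prepared.csv"], "outs": ["metrics.json"]},
}


def build_dag(stages: dict) -> dict[str, set[str]]:
    """Map each stage to the set of upstream stages whose outputs it reads."""
    producer = {}  # output path -> name of the stage that writes it
    for name, stage in stages.items():
        for out in stage["outs"]:
            if out in producer:
                raise ValueError(f"{out} is produced by both {producer[out]} and {name}")
            producer[out] = name
    # Inputs not produced by any stage (e.g. raw data) add no edges.
    return {
        name: {producer[dep] for dep in stage["deps"] if dep in producer}
        for name, stage in stages.items()
    }


def execution_order(stages: dict) -> list[str]:
    """Return stages ordered so each one runs after all of its upstream stages."""
    # static_order() raises graphlib.CycleError if the stages form a cycle.
    return list(TopologicalSorter(build_dag(stages)).static_order())


if __name__ == "__main__":
    print(execution_order(STAGES))
```

DVC's `dvc repro` goes further. It compares dependency hashes against those recorded in `dvc.lock` and reruns only the stages whose inputs changed. The sketch leaves that caching step out.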

    Marketing Relevance

    DVC is one of the most widely used open-source tools for Git-based versioning of ML data and experiments.

    Common Pitfalls

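    Storage costs for large datasets: every changed version of a large file is kept in remote storage unless it is cleaned up (e.g., with `dvc gc`). Learning curve for data scientists without Git experience. A remote storage location must be configured before data can be shared or backed up.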
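The storage-cost pitfall follows from how content-addressed storage works, and a few lines of arithmetic make this concrete. The sketch below is a simplified model, not DVC's implementation. It assumes each dataset version is a dict of path to file bytes and that objects are keyed by the MD5 of their contents, which is DVC's default hash. Like DVC, it stores whole files rather than deltas, and it ignores compression and per-object overhead. The function names are hypothetical.

```python
"""Sketch: content-addressed storage deduplicates identical files,
but any modified file is stored again in full.

Assumption: versions are dicts of path -> bytes; objects are keyed by MD5.
Compression and per-object overhead are ignored.
"""
import hashlib


def unique_objects(versions: list[dict[str, bytes]]) -> dict[str, int]:
    """Return {md5 digest: size} for every distinct file content across versions."""
    objects = {}
    for version in versions:
        for content in version.values():
            objects[hashlib.md5(content).hexdigest()] = len(content)
    return objects


def storage_bytes(versions: list[dict[str, bytes]]) -> int:
    """Bytes a content-addressed remote would hold for all versions."""
    return sum(unique_objects(versions).values())


def naive_bytes(versions: list[dict[str, bytes]]) -> int:
    """Bytes needed if every version were copied in full."""
    return sum(len(content) for version in versions for content in version.values())


if __name__ == "__main__":
    big = b"x" * 1_000_000
    v1 = {"train.csv": big, "test.csv": b"y" * 200_000}
    v2 = {"train.csv": big, "test.csv": b"y" * 200_000 + b"z"}  # 1-byte change to test.csv
    v3 = {"train.csv": big + b"!", "test.csv": b"y" * 200_000 + b"z"}  # 1-byte change to train.csv
    versions = [v1, v2, v3]
    print(naive_bytes(versions), storage_bytes(versions))  # 3600003 2400002
```

In this model, the unchanged `train.csv` in `v2` costs nothing extra. A one-byte edit to it in `v3`, however, adds roughly another megabyte. That is why storage grows quickly for large, frequently edited files.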

    Origin & History

    Iterative.ai released DVC in 2017 as "Git for Data." CML (Continuous Machine Learning) was released in 2020 as a CI/CD companion. DVC Studio followed as a web UI. Today DVC has over 13,000 GitHub stars.

    Comparisons & Differences

    DVC (Data Version Control) vs. Git LFS

    Git LFS replaces large files in the Git repository with small pointer files and stores the contents on an LFS server; DVC additionally offers ML pipelines, experiment tracking, and flexible storage backends.

    DVC (Data Version Control) vs. MLflow

    DVC focuses on data versioning within a Git-based workflow; MLflow focuses on experiment tracking and a model registry.

