    Artificial Intelligence

    Datasheets for Datasets

    Also known as:
    Data Cards
    Dataset Documentation
    Data Nutrition Labels
    Updated: 2/11/2026

    Standardized documentation for ML datasets describing provenance, composition, collection methods, recommended use, and known limitations.

    Quick Summary

    Datasheets for Datasets standardize ML dataset documentation – like nutrition labels for data, essential for bias audits and compliance.

    Explanation

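    The concept is inspired by datasheets in the electronics industry, where components ship with a document describing their characteristics and recommended uses. A datasheet for a dataset covers seven areas: motivation, composition, collection process, preprocessing, usage recommendations, distribution, and maintenance. Google calls its variant "Data Cards"; Hugging Face integrates the idea as Dataset Cards.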
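
    The seven areas listed above can be treated as fields of a simple record, which makes gaps easy to detect. The following is a minimal sketch, not an official schema: the `Datasheet` class, its field names, the `missing_sections` helper, and the sample text are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class Datasheet:
    """One free-text field per datasheet section (hypothetical schema)."""
    motivation: str = ""
    composition: str = ""
    collection_process: str = ""
    preprocessing: str = ""
    usage_recommendations: str = ""
    distribution: str = ""
    maintenance: str = ""


def missing_sections(sheet: Datasheet) -> list[str]:
    """Return the names of sections that are empty or whitespace-only."""
    return [f.name for f in fields(sheet) if not getattr(sheet, f.name).strip()]


# Illustrative, made-up content: only two of the seven sections are filled in.
sheet = Datasheet(
    motivation="Benchmark for sentiment classification of product reviews.",
    composition="50,000 English reviews, balanced across five star ratings.",
)
print(missing_sections(sheet))
```

    A check like this only confirms that each section contains some text; whether the text is accurate and useful still requires human review.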

    Marketing Relevance

    Datasheets are a foundation for responsible AI: without dataset documentation, bias audits, reproducibility, and compliance checks are hard to carry out reliably.

    Common Pitfalls

    Datasheets are often incomplete or outdated. There is no binding standard for their content. The effort required to write and maintain them is underestimated. Datasheets may exist but go unread.
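
    One way to catch outdated datasheets is to record a review date and flag documents that exceed an agreed age. The sketch below assumes such a date is tracked; the `is_stale` function and the 180-day threshold are hypothetical choices, not part of any datasheet standard.

```python
from datetime import date


def is_stale(last_reviewed: date, today: date, max_age_days: int = 180) -> bool:
    """Return True if the datasheet was last reviewed more than max_age_days ago."""
    return (today - last_reviewed).days > max_age_days


# Made-up dates: reviewed in January 2025, checked in February 2026.
print(is_stale(date(2025, 1, 15), today=date(2026, 2, 11)))  # True
```

    Passing `today` explicitly keeps the check deterministic and easy to test; in practice a caller would supply `date.today()`.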

    Origin & History

    Gebru et al. proposed Datasheets for Datasets in 2018. Google later introduced Data Cards, and Hugging Face standardized Dataset Cards. The EU AI Act requires comparable documentation for the training data of high-risk AI systems.

    Comparisons & Differences

    Datasheets for Datasets vs. Model Cards

    Model Cards document the model (architecture, performance, bias); Datasheets document the dataset (provenance, composition, limitations).

    Datasheets for Datasets vs. Data Governance

    Data Governance is the process; Datasheets are the documentation artifact within that process.
