
    Preference Data

    Also known as:
    Human Preferences
    Preference Pairs
    Comparison Data
    Ranked Responses
    Updated: 2/10/2026

    Datasets where humans (or AI judges) indicate which of two model responses is better – the training material for RLHF, DPO, and similar alignment methods.

    Quick Summary

    Preference Data = "response A is better than B" – the training material for RLHF and DPO. Data quality directly determines the alignment quality of the model.

    Explanation

    Preference data consists of triplets: (prompt, chosen response, rejected response). The quality and diversity of these triplets determine how well the alignment works.
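    To make the structure concrete, here is a minimal sketch of such triplets in Python; the field names (prompt, chosen, rejected) follow a common convention but are purely illustrative.

```python
# Minimal sketch of preference-data records in the common
# (prompt, chosen, rejected) layout; all content is illustrative.
preference_data = [
    {
        "prompt": "Explain RLHF in one sentence.",
        "chosen": (
            "RLHF fine-tunes a language model with a reward signal "
            "learned from human preference comparisons."
        ),
        "rejected": "RLHF is when the model trains itself.",
    },
    {
        "prompt": "Summarize the quarterly report.",
        "chosen": "Revenue grew 12% year over year, driven by ...",
        "rejected": "The report contains numbers about the quarter.",
    },
]

for record in preference_data:
    # Each triplet encodes a relative judgment: chosen > rejected for this
    # prompt, not an absolute quality score.
    assert record.keys() == {"prompt", "chosen", "rejected"}
```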

    Marketing Relevance

    Without high-quality preference data, there is no good alignment. Data quality determines whether a model actually becomes more helpful and safer, or merely sounds "smoother".

    Common Pitfalls

    Inter-annotator disagreement. Annotator bias. Preference hacking (the model learns to optimize for response length instead of quality). Expensive to create.
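    One cheap sanity check for length-based preference hacking is to measure how often the chosen response is simply the longer one. The sketch below is an illustration under that assumption, not part of any specific tooling; the sample records are made up.

```python
# Hedged sketch: detect length bias (one common driver of preference hacking)
# in (prompt, chosen, rejected) records.
def longer_chosen_rate(records):
    """Fraction of pairs where the chosen response is simply the longer one."""
    longer = sum(len(r["chosen"]) > len(r["rejected"]) for r in records)
    return longer / len(records)

records = [
    {"prompt": "p1", "chosen": "a detailed, much longer answer ...", "rejected": "short"},
    {"prompt": "p2", "chosen": "concise and correct", "rejected": "a rambling, much longer answer ..."},
]

# A rate far above 0.5 hints that annotators (or an AI judge) rewarded
# verbosity, which a reward model or DPO will then learn instead of quality.
print(f"chosen-is-longer rate: {longer_chosen_rate(records):.2f}")
```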

    Origin & History

    InstructGPT (2022) used roughly 40k preference comparisons. Anthropic's HH-RLHF became the standard open dataset. Open-source alternatives such as UltraFeedback and Nectar followed in 2023.

    Comparisons & Differences

    Preference Data vs. SFT Data (Instruction Data)

    SFT data shows what a good response looks like; preference data shows which of two responses is better – a relative comparison instead of an absolute target.
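    The difference shows up directly in the training objective. The sketch below contrasts a plain SFT loss (absolute likelihood of one good response) with a DPO-style loss (relative margin between chosen and rejected); the log-probability values and the beta setting are placeholders.

```python
import math

def sft_loss(logp_good):
    # SFT: maximize the absolute likelihood of the single "good" response.
    return -logp_good

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    # DPO: only the relative margin between chosen and rejected matters,
    # measured against a frozen reference model.
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# Placeholder log-probabilities for one preference pair.
print(sft_loss(logp_good=-12.0))
print(dpo_loss(logp_chosen=-12.0, logp_rejected=-15.0, ref_chosen=-13.0, ref_rejected=-14.0))
```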

    Preference Data vs. RLAIF Data

    Human preference data is expensive but authentic; RLAIF generates preferences automatically via an AI judge – scalable, but it inherits the judge model's biases.
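    As a rough sketch of how RLAIF-style labels can be produced, an AI judge is shown both responses and asked to pick the better one; `call_llm` and the judge prompt below are hypothetical placeholders, not a specific API.

```python
# Hedged sketch of RLAIF-style labeling with a hypothetical judge model.
JUDGE_PROMPT = """You are comparing two answers to the same question.

Question: {prompt}

Answer A: {response_a}
Answer B: {response_b}

Which answer is more helpful, honest, and harmless? Reply with "A" or "B" only."""

def judge_pair(call_llm, prompt, response_a, response_b):
    """Ask the judge model for a verdict and return a (prompt, chosen, rejected) triplet."""
    verdict = call_llm(JUDGE_PROMPT.format(
        prompt=prompt, response_a=response_a, response_b=response_b,
    )).strip()
    if verdict == "A":
        return {"prompt": prompt, "chosen": response_a, "rejected": response_b}
    return {"prompt": prompt, "chosen": response_b, "rejected": response_a}
```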

