Preference Data
Datasets where humans (or AI judges) indicate which of two model responses is better – the training material for RLHF, DPO, and similar alignment methods.
Preference Data = "response A is better than response B". The quality of these comparisons directly determines how well the model can be aligned.
Explanation
Preference data consists of triplets: (prompt, chosen response, rejected response). The quality and diversity of these pairs determine the quality of the resulting alignment.
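A minimal sketch of this triplet structure, assuming a simple Python representation; the field names (prompt, chosen, rejected) follow a common convention, and the example pair is invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class PreferencePair:
    prompt: str    # the user request both responses answer
    chosen: str    # the response the annotator preferred
    rejected: str  # the response the annotator rejected

# Hypothetical example pair, invented for illustration.
example = PreferencePair(
    prompt="Explain overfitting in one sentence.",
    chosen="Overfitting means a model memorizes its training data and fails to generalize.",
    rejected="Overfitting is when training goes over.",
)
```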
Marketing Relevance
Without high-quality preference data there is no good alignment: data quality determines whether a model actually becomes more helpful and safer, or merely sounds "smoother".
Common Pitfalls
Inter-annotator disagreement. Annotator bias. Preference hacking (the model learns to favor longer responses instead of better ones). Expensive to create.
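A simple diagnostic for the length variant of preference hacking is to measure how often the chosen response is just the longer one. The sketch below is a rough illustration with invented pairs; values near 100% would hint that a reward model could learn "longer = better" instead of real quality:

```python
def length_bias_rate(pairs: list[dict]) -> float:
    """Fraction of pairs where the chosen response is longer than the rejected one."""
    longer = sum(1 for p in pairs if len(p["chosen"]) > len(p["rejected"]))
    return longer / max(len(pairs), 1)

# Invented pairs for illustration only.
pairs = [
    {"chosen": "A long, detailed, helpful answer.", "rejected": "Short answer."},
    {"chosen": "Concise but correct.", "rejected": "A rambling answer that goes on and on."},
]
print(f"chosen-longer rate: {length_bias_rate(pairs):.0%}")
```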
Origin & History
InstructGPT (2022) used ~40k preference comparisons. Anthropic HH-RLHF became the open standard dataset. Open-source alternatives like UltraFeedback and Nectar followed in 2023.
Comparisons & Differences
Preference Data vs. SFT Data (Instruction Data)
SFT data shows a single good response; preference data shows which of two responses is better – a relative comparison instead of an absolute quality target.
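An illustrative contrast of the two record shapes (field names follow common convention; the content is invented):

```python
# SFT record: one prompt, one target response to imitate (absolute).
sft_example = {
    "prompt": "Summarize this paragraph.",
    "response": "A short, faithful summary.",
}

# Preference record: one prompt, two ranked responses (relative).
preference_example = {
    "prompt": "Summarize this paragraph.",
    "chosen": "A short, faithful summary.",
    "rejected": "A summary that adds invented details.",
}
```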
Preference Data vs. RLAIF Data
Human preference data is expensive but authentic; RLAIF generates preferences automatically via an AI judge – scalable, but it can inherit the judge model's biases.
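A hedged sketch of how RLAIF-style preference generation could look; the `judge` callable is a hypothetical stand-in for whatever LLM API plays the AI judge, and the prompt template is illustrative:

```python
from typing import Callable

JUDGE_TEMPLATE = (
    "Which response answers the prompt better?\n"
    "Prompt: {prompt}\n\nResponse A: {a}\n\nResponse B: {b}\n\n"
    "Answer with 'A' or 'B' only."
)

def ai_preference(prompt: str, response_a: str, response_b: str,
                  judge: Callable[[str], str]) -> dict:
    """Build one preference triplet by asking an AI judge instead of a human."""
    verdict = judge(JUDGE_TEMPLATE.format(prompt=prompt, a=response_a, b=response_b))
    if verdict.strip().upper().startswith("A"):
        chosen, rejected = response_a, response_b
    else:
        chosen, rejected = response_b, response_a
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}
```

The output uses the same (prompt, chosen, rejected) format as human preference data, which is what makes the two sources interchangeable for training; the bias risk comes from the judge, not from the format.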