Alignment Tax
The performance loss caused by alignment and safety training – a model becomes safer but potentially less capable.
In short: a safer model may be less creative or capable, a deliberate trade-off that better alignment methods aim to minimize.
Explanation
RLHF, content filters, and guardrails can limit a model's creativity and raw capability. The term "alignment tax" describes this trade-off between safety and performance.
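One way to make the trade-off concrete is to measure an aligned model against its base model on the same benchmarks. A minimal sketch, using purely hypothetical scores (the task names and numbers below are illustrative placeholders, not real measurements):

```python
# Sketch: quantifying an alignment tax as the per-benchmark score drop
# between a base model and its safety-tuned counterpart.
# All scores are hypothetical placeholders.

def alignment_tax(base_scores: dict[str, float],
                  aligned_scores: dict[str, float]) -> dict[str, float]:
    """Per-task performance loss (positive = capability lost to safety training)."""
    return {task: base_scores[task] - aligned_scores[task] for task in base_scores}

base = {"reasoning": 0.82, "coding": 0.75, "creative_writing": 0.88}     # hypothetical
aligned = {"reasoning": 0.81, "coding": 0.74, "creative_writing": 0.79}  # hypothetical

tax = alignment_tax(base, aligned)
print(tax)
```

In this toy example the largest drop falls on the creative task, mirroring the observation that safety training tends to cost creativity more than raw accuracy.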
Marketing Relevance
Companies must consciously weigh the alignment tax: how much capability are they willing to sacrifice for safety? Over-aligned models become too conservative and refuse even harmless requests.
Common Pitfalls
Using the alignment tax as an argument against safety work altogether. The tax is hard to quantify, and it shrinks as alignment techniques improve.
Origin & History
The term emerged in the AI safety community around 2022, when OpenAI and Anthropic showed that RLHF-trained models such as InstructGPT and Claude remain competitive with their base models. Newer methods like DPO and Constitutional AI further reduce the alignment tax.
Comparisons & Differences
Alignment Tax vs. Alignment
Alignment is the goal (the model does what is intended); the alignment tax is the price paid for it (performance loss).
Alignment Tax vs. Guardrails
Guardrails block or filter outputs after generation; the alignment tax arises from training that changes the model itself.