    Artificial Intelligence

    Alignment Tax

    Also known as:
    Safety Tax
    Alignment Cost
    Safety-Performance Tradeoff
    Updated: 2/10/2026

    The performance loss caused by alignment and safety training – a model becomes safer but potentially less capable.

    Quick Summary

    Alignment Tax = performance loss from safety training. Safer models may be less creative or capable – a conscious trade-off that better alignment methods minimize.

    Explanation

    RLHF, content filters, and guardrails can limit a model's creativity and capability. "Alignment tax" describes this trade-off between safety and performance.
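In its simplest form, the alignment tax can be expressed as the performance gap between a base model and its safety-trained counterpart on the same capability benchmark. A minimal sketch, using hypothetical illustrative scores (not real measurements):

```python
# Hypothetical benchmark scores for illustration only.
base_score = 0.82      # base model accuracy on a capability benchmark
aligned_score = 0.79   # same model after RLHF / safety fine-tuning

def alignment_tax(base: float, aligned: float) -> float:
    """Alignment tax as the absolute performance drop from safety training.

    A positive value means the aligned model scores lower than the base model;
    zero or negative means alignment cost nothing on this benchmark.
    """
    return base - aligned

tax = alignment_tax(base_score, aligned_score)
print(f"Alignment tax: {tax:.2%} of benchmark accuracy")
```

In practice the tax is measured across many benchmarks, and results vary: some aligned models match or exceed their base models on certain tasks, which is why the tax is hard to pin down as a single number.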

    Marketing Relevance

Companies must consciously weigh the alignment tax: how much capability are they willing to sacrifice for safety? Over-aggressive alignment makes models too conservative, for example refusing harmless requests.

    Common Pitfalls

Don't use the alignment tax as an argument against safety altogether. It is hard to quantify, and it shrinks as alignment techniques improve.

    Origin & History

The term emerged in the AI safety community around 2022. OpenAI's InstructGPT and Anthropic's Claude showed that models remain competitive with their base counterparts despite RLHF. Newer methods such as DPO and Constitutional AI further reduce the alignment tax.

    Comparisons & Differences

    Alignment Tax vs. Alignment

    Alignment is the goal (model does what's intended); Alignment Tax is the price for it (performance loss).

    Alignment Tax vs. Guardrails

    Guardrails block outputs after generation; Alignment Tax arises from training that changes the model itself.
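The distinction can be made concrete with a minimal sketch: a guardrail checks the finished output and blocks it if needed, while the model's weights stay untouched. The `generate` function and the blocklist here are hypothetical stand-ins, not a real API.

```python
# Hypothetical stand-ins for illustration.
BLOCKED_TOPICS = {"malware instructions", "weapon synthesis"}

def generate(prompt: str) -> str:
    # Stand-in for a model's raw, unfiltered output.
    return f"Response to: {prompt}"

def guardrail(prompt: str, output: str) -> str:
    # Post-hoc filter: inspects the finished output/prompt pair.
    # The model itself is unchanged -- no alignment tax on its capabilities.
    if prompt.lower() in BLOCKED_TOPICS:
        return "[blocked by guardrail]"
    return output

safe_output = guardrail("weather tomorrow", generate("weather tomorrow"))
print(safe_output)
```

Alignment training, by contrast, changes what the model generates in the first place, which is where the performance trade-off arises.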

