AI Safety
The research field focused on making AI systems safe, controllable, and aligned with human values.
AI Safety researches how AI stays safe, controllable, and value-aligned. Covers alignment, robustness, interpretability, and control – becomes more critical as AI capability increases.
Explanation
AI Safety encompasses: Alignment (models do what we want), robustness (behave correctly under stress), interpretability (understand what models do), control (can stop models). Becomes more important as AI capability increases.
Marketing Relevance
Marketing AI must be safe: No discriminatory outputs, no brand-damaging hallucinations, no manipulation. Safety features become selling point.
Example
OpenAI invests 20% of resources in safety research: Red-teaming, RLHF for value alignment, monitoring for dangerous use.
Common Pitfalls
Safety vs. capability trade-off. Overcensoring reduces usefulness. Safety theater without real protection. Race to bottom in competition.
Origin & History
Nick Bostrom's "Superintelligence" (2014) made AI Safety mainstream. OpenAI was founded in 2015 with safety mission. Anthropic (2021) and DeepMind have dedicated safety teams.
Comparisons & Differences
AI Safety vs. AI Ethics
AI Ethics asks "what is right/wrong?"; AI Safety asks "how do we prevent technical harm?" – philosophy vs. engineering.
AI Safety vs. Cybersecurity
Cybersecurity protects systems from external attackers; AI Safety protects from the AI system itself (misbehavior, misalignment).