Certified Defense
Defense methods against adversarial attacks that provide mathematically provable robustness guarantees.
A certified defense comes with a proof that, for any input, no adversarial perturbation within a defined radius ε can change the model's prediction — unlike empirical defenses, whose robustness holds only against the attacks that have been tried.
Explanation
Certified defenses use techniques such as randomized smoothing, abstract interpretation, or convex relaxation to prove that no perturbation within an ε-radius of an input can change the model's prediction.
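As a minimal sketch of the randomized-smoothing idea: the smoothed classifier returns whichever class the base model predicts most often when Gaussian noise is added to the input. The names here (base_classify, SIGMA) are illustrative, not from any real library, and the base model is a toy stand-in.

```python
import random
import statistics

SIGMA = 0.5      # noise level: larger sigma allows a larger certified radius
                 # but lowers clean accuracy
N_SAMPLES = 100  # Monte Carlo samples used to estimate the majority vote

def base_classify(x):
    # Toy stand-in for a trained base model:
    # classify by the sign of the feature sum.
    return 1 if sum(x) >= 0 else 0

def smoothed_classify(x, sigma=SIGMA, n=N_SAMPLES, seed=0):
    # The smoothed classifier: majority vote of the base model
    # over n noisy copies of the input.
    rng = random.Random(seed)
    votes = []
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes.append(base_classify(noisy))
    return statistics.mode(votes)
```

In practice the vote counts also feed a statistical test that bounds the top-class probability, which is what turns the majority vote into a formal certificate.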
Marketing Relevance
For safety-critical AI applications (fraud detection, content moderation), certified defenses provide formal security guarantees.
Example
An image classifier is certified so that no ℓ₂ perturbation of norm less than ε = 0.5 can flip its prediction from "safe" to "unsafe".
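The certified ℓ₂ radius in randomized smoothing (Cohen et al., 2019) is R = (σ/2)·(Φ⁻¹(p_A) − Φ⁻¹(p_B)), where p_A and p_B bound the top-two class probabilities under noise and Φ⁻¹ is the standard normal quantile function. The sketch below computes that radius; the specific numbers are illustrative, not from a real model.

```python
from statistics import NormalDist

def certified_radius(sigma, p_a, p_b):
    """Certified l2 radius R = (sigma/2) * (inv_cdf(p_a) - inv_cdf(p_b)).

    p_a: lower bound on the probability of the predicted class under noise.
    p_b: upper bound on the runner-up class probability.
    The prediction is only certifiable when p_a > p_b.
    """
    if p_a <= p_b:
        return 0.0
    phi_inv = NormalDist().inv_cdf  # standard normal quantile function
    return 0.5 * sigma * (phi_inv(p_a) - phi_inv(p_b))

# Illustrative call: sigma = 0.5, p_a = 0.9, p_b = 0.1
# gives a radius of about 0.64.
radius = certified_radius(0.5, 0.9, 0.1)
```

Note how the radius grows with both the noise level σ and the margin between p_A and p_B: a confident smoothed prediction under heavy noise certifies a larger ball.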
Common Pitfalls
Certified defenses are compute-intensive and scale poorly to large models, and certified radii are often small in practice. Guarantees also apply only to the specific perturbation type certified: an ℓ₂ certificate says nothing about ℓ∞ or semantic perturbations.
Origin & History
Wong & Kolter (2018) introduced convex relaxation-based certification; Cohen et al. (2019) established randomized smoothing as a scalable certified defense. By 2025 the field had expanded to LLM safety.
Comparisons & Differences
Certified Defense vs. Adversarial Training
Adversarial training provides empirical robustness that can be broken by stronger or unseen attacks; certified defenses provide formal, mathematical guarantees within the certified radius.