Model Extraction Attack
An attack in which an adversary reconstructs a functionally equivalent copy of a machine-learning model through systematic API queries.
Model extraction attacks clone ML models purely from their API responses, making them a growing intellectual-property risk for AI-as-a-Service providers.
Explanation
The attacker sends crafted inputs to the API and uses the returned outputs as labels to train a surrogate model. Score-based attacks exploit returned confidence scores, while decision-based attacks work from hard labels alone. Common countermeasures include rate limiting, output perturbation, and model watermarking.
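As a minimal sketch of the score-based case: if an API returns confidence scores for a logistic-regression model, the logit is linear in the input, so the hidden weights can be recovered by solving a linear system (the equation-solving attack described by Tramèr et al., 2016). The "victim API" below is a local stand-in for a remote endpoint; all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Victim: parameters hidden behind a query-only API ---
w_secret = rng.normal(size=5)                # unknown to the attacker

def victim_api(X):
    """Score-based API: returns the positive-class probability."""
    return 1.0 / (1.0 + np.exp(-X @ w_secret))

# --- Attacker: with confidence scores, logit(p) = x . w is linear,
# so a few queries suffice to solve for the secret weights ---
X_queries = rng.normal(size=(50, 5))         # crafted query inputs
p = victim_api(X_queries)
logits = np.log(p / (1 - p))                 # invert the sigmoid
w_stolen, *_ = np.linalg.lstsq(X_queries, logits, rcond=None)

print(np.allclose(w_stolen, w_secret, atol=1e-6))  # near-exact copy
```

With noise-free scores the recovery is essentially exact; real APIs that return rounded or truncated probabilities force the attacker into noisier learning-based approaches.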
Marketing Relevance
For API-based AI products such as chatbots and classifiers, model extraction is an intellectual-property risk: a competitor can replicate a model at a fraction of the original development cost.
Example
A competitor makes 100,000 API calls to your sentiment classifier and uses the responses to train a local model that agrees with yours on 95% of inputs, without ever collecting training data of their own.
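The "95% agreement" figure refers to how often the surrogate and the victim produce the same label on fresh inputs. A hedged numpy-only illustration of the decision-based variant, where the attacker sees only hard labels (all models and sizes here are made up for the sketch):

```python
import numpy as np

rng = np.random.default_rng(1)

# --- Victim: a linear classifier exposed as a label-only API ---
w_victim = rng.normal(size=10)

def victim_label(X):
    return (X @ w_victim > 0).astype(int)

# --- Attacker: 100,000 queries, keeping only the returned labels ---
X_q = rng.normal(size=(100_000, 10))
y_q = victim_label(X_q)

# Fit a linear surrogate by least squares on {-1, +1} targets
w_surrogate, *_ = np.linalg.lstsq(X_q, 2.0 * y_q - 1.0, rcond=None)

# --- Agreement rate: how often the two models match on fresh inputs ---
X_test = rng.normal(size=(10_000, 10))
agree = (victim_label(X_test) == (X_test @ w_surrogate > 0)).mean()
print(f"agreement: {agree:.1%}")
```

High agreement on held-out inputs is the standard success metric for extraction, since the attacker cares about functional equivalence rather than recovering exact parameters.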
Common Pitfalls
No defense offers complete protection while the API remains publicly queryable. Rate limiting alone is insufficient, since queries can be spread across accounts and time, and watermarks can often be removed by fine-tuning the stolen model.
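The two most common partial defenses can be sketched as an API wrapper: quantizing confidence scores (so exact logits cannot be recovered) and enforcing a per-client query budget. Class and parameter names here are illustrative, not a real library API.

```python
import numpy as np

def perturb_scores(probs, decimals=1):
    """Output perturbation: round probabilities so precise
    confidence values (and thus exact logits) are never exposed."""
    return np.round(probs, decimals)

class RateLimitedAPI:
    """Hypothetical wrapper enforcing a per-client query budget."""

    def __init__(self, model_fn, max_queries=10_000):
        self.model_fn = model_fn
        self.max_queries = max_queries
        self.count = 0

    def query(self, X):
        self.count += len(X)
        if self.count > self.max_queries:
            raise RuntimeError("query budget exceeded")
        return perturb_scores(self.model_fn(X))
```

Both measures only raise the attacker's cost: coarser scores degrade score-based attacks toward the decision-based setting, and budgets can be circumvented with multiple accounts, which is why they are combined with monitoring and watermarking in practice.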
Origin & History
Tramèr et al. (2016) demonstrated model extraction against BigML and Amazon ML. Orekondy et al. (2019) introduced Knockoff Nets for stealing image classifiers. Krishna et al. (2020) showed that BERT-based models can be extracted. The topic has gained urgency with the rise of commercial LLM APIs.
Comparisons & Differences
Model Extraction Attack vs. Membership Inference
Membership inference determines whether a specific data point was part of the training set; model extraction replicates the model's entire functionality.