SMOTE (Synthetic Minority Over-sampling Technique)
Algorithm that generates synthetic examples for the minority class by interpolating between existing data points.
SMOTE mitigates class imbalance by generating synthetic data points for underrepresented classes, interpolating between neighboring samples rather than duplicating existing ones. It is one of the most widely used oversampling techniques.
Explanation
SMOTE selects a minority-class data point, finds its k nearest neighbors within the same class, and generates a new synthetic point at a random position along the line segment connecting the point to one of those neighbors.
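The interpolation step above can be sketched in a few lines of NumPy. This is a minimal illustration of the core idea, not a production implementation (no handling of categorical features, ties, or sampling ratios); the function name `smote_sample` and its parameters are ours for illustration:

```python
import numpy as np

def smote_sample(X_min, k=5, n_new=10, seed=0):
    """Generate n_new synthetic minority points by interpolating between
    a randomly chosen minority point and one of its k nearest neighbors."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    # Pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbor
    # Indices of the k nearest same-class neighbors for each point
    nn = np.argsort(d, axis=1)[:, :k]
    out = []
    for _ in range(n_new):
        i = rng.integers(n)              # pick a minority point at random
        j = nn[i, rng.integers(k)]       # pick one of its k neighbors
        lam = rng.random()               # interpolation factor in [0, 1)
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

# Six minority points in the unit square
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                  [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]])
synthetic = smote_sample(X_min, k=3, n_new=4)
print(synthetic.shape)  # (4, 2)
```

Because each synthetic point is a convex combination of two existing minority points, it always lies on the segment between them, so the new samples never leave the convex hull of the minority class.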
Marketing Relevance
SMOTE is among the most widely used techniques for handling class imbalance, which is common in marketing data: conversions, churn events, or fraud cases are typically rare relative to the majority class. Ready-made implementations are available, for example in the imbalanced-learn Python package.
Common Pitfalls
Applying SMOTE before the train/test split leaks synthetic information derived from test samples into training, inflating evaluation metrics. SMOTE also works poorly in high-dimensional feature spaces and when classes overlap, because interpolation can place synthetic minority points inside majority-class regions and add noise.
Origin & History
Introduced in 2002 by Chawla, Bowyer, Hall, and Kegelmeyer. Variants such as Borderline-SMOTE, ADASYN, and SMOTE-ENN have since emerged.
Comparisons & Differences
SMOTE (Synthetic Minority Over-sampling Technique) vs. Random Oversampling
Random oversampling duplicates existing minority points exactly, which can encourage overfitting to those repeated samples; SMOTE creates new synthetic points between neighbors, avoiding exact duplicates.
SMOTE (Synthetic Minority Over-sampling Technique) vs. ADASYN
SMOTE samples uniformly across the minority class; ADASYN weights generation toward hard-to-classify regions near the decision boundary and generates more synthetic points there.