
    Class Imbalance

    Also known as:
    Imbalanced Data
    Skewed Classes
    Label Imbalance
    Klassenungleichgewicht (German)
    Updated: 2/10/2026

    A situation in which one class in the training dataset occurs significantly more frequently than the others.

    Quick Summary

    Class imbalance occurs when one class heavily dominates the dataset, so standard models tend to ignore the rare classes. Resampling (e.g. SMOTE), class weighting, and evaluating with F1 instead of accuracy help.

    Explanation

    Models trained on imbalanced data tend to default to predicting the majority class and ignore minority classes. Common countermeasures: resampling (over- or undersampling), class weighting in the loss, and synthetic oversampling such as SMOTE.
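The simplest of these countermeasures, random oversampling, can be sketched in plain Python. The helper `random_oversample` below is a hypothetical name, not from any library; it duplicates minority-class samples until every class matches the majority count:

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate minority-class samples until all classes match the majority count.
    (Illustrative sketch; SMOTE would instead interpolate synthetic samples.)"""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out

# Toy imbalanced dataset: 95 negatives, 5 positives
X = [[i] for i in range(100)]
y = [0] * 95 + [1] * 5
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # both classes now have 95 samples
```

In practice, libraries such as imbalanced-learn provide SMOTE and related samplers with the same basic interface (fit on features and labels, return a rebalanced set).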

    Marketing Relevance

    Class imbalance is the norm in real-world datasets: fraud detection, disease diagnosis, and churn prediction often have fewer than 1% positive cases.

    Common Pitfalls

    Accuracy is misleading under imbalance: a model that always predicts the majority class can still score very high. Oversampling before the train/test split causes data leakage, because duplicated (or synthetic) minority samples end up in the test set; always split first, then resample only the training portion.
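The accuracy pitfall is easy to demonstrate with a made-up dataset at a 1% positive rate. A baseline that always predicts the negative class reaches 99% accuracy while its F1 on the minority class is zero:

```python
# Toy ground truth with a 1% positive rate, and an "always negative" baseline.
y_true = [0] * 990 + [1] * 10
y_pred = [0] * 1000

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# F1 on the positive (minority) class, computed from scratch.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(accuracy)  # 0.99
print(f1)        # 0.0
```

This is why F1 (or precision/recall, ROC-AUC, PR-AUC) is preferred over accuracy whenever the classes are skewed.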

    Origin & History

    The problem was systematically studied in the early 2000s by Japkowicz & Stephen. SMOTE (Chawla et al., 2002) was a milestone. Modern approaches include focal loss (Lin et al., 2017) and cost-sensitive methods.

    Comparisons & Differences

    Class Imbalance vs. Data Augmentation

    Data augmentation expands all classes evenly through transformations. Class imbalance techniques specifically target the minority class.

    Class Imbalance vs. Cost-Sensitive Learning

    Resampling changes the data distribution itself; cost-sensitive learning instead modifies the loss function to penalize errors on the minority class more heavily.

