
    Expected Calibration Error (ECE)

    Also known as:
    ECE Metric
    Calibration Error
    Updated: 2/11/2026

    The standard metric for measuring classifier calibration quality: the weighted average of the absolute difference between confidence and accuracy across confidence bins.

    Quick Summary

    ECE measures how far a model's confidence values deviate from the accuracy it actually achieves; it is the standard metric for calibration quality.

    Explanation

    ECE divides predictions into confidence bins (e.g., 0-10%, 10-20%, ...), measures the gap between average confidence and actual accuracy within each bin, and averages those gaps weighted by the number of predictions per bin. A perfectly calibrated model has an ECE of 0.
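
    In symbols: with n predictions partitioned into M bins B_1, ..., B_M, the standard binned estimator (Naeini et al., 2015) is

    $$
    \mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n}\,\bigl|\mathrm{acc}(B_m) - \mathrm{conf}(B_m)\bigr|
    $$

    where acc(B_m) is the fraction of correct predictions in bin B_m and conf(B_m) is the average confidence there. A minimal NumPy sketch of this estimator follows; the function name and the confidences/correct inputs are illustrative, assuming confidences holds each prediction's top-class probability and correct a 0/1 indicator of whether the prediction was right.

    ```python
    import numpy as np

    def ece_equal_width(confidences, correct, n_bins=10):
        """ECE with equal-width confidence bins: average of the
        |accuracy - confidence| gaps, weighted by bin population."""
        confidences = np.asarray(confidences, dtype=float)
        correct = np.asarray(correct, dtype=float)
        n = len(confidences)
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (confidences > lo) & (confidences <= hi)  # right-inclusive bins
            if lo == 0.0:                                    # keep exact zeros in the first bin
                mask |= confidences == 0.0
            if mask.any():
                gap = abs(correct[mask].mean() - confidences[mask].mean())
                ece += (mask.sum() / n) * gap
        return ece
    ```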

    Marketing Relevance

    ECE is the standard check on model calibration wherever predicted probabilities drive decisions, from lead scoring to churn prediction.

    Example

    A model with ECE = 0.15 has confidences that are, on average across bins, 15 percentage points away from the accuracy it actually achieves; it likely needs recalibration.
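
    As a hypothetical two-bin illustration of the weighting: suppose 60% of predictions land in a bin with average confidence 0.90 but accuracy 0.70, and the remaining 40% in a bin with average confidence 0.60 and accuracy 0.525. Then

    $$
    \mathrm{ECE} = 0.6\,|0.70 - 0.90| + 0.4\,|0.525 - 0.60| = 0.12 + 0.03 = 0.15
    $$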

    Common Pitfalls

    ECE is sensitive to the binning scheme: changing the number of bins can noticeably change the score. Adaptive (equal-mass) ECE and KDE-based variants are more robust. ECE alone isn't enough; also inspect reliability diagrams. The sketch below illustrates the bin-count sensitivity.
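
    A common adaptive variant uses equal-mass bins (each holding the same number of predictions) rather than equal-width ones. The sketch below is illustrative, reusing the hypothetical ece_equal_width from the Explanation section, and prints how both scores move with the bin count even on synthetic, roughly calibrated data.

    ```python
    import numpy as np

    def ece_adaptive(confidences, correct, n_bins=10):
        """Adaptive ECE: equal-mass bins, each holding ~n/n_bins predictions."""
        confidences = np.asarray(confidences, dtype=float)
        correct = np.asarray(correct, dtype=float)
        n = len(confidences)
        order = np.argsort(confidences)              # sort predictions by confidence
        ece = 0.0
        for chunk in np.array_split(order, n_bins):  # split into equal-mass bins
            if len(chunk) == 0:
                continue
            gap = abs(correct[chunk].mean() - confidences[chunk].mean())
            ece += (len(chunk) / n) * gap
        return ece

    # Synthetic, roughly calibrated data: correct with probability = confidence.
    rng = np.random.default_rng(0)
    conf = rng.uniform(0.5, 1.0, 2000)
    corr = (rng.random(2000) < conf).astype(float)
    for m in (5, 10, 20, 50):
        print(m, round(ece_equal_width(conf, corr, m), 4),
                 round(ece_adaptive(conf, corr, m), 4))
    ```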

    Origin & History

    Naeini et al. (2015) formalized ECE and proposed binning-based calibration measures. Guo et al. (2017) used ECE to show that modern deep neural networks are systematically miscalibrated. Adaptive variants followed from 2019.

    Comparisons & Differences

    Expected Calibration Error (ECE) vs. Brier Score

    The Brier score measures overall probabilistic quality, combining calibration with discrimination; ECE measures calibration exclusively.
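
    For a binary problem with predicted probabilities p_i and outcomes y_i in {0, 1}, the Brier score is the mean squared error of the probabilities:

    $$
    \mathrm{BS} = \frac{1}{n} \sum_{i=1}^{n} (p_i - y_i)^2
    $$

    Murphy's decomposition splits it as BS = reliability - resolution + uncertainty, where the reliability term captures the calibration gap that ECE isolates; this is why a model can score well on Brier overall yet still be poorly calibrated.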

