
    Mish Activation Function

    Also known as:
    Mish Activation
    Mish Function
    Updated: 2/12/2026

    Mish = x · tanh(softplus(x)) – a smooth, self-regularizing activation function used in YOLOv4 and some CNNs.

    Quick Summary

    Mish = x · tanh(softplus(x)) – a smooth activation that outperformed ReLU in YOLOv4, but is too computationally expensive for LLMs.

    Explanation

    Mish combines softplus (log(1 + eˣ)) with tanh, yielding a function that is smooth, non-monotonic, unbounded above, and bounded below. Empirically, it often outperforms ReLU and Swish in CNNs, but it is more computationally expensive.
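
    A minimal sketch of Mish in NumPy (the function names and the evaluation range below are illustrative, not from the source); np.logaddexp(0, x) computes log(1 + eˣ) in a numerically stable way:

        import numpy as np

        def softplus(x):
            # log(1 + exp(x)), computed via logaddexp to avoid overflow for large x
            return np.logaddexp(0.0, x)

        def mish(x):
            # Mish(x) = x * tanh(softplus(x))
            return x * np.tanh(softplus(x))

        x = np.linspace(-5.0, 5.0, 11)
        print(np.round(mish(x), 4))
        # Negative inputs are bounded below (minimum ≈ -0.31); positive inputs grow roughly linearly.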

    Marketing Relevance

    Popular in the computer vision community, especially through adoption in YOLOv4/v5.

    Origin & History

    Diganta Misra (2019) introduced Mish. YOLOv4 (Bochkovskiy et al., 2020) adopted Mish as the default activation. In the LLM world, however, SiLU/SwiGLU prevailed.

    Comparisons & Differences

    Mish Activation Function vs. SiLU/Swish

    Swish = x·sigmoid(x); Mish = x·tanh(softplus(x)). Mish is smoother and slightly more expensive; results are often comparable.
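
    A short sketch (assumed NumPy code, not from the source) that evaluates both activations on the same inputs to show how close they typically are:

        import numpy as np

        def swish(x):
            # Swish / SiLU: x * sigmoid(x)
            return x / (1.0 + np.exp(-x))

        def mish(x):
            # Mish: x * tanh(softplus(x)), with softplus computed stably via logaddexp
            return x * np.tanh(np.logaddexp(0.0, x))

        x = np.linspace(-4.0, 4.0, 9)
        print(np.round(swish(x), 3))
        print(np.round(mish(x), 3))
        # The two curves differ mainly for moderately negative inputs; both approach 0 as x → -∞
        # and approach x as x → +∞.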

