Mish Activation Function
Mish = x · tanh(softplus(x)) – a smooth, self-regularizing activation function used in YOLOv4 and some CNNs; it often outperforms ReLU but is too computationally expensive to have caught on in LLMs.
Explanation
Mish combines softplus (log(1 + eˣ)) with tanh to produce a function that is unbounded above, bounded below, smooth, and non-monotonic. Empirically it often outperforms ReLU and Swish in CNNs, but is more computationally expensive.
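A minimal NumPy sketch of the formula above (the function name mish and the logaddexp-based softplus are illustrative choices, not taken from the original):

```python
import numpy as np

def mish(x: np.ndarray) -> np.ndarray:
    """Mish activation: x * tanh(softplus(x)).

    softplus(x) = log(1 + exp(x)) is computed via logaddexp(0, x)
    to stay numerically stable for large |x|.
    """
    return x * np.tanh(np.logaddexp(0.0, x))

# Sanity check: Mish is close to 0 for large negative inputs and close to x for large positive ones.
print(mish(np.array([-10.0, -1.0, 0.0, 1.0, 10.0])))
```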
Marketing Relevance
Popular in the computer vision community, especially through its adoption in YOLOv4.
Origin & History
Diganta Misra (2019) introduced Mish. YOLOv4 (Bochkovskiy et al., 2020) adopted Mish as the default activation. In the LLM world, however, SiLU/SwiGLU prevailed.
Comparisons & Differences
Mish Activation Function vs. SiLU/Swish
Swish/SiLU = x·sigmoid(x); Mish = x·tanh(softplus(x)). Mish is smoother and slightly more expensive to compute; in practice the results are often comparable.
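A short sketch comparing the two curves (the helper names swish and mish are illustrative; mish reuses the logaddexp-based softplus from the sketch above):

```python
import numpy as np

def swish(x: np.ndarray) -> np.ndarray:
    """Swish / SiLU: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def mish(x: np.ndarray) -> np.ndarray:
    """Mish: x * tanh(softplus(x))."""
    return x * np.tanh(np.logaddexp(0.0, x))

x = np.linspace(-4.0, 4.0, 9)
# The two curves track each other closely; both dip slightly below zero for moderate negative inputs.
print(np.round(swish(x), 3))
print(np.round(mish(x), 3))
```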