Meet Mish — New State of the Art AI Activation Function. The successor to ReLU?

Less Wright
6 min read · Aug 27, 2019

A new paper by Diganta Misra, “Mish: A Self Regularized Non-Monotonic Neural Activation Function”, introduces the AI world to a new deep learning activation function that improves final accuracy over both Swish (+0.494%) and ReLU (+1.671%).
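For readers who want to see the function itself: Mish is defined as x * tanh(softplus(x)), where softplus(x) = ln(1 + e^x). Below is a minimal pure-Python sketch of that formula next to ReLU. The helper names (`softplus`, `mish`, `relu`) and the sample inputs are my own choices for illustration, not code from the paper.

```python
import math


def softplus(x: float) -> float:
    """Softplus, ln(1 + e^x), written to avoid overflow for large |x|."""
    # max(x, 0) + log1p(exp(-|x|)) is algebraically equal to ln(1 + e^x).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def relu(x: float) -> float:
    """ReLU activation: max(x, 0)."""
    return max(x, 0.0)


if __name__ == "__main__":
    # Sample inputs chosen only for illustration.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  mish={mish(x):+.4f}")
```

Unlike ReLU, Mish lets small negative values through instead of clipping them to zero, and it approaches the identity for large positive inputs. In PyTorch the same formula can be written as `x * torch.tanh(torch.nn.functional.softplus(x))`.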

Our small FastAI team used Mish in place of ReLU as part of our efforts to beat the previous accuracy scores on the FastAI global leaderboard. Combining the Ranger optimizer, Mish activation, a flat + cosine anneal learning rate schedule, and a self-attention layer, we were able to capture 12 new leaderboard records!

6 of our 12 leaderboard records. Every record was set using Mish instead of ReLU. (The blue highlight shows that the 400-epoch accuracy of 94.6 is just a tad higher than our 20-epoch accuracy of 93.8. :)
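The "flat + cosine anneal" part of that recipe means holding the learning rate constant for the first portion of training and then decaying it along a cosine curve. Here is a rough sketch of the idea. The function name `flat_cosine_lr` and the default values (`base_lr`, `flat_frac`, `final_lr`) are hypothetical and chosen for illustration; they are not the exact settings we used for the leaderboard runs.

```python
import math


def flat_cosine_lr(step: int, total_steps: int, base_lr: float = 1e-3,
                   flat_frac: float = 0.7, final_lr: float = 0.0) -> float:
    """Learning rate at `step`: flat at base_lr, then cosine-annealed to final_lr.

    All default values here are illustrative assumptions.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    flat_steps = round(total_steps * flat_frac)
    if step < flat_steps:
        return base_lr  # flat phase: no decay yet
    anneal_steps = max(total_steps - flat_steps, 1)
    # progress runs from 0.0 at the start of annealing to 1.0 at the end.
    progress = min((step - flat_steps) / anneal_steps, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return final_lr + (base_lr - final_lr) * cosine


if __name__ == "__main__":
    total = 100
    for step in (0, 50, 70, 85, 100):
        print(step, f"{flat_cosine_lr(step, total):.6f}")
```

The intuition is that the flat phase lets the optimizer make steady progress at full speed, while the cosine tail gives it a gentle landing.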

From our own testing, using 5-epoch runs on the ImageWoof dataset, we can say that:

Mish beats ReLU at a high significance level (P < 0.0001). (FastAI forums, @Seb)

Mish has been tested on over 70 benchmarks, spanning image classification, segmentation, and generation, and compared against 15 other activation functions.

From the paper: a comparison of the output landscapes of ReLU and Mish. The smooth gradients from Mish are a likely driver of its outperformance.
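One simple way to get a feel for the smoothness claim is to look at the slopes of the two scalar functions themselves. This is not the paper's output-landscape experiment, just a small sketch using a central-difference estimate; the helper `slope` and the probe points are my own illustrative choices.

```python
import math


def mish(x: float) -> float:
    """Mish activation: x * tanh(ln(1 + e^x)), with an overflow-safe softplus."""
    sp = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return x * math.tanh(sp)


def relu(x: float) -> float:
    """ReLU activation: max(x, 0)."""
    return max(x, 0.0)


def slope(f, x: float, h: float = 1e-5) -> float:
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


if __name__ == "__main__":
    # Probe points on both sides of zero, chosen for illustration.
    for x in (-2.0, -0.1, -0.01, 0.01, 0.1, 2.0):
        print(f"x={x:+.2f}  relu'={slope(relu, x):.3f}  mish'={slope(mish, x):+.3f}")
```

ReLU's slope jumps abruptly from 0 to 1 as x crosses zero, while Mish's slope changes gradually through the same region. Mish's slope is also slightly negative for some negative inputs, which is the "non-monotonic" part of the paper's title.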

