Meet AdaMod: a new deep learning optimizer with memory

AdaMod is a new deep learning optimizer that builds on Adam by adding an automatic warmup heuristic and long-term learning rate buffering. From initial testing, AdaMod is a top-5 optimizer: it readily matches or exceeds vanilla Adam, is far less sensitive to the learning rate hyperparameter, produces smoother training curves, and requires no explicit warmup phase.

Figure: AdaMod converges to the same point even with learning rates that differ by up to two orders of magnitude, whereas SGDM and Adam end up at different results.
Figure: The blue line is Adam with no warmup, with poor results.
Figure: Very large learning rate steps with no warmup vs. much lower, controlled steps with warmup.
Figure: AdaMod outperforming Adam with warmup.
Figure: DenseNet training. AdaMod outperforms Adam, but SGDM converges (much later) at better accuracy.
Figure: The core AdaMod algorithm (note line 9, which maintains the long-term average of the step sizes, and line 10, which clips the current step size against that average if needed).
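To make the clipped-step idea concrete, here is a minimal sketch of a single AdaMod update in PyTorch-style Python, based on the algorithm described above: compute the usual Adam moment estimates, keep an exponential moving average of the per-parameter step size controlled by beta3, and bound the current step by that memory. The function name `adamod_step` and the `state` buffer names are illustrative, not the reference implementation.

```python
import torch

def adamod_step(param, grad, state, lr=1e-3, betas=(0.9, 0.999),
                beta3=0.999, eps=1e-8):
    """One illustrative AdaMod update for a single parameter tensor.

    `state` holds the running buffers: exp_avg (first moment),
    exp_avg_sq (second moment), exp_avg_lr (long-term average of the
    adaptive step size), and step (iteration counter).
    """
    beta1, beta2 = betas
    state['step'] += 1
    t = state['step']

    # Standard Adam moment estimates.
    state['exp_avg'].mul_(beta1).add_(grad, alpha=1 - beta1)
    state['exp_avg_sq'].mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

    bias_correction1 = 1 - beta1 ** t
    bias_correction2 = 1 - beta2 ** t
    denom = (state['exp_avg_sq'].sqrt() / (bias_correction2 ** 0.5)).add_(eps)

    # Adam's per-parameter step size for this iteration.
    step_size = torch.full_like(denom, lr / bias_correction1).div_(denom)

    # Long-term memory of step sizes ("line 9"): an exponential
    # moving average controlled by beta3.
    state['exp_avg_lr'].mul_(beta3).add_(step_size, alpha=1 - beta3)

    # Clip the current step size by that memory ("line 10"), which
    # damps the outsized early steps that normally require warmup.
    step_size = torch.min(step_size, state['exp_avg_lr'])

    param.add_(-step_size * state['exp_avg'])
    return param
```

The clipping is what replaces warmup: early in training the memory `exp_avg_lr` is still small, so any abnormally large adaptive step gets capped automatically.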
Figure: Results can vary based on the choice of beta3.
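Since beta3 matters, it is worth sweeping a couple of values. The snippet below is a sketch of typical usage, assuming the authors' PyTorch package exposes an `AdaMod` class with a `beta3` keyword (check their repo for the exact signature); the toy model and data are placeholders.

```python
import torch
from adamod import AdaMod  # assumed package/class name from the authors' repo

model = torch.nn.Linear(128, 10)

# beta3 controls how long the step-size memory is: values closer to 1.0
# give stronger smoothing. 0.999 and 0.9999 are reasonable starting points.
optimizer = AdaMod(model.parameters(), lr=1e-3, beta3=0.999)

inputs = torch.randn(32, 128)
targets = torch.randint(0, 10, (32,))

optimizer.zero_grad()
loss = torch.nn.functional.cross_entropy(model(inputs), targets)
loss.backward()
optimizer.step()
```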

