New Deep Learning Optimizer, Ranger: Synergistic combination of RAdam + LookAhead for the best of both.

Less Wright
Aug 20, 2019

The Ranger optimizer combines two very new developments (RAdam + Lookahead) into a single optimizer for deep learning. As proof of its efficacy, our team recently used the Ranger optimizer to capture 12 leaderboard records on the FastAI global leaderboards (details here).

Lookahead, one half of the Ranger optimizer, was introduced in a recent paper co-authored by the famed deep learning researcher Geoffrey Hinton (“LookAhead Optimizer: k steps forward, 1 step back,” July 2019). Lookahead was inspired by recent advances in the understanding of neural network loss surfaces, and presents a whole new way of stabilizing deep learning training and speeding up convergence. Building on the breakthrough in variance management achieved by RAdam (Rectified Adam), I find that combining RAdam and LookAhead (Ranger) produces a dynamic dream team and an even better optimizer than RAdam alone.

The Ranger optimizer is implemented as a single codebase for ease of use and efficiency (one load/save and one loop handling all parameter updates), integrates into FastAI, and the source code is available for your immediate use: https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer
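For reference, here is a minimal usage sketch, assuming the repo exposes a Ranger class with the standard torch.optim.Optimizer interface (the import path and the learning rate below are assumptions; check the repo’s README for the actual API):

```python
# Minimal usage sketch -- import path and defaults are assumptions; see the repo README.
import torch
import torch.nn as nn
from ranger import Ranger  # from lessw2020/Ranger-Deep-Learning-Optimizer

model = nn.Linear(10, 2)

# Ranger follows the standard torch.optim.Optimizer interface,
# so it can replace Adam/SGD with no other code changes.
optimizer = Ranger(model.parameters(), lr=1e-3)

for x, y in [(torch.randn(8, 10), torch.randint(0, 2, (8,)))]:
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
```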

Vanilla Adam, SGD, and LookAhead + Adam/SGD compared for an LSTM (from the LookAhead paper).

Why RAdam and LookAhead are complementary:

RAdam arguably provides the best base for an optimizer to build on at the start of training. RAdam leverages a dynamic rectifier to adjust Adam’s adaptive momentum based on the variance, effectively providing an automated warm-up, custom-tailored to the current dataset, that ensures a solid start to training.
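As a concrete illustration of that automated warm-up, here is a small sketch of the rectification term as described in the RAdam paper (my own simplified paraphrase, not Ranger’s actual code; the threshold and hyperparameter names follow the paper):

```python
import math

def radam_rectifier(step, beta2=0.999):
    """Simplified sketch of RAdam's rectification term.

    Returns None while the variance of the adaptive learning rate is
    still intractable (early steps fall back to an SGD-style update),
    and a scaling factor r_t once enough gradients have been seen.
    """
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    rho_t = rho_inf - 2.0 * step * beta2 ** step / (1.0 - beta2 ** step)
    if rho_t <= 4.0:  # variance not yet tractable: skip adaptive scaling
        return None
    r_t = math.sqrt(((rho_t - 4) * (rho_t - 2) * rho_inf) /
                    ((rho_inf - 4) * (rho_inf - 2) * rho_t))
    return r_t  # grows toward 1.0 as training proceeds

# Early steps get no adaptive scaling; later steps ramp the rectifier toward 1.
for t in (1, 5, 10, 100, 1000):
    print(t, radam_rectifier(t))
```

Early in training the rectifier is undefined, so RAdam falls back to an un-adapted, momentum-style step; as more gradients accumulate, r_t ramps toward 1 and the full adaptive update takes over. The net effect is a warm-up schedule that tunes itself to the data.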

LookAhead was inspired by recent advances in the understanding of loss surfaces of deep neural networks, and provides a breakthrough in robust and stable exploration during the entirety of training.

To quote the LookAhead team — LookAhead “lessens the need for extensive hyperparameter tuning” while achieving “faster convergence across different deep learning tasks with minimal computational overhead”.
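To make the mechanics concrete, here is a rough sketch of the Lookahead update rule as described in the paper (the wrapper class below is illustrative, not Ranger’s actual implementation): an inner “fast” optimizer takes k ordinary steps, then a set of “slow” weights is pulled a fraction alpha of the way toward the fast weights, and the fast weights are reset to the slow weights.

```python
import torch

class LookaheadSketch:
    """Illustrative Lookahead wrapper: slow weights chase the fast weights.

    Every k inner steps: slow <- slow + alpha * (fast - slow), then the
    fast weights are reset to the slow weights. Wraps any torch.optim optimizer.
    """
    def __init__(self, base_optimizer, k=6, alpha=0.5):
        self.base, self.k, self.alpha = base_optimizer, k, alpha
        self.step_count = 0
        # Snapshot the initial ("slow") weights for every parameter.
        self.slow = [[p.detach().clone() for p in group["params"]]
                     for group in base_optimizer.param_groups]

    def step(self):
        self.base.step()                   # one "fast" step (RAdam, in Ranger's case)
        self.step_count += 1
        if self.step_count % self.k == 0:  # every k steps: synchronize
            for group, slow_params in zip(self.base.param_groups, self.slow):
                for p, slow in zip(group["params"], slow_params):
                    slow += self.alpha * (p.detach() - slow)  # slow catches up
                    p.data.copy_(slow)                        # fast resets to slow

    def zero_grad(self):
        self.base.zero_grad()

# Ranger is this idea with RAdam as the inner optimizer, e.g. (hypothetical):
# opt = LookaheadSketch(RAdam(model.parameters()), k=6, alpha=0.5)
```

Pulling back toward the slow weights every k steps is what gives Lookahead the robust, stable exploration described above, with only minimal extra computation and memory.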

Hence, both provide breakthroughs in different aspects of deep learning optimization, and the combination is highly synergistic, possibly providing the best of both improvements for…
