Meet ALBERT: a new ‘Lite BERT’ from Google & Toyota with state-of-the-art NLP performance and 18x fewer parameters.
TL;DR = your previous NLP models are parameter-inefficient and kind of obsolete. Have a great day.
[*Updated November 6 with ALBERT 2.0 and the official source code release]
Google Research and Toyota Technological Institute jointly released a new paper that introduces the world to what is arguably BERT’s successor, a much smaller/smarter Lite BERT called ALBERT (“ALBERT: A Lite BERT for Self-supervised Learning of Language Representations”).
ALBERT’s results are impressive in their own right, setting a new state of the art on GLUE, RACE, and SQuAD, but the real surprise is the dramatic reduction in model/parameter size.
A combination of two key architecture changes and a training change allows ALBERT to both outperform its predecessor and dramatically shrink the model. Consider the size comparison: BERT x-large has 1.27 billion parameters, versus ALBERT x-large with just 59 million!
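For a quick sanity check on that number, here’s a minimal sketch (my own, not from the paper or the official repo) that loads an ALBERT checkpoint and counts its parameters. It assumes the Hugging Face transformers library and PyTorch are installed, and the “albert-xlarge-v2” checkpoint ID is my assumption; the 1.27B BERT x-large configuration from the paper was never publicly released, so it isn’t loaded here.

```python
# Minimal sketch: counting ALBERT's parameters with Hugging Face transformers.
# Assumes `transformers` and `torch` are installed; the "albert-xlarge-v2"
# checkpoint ID is an assumption, chosen to match the x-large comparison above.
from transformers import AlbertModel

model = AlbertModel.from_pretrained("albert-xlarge-v2")
total = sum(p.numel() for p in model.parameters())
print(f"ALBERT x-large: {total / 1e6:.0f}M parameters")  # roughly 59M
```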
There’s a lot to unpack in this paper, and I’ll attempt to delve into all the highlights…