Pretrained language model with 100B parameters created by Yandex
About Yandex YaLM
YaLM 100B is a GPT-like neural network for generating and processing text. It can be used freely by developers and researchers from all over the world.
The model has 100 billion parameters. Training took 65 days on a cluster of 800 A100 graphics cards, using 1.7 TB of online texts, books, and countless other sources in both English and Russian.
Training details and best practices for acceleration and stabilization can be found in the Medium (English) and Habr (Russian) articles.
Yandex used DeepSpeed to train the model and drew inspiration from the Megatron-LM example. However, the code in this repository is not the code that was used to train the model; rather, it is the stock example from the DeepSpeed repository with the minimal changes needed to run inference with the model.