Megatron NLG

The Largest and Most Powerful Monolithic Transformer Language NLP Model Triple the Size of OpenAI’s GPT-3

About Megatron NLG

Microsoft and NVIDIA present the Megatron-Turing Natural Language Generation model (MT-NLG), powered by DeepSpeed and Megatron, the largest and robust monolithic transformer language model trained with 530 billion parameters. MT-NLG is the successor to Turing NLG 17B and Megatron-LM. The scale of this model is three times that of the largest of its kind. It can do natural language tasks with high accuracy, including prediction, reading comprehension, common sense reasoning, natural language reasoning, and word meaning disambiguation.

The model is trained on the Selene supercomputer, built on NvidiaDGX SuperPOD, and includes mixed-precision training. There are 560 DGX A100 servers on the supercomputer. HDR InfiniBand with full-fat tree extension is used to connect these servers. Each DGX A100 includes eight A100s, each with an 80GB Tensor Core GPU connected via NVLink and NVSwitch.

Source: https://www.microsoft.com/en-us/research/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/