A Family of Open, Compute-efficient, Large Language Models
Cerebras open-sourced seven GPT-3-style models ranging from 111 million to 13 billion parameters. Trained using the Chinchilla formula, these models set new benchmarks for accuracy and compute efficiency.
Artificial intelligence has the potential to transform the world economy, but access to it is increasingly gated. The latest large language model – OpenAI's GPT-4 – was released with no information on its model architecture, training data, training hardware, or hyperparameters. Companies are increasingly building large models using closed datasets and offering model outputs only via API access.
For LLMs to be an open and accessible technology, we believe it's important to have access to state-of-the-art models that are open, reproducible, and royalty-free for both research and commercial applications. To that end, Cerebras trained Cerebras-GPT, a family of transformer models built with the latest techniques and open datasets. These models are the first family of GPT models trained using the Chinchilla formula and released under the Apache 2.0 license.
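To make the "Chinchilla formula" concrete: the Chinchilla scaling work (Hoffmann et al., 2022) found that a compute-optimal model should be trained on roughly 20 tokens per parameter. A minimal sketch of that rule of thumb, with illustrative model sizes:

```python
def chinchilla_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal training-token budget: ~20 tokens/parameter."""
    return tokens_per_param * n_params

# Illustrative sizes from the Cerebras-GPT family; the printed token
# budgets are the Chinchilla rule of thumb, not exact training configs.
for n_params in [111e6, 1.3e9, 13e9]:
    print(f"{n_params:.3g} params -> ~{chinchilla_tokens(n_params):.3g} tokens")
```

Under this rule a 13B-parameter model calls for roughly 260 billion training tokens, far more than the fixed ~300B tokens GPT-3 used across all its model sizes.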
Cerebras trained the GPT-3 architecture using the compute-optimal token budget implied by Chinchilla and the hyperparameter scaling rules given by μ-parameterization (μP). The result outperforms existing GPT-3 clones by a wide margin and represents the first confirmed use of μ-parameterization "in the wild". Because these models are trained from scratch, the community no longer depends on LLaMA and its restrictive license.