EleutherAI launches GPT-NeoX-20B, the biggest public-access language model
The world of AI is buzzing as EleutherAI launches its latest large language model (LLM), GPT-NeoX-20B, with 20 billion parameters. Trained on CoreWeave GPUs using the GPT-NeoX framework, the model enters a field of heavy hitters such as Microsoft-NVIDIA’s Megatron-Turing Natural Language Generation model (MT-NLG), trained with 530 billion parameters, OpenAI’s GPT-3 with 175 billion parameters, and Google’s Switch Transformer technique for training models with over a trillion parameters. EleutherAI claims that GPT-NeoX-20B is the largest language model available for public access and that it is capable of performing an array of tasks.
With the release of their 20 billion parameter model, EleutherAI aims to make models of this size accessible to everyone and to aid research into the safe use of AI systems, encouraging anyone working in this area to reach out to them. Now let us go into the details of GPT-NeoX-20B and how it works.
GPT-NeoX-20B, the new kid on the block
GPT-NeoX-20B is an autoregressive transformer decoder model designed along the lines of GPT-3. Given below is a table of the model’s specifications, where Params refers to the total parameter count and Non-embedding refers to the parameter count excluding the embedding matrices, the figure commonly used in scaling-laws research.
Figure: A basic specification table for GPT-NeoX-20B by EleutherAI
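To make the distinction between the two counts concrete, the sketch below computes a rough split between embedding and non-embedding parameters for a decoder-only transformer. The formula and the example values (layer count, hidden size, vocabulary size) are generic assumptions for illustration, not the exact figures from EleutherAI’s specification table.

```python
# Rough, illustrative parameter accounting for a decoder-only transformer.
# The values below are assumptions for demonstration, not EleutherAI's
# official specification table.
def approx_param_split(n_layers, d_model, vocab_size):
    # Token embedding matrix (an untied output projection would roughly double this).
    embedding = vocab_size * d_model
    # Each transformer block: ~4*d_model^2 for attention (Q, K, V, output projections)
    # plus ~8*d_model^2 for a 4x-wide MLP, ignoring biases and layer norms.
    per_layer = 12 * d_model * d_model
    non_embedding = n_layers * per_layer
    return embedding, non_embedding

emb, non_emb = approx_param_split(n_layers=44, d_model=6144, vocab_size=50000)
print(f"embedding ~ {emb / 1e9:.2f}B, non-embedding ~ {non_emb / 1e9:.2f}B")
# Scaling-laws analyses typically quote the non-embedding count.
```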
In this architecture, the model employs rotary embeddings, a form of static relative positional embedding. In short, they twist the embedding space so that the attention of a token at position m to a token at position n is linearly dependent on m - n. Formally, they modify the standard multi-headed attention equations as follows:
The usual attention score between the tokens at positions m and n,

$$ q_m^\top k_n = x_m^\top W_q^\top W_k x_n, $$

where $x_m$, $x_n$ are the embeddings of the tokens at positions m and n respectively and $W_q$, $W_k$ are the query and key weight matrices respectively, becomes

$$ q_m^\top k_n = x_m^\top W_q^\top R^d_{\Theta, n-m} W_k x_n. $$

Here $R^d_{\Theta, x}$ is a $d \times d$ block-diagonal matrix whose i-th $2 \times 2$ block rotates a pair of dimensions by the angle $x\theta_i$, with hyperparameters $\Theta = \{\theta_i = 10000^{-2i/d}\}$. The equations above are visually represented in the figure below:
Figure: Pictorial representation of rotary embeddings from EleutherAI
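To make the rotation concrete, here is a minimal sketch of rotary embeddings in PyTorch. It follows the standard RoPE formulation described above; the function name, tensor shapes, and the interleaved pairing of dimensions are illustrative choices, not EleutherAI’s exact implementation.

```python
import torch

def apply_rotary(x, base=10000):
    # x: (seq_len, n_heads, head_dim) queries or keys; head_dim must be even.
    seq_len, _, head_dim = x.shape
    # Frequencies theta_i = base^(-2i / head_dim), one per pair of dimensions.
    theta = base ** (-torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim)
    # Rotation angle for position m and frequency i is m * theta_i.
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), theta)
    cos, sin = angles.cos()[:, None, :], angles.sin()[:, None, :]  # broadcast over heads
    x1, x2 = x[..., 0::2], x[..., 1::2]  # pair up adjacent dimensions
    # Apply a 2-D rotation to each pair; this realises the block-diagonal matrix R.
    rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return rotated.flatten(-2)

# Rotating queries and keys by their own positions makes the attention score
# between position m and position n depend only on the offset m - n.
q = apply_rotary(torch.randn(128, 8, 64))
k = apply_rotary(torch.randn(128, 8, 64))
scores = torch.einsum("mhd,nhd->hmn", q, k)
```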
Training – The model is trained on a custom codebase built on Megatron and DeepSpeed to facilitate straightforward training of LLMs with tens of billions of parameters. Training uses the official PyTorch v1.10.0 release binary package compiled with CUDA 11.1.
The significance of GPT-NeoX-20B
GPT-NeoX-20B ushers in an era of explosive development, as EleutherAI has made it publicly accessible for free. One of the biggest challenges with its predecessors was their restricted access and high training costs. With GPT-NeoX-20B, EleutherAI has overcome these hurdles and brought the benefits of a well-balanced large language model to all.
Moreover, the GPT-NeoX codebase offers simple and robust configuration through YAML files, letting users launch training runs across a wide variety of GPUs with a single line of bash, as sketched below.
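For a sense of what that looks like in practice, below is an abbreviated configuration in the style of the YAML files shipped with the GPT-NeoX repository, together with the kind of one-line launch the article refers to. The specific keys, values, and file names are assumptions for illustration; the repository’s own configs are the authoritative reference.

```yaml
# Illustrative GPT-NeoX-style YAML config; keys and values are examples,
# not the repository's official 20B.yml.
num_layers: 44
hidden_size: 6144
num_attention_heads: 64
seq_length: 2048
pos_emb: rotary          # rotary positional embeddings, as described above
model_parallel_size: 2   # example parallelism settings
pipe_parallel_size: 4

# A single line of bash then launches the run across all configured GPUs,
# along the lines of (hypothetical invocation):
#   python ./deepy.py train.py -d configs my_model.yml
```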
Trained on a cluster of 96 state-of-the-art NVIDIA A100 Tensor Core GPUs for distributed training, GPT-NeoX-20B performs quite well in comparison to its counterparts that are available for public access.
Figure: Accuracy task table on standard language models by substack.com
GPT-NeoX-20B’s performance on standard accuracy tasks reflects its custom tokenisation and its training on the Pile, a curated 825 GB dataset.
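For readers who want to examine the tokeniser themselves, one convenient route (not mentioned in the article) is the copy hosted on the Hugging Face hub; the snippet below is a small sketch assuming that hosted checkpoint and the transformers library are available.

```python
# Minimal sketch: inspect GPT-NeoX-20B's tokeniser via the Hugging Face hub
# (assumes the transformers library and the hosted EleutherAI/gpt-neox-20b repo).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
text = "def hello():\n    print('hello world')"
ids = tok(text).input_ids
print(len(ids), tok.convert_ids_to_tokens(ids))
# The tokeniser was trained on the Pile and, unlike GPT-2's, allocates tokens
# to runs of whitespace, which tends to help with code and structured text.
```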
The language model also holds its own when subjected to tests of factual knowledge across various subject groups.
Figure: Subject group comparison table by substack.com
The future of NLP models
The release of GPT-NeoX-20B marks the emergence of a new generation of language models that demonstrate what powerful AI models could look like. To understand the safety of such rapidly evolving models, EleutherAI strives to remove the conventional barriers to access and accelerate this line of research. Connor Leahy, co-founder of EleutherAI, states: “From spam and astroturfing to chatbot addiction, there are clear harms that can manifest from the use of these models already today, and we expect the alignment of future models to be of critical importance. We think the acceleration of safety research is extremely important, and the benefits of having an open-source model of this size and quality available for that research outweigh the risks.”
EleutherAI is also planning to open up a channel called #20b on Discord for discussions on this model.