New research from DeepMind investigates the optimal model size and number of training tokens for a transformer language model under a given compute budget.
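
As a rough formalization of the question (using the notation common in the scaling-laws literature, which is an assumption here rather than something stated above): with $N$ the number of parameters, $D$ the number of training tokens, $L(N, D)$ the final pre-training loss, and $C$ the compute budget in FLOPs, the compute-optimal choice can be written as

$$
N_{\text{opt}}(C),\; D_{\text{opt}}(C) \;=\; \underset{N,\,D \,:\, \mathrm{FLOPs}(N, D) = C}{\arg\min}\; L(N, D),
\qquad \mathrm{FLOPs}(N, D) \approx 6\,N\,D .
$$

The $6ND$ term is the standard back-of-the-envelope estimate of training FLOPs for a dense transformer (forward plus backward pass), so fixing $C$ couples $N$ and $D$, and the question becomes how to split the budget between a larger model and more training data.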