New research from DeepMind investigates the optimal model size and number of training tokens for a transformer language model under a given compute budget.
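
As a rough formalization of the question (using the notation common in the scaling-laws literature, which is an assumption here rather than something stated above): with $N$ the number of parameters, $D$ the number of training tokens, $L(N, D)$ the final pre-training loss, and $C$ the compute budget in FLOPs, the compute-optimal choice can be written as

$$
N_{\text{opt}}(C),\; D_{\text{opt}}(C) \;=\; \underset{N,\,D \,:\, \mathrm{FLOPs}(N, D) = C}{\arg\min}\; L(N, D),
\qquad \mathrm{FLOPs}(N, D) \approx 6\,N\,D .
$$

The $6ND$ term is the standard back-of-the-envelope estimate of training FLOPs for a dense transformer (forward plus backward pass), so fixing $C$ couples $N$ and $D$, and the question becomes how to split the budget between a larger model and more training data.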