PaLM achieved a training efficiency of 57.8% hardware FLOPs utilisation, the highest reported for a large language model at this scale.
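
To give a sense of what a FLOPs-utilisation figure measures, the sketch below estimates model FLOPs utilisation from observed training throughput using the common ~6·N FLOPs-per-token approximation for a decoder-only transformer. The throughput, parameter count, and peak-FLOPs numbers in the example are illustrative assumptions, not figures from the PaLM paper, and hardware FLOPs utilisation would additionally count rematerialisation work.

```python
# Rough sketch: estimating model FLOPs utilisation (MFU) from training throughput.
# All concrete numbers below are illustrative assumptions, not PaLM's reported figures.

def model_flops_utilisation(tokens_per_second: float,
                            n_params: float,
                            peak_flops_per_second: float) -> float:
    """MFU = achieved training FLOPs/s divided by theoretical peak FLOPs/s.

    Uses the standard ~6 * N FLOPs-per-token approximation for the
    forward + backward pass of a decoder-only transformer (ignores
    attention FLOPs and rematerialisation, which HFU would include).
    """
    achieved_flops_per_second = 6.0 * n_params * tokens_per_second
    return achieved_flops_per_second / peak_flops_per_second


if __name__ == "__main__":
    # Hypothetical example: 200k tokens/s on a 100B-parameter model,
    # with an aggregate hardware peak of 2.5e17 FLOP/s.
    mfu = model_flops_utilisation(
        tokens_per_second=2.0e5,
        n_params=1.0e11,
        peak_flops_per_second=2.5e17,
    )
    print(f"Estimated model FLOPs utilisation: {mfu:.1%}")
```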