The total compute used to fine-tune GPT-3 turned out to be a mere 7 per cent of the compute used to pretrain the model.