Archives for µTransfer


Recently, Microsoft researchers Edward Hu, Greg Yang and Jianfeng Gao introduced µ-Parametrization, which offers maximal feature learning even in the infinite-width limit.


The total compute used to tune GPT-3 turned out to be a mere 7 per cent of the compute used to pretrain the model.