Recently, Microsoft researchers Edward Hu, Greg Yang, and Jianfeng Gao introduced µ-Parametrization, which offers maximal feature learning even in the infinite-width limit.