Archives for "billion parameters"
DeLighT is a deep and light-weight transformer that allocates parameters efficiently, both within each transformer block and across blocks.
The post Complete Guide to DeLighT: Deep and Light-weight Transformer appeared first on Analytics India Magazine.
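At the core of DeLighT's parameter efficiency is the group linear transformation, which splits the feature dimension into groups and applies a separate, smaller weight matrix to each group. The sketch below illustrates the idea only; the function name, shapes, and NumPy implementation are illustrative assumptions, not the library's actual code.

```python
import numpy as np

def group_linear(x, weights):
    """Illustrative group linear transformation.

    Splits the last dimension of x into len(weights) groups and applies
    each group's own weight matrix. With g groups over d features, the
    parameter count is g * (d/g)^2 = d^2/g, versus d^2 for a dense layer.
    """
    g = len(weights)
    chunks = np.split(x, g, axis=-1)
    return np.concatenate([c @ w for c, w in zip(chunks, weights)], axis=-1)

# Toy usage with assumed sizes: d=8 features split into g=4 groups.
rng = np.random.default_rng(0)
d, g = 8, 4
weights = [rng.standard_normal((d // g, d // g)) for _ in range(g)]
x = rng.standard_normal((2, d))
y = group_linear(x, weights)
print(y.shape)  # (2, 8)
```

Here the grouped layer uses d²/g = 16 parameters instead of d² = 64, which is the kind of saving DeLighT exploits to go deeper without growing the model.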
Following the announcement of the open-source release of the DeepSpeed library and the Zero Redundancy Optimiser (ZeRO), Microsoft announced its upgrade, ZeRO-2, in the middle of this year to support training large neural networks. Training large-scale models often comes with several challenges, such as hardware limitations and trade-offs between computation and efficiency. Thus, to overcome…
The post Training Models With Over 100 Billion Parameters appeared first on Analytics India Magazine.
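In DeepSpeed, ZeRO-2 is enabled through the training configuration by setting the ZeRO optimization stage to 2, which partitions optimizer states and gradients across data-parallel workers. The fragment below is a minimal sketch of such a config; the batch size and option values are placeholder assumptions, not recommended settings.

```json
{
  "train_batch_size": 32,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "reduce_scatter": true
  }
}
```

This JSON would typically be passed to `deepspeed.initialize` (or via the `deepspeed` launcher's `--deepspeed_config` flag) when setting up training.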