The team ran training iterations on a model with one trillion parameters at 502 petaFLOP/s on 3072 GPUs by combining three parallelism techniques: tensor, pipeline, and data parallelism.
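To put the aggregate figure in perspective, the short calculation below divides 502 petaFLOP/s across 3072 GPUs to get the average per-GPU throughput. It then compares that number to an assumed hardware peak. The GPU model is not named here, so the 312 TFLOP/s value (the dense FP16/BF16 Tensor Core peak of an NVIDIA A100) is an assumption used only for illustration.

```python
# Aggregate throughput and GPU count as reported in the article.
AGGREGATE_PFLOPS = 502.0
NUM_GPUS = 3072

# Assumption: A100-class GPUs with a dense FP16/BF16 Tensor Core peak of 312 TFLOP/s.
ASSUMED_PEAK_TFLOPS = 312.0


def per_gpu_tflops(aggregate_pflops: float, num_gpus: int) -> float:
    """Convert an aggregate petaFLOP/s figure into average teraFLOP/s per GPU."""
    return aggregate_pflops * 1000.0 / num_gpus


def fraction_of_peak(achieved_tflops: float, peak_tflops: float) -> float:
    """Return achieved throughput as a fraction of the (assumed) hardware peak."""
    return achieved_tflops / peak_tflops


per_gpu = per_gpu_tflops(AGGREGATE_PFLOPS, NUM_GPUS)
utilization = fraction_of_peak(per_gpu, ASSUMED_PEAK_TFLOPS)

print(f"{per_gpu:.1f} TFLOP/s per GPU")           # 163.4 TFLOP/s per GPU
print(f"{utilization:.0%} of the assumed peak")   # 52% of the assumed peak
```

Under that assumption, each GPU sustains roughly 163 TFLOP/s, or a little over half of its theoretical peak.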

The post Behind NVIDIA’s Megatron appeared first on Analytics India Magazine.