Archives for Data parallelism


Automating model parallelism with just one line of code
The difference between these two approaches maps naturally to the heterogeneity of a typical compute cluster.

Top Distributed Training Frameworks In 2021
In distributed training, the workload is split across multiple worker nodes that run in parallel to speed up model training. Traditionally, distributed training has been used for classical machine learning models, but it is increasingly applied to compute-intensive tasks such as training deep neural networks. Below, we…
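
As an illustration not drawn from the post above, the worker-node setup it describes is easiest to see in code. The sketch below is a minimal data-parallel training loop using PyTorch's DistributedDataParallel; the linear model, synthetic dataset, and hyperparameters are placeholders, and a real job run with one of the surveyed frameworks would substitute its own.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    # Each worker (one process per GPU) joins the same process group.
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for every process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model and synthetic data for illustration only.
    model = torch.nn.Linear(128, 10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(4096, 128), torch.randint(0, 10, (4096,)))
    # DistributedSampler hands each worker a disjoint shard of the dataset.
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)           # reshuffle the shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()                # DDP averages gradients across workers here
            optimizer.step()               # every replica applies the same update

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with, for example, `torchrun --nproc_per_node=4 ddp_sketch.py`, each of the four worker processes holds a full model replica, trains on its own shard of the data, and averages gradients with the other workers at every step.
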
Behind NVIDIA’s Megatron
The team performed training iterations on models with a trillion parameters at 502 petaFLOP/s on 3072 GPUs by combining three techniques: data parallelism, tensor (model) parallelism, and pipeline parallelism.
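
To make the headline numbers concrete, the back-of-the-envelope sketch below shows one way a 3072-GPU cluster can be factored into the three parallelism dimensions Megatron-style training combines. The tensor- and pipeline-parallel sizes and the fp16 weight format are illustrative assumptions chosen so the factorization divides 3072 evenly; they are not figures taken from the article.

```python
# Illustrative back-of-the-envelope sketch: how a GPU count decomposes into
# the three parallelism dimensions combined in Megatron-style training.
# The split sizes below are assumptions for illustration, not NVIDIA's setup.

TOTAL_GPUS = 3072
TENSOR_PARALLEL = 8      # assumed: each layer's weights split across 8 GPUs
PIPELINE_PARALLEL = 64   # assumed: the layer stack split into 64 stages
DATA_PARALLEL = TOTAL_GPUS // (TENSOR_PARALLEL * PIPELINE_PARALLEL)

assert TENSOR_PARALLEL * PIPELINE_PARALLEL * DATA_PARALLEL == TOTAL_GPUS

PARAMS = 1_000_000_000_000      # roughly one trillion parameters
BYTES_PER_PARAM = 2             # assumed fp16/bf16 weights

gpus_per_replica = TENSOR_PARALLEL * PIPELINE_PARALLEL
shard_gib = PARAMS * BYTES_PER_PARAM / gpus_per_replica / 2**30

print(f"model replicas (data-parallel groups): {DATA_PARALLEL}")
print(f"GPUs holding one replica:              {gpus_per_replica}")
print(f"approx. weight shard per GPU:          {shard_gib:.1f} GiB")
```

Under these assumed sizes, each data-parallel group of 512 GPUs holds one full copy of the trillion-parameter model, so an individual GPU stores only a few GiB of weights while the replicas train on disjoint batches.
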

