GSPMD separates programming an ML model from parallelization and is capable of scaling most deep learning network architectures