Just as a large transformer model can be trained on language, similar models can be trained on pixel sequences to generate coherent image completions and samples.
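To make the idea of "pixel sequences" concrete, here is a minimal sketch of the pipeline: flatten an image into a 1D sequence in raster order, fit a next-pixel predictor, then complete a partial image by sampling one pixel at a time. Everything in it is an illustrative assumption rather than the actual model. The function names (`to_sequence`, `fit_bigram`, `complete`) are hypothetical, the 4x4 binary image is toy data, and a simple bigram count table stands in for the transformer, which would condition on all previous pixels rather than only the last one.

```python
import random
from collections import Counter, defaultdict


def to_sequence(image):
    """Flatten a 2D grid of pixel values into a 1D raster-order sequence."""
    return [pixel for row in image for pixel in row]


def to_image(seq, width):
    """Reshape a 1D sequence back into rows of `width` pixels."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def fit_bigram(sequences):
    """Count how often each pixel value follows each previous value.

    Stand-in for a transformer (assumption): a real model would predict
    the next pixel from the entire preceding sequence.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def complete(prefix, total_len, counts, rng):
    """Autoregressively sample pixels until the sequence has total_len entries."""
    if not prefix:
        raise ValueError("prefix must contain at least one pixel")
    seq = list(prefix)
    while len(seq) < total_len:
        options = counts.get(seq[-1])
        if not options:
            # Context never seen in training: fall back to repeating the last pixel.
            seq.append(seq[-1])
            continue
        values, weights = zip(*options.items())
        seq.append(rng.choices(values, weights=weights, k=1)[0])
    return seq


# Toy 4x4 binary "image" (hypothetical data): left half dark, right half light.
train_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

counts = fit_bigram([to_sequence(train_image)])

# Keep the top two rows as the prompt and let the model fill in the rest.
prefix = to_sequence(train_image[:2])
rng = random.Random(0)
completed = to_image(complete(prefix, 16, counts, rng), width=4)

for row in completed:
    print(row)
```

The top half of the output is the given prompt and the bottom half is sampled. With a model this weak the completion will not reliably reproduce the pattern; the point is only the framing, in which an image becomes a sequence and completion becomes next-element sampling, exactly as with text.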