We can think of MetaFormer as the transformer/MLP-like model where the token mixer module is not defined and replaces the token mixer with attention or spatial MLP