Hugging Face has added Perceiver IO to its Transformers library; the model works on any modality, including text, images, and audio.