Multimodal AI, and especially the sub-field of visual question answering (VQA), has made substantial progress in recent years. Multimodal systems have access to both sensory and linguistic inputs, so they process information in a way that more closely resembles how humans do.

What is multimodal interaction?

As human beings, we experience the world multimodally: we can feel texture, hear […]