CEO Mark Zuckerberg announced Project CAIRaoke, a fully end-to-end neural model for building on-device assistants, at Meta’s first virtual event since the rebranding. Alborz Geramifard, a senior research manager at Meta AI, expanded on how the company is taking conversational AI to the next level with Project CAIRaoke.

“At Meta AI, we’re working on a system that could be personalised, embedded, embodied, and interact with you in a contextual multimodal fashion. That way your interactions are as frictionless as possible,” he said. 

In the future, the assistant may even follow users into the metaverse, but for now the focus is on voice-only interactions.

“It can see what you see from your first-person perspective, hear what you hear and, most importantly, understand the context of the situations you are in,” he said. 

Despite advancements in natural language understanding, such supercharged assistants have yet to become a reality.

The team has combined the modules of the traditional assistant pipeline, from natural language understanding through dialog state tracking and policy management to natural language generation, into a single end-to-end model. For the contextual aspect, Meta is relying on BART, its pretrained sequence-to-sequence model.
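To make the end-to-end idea concrete, here is a minimal, illustrative sketch of how a single BART-style sequence-to-sequence model can map a dialog history directly to an assistant reply, with no separate intent classifier, state tracker, or response planner in between. This is not Meta's actual code; the `facebook/bart-base` checkpoint and the prompt format are assumptions for demonstration only, and a real system would be fine-tuned on task-oriented dialog data.

```python
# Illustrative sketch of an end-to-end dialog model in the spirit of
# Project CAIRaoke: one seq2seq model replaces the NLU -> state tracking
# -> policy -> NLG pipeline. Checkpoint and prompt format are assumed.
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Dialog history and situational context are flattened into one input
# string; a production system would fine-tune on task-oriented dialogs.
history = ("user: set a timer for ten minutes. assistant: done. "
           "user: add five more minutes.")
context = "device: kitchen speaker"
inputs = tokenizer(f"{context} | {history}", return_tensors="pt")

# A single forward pass generates the reply text directly, with no
# intermediate symbolic dialog state passed between modules.
output_ids = model.generate(**inputs, max_new_tokens=40, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because the whole conversational stack lives in one model, upgrading it means retraining or swapping a single network rather than coordinating changes across four separate components, which is the upgrade advantage Geramifard alludes to below.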

Meta said it was using the model in its video-calling Portal device. “We plan to augment the Project CAIRaoke model to handle multilingual and then multimodal inputs and outputs, as we hope its single-model architecture allows for a smoother upgrade process,” said Geramifard.

Meta has plans to integrate it into augmented and virtual reality devices to enable even more immersive and multimodal interactions.