DeepMind researchers have introduced a novel method where agents are endowed with prior knowledge in the form of abstractions that are derived from large vision language models which are pretrained on image captioning data.