Google, the tech giant behind the Gemini family of AI models, has introduced conversational image segmentation capabilities in Gemini 2.5. 

Conversational image segmentation is a technique that enables an AI model to identify and outline (segment) specific parts of an image based on natural language queries, rather than simple labels or predefined categories. 

“Rather than just identifying ‘a car’, what if we could identify ‘the car that is farthest away?’” Google stated, citing an example. “Today, Gemini’s advanced visual understanding brings a new level of conversational image segmentation. Gemini now understands what you’re asking it to see.”

Gemini can identify objects based on complex relationships with other objects around them. These capabilities span relational understanding, ordering, comparative attributes, conditional logic, abstract concepts, text within images, and more. 

For instance, one can ask the model to identify the person holding an umbrella, the third book from the left in a stack, the most wilted flower in a bouquet, or the shadow cast by a particular building, among other such queries. 
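To make this concrete, here is a minimal sketch of how such a relational query can be phrased as a prompt that asks for structured output. The JSON keys (box_2d, mask, label) and the 0–1000 coordinate convention follow Google's published spatial-understanding examples, but treat the exact schema and the values shown here as illustrative assumptions, not guaranteed API behavior.

```python
import json

# A conversational segmentation query: the target is described by its
# relationship to other objects, not by a fixed class label.
prompt = (
    "Give the segmentation mask for the person holding the umbrella. "
    "Output a JSON list of masks where each entry contains the 2D "
    "bounding box in the key 'box_2d', the segmentation mask in the "
    "key 'mask', and the text label in the key 'label'."
)

# Illustrative shape of the structured output such a prompt asks for;
# 'box_2d' follows the [y0, x0, y1, x1] convention with coordinates
# normalized to 0-1000, and 'mask' would hold a base64-encoded PNG.
example_response = """
[
  {
    "box_2d": [412, 120, 980, 410],
    "mask": "<base64-encoded PNG mask>",
    "label": "person holding the umbrella"
  }
]
"""

for entry in json.loads(example_response):
    print(entry["label"], entry["box_2d"])
```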

The company shared several other examples in its blog post.

An interesting example Google highlighted was the model's ability to assist with workplace safety: Gemini 2.5 can now identify employees on a factory floor who are not wearing the correct safety gear. 

“Move beyond rigid, predefined classes. The natural language approach gives you the flexibility to build solutions for the ‘long tail’ of visual queries that are specific to your industry and users,” Google said. 

Users can explore how this works via the Spatial Understanding demo in Google AI Studio. The capabilities are also available to developers through the Gemini API. 
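For developers, a call like the following sketch could issue such a query through the Gemini API. It assumes the google-genai Python SDK, the gemini-2.5-flash model, and a local image file; the prompt wording and file name are illustrative, so check the current Gemini API documentation for the exact recommended usage.

```python
# Minimal sketch: a conversational segmentation request via the Gemini API,
# using the google-genai Python SDK (pip install google-genai).
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # key from Google AI Studio

# "factory_floor.jpg" is a hypothetical local image for this sketch.
with open("factory_floor.jpg", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        # The safety-gear query from the article, phrased in natural language.
        "Give segmentation masks for all people who are not wearing a "
        "hard hat. Output a JSON list of masks where each entry contains "
        "the 2D bounding box in 'box_2d', the segmentation mask in 'mask', "
        "and a text label in 'label'.",
    ],
)

# The model returns the JSON as text; downstream code would parse it and
# decode each base64 PNG mask.
print(response.text)
```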