On March 14, 2023, OpenAI released GPT-4 with much fanfare, showcasing its multimodal features. Months have passed since then, and the buzz around those features seems to have faded.

OpenAI said that GPT-4 could generate text while accepting both image and text inputs, an improvement over its predecessor, GPT-3.5, which accepted only text. Yet ChatGPT Plus is still not multimodal.

Surprisingly, OpenAI recently filed a trademark application for 'GPT-5' with the United States Patent and Trademark Office (USPTO). Trademark attorney Josh Gerben took to Twitter on July 31 to point out the filing, which hints that the company may be working on a fresh iteration of its language model.

Before moving on to GPT-5, however, OpenAI has yet to deliver on its promises for GPT-4. Users were expecting to interact with the chatbot using images, but this multimodal functionality has not been fully realized, and online discussions have been buzzing with questions about its status.

During the GPT-4 demo livestream, several impressive capabilities of the model were showcased. It interpreted a funny image and accurately described what made it humorous. Greg Brockman, President and Co-Founder of OpenAI, also demonstrated how he could build a working website simply by feeding GPT-4 a photo of a sketch from his notebook.

However, Brockman did mention that these features would take time; still, the wait has stretched on. At present, only Bing, which is based on GPT-4, lets you search using images, and even that needs refinement, with responses that are not up to the mark. So what exactly is holding OpenAI back from shipping multimodal features in a product of its own?

Multimodal features aren’t available in the API

While introducing GPT-4, OpenAI said that it was releasing GPT-4’s text input capability through ChatGPT and the API, and that it was working on making the image input capability more widely available by collaborating closely with Be My Eyes. As of now, this collaboration is in a closed beta, being tested with a small subset of users for feedback, and no official update has been released since.

The multimodal features of GPT-4 are also not accessible through the API. OpenAI’s documentation mentions that users can currently make only text-only requests to the gpt-4 model, and that image input is still in a limited alpha stage.
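In practice, this means the only way to call GPT-4 is with text prompts. Below is a minimal sketch of such a request using the OpenAI Python library as it existed at the time of writing (the pre-1.0 SDK); the API key and prompt are placeholders.

```python
# Minimal sketch: a text-only request to the gpt-4 model.
# Assumes the OpenAI Python SDK (pre-1.0) is installed: pip install openai
# There is no parameter here for attaching images -- image input was still
# in a limited alpha and not exposed through this endpoint.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "Summarise what a multimodal model can do."}
    ],
)

print(response["choices"][0]["message"]["content"])
```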

However, OpenAI assures users that requests will automatically move to the recommended stable model as new versions are released, which suggests that more advanced features and capabilities may become available as the model continues to evolve and improve.

OpenAI recently introduced Code Interpreter in ChatGPT Plus. Many termed it a ‘GPT-4.5 moment’, but interestingly, its handling of images relied on old-school OCR through Python libraries rather than GPT-4’s multimodal image capabilities.
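For context, this is roughly what "old-school OCR from Python libraries" looks like: a library such as pytesseract simply extracts printed text from an image, with no understanding of the scene itself. The snippet below is an illustration of that technique, not a confirmed view of what Code Interpreter runs internally; the file name is hypothetical.

```python
# Illustration of conventional OCR with standard Python libraries.
# Assumes the Tesseract engine is installed locally, plus:
#   pip install pytesseract pillow
from PIL import Image
import pytesseract

# Extract whatever printed text Tesseract can find in the image.
# Unlike a multimodal model, this cannot describe or reason about the picture.
text = pytesseract.image_to_string(Image.open("screenshot.png"))
print(text)
```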

GPU Scarcity 

Due to a shortage of GPUs, OpenAI is struggling to let users process more data through its large language models like ChatGPT. The shortage has also pushed back its plans to introduce new features and services on the original schedule.

A month ago, Sam Altman acknowledged this concern and explained that most of the issue stemmed from GPU shortages, according to a blog post by Raza Habib, CEO and co-founder of Humanloop, which was later taken down at OpenAI’s request. The post specifically mentioned that the multimodality demoed as part of the GPT-4 release cannot be extended to everyone until more GPUs come online.

GPT-4 was probably trained on around 10,000 to 25,000 Nvidia A100s. For GPT-5, Elon Musk suggested it might require 30,000 to 50,000 H100s, while in February 2023 Morgan Stanley predicted GPT-5 would use 25,000 GPUs. With so many GPUs required and Nvidia the only reliable supplier in the market, everything boils down to GPU availability.

Focus on DALL-E 3?

Going by recent developments, OpenAI appears to be focusing on text-to-image generation for now. YouTuber MattVidPro recently shared details of OpenAI’s next project, which is likely to be DALL-E 3.

OpenAI’s plans for public access to the alleged model, and even its official name, remain uncertain. According to Matt, the unreleased model is currently in testing, available on an invite-only basis to a select group of around 400 people worldwide.

Ultimately, only time will tell whether OpenAI will improve GPT-4 or move straight to GPT-5. Whatever the name, we are hopeful that OpenAI will soon deliver the much-awaited multimodal features to its users, and in an improved and more polished form.
