OpenAI Wins Again
Google has been talking about Gemini for a while now, and people are growing impatient. It appears that Google is all talk and no action. Meanwhile, OpenAI grabbed the opportunity and recently announced that it plans to integrate Dall-E 3 with ChatGPT Plus and ChatGPT Enterprise.
This is surely a game-changing move by OpenAI, as it would make GPT-4 the first functional multimodal model on the market that creates both text and images, similar to what Gemini aspires to do.
To make up for the absence of Gemini, Google recently added extensions to Bard, along with the ability to upload images with Lens and get Search images in responses. It was Google’s attempt to make Bard multimodal. However, only time will tell whether Bard will be able to withstand the upcoming competition from the Dall-E-integrated ChatGPT Plus, scheduled to arrive in October.
That being said, OpenAI has the potential not only to impact Google Bard and Gemini but also to put pressure on other text-to-image generation models such as Midjourney and Stable Diffusion, as Dall-E 3 has shown promise by creating images of higher quality.
User Advantage
Integrating Dall-E 3 with ChatGPT Plus gives OpenAI an edge over other image generation tools, as ChatGPT has the largest user base of any model in any segment.
At the moment, ChatGPT is one of the world’s most popular websites, attracting a staggering 1.4 billion visits globally in August. During the same month, Bard received 183.5 million visits. Midjourney, for its part, has over 15 million active users and saw 21 million visits in August. Stable Diffusion has more than 10 million daily active users across all channels, according to Stability AI chief Emad Mostaque.
From the users' perspective, Dall-E 3 on ChatGPT gives them the freedom to generate both text and images on a single platform. If users get the benefit of text generation and image generation in one place, they are likely to prefer ChatGPT.
The numbers make it evident that ChatGPT boasts a huge user base that won’t shy away from using newer versions of ChatGPT Plus at a price of $20 or a little more. Midjourney, however, has a much wider price spread, selling monthly plans ranging from $10 to $120.
It can be said that OpenAI is paving the way for a unified multimodal model that is capable of handling a wide range of tasks. Additionally, users have complained about the interface of Midjourney, which is presently hosted on Discord.
Multimodal Market is Scattered
If we examine the currently available multimodal models, we find that they are quite scattered. There isn’t a single model that can perform all tasks. Alongside closed-source models, there are also various open-source models claiming to be multimodal. It is, however, still not clear which model can truly claim to be multimodal.
For example, Hugging Face recently introduced a multimodal model named IDEFICS. It can process both text and image inputs and generate descriptions for the images. Similarly, Bard can accept image inputs.
Meta also recently launched SeamlessM4T, a foundational speech/text translation and transcription model. It is an all-in-one system that performs multiple tasks such as speech-to-speech, speech-to-text, and text-to-text translation, as well as speech recognition. OpenAI and Google have also developed their own speech-to-text models, namely Whisper and AudioPaLM-2, respectively.
If OpenAI also adds text-to-speech and speech-to-text features to ChatGPT Plus, it could race ahead of other models, making it challenging for others to catch up. Meanwhile, OpenAI doesn’t seem to have any plans to stop here. According to recent reports, it is also planning to integrate GPT-Vision into GPT-4, indicating that it is here to stay.
The post OpenAI Wins Again appeared first on Analytics India Magazine.



