‘iPhone is the Greatest Piece of Technology Humanity has Ever Made, Says OpenAI’s Sam Altman
Ahead of OpenAI’s much-anticipated partnership with Apple, chief Sam Altman recently lauded the Cupertino-based tech giant for its technological prowess, saying, “iPhone is the greatest piece of technology humanity has ever made”, and that it will be tough to surpass because “the bar is quite high”.
This is not some new-found love for the company; Altman has always been an Apple fanboy.
Recently, OpenAI brought in Jony Ive, the renowned designer of the iPhone, to discuss new AI hardware. “We’ve been discussing ideas,” said Altman in a recent episode of the All-In Podcast, touching upon the possibility of running LLMs on smartphones and whether that will be affordable when it happens.
“Almost everyone’s willing to pay for a phone anyway,” added Altman, saying that cheaper is not the answer. “Even if a cheaper device could be made, I think the barrier to carry or use a second device is pretty high,” he added, hinting at how smartphones would not be obsolete anytime soon.
This runs contrary to the opinion of Yann LeCun, Meta’s chief AI scientist, that smartphones will become obsolete in the next 10-15 years and that people will instead use augmented reality glasses and bracelets to interact with intelligent assistants.
But, Altman disagrees.
“There are a bunch of societal and interpersonal issues that are all very complicated about wearing a computer on your face,” said Altman, citing concerns about Meta’s smart glasses.
Altman’s comments come as Apple nears its deal with OpenAI to integrate ChatGPT into iOS 18 as part of its strategy to enhance AI capabilities across its devices.
OpenAI is also expected to make some big announcements today, likely unveiling an AI voice assistant alongside the GPT-4 Lite, GPT-4-Auto, and GPT-4-Auto Lite series of models. The new model would be capable of conversing with people through both voice and text, while also being able to recognise objects and images.
Many are speculating whether this will be OpenAI’s ‘Her’ moment. Altman’s comments in the recent podcast resonate with this development, as he suggested that voice points to what the next big thing might be.
“If you can perfect voice interaction, it feels like a whole new way of using a computer,” remarked Altman.
“What I want is just this always-on, super-low-friction thing where I can either, by voice or by text, ideally just kind of know what I want,” said Altman, adding that this AI assistant would help him throughout the day with as much context as possible.
Altman also said that OpenAI is currently developing an AI assistant designed to function like a senior AI employee. Users would be able to delegate tasks to this assistant, including managing emails.
OpenAI recently introduced a Voice Engine model which can generate natural-sounding speech from text input and a mere 15-second audio sample. The Voice Engine project began in late 2022 and initially focused on powering preset voices within OpenAI’s text-to-speech API, ChatGPT Voice, and Read Aloud features.
Meanwhile, decoding the tech behind it, Jim Fan, a senior scientist at NVIDIA, said that all voice AI goes through three stages:
1. Speech recognition, or ASR: audio → text1 (think Whisper);
2. An LLM that plans what to say next: text1 → text2;
3. Speech synthesis, or TTS: text2 → audio (think ElevenLabs or VALL-E).
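The three stages above can be sketched as a simple chain of functions. This is a minimal, hypothetical illustration: the stub functions below are stand-ins for real ASR, LLM, and TTS models (such as Whisper or ElevenLabs), not actual API calls.

```python
# A toy sketch of the three-stage voice AI pipeline: ASR -> LLM -> TTS.
# Each stage is a stub standing in for a real model.

def speech_to_text(audio: bytes) -> str:
    """Stage 1 (ASR): transcribe audio to text (text1).
    Stub: pretend the audio decodes to a fixed question."""
    return "what is the capital of france"

def plan_reply(text1: str) -> str:
    """Stage 2 (LLM): plan what to say next (text1 -> text2).
    Stub: a canned lookup instead of a language model."""
    replies = {"what is the capital of france": "The capital of France is Paris."}
    return replies.get(text1, "I'm not sure.")

def text_to_speech(text2: str) -> bytes:
    """Stage 3 (TTS): synthesize audio from text.
    Stub: encode the reply as bytes in place of real waveform synthesis."""
    return text2.encode("utf-8")

def voice_pipeline(audio: bytes) -> bytes:
    # audio -> text1 -> text2 -> audio, chaining the three stages
    text1 = speech_to_text(audio)
    text2 = plan_reply(text1)
    return text_to_speech(text2)
```

In production systems the latency of hopping between three separate models is the main obstacle, which is why fully multimodal models that handle audio natively are seen as the next step.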
Why LLMs with Voice Matter
OpenAI is not alone. Earlier this year, Hume AI released Empathic Voice Interface, or EVI, which can engage in conversations just like humans, understanding and expressing emotions based on the user’s tone of voice. It can interpret nuanced vocal modulations and generate empathetic responses, leading to many calling it the next ‘ChatGPT moment’.
“We believe voice interfaces will soon be the default way we interact with AI. Speech is four times faster than typing; frees up the eyes and hands; and carries more information in its tune, rhythm, and timbre,” said Alan Cowen, founder of Hume AI.
The company’s EVI API is billed as the first emotionally intelligent voice AI API. It is now available, accepting live audio input and returning both generated audio and transcripts enriched with indicators of vocal expression.
What is India’s OpenAI up to?
Indian AI startup Sarvam AI is also planning to release an Indic Voice LLM in the next four to six months. “We believe that in India, people will experience generative AI through the medium of voice,” said Vivek Raghavan, co-founder of Sarvam AI, in an exclusive interview with AIM.
He added that it is difficult to input text in Indian languages and that in India, people tend to prefer voice communication over text.
The company is also working on building agentic systems, allowing users to not only receive information but also take action. “I hope in the next few months we’ll see some of these things being announced and released in the marketplace,” said Raghavan.
Sarvam AI’s models will support 10 languages, with plans to expand that set in the future. The company’s focus on voice-based interfaces has numerous practical applications in the country, such as customer support and feedback gathering, where voice-based models can efficiently handle large-scale feedback listening.
The post ‘iPhone is the Greatest Piece of Technology Humanity has Ever Made, Says OpenAI’s Sam Altman appeared first on Analytics India Magazine.