OpenAI Launches GPT-4o (Omni): AI with Audio, Video & Text

OpenAI’s spring showcase has left no one indifferent. This time the company introduced a new Artificial Intelligence model, GPT-4o (the "o" stands for "omni"), which integrates text, audio, and video within a single model. The camera and microphone of our devices thus become new senses, fused seamlessly and astonishingly, as demonstrated during the live presentation just hours ago.

Video 1. Presentation of GPT-4o

Compared with Google Gemini, its responses appear to arrive in real time: OpenAI reports that it can answer audio input in an average of around 320 milliseconds, comparable to human reaction times in conversation. It can articulate, relate, and process our conversations, requests, images, and video as they are provided. It is astonishing to observe how it adapts its intonation to the mood of the conversation, stopping when interrupted and picking up from where the interlocutor left off. It also appears capable of interpreting statistical graphs and source code and commenting on them aloud. This remarkable integration will be available to paid subscribers in the coming weeks.

Open Access to ChatGPT-4

In addition, OpenAI has decided to open up GPT-4-level capabilities, through GPT-4o, to free ChatGPT users, which in itself constitutes a revolution. If what could be achieved with ChatGPT-3.5 already seemed incredible, how far might we go with version 4? However, as with other business models in the search engine and social media sectors, free access does not mean that it comes at no cost. Let us not forget that all data and communications from our interactions could be stored on remote servers and used to characterize and profile users, ultimately generating a digital twin of each of them. Whatever the advances, the issue of privacy will always persist, an aspect addressed in the most recent ConocimIA seminar.

Video 2. ConocimIA Conference. AI in Your Hands

We Expected More...

Although the advancements presented are almost beyond question, it is true that more was expected. Specifically, rumors circulated about the launch of a general-purpose search engine integrated with ChatGPT, which would compete directly with Google. This did not materialize, at least for now. However, one can speculate that the most advanced version of ChatGPT, potentially ChatGPT-5, which some hope will be the long-awaited AGI (Artificial General Intelligence), may arrive before the end of the year. Certainly, the integration of sensory capabilities in GPT-4o is a step toward that goal.

References

OpenAI. (2024). Hello GPT-4o. https://openai.com/index/hello-gpt-4o
Gizmodo. (2024). OpenAI presents the voice capabilities of GPT-4 Omni, and they are literally incredible. https://es.gizmodo.com/openai-presenta-las-capacidades-de-voz-de-gpt-4-omni-y-1851473546
El Confidencial. (2024). How the new version of ChatGPT that sees and hears everything around it works. https://www.elconfidencial.com/tecnologia/2024-05-14/nueva-version-chatgpt-ve-y-oye-todo_3883244
ADSLZone. (2024). ChatGPT-4o arrives, the ultra-enhanced version of ChatGPT for images, video, audio, and text. https://www.adslzone.net/noticias/ia/lanzamiento-gpt-4o