OpenAI GPT-4 Omni model can interpret audio, video, and text in real time

Published by Ozzie Mejia

OpenAI has released an update for its ChatGPT chatbot. The GPT-4o model promises greater ease of use for all users, as well as increased speed across the board.

"GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs," reads the OpenAI website. "It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models."

OpenAI chief technology officer Mira Murati spoke during a livestream on Monday about the latest ChatGPT additions. She demonstrated some of its capabilities, including new translation features. With the latest update, ChatGPT can now operate in 50 different languages.

As noted by CNBC, Murati made sure to thank NVIDIA CEO Jensen Huang for helping power OpenAI's technology. NVIDIA has invested heavily in the AI sector, which has helped drive the company to better-than-expected earnings.