OpenAI GPT-4 Omni model can interpret audio, video, and text in real time

The latest iteration of ChatGPT promises to be the most advanced one yet.


OpenAI has issued an update for its ChatGPT bot. The GPT-4o update promises greater ease of use for all users, as well as increased speed across the board.

"GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs," reads the OpenAI website. "It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models."

OpenAI technology chief Mira Murati spoke during a livestream on Monday about the latest ChatGPT additions. She demonstrated some of its capabilities, including some new translation features. With the latest update, ChatGPT can now operate across 50 different languages.

As noted by CNBC, Murati made sure to thank NVIDIA CEO Jensen Huang for helping power OpenAI's technology. NVIDIA has a significant amount of money invested in the AI sector, which has helped power that company to better-than-expected earnings.

Senior Editor

Ozzie has been playing video games since picking up his first NES controller at age 5. He has been into games ever since, only briefly stepping away during his college years. But he was pulled back in after spending years in QA circles for both THQ and Activision, mostly spending time helping to push forward the Guitar Hero series at its peak. Ozzie has become a big fan of platformers, puzzle games, shooters, and RPGs, just to name a few genres, but he’s also a huge sucker for anything with a good, compelling narrative behind it. Because what are video games if you can't enjoy a good story with a fresh Cherry Coke?

From The Chatty
  • reply
    May 13, 2024 11:35 AM

    Ozzie Mejia posted a new article, OpenAI GPT-4 Omni model can interpret audio, video, and text in real time

    • reply
      May 13, 2024 10:33 AM

      https://openai.com/index/spring-update/

      No OpenAI thread?

      "Any sufficiently advanced technology is indistinguishable from magic"

      Yep, this ChatGPT demo.


      https://x.com/BenBajarin/status/1790070846473523390

      OpenAI announces Her.

      • reply
        May 13, 2024 10:38 AM

        What am I missing here…?

      • reply
        May 13, 2024 10:47 AM

        is there a demo or anything? OP is just same text as this post.

        • reply
          May 13, 2024 10:58 AM

          About 9m30s in is where it starts to get... Wild.

        • reply
          May 13, 2024 11:12 AM

          this is both amazing and also annoying as fuck when it pretends to be human by sighing when responding to rapidly changing instructions, or claiming "i got too excited" when it did something wrong and is then corrected

          the future of people falling in love with this shit is closer than ever

          • reply
            May 13, 2024 11:16 AM

            yeah that's definitely going to happen

            and if that's the worst thing to emerge from this i'll consider us very fortunate

            • reply
              May 13, 2024 11:17 AM

              oh no, global thermonuclear war is the worst thing that will emerge from this, the condescending dialog model that talks to you like you're in kindergarten is just background noise

              • reply
                May 14, 2024 5:48 AM

                Pfffftt. You WISH a nice clean fission blast and relatively quick radiation death was the worst thing.

            • reply
              May 13, 2024 12:33 PM

              Honestly, I'm very hopeful for that future. I can see a few situations where this is useful:

              - People who didn't have positive socialization skills growing up or never learned how to connect safely.
              - Poly/Mono couples.
              - Recovery from an abusive relationship.
              - People who just don't want to have a relationship with another human.

              Like all things outside assistance will be needed, but this bridges the gap between "I need someone to confide in" and "I can't face a therapist / cohort directly."

          • reply
            May 13, 2024 11:17 AM

            Yeah I already disliked seeing how chatgpt would add all kinds of bullshit fluff around its apologies and mistakes but not actually, say, change approaches

            (I’d get it in a loop where it would give two wrong answers and apologize and give the other one, and repeat infinitely)

            • Zek
              reply
              May 13, 2024 11:21 AM

              You can coach it on how you want it to respond, there's a text field in the user settings for that purpose. So if you prefer it to be more concise and clinical you can do that.

          • reply
            May 13, 2024 11:51 AM

            you can obviously just ask it to have a completely flat affect and speak in a robot voice if you want an autistic computer assistant but they wanted to show off how much more massively human it's capable of sounding now which is useful for all kinds of stuff

            • reply
              May 13, 2024 12:18 PM

              Uh. That's... Not a great usage of the word autistic there, bud.

            • reply
              May 13, 2024 12:24 PM

              Yeah I get that it’s great for demos but it’s actively awful for real-world use. It doesn’t need to be monotone but it also doesn’t need to use 20 words when 3 will do. Time is money, and that shit is annoying.

        • reply
          May 13, 2024 11:22 AM

          That was a FAKE DEMO. The dialog was overlapping and she got ahead of the announcer!!!!!!!

          • Zek
            reply
            May 13, 2024 11:29 AM

            They keep interrupting it because it'll keep talking a while if you let it, but it's clearly not faked. It makes quite a few mistakes.

          • reply
            May 13, 2024 11:37 AM

            Do you want me to open an official Shacknews investigation??? The moderation team is standing by!!

          • reply
            May 13, 2024 11:37 AM

            faking AI demos is more of Google’s thing

            • reply
              May 13, 2024 11:40 AM

              yeah and Google I/O is not until tomorrow

          • reply
            May 13, 2024 12:58 PM

            The demo was performed in Fallout 4

        • reply
          May 13, 2024 11:45 AM

          I like that they always have to interrupt it because it just babbles on and on and on

          • reply
            May 13, 2024 11:50 AM

            just like people

          • reply
            May 13, 2024 11:55 AM

            Sounds like MY WIFE. kidding not really

          • Zek
            reply
            May 13, 2024 12:29 PM

            Yeah I think that might just be something we have to learn to get used to in talking to AI. It can't really do all the subtleties that humans use to indicate when they want to speak.

      • reply
        May 13, 2024 11:18 AM

        Yep, we're not far from Her. This is so wild and awesome.

        • reply
          May 13, 2024 11:37 AM

          I'm unclear whether this is going to hasten the erosion of communication between people online, or help the loneliness problem with an always on friend that can do instant research & feedback for you.

          This thing is a few generations away from ushering us into the post-people era where y'all are obsolete and my assistant loves me more than my parents did.

          • reply
            May 13, 2024 11:48 AM

            Oh I definitely agree, I just try to take a neutral approach to it

            • reply
              May 13, 2024 11:59 AM

              I'm concerned because we saw the floodgates open with social media with barely any consideration towards the mental well-being of its users.

              It sounds like the ol' move fast & break things is back on the menu which bothers me.

              Her is totally a great story about giving up on people and leaning in to a digital companion.

              • reply
                May 13, 2024 12:11 PM

                Oh no, I have serious concerns, I've just decided there's not much I can personally do about it so fuck it

          • reply
            May 13, 2024 11:51 AM

            I think cellphones have done or are doing that more than chat gpt. My kids, to my parents, all just pull out their phones whenever at a family gathering to browse shit. There’s no social norms anymore, it’s fine to just completely zone out of wherever you are as a person. When it becomes just glasses or eye contacts people will just drop out even more

        • reply
          May 13, 2024 11:38 AM

          I'm looking forward to the point when the tech is mature enough to be a full-on digital assistant. The Voice conversations with ChatGPT are already remarkably helpful for me when it comes to brainstorming and organization, but if I could tie it into my calendar, all my documents, my home automation stuff, etc., that'd be awesome.

          • reply
            May 13, 2024 12:15 PM

            I think the concern is people wanting to have a parasocial relationship with it which will be the exact opposite of helpful. That is undoubtedly where this thing is heading.

        • reply
          May 13, 2024 12:12 PM

          Awesome? Do you remember how that movie ended?

      • reply
        May 13, 2024 11:33 AM

        Man. That’s good stuff.

      • reply
        May 13, 2024 11:38 AM

        ...her?

        • reply
          May 13, 2024 11:47 AM

          Probably a reference to the movie with the same name - about a female AI that the main character fell in love with. Great movie.

        • reply
          May 13, 2024 11:51 AM

          I understood that reference. I would say that often when people would mention the movie, and it would drive them crazy.

          • reply
            May 13, 2024 11:52 AM

            It is a reference as Ann as the nose on Plain's face

          • reply
            May 13, 2024 12:39 PM

            Yeah but correct me if I'm wrong but doesn't joaquin choose physical connection over digital because the AI runs off with the other AI or something?

            I mean I thought it was an allegory about how real life connection is superior because you can understand nuance and emotion better from a real person rather than a disembodied voice.

        • reply
          May 13, 2024 11:57 AM

          check out whos on the hog in the rearview mirror

        • reply
          May 13, 2024 12:10 PM

          is she funny or something?

        • reply
          May 13, 2024 5:06 PM

          They can call it Cortana instead.

      • reply
        May 13, 2024 12:07 PM

        There is a little bit of delay in responses that you don't see in this demo, but it's totally manageable. I do enjoy it. Would like to have it running on some little piece of hardware in my living room.

        • reply
          May 13, 2024 12:19 PM

          Looks like there is a mac app, so that problem might be solved.

        • Zek
          reply
          May 13, 2024 12:34 PM

          I don't think the new model is fully launched yet, response latency is one of its big improvements.

          • reply
            May 13, 2024 12:36 PM

            ChatGPT Plus users can use it right now, it's deployed

            • reply
              May 13, 2024 12:43 PM

              On the app or just on the website? Because when I updated the app earlier I didn't see the new model available.

              • reply
                May 13, 2024 12:50 PM

                it's available for me both places

              • reply
                May 13, 2024 12:50 PM

                Hmm, so now the new model is available but it doesn't have any of the functionality they were showing off. I'm guessing that will come with an app update.

                • reply
                  May 13, 2024 12:56 PM

                  "We'll roll out a new version of Voice Mode with GPT-4o in alpha within ChatGPT Plus in the coming weeks."

            • reply
              May 13, 2024 1:01 PM

              Is it worth the $20 to try it out? I'm thinking about it after seeing the demo.

              • reply
                May 13, 2024 1:04 PM

                the pricing page says the free tier will have limited access to GPT-4o. if you've only used GPT 3.5 you'll be blown away because 3.5 is from a million years ago (in AI years). nobody should base their opinion about ChatGPT on 3.5

                • reply
                  May 13, 2024 1:11 PM

                  So would I get access to what I saw in the demo in terms of real time voice responses from that $20 subscription?

                  It's only $20 but I don't want to throw them money unless I can play around with their toys.

                  • reply
                    May 13, 2024 1:12 PM

                    right now nobody has that (see yakz's post below)

                    • reply
                      May 13, 2024 1:18 PM

                      well you need to get on the horn and tell them to release it while i still have $20 left to spend, electroly

          • reply
            May 13, 2024 12:51 PM

            It’s there but timing out for me at the moment.

      • reply
        May 13, 2024 12:11 PM

        i gave it a try and i still don't like using the voice mode even with GPT-4o. i don't mind me speaking but it takes too long for ChatGPT to get to the point in the responses. i'd still rather skim a text response

        • reply
          May 13, 2024 12:32 PM

          That’s the impression I was getting from watching this

        • reply
          May 13, 2024 12:35 PM

          Feel like 1/3 of my prompts to ChatGPT are "please be more concise/less verbose"

          • reply
            May 13, 2024 12:39 PM

            you can just put that in your custom instructions if you find yourself including it in the prompts. then it'll always apply. it's under the settings menu (click your name in the corner)

          • reply
            May 13, 2024 4:53 PM

            Damn I just watched an episode of TNG where Picard said this verbatim to Data.

            (S1 “Justice,” truly an awful episode.)

        • reply
          May 13, 2024 12:36 PM

          You can coax it into the type of responses you need via prompts and memory. Tell it you're rarely interested in verbose responses and prefer to the point and will ask when you need more assistance.

        • reply
          May 13, 2024 12:54 PM

          they haven't actually released the new voice mode, they're afraid to.

          "In the future, improvements will allow for more natural, real-time voice conversation and the ability to converse with ChatGPT via real-time video. For example, you could show ChatGPT a live sports game and ask it to explain the rules to you. We plan to launch a new Voice Mode with these new capabilities in an alpha in the coming weeks, with early access for Plus users as we roll out more broadly"

          "Today we are publicly releasing text and image inputs and text outputs. Over the upcoming weeks and months, we’ll be working on the technical infrastructure, usability via post-training, and safety necessary to release the other modalities."

          • reply
            May 13, 2024 12:55 PM

            gotcha, i'll give it another shot when the new voice mode comes out

          • reply
            May 13, 2024 1:26 PM

            Thanks for the explanation because I was about to blow $20 to see if that demo was real or not. I'm cautiously optimistic about the future of this.

      • reply
        May 13, 2024 12:18 PM

        That demo was creepy as fuck. Burn it with fire.

        • reply
          May 13, 2024 1:12 PM

          “All it’s doing” is vocalizing what it can already do with text.

          • reply
            May 13, 2024 1:19 PM

            It just feels like it's tipped over into uncanny valley territory for me. Listening to it seriously gave me super weird and uncomfortable feelings.

          • reply
            May 13, 2024 2:25 PM

            That’s not the case. This version is significantly more capable with text, but more importantly it’s natively multimodal. And the voice responses are massively faster and with more human affect and capabilities.

            • reply
              May 13, 2024 2:47 PM

              I get that, that’s why I put it in quotes. I think what I said is true, conceptually. It shouldn’t freak anyone out or blow their mind if they’ve been using ChatGPT thus far.

      • reply
        May 13, 2024 1:20 PM

        maybe google will show something similar tomorrow, might be why this presentation was thrown together and released today but they haven't actually released the functionality... maybe they don't have a good way to stop legions of halfwits from falling in love with it, or using it to make zany tiktoks that will cause reputational harm (*starts camera* "hey chatgpt sing me a song about deez nuts")

        • reply
          May 13, 2024 2:04 PM

          God I hope so.

        • reply
          May 13, 2024 2:07 PM

          Trying to remember the last time Google was ahead of anyone on anything and not just being a complete clusterfuck

          • reply
            May 13, 2024 2:24 PM

            I can't remember

          • reply
            May 13, 2024 5:09 PM

            Gmail

          • reply
            May 13, 2024 5:14 PM

            2004. Search.

          • reply
            May 13, 2024 5:25 PM

            Maps. When it first came out we had Mapquest before that. Zooming in the way you could on Google maps was awesome.

          • reply
            May 13, 2024 6:48 PM

            Google Photos. Still the best out there, even after they fucked up the desktop client by removing 2-way sync.

          • reply
            May 13, 2024 6:51 PM

            You're thinking of their public products and I get where you're at with that.

            If you look at the technology built at Google, they have been instrumental in moving internet technology forward. For example, they are the ones who published the paper that led to all this gen ai craziness

          • reply
            May 14, 2024 5:12 AM

            Gmail notifier was next level

        • Zek
          reply
          May 13, 2024 3:11 PM

          The timing does feel very intentional. Google has clearly been behind in the race up until now, but they do have some intrinsic advantages that I think are starting to come online now. Like the fact that they've already had mature infrastructure for in-house ML chips (TPUs) for many years.

        • reply
          May 13, 2024 3:14 PM

          releasing in a few weeks after an unveil isn’t some weird schedule and is in line with the usual rollout speed when they announce new stuff (excluding Sora)

          • reply
            May 13, 2024 3:17 PM

            they might also be waiting til after wwdc if there actually is a major partnership coming down the pipe

            but regardless it’d be totally sane to release it slowly just to get a better handle on what bad actors will do with it, especially api access

            • reply
              May 13, 2024 3:20 PM

              It’s pretty rare for WWDC to have anything for broad release. Any potential iOS integration surely does not become available until the fall with the new iPhone launch.

              • reply
                May 13, 2024 4:03 PM

                i just meant something like apple might want to demo ‘smart siri’ or some other upcoming integration “powered by openai” without a million youtube chuds having posted 3 weeks of videos begging chatgpt to marry them

                • reply
                  May 13, 2024 4:48 PM

                  Adds more color to stories of frustration on the Apple Vision Pro team that Siri is so useless as a tool or interface for the device. Likewise can see why Zuck is so bearish on the Meta glasses given where they are on AI research

        • reply
          May 13, 2024 5:10 PM

          This will be bard tomorrow

          https://youtu.be/ksHw6ybWJHg?si=RaQGay9l8KOa8M9k

    • reply
      May 13, 2024 5:01 PM

      I had it read my palms.

    • reply
      June 1, 2024 12:32 AM

      ChatGPT is a fantastic technology
      You can use it at: https://chatgptsv.se/
