Voice AI: when is the "Her" moment? — Neil Zeghidour, Gradium AI

https://video.ut0pia.org/videos/watch/16399f27-d5b3-47ba-9bee-24b542b7dcd4

The "Her" moment has been promised so many times it's become a joke. Every new demo, every smooth-sounding voice agent gets called it. Neil Zeghidour, CEO of Gradium AI and one of the researchers behind Moshi, the first full-duplex voice model, uses this talk to be honest about where the gap actually is and why it has not closed.

The core tension: cascaded systems (speech-to-text, LLM, text-to-speech) are practical and getting smarter, but they're architecturally incapable of feeling like a real conversation. Latency from tool calls alone can be 500 ms to 4 seconds, while humans process and respond in around 200 ms total. Speech-to-speech models solve some of that but trade it for a different problem: they're still half-duplex, meaning they're either listening or talking but never both. That makes backchanneling (short listener cues such as "mm-hm" while the other person is still speaking) impossible and leaves the interaction feeling robotic in a different way.

Moshi showed that full-duplex is solvable. What it didn't solve was making the model useful. And cost is a wall hiding behind the latency problem: TTS at scale is expensive enough that some teams burn through their fundraising before they can grow a user base.

The most underrated thread in the talk is paralinguistic understanding: voice carries tone, hesitation, discomfort, and cultural signals that get entirely stripped out the moment you transcribe to text. Getting to Her means building models that don't just produce natural-sounding speech but actually understand what the voice is carrying, and that's a science problem, not a prompt engineering one.

Speaker info: https://x.com/neilzegh, https://www.linkedin.com/in/neil-zeghidour-a838aaa7/

Sun, 10 May 2026 16:35:21 GMT

PeerTube - https://video.ut0pia.org