ChatGPT voice mode still runs on GPT-4o while Codex gets the frontier models

Simon Willison highlights an underappreciated gap in OpenAI’s product surface: the voice mode most casual users interact with reports an April 2024 knowledge cutoff, marking it as a GPT-4o-era model rather than anything close to current frontier capability. Users naturally assume the conversational interface represents the smartest version of the AI, but in practice it is one of the weakest deployed tiers.

Karpathy’s quoted observation explains why the divergence keeps widening. Coding and similar domains offer verifiable reward signals (unit tests either pass or fail), which makes them ideal targets for reinforcement learning. They also dominate B2B revenue, so engineering effort concentrates there. Open-ended tasks like voice conversation lack clean reward functions and carry less commercial weight, so they stagnate on older base models.
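To make "verifiable reward" concrete, here is a minimal sketch, assuming a hypothetical `pass_fail_reward` helper that scores a model-generated function against unit-test cases. It only illustrates the general idea of a binary, automatically checkable reward; it is not OpenAI's actual training setup, which is not described in the source.

```python
from typing import Any, Callable, Sequence, Tuple

# Hypothetical test case format: (positional arguments, expected return value).
Case = Tuple[Tuple[Any, ...], Any]


def pass_fail_reward(candidate: Callable[..., Any], cases: Sequence[Case]) -> float:
    """Hypothetical binary reward: 1.0 only if every test case passes, else 0.0."""
    for args, expected in cases:
        try:
            if candidate(*args) != expected:
                return 0.0  # wrong answer: the test fails
        except Exception:
            return 0.0  # code that crashes also counts as a failure
    return 1.0


# Two illustrative "model outputs" for the task "add two integers".
def correct_add(a, b):
    return a + b


def buggy_add(a, b):
    return a - b


cases = [((1, 2), 3), ((0, 0), 0), ((-4, 4), 0)]
print(pass_fail_reward(correct_add, cases))  # 1.0
print(pass_fail_reward(buggy_add, cases))    # 0.0
```

The reward is computed mechanically, with no human judgment involved. There is no comparably simple function for scoring whether a spoken reply in a casual voice conversation was good, which is the asymmetry Karpathy points to.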

The practical consequence is a public perception problem. Someone judging AI by an Advanced Voice Mode clip in an Instagram reel and someone watching Codex spend an hour restructuring a codebase or surfacing security vulnerabilities are evaluating fundamentally different systems sold under the same brand.