scripod.com

[State of RL/Reasoning] IMO/IOI Gold, OpenAI o3/GPT-5, and Cursor Composer — Ashvin Nair, Cursor

In this conversation, Ashvin Nair traces his evolution from robotics to the forefront of language model development, offering a candid look at the shifting landscape of AI research and deployment. He reflects on pivotal moments in his career and the broader industry, revealing how expectations, methodologies, and organizational structures have adapted—or failed to adapt—to rapid technological change.
Ashvin Nair discusses the limitations of early reinforcement learning and the pivot toward language models, where market potential far exceeds that of robotics. He recounts his time at OpenAI before ChatGPT's breakout, noting how progress in code generation outpaced expectations while benchmarks for AGI remain elusive. Organizational instability, especially around governance and leadership, has influenced technical direction more than pure capability. Despite setbacks, RL saw a resurgence in 2023 due to promising reasoning capabilities in smaller models, fueling rapid iteration across labs. At Cursor, tight integration between product and model development enables efficient, continuous learning from real user interactions. The company prioritizes practical improvements in data and reward systems over theoretical debates, focusing on tangible performance gains in coding tasks.
00:00
Robotics people are more grounded than other AI researchers.
14:27
OpenAI has abandoned the 'one model fits all' approach as of this year.
17:08
The current reasoning paradigm in AI may stem from organizational misalignment rather than technical limits.
29:32
Human-level intelligence might be reached around 2030.
39:23
Learning from a small number of deployment tokens doesn't overwhelm a model trained on trillions.
42:10
"Why is off-policy RL unstable?" is suggested as a strong interview question.
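The instability behind that interview question can be illustrated with a minimal sketch (my own example, not from the episode): off-policy TD(0) combined with function approximation and bootstrapping — the "deadly triad" — can diverge. Here two states share one weight `w`, with V(A) = w and V(B) = 2w; A transitions to B with reward 0, but the off-policy state distribution only ever updates A. The function `run_td` and its parameters are illustrative, not anything discussed in the interview.

```python
# Deadly-triad divergence sketch: off-policy TD(0) with linear features.
# Features: phi(A) = 1, phi(B) = 2, so V(A) = w and V(B) = 2w.
# A -> B deterministically with reward 0. Updating only state A gives
#   delta = 0 + gamma * 2w - w = (2*gamma - 1) * w,
# so each update multiplies w by (1 + alpha * (2*gamma - 1)),
# which exceeds 1 (divergence) whenever gamma > 0.5.

def run_td(gamma: float, alpha: float = 0.1, steps: int = 100,
           w0: float = 1.0) -> float:
    w = w0
    for _ in range(steps):
        delta = 0.0 + gamma * 2.0 * w - 1.0 * w  # TD error at state A
        w += alpha * delta * 1.0                 # gradient step on phi(A) = 1
    return w

if __name__ == "__main__":
    print(run_td(gamma=0.9))  # grows without bound: |w| explodes
    print(run_td(gamma=0.4))  # shrinks toward 0: stable regime
```

With gamma = 0.9 the weight blows up even though every individual update looks like a sensible gradient step; the on-policy distribution would also visit B, whose updates pull the estimate back, which is exactly the correction that off-policy sampling removes.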