scripod.com

Claude Opus 4.6 vs. GPT-5.3 Codex: How I shipped 93,000 lines of code in 5 days

How I AI

Feb 11

Show notes

I put the newest AI coding models from OpenAI and Anthropic head-to-head, testing them on real engineering work I’m actually doing. I compare GPT-5.3 Codex with Opus 4.6 (and Opus 4.6 Fast) by asking them to redesign my marketing website and refactor some ...

Highlights

In this episode, Claire Vo puts the latest AI coding models—OpenAI’s GPT-5.3 Codex and Anthropic’s Claude Opus 4.6 (including the faster variant)—to the test on real engineering tasks she’s actively shipping.
00:04
OpenAI released GPT-5.3 Codex and Anthropic released Claude Opus 4.6 and Opus 4.6 Fast
02:14
The speaker selects redesigning the ChatPRD marketing site as a task to compare new models
06:11
Codex treats skills as first-class citizens with better presentation and recommended skills
09:07
GPT-5.3 Codex was used to redesign ChatPRD’s marketing site for enterprise appeal
10:40
GPT-5.3 Codex overfits to prompts and takes them too literally
15:06
The GPT model only redesigned the homepage and enterprise page instead of the whole site as requested
19:01
The new design matches the brand aesthetic, uses graphics, calls out numbers, and highlights reviews
20:56
Side-by-side comparisons alone are insufficient to gauge real-world model performance
21:42
Released MCP connectors on ChatPRD, enabling access to GitHub, Linear, and Claude in product work
23:04
Used Claude Opus 4.6's plan mode with Cursor to build extensible front-end components
24:32
Codex replicates the experience of a principal software engineer
26:57
Opus 4.6 Fast enabled the team to ship 44 PRs efficiently
28:52
Opus 4.6 is best for creative work; GPT-5.3 Codex excels at bug-catching and high-quality code

Chapters

Introduction to new AI coding models
00:00
My test methodology for comparing models
02:13
Codex’s unique features: Git primitives, skills, and automations
03:30
Testing GPT-5.3 Codex on a website redesign task
09:05
Challenges with Codex’s literal interpretation of prompts
10:40
Comparing the before and after with Codex
15:00
Testing Opus 4.6 on the same website redesign task
16:23
Comparing the visual results of both models
20:56
Real-world engineering impact: 44 PRs in five days
21:30
Refactoring components with Opus 4.6
23:03
Using Codex for code review and architectural analysis
24:30
Cost considerations for Opus 4.6 Fast
26:55
Conclusion
28:52

Transcript

Claire Vo: Welcome back to How I AI. I'm Claire Vo, product leader and AI obsessive, here on a mission to help you build better with these new tools. Today, we're going to bring you up to date on all the new coding model releases from OpenAI and Anthropic....