scripod.com

Claude Opus 4.6 vs. GPT-5.3 Codex: How I shipped 93,000 lines of code in 5 days

How I AI

Feb 11
In this episode, Claire Vo puts the latest AI coding models—OpenAI’s GPT-5.3 Codex and Anthropic’s Claude Opus 4.6 (including the faster variant)—to the test on real engineering tasks she’s actively shipping.
Vo benchmarks both models on her ChatPRD marketing site, redesigning it for enterprise customers rather than product-led growth (PLG) users. Codex excels at precise code review, architectural analysis, and Git-integrated workflows, especially with features like worktrees and scheduled automations, but it struggles with creative, open-ended tasks because it interprets prompts too literally and does little self-editing. In contrast, Opus 4.6 shines in generative, long-horizon development: it successfully refactored gnarly components, rebuilt key pages with brand-aligned visuals, and enabled rapid iteration toward production-ready output. Together, they form a complementary stack, with Opus handling ideation and implementation and Codex checking correctness and polish. This combination powered a large engineering sprint: 44 PRs, 98 commits, and over 1,000 files touched in five days. Vo also cautions about cost trade-offs, noting that Opus 4.6 Fast delivers speed but demands careful token budgeting.
00:04
OpenAI released GPT-5.3 Codex and Anthropic released Claude Opus 4.6 and Opus 4.6 Fast
02:14
The speaker selects redesigning the ChatPRD marketing site as a task to compare new models
06:11
Codex treats skills as first-class citizens with better presentation and recommended skills
09:07
GPT-5.3 Codex was used to redesign ChatPRD’s marketing site for enterprise appeal
10:40
GPT-5.3 Codex overfits to prompts and takes them too literally
15:06
The GPT model only redesigned the homepage and enterprise page instead of the whole site as requested
19:01
The new design matches the brand aesthetic, uses graphics, calls out numbers, and highlights reviews
20:56
Side-by-side comparisons alone are insufficient to gauge real-world model performance
21:42
Released MCP connectors on ChatPRD, enabling access to GitHub, Linear, and Claude in product work
23:04
Used Claude Opus 4.6's plan mode with Cursor to build extensible front-end components
24:32
Codex replicates the experience of a principal software engineer
26:57
Opus 4.6 Fast enabled the team to ship 44 PRs efficiently
28:52
Opus 4.6 is best for creative work; GPT-5.3 Codex excels at bug-catching and high-quality code