Claude Opus 4.6 vs. GPT-5.3 Codex: How I shipped 93,000 lines of code in 5 days

How I AI

Feb 11

Shownote

I put the newest AI coding models from OpenAI and Anthropic head-to-head, testing them on real engineering work I’m actually doing. I compare GPT-5.3 Codex with Opus 4.6 (and Opus 4.6 Fast) by asking them to redesign my marketing website and refactor some ...

Highlights

In this episode, Claire Vo puts the latest AI coding models, OpenAI’s GPT-5.3 Codex and Anthropic’s Claude Opus 4.6 (including the faster Opus 4.6 Fast variant), to the test on real engineering tasks she’s actively shipping.

00:04

OpenAI released GPT-5.3 Codex and Anthropic released Claude Opus 4.6 and Opus 4.6 Fast

02:14

Claire picks a redesign of the ChatPRD marketing site as the task for comparing the new models

06:11

Codex treats skills as first-class citizens with better presentation and recommended skills

09:07

GPT-5.3 Codex was used to redesign ChatPRD’s marketing site for enterprise appeal

10:40

GPT-5.3 Codex overfits to prompts and takes them too literally

15:06

The GPT model only redesigned the homepage and enterprise page instead of the whole site as requested

19:01

The new design matches the brand aesthetic, uses graphics, calls out numbers, and highlights reviews

20:56

Side-by-side comparisons alone are insufficient to gauge real-world model performance

21:42

Released MCP connectors in ChatPRD, enabling access to GitHub, Linear, and Claude in product work

23:04

Used Claude Opus 4.6's plan mode with Cursor to build extensible front-end components

24:32

Codex replicates the experience of a principal software engineer

26:57

Opus 4.6 Fast enabled the team to ship 44 PRs efficiently

28:52

Opus 4.6 is best for creative work; GPT-5.3 Codex excels at bug-catching and high-quality code

Chapters

Introduction to new AI coding models

00:00

My test methodology for comparing models

02:13

Codex’s unique features: Git primitives, skills, and automations

03:30

Testing GPT-5.3 Codex on a website redesign task

09:05

Challenges with Codex’s literal interpretation of prompts

10:40

Comparing the before and after with Codex

15:00

Testing Opus 4.6 on the same website redesign task

16:23

Comparing the visual results of both models

20:56

Real-world engineering impact: 44 PRs in five days

21:30

Refactoring components with Opus 4.6

23:03

Using Codex for code review and architectural analysis

24:30

Cost considerations for Opus 4.6 Fast

26:55

Conclusion

28:52

Transcript

Claire Vo: Welcome back to How I AI. I'm Claire Vo, product leader and AI obsessive, here on a mission to help you build better with these new tools. Today, we're going to bring you up to date on all the new coding model releases from OpenAI and Anthropic....