scripod.com

Claude Opus 4.8 is here. Is it as good as they say?

How I AI

23 HOURS AGO

Shownote

I got a few hours of early-access testing with Anthropic’s newly released model Opus 4.8. I walk through real coding, design, and strategy tasks across Claude Code and Claude Cowork, and give you my unfiltered view on what impressed me and what didn’t. — ...

Highlights

In this episode, Claire Vo shares her early hands-on experience with Anthropic's newly released Claude Opus 4.8 model. She walks through a series of real-world tests in coding, design, and business strategy, offering an unfiltered look at where the model shines and where it falls short.
00:03
Claude Opus 4.8 excels in coding but struggles with edge cases
00:44
Claude Opus 4.8 is exciting on paper
01:53
Struggled with the last 10% of tasks
03:00
Excels at initial implementation but struggles with edge cases and hallucinates during bug hunting
03:29
Model fabricates information based on hypotheses, not data
04:23
Excels at one-shot features on new surface areas
05:24
Lacked ambition and struggled with the last 10%
07:03
Opus 4.7 used data contextually and zoomed out
09:17
Opus 4.7 provides specific, grounded strategy
09:28
Overconfidence and narrow focus lead to inaccuracies

Chapters

Introduction to Opus 4.8
00:00
Benchmark performance and pricing
00:44
First coding test: Building a prototyping tool
01:53
Where it failed: The last 10% problem
03:00
The hallucination problem
03:27
Testing Opus 4.8 on existing codebases
04:23
The ambition test: Building games for a 9-year-old
05:24
Business strategy test: 4.7 vs 4.8
07:03
The roadmap test
08:23
Final verdict
09:17

Transcript

Claire Vo: Welcome back to How I AI. I'm Claire Vo, product leader and AI obsessive, here on a mission to help you build better with these new tools. Today, we have a very special mini episode because Anthropic just dropped Claude Opus 4.8, their latest st...