scripod.com

Claude Opus 4.8 is here. Is it as good as they say?

How I AI

1 DAY AGO
In this episode, Claire Vo shares her early hands-on experience with Anthropic's newly released Claude Opus 4.8 model. She walks through a series of real-world tests in coding, design, and business strategy, offering an unfiltered look at where the model shines and where it falls short.
Claire found Opus 4.8 impressive for greenfield prototypes and one-shot feature implementation: it autonomously coded a working prototype in about 20 minutes. However, it consistently struggled with the 'last 10%' of tasks, including edge cases, finishing touches, and bug hunting, where it often hallucinated, fabricating information from hypotheses rather than data. On existing codebases, it needed multiple cycles to fix bugs, and it lacked ambition in creative coding tasks like building a game. In business strategy tests, Opus 4.7 outperformed 4.8 by using data contextually and providing grounded insights, while 4.8 over-rotated on small data points and produced hand-wavy roadmaps. Claire recommends Opus 4.8 for fast, one-shot tasks and new projects but advises caution with existing codebases and strategic work. She also highlights new features such as dynamic workflows and effort control.
00:03
Claude Opus 4.8 excels in coding but struggles with edge cases
00:44
Claude Opus 4.8 is exciting on paper.
01:53
Struggled with the last 10% of tasks
03:00
Excels at initial implementation but struggles with edge cases and hallucinates during bug hunting
03:29
Model fabricates information based on hypotheses, not data
04:23
Excels at one-shot features on new surface areas
05:24
Lacked ambition and struggled with the last 10%
07:03
Opus 4.7 used data contextually and zoomed out
09:17
Opus 4.7 provides specific, grounded strategy.
09:28
Overconfidence and narrow focus lead to inaccuracies.