@nick_kango: One more task to add to my twitter benchmark collection:) Btw, Opus 4.8 and all the SOTA models passed when i tried tha…

Summary

Nick Kang adds a new task to his Twitter benchmark collection. In his own runs, Claude Opus 4.8 and the other SOTA models passed, while Sonnet 4.6 and Grok 4.3 did not. Alfin (@AlfinCodes) calls Opus 4.8 "insane" and says Anthropic should not have released something this dangerous.

One more task to add to my twitter benchmark collection:) Btw, Opus 4.8 and all the SOTA models passed when i tried that, but sonnet 4.6 & Grok 4.3 didn't https://kaggle.com/benchmarks/tasks/nicholaskanggoog/days-with-d-puzzle…

Cached at: 05/31/26, 11:06 AM

Alfin (@AlfinCodes): Claude Opus 4.8 is insane.

Nothing will be the same after this model.

Anthropic should not have released something this dangerous.
