Opus 4.8 is still very much blind - EyeBench-V3 visual benchmark (similar to IBench)

Reddit r/singularity News

Summary

The EyeBench-V3 visual benchmark, which is similar to IBench, evaluates Claude Opus 4.8 and finds that the model still fails basic vision tasks. Adonis Singh introduced the benchmark in a Twitter thread.

https://preview.redd.it/22texjo58l4h1.png?width=3340&format=png&auto=webp&s=73039f304a4ee253ca214b3378cc14a83909fc62

https://x.com/adonis_singh/status/2060133072482324521

https://x.com/search?q=eyebench-v3%20(from%3Aadonis_singh)&f=top&src=typed_query

https://x.com/adonis_singh/status/2031516746570469837 - benchmark introduction post