Opus 4.8 is still very much blind - EyeBench-V3 visual benchmark (similar to IBench)

Reddit r/singularity, 06/01/26, 03:13 AM (News)

Tags: benchmark, visual-ai, model-evaluation, eyebench, opus, ai-vision

Summary

EyeBench-V3, a visual benchmark similar to IBench, evaluates Claude Opus 4.8 and finds that the model still fails basic vision tasks. The benchmark was introduced in a Twitter (X) thread by Adonis Singh.

Image: https://preview.redd.it/22texjo58l4h1.png?width=3340&format=png&auto=webp&s=73039f304a4ee253ca214b3378cc14a83909fc62
Thread: https://x.com/adonis_singh/status/2060133072482324521
EyeBench-V3 posts from Adonis Singh: https://x.com/search?q=eyebench-v3%20(from%3Aadonis_singh)&f=top&src=typed_query
Benchmark introduction post: https://x.com/adonis_singh/status/2031516746570469837

