Opus 4.8 is still very much blind - EyeBench-V3 visual benchmark (similar to IBench)
Summary
The EyeBench-V3 visual benchmark, which is similar to IBench, evaluates Claude Opus 4.8 and finds that it still fails basic vision tasks. The benchmark is introduced in a Twitter thread by Adonis Singh.
Similar Articles
@ItsmeAjayKV: Achievement Unlocked: Running Qwen3.6-27b dense Thanks to the RTX 3090, now I can do this. Running @Alibaba_Qwen Qwen 3…
A user benchmarks Qwen3.6-27B on an RTX 3090 using llama.cpp, achieving 35 tok/s for generation and 1247 tok/s for prompt processing.
A 4b model is now beating 30b ones at web research and the reason is not size
A 4 billion parameter open model from the Apodex family outperforms 30 billion parameter models on web research benchmarks, attributed to careful training data and self-verification techniques rather than raw scale, suggesting a more democratic trajectory for AI capability.
@antirez: OpenAI may delay GPT6 (or even 5.6) before making sure could not be blocked like Fable. Or they could play it smart, pu…
Salvatore Sanfilippo speculates that OpenAI might delay GPT-6 (or 5.6) to avoid having it blocked as Fable was, and suggests that it could instead selectively publish benchmarks and release a model censored for cybersecurity.
GLM-5.2 is the new leading open weights model on Artificial Analysis
Z.ai's GLM-5.2 has become the new leading open weights model on the Artificial Analysis Intelligence Index, scoring 51 and outperforming competitors such as MiniMax-M3 and DeepSeek V4 Pro. The model has 744B total parameters (40B active), an MIT license, and a 1M context window.
GLM-5.2 (max) is currently the third best model available, across both open and proprietary.
GLM-5.2 (max) currently ranks as the third-best AI model overall on the Artificial Analysis Intelligence Index. The article includes a detailed analysis of intelligence, openness, cost, and token usage.