Tag
EyeBench-V3 visual benchmark evaluates Claude Opus 4.8, finding it still fails basic vision tasks, similar to IBench. The benchmark is introduced via a Twitter thread by Adonis Singh.