@eglyman: we trained a .35b-parameter model to navigate spreadsheets better than opus 4.6. normal corporate card company stuff.

X AI KOLs Following 05/07/26, 09:58 PM Models

Summary

@eglyman says their team at a corporate card company trained a 0.35B-parameter (350M) model that, by their account, navigates spreadsheets better than Anthropic's Opus 4.6.

Original Article

Cached at: 05/08/26, 09:55 AM

we trained a .35b-parameter model to navigate spreadsheets better than opus 4.6.

normal corporate card company stuff.

Similar Articles

@startupideaspod: https://x.com/startupideaspod/status/2069494373604282771

X AI KOLs Timeline

GLM 5.2 is an open-source AI model with a 1M token context window and strong benchmark performance, narrowly trailing Opus 4.8. The episode provides a practical setup guide for local or cloud use with tools like Cursor and Codex, and emphasizes chaining models for cost efficiency.

I don’t believe this benchmark 27b size model next opus 4.5! Anyone can confirm testing with real agentic workflow?

Reddit r/LocalLLaMA

A 27B parameter model reportedly outperforms Opus 4.5 on a benchmark, prompting community skepticism and requests for real-world agentic workflow validation.

VibeThinker: 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO

Hacker News Top

This technical report introduces VibeThinker-3B, a 3B parameter dense model that achieves frontier-level reasoning performance on benchmarks like AIME26 and LiveCodeBench, matching or exceeding much larger models such as DeepSeek V3.2 and GLM-5 through a combination of curriculum-based SFT, multi-domain RL, and offline self-distillation.

A 4b model is now beating 30b ones at web research and the reason is not size

Reddit r/artificial

A 4 billion parameter open model from the Apodex family outperforms 30 billion parameter models on web research benchmarks, attributed to careful training data and self-verification techniques rather than raw scale, suggesting a more democratic trajectory for AI capability.

@orca_build: Anthropic’s new Opus 4.8 scores 3.6% lower than GPT 5.5 on Terminal-Bench 2.1… …but it’s noticeably better at UI tasks.…