@eglyman: we trained a .35b-parameter model to navigate spreadsheets better than opus 4.6. normal corporate card company stuff.
Summary
A corporate card company says it trained a 0.35B-parameter (350M) model that navigates spreadsheets better than Anthropic's Opus 4.6.
Cached at: 05/08/26, 09:55 AM
we trained a .35b-parameter model to navigate spreadsheets better than opus 4.6.
normal corporate card company stuff.
Similar Articles
@startupideaspod: https://x.com/startupideaspod/status/2069494373604282771
GLM 5.2 is an open-source AI model with a 1M token context window and strong benchmark performance, narrowly trailing Opus 4.8. The episode provides a practical setup guide for local or cloud use with tools like Cursor and Codex, and emphasizes chaining models for cost efficiency.
I don’t believe this benchmark 27b size model next opus 4.5! Anyone can confirm testing with real agentic workflow?
A 27B parameter model reportedly outperforms Opus 4.5 on a benchmark, prompting community skepticism and requests for real-world agentic workflow validation.
VibeThinker: 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO
This technical report introduces VibeThinker-3B, a 3B parameter dense model that achieves frontier-level reasoning performance on benchmarks like AIME26 and LiveCodeBench, matching or exceeding much larger models such as DeepSeek V3.2 and GLM-5 through a combination of curriculum-based SFT, multi-domain RL, and offline self-distillation.
A 4b model is now beating 30b ones at web research and the reason is not size
A 4 billion parameter open model from the Apodex family outperforms 30 billion parameter models on web research benchmarks, attributed to careful training data and self-verification techniques rather than raw scale, suggesting a more democratic trajectory for AI capability.
@orca_build: Anthropic’s new Opus 4.8 scores 3.6% lower than GPT 5.5 on Terminal-Bench 2.1… …but it’s noticeably better at UI tasks.…
Anthropic's Opus 4.8 scores 3.6% lower than GPT 5.5 on Terminal-Bench 2.1 but excels at UI tasks; Orca's orchestration enables Codex to delegate UI tasks to Claude Code.