reasoning-benchmarks

#reasoning-benchmarks

HRM Seems To Be Going Off Right Now

Reddit r/LocalLLaMA · 2026-05-19

Sapient Intelligence has released HRM-Text, a 1B-parameter text generation model trained on only 0.04 trillion (40 billion) tokens at a cost of roughly $1,000. According to the announcement, it outperforms much larger models trained on 100 to 1,000 times more data on several reasoning benchmarks, a result the company frames as the start of a new paradigm for AI training.

@daniel_mac8: babe, wake up. new continual learning breakthrough just dropped. fast-slow training (fst) treats model params as "slow"…

X AI KOLs Timeline · 2026-05-17

This tweet announces Fast-Slow Training (FST), a new continual-learning method that treats model parameters as slow weights and optimized context as fast weights. The method reportedly outperforms weights-only training on math, code, and general reasoning benchmarks.

@dair_ai: // Harnessing Agentic Evolution // Pay attention to this one if you run iterative agentic search loops. (bookmark it) A…

X AI KOLs Following · 2026-05-14

AEvo is a meta-editing framework for iterative agentic search. It splits proposal and evaluation into two separate roles and uses accumulated memory to guide later search steps. The authors report a 26% relative gain over baselines and state-of-the-art results on open-ended optimization tasks.

From 0-Order Selection to 2-Order Judgment: Combinatorial Hardening Exposes Compositional Failures in Frontier LLMs

arXiv cs.CL · 2026-05-11

This paper introduces LogiHard, a framework that uses combinatorial hardening to expose compositional failures in frontier LLMs. The authors report substantial accuracy drops on logical reasoning tasks.