New bench designed for smaller models: ObviousBench.com
Summary
ObviousBench is a new benchmark designed specifically for evaluating smaller AI models.
Similar Articles
Introducing BenchBench (5 minute read)
Introduces BenchBench, a benchmark that tests AI models' ability to create effective benchmarks for other models, with GPT 5.2 being the only successful winner so far while frontier models like GPT 5.5 and Opus 4.6 struggled.
MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI
This paper introduces MLS-Bench, a benchmark designed to assess whether AI systems can invent generalizable and scalable machine learning methods rather than just performing engineering tuning.
ProgramBench (5 minute read)
ProgramBench is a new benchmark that evaluates AI agents' ability to reconstruct complete software projects from compiled binaries and documentation without access to source code or decompilation tools.
HuggingFace benchmark datasets now let you filter by model size
HuggingFace benchmark datasets now allow filtering by model size, enabling comparisons like 'best model under 32B on swebenchverified'.
Introducing HealthBench
OpenAI introduces HealthBench, a new benchmark for evaluating AI systems in healthcare contexts, created with 262 physicians across 60 countries. The benchmark includes 5,000 realistic health conversations with physician-written rubrics to assess model performance on meaningful, trustworthy, and improvable metrics.