Agentic harness for theoretical physics research
Summary
Hugging Face releases 'physics-intern', an agentic framework for theoretical physics research that roughly doubles the performance of Gemini models on the CritPt benchmark and sets a new state of the art, surpassing GPT-5.5 Pro.
Similar Articles
@dlouapre: Meet physics-intern, our agentic framework for theoretical physics. It takes Gemini 3.1 Pro from 17.7% to 31.4% on Crit…
Physics-intern is an agentic framework for theoretical physics that raises Gemini 3.1 Pro's score on the CritPt benchmark from 17.7% to 31.4%, setting a new state of the art.
@lvwerra: We released physics-intern: a simple harness for science problems! It gets models like Gemini 3.1 Pro to go from 17.7 -…
physics-intern is a simple harness that significantly boosts the performance of reasoning models on science problems, taking Gemini 3.1 Pro from 17.7 to 31.4 and outperforming GPT 5.5 Pro.
@RoundtableSpace: HUGGING FACE JUST AUTOMATED THEIR ENTIRE POST-TRAINING TEAM WITH AN AGENT. It reads papers, runs GPU experiments, itera…
Hugging Face replaced its post-training team with an autonomous agent that reads papers, runs GPU experiments, and improves models, achieving a 22-point benchmark jump in under 10 hours and beating Codex on HealthBench by 60%.
InternScience/Agents-A1 · Hugging Face
Agents-A1 is a 35B Mixture-of-Experts agentic model from InternScience that achieves competitive performance against frontier-scale systems like GPT-5.5 and DeepSeek-V4-pro using long-horizon trajectory scaling and multi-teacher multi-domain distillation.
HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry
HarnessX is a foundry for composable, adaptive, and evolvable AI agent harnesses that uses compositional primitives and trace-driven evolution to improve agent performance. Across five benchmarks, it achieves an average gain of +14.5% (up to +44.0%), demonstrating that runtime interface evolution is a complementary lever to model scaling.