Agentic harness for theoretical physics research

Reddit r/LocalLLaMA 05/12/26, 05:23 PM Models

Summary

Hugging Face releases 'physics-intern', an agentic framework for theoretical physics research that doubles the performance of Gemini models on the CritPt benchmark and sets a new state of the art, ahead of models like GPT-5.5 Pro.

Hi everyone, at Hugging Face we've been developing agentic harnesses for various domains, and today we're releasing physics-intern to tackle research-level problems in theoretical physics. It's a multi-agent framework designed to mimic the research process: it decomposes the work into focused tasks that are dispatched to dedicated subagents (computing, reviewing claims, challenging the research strategy...).

Using physics-intern, we were able to double the performance of Gemini models on the CritPt benchmark and set a new SOTA, ahead of models like GPT-5.5 Pro, while being significantly cheaper :)

We wrote up how the framework was built in a blog post and hope it's useful for the community to build on: [https://huggingface.co/spaces/huggingface/physics-intern](https://huggingface.co/spaces/huggingface/physics-intern)
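The dispatch pattern described in the post (focused tasks routed to dedicated subagents) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not physics-intern's actual code or API: the `Task` and `Orchestrator` classes, the role names, and the stub subagents are all hypothetical, and each stub stands in for what would be a model-backed agent in a real harness.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Task:
    """One focused unit of work (hypothetical structure)."""
    role: str    # which subagent should handle the task
    prompt: str  # the instruction handed to that subagent


@dataclass
class Orchestrator:
    """Routes each task to the subagent registered for its role."""
    subagents: Dict[str, Callable[[str], str]]
    log: List[str] = field(default_factory=list)

    def dispatch(self, task: Task) -> str:
        if task.role not in self.subagents:
            raise KeyError(f"no subagent registered for role {task.role!r}")
        result = self.subagents[task.role](task.prompt)
        self.log.append(f"{task.role}: {result}")  # keep a trace of every step
        return result

    def run(self, tasks: List[Task]) -> List[str]:
        # Tasks are processed in order; a real harness might loop or branch.
        return [self.dispatch(task) for task in tasks]


# Stub subagents: placeholders for model calls, named after the roles
# mentioned in the post (computing, reviewing claims, challenging strategy).
def compute(prompt: str) -> str:
    return f"computed({prompt})"


def review(prompt: str) -> str:
    return f"reviewed({prompt})"


def challenge(prompt: str) -> str:
    return f"challenged({prompt})"


orchestrator = Orchestrator(
    subagents={"compute": compute, "review": review, "challenge": challenge}
)
results = orchestrator.run(
    [
        Task(role="compute", prompt="evaluate the integral"),
        Task(role="review", prompt="check the claimed result"),
        Task(role="challenge", prompt="question the chosen approach"),
    ]
)
print(results)
```

Keeping the role-to-subagent mapping in a plain dictionary makes it easy to swap a stub for a real model call without touching the dispatch logic; the blog post linked above is the reference for how the actual framework is organized.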

Similar Articles

@dlouapre: Meet physics-intern, our agentic framework for theoretical physics. It takes Gemini 3.1 Pro from 17.7% to 31.4% on Crit…

X AI KOLs Following

Physics-intern is an agentic framework for theoretical physics that improves Gemini 3.1 Pro's performance on the CritPt benchmark from 17.7% to 31.4%, achieving a new state-of-the-art.

@lvwerra: We released physics-intern: a simple harness for science problems! It gets models like Gemini 3.1 Pro to go from 17.7 -…

X AI KOLs Following

Hugging Face released physics-intern, a simple harness that significantly boosts the performance of reasoning models like Gemini 3.1 Pro on science problems, from 17.7 to 31.4, outperforming GPT-5.5 Pro.

@RoundtableSpace: HUGGING FACE JUST AUTOMATED THEIR ENTIRE POST-TRAINING TEAM WITH AN AGENT. It reads papers, runs GPU experiments, itera…

X AI KOLs Following

Hugging Face replaced its post-training team with an autonomous agent that reads papers, runs GPU experiments, and improves models, achieving a 22-point benchmark jump in under 10 hours and beating Codex on HealthBench by 60%.

InternScience/Agents-A1 · Hugging Face

Reddit r/LocalLLaMA

Agents-A1 is a 35B Mixture-of-Experts agentic model from InternScience that achieves competitive performance against frontier-scale systems like GPT-5.5 and DeepSeek-V4-pro using long-horizon trajectory scaling and multi-teacher multi-domain distillation.

HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry

Hugging Face Daily Papers

HarnessX is a foundry for composable, adaptive, and evolvable AI agent harnesses that uses compositional primitives and trace-driven evolution to improve agent performance. Across five benchmarks, it achieves an average gain of +14.5% (up to +44.0%), demonstrating that runtime interface evolution is a complementary lever to model scaling.
