@lvwerra: We released physics-intern: a simple harness for science problems! It gets models like Gemini 3.1 Pro to go from 17.7 -…
Summary
Released physics-intern, a simple harness that significantly boosts the performance of reasoning models such as Gemini 3.1 Pro on science problems, from 17.7 to 31.4, outperforming GPT-5.5 Pro.
Cached at: 05/21/26, 05:35 PM
We released physics-intern: a simple harness for science problems!
It gets models like Gemini 3.1 Pro to go from 17.7 -> 31.4, thus beating GPT 5.5 Pro.
The physics-intern harness can wrap any model and, via dedicated subagents, boost the performance of vanilla reasoning models.
While I think more and more of these harness-driven capability gains will be absorbed into the models (just as prompting tricks disappeared over time), there is a lot to be gained right now by building good scaffolds for these models and integrating tools well.
Interestingly, the one exception we found is GPT 5.5 Pro, which actually didn’t benefit from the physics-intern harness!
Read more about it here: https://huggingface.co/spaces/huggingface/physics-intern…
PS: I think the Harness[Model] notation is kind of nice.
physics-intern: an Autonomous Agent for Physics Research - a Hugging Face Space by huggingface
Source: https://huggingface.co/spaces/huggingface/physics-intern
Similar Articles
@dlouapre: Meet physics-intern, our agentic framework for theoretical physics. It takes Gemini 3.1 Pro from 17.7% to 31.4% on Crit…
Physics-intern is an agentic framework for theoretical physics that improves Gemini 3.1 Pro's performance on the CritPt benchmark from 17.7% to 31.4%, setting a new state of the art.
Agentic harness for theoretical physics research
Hugging Face releases 'physics-intern', an agentic framework for theoretical physics research that nearly doubles the performance of Gemini models on the CritPt benchmark and sets a new state of the art, surpassing GPT-5.5 Pro.
Gemini 3.1 Pro: A smarter model for your most complex tasks
Google releases Gemini 3.1 Pro, an upgraded AI model with significantly improved reasoning capabilities for complex tasks, rolling out to developers, enterprises, and consumers.
Advancing science and math with GPT-5.2
OpenAI releases GPT-5.2, featuring GPT-5.2 Pro and GPT-5.2 Thinking variants optimized for scientific and mathematical work. The models achieve state-of-the-art performance on benchmarks like GPQA Diamond (93.2%) and FrontierMath (40.3%), demonstrating improved reasoning capabilities designed to accelerate scientific research across physics, chemistry, biology, and mathematics.
Start building with Gemini 3
Google has launched Gemini 3 Pro, a new AI model designed to outperform previous versions in coding, agentic workflows, and multimodal reasoning. The model is available via the Gemini API, Google AI Studio, and the new Google Antigravity development platform.