Agentic harness for theoretical physics research

Reddit r/LocalLLaMA 05/12/26, 05:23 PM Models

Summary

Hugging Face releases 'physics-intern', an agentic framework for theoretical physics research that doubles the performance of Gemini models on the CritPt benchmark and sets a new state of the art, ahead of models such as GPT-5.5 Pro.

Hi everyone, at Hugging Face we've been developing agentic harnesses for various domains, and today we're releasing physics-intern to tackle research-level problems in theoretical physics.

It's a multi-agent framework designed to mimic the research process: it decomposes the work into several focused tasks that are dispatched to dedicated subagents (computing, reviewing claims, challenging the research strategy...).

Using physics-intern, we were able to double the performance of Gemini models on the CritPt benchmark and set a new SOTA, ahead of models like GPT-5.5 Pro, while being significantly cheaper :)

We wrote up how our framework was built in a blog post and hope it's useful for the community to build on: [https://huggingface.co/spaces/huggingface/physics-intern](https://huggingface.co/spaces/huggingface/physics-intern)
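For readers who want a feel for the "decompose, then dispatch to dedicated subagents" idea mentioned in the post, here is a minimal sketch. It is not the physics-intern code or API: the `Task` class, the `decompose` and `run` functions, and the three stub subagents are all hypothetical names made up for illustration, and the stubs just tag their input instead of calling a model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    role: str    # which subagent should handle this task
    prompt: str  # the focused instruction for that subagent


# Hypothetical stand-ins for model-backed subagents; each only tags its input.
def compute(prompt: str) -> str:
    return f"[compute] {prompt}"


def review(prompt: str) -> str:
    return f"[review] {prompt}"


def challenge(prompt: str) -> str:
    return f"[challenge] {prompt}"


# Registry mapping a role name to the subagent that handles it.
SUBAGENTS: Dict[str, Callable[[str], str]] = {
    "compute": compute,
    "review": review,
    "challenge": challenge,
}


def decompose(problem: str) -> List[Task]:
    """Split one problem into focused tasks (fixed plan, for illustration only)."""
    return [
        Task("compute", f"Carry out the calculation for: {problem}"),
        Task("review", f"Check each claim made about: {problem}"),
        Task("challenge", f"Question the strategy used for: {problem}"),
    ]


def run(problem: str) -> List[str]:
    """Dispatch every task to the subagent registered for its role."""
    return [SUBAGENTS[task.role](task.prompt) for task in decompose(problem)]


if __name__ == "__main__":
    for line in run("ground-state energy of a toy model"):
        print(line)
```

In a real harness the plan would presumably be produced by a model rather than hard-coded, and subagent outputs would feed back into later steps; the blog post linked above is the place to look for how physics-intern actually does it.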

