Agentic harness for theoretical physics research

Reddit r/LocalLLaMA Models

Summary

Hugging Face releases 'physics-intern', an agentic framework for theoretical physics research that doubles the performance of Gemini models on the CritPt benchmark and sets a new state of the art there, ahead of models such as GPT-5.5 Pro.

Hi everyone, at Hugging Face we've been developing agentic harnesses for various domains, and today we're releasing physics-intern to tackle research-level problems in theoretical physics.

It's a multi-agent framework designed to mimic the research process: it decomposes the work into several focused tasks that are dispatched to dedicated subagents (computing, reviewing claims, challenging the research strategy, ...).

Using physics-intern, we were able to double the performance of Gemini models on the CritPt benchmark and set a new SOTA compared to models like GPT-5.5 Pro, while being significantly cheaper :)

We wrote up how our framework was built in a blog post and hope it's useful for the community to build on: [https://huggingface.co/spaces/huggingface/physics-intern](https://huggingface.co/spaces/huggingface/physics-intern)
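To make the "focused tasks dispatched to dedicated subagents" idea concrete, here is a minimal Python sketch of that dispatch pattern. It is not the physics-intern code or API: the names `Task`, `Orchestrator`, `compute`, `review`, `challenge` and `build_orchestrator` are hypothetical, and the subagents are stub functions standing in for what would presumably be model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Task:
    role: str    # which subagent should handle this task
    prompt: str  # what the subagent is asked to do


@dataclass
class Orchestrator:
    # Maps a role name to a handler. Assumption: in a real harness each
    # handler would wrap a model call; here they are plain functions.
    subagents: Dict[str, Callable[[str], str]]
    log: List[str] = field(default_factory=list)

    def dispatch(self, task: Task) -> str:
        """Send one focused task to the subagent registered for its role."""
        if task.role not in self.subagents:
            raise KeyError(f"no subagent registered for role {task.role!r}")
        result = self.subagents[task.role](task.prompt)
        self.log.append(f"{task.role}: {result}")
        return result

    def run(self, tasks: List[Task]) -> List[str]:
        """Dispatch the tasks in order and collect the results."""
        return [self.dispatch(task) for task in tasks]


# Hypothetical stub subagents, one per role named in the post.
def compute(prompt: str) -> str:
    return f"[computed] {prompt}"


def review(prompt: str) -> str:
    return f"[reviewed] {prompt}"


def challenge(prompt: str) -> str:
    return f"[challenged] {prompt}"


def build_orchestrator() -> Orchestrator:
    return Orchestrator(
        subagents={"compute": compute, "review": review, "challenge": challenge}
    )


if __name__ == "__main__":
    orchestrator = build_orchestrator()
    plan = [
        Task("compute", "evaluate the integral"),
        Task("review", "check the claimed result"),
        Task("challenge", "is this the right strategy?"),
    ]
    for line in orchestrator.run(plan):
        print(line)
```

Keeping each role behind a plain callable means a stub can be swapped for a real model-backed subagent without touching the dispatch logic; how physics-intern itself wires this up is described in the linked write-up.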
