OpenThoughts-Agent: Data Recipes for Agentic Models
Summary
This paper introduces OpenThoughts-Agent, an open-source data curation pipeline for training agentic language models, achieving a 44.8% average accuracy across seven benchmarks and outperforming prior open datasets through systematic experiments.
View Cached Full Text
Cached at: 06/24/26, 05:47 AM
Paper page - OpenThoughts-Agent: Data Recipes for Agentic Models
Source: https://huggingface.co/papers/2606.24855 Authors:
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
Abstract
An open-source data curation pipeline for training agentic language models is presented, demonstrating superior performance through systematic experimentation and scalable training data.
Agentic language modelsdramatically expand the applications of AI yet little is publicly known about how to curatetraining datafor broadly capable agents. Existing open efforts such as SWE-Smith, SERA, and Nemotron-Terminal typically target a single benchmark, leaving open the question of how to train models that generalize across diverse agentic tasks. The OpenThoughts-Agent (OT-Agent) project addresses this gap with a fully opendata curation pipelinefor training agentic models. We conduct more than 100controlled ablation experimentsto systematically investigate each stage of the pipeline, yielding insights on the importance of task sources and diversity. We then assemble a training set of 100K examples from our pipeline andfine-tuneQwen3-32B on this dataset, which yields an average accuracy of 44.8% across seven agenticbenchmarksand a 3.9 percentage point improvement over the strongest existing open data agentic model (Nemotron-Terminal-32B, 40.9%). Moreover, ourtraining dataexhibits strongscaling properties, outperforming alternative open datasets at every training set size in compute-controlled comparisons. We publicly release our training sets, data pipeline, experimental data, and models at openthoughts.ai to support future open research on agentic model training.
View arXiv pageView PDFAdd to collection
Get this paper in your agent:
hf papers read 2606\.24855
Don’t have the latest CLI?curl \-LsSf https://hf\.co/cli/install\.sh \| bash
Models citing this paper0
No model linking this paper
Cite arxiv.org/abs/2606.24855 in a model README.md to link it from this page.
Datasets citing this paper0
No dataset linking this paper
Cite arxiv.org/abs/2606.24855 in a dataset README.md to link it from this page.
Spaces citing this paper0
No Space linking this paper
Cite arxiv.org/abs/2606.24855 in a Space README.md to link it from this page.
Collections including this paper0
No Collection including this paper
Add this paper to acollectionto link it from this page.
Similar Articles
Is it agentic enough? Benchmarking open models on your own tooling
This blog post introduces a benchmark methodology for evaluating how well open models perform on agentic coding tasks, focusing not just on accuracy but on the efficiency of the agent's process. It provides a customizable tooling harness using the pi coding agent and tests across models and library revisions.
Neurodata Without Boredom: Benchmarking Agentic AI for Data Reuse
This paper benchmarks agentic AI systems on the task of loading, understanding, and reformatting fragmented neuroscience data, finding that while agents perform well on subtasks, they rarely achieve fully error-free end-to-end solutions and human oversight remains necessary.
@omarsar0: Karpathy's autoresearch repo started an impressive trend. Agents can now train AI models to build SoTA agentic systems.…
Karpathy's autoresearch repository has sparked a trend where agents train AI models to build state-of-the-art agentic systems, highlighting current limitations in LLM-driven hypothesis generation.
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
OpenSearch-VL is an open-source framework and paper introducing a recipe for training frontier multimodal search agents using reinforcement learning, featuring specialized data curation and a novel training algorithm.
Experiments in Agentic AI for Science
This paper presents two agentic AI frameworks, DeepTS/DeepCollector and DeepScribe, that automate scientific workflows including time-series data curation and conversion of physics lectures into structured reports, using a hybrid local-cloud architecture with LLMs.