@tom_doerr: Fully open sources training data for 30B scale search agents https://github.com/PolarSeeker/OpenSeeker…


Summary

OpenSeeker fully open-sources training data and models for 30B-scale ReAct-based search agents, achieving state-of-the-art performance on multiple benchmarks including BrowseComp and Humanity's Last Exam. It is the first purely academic project to reach frontier search benchmark performance while releasing complete training data.



PolarSeeker/OpenSeeker

Source: https://github.com/PolarSeeker/OpenSeeker

OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data

Arxiv · OpenSeeker-v1 | Arxiv · OpenSeeker-v2 | HF Dataset | HF Models
Benchmark Results

📰 News

  • 2026.05.06 📣 Our OpenSeeker-v2 achieves state-of-the-art performance across four benchmarks among 30B-scale ReAct-based search agents with simple SFT: 46.0% on BrowseComp, 58.1% on BrowseComp-ZH, 34.6% on Humanity’s Last Exam, and 78.0% on xbench, surpassing even Tongyi DeepResearch, which is trained with a heavy CPT+SFT+RL pipeline. Our code is coming soon!

  • 2026.03.17 🚀 We open-sourced OpenSeeker-v1 (all data and models). Using 11.7K training examples, we fine-tuned Qwen3-30B-A3B-Thinking-2507 and achieved scores of 48.4% on BrowseComp-ZH, 29.5% on BrowseComp, 74.0% on xbench-DeepSearch, and 59.4% on WideSearch.
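The releases above are ReAct-based search agents: the model alternates between emitting a thought/action and observing the tool's result until it commits to an answer. A minimal sketch of that loop, with hypothetical `search` and `visit` stand-ins (OpenSeeker's real tool implementations live in `src/tools/search.py` and `src/tools/visit.py`, and its action format may differ):

```python
# Minimal ReAct-style loop sketch. The tool functions and the
# "Action: tool[arg]" / "Answer: ..." format are illustrative
# assumptions, not OpenSeeker's actual interface.

def search(query: str) -> str:
    """Hypothetical stand-in for the repo's search tool."""
    return f"[top results for: {query}]"

def visit(url: str) -> str:
    """Hypothetical stand-in for the repo's web-visit tool."""
    return f"[page text of: {url}]"

TOOLS = {"search": search, "visit": visit}

def parse_action(model_output: str):
    """Extract (tool_name, argument) from an 'Action: tool[arg]' line."""
    action = model_output.split("Action:", 1)[1].strip()
    name, arg = action.split("[", 1)
    return name.strip(), arg.rstrip("]")

def run_agent(question: str, model, max_turns: int = 4) -> str:
    """Alternate model calls and tool calls until the model answers."""
    transcript = f"Question: {question}"
    for _ in range(max_turns):
        out = model(transcript)  # model emits Thought/Action, or a final Answer
        if "Answer:" in out:
            return out.split("Answer:", 1)[1].strip()
        name, arg = parse_action(out)
        observation = TOOLS[name](arg)  # execute the chosen tool
        transcript += f"\n{out}\nObservation: {observation}"
    return "no answer within turn budget"
```

SFT on search trajectories, as used here, teaches the model to emit these thought/action turns directly, without a separate RL stage.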

Overview

OpenSeeker is an open-source search agent system that democratizes access to frontier search capabilities by fully open-sourcing its training data. This project enables researchers and developers to build, evaluate, and deploy advanced search agents for complex information-seeking tasks.

Quick Start

Installation

Clone the repository and set up the environment:

# Clone repository
git clone https://github.com/PolarSeeker/OpenSeeker.git
cd OpenSeeker

# Create conda environment
conda create --name openseeker python=3.10
conda activate openseeker
pip install -r requirements.txt

Model Setup

Download and deploy the OpenSeeker model:

# 1. Install git-xet (required for downloading the model)
brew install git-xet
git xet install

# 2. Clone the OpenSeeker model repository
git clone https://huggingface.co/OpenSeeker/OpenSeeker-v1-30B-SFT

# 3. Update MODEL_PATH in run_openseeker.sh to point to the downloaded model directory
# Edit run_openseeker.sh and set MODEL_PATH="/path/to/OpenSeeker-v1-30B-SFT"

# 4. Deploy the model server
bash run_openseeker.sh

Configuration

# Edit setup_env.sh with your API endpoints and keys
source setup_env.sh
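As a rough sketch of what `setup_env.sh` may need to export, assuming a search-API backend, an LLM judge for evaluation, and the locally deployed model server (the variable names below are assumptions; consult the template itself for the real ones):

```shell
# Illustrative only -- variable names are assumed, not taken from the repo.
export SEARCH_API_KEY="your-search-api-key"    # backend for the search tool (assumed)
export JUDGE_API_KEY="your-judge-api-key"      # LLM judge used by eval.py (assumed)
export MODEL_URL="http://localhost:8000/v1"    # endpoint served by run_openseeker.sh (assumed)
```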

Usage

Generate answers and evaluate results:

# Generate answers for your dataset
python eval/generate_answer.py \
    --dataset_path /path/to/your/dataset.jsonl \
    --out_dir /path/to/output/directory

# Evaluate the generated results
python eval/eval.py \
    --data_path /path/to/output/directory/result_tool200.jsonl \
    --max_workers 20
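`generate_answer.py` takes a JSONL dataset, one record per line. The schema is not shown in this README, so the field names below (`question`, `answer`) are assumptions; check the eval scripts for the fields actually expected:

```python
# Write a tiny example dataset in JSONL form. Field names are assumed
# for illustration -- verify against eval/generate_answer.py.
import json

examples = [
    {"question": "Which benchmark measures browsing agents on 1,266 hard problems?",
     "answer": "BrowseComp"},
]

with open("dataset.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```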

Project Structure

OpenSeeker/
├── eval/                    # Evaluation scripts
│   ├── eval.py             # Main evaluation script
│   ├── generate_answer.py  # Answer generation script
│   └── prompt.py           # Prompt templates
├── src/                     # Core source code
│   ├── llm_tool_openseeker.py  # LLM tool interface
│   ├── config/             # Configuration files
│   │   └── chat_template.jinja  # Chat template configuration
│   └── tools/               # Tool implementations
│       ├── search.py       # Search tool
│       └── visit.py        # Web visit tool
├── run_openseeker.sh       # Model server startup script
├── setup_env.sh            # Environment variable template
└── README.md               # This file

📚 Citation

If you find OpenSeeker useful in your research, please consider citing:

@article{du2026openseeker,
  title={OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data},
  author={Du, Yuwen and Ye, Rui and Tang, Shuo and Zhu, Xinyu and Lu, Yijun and Cai, Yuzhu and Chen, Siheng},
  journal={arXiv preprint arXiv:2603.15594},
  year={2026}
}

@article{du2026openseekerv2,
  title={OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories},
  author={Du, Yuwen and Ye, Rui and Tang, Shuo and Huang, Keduan and Zhu, Xinyu and Cai, Yuzhu and Chen, Siheng},
  journal={arXiv preprint arXiv:2605.04036},
  year={2026}
}

⭐ Star History

Star History Chart
