@tom_doerr: Fully open sources training data for 30B scale search agents https://github.com/PolarSeeker/OpenSeeker…
Summary
OpenSeeker fully open-sources training data and models for 30B-scale ReAct-based search agents, achieving state-of-the-art performance on multiple benchmarks including BrowseComp and Humanity's Last Exam. It is the first purely academic project to reach frontier search benchmark performance while releasing complete training data.
PolarSeeker/OpenSeeker
Source: https://github.com/PolarSeeker/OpenSeeker
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data
📰 News
- 2026.05.06 📣 Our OpenSeeker-v2 achieves state-of-the-art performance across four benchmarks among 30B-scale ReAct-based search agents with simple SFT: 46.0% on BrowseComp, 58.1% on BrowseComp-ZH, 34.6% on Humanity's Last Exam, and 78.0% on xbench, surpassing even Tongyi DeepResearch, which is trained with a heavy CPT+SFT+RL pipeline. Our code is coming soon!
- 2026.03.17 🚀 We open-sourced OpenSeeker-v1 (all data and models). Using 11.7K training examples, we fine-tuned Qwen3-30B-A3B-Thinking-2507 and achieved scores of 48.4% on BrowseComp-ZH, 29.5% on BrowseComp, 74.0% on xbench-DeepSearch, and 59.4% on WideSearch.
Overview
OpenSeeker is an open-source search agent system that democratizes access to frontier search capabilities by fully open-sourcing its training data. This project enables researchers and developers to build, evaluate, and deploy advanced search agents for complex information-seeking tasks.
Quick Start
Installation
Clone the repository and set up the environment:
# Clone repository
git clone https://github.com/PolarSeeker/OpenSeeker.git
cd OpenSeeker
# Create conda environment
conda create --name openseeker python=3.10
conda activate openseeker
pip install -r requirements.txt
Model Setup
Download and deploy the OpenSeeker model:
# 1. Install git-xet (required for downloading the model)
brew install git-xet
git xet install
# 2. Clone the OpenSeeker model repository
git clone https://huggingface.co/OpenSeeker/OpenSeeker-v1-30B-SFT
# 3. Update MODEL_PATH in run_openseeker.sh to point to the downloaded model directory
# Edit run_openseeker.sh and set MODEL_PATH="/path/to/OpenSeeker-v1-30B-SFT"
# 4. Deploy the model server
bash run_openseeker.sh
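Once the server is up, it can be queried from client code. The sketch below assumes run_openseeker.sh launches an OpenAI-compatible chat endpoint on localhost:8000 (e.g., via vLLM); the port, path, and model name are all assumptions — check run_openseeker.sh for the actual values.

```python
import json
import urllib.request

# Hypothetical endpoint; check run_openseeker.sh for the real host/port.
API_URL = "http://localhost:8000/v1/chat/completions"

def build_request(question: str) -> bytes:
    # Minimal OpenAI-style chat payload; the model name is a placeholder.
    payload = {
        "model": "OpenSeeker-v1-30B-SFT",
        "messages": [{"role": "user", "content": question}],
        "temperature": 0.6,
    }
    return json.dumps(payload).encode("utf-8")

body = build_request("Who proposed the ReAct agent pattern?")
print(json.loads(body)["messages"][0]["content"])

# Sending the request (requires the server to be running):
# req = urllib.request.Request(
#     API_URL, data=body, headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
```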
Configuration
# Edit setup_env.sh with your API endpoints and keys
source setup_env.sh
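The contents of setup_env.sh are not shown here; as a sketch, it most likely exports the credentials the search and visit tools need plus the address of the deployed model. The variable names below are hypothetical — use the ones defined in the actual template.

```shell
# Hypothetical variable names -- check setup_env.sh for the real ones.
export SEARCH_API_KEY="your-search-api-key"      # key for the search tool backend
export LLM_API_BASE="http://localhost:8000/v1"   # deployed OpenSeeker endpoint
export LLM_API_KEY="EMPTY"                       # often unused for local servers
```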
Usage
Generate answers and evaluate results:
# Generate answers for your dataset
python eval/generate_answer.py \
--dataset_path /path/to/your/dataset.jsonl \
--out_dir /path/to/output/directory
# Evaluate the generated results
python eval/eval.py \
--data_path /path/to/output/directory/result_tool200.jsonl \
--max_workers 20
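The schema of dataset.jsonl is not documented in this excerpt. A common convention for QA evaluation sets is one JSON object per line with a question and a reference answer; the field names used below ("question", "answer") are assumptions — check eval/generate_answer.py for the fields it actually reads.

```python
import json

# Hypothetical records; the field names are assumptions, not the
# documented OpenSeeker schema.
examples = [
    {"question": "Which benchmark did OpenAI release for browsing agents?",
     "answer": "BrowseComp"},
    {"question": "Which base model was fine-tuned for OpenSeeker-v1?",
     "answer": "Qwen3-30B-A3B-Thinking-2507"},
]

# JSONL: one independent JSON object per line.
with open("dataset.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")

with open("dataset.jsonl", encoding="utf-8") as f:
    lines = f.readlines()
print(len(lines))  # 2
```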
Project Structure
OpenSeeker/
├── eval/ # Evaluation scripts
│ ├── eval.py # Main evaluation script
│ ├── generate_answer.py # Answer generation script
│ └── prompt.py # Prompt templates
├── src/ # Core source code
│ ├── llm_tool_openseeker.py # LLM tool interface
│ ├── config/ # Configuration files
│ │ └── chat_template.jinja # Chat template configuration
│ └── tools/ # Tool implementations
│ ├── search.py # Search tool
│ └── visit.py # Web visit tool
├── run_openseeker.sh # Model server startup script
├── setup_env.sh # Environment variable template
└── README.md # This file
📚 Citation
If you find OpenSeeker useful in your research, please consider citing:
@article{du2026openseeker,
title={OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data},
author={Du, Yuwen and Ye, Rui and Tang, Shuo and Zhu, Xinyu and Lu, Yijun and Cai, Yuzhu and Chen, Siheng},
journal={arXiv preprint arXiv:2603.15594},
year={2026}
}
@article{du2026openseekerv2,
title={OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories},
author={Du, Yuwen and Ye, Rui and Tang, Shuo and Huang, Keduan and Zhu, Xinyu and Cai, Yuzhu and Chen, Siheng},
journal={arXiv preprint arXiv:2605.04036},
year={2026}
}