search-agents

#search-agents

DailyReport: An Open-ended Benchmark for Evaluating Search Agents on Daily Search Tasks

arXiv cs.AI · 2026-06-12 Cached

DailyReport is an open-ended benchmark for evaluating search agents on daily search tasks, featuring 150 tasks and 3,546 rubrics for interpretable, user-centric evaluation.

EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge

arXiv cs.CL · 2026-06-12 Cached

This paper introduces EvoBrowseComp, a dynamic benchmark of 400 English and 400 Chinese complex questions that are synthesized via live-web traversal to evaluate search agents without test-set contamination, ensuring robustness against parametric memorization.

LoHoSearch: Benchmarking Long-Horizon Search Agents Beyond the Human Difficulty Ceiling

arXiv cs.CL · 2026-06-12 Cached

LoHoSearch is a new benchmark for evaluating long-horizon search agents, built from a knowledge graph of 7 million Wikipedia entities. It introduces questions with large search spaces and structural complexity to exceed human-authored difficulty ceilings, and shows that the best model achieves only 34.74% accuracy.

EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge

Hugging Face Daily Papers · 2026-06-11 Cached

EvoBrowseComp is an evolving benchmark with 800 contamination-free questions for evaluating search agents, designed to prevent parametric memorization and maintain temporal freshness through a three-agent framework.

FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents

Hugging Face Daily Papers · 2026-06-10 Cached

FORT-Searcher introduces a framework for synthesizing shortcut-resistant training data for deep search agents by identifying and mitigating four shortcut risks. The resulting agent, trained via supervised fine-tuning, achieves state-of-the-art performance among comparable open-source search agents.

@patpcj: Thanks again for your interest in our work! Links here so they don’t get buried under “show more”: Paper : https://arxi…

X AI KOLs Following · 2026-06-08 Cached

Harness-1 is a 20B search agent trained with reinforcement learning using a stateful search harness, achieving strong results on retrieval benchmarks and outperforming other open search subagents.

ARBOR: Online Process Rewards via a Reusable Rubric Buffer for Search Agents

arXiv cs.CL · 2026-06-03 Cached

ARBOR introduces a reusable rubric buffer to provide online process rewards for LLM-based search agents, improving training efficiency when outcome-only rewards are insufficient. It outperforms GRPO and DAPO on multi-hop QA benchmarks, converting up to 42% of zero-gradient training groups into informative ones.
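The "zero-gradient training group" problem mentioned above can be illustrated with a minimal sketch. It assumes the commonly described GRPO formulation, in which each rollout's advantage is its reward normalized by the mean and standard deviation of its group. The function names and the process-reward values are hypothetical and are not taken from the ARBOR paper.

```python
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-6):
    """Group-normalized advantages in the style commonly described for GRPO."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every reward in the group is identical
    return [(r - mu) / (sigma + eps) for r in rewards]

def is_zero_gradient_group(rewards):
    """A group gives no learning signal when all of its rewards are equal."""
    return max(rewards) == min(rewards)

# Outcome-only rewards: every rollout failed, so all advantages are zero.
outcome_only = [0.0, 0.0, 0.0, 0.0]

# Hypothetical process rewards (illustrative values only) that give partial
# credit to some rollouts, so the same group now has non-zero advantages.
with_process = [0.2, 0.0, 0.5, 0.0]

print(group_advantages(outcome_only))
print(group_advantages(with_process))
```

Under this assumption, any graded intermediate signal that separates rollouts within a group is enough to make the group informative again. How ARBOR actually derives that signal from its rubric buffer is not shown here.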

@dair_ai: // State-Externalizing Harnesses // A new paradigm is emerging on how to effectively build agents and harnesses. If the…

X AI KOLs Following · 2026-06-02 Cached

Harness-1 introduces a state-externalizing harness that separates routine bookkeeping from policy decisions in search agents, enabling a 20B model to outperform larger frontier searchers across multiple benchmarks.

COMPASS: Cognitive MCTS-Guided Process Alignment for Safe Search Agents

arXiv cs.AI · 2026-06-01 Cached

Proposes COMPASS, a cognitive MCTS-guided process alignment framework to enhance safety in LLM-powered search agents by synthesizing attack trajectories and isolating risky actions, achieving a favorable safety-utility trade-off with less training data.

Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses

Hugging Face Daily Papers · 2026-06-01 Cached

Introduces Harness-1, a 20B open search agent trained with state-externalizing harnesses, achieving strong retrieval performance and outperforming larger frontier models on several benchmarks.

GrepSeek: Training Search Agents for Direct Corpus Interaction

arXiv cs.CL · 2026-05-29 Cached

GrepSeek trains LLM search agents to interact directly with a text corpus through shell commands such as grep. Its two-stage training pipeline combines cold-start dataset construction with GRPO refinement and achieves strong F1 and Exact Match scores on open-domain QA benchmarks.

EVE-Agent: Evidence-Verifiable Self-Evolving Agents

arXiv cs.AI · 2026-05-25 Cached

EVE-Agent introduces a framework for self-evolving search agents that ensures evidence verifiability by generating questions, answers, and evidence spans, and by training on the marginal accuracy gain that the evidence provides. This improves grounded correctness without human annotations.

QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks

Hugging Face Daily Papers · 2026-05-22 Cached

QUEST is an open family of deep research agents trained with synthetic data and reinforcement learning. It achieves strong performance across diverse long-horizon search tasks and approaches the performance of frontier closed-source agents.

@tom_doerr: Fully open sources training data for 30B scale search agents https://github.com/PolarSeeker/OpenSeeker…

X AI KOLs Timeline · 2026-05-09 Cached

OpenSeeker fully open-sources training data and models for 30B-scale ReAct-based search agents, achieving state-of-the-art performance on multiple benchmarks including BrowseComp and Humanity's Last Exam. It is the first purely academic project to reach frontier search benchmark performance while releasing complete training data.

Knowledge-Graph Paths as Intermediate Supervision for Self-Evolving Search Agents

arXiv cs.AI · 2026-05-08 Cached

This paper introduces a method using knowledge-graph paths as intermediate supervision to improve self-evolving search agents. It addresses bottlenecks in Search Self-Play by grounding question construction in relational context and introducing a Waypoint Coverage Reward for graded partial credit.

OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents

Hugging Face Daily Papers · 2026-05-06 Cached

OpenSearch-VL is an open-source recipe for training frontier multimodal search agents with reinforcement learning, featuring specialized data curation and a novel training algorithm.
