contamination-free

#contamination-free

EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge

arXiv cs.CL · 18h ago

This paper introduces EvoBrowseComp, a dynamic benchmark of 400 English and 400 Chinese complex questions synthesized via live-web traversal. It is designed to evaluate search agents without test-set contamination and to be robust against parametric memorization.


#contamination-free

EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge

Hugging Face Daily Papers · yesterday

EvoBrowseComp is an evolving benchmark of 800 contamination-free questions for evaluating search agents. It uses a three-agent framework and is designed to prevent parametric memorization and maintain temporal freshness.

