evolving-knowledge


EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge

arXiv cs.CL · 18h ago

This paper introduces EvoBrowseComp, a dynamic benchmark of 400 English and 400 Chinese complex questions synthesized via live-web traversal. It evaluates search agents without test-set contamination, which makes it robust against parametric memorization.


EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge

Hugging Face Daily Papers · yesterday

EvoBrowseComp is an evolving benchmark of 800 contamination-free questions for evaluating search agents. It uses a three-agent framework to guard against parametric memorization and to keep its questions temporally fresh.
