daily-search


DailyReport: An Open-ended Benchmark for Evaluating Search Agents on Daily Search Tasks

arXiv cs.AI · 17h ago

DailyReport is an open-ended benchmark for evaluating search agents on daily search tasks, featuring 150 tasks and 3,546 rubrics for interpretable, user-centric evaluation.
