DailyReport is an open-ended benchmark for evaluating search agents on daily search tasks, featuring 150 tasks and 3,546 rubrics for interpretable, user-centric evaluation.