archive-grounded


AGORA: An Archive-Grounded Benchmark for Agentic Workplace Document Reasoning

arXiv cs.CL · yesterday

AGORA is a new benchmark for evaluating large language models on archive-grounded reasoning tasks across workplace documents, comprising 362 questions over 9,664 real documents. The strongest model achieves only 59.4% accuracy, highlighting substantial room for improvement.
