
AGORA: An Archive-Grounded Benchmark for Agentic Workplace Document Reasoning

arXiv cs.CL · yesterday

AGORA is a new benchmark for evaluating large language models on archive-grounded reasoning tasks across workplace documents, comprising 362 questions over 9,664 real documents. The strongest model achieves only 59.4% accuracy, highlighting substantial room for improvement.
