long-document-reasoning

MemoryDocDataSet: A Benchmark for Joint Conversational Memory and Long Document Reasoning

arXiv cs.CL · 2d ago

MemoryDocDataSet is a new synthetic benchmark of 50 micro-worlds and 1,000 QA pairs designed to evaluate AI systems on conversational memory and long-document reasoning as a single joint task. The best baseline (RAG-Both) achieves an overall F1 of only 0.358, which points to a significant gap in current systems' ability to unify conversational memory with long-document navigation.
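
For readers unfamiliar with F1 as a QA metric, the sketch below shows one common way such a score is computed: token-overlap F1 per answer, averaged over all QA pairs. This is an assumption for illustration only. The summary above does not say how MemoryDocDataSet defines its overall F1, and the function names `token_f1` and `overall_f1` are hypothetical, not taken from the benchmark.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer.

    Assumption: lowercase whitespace tokenization. The benchmark's real
    normalization and scoring rules may differ.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def overall_f1(pairs: list[tuple[str, str]]) -> float:
    """Mean token F1 over (prediction, reference) pairs (hypothetical aggregate)."""
    scores = [token_f1(pred, ref) for pred, ref in pairs]
    return sum(scores) / len(scores) if scores else 0.0
```

Under this reading, an overall score of 0.358 would mean that predicted answers share only a limited fraction of tokens with the reference answers on average.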
