Tag
This paper introduces a four-condition diagnostic protocol to separate no-evidence answerability, oracle-evidence recoverability, full-context utilization, and retrieval-conditioned utilization in long-context and retrieval-augmented language models, tested on five open-weight models across multiple datasets.
This paper introduces NEI-CAP, a diagnostic protocol for evaluating how 'Not Enough Information' (NEI) examples are constructed in fact verification benchmarks. It reveals that models trained on shortcut-prone NEI constructions fail to transfer to harder, semantically related insufficient-evidence cases.
This paper introduces a four-condition diagnostic protocol to identify whether failures in long-context memory systems stem from write-side compression discarding evidence or from retrieval failing to surface stored information. The analysis reveals that write-side gaps dominate for most baselines, motivating the proposed Expected Predictive Compression (EPC) method, which improves preservation of relevant evidence.