EviMem: Evidence-Gap-Driven Iterative Retrieval for Long-Term Conversational Memory
Summary
EviMem combines IRIS for evidence-gap detection and LaceMem for layered memory to improve long-term conversational memory retrieval, achieving higher accuracy on temporal and multi-hop questions with lower latency.
Source: https://huggingface.co/papers/2604.27695
Abstract
EviMem combines IRIS, which detects evidence gaps through sufficiency evaluation, with LaceMem, a layered memory hierarchy, to improve conversational question answering accuracy while reducing latency.
Long-term conversational memory requires retrieving evidence scattered across multiple sessions, yet single-pass retrieval fails on temporal and multi-hop questions. Existing iterative methods refine queries via generated content or document-level signals, but none explicitly diagnoses the evidence gap, namely what is missing from the accumulated retrieval set, leaving query refinement untargeted. We present EviMem, combining IRIS (Iterative Retrieval via Insufficiency Signals), a closed-loop framework that detects evidence gaps through sufficiency evaluation, diagnoses what is missing, and drives targeted query refinement, with LaceMem (Layered Architecture for Conversational Evidence Memory), a coarse-to-fine memory hierarchy supporting fine-grained gap diagnosis. On LoCoMo, EviMem improves Judge Accuracy over MIRIX on temporal (73.3% to 81.6%) and multi-hop (65.9% to 85.2%) questions at 4.5x lower latency. Code: https://github.com/AIGeeksGroup/EviMem.
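The closed loop described in the abstract (retrieve, evaluate sufficiency, diagnose the gap, refine the query) can be illustrated with a toy sketch. This is not the authors' implementation: every name below (`retrieve`, `diagnose_gap`, `iterative_retrieve`, `LoopResult`) is hypothetical, the retriever is plain keyword overlap, and the sufficiency check is a checklist of required terms standing in for whatever evaluator the paper actually uses.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set


@dataclass
class LoopResult:
    evidence: List[str]   # accumulated retrieval set
    queries: List[str]    # every query that was issued
    sufficient: bool      # whether the gap was closed


def tokens(text: str) -> Set[str]:
    """Lowercased words with trailing punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}


def retrieve(query: str, memory: Sequence[str], exclude: Sequence[str], k: int = 1) -> List[str]:
    """Toy retriever (assumption): rank unseen entries by keyword overlap."""
    q = tokens(query)
    scored = [(len(q & tokens(m)), i, m) for i, m in enumerate(memory) if m not in exclude]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [m for _, _, m in scored[:k]]


def diagnose_gap(required: Set[str], evidence: Sequence[str]) -> Set[str]:
    """Toy gap diagnosis (assumption): required terms not yet covered."""
    covered = set().union(*(tokens(e) for e in evidence))
    return required - covered


def iterative_retrieve(question: str, required: Set[str], memory: Sequence[str],
                       max_rounds: int = 3) -> LoopResult:
    evidence: List[str] = []
    queries: List[str] = []
    query = question
    for _ in range(max_rounds):
        queries.append(query)
        evidence.extend(retrieve(query, memory, exclude=evidence))
        gap = diagnose_gap(required, evidence)
        if not gap:                        # evidence judged sufficient: stop
            return LoopResult(evidence, queries, True)
        query = " ".join(sorted(gap))      # refine the query toward the gap
    return LoopResult(evidence, queries, False)


# Hypothetical multi-session memory and a multi-hop question.
memory = [
    "Alice adopted a dog named Biscuit in March.",
    "Bob started a pottery class.",
    "Biscuit had a vet visit in June.",
]
question = "When did Alice's dog visit the vet?"
result = iterative_retrieve(question, required={"alice", "vet"}, memory=memory)
print(result.queries)
print(result.evidence)
```

In this toy run the first pass finds only the vet-visit entry, the gap diagnosis reports that nothing yet mentions Alice, and the refined query targets exactly that missing piece. The point of the sketch is the control flow: the refinement is driven by what is absent from the accumulated evidence rather than by generated text.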
Similar Articles
Cognis: Context-Aware Memory for Conversational AI Agents
Lyzr Cognis introduces a unified, open-source memory system for conversational AI that fuses BM25 and Matryoshka vector search with version-aware ingestion, achieving SOTA on LoCoMo and LongMemEval benchmarks.
MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory
MemEye is a visual-centric evaluation framework that assesses multimodal agent memory by measuring visual evidence granularity and retrieval complexity across 8 life-scenario tasks, revealing that current architectures struggle to preserve fine-grained visual details and reason about state changes over time.
MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models
MemLens is a new benchmark for evaluating memory capabilities in large vision-language models through multi-session conversations. It compares long-context and memory-augmented approaches, revealing limitations in both and motivating hybrid architectures.
From Recall to Forgetting: Benchmarking Long-Term Memory for Personalized Agents
Researchers introduce Memora, a benchmark that evaluates LLMs’ ability to retain, update, and forget long-term user memories over weeks-to-months conversations, revealing frequent reuse of obsolete memories.
MEME: Multi-entity & Evolving Memory Evaluation
The MEME benchmark evaluates AI memory systems across multiple entities and evolving conditions, revealing significant challenges in dependency reasoning that persist even with advanced retrieval techniques.