Personalize-then-Store: Benchmarking and Learning Personalized Memory for Long-horizon Agents

Hugging Face Daily Papers 05/25/26, 12:00 AM Papers

memory-systems personalization llm-agents benchmark long-horizon storage-gating

Summary

This paper introduces PerMemBench, the first benchmark for evaluating personalized memory systems in LLM-based agents, and proposes a session-level storage gating framework that adapts memory policies to individual user contexts.

Existing large language model (LLM)-based memory systems apply universal, static policies that overlook a fundamental reality: the contexts worth storing in memory differ across users. This misalignment wastes limited memory budget on transient interactions while failing to preserve critical context for long-horizon tasks. To address this gap, we investigate an underexplored question: can LLM-based memory systems learn personalized memory policies? We introduce PerMemBench, the first benchmark for evaluating personalized memory systems, featuring multi-year, multi-domain interaction histories across diverse user personas. We further present the first empirical study of memory personalization, proposing session-level storage gating, a lightweight framework that selectively bypasses memory operations for transient sessions. Our study confirms that personalization yields substantial retention gains under perfect gating, yet reveals that accurate gating remains an open and critical challenge.

Original Article

Cached at: 05/27/26, 02:48 AM

Source: https://huggingface.co/papers/2605.25535

Abstract

Large language model-based memory systems can benefit from personalized policies that adapt to individual user contexts, though accurate implementation remains challenging.

Existing large language model (LLM)-based memory systems apply universal, static policies that overlook a fundamental reality: the contexts worth storing in memory differ across users. This misalignment wastes limited memory budget on transient interactions while failing to preserve critical context for long-horizon tasks. To address this gap, we investigate an underexplored question: can LLM-based memory systems learn personalized memory policies? We introduce PerMemBench, the first benchmark for evaluating personalized memory systems, featuring multi-year, multi-domain interaction histories across diverse user personas. We further present the first empirical study of memory personalization, proposing session-level storage gating, a lightweight framework that selectively bypasses memory operations for transient sessions. Our study confirms that personalization yields substantial retention gains under perfect gating, yet reveals that accurate gating remains an open and critical challenge.
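The abstract describes session-level storage gating only at a high level, so the following is a minimal Python sketch of the general idea rather than the paper's implementation. It assumes a per-user gate that returns True when a session is worth storing; sessions the gate rejects bypass all memory operations. The names GatedMemory, Session, and keyword_gate, the keyword-matching rule, and the oldest-first eviction under a fixed budget are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Session:
    user_id: str
    text: str


@dataclass
class GatedMemory:
    """Toy memory store with a fixed budget and a per-session storage gate."""
    gate: Callable[[Session], bool]  # hypothetical: True means "worth storing"
    budget: int                      # maximum number of stored sessions
    stored: List[Session] = field(default_factory=list)
    skipped: int = 0

    def observe(self, session: Session) -> bool:
        # Sessions the gate judges transient bypass all memory operations.
        if not self.gate(session):
            self.skipped += 1
            return False
        # Other sessions are written; evict the oldest one when over budget.
        self.stored.append(session)
        if len(self.stored) > self.budget:
            self.stored.pop(0)
        return True


def keyword_gate(keywords_by_user: Dict[str, Set[str]]) -> Callable[[Session], bool]:
    """Build a hypothetical personalized gate from per-user keyword sets."""
    def gate(session: Session) -> bool:
        keywords = keywords_by_user.get(session.user_id, set())
        words = set(session.text.lower().split())
        # Store the session only if it mentions something this user cares about.
        return bool(keywords & words)
    return gate


if __name__ == "__main__":
    gate = keyword_gate({"alice": {"thesis"}, "bob": {"recipe"}})
    memory = GatedMemory(gate=gate, budget=2)
    memory.observe(Session("alice", "My thesis deadline moved to June"))
    memory.observe(Session("alice", "What is the weather today"))
    memory.observe(Session("bob", "Save this recipe for later"))
    print(len(memory.stored), memory.skipped)
```

The same text can be stored for one user and bypassed for another, which is the personalization the abstract argues for. A keyword rule is only a stand-in here: the abstract reports that accurate gating remains an open challenge, so a realistic gate would need to be learned or otherwise far more capable than this one.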

Similar Articles

From Recall to Forgetting: Benchmarking Long-Term Memory for Personalized Agents

MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents

MemGym: a Long-Horizon Memory Environment for LLM Agents

AdMem: Advanced Memory for Task-solving Agents

LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues
