Personalize-then-Store: Benchmarking and Learning Personalized Memory for Long-horizon Agents

Hugging Face Daily Papers

Summary

This paper introduces PerMemBench, the first benchmark for evaluating personalized memory systems in LLM-based agents, and proposes a session-level storage gating framework that adapts memory policies to individual user contexts.

Existing large language model (LLM)-based memory systems apply universal, static policies that overlook a fundamental reality: the contexts that are worth storing in memory are different across users. This misalignment wastes limited memory budget on transient interactions while failing to preserve critical context for long-horizon tasks. To address this gap, we investigate an underexplored question: can LLM-based memory systems learn personalized memory policies? We introduce PerMemBench, the first benchmark for evaluating personalized memory systems, featuring multi-year, multi-domain interaction histories across diverse user personas. We further present the first empirical study of memory personalization, proposing session-level storage gating, a lightweight framework that selectively bypasses memory operations for transient sessions. Our study confirms that personalization yields substantial retention gains under perfect gating, yet reveals that accurate gating remains an open and critical challenge.

Cached at: 05/27/26, 02:48 AM

Source: https://huggingface.co/papers/2605.25535

Abstract

Large language model-based memory systems can benefit from personalized policies that adapt to individual user contexts, though accurate implementation remains challenging.

Existing large language model (LLM)-based memory systems apply universal, static policies that overlook a fundamental reality: the contexts that are worth storing in memory are different across users. This misalignment wastes limited memory budget on transient interactions while failing to preserve critical context for long-horizon tasks. To address this gap, we investigate an underexplored question: can LLM-based memory systems learn personalized memory policies? We introduce PerMemBench, the first benchmark for evaluating personalized memory systems, featuring multi-year, multi-domain interaction histories across diverse user personas. We further present the first empirical study of memory personalization, proposing session-level storage gating, a lightweight framework that selectively bypasses memory operations for transient sessions. Our study confirms that personalization yields substantial retention gains under perfect gating, yet reveals that accurate gating remains an open and critical challenge.
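
To make the idea of session-level storage gating concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the names `Session`, `GatedMemory`, and `make_keyword_gate` are hypothetical, and the per-user keyword gate is a stand-in for whatever learned gate the authors study. It shows only the control flow the abstract describes: a per-user decision made once per session, with every memory operation skipped when the session is judged transient.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Session:
    user_id: str
    text: str


@dataclass
class GatedMemory:
    # gate(session) -> True means "worth storing" for this user.
    # Hypothetical interface; the paper's gate may look quite different.
    gate: Callable[[Session], bool]
    budget: int = 3  # assumed per-user cap on stored sessions
    entries: Dict[str, List[str]] = field(default_factory=dict)

    def observe(self, session: Session) -> bool:
        """Store the session only if the gate passes; otherwise bypass memory."""
        if not self.gate(session):
            return False  # transient session: no memory operation at all
        stored = self.entries.setdefault(session.user_id, [])
        stored.append(session.text)
        if len(stored) > self.budget:
            stored.pop(0)  # evict the oldest entry once over budget
        return True


def make_keyword_gate(keywords_by_user: Dict[str, Set[str]]) -> Callable[[Session], bool]:
    """Toy personalized gate: a session matters if it mentions a user's keyword."""
    def gate(session: Session) -> bool:
        keywords = keywords_by_user.get(session.user_id, set())
        text = session.text.lower()
        return any(keyword in text for keyword in keywords)
    return gate
```

With a profile such as `{"alice": {"thesis"}}`, a weather question from `alice` would be bypassed while a session about her thesis would be stored. The same text from a user without that keyword would be skipped, which is the sense in which the policy is personalized rather than universal.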

Similar Articles

MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents

Hugging Face Daily Papers

MemPrivacy is a research paper introducing a framework for privacy-preserving personalized memory management in edge-cloud AI agents, using type-aware placeholders to protect sensitive data while maintaining semantic utility. It includes a new benchmark dataset and demonstrates superior performance over general-purpose models like GPT-5.2 and Gemini-3.1-Pro.

MemGym: a Long-Horizon Memory Environment for LLM Agents

arXiv cs.CL

MemGym is a benchmark for evaluating memory formation in LLM agents over long-horizon tasks, unifying existing agent gyms and synthetic pipelines with memory-isolated scores. It spans tool-use dialogue, multi-turn search, coding, and computer use, and includes a lightweight reward model (MemRM) for efficient evaluation.