PersonaVLM: Long-Term Personalized Multimodal LLMs
Summary
PersonaVLM introduces a personalized multimodal LLM framework that enables long-term user adaptation through memory retention, multi-turn reasoning, and response alignment, outperforming GPT-4o by 5.2% on the new Persona-MME benchmark.
Cached at: 04/20/26, 08:27 AM
Source: https://huggingface.co/papers/2604.13074
Abstract
Multimodal Large Language Models (MLLMs) serve as daily assistants for millions. However, their ability to generate responses aligned with individual preferences remains limited. Prior approaches enable only static, single-turn personalization through input augmentation or output alignment, and thus fail to capture users' evolving preferences and personality over time (see Fig. 1). In this paper, we introduce PersonaVLM, a personalized multimodal agent framework designed for long-term personalization. It transforms a general-purpose MLLM into a personalized assistant by integrating three key capabilities: (a) Remembering: it proactively extracts and summarizes chronological multimodal memories from interactions, consolidating them into a personalized database. (b) Reasoning: it conducts multi-turn reasoning by retrieving and integrating relevant memories from the database. (c) Response Alignment: it infers the user's evolving personality throughout long-term interactions to ensure outputs remain aligned with their unique characteristics. For evaluation, we establish Persona-MME, a comprehensive benchmark comprising over 2,000 curated interaction cases, designed to assess long-term MLLM personalization across seven key aspects and 14 fine-grained tasks. Extensive experiments validate our method's effectiveness, improving the baseline by 22.4% (Persona-MME) and 9.8% (PERSONAMEM) under a 128k context, while outperforming GPT-4o by 5.2% and 2.0%, respectively. Project page: https://PersonaVLM.github.io.
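The three capabilities described in the abstract (remembering, reasoning, and response alignment) can be illustrated with a minimal memory-store sketch. This is not the authors' implementation: the `PersonaStore` class, the keyword-overlap retrieval, and the trait dictionary are all simplified stand-ins for the paper's learned summarization, retrieval, and persona-inference components.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Memory:
    timestamp: datetime
    summary: str

@dataclass
class PersonaStore:
    """Hypothetical personalized database: chronological memories
    plus an evolving persona profile."""
    memories: list = field(default_factory=list)
    persona: dict = field(default_factory=dict)

    def remember(self, interaction: str, traits: dict) -> None:
        # (a) Remembering: store a summarized interaction chronologically
        # and fold newly inferred traits into the evolving persona profile.
        self.memories.append(Memory(datetime.now(), interaction))
        self.persona.update(traits)

    def retrieve(self, query: str, k: int = 3) -> list:
        # (b) Reasoning: rank stored memories by naive keyword overlap
        # with the query (a stand-in for learned multimodal retrieval).
        q = set(query.lower().split())
        scored = sorted(
            self.memories,
            key=lambda m: len(q & set(m.summary.lower().split())),
            reverse=True,
        )
        return scored[:k]

    def build_prompt(self, query: str) -> str:
        # (c) Response alignment: condition the model on the persona
        # profile and the retrieved memories before the user query.
        mems = "; ".join(m.summary for m in self.retrieve(query))
        traits = ", ".join(f"{k}={v}" for k, v in self.persona.items())
        return f"[persona: {traits}]\n[memories: {mems}]\nUser: {query}"
```

Usage follows the same three steps: call `remember` after each interaction, then `build_prompt` when answering a new query, e.g. `PersonaStore().remember("User prefers concise answers about hiking gear", {"style": "concise"})` followed by `build_prompt("recommend hiking boots")`.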
arXiv page: https://arxiv.org/abs/2604.13074 · PDF: https://arxiv.org/pdf/2604.13074 · Project page: https://personavlm.github.io/ · GitHub: https://github.com/MiG-NJU/PersonaVLM
Get this paper in your agent:
hf papers read 2604.13074
Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper 1
ClareNie/PersonaVLM-8B • Updated 4 days ago • 37 • 7 (https://huggingface.co/ClareNie/PersonaVLM)
Datasets citing this paper 2
ClareNie/Persona-MME Viewer • Updated 4 days ago • 4.54k • 36.6k • 2 (https://huggingface.co/datasets/ClareNie/Persona-MME)
ClareNie/PersonaVLM-Dataset Viewer • Updated 4 days ago • 33.3k • 74 • 3 (https://huggingface.co/datasets/ClareNie/PersonaVLM-Dataset)
Similar Articles
From Recall to Forgetting: Benchmarking Long-Term Memory for Personalized Agents
Researchers introduce Memora, a benchmark that evaluates LLMs' ability to retain, update, and forget long-term user memories across conversations spanning weeks to months, revealing frequent reuse of obsolete memories.
Persona-Assigned Large Language Models Exhibit Human-Like Motivated Reasoning
This paper investigates whether assigning personas to large language models induces human-like motivated reasoning. Persona-assigned LLMs show up to 9% reduced veracity discernment and are up to 90% more likely to evaluate scientific evidence in ways congruent with their induced political identity, and prompt-based debiasing proves largely ineffective.
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
Mem0 introduces a scalable memory-centric architecture using graph-based representations to improve long-term conversational coherence in LLMs, significantly reducing latency and token costs while outperforming existing memory systems.
Self-Evolving LLM Memory Extraction Across Heterogeneous Tasks
Researchers introduce BEHEMOTH benchmark and CluE cluster-based prompt optimization to enable LLMs to extract and retain heterogeneous memory across diverse tasks, achieving 9% gains over prior self-evolving frameworks.
Learning, Fast and Slow: Towards LLMs That Adapt Continually
This paper introduces a Fast-Slow Training framework for LLMs that combines parameter updates with optimized context to improve sample efficiency and reduce catastrophic forgetting during continual learning.