PersonaVLM: Long-Term Personalized Multimodal LLMs

Hugging Face Daily Papers 03/20/26, 12:00 AM Papers

personalization multimodal-llm long-term-memory vlm agent-framework response-alignment benchmark

Summary

PersonaVLM introduces a personalized multimodal LLM framework that enables long-term user adaptation through memory retention, multi-turn reasoning, and response alignment, outperforming GPT-4o by 5.2% on the new Persona-MME benchmark.

Multimodal Large Language Models (MLLMs) serve as daily assistants for millions. However, their ability to generate responses aligned with individual preferences remains limited. Prior approaches enable only static, single-turn personalization through input augmentation or output alignment, and thus fail to capture users' evolving preferences and personality over time (see Fig. 1). In this paper, we introduce PersonaVLM, an innovative personalized multimodal agent framework designed for long-term personalization. It transforms a general-purpose MLLM into a personalized assistant by integrating three key capabilities: (a) Remembering: It proactively extracts and summarizes chronological multimodal memories from interactions, consolidating them into a personalized database. (b) Reasoning: It conducts multi-turn reasoning by retrieving and integrating relevant memories from the database. (c) Response Alignment: It infers the user's evolving personality throughout long-term interactions to ensure outputs remain aligned with their unique characteristics. For evaluation, we establish Persona-MME, a comprehensive benchmark comprising over 2,000 curated interaction cases, designed to assess long-term MLLM personalization across seven key aspects and 14 fine-grained tasks. Extensive experiments validate our method's effectiveness, improving the baseline by 22.4% (Persona-MME) and 9.8% (PERSONAMEM) under a 128k context, while outperforming GPT-4o by 5.2% and 2.0%, respectively. Project page: https://PersonaVLM.github.io.
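The three capabilities named in the abstract (remembering, reasoning over retrieved memories, response alignment) can be illustrated with a toy sketch. This is not the PersonaVLM implementation: the class PersonalMemoryStore, the function build_prompt, the bag-of-words similarity, and the trait labels are all hypothetical stand-ins chosen so the example runs with the Python standard library alone. A real system would presumably use an MLLM to summarize interactions and learned embeddings to retrieve them.

```python
from collections import Counter
from dataclasses import dataclass, field
import math
import re


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words; a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class Memory:
    turn: int     # chronological position of the interaction
    summary: str  # text summary of the (possibly multimodal) interaction


@dataclass
class PersonalMemoryStore:
    """Hypothetical per-user store; names are assumptions, not the paper's API."""
    memories: list = field(default_factory=list)
    traits: Counter = field(default_factory=Counter)

    def remember(self, turn: int, summary: str, trait_hints=()):
        # (a) Remembering: keep a chronological summary and count observed traits.
        self.memories.append(Memory(turn, summary))
        self.traits.update(trait_hints)

    def retrieve(self, query: str, k: int = 2):
        # (b) Reasoning input: top-k memories by similarity, ties broken by recency.
        q = tokenize(query)
        scored = [(cosine(q, tokenize(m.summary)), m.turn, m) for m in self.memories]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: (s[0], s[1]), reverse=True)
        return [m for _, _, m in scored[:k]]

    def dominant_trait(self):
        # (c) Response alignment signal: the most frequently observed trait, if any.
        return self.traits.most_common(1)[0][0] if self.traits else None


def build_prompt(store: PersonalMemoryStore, query: str) -> str:
    """Assemble retrieved memories and a style hint into a prompt for some MLLM."""
    lines = [f"[turn {m.turn}] {m.summary}" for m in store.retrieve(query)]
    trait = store.dominant_trait()
    style = f"Respond in a style suited to a {trait} user." if trait else "Respond neutrally."
    return "\n".join(["Relevant memories:", *lines, style, f"User: {query}"])


store = PersonalMemoryStore()
store.remember(1, "User shared a photo of their dog Max at the beach", ["playful"])
store.remember(2, "User said they prefer short answers about cooking", ["concise"])
store.remember(3, "User now prefers detailed answers about cooking", ["concise"])
print(build_prompt(store, "cooking tips"))
```

Keeping the turn index on every memory is what lets a later preference be surfaced alongside, or ahead of, an earlier conflicting one. That is the "evolving preferences" problem the abstract says static, single-turn personalization cannot handle.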


Cached at: 04/20/26, 08:27 AM


Source: https://huggingface.co/papers/2604.13074


View arXiv page (https://arxiv.org/abs/2604.13074) View PDF (https://arxiv.org/pdf/2604.13074) Project page (https://personavlm.github.io/) GitHub (https://github.com/MiG-NJU/PersonaVLM)

Get this paper in your agent:

hf papers read 2604.13074

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 1

ClareNie/PersonaVLM-8B • Updated 4 days ago • 37 • 7 (https://huggingface.co/ClareNie/PersonaVLM)

Datasets citing this paper 2

ClareNie/Persona-MME Viewer • Updated 4 days ago • 4.54k • 36.6k • 2 (https://huggingface.co/datasets/ClareNie/Persona-MME)

ClareNie/PersonaVLM-Dataset Viewer • Updated 4 days ago • 33.3k • 74 • 3 (https://huggingface.co/datasets/ClareNie/PersonaVLM-Dataset)


Similar Articles

From Recall to Forgetting: Benchmarking Long-Term Memory for Personalized Agents

Persona-Assigned Large Language Models Exhibit Human-Like Motivated Reasoning

Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

Self-Evolving LLM Memory Extraction Across Heterogeneous Tasks

Learning, Fast and Slow: Towards LLMs That Adapt Continually [R]
