Latent Preference Modeling for Cross-Session Personalized Tool Calling

Hugging Face Daily Papers

Summary

Introduces MPT benchmark and PRefine method for cross-session personalized tool calling that captures user choice reasoning with minimal token overhead.

Users often omit essential details in their requests to LLM-based agents, resulting in under-specified inputs for tool use. This poses a fundamental challenge for tool-augmented agents, as API execution typically requires complete arguments, highlighting the need for personalized tool calling. To study this problem, we introduce MPT, a benchmark comprising 265 multi-session dialogues that cover three challenges: Preference Recall, Preference Induction, and Preference Transfer. We also propose PRefine, a test-time memory-augmented method that represents user preferences as evolving hypotheses. Through a generate--verify--refine loop, it extracts reusable constraints from history and improves tool-calling accuracy while using only 1.24% of the tokens required by full-history prompting. These results indicate that robust personalization in agentic systems depends on memory that captures the reasons behind user choices, not just the choices themselves.
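The generate–verify–refine loop can be illustrated with a minimal sketch. This is not the paper's implementation: the `PreferenceHypothesis` class, the frequency-based `generate` heuristic, and the `fill_arguments` helper are all hypothetical stand-ins for the LLM-driven components, chosen only to make the loop's control flow concrete.

```python
from dataclasses import dataclass

@dataclass
class PreferenceHypothesis:
    """A candidate user preference expressed as a reusable constraint, e.g. {"seat": "window"}."""
    constraint: dict
    support: int = 0      # number of sessions consistent with the hypothesis
    refuted: bool = False

def generate(history):
    """Propose hypotheses from past tool-call arguments (toy heuristic: values repeated >= 2 times)."""
    counts = {}
    for call in history:
        for k, v in call.items():
            counts[(k, v)] = counts.get((k, v), 0) + 1
    return [PreferenceHypothesis({k: v}, support=c)
            for (k, v), c in counts.items() if c >= 2]

def verify(hyp, new_call):
    """Check a hypothesis against a new session's observed tool call."""
    for k, v in hyp.constraint.items():
        if k in new_call and new_call[k] != v:
            hyp.refuted = True       # contradicted: user chose differently
        elif new_call.get(k) == v:
            hyp.support += 1         # confirmed again
    return hyp

def refine(hypotheses):
    """Keep only hypotheses that survived verification; this is the memory carried forward."""
    return [h for h in hypotheses if not h.refuted]

def fill_arguments(request_args, memory):
    """Complete an under-specified tool call using surviving preference constraints."""
    completed = dict(request_args)
    for h in memory:
        for k, v in h.constraint.items():
            completed.setdefault(k, v)  # explicit user input always wins
    return completed

# Example: two past sessions both booked window seats; the new request omits the seat.
history = [{"seat": "window", "meal": "veg"}, {"seat": "window", "meal": "fish"}]
memory = refine([verify(h, {"seat": "window"}) for h in generate(history)])
print(fill_arguments({"dest": "OSL"}, memory))
```

Because only the compact surviving constraints (rather than the full dialogue history) are carried across sessions, a scheme like this keeps the token overhead small, which is the intuition behind the 1.24% figure reported above.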

Cached at: 04/21/26, 11:27 AM

Paper page - Latent Preference Modeling for Cross-Session Personalized Tool Calling

Source: https://huggingface.co/papers/2604.17886

Abstract

Personalized tool calling in LLM-based agents is improved through memory-augmented methods that capture user choice reasoning rather than just choices, using minimal token overhead.



Get this paper in your agent:

hf papers read 2604.17886

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash


Similar Articles

FSPO: Few-Shot Optimization of Synthetic Preferences Personalizes to Real Users

arXiv cs.CL

FSPO proposes a few-shot preference optimization algorithm for LLM personalization that reframes reward modeling as meta-learning, enabling models to quickly infer personalized reward functions from limited user preferences. The method achieves 87% personalization performance on synthetic users and 70% on real users through careful synthetic preference dataset construction.

PersonaVLM: Long-Term Personalized Multimodal LLMs

Hugging Face Daily Papers

PersonaVLM introduces a personalized multimodal LLM framework that enables long-term user adaptation through memory retention, multi-turn reasoning, and response alignment, outperforming GPT-4o by 5.2% on the new Persona-MME benchmark.

Preference Estimation via Opponent Modeling in Multi-Agent Negotiation

arXiv cs.CL

This paper proposes a novel preference estimation method that integrates natural language information from LLMs into a structured Bayesian opponent modeling framework for multi-agent negotiation. The approach leverages LLMs to extract qualitative cues from utterances and convert them into probabilistic formats, demonstrating improved agreement rates and preference estimation accuracy on multi-party negotiation benchmarks.

IPQA: A Benchmark for Core Intent Identification in Personalized Question Answering

arXiv cs.CL

IPQA introduces a benchmark for evaluating core intent identification in personalized question answering, addressing a gap in existing metrics that focus on response quality rather than intent understanding. The paper presents a dataset construction methodology grounded in bounded rationality and demonstrates that state-of-the-art language models struggle with identifying user-prioritized intents from answer selection patterns.

Inference-Time Budget Control for LLM Search Agents

arXiv cs.AI

This paper introduces a two-stage inference-time budget control method for LLM search agents, using Value-of-Information scores to optimize tool-call and token allocation during multi-hop question answering.