Latent Preference Modeling for Cross-Session Personalized Tool Calling
Summary
Introduces MPT benchmark and PRefine method for cross-session personalized tool calling that captures user choice reasoning with minimal token overhead.
Source: https://huggingface.co/papers/2604.17886
Abstract
Personalized tool calling in LLM-based agents is improved through memory-augmented methods that capture user choice reasoning rather than just choices, using minimal token overhead.
Users often omit essential details in their requests to LLM-based agents, resulting in under-specified inputs for tool use. This poses a fundamental challenge for tool-augmented agents, as API execution typically requires complete arguments, highlighting the need for personalized tool calling. To study this problem, we introduce MPT, a benchmark comprising 265 multi-session dialogues that cover three challenges: Preference Recall, Preference Induction, and Preference Transfer. We also propose PRefine, a test-time memory-augmented method that represents user preferences as evolving hypotheses. Through a generate-verify-refine loop, it extracts reusable constraints from history and improves tool-calling accuracy while using only 1.24% of the tokens required by full-history prompting. These results indicate that robust personalization in agentic systems depends on memory that captures the reasons behind user choices, not just the choices themselves.
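The abstract does not include an implementation, but the generate-verify-refine loop can be illustrated with a minimal sketch. All names and the specific heuristics here (counting repeated argument values as candidate preferences, refuting a hypothesis on any contradicting call) are hypothetical, chosen only to make the loop concrete:

```python
from dataclasses import dataclass

@dataclass
class PreferenceHypothesis:
    """A candidate user-preference constraint, e.g. 'prefers aisle seats'."""
    constraint: str
    support: int = 0      # number of past tool calls consistent with it
    refuted: bool = False

def generate(history):
    """Propose candidate constraints from past tool calls.
    Hypothetical heuristic: any argument value supplied more than once
    becomes a hypothesis."""
    counts = {}
    for call in history:
        for key, value in call.items():
            counts[(key, value)] = counts.get((key, value), 0) + 1
    return [PreferenceHypothesis(f"{k}={v}", support=c)
            for (k, v), c in counts.items() if c > 1]

def verify(hypotheses, history):
    """Mark hypotheses contradicted by any observed call as refuted."""
    for h in hypotheses:
        key, _, value = h.constraint.partition("=")
        for call in history:
            if key in call and str(call[key]) != value:
                h.refuted = True
    return hypotheses

def refine(hypotheses):
    """Keep only unrefuted constraints; these reusable constraints would be
    prepended to the next prompt instead of the full dialogue history."""
    return [h for h in hypotheses if not h.refuted]

# Toy cross-session history of tool-call arguments
history = [
    {"seat": "aisle", "meal": "vegetarian"},
    {"seat": "aisle", "meal": "vegan"},
    {"seat": "aisle"},
]
memory = refine(verify(generate(history), history))
print([h.constraint for h in memory])  # → ['seat=aisle']
```

The token savings reported in the paper come from this substitution: the agent conditions on the short surviving constraints rather than on the full multi-session history.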
Get this paper in your agent:
hf papers read 2604.17886
Don’t have the latest CLI? `curl -LsSf https://hf.co/cli/install.sh | bash`
Similar Articles
FSPO: Few-Shot Optimization of Synthetic Preferences Personalizes to Real Users
FSPO proposes a few-shot preference optimization algorithm for LLM personalization that reframes reward modeling as meta-learning, enabling models to quickly infer personalized reward functions from limited user preferences. The method achieves 87% personalization performance on synthetic users and 70% on real users through careful synthetic preference dataset construction.
PersonaVLM: Long-Term Personalized Multimodal LLMs
PersonaVLM introduces a personalized multimodal LLM framework that enables long-term user adaptation through memory retention, multi-turn reasoning, and response alignment, outperforming GPT-4o by 5.2% on the new Persona-MME benchmark.
Preference Estimation via Opponent Modeling in Multi-Agent Negotiation
This paper proposes a novel preference estimation method that integrates natural language information from LLMs into a structured Bayesian opponent modeling framework for multi-agent negotiation. The approach leverages LLMs to extract qualitative cues from utterances and convert them into probabilistic formats, demonstrating improved agreement rates and preference estimation accuracy on multi-party negotiation benchmarks.
IPQA: A Benchmark for Core Intent Identification in Personalized Question Answering
IPQA introduces a benchmark for evaluating core intent identification in personalized question answering, addressing a gap in existing metrics that focus on response quality rather than intent understanding. The paper presents a dataset construction methodology grounded in bounded rationality and demonstrates that state-of-the-art language models struggle with identifying user-prioritized intents from answer selection patterns.
Inference-Time Budget Control for LLM Search Agents
This paper introduces a two-stage inference-time budget control method for LLM search agents, using Value-of-Information scores to optimize tool-call and token allocation during multi-hop question answering.