SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution
Summary
The SAVOIR framework applies cooperative game theory and Shapley values to train language agents with improved social intelligence, achieving state-of-the-art results on the SOTOPIA benchmark and matching GPT-4o performance.
Cached at: 04/23/26, 07:47 AM
Source: https://huggingface.co/papers/2604.18982
Abstract
The SAVOIR framework uses cooperative game theory to improve social intelligence in language agents, combining expected utility with Shapley values for better credit assignment in dialogue systems.
Social intelligence, the ability to navigate complex interpersonal interactions, presents a fundamental challenge for language agents. Training such agents via reinforcement learning requires solving the credit assignment problem: determining how individual utterances contribute to multi-turn dialogue outcomes. Existing approaches directly employ language models to distribute episode-level rewards, yielding attributions that are retrospective and lack theoretical grounding. We propose SAVOIR (ShApley Value fOr SocIal RL), a novel principled framework grounded in cooperative game theory. Our approach combines two complementary principles: expected utility shifts evaluation from retrospective attribution to prospective valuation, capturing an utterance’s strategic potential for enabling favorable future trajectories; Shapley values ensure fair credit distribution with axiomatic guarantees of efficiency, symmetry, and marginality. Experiments on the SOTOPIA benchmark demonstrate that SAVOIR achieves new state-of-the-art performance across all evaluation settings, with our 7B model matching or exceeding proprietary models including GPT-4o and Claude-3.5-Sonnet. Notably, even large reasoning models consistently underperform, suggesting social intelligence requires qualitatively different capabilities than analytical reasoning.
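To make the credit-assignment idea concrete, here is a minimal sketch of exact Shapley value computation in which the utterances of one dialogue are treated as players. It is not the authors' implementation: the abstract does not specify how coalitions of utterances are scored or whether an approximation is used, so the function names `shapley_values` and `toy_value`, and the toy payoffs, are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a small cooperative game.

    players: list of hashable ids (here, utterance indices).
    value:   function mapping a frozenset of players to a float.
    Cost grows exponentially with len(players), so this suits toy cases only.
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):  # k = size of the coalition that p joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p to coalition s
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


def toy_value(coalition):
    """Hypothetical coalition score (an assumption, not from the paper).

    Utterance 0 is worth 2 alone, utterance 1 is worth 1 alone, and
    utterance 2 only pays off (+3) when utterance 0 is also present.
    """
    score = 0.0
    if 0 in coalition:
        score += 2.0
    if 1 in coalition:
        score += 1.0
    if 0 in coalition and 2 in coalition:
        score += 3.0
    return score


phi = shapley_values([0, 1, 2], toy_value)
for utterance, credit in sorted(phi.items()):
    print(f"utterance {utterance}: {credit:.2f}")
```

In this toy game the +3 synergy is split evenly between utterances 0 and 2, and the three credits sum to the value of the full dialogue, which is the efficiency property the abstract refers to.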
Similar Articles
ALSO: Adversarial Online Strategy Optimization for Social Agents
ALSO introduces a framework for online strategy optimization in multi-agent social simulation, formulating multi-turn interaction as an adversarial bandit problem and using a neural surrogate for reward prediction. Experiments on the SOTOPIA benchmark show it outperforms static baselines and existing optimization methods.
From Descriptive to Prescriptive: Uncover the Social Value Alignment of LLM-based Agents
This paper proposes SoVA, a framework using GraphRAG to align LLM-based agents with human social values by converting psychological theories into prescriptive instructions. Experiments on the DAILYDILEMMAS benchmark show significant improvements over prompt-based baselines.
SPARK: Self-Play with Asymmetric Reward from Knowledge Graphs
This paper introduces SPARK, a self-play reinforcement learning framework that leverages knowledge graphs derived from scientific literature to improve relational reasoning in vision-language models.
Fog of Love: Engineering Virtuous Agent Behavior with Affinity-based Reinforcement Learning in a Game Environment
This paper introduces a multi-agent environment based on the board game Fog of Love to evaluate affinity-based reinforcement learning for instilling virtuous behavior in AI agents. The authors demonstrate that localized affinities improve agent performance in both competitive and cooperative objectives, advancing machine ethics research beyond simple grid-world environments.
Discovering Cooperative Pipelines: Autoresearch for Sequential Social Dilemmas
This paper presents a two-level autoresearch framework where an outer-loop AI agent autonomously optimizes inner-loop LLM policy-synthesis pipelines for multi-agent sequential social dilemmas, achieving superior performance and discovering objective-specific mechanisms like fairness under a maximin welfare objective.