Implicit Preference Alignment for Human Image Animation
Summary
This paper introduces Implicit Preference Alignment (IPA), a data-efficient post-training framework that improves hand motion generation in human image animation without requiring paired preference data. It utilizes implicit reward maximization and hand-aware local optimization to enhance generation quality while reducing data curation costs.
Cached at: 05/13/26, 08:11 AM
Source: https://huggingface.co/papers/2605.07545
Abstract
Implicit Preference Alignment (IPA) addresses the challenge of hand motion generation through data-efficient post-training that eliminates the need for paired preference data, using hand-aware local optimization to improve quality.
Human image animation has witnessed significant advancements, yet generating high-fidelity hand motions remains a persistent challenge due to their high degrees of freedom and motion complexity. While reinforcement learning from human feedback, particularly direct preference optimization, offers a potential solution, it necessitates the construction of strict preference pairs. However, curating such pairs for dynamic hand regions is prohibitively expensive and often impractical due to frame-wise inconsistencies. In this paper, we propose Implicit Preference Alignment (IPA), a data-efficient post-training framework that eliminates the need for paired preference data. Theoretically grounded in implicit reward maximization, IPA aligns the model by maximizing the likelihood of self-generated high-quality samples while penalizing deviations from the pretrained prior. Furthermore, we introduce a Hand-Aware Local Optimization mechanism to explicitly steer the alignment process toward hand regions. Experiments demonstrate that our method achieves effective preference optimization to enhance hand generation quality, while significantly lowering the barrier for constructing preference data. Code is released at https://github.com/mdswyz/IPA
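The abstract does not state the training objective, so the following is only a minimal sketch of how the two ideas could fit together, not the paper's actual loss. It assumes a Diffusion-DPO-style implicit reward (the gap between the fine-tuned model's and the frozen pretrained model's per-pixel denoising error on a self-generated high-quality sample) and a binary hand mask that upweights hand pixels. The function names, `beta`, and `hand_weight` are hypothetical.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of per-pixel values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def implicit_preference_loss(err_model, err_ref, hand_mask,
                             beta=1.0, hand_weight=4.0):
    """Hypothetical single-sample (unpaired) preference loss.

    err_model: per-pixel denoising error of the model being fine-tuned
    err_ref:   per-pixel denoising error of the frozen pretrained model
    hand_mask: 1 for hand pixels, 0 elsewhere
    """
    # Assumption: hand pixels count hand_weight times as much as background.
    weights = [hand_weight if m else 1.0 for m in hand_mask]
    # Negative gap means the model fits the good sample better than the prior.
    gap = weighted_mean(err_model, weights) - weighted_mean(err_ref, weights)
    # Loss = -log(sigmoid(-beta * gap)) = softplus(beta * gap), computed stably.
    z = beta * gap
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# Usage: the same error reduction lowers the loss more on a hand pixel.
ref = [1.0, 1.0, 1.0, 1.0]
mask = [0, 0, 1, 1]
loss_hand = implicit_preference_loss([1.0, 1.0, 0.5, 1.0], ref, mask)
loss_background = implicit_preference_loss([0.5, 1.0, 1.0, 1.0], ref, mask)
```

Measuring the error relative to the frozen reference is one simple way to keep the update anchored to the pretrained prior; the paper may use a different regularizer.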
Similar Articles
IAPO: Input Attribution-Aware Policy Optimization for Tool Use in Small Multimodal Agents
This paper introduces IAPO, a reinforcement learning algorithm that improves tool-calling capabilities in multimodal small language models by aligning input attribution with a stronger teacher. Experiments on Qwen2.5-VL-3B show an average 3% improvement in visual question answering accuracy across six test sets.
See Before You Code: Learning Visual Priors for Spatially Aware Educational Animation Generation
This paper introduces OmniManim, a render-feedback-aware framework for generating educational animations from natural language descriptions using large language models. It addresses visual defects like element overlap and misalignment by incorporating explicit visual planning, post-render diagnostics, and localized repair, demonstrating improved render quality on newly constructed datasets.
From Correctness to Preference: A Framework for Personalized Agentic Reinforcement Learning
This paper proposes a unified framework for personalized agentic reinforcement learning that decouples generic task rewards from personalized preference rewards, introducing PARPO and PSGM for preference-aligned policy optimization and skill retrieval.
Offline Preference Optimization for Rectified Flow with Noise-Tracked Pairs
This paper introduces PNAPO, an offline preference optimization framework for rectified flow models that augments preference data with noise samples and uses dynamic regularization to improve training efficiency and sample efficiency.
Learning from human preferences
OpenAI presents a method for training AI agents using human preference feedback, where an agent learns reward functions from human comparisons of behavior trajectories and uses reinforcement learning to optimize for the inferred goals. The approach demonstrates strong sample efficiency, requiring less than 1000 bits of human feedback to train an agent to perform a backflip.