preference-learning

Learning Transferable Latent User Preferences for Human-Aligned Decision Making

arXiv cs.AI · yesterday

This paper introduces CLIPR, a framework that learns transferable latent user preferences from minimal conversational input to improve human-aligned decision making in LLMs.

$\xi$-DPO: Direct Preference Optimization via Ratio Reward Margin

arXiv cs.LG · 2d ago

This paper introduces $\xi$-DPO, a preference optimization method that reformulates the objective to minimize the distance to optimal ratio reward margins, addressing hyperparameter tuning challenges in SimPO. Experimental results show that $\xi$-DPO outperforms existing methods on open benchmarks.

WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback

arXiv cs.CL · 2026-04-20

WildFeedback is a framework that uses in-situ user feedback from actual LLM conversations to automatically build preference datasets for aligning language models with human preferences. It addresses the scalability and bias issues of traditional annotation-based alignment methods.
