preference-learning


Learning Transferable Latent User Preferences for Human-Aligned Decision Making

arXiv cs.AI · yesterday

This paper introduces CLIPR, a framework that learns transferable latent user preferences from minimal conversational input to improve human-aligned decision making in LLMs.


$\xi$-DPO: Direct Preference Optimization via Ratio Reward Margin

arXiv cs.LG · 2d ago

This paper introduces $\xi$-DPO, a preference optimization method that reformulates the objective to minimize the distance to an optimal ratio reward margin, addressing the hyperparameter-tuning burden of SimPO. Experiments show that $\xi$-DPO outperforms existing methods on open benchmarks.
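For context, the SimPO objective that $\xi$-DPO builds on uses a length-normalized reward margin with a tunable target margin. A minimal sketch of that loss is below; the hyperparameter values (`beta`, `gamma`) are illustrative, and this does not reproduce $\xi$-DPO's own reformulation:

```python
import numpy as np

def simpo_loss(logp_chosen, logp_rejected, len_chosen, len_rejected,
               beta=2.0, gamma=0.5):
    """SimPO-style length-normalized reward-margin loss (sketch).

    logp_*: summed log-probabilities of the chosen/rejected responses
    under the policy; len_*: response lengths in tokens. beta scales the
    implicit reward; gamma is the target reward margin whose tuning
    xi-DPO reportedly aims to avoid. Values here are illustrative.
    """
    reward_chosen = beta * logp_chosen / len_chosen       # average-log-prob reward
    reward_rejected = beta * logp_rejected / len_rejected
    margin = reward_chosen - reward_rejected - gamma      # reward margin vs. target
    return -np.log(1.0 / (1.0 + np.exp(-margin)))         # -log sigmoid(margin)

# A wider gap between chosen and rejected rewards yields a smaller loss.
print(simpo_loss(-20.0, -40.0, 10, 10))
```

The `gamma` term is exactly the extra hyperparameter the summary refers to: too small and preferences are barely separated, too large and training saturates.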


WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback

arXiv cs.CL · 2026-04-20

WildFeedback is a framework that leverages in-situ user feedback from real LLM conversations to automatically build preference datasets for aligning language models with human preferences, addressing the scalability and bias issues of traditional annotation-based alignment methods.
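The core idea of deriving preference pairs from in-situ feedback can be sketched as follows. The data schema and the binary `satisfied` signal are hypothetical illustrations, not the paper's actual pipeline:

```python
def build_preference_pairs(conversations):
    """Turn logged conversation turns into (prompt, chosen, rejected) triples.

    Hypothetical sketch: each turn carries a user-satisfaction signal;
    responses to the same prompt are split into liked and disliked sets,
    and their cross product forms preference pairs.
    """
    pairs = []
    by_prompt = {}
    for turn in conversations:
        by_prompt.setdefault(turn["prompt"], []).append(turn)
    for prompt, turns in by_prompt.items():
        liked = [t["response"] for t in turns if t["satisfied"]]
        disliked = [t["response"] for t in turns if not t["satisfied"]]
        for chosen in liked:
            for rejected in disliked:
                pairs.append({"prompt": prompt, "chosen": chosen,
                              "rejected": rejected})
    return pairs

logs = [
    {"prompt": "Summarize this", "response": "Short summary.", "satisfied": True},
    {"prompt": "Summarize this", "response": "Rambling reply.", "satisfied": False},
]
print(build_preference_pairs(logs))
```

The resulting triples have the same shape as the preference datasets consumed by DPO-style trainers, which is what makes this kind of automatic construction a drop-in replacement for human annotation.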
