Tag: sampling-design

Which Pairs to Compare for LLM Post-Training?

arXiv cs.AI · 2026-06-20

This paper studies how to select which completion pairs to label with human preference feedback during LLM post-training. It formulates comparison curation as a sampling-design problem, derives theoretical bounds on the optimality gap of the policy learned by Direct Preference Optimization (DPO), and proposes practical sampling designs that improve sample efficiency over common heuristics on both synthetic and real benchmarks.
