
Which Pairs to Compare for LLM Post-Training?

arXiv cs.AI · 2026-06-20

This paper studies which completion pairs to label for human preference feedback in LLM post-training. It formulates comparison curation as a sampling-design problem, derives theoretical bounds on the policy optimality gap of Direct Preference Optimization (DPO), and proposes practical sampling designs that improve sample efficiency over common heuristics on both synthetic and real benchmarks.
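For background on what a labeled pair feeds into, here is a minimal sketch of the standard DPO objective for a single comparison. It is not the paper's sampling design or its bound, which this summary does not specify. The function name `dpo_pair_loss`, the default `beta=0.1`, and the example log-probabilities are illustrative assumptions; the inputs are assumed to be sequence log-probabilities under the trainable policy and a frozen reference model.

```python
import math


def dpo_pair_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # illustrative default, not taken from the paper
) -> float:
    """Standard DPO loss for one labeled comparison (x, y_chosen, y_rejected).

    Each argument is a sequence log-probability log pi(y | x), under either the
    trainable policy or the frozen reference model (hypothetical inputs).
    """
    # Implicit reward margin: beta times the difference of policy/reference log-ratios.
    margin = beta * (
        (policy_logp_chosen - ref_logp_chosen)
        - (policy_logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(margin)) equals softplus(-margin); use a numerically stable form.
    z = -margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# When the policy matches the reference, the margin is 0 and the loss is log 2.
print(dpo_pair_loss(-12.0, -15.0, -12.0, -15.0))
# Raising the chosen completion's log-ratio relative to the rejected one lowers the loss.
print(dpo_pair_loss(-10.0, -16.0, -12.0, -15.0))
```

Each curated comparison contributes one such term to the training objective, so the choice of which pairs to label determines which margins the optimizer ever sees.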
