Tag
Presented DV-DPO, a method to fine-tune Qwen2.5-7B on domain-specific tasks using only ~$3 in API calls and zero human labelers, achieving 96% of Claude Haiku's composite performance via adversarial cross-examination.