Fine-tuned Qwen2.5-7B to 96% of Claude Haiku on a domain-specific task using ~$3 of API calls and zero human labelers

Reddit r/LocalLLaMA 06/10/26, 12:01 AM Papers

fine-tuning qwen dpo domain-specific decision-reasoning open-source local-model

Summary

Presents DV-DPO, a method for fine-tuning Qwen2.5-7B on a domain-specific task using ~$3 of API calls and no human labelers. Training pairs come from adversarial cross-examination, and the fine-tuned model scores 96% composite in a head-to-head comparison against Claude Haiku.

Built a decision-reasoning engine (Orlog) and wanted to fine-tune a local model for it instead of paying per-call forever.

**The method (DV-DPO):**

* Run a 3-voice council on each question, produce a synthesis
* Cross-examine: losing voices challenge the synthesis
* If synthesis gets revised → DPO pair (chosen=post-revision, rejected=pre-revision)
* If synthesis holds → no pair (good reasoning produces nothing to learn from)

Only genuine revisions under adversarial pressure become training signal. Not format preference, not sampling variance.

**Results:**

* 1,040 pairs total (~$3 at Haiku rates)
* Head-to-head vs Claude Haiku: Format 100%, Commits 100%, Context 89%, Composite 96%
* Latency: 11s vs 3s (T4 GPU, 4-bit quantized)
* Adversarial failure rate: 2% on 96 targeted questions

**Autonomous loop now running:** failure_detector → auto_red_team → DPO pairs → retrain → redeploy → eval. v5 pairs accumulating. GGUF ready for Ollama.

Happy to share the pipeline if there's interest.
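For anyone who wants to picture the pairing rule from the method above, here is a minimal sketch. It is not the Orlog pipeline: the `CouncilRecord` schema, the function names, and the whitespace-insensitive string comparison used to decide whether a synthesis was "revised" are all assumptions made for illustration. The post does not say how revisions are detected or which trainer consumes the pairs. The `prompt` / `chosen` / `rejected` keys are simply a layout that DPO trainers commonly accept.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CouncilRecord:
    """One question after council + cross-examination (hypothetical schema)."""
    question: str
    synthesis_before: str  # synthesis produced by the 3-voice council
    synthesis_after: str   # synthesis after the losing voices challenged it


def _normalize(text: str) -> str:
    # Assumption: collapse whitespace so pure formatting changes
    # are not counted as revisions.
    return " ".join(text.split())


def to_dpo_pair(record: CouncilRecord) -> Optional[dict]:
    """Return a DPO pair if the synthesis was revised, otherwise None."""
    if _normalize(record.synthesis_before) == _normalize(record.synthesis_after):
        return None  # synthesis held under challenge: no training signal
    return {
        "prompt": record.question,
        "chosen": record.synthesis_after,     # post-revision
        "rejected": record.synthesis_before,  # pre-revision
    }


def build_pairs(records: list[CouncilRecord]) -> list[dict]:
    """Keep only the records where cross-examination changed the synthesis."""
    pairs = (to_dpo_pair(r) for r in records)
    return [p for p in pairs if p is not None]


if __name__ == "__main__":
    demo = [
        CouncilRecord("Ship now or wait?", "Ship now.", "Wait one sprint."),
        CouncilRecord("Refactor first?", "Yes,  refactor.", "Yes, refactor."),
    ]
    for pair in build_pairs(demo):
        print(pair)
```

In this sketch the second demo record yields no pair because only the spacing changed. A real pipeline would likely need a stronger revision check (for example a judge model), since a string comparison cannot tell a genuine change of position from a rewording.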

