Tag
This paper audits the reliability of distribution-free risk control methods for selective classification in signal-domain detectors. It finds that naive thresholding often exceeds its declared risk budget, and that violations of exchangeability between calibration and test data cause the risk certificates to fail.
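A minimal sketch of the gap between naive thresholding and a distribution-free certificate follows. It uses a toy setup, not the audited paper's, and several things in it are assumptions. Scores are drawn uniformly from [0.5, 1], and the toy model is correct with probability equal to its score. The declared budget applies to the selective risk, meaning the error rate among accepted examples. The certified rule is Learn-then-Test, using exact binomial p-values and a Bonferroni correction over a threshold grid fixed in advance. This rule is one standard distribution-free technique, chosen here for illustration. All function names are hypothetical. The certificate is only valid when calibration and test data are exchangeable (here, i.i.d.), which is exactly the assumption the audit links to certificate failures.

```python
import math
import random


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(Binomial(n, p) <= k), computed exactly with math.comb."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def selective_stats(scores, correct, t):
    """Return (accepted count, error count) among examples with score >= t."""
    accepted = [c for s, c in zip(scores, correct) if s >= t]
    return len(accepted), sum(1 for c in accepted if not c)


def naive_threshold(scores, correct, alpha, grid):
    """Lowest grid threshold whose *empirical* selective risk is <= alpha."""
    best = None
    for t in sorted(grid, reverse=True):
        n_acc, n_err = selective_stats(scores, correct, t)
        if n_acc > 0 and n_err / n_acc <= alpha:
            best = t  # trusts the point estimate, with no margin for sampling noise
    return best


def ltt_bonferroni_threshold(scores, correct, alpha, delta, grid):
    """Lowest grid threshold whose selective risk is certified <= alpha.

    For each t, test H0: risk(t) > alpha with an exact binomial p-value
    (valid conditional on the number of accepted points, assuming i.i.d.
    data). Bonferroni over the a-priori grid bounds the family-wise error
    by delta, so P(risk of returned threshold > alpha) <= delta.
    """
    certified = []
    for t in grid:
        n_acc, n_err = selective_stats(scores, correct, t)
        p_value = binom_cdf(n_err, n_acc, alpha) if n_acc > 0 else 1.0
        if p_value <= delta / len(grid):
            certified.append(t)
    return min(certified) if certified else None  # None: abstain on everything


def true_risk(t):
    """Toy model only: scores ~ U(0.5, 1), P(correct | s) = s, so risk = (1 - t) / 2."""
    return (1 - max(t, 0.5)) / 2


def simulate(trials=100, n_cal=500, alpha=0.1, delta=0.1, seed=0):
    """Fraction of trials in which each rule's threshold violates the budget."""
    rng = random.Random(seed)
    grid = [i / 100 for i in range(50, 101)]  # fixed before seeing any data
    naive_fail = ltt_fail = 0
    for _ in range(trials):
        scores = [rng.uniform(0.5, 1.0) for _ in range(n_cal)]
        correct = [rng.random() < s for s in scores]  # calibrated toy model
        t_naive = naive_threshold(scores, correct, alpha, grid)
        t_ltt = ltt_bonferroni_threshold(scores, correct, alpha, delta, grid)
        if t_naive is not None and true_risk(t_naive) > alpha:
            naive_fail += 1
        if t_ltt is not None and true_risk(t_ltt) > alpha:
            ltt_fail += 1
    return naive_fail / trials, ltt_fail / trials


if __name__ == "__main__":
    naive_rate, ltt_rate = simulate()
    print(f"naive rule exceeds the budget in {naive_rate:.0%} of trials")
    print(f"certified rule exceeds the budget in {ltt_rate:.0%} of trials")
```

A p-value this small forces the empirical risk strictly below the budget. So the certified threshold is never lower than the naive one, and in this toy setup the certified rule can only fail on trials where the naive rule also fails. The price is lower coverage. Replacing the i.i.d. draws with shifted test data would void the guarantee.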
This paper proposes using pairwise queries to improve binary selective classification, particularly when confidence estimates are inconsistent, as in LLM in-context learning. The authors give theoretical conditions under which pairwise-query-based algorithms achieve better accuracy-cost tradeoffs than raw confidence estimates, and experiments on synthetic and real datasets show this advantage.
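The summary does not spell out the algorithm, so the sketch below shows one plausible shape of the idea. It is an assumption, not the authors' method. It replaces per-example confidence scores with a hypothetical pairwise oracle `prefer(a, b)`, which could be, for example, an LLM prompt asking which of two predictions is more likely correct. It ranks examples with a comparison sort and abstains on all but the top `coverage` fraction. The count of oracle calls stands in for query cost, which is also an assumption.

```python
from functools import cmp_to_key
from typing import Callable, List, Sequence, Tuple


def select_by_pairwise_queries(
    items: Sequence[str],
    prefer: Callable[[str, str], bool],
    coverage: float,
) -> Tuple[List[str], int]:
    """Rank items by pairwise queries and keep the top `coverage` fraction.

    `prefer(a, b)` is a hypothetical oracle answering "is the prediction on
    `a` more likely correct than the prediction on `b`?".
    Returns (accepted items, number of oracle queries spent).
    """
    queries = 0

    def compare(a: str, b: str) -> int:
        nonlocal queries
        queries += 1
        return -1 if prefer(a, b) else 1  # preferred items sort first

    ranked = sorted(items, key=cmp_to_key(compare))
    n_keep = round(coverage * len(items))
    return ranked[:n_keep], queries  # abstain on everything after n_keep


if __name__ == "__main__":
    # Toy oracle: a hidden per-item "true confidence" stands in for LLM answers.
    hidden = {"q1": 0.95, "q2": 0.40, "q3": 0.75, "q4": 0.60, "q5": 0.90}
    accepted, cost = select_by_pairwise_queries(
        list(hidden), lambda a, b: hidden[a] > hidden[b], coverage=0.6
    )
    print(accepted, cost)  # accepted is ['q1', 'q5', 'q3']; cost depends on the sort
```

A comparison sort spends on the order of n log n queries, more than the n calls needed to score each example once. Whether a more consistent ranking is worth that extra cost is the kind of accuracy-cost tradeoff the paper analyzes.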