coverage

When More Sampling Hurts: The Modal Ceiling and Correlation Ceiling of Test-Time Scaling

arXiv cs.LG · 2026-06-30

This paper identifies a 'modal ceiling' and a 'correlation ceiling' in test-time scaling for reasoning models. It shows that beyond a few dozen samples, additional sampling does not improve selection accuracy and can even reduce it, which highlights an identifiability gap between generating a correct answer and recognizing one.
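As a rough illustration of how more samples can raise coverage while lowering selection accuracy, the sketch below compares two quantities in closed form. It is not the paper's method: it assumes majority voting as the selector, exactly two candidate answers, independent samples, and a hypothetical per-sample correctness probability of 0.4, so the wrong answer is the mode. The function names are made up for this example.

```python
from math import comb


def coverage_at_k(p_correct: float, k: int) -> float:
    """Probability that at least one of k independent samples is correct."""
    return 1.0 - (1.0 - p_correct) ** k


def majority_vote_accuracy(p_correct: float, k: int) -> float:
    """Probability that the correct answer wins a strict majority of k samples.

    Assumes only two candidate answers and an odd k, so ties cannot occur.
    """
    if k % 2 == 0:
        raise ValueError("use an odd k so that ties cannot occur")
    needed = k // 2 + 1  # smallest number of correct samples that wins the vote
    return sum(
        comb(k, i) * p_correct**i * (1.0 - p_correct) ** (k - i)
        for i in range(needed, k + 1)
    )


# Hypothetical setting: the correct answer is sampled 40% of the time,
# so the single wrong answer is the modal one.
P_CORRECT = 0.4
for k in (1, 5, 25, 101):
    print(
        f"k={k:>3}  coverage={coverage_at_k(P_CORRECT, k):.3f}  "
        f"majority_vote={majority_vote_accuracy(P_CORRECT, k):.3f}"
    )
```

Under these assumptions, coverage rises toward 1 as k grows while majority-vote accuracy falls below the single-sample rate, because the vote converges on the modal answer rather than the correct one.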

Precision Is Not Faithfulness: Coverage-Aware Evaluation of Grounded Generation with a Complete Oracle

Hugging Face Daily Papers · 2026-06-08

This paper identifies a blind spot in reference-free faithfulness metrics: they measure only precision (whether the generated claims are supported), not recall (how many of the relevant facts are covered). The authors introduce a complete-oracle evaluation built on Formula 1 telemetry and weather data, show that high-precision models often have poor coverage, and propose a combined metric.
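The precision/coverage distinction can be sketched with plain set arithmetic. This is a minimal example, assuming claims and oracle facts can be matched as exact strings; the facts below are invented for illustration, and the harmonic mean used as the combined score is an assumption here, since the summary does not say how the paper's combined metric is defined.

```python
def precision(claims: set[str], oracle: set[str]) -> float:
    """Fraction of generated claims that the oracle supports."""
    return len(claims & oracle) / len(claims) if claims else 0.0


def coverage(claims: set[str], oracle: set[str]) -> float:
    """Fraction of oracle facts that the generated claims mention (recall)."""
    return len(claims & oracle) / len(oracle) if oracle else 0.0


def f_score(p: float, r: float) -> float:
    """Harmonic mean of precision and coverage (an assumed combination)."""
    return 2.0 * p * r / (p + r) if (p + r) > 0 else 0.0


# Hypothetical complete oracle: every relevant fact for one race summary.
oracle_facts = {
    "rain started on lap 12",
    "track temperature dropped after lap 12",
    "driver A pitted on lap 13",
    "driver A set the fastest lap",
}

# A cautious model output: one claim, and that claim is supported.
generated = {"driver A set the fastest lap"}

p = precision(generated, oracle_facts)
r = coverage(generated, oracle_facts)
print(f"precision={p:.2f} coverage={r:.2f} combined={f_score(p, r):.2f}")
```

In this toy case every generated claim is supported, so a precision-only metric would rate the output as fully faithful even though it covers only one of the four relevant facts.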
