Tag
This paper investigates whether early-token confidence signals from LLM decoding can predict reasoning quality in multi-agent debate systems, finding that confidence in the first few generated tokens is the strongest predictor of rubric-based essay scores.