Tag
This paper investigates whether early-token confidence signals from LLM decoding can predict reasoning quality in multi-agent debate systems, finding that confidence in the first few generated tokens is the strongest predictor of rubric-based essay scores.
This paper introduces TIDE, a framework that integrates trial and debate mechanisms to improve criteria-based prompt optimization for argumentative essay understanding tasks such as automated essay scoring, argument component detection, and argument relation identification. The reported experiments show performance improvements, suggesting that combining prompt-based methods can make argument analysis more robust.