confidence-rationale-alignment

CoRA: Confidence-Rationale Alignment for Reliable Chain-of-Thought Reasoning

arXiv cs.CL · 2026-06-16

This paper introduces CoRA, a GRPO-based reinforcement learning framework that aligns an LLM's confidence with its generated rationales to improve the reliability of chain-of-thought reasoning. It reports up to a 26.51% reduction in misalignment error across multiple benchmarks.
