Reliable Chain-of-Thought via Prefix Consistency

Hugging Face Daily Papers 05/08/26, 12:00 AM Papers

Summary

This paper introduces 'prefix consistency,' a method that weights candidate responses in Chain-of-Thought reasoning based on answer reproduction rates during trace regeneration. It achieves high accuracy with significantly fewer tokens than standard majority voting across various reasoning models and benchmarks.

Large Language Models often improve accuracy on reasoning tasks by sampling multiple Chain-of-Thought (CoT) traces and aggregating them with majority voting (MV), a test-time technique called self-consistency. When we truncate a CoT partway through and regenerate the remainder, we observe that traces with correct answers reproduce their original answer more often than traces with wrong answers. We use this difference as a reliability signal, prefix consistency, that weights each candidate answer by how often it reappears under regeneration. It requires no access to token log-probabilities or self-rating prompts. Across five reasoning models and four math and science benchmarks, prefix consistency is the best correctness predictor in most settings, and reweighting votes by it reaches Standard MV plateau accuracy at up to 21x fewer tokens (median 4.6x). Our code is available at https://github.com/naoto-iwase/prefix-consistency.

Original Article

Cached at: 05/13/26, 12:14 PM

Paper page - Reliable Chain-of-Thought via Prefix Consistency

Source: https://huggingface.co/papers/2605.07654

Abstract

Prefix consistency uses answer reproduction rates under trace regeneration to weight candidate responses, achieving high accuracy with significantly fewer tokens than standard majority voting.

Large Language Models often improve accuracy on reasoning tasks by sampling multiple Chain-of-Thought (CoT) traces and aggregating them with majority voting (MV), a test-time technique called self-consistency. When we truncate a CoT partway through and regenerate the remainder, we observe that traces with correct answers reproduce their original answer more often than traces with wrong answers. We use this difference as a reliability signal, prefix consistency, that weights each candidate answer by how often it reappears under regeneration. It requires no access to token log-probabilities or self-rating prompts. Across five reasoning models and four math and science benchmarks, prefix consistency is the best correctness predictor in most settings, and reweighting votes by it reaches Standard MV plateau accuracy at up to 21x fewer tokens (median 4.6x). Our code is available at https://github.com/naoto-iwase/prefix-consistency.
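The weighting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the character-level cut at a fixed fraction (`cut`), the number of regenerations (`n_regen`), and the `regenerate` callable that stands in for a model call are all assumptions made for illustration. The abstract only states that a trace is truncated partway through, the remainder is regenerated, and each candidate answer is weighted by how often it reappears.

```python
from collections import Counter, defaultdict
from typing import Callable, Sequence, Tuple

Candidate = Tuple[str, str]  # (CoT trace, final answer)


def prefix_consistency(trace: str, answer: str,
                       regenerate: Callable[[str], str],
                       n_regen: int = 4, cut: float = 0.5) -> float:
    """Fraction of regenerations from a truncated prefix that reproduce `answer`.

    `regenerate` is a hypothetical stand-in for a model call that continues
    a prefix and returns the final answer of the continuation.
    """
    prefix = trace[: int(len(trace) * cut)]  # assumed truncation rule
    hits = sum(regenerate(prefix) == answer for _ in range(n_regen))
    return hits / n_regen


def weighted_vote(candidates: Sequence[Candidate],
                  regenerate: Callable[[str], str],
                  n_regen: int = 4, cut: float = 0.5) -> str:
    """Vote where each trace counts by its prefix-consistency score."""
    scores = defaultdict(float)
    for trace, answer in candidates:
        scores[answer] += prefix_consistency(trace, answer, regenerate,
                                             n_regen, cut)
    return max(scores, key=scores.get)


def majority_vote(candidates: Sequence[Candidate]) -> str:
    """Plain self-consistency: every trace counts equally."""
    return Counter(answer for _, answer in candidates).most_common(1)[0][0]


# Toy demo with a deterministic stub instead of a real model.
def stub_regenerate(prefix: str) -> str:
    # Pretend that prefixes mentioning "carry" reliably lead to "12".
    return "12" if "carry" in prefix else "9"


candidates = [
    ("add carry digits then sum", "12"),
    ("guess the total quickly ok", "7"),
    ("guess again with no method", "7"),
]

print(majority_vote(candidates))                   # 7
print(weighted_vote(candidates, stub_regenerate))  # 12
```

In this toy case the two traces answering "7" never reproduce their answer after truncation, so their votes carry zero weight and the single reproducible trace wins, even though plain majority voting picks "7". With a real model, `regenerate` would sample a continuation of the prefix, which needs only generated text and no token log-probabilities.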

Get this paper in your agent:

hf papers read 2605.07654

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

Revisiting Chain-of-Thought Reasoning under Limited Supervision: Semi-supervised Chain-of-Thought Learning

Beyond Accuracy: Measuring Bias Acknowledgment in Chain-of-Thought Reasoning for Responsible AI Evaluation

The Chain Holds, the Answer Folds: Trace-Answer Dissociation in Reasoning Models Under Adversarial Pressure

ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning

ACIL: Auto Chain of Thoughts for In-Context Learning
