COFT: Counterfactual-Conformal Decoding for Fair Chain-of-Thought Reasoning in Large Language Models

arXiv cs.CL Papers

Summary

COFT is a training-free decoding method that applies token-level fairness control and conformal calibration to reduce bias in the chain-of-thought reasoning of large language models. It cuts standard bias metrics by 30-55% (median 38%) at a computational overhead of at most 11%.

arXiv:2605.30641v1 Announce Type: new

Abstract: Large language models (LLMs) can reveal and amplify societal biases during chain-of-thought (CoT) generation. We present COFT (Chain of Fair Thought), a training-free decoding method that applies token-level fairness control at decode time, with distribution-free marginal validity guarantees (under exchangeability) for any frozen causal language model. COFT operates in three stages. First, it creates a masked counterfactual prompt by replacing sensitive spans with neutral tokens. Second, it compares the factual and masked logit distributions through lightweight logit fusion to attenuate attribute-driven biases. Third, it uses dual-branch split-conformal calibration to certify per-step candidate token sets at a user-chosen risk level. We evaluate COFT across six models and multiple bias benchmarks. Our method reduces standard bias metrics by 30-55% (median 38%) while preserving task utility and language quality. Reasoning accuracies remain unchanged within run-to-run noise margins. The computational overhead is modest, equivalent to one additional cached forward pass (≤11%). COFT offers a clear, auditable path to safer CoT generation with significant bias reduction, negligible utility loss, and no requirement for retraining, auxiliary classifiers, or weight access.
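The abstract names COFT's three stages but not their exact formulas. The Python sketch below illustrates one plausible reading under stated assumptions, and none of its names come from the paper: the sensitive-term lexicon and the `[ATTR]` placeholder are hypothetical; the fusion rule is a plain log-linear interpolation with weight `lam`, standing in for COFT's unspecified "lightweight logit fusion"; the nonconformity score is `1 - p(token)`; and "dual-branch" is interpreted as calibrating the factual and masked branches separately at `alpha/2` and intersecting their candidate sets. Calibration scores are assumed to come from held-out prompts paired with a reference next token. Treat this as standard split-conformal prediction sets combined with counterfactual masking, not as COFT's implementation.

```python
import math
import re

# Hypothetical sensitive-term lexicon and neutral placeholder; COFT's actual
# span detector and neutral tokens are not specified in the abstract.
SENSITIVE_TERMS = ["he", "she", "man", "woman", "muslim", "christian"]
NEUTRAL_TOKEN = "[ATTR]"


def mask_prompt(prompt, terms=SENSITIVE_TERMS, neutral=NEUTRAL_TOKEN):
    """Stage 1: replace whole-word sensitive spans with a neutral token."""
    pattern = r"\b(" + "|".join(map(re.escape, terms)) + r")\b"
    return re.sub(pattern, neutral, prompt, flags=re.IGNORECASE)


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def fuse_logits(factual, masked, lam=0.5):
    """Stage 2 (assumed form): interpolate factual and masked log-probabilities.

    lam = 0 keeps only the factual branch; lam = 1 keeps only the masked branch.
    """
    lf, lm = log_softmax(factual), log_softmax(masked)
    return log_softmax([(1 - lam) * a + lam * b for a, b in zip(lf, lm)])


def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:  # too few calibration points for this alpha: admit every token
        return math.inf
    return sorted(cal_scores)[k - 1]


def candidate_set(probs, qhat):
    """Token indices whose nonconformity score 1 - p(token) is within qhat."""
    return {i for i, p in enumerate(probs) if 1.0 - p <= qhat}


def calibrate_dual(fact_scores, mask_scores, alpha):
    """Calibrate each branch at alpha/2. By the union bound, the intersection
    of both sets misses the reference token with probability <= alpha."""
    return (conformal_threshold(fact_scores, alpha / 2),
            conformal_threshold(mask_scores, alpha / 2))


def decode_step(factual_logits, masked_logits, q_fact, q_mask, lam=0.5):
    """Stage 3: pick the best fused-score token inside the dual-branch set."""
    fused = fuse_logits(factual_logits, masked_logits, lam)
    p_fact = [math.exp(x) for x in log_softmax(factual_logits)]
    p_mask = [math.exp(x) for x in log_softmax(masked_logits)]
    c_fact = candidate_set(p_fact, q_fact)
    c_mask = candidate_set(p_mask, q_mask)
    # Hypothetical fallbacks: union if the sets are disjoint, full vocab if both are empty.
    allowed = (c_fact & c_mask) or (c_fact | c_mask) or set(range(len(fused)))
    return max(allowed, key=lambda i: fused[i])
```

For example, with factual logits `[3.0, 2.5, 0.0]`, masked logits `[1.0, 3.0, 0.0]` and unrestricted thresholds (`math.inf`), `lam=0.0` reproduces the factual greedy choice (token 0), whereas `lam=0.5` selects token 1, which the masked branch prefers. Splitting `alpha` across the two branches is what keeps the intersected set's marginal coverage at `1 - alpha`; how the paper itself combines its branches is not stated in the abstract.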

Cached at: 06/01/26, 09:26 AM

# COFT: Counterfactual-Conformal Decoding for Fair Chain-of-Thought Reasoning in Large Language Models
Source: [https://arxiv.org/abs/2605.30641](https://arxiv.org/abs/2605.30641)
[View PDF](https://arxiv.org/pdf/2605.30641)

## Submission history

From: Arya Fayyazi

**[v1]** Thu, 28 May 2026 22:52:15 UTC (2,107 KB)

Similar Articles

Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning

arXiv cs.CL

This paper proposes ProxyCoT, a training framework that improves long-context reasoning in large language models by first obtaining chain-of-thought reasoning traces on short proxy contexts (via reinforcement learning or distillation) and then grounding them in full long contexts through supervised fine-tuning. Experiments show consistent improvements over baselines at reduced computational cost.

Many-Shot CoT-ICL: Making In-Context Learning Truly Learn

Hugging Face Daily Papers

This paper investigates many-shot chain-of-thought in-context learning for reasoning tasks, showing that standard scaling rules do not transfer, and proposes Curvilinear Demonstration Selection (CDS) for improved demonstration ordering, achieving gains of up to 5.42 percentage points.