faithfulness

#faithfulness

Don't Go Breaking My LLM: The Impact of Pruning Attention Layers on Explanation Faithfulness and Confidence Calibration

arXiv cs.LG · 15h ago

This paper studies how pruning attention layers in LLMs affects explanation faithfulness and confidence calibration. It finds that accuracy often remains high while faithfulness and calibration degrade, which points to a misalignment between model confidence, interpretability, and accuracy.

Cycle-Consistent Neural Explanation of Formal Verification Certificates

arXiv cs.AI · yesterday

This paper proposes a cycle-consistent neural architecture that generates faithful natural language explanations of formal verification certificates, achieving 90% soundness and 860x faster inference than LLM baselines.

Faithful by Construction: Claim-Anchored Attribution for Multi-Document Summarization

arXiv cs.CL · yesterday

This paper introduces CAMS, a modular multi-document summarization framework that extracts atomic claims with token-level provenance, clusters equivalent claims, and rewrites them into summaries with fine-grained, multi-source traceability, significantly improving faithfulness and citation precision.

Beyond Accuracy: Measuring Bias Acknowledgment in Chain-of-Thought Reasoning for Responsible AI Evaluation

arXiv cs.LG · 2026-06-16

This paper introduces a trace-level diagnostic for evaluating chain-of-thought reasoning, separating susceptibility (whether bias changes the answer) from acknowledgment (whether the trace flags the biased input). Experiments show models like GPT-4o and Claude Sonnet 4 have similar susceptibility rates but very different acknowledgment rates, highlighting a blind spot in accuracy-only evaluation.
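
The split between susceptibility and acknowledgment can be illustrated as two rates over paired prompts. This is a minimal sketch, not the paper's diagnostic: it assumes each item has been run with and without a biasing cue, and that a separate judge has already decided whether the biased-prompt trace mentions the cue. The `Record` schema, `trace_diagnostics`, and the extra `silent_switch` rate (answer changed, trace silent) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One item evaluated with and without a biasing cue (hypothetical schema)."""
    answer_clean: str      # answer to the unbiased prompt
    answer_biased: str     # answer to the same prompt with the biasing cue added
    trace_flags_cue: bool  # whether the biased-prompt trace mentions the cue


def trace_diagnostics(records: list[Record]) -> dict[str, float]:
    """Return susceptibility, acknowledgment, and silent-switch rates."""
    n = len(records)
    if n == 0:
        raise ValueError("need at least one record")
    switched = [r.answer_clean != r.answer_biased for r in records]
    flagged = [r.trace_flags_cue for r in records]
    return {
        # fraction of items whose answer changes when the cue is added
        "susceptibility": sum(switched) / n,
        # fraction of biased traces that explicitly flag the cue
        "acknowledgment": sum(flagged) / n,
        # answer changed but the trace said nothing about the cue
        "silent_switch": sum(s and not f for s, f in zip(switched, flagged)) / n,
    }


records = [
    Record("B", "A", False),
    Record("B", "A", True),
    Record("C", "C", False),
    Record("D", "D", True),
]
print(trace_diagnostics(records))
```

Two models with equal `susceptibility` can still differ widely in `acknowledgment`, which is the gap an accuracy-only comparison cannot see.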

Layer-Resolved Optimal Transport for Hallucination Detection in NMT and Abstractive Summarization

arXiv cs.CL · 2026-06-12

This paper extends optimal transport-based hallucination detection to all decoder layers in NMT and abstractive summarization. It finds that the detection signal is concentrated in early layers and that it transfers poorly to summarization, where faithfulness failures do not show up as attention concentration.
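
The layer-resolved idea can be sketched by scoring each decoder layer's cross-attention separately. This is a minimal sketch under stated assumptions: the per-layer score is taken to be the optimal transport distance between the average source-attention mass and a uniform distribution, using a 0/1 ground cost (under which the distance reduces to total variation distance). The paper's cost function and aggregation may differ, and `wass_to_uniform` and `layerwise_scores` are illustrative names.

```python
import numpy as np


def wass_to_uniform(attn: np.ndarray) -> float:
    """OT distance from source-attention mass to uniform under a 0/1 ground cost.

    attn: (target_len, source_len) cross-attention weights for one layer.
    With a 0/1 cost, the optimal transport cost equals total variation distance.
    """
    mass = attn.mean(axis=0)          # attention each source token receives on average
    mass = mass / mass.sum()          # renormalize to a probability distribution
    uniform = np.full_like(mass, 1.0 / mass.size)
    return float(0.5 * np.abs(mass - uniform).sum())


def layerwise_scores(attn_by_layer: list[np.ndarray]) -> list[float]:
    """One score per decoder layer; larger means attention is further from uniform."""
    return [wass_to_uniform(a) for a in attn_by_layer]


rng = np.random.default_rng(0)
spread = rng.dirichlet(np.ones(6), size=4)   # 4 target steps over 6 source tokens
collapsed = np.zeros((4, 6))
collapsed[:, 0] = 1.0                        # every target step attends only to token 0
print(layerwise_scores([spread, collapsed]))
```

A fully collapsed layer reaches the maximum value of 1 - 1/n (here 5/6); comparing these scores across layers is one way to ask which layers carry the signal.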

Detect, Remask, Repair: Diffusion Editing for Faithful Summarization of Evolving Contexts

arXiv cs.CL · 2026-06-12

This paper proposes Detect–Remask–Repair, a diffusion-based framework for localized faithfulness repair in summarization when contexts evolve, and introduces the StreamSum benchmark for evaluating such settings. Experiments show it offers controllable trade-offs between faithfulness, speed, and content preservation.

LatticeBridge: Rare-Event Sequential Inference for Faithful Structured Sequence Synthesis

arXiv cs.CL · 2026-06-11

LatticeBridge proposes a twisted sequential Monte Carlo decoder for structured sequence generation that improves constraint satisfaction by treating the problem as rare-event inference, outperforming greedy and beam-search baselines on CommonGen, E2E NLG, and WikiBio.

Explicit Evidence Grounding via Structured Inline Citation Generation

arXiv cs.CL · 2026-06-08

This paper introduces FullCite, a framework for generating structured inline citations that link each claim to both its source document and specific evidence spans. Evaluated on three QA benchmarks (ASQA, BioASQ, ExpertQA), it finds that while LLMs are good at document-level attribution, they struggle with precise evidence span identification.
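
The gap between document-level and span-level attribution can be made concrete with two scores over the same citations. This is a minimal sketch assuming character-offset spans and overlap F1 as the span metric; FullCite's actual evaluation may differ, and `Citation`, `span_f1`, and `score` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    """A claim's citation: source document plus evidence span (hypothetical schema)."""
    doc_id: str
    start: int  # character offset where the evidence span begins (inclusive)
    end: int    # character offset where it ends (exclusive)


def span_f1(pred: Citation, gold: Citation) -> float:
    """Character-overlap F1 between two spans; 0 if they cite different documents."""
    if pred.doc_id != gold.doc_id:
        return 0.0
    overlap = max(0, min(pred.end, gold.end) - max(pred.start, gold.start))
    if overlap == 0:
        return 0.0
    precision = overlap / (pred.end - pred.start)
    recall = overlap / (gold.end - gold.start)
    return 2 * precision * recall / (precision + recall)


def score(preds: list[Citation], golds: list[Citation]) -> tuple[float, float]:
    """Return (document-level accuracy, mean span F1) over aligned claim citations."""
    if not golds:
        raise ValueError("need at least one gold citation")
    doc_acc = sum(p.doc_id == g.doc_id for p, g in zip(preds, golds)) / len(golds)
    mean_f1 = sum(span_f1(p, g) for p, g in zip(preds, golds)) / len(golds)
    return doc_acc, mean_f1


preds = [Citation("d1", 0, 100), Citation("d2", 50, 60)]
golds = [Citation("d1", 80, 120), Citation("d2", 50, 60)]
print(score(preds, golds))
```

Both predicted citations name the right document, but the first span is far wider than the gold evidence, so mean span F1 (about 0.64) trails document-level accuracy (1.0).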

Evaluating Bivariate Causal Statements Based on Mutual Compatibility

arXiv cs.AI · 2026-06-02

This paper introduces compatibility and incompatibility scores for evaluating collections of bivariate causal statements without relying on faithfulness, and demonstrates their applicability by analyzing causal claims from large language models.

OCC-RAG: Optimal Cognitive Core for Faithful Question Answering

arXiv cs.CL · 2026-06-02

OCC-RAG introduces a family of small language models optimized for faithful question answering, trained with a novel pipeline that synthesizes multi-context, multi-hop QA data. The models perform competitively with larger models on reasoning and faithfulness benchmarks.

The Chain Holds, the Answer Folds: Trace-Answer Dissociation in Reasoning Models Under Adversarial Pressure

arXiv cs.AI · 2026-05-29

This paper identifies a novel failure mode in reasoning models, called unfaithful capitulation, in which the chain of thought stays factually correct throughout adversarial multi-turn dialogues but the final answer flips to an incorrect one, exposing limitations of current evaluation methods.
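
Detecting this trace-answer dissociation in a dialogue log can be sketched as a scan for the first turn where a still-correct trace sits next to a wrong answer. This is a minimal sketch assuming trace correctness has already been judged per turn by some separate grader; `Turn`, `first_dissociation`, and `dissociation_rate` are hypothetical names, not the paper's tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turn:
    """Model output after one adversarial user turn (hypothetical schema)."""
    trace_correct: bool  # verdict from a separate grader on the reasoning trace
    answer: str          # final answer given at this turn


def first_dissociation(turns: list[Turn], gold: str) -> Optional[int]:
    """Index of the first turn whose trace is still correct but whose answer is wrong."""
    for i, turn in enumerate(turns):
        if turn.trace_correct and turn.answer != gold:
            return i
    return None


def dissociation_rate(dialogues: list[list[Turn]], golds: list[str]) -> float:
    """Fraction of dialogues with at least one trace-answer dissociation."""
    if not dialogues:
        raise ValueError("need at least one dialogue")
    hits = sum(first_dissociation(d, g) is not None for d, g in zip(dialogues, golds))
    return hits / len(dialogues)


held_then_folded = [Turn(True, "42"), Turn(True, "42"), Turn(True, "41")]
reasoned_away = [Turn(True, "42"), Turn(False, "41")]
print(first_dissociation(held_then_folded, "42"))
print(dissociation_rate([held_then_folded, reasoned_away], ["42", "42"]))
```

The second dialogue is not counted: its answer changes only after the trace itself goes wrong, which is a different failure from the one described above.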

Measuring the Depth of LLM Unlearning via Activation Patching

arXiv cs.CL · 2026-05-26

The paper proposes the Unlearning Depth Score (UDS), a metric that uses activation patching to quantify how thoroughly target knowledge is erased from LLMs, achieving state-of-the-art faithfulness and robustness across multiple unlearning methods.

Faithfulness as Information Flow: Evaluating and Training Faithful Chain-of-Thought Reasoning

arXiv cs.LG · 2026-05-26

This paper proposes a framework to evaluate and improve faithfulness of chain-of-thought reasoning by controlling information flow, using entropy-based, KL-divergence, and gradient-based diagnostics, and introduces training interventions (attention masking, gradient masking, adversarial perturbations) that make reasoning more transparent and reduce shortcut reliance.
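
One common way to turn a KL-divergence diagnostic into a faithfulness signal is to compare the model's answer distribution with and without its chain of thought: if removing the reasoning barely moves the answer, the trace is unlikely to be driving it. This is an assumed reading, not the paper's exact diagnostic; `kl_divergence` and `cot_reliance` are hypothetical helpers, and the entropy- and gradient-based diagnostics are not shown.

```python
import math


def kl_divergence(p: list[float], q: list[float], eps: float = 1e-12) -> float:
    """KL(p || q) in nats for two distributions over the same answer options."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def cot_reliance(p_with_cot: list[float], p_without_cot: list[float]) -> float:
    """How far the answer distribution moves when the reasoning is removed.

    Near 0: the answer is the same with or without the chain of thought,
    a sign the trace may not be doing causal work.
    """
    return kl_divergence(p_with_cot, p_without_cot)


same = cot_reliance([0.9, 0.05, 0.05], [0.9, 0.05, 0.05])
shifted = cot_reliance([0.9, 0.05, 0.05], [0.3, 0.35, 0.35])
print(same, shifted)
```

The `eps` floor only guards against a zero probability in the no-reasoning distribution; it does not smooth the distributions.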

Faithfulness Metrics Don't Measure Faithfulness: A Meta-Evaluation with Ground Truth

Hugging Face Daily Papers · 2026-05-24

This paper introduces BonaFide, a benchmark of 3,066 labeled chain-of-thought examples across 13 tasks and 10 models, and systematically evaluates faithfulness metrics, showing that most perform near chance and have significant limitations in reliability and efficiency.
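
Meta-evaluating a faithfulness metric against ground-truth labels reduces, in its simplest form, to asking how well the metric's scores separate faithful from unfaithful examples. This is a minimal sketch assuming binary labels (1 = faithful) and AUROC as the separation statistic; BonaFide's actual protocol may use other statistics, and `auroc` here is a plain pairwise implementation rather than a library call.

```python
def auroc(scores: list[float], labels: list[int]) -> float:
    """Probability a random faithful example outscores a random unfaithful one (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both faithful (1) and unfaithful (0) examples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]
print(auroc([0.9, 0.4, 0.6, 0.1], labels))   # an informative metric
print(auroc([0.5, 0.5, 0.5, 0.5], labels))   # a metric that cannot tell them apart
```

A value near 0.5 means the metric ranks faithful and unfaithful traces no better than chance, which is the kind of result the summary reports for most metrics.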

Faithful-MR1: Faithful Multimodal Reasoning via Anchoring and Reinforcing Visual Attention

arXiv cs.CL · 2026-05-22

Faithful-MR1 is a training framework that improves faithful multimodal reasoning in MLLMs by anchoring visual attention via a <Focus> token and reinforcing faithful use through counterfactual image intervention. It outperforms baselines on Qwen2.5-VL backbones with less training data.

Coloring the Noise: Adversarial Sobolev Alignment for Faithful Image Super Resolution

Hugging Face Daily Papers · 2026-05-22

This paper proposes an adversarial Sobolev alignment method for faithful image super resolution, aiming to reduce artifacts and improve fidelity.

Measuring AI Faithfulness-For Better or For Worse

Reddit r/AI_Agents · 2026-05-20

This post discusses the importance of faithfulness in LLM prompt optimization and introduces a Structural Fidelity Score that measures drift in word overlap, constraint survival, and task-type match, so that prompt optimization does not sacrifice the original intent.
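
The post names the three components but not how each is computed or combined, so the following is only a minimal sketch under loud assumptions: Jaccard word overlap, case-insensitive substring checks for constraints, an exact match on task labels supplied by the caller, and equal weights. `structural_fidelity` and its parameters are hypothetical; drift can be read as one minus the score.

```python
import re


def _words(text: str) -> set[str]:
    """Lower-cased word set used for the overlap component."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def structural_fidelity(
    original: str,
    optimized: str,
    constraints: list[str],
    original_task: str,
    optimized_task: str,
    weights: tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> float:
    """Weighted mean of word overlap, constraint survival, and task-type match (each in [0, 1])."""
    a, b = _words(original), _words(optimized)
    overlap = len(a & b) / len(a | b) if a | b else 1.0
    kept = sum(c.lower() in optimized.lower() for c in constraints)
    survival = kept / len(constraints) if constraints else 1.0
    task_match = 1.0 if original_task == optimized_task else 0.0
    w_overlap, w_survival, w_task = weights
    return w_overlap * overlap + w_survival * survival + w_task * task_match


score = structural_fidelity(
    "Summarize the report in under 100 words and cite sources",
    "Summarize the report in under 100 words",
    ["under 100 words", "cite sources"],
    "summarization",
    "summarization",
)
print(round(score, 4))
```

Here the optimized prompt keeps 7 of 10 distinct words and one of two constraints, so the dropped "cite sources" requirement pulls the score down even though the task type is unchanged.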

Retrieval-Augmented Linguistic Calibration

arXiv cs.CL · 2026-05-20

This paper proposes Retrieval-Augmented Linguistic Calibration (RALC), a post-hoc pipeline for calibrating confidence signals in LLMs that models linguistic confidence as a distribution and uses retrieval-augmented rewriting. It introduces a Faithfulness Divergence metric and reports significant improvements across benchmarks.

Lost in Interpretation: The Plausibility-Faithfulness Trade-off in Cross-Lingual Explanations

arXiv cs.CL · 2026-05-20

This paper investigates the trade-off between plausibility and faithfulness in cross-lingual explanations from LLMs, finding that English-pivot explanations achieve higher span agreement with human rationales but suffer reduced causal faithfulness compared to native-language explanations.

Fluency and Faithfulness in Human and Machine Literary Translation

arXiv cs.CL · 2026-05-18

This paper empirically examines the trade-off between fluency and faithfulness in literary translation using 130,486 paragraphs from 106 novels, finding a consistent negative correlation for both human and Google Translate output and a weaker one for TranslateGemma.
