explainability

#explainability

Retrieval-Warmed Energy-Based Reasoning: A Five-Arm Ablation Methodology for Diffusion-as-Inference on Structured Reasoning Tasks

arXiv cs.LG · 2d ago

This paper presents a five-arm ablation methodology for diagnosing which component of retrieval-warmed energy-based reasoning (RW-EBR) drives performance gains, applied to structured reasoning tasks like graph reachability and Sudoku. The method separates effects of class-prior bias, stochastic warm-starting, and graph-aligned value reuse.

ProvenAI: Provenance-Native Traces of Evidence in Generated Answers

arXiv cs.CL · 2d ago

ProvenAI introduces a framework for decomposing transparency in multi-hop question answering into three independently measurable layers: answer correctness, citation fidelity, and per-document influence, revealing a citation-influence gap where cited sources may have weak influence while uncited sources significantly shape the output.

@MSFTResearch: Researchers introduce generative causal testing, which translates black box models into clear hypotheses and verifies t…

X AI KOLs Following · 3d ago

Microsoft Research and collaborators introduce generative causal testing (GCT), a method that distills black-box brain prediction models into testable explanations and validates them with fMRI experiments, revealing specific brain region responses to language concepts.

What's in an Earth Embedding? An Explainability Analysis of Location Encoders

arXiv cs.LG · 3d ago

This paper introduces methods to decompose location embeddings from geographic implicit neural representations into human-interpretable features, such as sparse latent concepts, natural language concepts, and visual features, revealing geographic structures like forests and urban areas.

When Multi-Sensor Fusion Fails to Generalize: Cattle Posture Classification Under Animal-Level and Temporal Distribution Shift

arXiv cs.LG · 3d ago

This paper evaluates the robustness of multi-sensor fusion for cattle posture classification under temporal distribution shift, finding that multimodal models suffer significant performance drops and that simpler single-sensor models generalize better, highlighting shortcut learning issues.

Don't Go Breaking My LLM: The Impact of Pruning Attention Layers on Explanation Faithfulness and Confidence Calibration

arXiv cs.LG · 3d ago

This paper studies how pruning attention layers in LLMs affects explanation faithfulness and confidence calibration, finding that accuracy often remains high but interpretability and reliability degrade, highlighting a misalignment between model confidence, interpretability, and accuracy.

How Complexity Contributes to Learning Opacity in Machine Learning

arXiv cs.LG · 3d ago

This paper analyzes why the learning process in machine learning, particularly in neural networks, remains opaque. It frames learning as a complex dynamical system, identifies three key properties that contribute to learning opacity, and argues that some sources of that opacity may be irreducible.

A Definition of Good Explanations and the Challenges Explaining LLM Outputs

arXiv cs.AI · 2026-06-16

This paper proposes a definition of good explanations based on counterfactuals and prior beliefs, and discusses the inherent difficulties in explaining LLM outputs under this definition.

Forecasting Future Behavior as a Learning Task

arXiv cs.AI · 2026-06-11

This paper proposes Behavior Forecasters, a learned approach that predicts a large reasoning model's (LRM's) future behavior (e.g., answer consistency and input sensitivity) from its reasoning trajectory, outperforming GPT-5.4 and Claude Opus 4.6 at lower cost.

Forecasting Future Behavior as a Learning Task

Hugging Face Daily Papers · 2026-06-09

This paper proposes training Behavior Forecasters to predict the outputs of large reasoning models from single trajectories. The forecasters outperform large language models such as GPT-5.4 and Claude Opus 4.6 at lower computational cost, bypassing traditional explainability methods.

A Geometric View of Counterfactual Behavior: Interaction of Boundary Proximity and Local Support

arXiv cs.LG · 2026-06-04

This paper examines counterfactual behavior in ML models through a geometric lens, showing that models with similar predictive performance can differ substantially in counterfactual outcomes due to the interaction between decision-boundary proximity and local data support. The findings identify counterfactual behavior as a distinct dimension from predictive performance, with implications for model selection and reliability of counterfactual explanation methods.

Simulate, Reason, Decide: Scientific Reasoning with LLMs for Simulation-Driven Decision Making

arXiv cs.AI · 2026-06-04

Researchers from the University of Michigan introduce MechSim, a mechanism-grounded neuro-symbolic reasoning framework that enables LLM agents to reason about the internal assumptions, dependencies, and execution behavior of scientific simulators rather than treating them as black boxes. The framework improves explanation quality and decision-making reliability across high-stakes domains like healthcare, finance, and public policy.

GridVQA-X: A Framework for Evaluating Multimodal Explainability Methods

Hugging Face Daily Papers · 2026-06-02

GridVQA-X introduces a diagnostic framework to evaluate cross-modal explainability by distinguishing genuine spatial-relational reasoning from cross-modal shortcuts in multimodal models.

SafeRx-Agent: A Knowledge-Grounded Multi-Agent Framework for Safe and Explainable Medication Recommendation

arXiv cs.CL · 2026-05-29

This paper introduces SafeRx-Agent, a knowledge-grounded multi-agent framework for safe and explainable medication recommendation. It generates fine-grained ATC code predictions while controlling for drug interactions and contraindications, and is evaluated on the MIMIC-III and MIMIC-IV datasets.

Show, Don't TELL: Explainable AI-Generated Text Detection

Hugging Face Daily Papers · 2026-05-27

This paper introduces TELL, an AI-generated text detection system that provides explainable annotations alongside numerical scores. It achieves a competitive AUROC of 0.927 while letting users judge authorship from highlighted textual indicators.

The Attribution Contract: Feature Attribution for Generative Language Models

arXiv cs.LG · 2026-05-25

This paper introduces the Attribution Contract, a specification for feature-attribution claims in generative language models, addressing ambiguities in what constitutes a feature and how attribution methods should be evaluated. It uses autoregressive and diffusion models as case studies to show when attribution is informative or misleading.

The hardest part of AI in 2026 isn't building the workflow. It's explaining "probabilistic outputs" to traditional stakeholders.

Reddit r/ArtificialInteligence · 2026-05-24

The post argues that the primary challenge of AI in 2026 is not technical development but communicating probabilistic outputs to traditional stakeholders who are accustomed to deterministic guarantees, a task that requires skills in explanation and persuasion.

INSIGHTS: Demonstration-Based Summaries of Time Series Predictors

arXiv cs.LG · 2026-05-20

INSIGHTS is a model-agnostic approach for providing global explanations of time-series models by generating diverse, informative sample summaries that capture domain-specific behaviors, outperforming local attribution methods in user studies.

Are Rationales Necessary and Sufficient? Tuning LLMs for Explainable Misinformation Detection

arXiv cs.CL · 2026-05-20

This paper proposes a pipeline for fine-tuning LLMs specifically for explainable misinformation detection and introduces LonsRex, a data synthesis method to generate necessary and sufficient rationales, addressing limitations of naive filtering based solely on label correctness.

GESD: Beyond Outcome-Oriented Fairness

arXiv cs.LG · 2026-05-18

This paper proposes GESD, a procedural-oriented fairness metric that measures disparities in explanation stability across subgroups, and integrates it into a multi-objective optimization framework for jointly optimizing utility, outcome fairness, and explanation fairness.
