feature-attribution

The Attribution Contract: Feature Attribution for Generative Language Models

arXiv cs.LG · 2026-05-25

This paper introduces the Attribution Contract, a specification for feature-attribution claims in generative language models, addressing ambiguities in what constitutes a feature and how attribution methods should be evaluated. It uses autoregressive and diffusion models as case studies to show when attribution is informative or misleading.

The Attribution Impossibility: No Feature Ranking Is Faithful, Stable, and Complete Under Collinearity

arXiv cs.LG · 2026-05-22

This paper proves that no feature ranking can be simultaneously faithful, stable, and complete under collinearity, characterizing the full attribution design space and providing a formally verified impossibility theorem in explainable AI.

model-agnostic sensitivity approximator [P]

Reddit r/MachineLearning · 2026-05-18

A 16-year-old developer created sage-explainer, a Python package that approximates how sensitive a black-box model's predictions are to each input feature. It targets models such as random forests and XGBoost, and the author reports more stable results than centered finite differences.

From Weight Perturbation to Feature Attribution for Explaining Fully Connected Neural Networks

arXiv cs.LG · 2026-05-18

This paper introduces XWP and XWPc, weight-perturbation-based feature attribution methods for fully connected neural networks, and reports competitive performance on standard baseline metrics.

Prune, Interpret, Evaluate: A Cross-Layer Transcoder-Native Framework for Efficient Circuit Discovery via Feature Attribution

arXiv cs.CL · 2026-04-21

Researchers introduce PIE, a CLT-native framework for efficient circuit discovery via feature attribution-based pruning, achieving ~40× compression in feature selection while maintaining behavioral fidelity on IOI and Doc-String tasks.
