influence-functions

#influence-functions

CLIF: Concept-Level Influence Functions for Transparent Bottleneck Models

arXiv cs.CL · 2026-05-20

This paper proposes CLIF, a method using influence functions to interpret NLP models at both sample and concept levels within Concept Bottleneck Models, enabling transparent debugging and concept-level analysis.

#influence-functions

Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces

arXiv cs.LG · 2026-05-14

This paper introduces a framework for token-level influence attribution in large language models. It learns orthogonal latent spaces with sparse autoencoders to identify the training-data tokens that jointly influence a prediction, with applications in high-stakes domains such as healthcare.

