influence-functions

Tag

Cards List
#influence-functions

CLIF: Concept-Level Influence Functions for Transparent Bottleneck Models

arXiv cs.CL · 2026-05-20

This paper proposes CLIF, a method using influence functions to interpret NLP models at both sample and concept levels within Concept Bottleneck Models, enabling transparent debugging and concept-level analysis.

#influence-functions

Correcting Influence: Unboxing LLM Outputs with Orthogonal Latent Spaces

arXiv cs.LG · 2026-05-14

This paper introduces a framework for token-level influence attribution in large language models. It learns orthogonal latent spaces with sparse autoencoders to precisely identify the training-data tokens that jointly influence a prediction, with applications in high-stakes domains such as healthcare.
