attention-heads

#attention-heads

Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads

arXiv cs.CL · 2026-07-02

The paper introduces LOGOS, a write-aware detector that identifies attention heads responsible for non-literal retrieval in LLMs by scoring the projection of their OV-circuit output onto the answer-token unembedding direction, outperforming prior attention-based methods across multiple model families.

Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads

Hugging Face Daily Papers · 2026-07-01

This paper introduces LOCOS, a method for identifying attention heads responsible for non-literal context synthesis in large language models, outperforming existing techniques on retrieval benchmarks.

HARD-KV: Head-Adaptive Regularization for Decoding-time KV Compression

arXiv cs.LG · 2026-06-30

HARD-KV introduces a Cascade Cache hierarchy and a Logits Calibration mechanism to resolve the static-dynamic mismatch in head-adaptive KV cache compression, achieving up to 2x throughput improvement in long-context LLM inference.

Vision-Default, Prior-Override: Causal Mechanisms of Perception-Knowledge Conflict in Vision-Language Models

arXiv cs.CL · 2026-06-29

This paper investigates how vision-language models resolve conflicts between visual evidence and world knowledge, revealing that visual grounding is the default while prior knowledge depends on a small set of late-layer attention heads. The authors perform causal analysis across three VLM families, demonstrating an asymmetric structure where ablating these heads shifts predictions from knowledge-grounded to visually grounded answers.

Sentence-Level Contextual Entrainment in Large Language Models

arXiv cs.CL · 2026-06-24

This paper extends contextual entrainment from the token level to the sentence level, showing that sentences included in a prompt, even counterfactual ones, become more probable during inference. The effect decreases with model size and is driven by 2-4% of attention heads, which can be ablated without performance loss.

Mind the Heads: Topological Representation Alignment for Multimodal LLMs

Hugging Face Daily Papers · 2026-06-22

HeRA aligns individual attention heads in Multimodal Large Language Models (MLLMs) to preserve local neighborhood relationships across modalities, improving vision-centric task performance and reducing visual hallucinations.

Ablation-Reversible Heads Don't Transfer: A Stress Test for Mechanistic Role Claims in Transformers

arXiv cs.AI · 2026-06-09

This paper shows that attention heads meeting common criteria for mechanistic role claims (necessity, linear decodability, ablation reversibility) routinely fail to transfer computations across prompts, and introduces the KID (Knowing/Intent/Doing) framework and a three-stage pipeline for more rigorous role assignment.

Mechanistic Insights into Functional Sparsity in Multimodal LLMs via CoRe Heads

arXiv cs.CL · 2026-06-05

This paper identifies a specialized subset of attention heads called CoRe heads in multimodal LLMs that exhibit functional sparsity in cross-modal retrieval. Causal interventions show these heads are crucial for multimodal reasoning, and leveraging this sparsity can accelerate inference.

MechRL: Reinforcement Learning Agents Perform Circuit Discovery for Mechanistic Interpretability

arXiv cs.LG · 2026-05-27

Proposes MechRL, a reinforcement learning approach to automate circuit discovery in transformer language models. A PPO agent trained on multiple tasks discovers attention head circuits that match known canonical circuits and generalizes to a held-out task.

Spectral Probe-Circuits: A Three-Step Recipe for Identifying Attention-Head Circuits in Pretrained Transformers

arXiv cs.LG · 2026-05-26

Introduces a three-step recipe for identifying attention-head circuits in pretrained transformers using a spectral signal and task-pattern screen without requiring labels, validated across 51M to 1B parameter models and multiple architectures.

From Parameters to Data: A Task-Parameter-Guided Fine-Tuning Pipeline for Efficient LLM Alignment

arXiv cs.LG · 2026-05-22

P2D is a unified framework that leverages task-sensitive attention heads for both data selection and structural pruning, achieving an 8.3 pp performance gain and 7.0× speedup by updating only 10% of heads on 10% of data.

Weight Decay Regimes in Grokking Transformers: Cheap Online Diagnostics

arXiv cs.LG · 2026-05-21

This paper investigates how weight decay acts as a control parameter for transitioning between memorization and generalization in transformers trained on modular arithmetic, and introduces two cheap online diagnostic metrics from attention activations that track these dynamics.

The Point of No Return: Counterfactual Localization of Deceptive Commitment in Language-Model Reasoning

arXiv cs.CL · 2026-05-19

Introduces counterfactual localization to identify when language models become committed to deception during reasoning, using five environments and a corpus of 1.46M sentences across four reasoning models. Shows that attention-based transition features generalize across environments for detecting deceptive commitment.

Language-Switching Triggers Take a Latent Detour Through Language Models

Hugging Face Daily Papers · 2026-05-18

This paper identifies a circuit underlying a language-switching backdoor in an 8B-parameter language model, where a three-word Latin trigger redirects English output to French via attention heads and orthogonal latent subspaces, with the final layer MLP converting the latent signal to French logits.

Mechanisms of Prompt-Induced Hallucination in Vision-Language Models

arXiv cs.CL · 2026-04-20

This paper investigates prompt-induced hallucinations in vision-language models through mechanistic analysis, identifying specific attention heads responsible for the models' tendency to favor textual prompts over visual evidence. The authors demonstrate that ablating these PIH-heads reduces hallucinations by at least 40% without additional training, revealing model-specific mechanisms underlying this failure mode.
