qk-restore


Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

Hugging Face Daily Papers · 2026-06-09

This paper shows that chain-of-thought (CoT) supervised fine-tuning degrades long-context recall in hybrid linear-attention models by biasing attention gradients toward short-range patterns. It proposes QK-Restore, a training-free method that restores long-context recall while preserving reasoning performance.


