Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

Hugging Face Daily Papers · 2026-06-09

This paper finds that chain-of-thought (CoT) supervised fine-tuning degrades long-context recall in hybrid linear-attention models because it biases attention gradients toward short-range patterns. It proposes QK-Restore, a training-free method that restores long-context recall while preserving reasoning performance.
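To make "biased toward short-range patterns" concrete, one possible diagnostic is the share of causal attention mass each query places on nearby keys. The sketch below is an illustration only, not QK-Restore and not the paper's own measurement. The helper names `causal_attention` and `short_range_mass` and the `window` parameter are hypothetical, and it assumes standard scaled dot-product softmax attention over plain Python vectors.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def causal_attention(q, k):
    """Hypothetical helper: causal softmax attention weights.

    q, k: lists of equal-length vectors, one per position.
    Row i holds the weights of query i over keys 0..i.
    """
    d = len(q[0])
    weights = []
    for i, qi in enumerate(q):
        # Scaled dot-product scores against keys at or before position i.
        scores = [
            sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        weights.append(softmax(scores))
    return weights


def short_range_mass(weights, window):
    """Hypothetical diagnostic: mean fraction of each query's attention
    mass that falls on keys at distance < window (distance = i - j)."""
    fracs = []
    for i, row in enumerate(weights):
        fracs.append(sum(w for j, w in enumerate(row) if i - j < window))
    return sum(fracs) / len(fracs)
```

With all-zero queries every row is uniform, so `short_range_mass(w, 1)` is the mean of 1/(i+1) over positions, while a window at least as long as the sequence always gives 1.0. Under this assumed metric, a model whose attention has drifted toward short-range patterns would show a higher value at small windows than before fine-tuning.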
