coherence

Tag


Prompt-Activation Duality: Improving Activation Steering via Attention-Level Interventions

Hugging Face Daily Papers · 4d ago

This paper identifies KV-cache contamination as a failure mode for activation steering in dialogue. It proposes GCAD, a method that extracts steering signals from prompt contributions and applies token-level gating to improve long-horizon coherence, and reports substantial gains on multi-turn benchmarks.


Why MOE below A10b feels like im gambling

Reddit r/LocalLLaMA · 2026-04-22

A developer reports that MoE models with few active parameters, such as qwen3.6-35b-A3b, are less coherent and need more guidance than the dense qwen3.5-27b, which makes them hard to slot into agentic workflows.
