Tag
SKIM is an adaptive multi-resolution soft token compression framework for procedural skills used by LLMs, maintaining task performance while reducing prefill cost and latency.
AdaPLD is a training-free method that improves model-free speculative decoding in two ways: adaptive retrieval that combines lexical and semantic similarity, and branched reuse hypotheses that handle continuation uncertainty. It achieves up to 3.10x decoding speedup.
This paper presents CosmicFish-HRM, a compact 82.77M-parameter language model with a hierarchical reasoning module that dynamically allocates reasoning compute during inference, learning when to halt based on input complexity.
Proposes CIST, a method that assigns separate sample-wise adaptive temperatures to teacher and student in knowledge distillation, producing consistently informative soft labels and relaxing rigid logit-scale matching. Experiments on vision and language tasks show consistent improvements over standard KD.
A new semantic-adaptive eviction policy for LLM prefix caches that learns token reuse patterns across different token types, achieving a 1.4x-2.7x improvement in time to first token (TTFT) over existing policies.