Adaptive Multi-Resolution Procedural Knowledge Compression for Large Language Models
Summary
SKIM is an adaptive multi-resolution soft token compression framework that compresses procedural skills for LLMs, maintaining task performance while reducing prefill cost and latency.
Cached at: 06/11/26, 01:36 PM
Source: https://huggingface.co/papers/2606.12203
Abstract
SKIM is an adaptive multi-resolution soft token compression framework that efficiently compresses procedural skills while maintaining task performance and enabling lightweight offline compression for frequently updated community skills.
Large language models (LLMs) are widely used to tackle complex tasks with autonomous workflows. Recently, reusable natural language skills have emerged as a popular paradigm to inject procedural knowledge into LLM applications. Since popular skills are often invoked repeatedly, placing their full text in every context significantly increases prefill cost and latency. While text compression techniques have the potential to solve this problem, most existing methods are designed to compress factual knowledge in documents instead of procedural knowledge, making them insufficient for skill compression. In this paper, we argue that an effective skill compression method should: 1) preserve logical dependencies among workflows and tool protocols, 2) enable lightweight, offline compression for frequently updated community skills, and 3) be adaptable to varying complexities across skills. To address this, we present SKIM (SKIll coMpression), an adaptive multi-resolution soft token compression framework for procedural skills. Depending on the complexity of each skill, SKIM creates different numbers of soft tokens that not only improve the efficiency of LLM inference, but also preserve the effectiveness of skill usage. Experiments indicate that SKIM compresses skills to 30 to 60 percent of their original token length while preserving task performance better than existing compression methods. We have released our code at https://github.com/bebr2/SKIM.
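To give a feel for what the reported 30 to 60 percent range could mean for a repeatedly invoked skill, the following is a back-of-the-envelope sketch, not anything taken from the SKIM code. The function name `prefill_tokens_saved`, the 2,000-token skill, and the 1,000 invocations are all hypothetical, and the sketch assumes each soft token occupies one prefill position and ignores the one-time offline compression cost.

```python
def prefill_tokens_saved(skill_tokens: int, ratio: float, invocations: int) -> int:
    """Prefill positions saved when a skill of `skill_tokens` tokens is replaced
    by a compressed form of round(skill_tokens * ratio) soft tokens.

    Assumption (not from the paper): one soft token costs one prefill position.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    compressed = round(skill_tokens * ratio)
    return (skill_tokens - compressed) * invocations


# Hypothetical workload: a 2,000-token skill invoked 1,000 times,
# evaluated at both ends of the reported 30-60% compressed-length range.
for ratio in (0.3, 0.6):
    saved = prefill_tokens_saved(2000, ratio, 1000)
    print(f"compressed to {ratio:.0%} of original: {saved:,} prefill tokens saved")
```

Under these assumptions the savings scale linearly with the number of invocations, which is why the abstract singles out frequently invoked skills as the motivating case.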
Similar Articles
SkillLens: Adaptive Multi-Granularity Skill Reuse for Cost-Efficient LLM Agents
This paper introduces SkillLens, a hierarchical framework for adaptive multi-granularity skill reuse in LLM agents, demonstrating improved accuracy and cost-efficiency on benchmark tasks.
Optimizing Korean-Centric LLMs via Token Pruning
This paper presents a systematic benchmark of token pruning—a compression technique that removes tokens and embeddings for irrelevant languages—applied to Korean-centric LLM tasks. The study evaluates popular multilingual models (Qwen3, Gemma-3, Llama-3, Aya) across different vocabulary configurations and finds that token pruning significantly improves generation stability and reduces memory footprint for domain-specific deployments.
SimpleMem: Efficient Lifelong Memory for LLM Agents
Introduces SimpleMem, an efficient memory framework for LLM agents that uses semantic lossless compression to improve accuracy and reduce token consumption, achieving 26.4% F1 improvement and up to 30x reduction in inference-time token usage.
End-to-End Context Compression at Scale
This paper presents Latent Context Language Models (LCLMs), a family of encoder-decoder compressors that efficiently handle long contexts through architectural search and large-scale pretraining, outperforming traditional KV cache methods in accuracy, speed, and memory usage.
How to Compress KV Cache in RL Post-Training? Shadow Mask Distillation for Memory-Efficient Alignment
This paper proposes Shadow Mask Distillation (SMD) to solve the off-policy bias caused by KV cache compression during reinforcement learning post-training for large language models. It introduces a mechanism that ensures on-policy alignment and improves memory efficiency for long-context reasoning tasks.