Adaptive Multi-Resolution Procedural Knowledge Compression for Large Language Models

Hugging Face Daily Papers 06/10/26, 12:00 AM Papers

Summary

SKIM is an adaptive multi-resolution soft token compression framework that compresses procedural skills for LLMs, maintaining task performance while reducing prefill cost and latency.

Large language models (LLMs) are widely used to tackle complex tasks with autonomous workflows. Recently, reusable natural language skills have emerged as a popular paradigm to inject procedural knowledge into LLM applications. Since popular skills are often invoked repeatedly, placing their full text in every context significantly increases prefill cost and latency. While text compression techniques have the potential to solve this problem, most existing methods are designed to compress factual knowledge in documents instead of procedural knowledge, making them insufficient for skill compression. In this paper, we argue that an effective skill compression method should: 1) preserve logical dependencies among workflows and tool protocols, 2) enable lightweight, offline compression for frequently updated community skills, and 3) be adaptable to varying complexities across skills. To address this, we present SKIM (SKIll coMpression), an adaptive multi-resolution soft token compression framework for procedural skills. Depending on the complexity of each skill, SKIM creates different numbers of soft tokens that not only improve the efficiency of LLM inference, but also preserve the effectiveness of skill usage. Experiments indicate that SKIM compresses skills to 30 to 60 percent of their original token length while preserving task performance better than existing compression methods. We have released our code at https://github.com/bebr2/SKIM.
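To make the reported 30 to 60 percent range concrete, the following is a minimal arithmetic sketch, not SKIM's actual allocation method. It assumes a hypothetical complexity score in [0, 1] and a hypothetical linear mapping from that score to the kept fraction of tokens; the names `soft_token_budget` and `prefill_tokens_saved` are illustrative and do not come from the paper or its repository.

```python
import math

# Range reported in the abstract: skills are compressed to 30-60% of
# their original token length. Stored as integer percentages.
MIN_PCT = 30
MAX_PCT = 60


def soft_token_budget(original_tokens: int, complexity: float) -> int:
    """Return a soft-token budget for one skill.

    ASSUMPTION: the kept percentage is interpolated linearly between
    MIN_PCT and MAX_PCT by a complexity score in [0, 1]. This mapping is
    hypothetical and only illustrates "more complex skill, more tokens".
    """
    if original_tokens < 0:
        raise ValueError("original_tokens must be non-negative")
    c = min(max(complexity, 0.0), 1.0)  # clamp the score into [0, 1]
    kept_pct = MIN_PCT + c * (MAX_PCT - MIN_PCT)
    return math.ceil(original_tokens * kept_pct / 100)


def prefill_tokens_saved(original_tokens: int, budget: int, invocations: int) -> int:
    """Prefill tokens avoided when a compressed skill is reused across calls."""
    return (original_tokens - budget) * invocations


if __name__ == "__main__":
    budget = soft_token_budget(1000, 0.5)
    print(budget)                                  # 450
    print(prefill_tokens_saved(1000, budget, 20))  # 11000
```

Under these assumptions, a 1,000-token skill of medium complexity would occupy 450 soft tokens, and invoking it 20 times would avoid 11,000 prefill tokens compared with placing the full text in every context.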

Original Article

Cached at: 06/11/26, 01:36 PM

Source: https://huggingface.co/papers/2606.12203

Abstract

SKIM is an adaptive multi-resolution soft token compression framework that efficiently compresses procedural skills while maintaining task performance and enabling lightweight offline compression for frequently updated community skills.

Similar Articles

SkillLens: Adaptive Multi-Granularity Skill Reuse for Cost-Efficient LLM Agents

Optimizing Korean-Centric LLMs via Token Pruning

SimpleMem: Efficient Lifelong Memory for LLM Agents

End-to-End Context Compression at Scale

How to Compress KV Cache in RL Post-Training? Shadow Mask Distillation for Memory-Efficient Alignment
