token-selection

Robust Reasoning via Dynamic Token Selection for Distribution-Aligned Self-Distillation

arXiv cs.CL ↗ · 3d ago Cached

Proposes Distribution-Aligned Self-Distillation (DASD), which dynamically filters tokens during self-distillation to preserve beneficial logical corrections while suppressing distributionally misaligned style noise, improving robust reasoning on math, code, and commonsense benchmarks.

Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers

Hugging Face Daily Papers ↗ · 2026-05-22 Cached

This paper introduces a two-stage token selection framework for visual geometry transformers that reduces computational costs by restricting key/value tokens during global attention, achieving over 85% acceleration on scenes with 500 images while maintaining baseline performance.

Stage-adaptive Token Selection for Efficient Omni-modal LLMs

Hugging Face Daily Papers ↗ · 2026-05-19 Cached

SEATS is a training-free, stage-adaptive token selection method that reduces computational overhead in omni-modal LLMs by progressively pruning redundant visual and audio tokens, achieving a 9.3x FLOPs reduction and a 4.8x prefill speedup while retaining 96.3% of the original performance.
