parallel-decoding

#parallel-decoding

LACE: Lattice Attention for Cross-thread Exploration

arXiv cs.AI · 2026-04-20

LACE introduces a lattice attention mechanism that enables concurrent reasoning paths in LLMs to share intermediate insights and correct errors during inference, improving reasoning accuracy by over 7 points compared to standard isolated parallel sampling.

#parallel-decoding

Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning

Hugging Face Daily Papers · 2026-04-17

This paper introduces STOP (Super Token for Pruning), a lightweight method that learns to prune unpromising reasoning paths early during parallel decoding by appending learnable tokens and reading KV cache states, achieving 70% token reduction while improving performance on AIME and GPQA benchmarks.
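
The general idea of pruning parallel reasoning paths early can be illustrated with a minimal sketch. This is not the paper's implementation: the function `prune_paths`, the `keep_ratio` parameter, and the `score` callable are hypothetical names chosen for illustration. In STOP the score would come from learnable tokens that read KV cache states; here any callable that maps a partial path to a number stands in for it.

```python
from typing import Callable, List, Sequence


def prune_paths(
    prefixes: Sequence[str],
    score: Callable[[str], float],
    keep_ratio: float = 0.3,
) -> List[int]:
    """Return indices of the partial paths that survive pruning.

    `score` is a hypothetical stand-in for a learned path scorer;
    higher values mean the path looks more promising.
    """
    if not prefixes:
        return []
    # Always keep at least one path so decoding can continue.
    keep = max(1, round(len(prefixes) * keep_ratio))
    # Rank path indices from most to least promising.
    ranked = sorted(
        range(len(prefixes)),
        key=lambda i: score(prefixes[i]),
        reverse=True,
    )
    # Report survivors in their original order.
    return sorted(ranked[:keep])


if __name__ == "__main__":
    # Toy example: prefix length is used as a placeholder score.
    partial_paths = ["a", "bbb", "cc", "dddd"]
    survivors = prune_paths(partial_paths, score=len, keep_ratio=0.5)
    print(survivors)
```

Only the surviving indices would continue decoding, which is where the token savings in such a scheme would come from.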
