Tag
LACE introduces a lattice attention mechanism that enables concurrent reasoning paths in LLMs to share intermediate insights and correct errors during inference, improving reasoning accuracy by over 7 points compared to standard isolated parallel sampling.
STOP (Super Token for Pruning) is a lightweight method that learns to prune unpromising reasoning paths early in parallel decoding: it appends learnable tokens and reads KV cache states, cutting token usage by 70% while improving performance on the AIME and GPQA benchmarks.
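To make the idea of early path pruning concrete, here is a minimal sketch, not the STOP implementation. It assumes each partial reasoning path can be given a scalar promise score. In the paper that signal is learned from appended tokens and KV cache states; here `score_fn` is a hypothetical stand-in supplied by the caller, and `prune_paths` and `keep` are illustrative names that do not come from the source.

```python
from typing import Callable, List, Sequence


def prune_paths(
    paths: Sequence[str],
    score_fn: Callable[[str], float],  # hypothetical scorer, not the learned STOP signal
    keep: int,
) -> List[str]:
    """Keep the `keep` highest-scoring partial paths, preserving their original order."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # Rank path indices by score, highest first (sorted is stable, so ties keep input order).
    ranked = sorted(range(len(paths)), key=lambda i: score_fn(paths[i]), reverse=True)
    # Restore the original ordering among the survivors.
    survivors = sorted(ranked[:keep])
    return [paths[i] for i in survivors]


if __name__ == "__main__":
    partial_paths = ["a", "bbb", "cc", "dddd"]
    # Toy scorer for illustration only: longer text counts as more promising.
    print(prune_paths(partial_paths, score_fn=len, keep=2))
```

In a real decoder the surviving paths would continue generating while the pruned ones stop consuming tokens, which is where the savings would come from. The toy scorer above says nothing about how well a learned scorer separates good paths from bad ones.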