Tag
ConFu introduces a novel speculative decoding framework that enables draft models to anticipate future generation directions through contemplate tokens and soft prompts, achieving 8-20% improvements in token acceptance rates and generation speed over EAGLE-3 across multiple LLM models.