Intermittent random token injection during decoding stage increases LLM diversity without fine-tuning

Reddit r/ArtificialInteligence 05/11/26, 11:37 AM Papers

Summary

A Harvard research paper introduces Recoding-Decoding (RD), a decoding scheme that injects random priming phrases and diverting tokens to tap into an LLM's long-tail knowledge, boosting output diversity without fine-tuning. Relevance stays high (around 0.99 in the reported experiments) while response homogenization is reduced, and stronger models show larger diversity gains.

A new paper out of Harvard (Luo, King, Puett, Smith) introduces Recoding-Decoding (RD), a decoding scheme that pulls the long tail of an LLM's knowledge into actual outputs by injecting priming phrases and diverting tokens during the decoding stage.

How RD works: The authors argue that modern LLMs encode an enormous slice of human knowledge, but standard decoding (top-k, nucleus, etc.) only ever pulls from the peak of the conditional distribution. The long tail (heterodox, contrarian, non-Western, weird-but-relevant) sits unused. RD diverts the model off its modal path by:

1) Prepending a random "priming phrase" (e.g., **Related to FOOD:**, **Related to SKY:**)
2) Injecting a random 3-letter "diverting stem" (Pas, Tib, Mon, …) at the start of each new sentence

For example, "Brainstorm a world history topic" can now resolve to "[Pas]ta and the silk road" or "[Tib]etan sky burials" by absorbing the injected tokens [Pas] and [Tib], instead of generating the dominant answer, "Age of Enlightenment."

What they found: Across 50 brainstorm topics + 500 prompts from 5 public datasets, relevance stays around 0.99 while diversity grows almost linearly out to 1,000 runs. They also found that the stronger the LLM (Gemini-3 > GPT-5.1 > GPT-3.5 > DeepSeek-3), the larger RD's lead, because more capable models have more peaked distributions and thus more hidden tail knowledge.

Why it matters: The authors frame this as the "search quest" problem: picking a wedding dress, a research topic, a startup name, a school for a kid. The goal isn't the correct answer; it's learning the space. Current LLMs are anti-optimized for that, which the paper argues is quietly driving collective homogenization (they cite a striking incident where students using ChatGPT to outline essays turned in nearly identical arguments without ever talking to each other).

📄 Paper: [https://arxiv.org/abs/2603.19519](https://arxiv.org/abs/2603.19519)
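To make the two RD steps described in the post concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names (`rd_generate`, `toy_model`), the tiny phrase and stem pools, and the choice to place the priming phrase at the start of the response are all assumptions for illustration. A real setup would replace `toy_model` with a call that asks an LLM to continue the text from the forced stem.

```python
import random
from typing import Callable, List

# Illustrative pools only: the post gives FOOD/SKY and Pas/Tib/Mon as examples,
# and does not say how large the real pools are or how they are built.
PRIMING_TOPICS = ["FOOD", "SKY"]
DIVERTING_STEMS = ["Pas", "Tib", "Mon"]


def rd_generate(
    prompt: str,
    complete_sentence: Callable[[str], str],
    num_sentences: int,
    rng: random.Random,
) -> str:
    """Sketch of the two RD steps as the post describes them.

    complete_sentence(context) stands in for the LLM: given the text so far,
    which ends in an injected stem, it returns the rest of that sentence.
    """
    # Step 1: start the response with a random priming phrase.
    priming = f"**Related to {rng.choice(PRIMING_TOPICS)}:**"
    context = f"{prompt}\n{priming}"

    sentences: List[str] = []
    for _ in range(num_sentences):
        # Step 2: force a random 3-letter stem at the start of the sentence,
        # so the model has to absorb it into whatever it writes next.
        stem = rng.choice(DIVERTING_STEMS)
        context = f"{context} {stem}"
        rest = complete_sentence(context)
        context += rest
        sentences.append(stem + rest)
    return " ".join(sentences)


def toy_model(context: str) -> str:
    """Toy stand-in for an LLM: completes the trailing stem with canned text.

    The Pas and Tib continuations echo the post's examples; the Mon one is
    invented for this sketch.
    """
    canned = {
        "Pas": "ta and the silk road.",
        "Tib": "etan sky burials.",
        "Mon": "soon trade winds.",
    }
    return canned[context[-3:]]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    for _ in range(3):
        print(rd_generate("Brainstorm a world history topic", toy_model, 1, rng))
```

The point of the design is that the randomness comes from outside the model: each run starts from a different forced prefix, so repeated runs cannot all collapse onto the same high-probability answer.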

Similar Articles

$R^2$-dLLM: Accelerating Diffusion Large Language Models via Spatio-Temporal Redundancy Reduction

Faster LLM Inference via Sequential Monte Carlo

Sampling More, Getting Less: Calibration is the Diversity Bottleneck in LLMs

@_avichawla: Researchers found a way to make LLMs 8.5x faster! (without compromising accuracy) Speculative decoding is quite an effe…

LEAP: Unlocking dLLM Parallelism via Lookahead Early-Convergence Token Detection
