Tag
This paper provides a proof-oriented introduction to diffusion models, covering Langevin dynamics, score-based models, discretization, discrete diffusion, and inference-time control, intended for graduate students.
This article presents a technique to improve LLM creative writing by modifying the sampling process using entropy, aiming to reduce the generic 'LLM feel' in generated text.
The paper introduces VGB, a process-guided sampling algorithm with probabilistic backtracking that is robust to verifier errors and significantly improves the coding performance of small 0.5B-parameter models.
This paper proposes the Time-Reparameterized Cumulative Intensity Extrapolation (TR-CIE) sampler for discrete flow matching, which improves sampling quality under limited function evaluations by rescaling the time grid and reusing cached model outputs, with theoretical analysis and experiments on text and image generation.
Introduces Nexus Sampling, a training-free KV-cache eviction method that replaces deterministic top-k selection with weighted reservoir sampling. It improves long-context LLM inference under fixed memory budgets and matches dense-attention performance at 80% eviction.
Recommends an in-depth guide to modern LLM sampling mechanisms, covering methods such as temperature, top-p, and Mirostat; a useful reference for developers aiming to improve output quality.
This paper finds that large language models partially exhibit an emergent symmetry under retokenization, that is, replacing a prompt's canonical tokenization with an alternative valid segmentation while preserving its bytes exactly. The authors use this phenomenon to probe compositional understanding and propose retokenization as a novel inference-time sampling strategy that can recover solutions not found by conventional temperature sampling.
This paper introduces ADAS, a training-free reranking rule for parallel masked diffusion decoding that uses attention to discount tokens that strongly attend to uncertain positions, improving low-NFE performance on reasoning and code tasks with minimal runtime overhead.
Guowei Xu discusses the limitations of Best-of-N and tree search methods for LLMs on hard reasoning problems, noting that verification signals are sparse and that candidates remain within the model's own distribution.
Proposes a hierarchical variational policy framework for reward-guided diffusion, enabling high-quality sampling with reduced inference cost. Achieves a strong quality-speed tradeoff on tasks such as super-resolution.
This paper proposes Lossless Anti-Distillation Sampling (LADS), a novel sampling scheme that counters multi-account distillation by correlating responses across accounts while preserving exact statistical fidelity for individual benign users. Theoretical analysis and experiments show LADS degrades distilled student performance on image, math, and code generation.
This paper introduces TokenDrift, a drifting objective that refines discrete diffusion language models by lifting categorical predictions to a continuous semantic space for anti-symmetric drifting, significantly improving generation quality under a fixed number of denoising steps.
This paper introduces DiMS, a dynamical system sampler that guarantees exact sampling from the submanifold of minimum loss solutions in neural networks, enabling better uncertainty quantification in Bayesian inference.
This paper presents a novel framework for synthesizing finite-state controllers for Partially Observable Markov Decision Processes (POMDPs) by integrating sampling, automata learning, and model-checking. The approach provides formal guarantees for threshold-safety problems that elude existing formal synthesis tools.
This paper introduces a validity-diversity framework attributing diversity collapse in LLMs to order and shape miscalibration during decoding, validated across 14 language models.
At KubeCon EU 2026, VictoriaMetrics presented retroactive sampling, a new method that significantly reduces traffic, CPU, and memory overhead compared to traditional tail sampling in OpenTelemetry pipelines.
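
For readers unfamiliar with the temperature and top-p methods named in the sampling-guide entry above, the following is a minimal sketch in plain Python. It is not taken from the guide itself: the function names (`softmax_with_temperature`, `top_p_filter`, `sample_token`) and the example logits are hypothetical, and Mirostat is not covered. It assumes the model exposes one raw logit per vocabulary entry.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of most likely tokens whose total mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass  # renormalize over the kept tokens
    return filtered


def sample_token(logits, temperature=1.0, top_p=0.9, rng=random):
    """Sample one token index after temperature scaling and top-p filtering."""
    probs = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    example_logits = [2.0, 1.0, 0.1, -1.0]  # hypothetical 4-token vocabulary
    print(sample_token(example_logits, temperature=0.8, top_p=0.9))
```

Passing a seeded `random.Random` instance as `rng` makes the draw reproducible, which is convenient when comparing settings.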