This research paper introduces adaptive correction scheduling for enforcing hard constraints in generative sampling, demonstrating that it improves the cost-accuracy frontier compared to terminal or stepwise projection methods.
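A minimal sketch of the general idea, with hypothetical `denoise_step` and `project` callables (the paper's actual schedule and projection operator are not reproduced here): correct only when the measured constraint violation crosses a tolerance, rather than at every step or only at the end.

```python
def sample_with_adaptive_correction(denoise_step, project, x_T, n_steps, tol=1e-2):
    """Reverse sampling with violation-triggered constraint correction.

    `denoise_step(x, t)` is one reverse-diffusion update and
    `project(x)` returns (x_projected, violation) for the constraint
    set; both are hypothetical stand-ins. Projection fires only when
    the violation exceeds `tol`, interpolating between stepwise
    (project always) and terminal (project once) schemes.
    """
    x = x_T
    for t in reversed(range(n_steps)):
        x = denoise_step(x, t)
        x_proj, violation = project(x)
        if violation > tol:          # correct only when it pays off
            x = x_proj
    x, _ = project(x)                # guarantee hard feasibility at the end
    return x
```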
This paper introduces S-FLM, a novel flow-based language model that operates in a hyperspherical latent space to address the computational costs and semantic limitations of existing discrete diffusion and continuous flow models.
This paper introduces a protocol for fair comparison of diffusion-based OOD detectors and proposes Canonical Feature Snapshots (CFS), which leverage sparse internal activations for efficient detection.
This paper introduces Trajectory Matching Policy Optimization (TMPO), a method for aligning diffusion models that addresses reward hacking and visual mode collapse by matching trajectory-level reward distributions rather than maximizing scalar rewards.
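To make "matching trajectory-level reward distributions" concrete, here is a hedged sketch using the empirical 1-D Wasserstein distance between equal-size batches of sampled and reference rewards; TMPO's actual objective may differ.

```python
import torch

def reward_distribution_loss(sample_rewards, reference_rewards):
    # Sorting both equal-size batches and comparing elementwise is the
    # empirical 1-D optimal-transport (Wasserstein-2) distance between
    # the two reward distributions. Unlike maximizing the mean reward,
    # this cannot be reduced by collapsing onto a single high-reward
    # mode, which is the failure mode the paper targets.
    s, _ = torch.sort(sample_rewards)
    r, _ = torch.sort(reference_rewards)
    return torch.mean((s - r) ** 2)
```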
This paper introduces DiffScore, a text evaluation framework based on Masked Large Diffusion Language Models that addresses positional bias in autoregressive scoring by using masked reconstruction.
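A hedged sketch of masked-reconstruction scoring, assuming `model` maps token ids to per-position logits; DiffScore's exact masking schedule is not reproduced.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def masked_reconstruction_score(model, input_ids, mask_id):
    # Mask one position at a time and accumulate the log-probability
    # the model assigns to the original token there. Every position is
    # scored with full bidirectional context, so no token is judged
    # only by its left context, which is the positional bias that
    # afflicts autoregressive scorers.
    total = 0.0
    for i in range(input_ids.size(0)):
        masked = input_ids.clone()
        masked[i] = mask_id
        logits = model(masked.unsqueeze(0))[0]      # (seq_len, vocab)
        total += F.log_softmax(logits[i], dim=-1)[input_ids[i]].item()
    return total / input_ids.size(0)
```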
This paper introduces BitLM, a language model that uses bitwise continuous diffusion to generate multiple tokens in parallel, aiming to overcome the sequential bottleneck of traditional autoregressive generation while preserving causal structure.
This paper introduces Latent Visualization by Optimization (LVO), a mechanistic interpretability technique that uses sparse autoencoders to visualize monosemantic features in diffusion models like Stable Diffusion 1.5.
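The core operation is standard activation maximization; a sketch under assumptions, where `unet_features` and `sae_encode` are hypothetical hooks into the diffusion model and the sparse autoencoder.

```python
import torch

def visualize_feature(unet_features, sae_encode, feature_idx,
                      latent_shape, steps=200, lr=0.05):
    # Optimize a latent by gradient ascent so that one chosen SAE
    # feature fires strongly; decoding the resulting latent shows what
    # the (ideally monosemantic) feature represents.
    z = torch.randn(latent_shape, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        acts = sae_encode(unet_features(z))   # sparse feature activations
        (-acts[..., feature_idx].mean()).backward()
        opt.step()
    return z.detach()
```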
This paper introduces DOSER, a framework using diffusion models for out-of-distribution detection and selective regularization in offline reinforcement learning. It aims to improve performance on static datasets by distinguishing between beneficial and detrimental OOD actions.
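One plausible reading of "selective regularization", sketched with placeholder inputs (the paper's actual criterion for beneficial vs. detrimental OOD actions may differ): use the diffusion model's reconstruction error as an OOD signal, and penalize only OOD actions the critic also rates poorly.

```python
import torch

def selective_ood_penalty(recon_error, q_values, ood_threshold):
    # `recon_error`: per-action diffusion reconstruction error on the
    # behavior data (high = out-of-distribution); `q_values`: critic
    # estimates for the same actions. OOD actions with below-median Q
    # are treated as detrimental and penalized; confident OOD actions
    # are spared so useful generalization is not regularized away.
    is_ood = recon_error > ood_threshold
    detrimental = is_ood & (q_values < q_values.median())
    penalty = torch.where(detrimental, q_values.abs(),
                          torch.zeros_like(q_values))
    return penalty.mean()
```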
This paper introduces NoiseRater, a meta-learning framework that assigns importance scores to individual noise samples during diffusion model training to improve efficiency and generation quality.
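A hedged sketch of the weighted objective; the meta-update that trains the `scorer` network (e.g. a bi-level step on a held-out batch) is omitted, and `scorer` is a hypothetical small network.

```python
import torch

def weighted_denoising_loss(eps_pred, eps_true, noise, scorer):
    # The scorer rates each noise sample; softmax-normalized weights
    # reweight the usual epsilon-prediction MSE so training effort
    # concentrates on the noise draws judged most informative.
    w = torch.softmax(scorer(noise.flatten(1)).squeeze(-1), dim=0)  # (batch,)
    per_sample = ((eps_pred - eps_true) ** 2).flatten(1).mean(dim=1)
    return (w * per_sample).sum()
```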
This paper introduces WildRelight, a new real-world benchmark dataset for single-image relighting that addresses the gap between synthetic and natural scenes. It proposes a physics-guided adaptation framework using diffusion posterior sampling and test-time adaptation to improve model performance on real-world data.
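For orientation, a generic diffusion-posterior-sampling guidance step in the style of Chung et al., with a hypothetical relighting forward model standing in for the paper's physics guide.

```python
import torch

def dps_guided_step(x_t, t, denoiser, ancestral_step, forward_model, y, zeta=1.0):
    # Estimate the clean image, measure how badly the physics forward
    # model (e.g. a shading/relighting operator) explains the observed
    # image y, and push the standard reverse step down that gradient.
    x_t = x_t.detach().requires_grad_(True)
    x0_hat = denoiser(x_t, t)
    residual = torch.linalg.vector_norm(y - forward_model(x0_hat))
    grad = torch.autograd.grad(residual, x_t)[0]
    x_prev = ancestral_step(x_t.detach(), x0_hat.detach(), t)
    return x_prev - zeta * grad
```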
This paper introduces MoCam, a diffusion-based framework for unified novel view synthesis that dynamically coordinates geometric and appearance priors to improve robustness against geometric errors.
The article introduces A²RD, a novel architecture for generating consistent long videos using agentic autoregressive diffusion. It proposes a Retrieve–Synthesize–Refine–Update cycle and a new benchmark, LVBench-C, to address semantic drift in long-horizon video synthesis.
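The cycle itself is easy to state; a sketch with all five components as hypothetical callables.

```python
def generate_long_video(prompt, memory, retrieve, synthesize, refine, update, n_chunks):
    # Each chunk is conditioned on context retrieved from a memory of
    # earlier chunks, refined against that context to suppress drift,
    # and written back, keeping entities and scenes consistent across
    # long horizons.
    clips = []
    for i in range(n_chunks):
        context = retrieve(memory, prompt, i)
        clip = synthesize(prompt, context)
        clip = refine(clip, context)
        memory = update(memory, clip)
        clips.append(clip)
    return clips
```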
This paper introduces Christoffel-DPS, a distribution-free framework for optimal sensor placement in diffusion posterior sampling that outperforms classical Gaussian-based methods. It provides theoretical guarantees and practical improvements for reconstructing states from complex, non-Gaussian distributions using generative models.
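For background, the empirical inverse Christoffel function can be computed from generative samples alone. Below is a sketch with a deliberately small feature map (per-coordinate monomials, no cross terms) and hypothetical candidate sites; how the paper converts these scores into placements is not reproduced.

```python
import numpy as np

def inverse_christoffel(samples, candidates, degree=2):
    # Build polynomial features v(x), estimate the moment matrix
    # M = E[v v^T] from model samples, and score each candidate site by
    # v(x)^T M^{-1} v(x). Large values mark regions the sampled
    # distribution barely covers, with no Gaussian assumption anywhere.
    def feats(X):
        return np.concatenate(
            [np.ones((len(X), 1))] + [X ** d for d in range(1, degree + 1)],
            axis=1)
    V = feats(samples)
    M = V.T @ V / len(samples)
    Vc = feats(candidates)
    return np.einsum('ij,jk,ik->i', Vc, np.linalg.pinv(M), Vc)
```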
This research paper investigates privacy leakage in tabular diffusion models, quantifying how training setups, synthesis choices, and attacker knowledge impact privacy risks. It reveals that adversaries can succeed without perfect knowledge or massive resources and highlights pitfalls in heuristic privacy metrics.
This paper provides a theoretical analysis explaining why deterministic DDIM samplers hallucinate more than stochastic DDPM samplers in diffusion models, attributing it to getting stuck in mode-interpolation regions during reverse dynamics.
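For reference, the two update rules being compared (standard formulations, not copied from the paper): the only difference is DDPM's injected noise term, which is what lets it escape the mode-interpolation regions where the deterministic map stalls.

```latex
\text{DDIM } (\eta = 0):\quad
x_{t-1} = \sqrt{\bar\alpha_{t-1}}\,\hat{x}_0(x_t)
        + \sqrt{1-\bar\alpha_{t-1}}\,\epsilon_\theta(x_t, t)

\text{DDPM:}\quad
x_{t-1} = \frac{1}{\sqrt{\alpha_t}}
          \Big(x_t - \frac{1-\alpha_t}{\sqrt{1-\bar\alpha_t}}\,
               \epsilon_\theta(x_t, t)\Big) + \sigma_t z,
\qquad z \sim \mathcal{N}(0, I)
```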
This arXiv preprint proposes a unified measure-theoretic framework for understanding diffusion, score-based, and flow matching generative models. It establishes connections between these methods via continuity/Fokker-Planck equations and analyzes their sampling schemes and theoretical guarantees.
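The bridge the framework formalizes can be stated in three standard equations (my notation, not necessarily the paper's): substituting the score-dependent probability-flow velocity turns the Fokker-Planck equation of the diffusion SDE into a continuity equation, which is what places flow matching and score-based diffusion under one roof.

```latex
\partial_t p_t + \nabla\cdot(p_t v_t) = 0
\quad\text{(continuity equation, flow matching)}

\partial_t p_t = -\nabla\cdot(f\,p_t) + \tfrac{1}{2} g^2 \Delta p_t
\quad\text{(Fokker-Planck, diffusion SDE)}

v_t(x) = f(x,t) - \tfrac{1}{2} g(t)^2 \nabla_x \log p_t(x)
\quad\text{(probability-flow velocity)}
```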
This paper introduces a diffusion language model that treats text as a continuous process over binary bitstreams, using entropy-gated stochastic sampling to close the performance gap with autoregressive models. It achieves state-of-the-art results on LM1B and OWT benchmarks while reducing memory footprint.
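A hedged sketch of entropy-gated sampling over predicted bit probabilities; the paper's gate may be defined differently.

```python
import torch

def entropy_gated_sample(bit_probs, gate=0.5):
    # Bits the model is confident about are decoded greedily; only
    # high-entropy bits are sampled stochastically, preserving
    # diversity where the model is genuinely uncertain. `bit_probs`
    # holds P(bit = 1); `gate` is an entropy threshold in nats.
    p = bit_probs.clamp(1e-6, 1 - 1e-6)
    entropy = -(p * p.log() + (1 - p) * (1 - p).log())
    stochastic = torch.bernoulli(p)
    greedy = (p > 0.5).float()
    return torch.where(entropy > gate, stochastic, greedy)
```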
This paper proposes ELF, a continuous diffusion language model that applies flow matching in token embedding space, outperforming existing discrete and continuous diffusion language models with fewer sampling steps.
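The training objective is presumably a variant of conditional flow matching in embedding space; a generic linear-interpolant sketch, where `model` is an assumed velocity-prediction network and ELF's exact parameterization may differ.

```python
import torch

def embedding_flow_matching_loss(model, token_embeddings):
    # Linear interpolant between Gaussian noise x0 and the token
    # embeddings x1; the network regresses onto the constant target
    # velocity x1 - x0 at a random time t.
    x1 = token_embeddings                      # (batch, seq, dim)
    x0 = torch.randn_like(x1)
    t = torch.rand(x1.size(0), 1, 1)
    x_t = (1 - t) * x0 + t * x1
    v_pred = model(x_t, t.view(-1))
    return ((v_pred - (x1 - x0)) ** 2).mean()
```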
Qwen-Image-2.0 is a new image generation foundation model that unifies high-fidelity synthesis and precise editing using Qwen3-VL and a Multimodal Diffusion Transformer. It excels in text-rich content, multilingual typography, and photorealistic generation.
The author introduces 'Bracket', an open-source tool that automates hyperparameter search for diffusion model fine-tuning using parallel training trials and VLM-based scoring to objectively determine the best configuration.
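The workflow reduces to a map-and-score loop; a sketch with hypothetical `train_trial` and `vlm_score` stand-ins for Bracket's internals.

```python
from concurrent.futures import ProcessPoolExecutor

def search(configs, train_trial, vlm_score, max_workers=4):
    # Fine-tune one trial per hyperparameter config in parallel, have a
    # vision-language model score each trial's sample images, and
    # return the best-scoring configuration.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        samples = list(pool.map(train_trial, configs))
    scores = [vlm_score(s) for s in samples]
    best = max(range(len(configs)), key=lambda i: scores[i])
    return configs[best], scores[best]
```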