Normalizing Trajectory Models
Summary
This paper introduces Normalizing Trajectory Models (NTM), an approach to diffusion-based generation that models each reverse step as a conditional normalizing flow trained with exact likelihood. NTM generates high-quality text-to-image samples in just four steps while retaining the likelihood framework, matching or outperforming strong image generation baselines on text-to-image benchmarks.
Cached at: 05/11/26, 02:42 AM
Source: https://huggingface.co/papers/2605.08078
Abstract
Normalizing Trajectory Models (NTM) model each reverse step of diffusion-based generation as an expressive conditional normalizing flow trained with exact likelihood, enabling high-quality sample generation in a few steps while retaining the likelihood framework.
Diffusion-based models decompose sampling into many small Gaussian denoising steps, an assumption that breaks down when generation is compressed to a few coarse transitions. Existing few-step methods address this through distillation, consistency training, or adversarial objectives, but sacrifice the likelihood framework in the process. We introduce Normalizing Trajectory Models (NTM), which models each reverse step as an expressive conditional normalizing flow with exact likelihood training. Architecturally, NTM combines shallow invertible blocks within each step with a deep parallel predictor across the trajectory, forming an end-to-end network trainable from scratch or initializable from pretrained flow-matching models. Its exact trajectory likelihood further enables self-distillation: a lightweight denoiser trained on the model’s own score produces high-quality samples in four steps. On text-to-image benchmarks, NTM matches or outperforms strong image generation baselines in just four sampling steps while uniquely retaining exact likelihood over the generative trajectory.
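To make the idea of an exact trajectory likelihood concrete, here is a minimal sketch, assuming a toy 2-D state and a single hand-written affine coupling layer per reverse step. It is not the architecture from the paper: the conditioner functions, their constants, and every name below are hypothetical, and the same toy flow is reused at every step. It only illustrates the change-of-variables rule, log p(x_prev | x_t) = log N(z) - log|det dx_prev/dz|, and how per-step terms sum into a trajectory log-likelihood.

```python
import math
import random

LOG_2PI = math.log(2.0 * math.pi)

def normal_logpdf(z):
    """Log density of a standard normal, summed over coordinates."""
    return sum(-0.5 * (v * v + LOG_2PI) for v in z)

def params_a(x_t):
    """Toy conditioner (assumption): shift and log-scale for coordinate a."""
    return 0.8 * x_t[0], -0.5

def params_b(x_a, x_t):
    """Toy conditioner (assumption): depends nonlinearly on x_a, so the
    resulting conditional density is not Gaussian."""
    return 0.8 * x_t[1] + 0.3 * math.sin(x_a), 0.5 * math.tanh(x_a) - 0.5

def flow_forward(z, x_t):
    """Sampling direction: map noise z to x_prev, conditioned on x_t."""
    m_a, s_a = params_a(x_t)
    x_a = m_a + math.exp(s_a) * z[0]
    m_b, s_b = params_b(x_a, x_t)
    x_b = m_b + math.exp(s_b) * z[1]
    return (x_a, x_b)

def flow_inverse(x_prev, x_t):
    """Likelihood direction: recover z and log|det d x_prev / d z|."""
    m_a, s_a = params_a(x_t)
    m_b, s_b = params_b(x_prev[0], x_t)
    z = ((x_prev[0] - m_a) * math.exp(-s_a),
         (x_prev[1] - m_b) * math.exp(-s_b))
    return z, s_a + s_b

def step_logprob(x_prev, x_t):
    """Exact log p(x_prev | x_t) via the change-of-variables formula."""
    z, log_det = flow_inverse(x_prev, x_t)
    return normal_logpdf(z) - log_det

def sample_trajectory(num_steps, rng):
    """Draw x_T from a standard normal prior, then apply reverse steps."""
    x = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    trajectory = [x]
    for _ in range(num_steps):
        z = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        x = flow_forward(z, x)
        trajectory.append(x)
    return trajectory

def trajectory_logprob(trajectory):
    """log p(x_T) plus the sum of per-step conditional log-likelihoods."""
    logp = normal_logpdf(trajectory[0])
    for x_t, x_prev in zip(trajectory, trajectory[1:]):
        logp += step_logprob(x_prev, x_t)
    return logp

if __name__ == "__main__":
    rng = random.Random(0)
    traj = sample_trajectory(4, rng)  # four reverse steps, as in the abstract
    print(trajectory_logprob(traj))
```

Because each step is invertible with a tractable Jacobian, the log-likelihood of a whole trajectory is a plain sum with no variational bound, which is the property the abstract refers to as exact likelihood over the generative trajectory.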
Similar Articles
Trajectory as the Teacher: Few-Step Discrete Flow Matching via Energy-Navigated Distillation
This paper introduces Trajectory-Shaped Discrete Flow Matching (TS-DFM), which replaces blind stochastic jumps with guided navigation to significantly improve text generation efficiency and reduce computational costs. The method achieves superior perplexity and speed compared to traditional multi-step baselines while maintaining unchanged inference costs.
Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models
This paper introduces TABOM, a self-distilled trajectory-based post-training framework for Diffusion Language Models that aligns training with inference trajectories using Boltzmann modeling to mitigate the training-inference discrepancy and reduce catastrophic forgetting.
Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models
This paper identifies a failure mode called 'trajectory locking' in reward-maximizing post-training for diffusion language models, and proposes TraFL, a trajectory-balance objective that improves diversity and performance across math and code benchmarks.
LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories
LeapAlign is a post-training method that improves flow matching model alignment with human preferences by reducing computational costs through two-step trajectory shortcuts while enabling stable gradient propagation to early generation steps. The method outperforms state-of-the-art approaches when fine-tuning Flux models across various image quality and text-alignment metrics.
Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models
NVIDIA introduces Nemotron-Labs Diffusion, a family of diffusion language models that generate text in parallel and iteratively refine it, offering faster generation and the ability to revise previous tokens.