Simplifying, stabilizing, and scaling continuous-time consistency models

OpenAI Blog Papers

Summary

OpenAI presents sCM (simplified continuous-time consistency models), a new approach that scales consistency models to 1.5B parameters and achieves ~50x speedup over diffusion models by generating high-quality samples in just 2 steps. The method demonstrates comparable sample quality to state-of-the-art diffusion models while using less than 10% of the effective sampling compute.

We’ve simplified, stabilized, and scaled continuous-time consistency models, achieving comparable sample quality to leading diffusion models, while using only two sampling steps.

# Simplifying, stabilizing, and scaling continuous-time consistency models

Source: [https://openai.com/index/simplifying-stabilizing-and-scaling-continuous-time-consistency-models/](https://openai.com/index/simplifying-stabilizing-and-scaling-continuous-time-consistency-models/)

Current sampling approaches for diffusion models often require dozens to hundreds of sequential steps to generate a single sample, which limits their efficiency and scalability for real-time applications. Various distillation techniques have been developed to accelerate sampling, but they often come with limitations such as high computational cost, complex training, and reduced sample quality. Extending our previous research on consistency models[1](https://openai.com/index/simplifying-stabilizing-and-scaling-continuous-time-consistency-models/#citation-bottom-1),[2](https://openai.com/index/simplifying-stabilizing-and-scaling-continuous-time-consistency-models/#citation-bottom-2), we have simplified the formulation and further stabilized the training process of continuous-time consistency models. Our new approach, called sCM, has enabled us to scale the training of continuous-time consistency models to an unprecedented 1.5 billion parameters on ImageNet at 512×512 resolution. sCMs can generate samples with quality comparable to diffusion models using only two sampling steps, resulting in a ~50x wall-clock speedup. For example, our largest model, with 1.5 billion parameters, generates a single sample in just 0.11 seconds on a single A100 GPU without any inference optimization. Additional acceleration is easily achievable through customized system optimization, opening up possibilities for real-time generation in domains such as image, audio, and video.
For rigorous evaluation, we benchmarked sCM against other state-of-the-art generative models by comparing both sample quality, using the standard Fréchet Inception Distance (FID) scores (where lower is better), and effective sampling compute, which estimates the total compute cost for generating each sample. As shown below, our 2-step sCM produces samples with quality comparable to the best previous methods while using less than 10% of the effective sampling compute, significantly accelerating the sampling process.

Consistency models offer a faster alternative to traditional diffusion models for generating high-quality samples. Unlike diffusion models, which generate samples gradually through a large number of denoising steps, consistency models aim to convert noise directly into noise-free samples in a single step. This difference is visualized by paths in the diagram: the blue line represents the gradual sampling process of a diffusion model, while the red curve illustrates the more direct, accelerated sampling of a consistency model. Using techniques like consistency training or consistency distillation[1](https://openai.com/index/simplifying-stabilizing-and-scaling-continuous-time-consistency-models/#citation-bottom-1),[2](https://openai.com/index/simplifying-stabilizing-and-scaling-continuous-time-consistency-models/#citation-bottom-2), consistency models can be trained to generate high-quality samples with significantly fewer steps, making them appealing for practical applications that require fast generation.

Our sCM distills knowledge from a pre-trained diffusion model. A key finding is that sCMs improve proportionally with the teacher diffusion model as both scale up. Specifically, the relative difference in sample quality, measured by the ratio of FID scores, remains consistent across several orders of magnitude in model sizes, causing the absolute difference in sample quality to diminish at scale.
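To make the distillation idea above concrete, here is a minimal sketch of a discrete consistency-distillation training step, under assumed names: `student(x, sigma)` and its EMA copy `ema_student` map a noisy input directly to a clean estimate, and `teacher_ode_step` takes one solver step along the teacher diffusion model's probability-flow ODE. The interfaces and noise schedule are illustrative, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def consistency_distillation_loss(student, ema_student, teacher_ode_step, x0, sigmas):
    """One consistency-distillation step (simplified sketch; names are illustrative).

    student(x, sigma)               -> clean-sample estimate (trainable)
    ema_student(x, sigma)           -> EMA copy of the student (target network)
    teacher_ode_step(x, s_hi, s_lo) -> one teacher ODE solver step from s_hi to s_lo
    sigmas                          -> decreasing noise levels, e.g. [80.0, ..., 0.002]
    """
    # Pick a random pair of adjacent noise levels from the schedule.
    n = torch.randint(0, len(sigmas) - 1, (1,)).item()
    s_hi, s_lo = sigmas[n], sigmas[n + 1]
    # Form a noisy sample at the higher noise level.
    x_hi = x0 + torch.randn_like(x0) * s_hi
    with torch.no_grad():
        # One teacher step along the ODE trajectory toward the lower level,
        # then the target network's clean estimate at that point.
        x_lo = teacher_ode_step(x_hi, s_hi, s_lo)
        target = ema_student(x_lo, s_lo)
    # Self-consistency: the student's outputs at adjacent trajectory points agree.
    return F.mse_loss(student(x_hi, s_hi), target)
```

The key property being enforced is that every point on a single ODE trajectory maps to the same clean sample, which is what lets the trained model jump from pure noise to data in one or two steps.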
Additionally, increasing the sampling steps for sCMs further reduces the quality gap. Notably, two-step samples from sCMs are already comparable (with less than a 10% relative difference in FID scores) to samples from the teacher diffusion model, which requires hundreds of steps to generate.
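The two-step sampling procedure described above can be sketched as follows. This is a minimal illustration, assuming a trained consistency model with interface `model(x, sigma)` that maps a noisy input at level `sigma` directly to a clean estimate; the specific noise levels are placeholders, not the paper's schedule.

```python
import torch

def sample_two_step(model, shape, sigma_max=80.0, sigma_mid=0.8):
    """Two-step consistency sampling sketch (hypothetical interface).

    model(x, sigma) is assumed to map a noisy input at noise level sigma
    directly to a clean-sample estimate.
    """
    # Step 1: map pure noise at the maximum level directly to a clean estimate.
    x = torch.randn(shape) * sigma_max
    x0 = model(x, sigma_max)
    # Step 2: re-noise the estimate to an intermediate level, then denoise again
    # to refine details. One-step sampling would simply return x0.
    x = x0 + torch.randn(shape) * sigma_mid
    return model(x, sigma_mid)
```

Because each model call is a single forward pass, total sampling cost is just two network evaluations, versus the dozens to hundreds required by a standard diffusion sampler.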

Similar Articles

Consistency Models

OpenAI Blog

OpenAI introduces Consistency Models, a new family of generative models that enable fast one-step image generation by directly mapping noise to data, while supporting multi-step sampling and zero-shot editing tasks like inpainting and super-resolution. The approach achieves state-of-the-art FID scores on CIFAR-10 and ImageNet 64x64 for one-step generation.

Improved Techniques for Training Consistency Models

OpenAI Blog

OpenAI presents improved techniques for training consistency models that enable high-quality single-step image generation without distillation, achieving significant FID improvements on CIFAR-10 and ImageNet 64×64 through novel loss functions and training strategies.

LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling

Hugging Face Daily Papers

LangFlow presents the first continuous diffusion language model that rivals discrete diffusion approaches, challenging the long-held belief that continuous diffusion is inferior for language modeling. The work introduces key ingredients like optimal Gumbel-based noise scheduling and demonstrates competitive perplexity and transfer learning performance compared to discrete diffusion baselines.