PixelCNN++: Improving the PixelCNN with discretized logistic mixture likelihood and other modifications

OpenAI Blog Papers

Summary

PixelCNN++ introduces several architectural improvements to PixelCNN including discretized logistic mixture likelihood, downsampling, and shortcut connections, achieving state-of-the-art log likelihood results on CIFAR-10.


Cached at: 04/20/26, 02:45 PM

# PixelCNN++: Improving the PixelCNN with discretized logistic mixture likelihood and other modifications

Source: [https://openai.com/index/pixelcnn-plus-plus/](https://openai.com/index/pixelcnn-plus-plus/)

## Abstract

PixelCNNs are a recently proposed class of powerful generative models with tractable likelihood. Here we discuss our implementation of PixelCNNs, which we make available at [https://github.com/openai/pixel-cnn](https://github.com/openai/pixel-cnn). Our implementation contains a number of modifications to the original model that both simplify its structure and improve its performance:

1. We use a discretized logistic mixture likelihood on the pixels, rather than a 256-way softmax, which we find to speed up training.
2. We condition on whole pixels, rather than R/G/B sub-pixels, simplifying the model structure.
3. We use downsampling to efficiently capture structure at multiple resolutions.
4. We introduce additional short-cut connections to further speed up optimization.
5. We regularize the model using dropout.

Finally, we present state-of-the-art log likelihood results on CIFAR-10 to demonstrate the usefulness of these modifications.
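To make the first modification concrete, here is a minimal NumPy sketch of a discretized logistic mixture likelihood for a single pixel: each of the 256 intensity levels is assigned the probability mass a mixture of logistic distributions places on its intensity bin, rather than an independent softmax logit. The function name, the `[-1, 1]` pixel scaling, and the edge-bin handling follow common convention but are illustrative assumptions, not the authors' exact code.

```python
import numpy as np

def discretized_logistic_mixture_logpmf(x, pi, mu, s, num_bins=256):
    """Log-probability of a pixel value x (scaled to [-1, 1]) under a
    K-component logistic mixture, discretized into num_bins levels.

    pi: mixture weights, shape (K,), summing to 1 (illustrative parameters)
    mu: component means, shape (K,)
    s:  component scales, shape (K,)
    """
    half_bin = 1.0 / (num_bins - 1)  # half the width of one intensity bin
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    # Logistic CDF evaluated at the upper and lower edges of x's bin.
    cdf_plus = sigmoid((x + half_bin - mu) / s)
    cdf_minus = sigmoid((x - half_bin - mu) / s)

    # Edge bins absorb the distribution's tails so total mass sums to 1.
    cdf_plus = np.where(x > 1.0 - 1e-3, 1.0, cdf_plus)
    cdf_minus = np.where(x < -1.0 + 1e-3, 0.0, cdf_minus)

    probs = np.maximum(cdf_plus - cdf_minus, 1e-12)  # guard against log(0)
    return np.log(np.sum(pi * probs))
```

Because each bin's mass is a CDF difference, the probabilities over all 256 bin centers telescope to exactly 1, and nearby intensities share mass through the smooth logistic components, which is one intuition for why this trains faster than a 256-way softmax.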

Similar Articles

Improved Techniques for Training Consistency Models

OpenAI Blog

OpenAI presents improved techniques for training consistency models that enable high-quality single-step image generation without distillation, achieving significant FID improvements on CIFAR-10 and ImageNet 64×64 through novel loss functions and training strategies.

Representations Before Pixels: Semantics-Guided Hierarchical Video Prediction

Hugging Face Daily Papers

Re2Pix is a hierarchical video prediction framework that improves future video generation by first predicting semantic representations using frozen vision foundation models, then conditioning a latent diffusion model on these predictions to generate photorealistic frames. The approach addresses train-test mismatches through nested dropout and mixed supervision strategies, achieving improved temporal semantic consistency and perceptual quality on autonomous driving benchmarks.

An Optimal Transport-driven Approach for Cultivating Latent Space in Online Incremental Learning

Hugging Face Daily Papers

This paper introduces MMOT, an online mixture model learning framework based on optimal transport theory that addresses incremental learning with distributional shifts through dynamic centroid updates and improved class similarity estimation. The approach includes a Dynamic Preservation strategy to mitigate catastrophic forgetting and maintain class separability in latent space.

Faster LLM Inference via Sequential Monte Carlo

arXiv cs.CL

This paper proposes Sequential Monte Carlo Speculative Decoding (SMC-SD), a method that accelerates LLM inference by replacing token-level rejection in speculative decoding with importance-weighted resampling over draft particles, achieving a 2.36× speedup over standard speculative decoding and 5.2× over autoregressive decoding while keeping accuracy loss within 3%.