Parallel Rollout Approximation for Pixel-Space Autoregressive Image Generation

Hugging Face Daily Papers 06/26/26, 12:00 AM Papers

pixel-space autoregressive image-generation parallel-rollout intermediate-states imagenet

Summary

Parallel Rollout Approximation (PRA) improves pixel-space autoregressive image generation by using low-dimensional intermediate states and parallel training, achieving new state-of-the-art results among pixel-space AR models on class-conditional ImageNet-1K generation.

Pixel-space continuous-token autoregressive (AR) generation directly models images as sequences of raw pixel patches, avoiding discrete tokenization or a separately pretrained tokenizer. However, it faces coupled challenges: high-dimensional patch generation causes large single-step errors, and teacher-forced training creates a train-inference gap that makes these errors accumulate across AR steps. Existing fixes such as x-prediction and input noise injection only partially mitigate these issues. Exact rollout training better matches inference-time conditions, but is impractical due to prohibitively slow sequential sampling. We propose Parallel Rollout Approximation (PRA), a scalable framework that addresses both challenges jointly. PRA generates low-dimensional intermediate states instead of high-dimensional pixel patches, then maps them back to pixel-space tokens with a pixel decoder, preserving a pixel-in, pixel-out AR interface. It also constructs inference-like pixel inputs through the same intermediate-state-to-pixel path used at inference, independently across positions, approximating the pixel-feedback interface encountered during inference-time rollout while retaining parallel teacher-forced training. On class-conditional ImageNet-1K generation at 256×256 resolution, PRA-S with 135M parameters achieves an FID of 2.58, surpassing the previous billion-scale pixel-space AR result of 3.60. Scaling to PRA-L with 511M parameters further improves FID to 1.94, establishing a new state of the art among pixel-space AR models. Beyond generation, PRA achieves higher ImageNet classification probing accuracy than other AR and diffusion baselines, suggesting its potential for unified pixel-space image generation and understanding.
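The abstract contrasts three ways of building the pixel inputs that each AR position sees during training. As a rough illustration only, the toy Python sketch below compares teacher forcing, exact rollout, and a PRA-style approximation. Everything concrete here is an assumption not taken from the paper: intermediate states are 2-dimensional vectors, the pixel decoder is a hypothetical fixed linear map (`linear_decoder`), and the AR model is replaced by a hypothetical `toy_predict_state` that overshoots each true step by 0.5. The point it shows is structural. PRA-style inputs are decoded from predictions conditioned on the ground-truth prefix, so every position can be computed independently (in parallel). Exact rollout must generate positions one after another.

```python
from typing import Callable, List, Sequence

Patch = List[float]  # flattened pixel patch (D values)
State = List[float]  # low-dimensional intermediate state (d values, d < D)


def linear_decoder(weight: Sequence[Sequence[float]]) -> Callable[[State], Patch]:
    """Hypothetical pixel decoder: a fixed D x d linear map from state to patch."""
    def decode(state: State) -> Patch:
        return [sum(w * s for w, s in zip(row, state)) for row in weight]
    return decode


def teacher_forced_inputs(patches: Sequence[Patch], start: Patch) -> List[Patch]:
    """Teacher forcing: position t sees the ground-truth patch t-1."""
    return [list(start)] + [list(p) for p in patches[:-1]]


def exact_rollout_inputs(n: int, start: Patch,
                         predict_state: Callable[[Sequence[Patch]], State],
                         decode: Callable[[State], Patch]) -> List[Patch]:
    """Exact rollout: each input is decoded from a prediction conditioned on
    previously *generated* patches, so the n-1 steps must run in order."""
    generated: List[Patch] = []
    inputs = [list(start)]
    for _ in range(n - 1):
        patch = decode(predict_state(generated))
        generated.append(patch)
        inputs.append(patch)
    return inputs


def parallel_rollout_inputs(patches: Sequence[Patch], start: Patch,
                            predict_state: Callable[[Sequence[Patch]], State],
                            decode: Callable[[State], Patch]) -> List[Patch]:
    """PRA-style inputs (sketch): position t receives the decoded prediction for
    patch t-1, conditioned on the *ground-truth* prefix patches[:t-1]. No entry
    depends on another entry, so all positions could be computed in one batch."""
    return [list(start)] + [decode(predict_state(patches[:t - 1]))
                            for t in range(1, len(patches))]


def toy_predict_state(prefix: Sequence[Patch]) -> State:
    """Hypothetical stand-in for the AR model: the true sequence grows by 1 per
    step, but this 'model' predicts +1.5, i.e. a 0.5 single-step error."""
    if not prefix:
        return [1.0, 1.0]
    last = prefix[-1]
    return [last[0] + 1.5, last[1] + 1.5]


# Toy data: four constant patches with values 1, 2, 3, 4 (D = 4, d = 2).
patches = [[float(k)] * 4 for k in range(1, 5)]
start = [0.0] * 4
decode = linear_decoder([[1, 0], [0, 1], [1, 0], [0, 1]])

tf = teacher_forced_inputs(patches, start)
pra = parallel_rollout_inputs(patches, start, toy_predict_state, decode)
roll = exact_rollout_inputs(len(patches), start, toy_predict_state, decode)
print(tf[-1], pra[-1], roll[-1])
# [3.0, 3.0, 3.0, 3.0] [3.5, 3.5, 3.5, 3.5] [4.0, 4.0, 4.0, 4.0]
```

In this toy run, teacher forcing feeds the exact previous patch (3.0). Exact rollout feeds 4.0 at the last position because the 0.5 error compounds to 1.0. The PRA-style input is 3.5, which exposes the model to its own single-step error without the sequential cost. How the actual PRA samples intermediate states and trains its pixel decoder is not specified in this abstract.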

Original Article


Cached at: 06/29/26, 02:03 PM

Paper page - Parallel Rollout Approximation for Pixel-Space Autoregressive Image Generation

Source: https://huggingface.co/papers/2606.27978

Abstract

Parallel Rollout Approximation (PRA) addresses limitations in pixel-space autoregressive image generation by using low-dimensional intermediate states and parallel training to improve quality and efficiency.


Similar Articles

L2P: Unlocking Latent Potential for Pixel Generation

GEAR: Guided End-to-End AutoRegression for Image Synthesis

prunaai/p-image

Next-Acceleration-Scale Prediction for Autoregressive MRI Reconstruction

GPT-Image-2 is rolling out
