Parallel Rollout Approximation for Pixel-Space Autoregressive Image Generation
Summary
Parallel Rollout Approximation (PRA) improves pixel-space autoregressive image generation by using low-dimensional intermediate states and parallel training, setting a new state of the art among pixel-space AR models on class-conditional ImageNet-1K generation at 256×256 resolution (FID 1.94 with PRA-L).
Cached at: 06/29/26, 02:03 PM
Source: https://huggingface.co/papers/2606.27978
Abstract
Parallel Rollout Approximation (PRA) addresses limitations in pixel-space autoregressive image generation by using low-dimensional intermediate states and parallel training to improve quality and efficiency.
Pixel-space continuous-token autoregressive (AR) generation directly models images as sequences of raw pixel patches, avoiding discrete tokenization or a separately pretrained tokenizer. However, it faces coupled challenges: high-dimensional patch generation causes large single-step errors, and teacher-forced training creates a train-inference gap that makes these errors accumulate across AR steps. Existing fixes such as x-prediction and input noise injection only partially mitigate these issues. Exact rollout training better matches inference-time conditions, but is impractical due to prohibitively slow sequential sampling. We propose Parallel Rollout Approximation (PRA), a scalable framework that addresses both challenges jointly. PRA generates low-dimensional intermediate states instead of high-dimensional pixel patches, then maps them back to pixel-space tokens with a pixel decoder, preserving a pixel-in, pixel-out AR interface. It also constructs inference-like pixel inputs through the same intermediate-state-to-pixel path used at inference, independently across positions, approximating the pixel-feedback interface encountered during inference-time rollout while retaining parallel teacher-forced training. On class-conditional ImageNet-1K generation at 256×256 resolution, PRA-S with 135M parameters achieves an FID of 2.58, surpassing the previous billion-scale pixel-space AR result of 3.60. Scaling to PRA-L with 511M parameters further improves FID to 1.94, establishing a new state of the art among pixel-space AR models. Beyond generation, PRA achieves higher ImageNet classification probing accuracy than other AR and diffusion baselines, suggesting its potential for unified pixel-space image generation and understanding.
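The idea of building inference-like inputs "independently across positions" can be illustrated with a toy NumPy sketch. This is not the paper's implementation: the abstract does not say how intermediate states are produced during training, so the linear maps `to_state` and `to_pixels`, the Gaussian perturbation, the zero start token, and all function names below are hypothetical stand-ins chosen only to show the data flow. Each ground-truth patch is mapped to a low-dimensional state, perturbed, decoded back to pixels, and shifted right to serve as the AR input, all without a sequential loop.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into a raster-ordered (N, patch*patch*C) token sequence."""
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    x = image[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(gh * gw, patch * patch * c)

def inference_like_inputs(tokens, to_state, to_pixels, noise_std, rng):
    """Pass every position through a state -> pixel path in parallel (hypothetical sketch)."""
    # (N, d): low-dimensional intermediate states; a linear map is an assumption.
    states = tokens @ to_state
    # Gaussian noise is a stand-in for sampling error, not the paper's procedure.
    states = states + noise_std * rng.standard_normal(states.shape)
    # (N, D): decode back to pixel-space tokens with an assumed linear pixel decoder.
    decoded = states @ to_pixels
    # Shift right so position i is conditioned on the decoded token i-1.
    start = np.zeros((1, tokens.shape[1]))
    return np.concatenate([start, decoded[:-1]], axis=0)

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))               # toy image, not ImageNet data
tokens = patchify(image, patch=8)             # 16 tokens of dimension 8*8*3 = 192
state_dim = 16                                # assumed low-dimensional state size
to_state = rng.standard_normal((192, state_dim)) / np.sqrt(192)
to_pixels = np.linalg.pinv(to_state)          # assumed decoder: pseudo-inverse of the encoder
inputs = inference_like_inputs(tokens, to_state, to_pixels, noise_std=0.1, rng=rng)
targets = tokens                              # teacher-forced targets stay the clean patches
```

Because every position is processed in one batched matrix product, the cost is comparable to ordinary teacher forcing, whereas an exact rollout would need one model call per token in sequence.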
Similar Articles
L2P: Unlocking Latent Potential for Pixel Generation
The L2P paper introduces a Latent-to-Pixel transfer paradigm that leverages pre-trained latent diffusion models to create efficient pixel-space models capable of 4K generation with minimal training overhead.
GEAR: Guided End-to-End AutoRegression for Image Synthesis
GEAR proposes a method to jointly train a vector-quantized tokenizer and autoregressive generator end-to-end via representation alignment, achieving up to 10x faster convergence on ImageNet gFID compared to strong baselines.
prunaai/p-image
P-Image is Pruna's text-to-image generation model that produces state-of-the-art images in less than a second, offering a combination of speed, affordability, and quality.
Next-Acceleration-Scale Prediction for Autoregressive MRI Reconstruction
Discrete autoregressive MRI reconstruction using privileged information distillation achieves superior performance under extreme undersampling by leveraging visual autoregressive modeling techniques.
GPT-Image-2 is rolling out
OpenAI is rolling out GPT-Image-2, a new image generation model. This appears to be a significant update to their image generation capabilities.