HL-OutPaint: Coarse-to-Fine Video Outpainting for High-Resolution Long-Range Videos

Hugging Face Daily Papers 05/19/26, 12:00 AM Papers

Summary

HL-OutPaint is a coarse-to-fine video outpainting framework for high-resolution long-range videos, using global coarse guidance to enable large spatial extrapolation while maintaining spatio-temporal consistency.

Video outpainting generates plausible visual content beyond the original spatial extent of a video, playing a key role in adapting videos to diverse display formats. To support such use cases, it must enable large spatial extrapolation over long sequences. However, most existing methods address only one of these challenges or lack explicit mechanisms for ensuring global spatio-temporal consistency, leading to notable limitations. In this paper, we propose HL-OutPaint, a high-resolution video outpainting framework for long sequences. Our approach follows a coarse-to-fine strategy with a two-stage pipeline. We first construct Global Coarse Guidance (GCG), a low-resolution representation that captures global structure and dominant motion across the video. Unlike naive downsampling, GCG is built via a novel global-local frame swapping mechanism that couples sparse global keyframes with local temporal windows and exchanges information during sampling. This enables GCG to encode both long-term structural consistency and short-term temporal dynamics in a unified representation. Guided by this representation, HL-OutPaint then performs high-resolution outpainting to generate spatially detailed and temporally consistent content. By separating global structure modeling from fine-grained synthesis, our framework achieves stable, coherent generation for large spatial expansion and long video sequences. Extensive experiments show that HL-OutPaint outperforms existing methods in challenging scenarios involving wide spatial extrapolation and long video sequences.
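
As a rough illustration of the task setup described above (not code from the paper), the sketch below places a clip on a wider canvas and marks which pixels are known and which must be generated. The function name `outpaint_canvas`, the centered placement, and the toy 4:3 to 16:9 sizes are assumptions chosen for the example.

```python
import numpy as np

def outpaint_canvas(frames, target_w):
    """Center a (T, H, W, C) clip on a wider canvas.

    Hypothetical helper: returns the zero-padded canvas and a boolean
    (H, target_w) mask that is True where source pixels are known.
    """
    t, h, w, c = frames.shape
    if target_w < w:
        raise ValueError("target width must be at least the source width")
    left = (target_w - w) // 2
    canvas = np.zeros((t, h, target_w, c), dtype=frames.dtype)
    canvas[:, :, left:left + w] = frames
    known = np.zeros((h, target_w), dtype=bool)
    known[:, left:left + w] = True
    return canvas, known

# Toy 4:3 clip (120x90) expanded to a 16:9 canvas (160x90).
clip = np.ones((8, 90, 120, 3), dtype=np.uint8)
canvas, known = outpaint_canvas(clip, target_w=160)

# Share of each frame that an outpainting model would have to synthesize.
fraction_to_generate = 1.0 - known.mean()
```

In this toy case a quarter of every frame lies outside the source, and that share grows with wider targets, which is the "large spatial extrapolation" regime the abstract refers to.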

Cached at: 06/01/26, 03:20 PM

Source: https://huggingface.co/papers/2605.17543

Abstract

HL-OutPaint is a high-resolution video outpainting framework that uses a coarse-to-fine strategy with global coarse guidance to enable large spatial extrapolation and long sequence generation while maintaining spatio-temporal consistency.

Video outpainting generates plausible visual content beyond the original spatial extent of a video, playing a key role in adapting videos to diverse display formats. To support such use cases, it must enable large spatial extrapolation over long sequences. However, most existing methods address only one of these challenges or lack explicit mechanisms for ensuring global spatio-temporal consistency, leading to notable limitations. In this paper, we propose HL-OutPaint, a high-resolution video outpainting framework for long sequences. Our approach follows a coarse-to-fine strategy with a two-stage pipeline. We first construct Global Coarse Guidance (GCG), a low-resolution representation that captures global structure and dominant motion across the video. Unlike naive downsampling, GCG is built via a novel global-local frame swapping mechanism that couples sparse global keyframes with local temporal windows and exchanges information during sampling. This enables GCG to encode both long-term structural consistency and short-term temporal dynamics in a unified representation. Guided by this representation, HL-OutPaint then performs high-resolution outpainting to generate spatially detailed and temporally consistent content. By separating global structure modeling from fine-grained synthesis, our framework achieves stable, coherent generation for large spatial expansion and long video sequences. Extensive experiments show that HL-OutPaint outperforms existing methods in challenging scenarios involving wide spatial extrapolation and long video sequences.
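
The abstract does not spell out how the global-local frame swapping works, so the following is only a toy sketch of one possible reading: a sparse global track holds latents at evenly spaced keyframes, a local track holds every frame in fixed-length windows, and a swap step exchanges the latents at the keyframe positions. All names (`keyframe_indices`, `local_windows`, `swap_step`), the even spacing, and the literal exchange are assumptions for illustration, not the authors' method.

```python
import numpy as np

def keyframe_indices(num_frames, num_keyframes):
    """Evenly spaced frame indices for the sparse global track (assumed layout)."""
    idx = np.linspace(0, num_frames - 1, num_keyframes).round().astype(int)
    return np.unique(idx)

def local_windows(num_frames, window):
    """Non-overlapping temporal windows covering all frames (assumed layout)."""
    return [np.arange(s, min(s + window, num_frames))
            for s in range(0, num_frames, window)]

def swap_step(global_lat, local_lat, key_idx):
    """Exchange latents at keyframe positions between the two tracks.

    global_lat: (K, D) latents of the global keyframes.
    local_lat:  (T, D) latents of all frames, processed window by window.
    Returns new arrays; the inputs are left unmodified.
    """
    new_global = local_lat[key_idx].copy()
    new_local = local_lat.copy()
    new_local[key_idx] = global_lat
    return new_global, new_local

# Toy setup: 16 frames, 4 keyframes, windows of 8 frames, 2-dim latents.
T, K, D = 16, 4, 2
key_idx = keyframe_indices(T, K)
windows = local_windows(T, window=8)
global_lat = np.ones((len(key_idx), D))
local_lat = np.zeros((T, D))
global_lat, local_lat = swap_step(global_lat, local_lat, key_idx)
```

In a real sampler such an exchange would presumably be interleaved with denoising steps on both tracks, so that long-range structure from the keyframes and short-range motion from the windows influence each other; that scheduling is not described in the abstract and is left out here.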

Similar Articles

Video outpainting is getting really good

DALL·E: Introducing outpainting

Sketch Then Paint: Hierarchical Reinforcement Learning for Diffusion Multi-Modal Large Language Models

HDR Video Generation via Latent Alignment with Logarithmic Encoding

KVPO: ODE-Native GRPO for Autoregressive Video Alignment via KV Semantic Exploration
