Relit-LiVE: Relight Video by Jointly Learning Environment Video
Summary
This paper introduces Relit-LiVE, a novel video relighting framework that produces physically consistent results without requiring camera pose information by using raw reference images and joint environment video prediction.
Source: https://huggingface.co/papers/2605.06658
Abstract
Recent advances have shown that large-scale video diffusion models can be repurposed as neural renderers by first decomposing videos into intrinsic scene representations and then performing forward rendering under novel illumination. While promising, this paradigm fundamentally relies on accurate intrinsic decomposition, which remains highly unreliable for real-world videos and often leads to distorted appearances, broken materials, and accumulated temporal artifacts during relighting. In this work, we present Relit-LiVE, a novel video relighting framework that produces physically consistent, temporally stable results without requiring prior knowledge of camera pose. Our key insight is to explicitly introduce raw reference images into the rendering process, enabling the model to recover critical scene cues that are inevitably lost or corrupted in intrinsic representations. Furthermore, we propose a novel environment video prediction formulation that simultaneously generates relit videos and per-frame environment maps aligned with each camera viewpoint in a single diffusion process. This joint prediction enforces strong geometric-illumination alignment and naturally supports dynamic lighting and camera motion, significantly improving physical consistency in video relighting while easing the requirement of known per-frame camera pose. Extensive experiments demonstrate that Relit-LiVE consistently outperforms state-of-the-art video relighting and neural rendering methods across synthetic and real-world benchmarks. Beyond relighting, our framework naturally supports a wide range of downstream applications, including scene-level rendering, material editing, object insertion, and streaming video relighting. The project is available at https://github.com/zhuxing0/Relit-LiVE.
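The abstract only sketches the core mechanism at a high level. As a rough illustration of the joint prediction idea, the snippet below shows how a single diffusion backbone might denoise relit-video latents and per-frame environment-map latents together while conditioning on raw reference images. All names, the function signature, and the latent layout here are assumptions for illustration, not the authors' published interface.

```python
# Conceptual sketch only: all names (joint_relight_step, the `denoiser`
# interface, the channel-wise latent layout) are hypothetical placeholders
# illustrating joint relit-video / environment-video denoising.
import torch

def joint_relight_step(denoiser, noisy_relit, noisy_env, ref_images, t):
    """One denoising step that predicts relit-video latents and per-frame
    environment-map latents jointly, conditioned on raw reference images."""
    # Stack the two latent streams along the channel axis so one diffusion
    # backbone denoises them in a single pass (assumed layout).
    joint = torch.cat([noisy_relit, noisy_env], dim=1)
    # Raw reference images act as conditioning, letting the model recover
    # scene cues that intrinsic decompositions tend to lose or corrupt.
    eps = denoiser(joint, timestep=t, reference=ref_images)
    # Split the prediction back into the relit and environment streams.
    eps_relit, eps_env = eps.chunk(2, dim=1)
    return eps_relit, eps_env
```

Denoising both streams with one backbone is what would keep the predicted environment maps aligned with each camera viewpoint frame by frame, which is the geometric-illumination consistency argument the abstract makes.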
Get this paper in your agent: hf papers read 2605.06658
Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper: 1
#### weiqingXiao/Relit-LiVE (Image-to-Video) • Updated 5 days ago • 2
Similar Articles
WildRelight: A Real-World Benchmark and Physics-Guided Adaptation for Single-Image Relighting
This paper introduces WildRelight, a new real-world benchmark dataset for single-image relighting that addresses the gap between synthetic and natural scenes. It proposes a physics-guided adaptation framework using diffusion posterior sampling and test-time adaptation to improve model performance on real-world data.
ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis
ReImagine introduces an image-first approach to controllable high-quality human video generation, combining SMPL-X motion guidance with video diffusion models to decouple appearance from temporal consistency.
CollabVR: Collaborative Video Reasoning with Vision-Language and Video Generation Models
CollabVR proposes a closed-loop framework that collaboratively integrates vision-language models with video generation models to improve visual reasoning and correct failures in real time.
Lite3R: A Model-Agnostic Framework for Efficient Feed-Forward 3D Reconstruction
Lite3R is a model-agnostic framework that improves the efficiency of transformer-based 3D reconstruction using sparse linear attention and FP8-aware quantization. It reduces latency and memory usage by up to 2.4x while maintaining geometric accuracy on backbones like VGGT and DA3-Large.
EasyVideoR1: Easier RL for Video Understanding
EasyVideoR1 is an efficient reinforcement learning framework for training large vision-language models on video understanding tasks, featuring offline preprocessing with tensor caching for 1.47x throughput improvement, a task-aware reward system covering 11 problem types, and evaluation across 22 video benchmarks. It also supports joint image-video training and a mixed offline-online data training paradigm.