Relit-LiVE: Relight Video by Jointly Learning Environment Video

Hugging Face Daily Papers

Summary

This paper introduces Relit-LiVE, a novel video relighting framework that produces physically consistent results without requiring camera pose information by using raw reference images and joint environment video prediction.


Cached at: 05/13/26, 12:13 PM

Paper page - Relit-LiVE: Relight Video by Jointly Learning Environment Video

Source: https://huggingface.co/papers/2605.06658

Abstract

Relit-LiVE is a novel video relighting framework that produces physically consistent results without requiring camera pose information, incorporating raw reference images and using environment video prediction for joint relighting and environment map generation.

Recent advances have shown that large-scale video diffusion models can be repurposed as neural renderers by first decomposing videos into intrinsic scene representations and then performing forward rendering under novel illumination. While promising, this paradigm fundamentally relies on accurate intrinsic decomposition, which remains highly unreliable for real-world videos and often leads to distorted appearances, broken materials, and accumulated temporal artifacts during relighting. In this work, we present Relit-LiVE, a novel video relighting framework that produces physically consistent, temporally stable results without requiring prior knowledge of camera pose. Our key insight is to explicitly introduce raw reference images into the rendering process, enabling the model to recover critical scene cues that are inevitably lost or corrupted in intrinsic representations. Furthermore, we propose a novel environment video prediction formulation that simultaneously generates relit videos and per-frame environment maps aligned with each camera viewpoint in a single diffusion process. This joint prediction enforces strong geometric-illumination alignment and naturally supports dynamic lighting and camera motion, significantly improving physical consistency in video relighting while easing the requirement of known per-frame camera pose. Extensive experiments demonstrate that Relit-LiVE consistently outperforms state-of-the-art video relighting and neural rendering methods across synthetic and real-world benchmarks. Beyond relighting, our framework naturally supports a wide range of downstream applications, including scene-level rendering, material editing, object insertion, and streaming video relighting. The project is available at https://github.com/zhuxing0/Relit-LiVE.
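
The abstract's central formulation, a single diffusion process that jointly denoises relit-video latents and per-frame environment-map latents while conditioning on the raw reference images, can be sketched in a few lines. The following is a minimal illustration of that idea, not the released implementation: the toy Conv3d denoiser, the latent shapes, and the noise-schedule values are all hypothetical stand-ins for the paper's actual video diffusion backbone.

```python
# Minimal sketch (NOT the authors' code) of joint "environment video prediction":
# one diffusion model denoises relit-video latents and per-frame environment-map
# latents together, conditioned on the input video and raw reference images.
# All module names, shapes, and schedule values below are illustrative assumptions.

import torch
import torch.nn as nn

class JointRelightDenoiser(nn.Module):
    """Hypothetical denoiser over concatenated video + env-map latents."""
    def __init__(self, latent_dim=16, hidden=128):
        super().__init__()
        # Input channels: 2*latent_dim for the noisy joint latents
        # (relit video + env maps) plus latent_dim for the condition.
        self.net = nn.Sequential(
            nn.Conv3d(3 * latent_dim, hidden, 3, padding=1),
            nn.SiLU(),
            nn.Conv3d(hidden, 2 * latent_dim, 3, padding=1),
        )

    def forward(self, noisy_joint, cond, t):
        # noisy_joint: (B, 2C, T, H, W) -- relit-video + env-map latents
        # cond:        (B, C, T, H, W)  -- encoded input video / reference images
        # t is ignored in this toy model; a real backbone would use
        # timestep embeddings.
        x = torch.cat([noisy_joint, cond], dim=1)
        return self.net(x)

def denoise_step(model, z_t, cond, t, alpha_t):
    """One DDPM-style estimate of z0. Both streams are denoised by the same
    network at the same timestep, keeping them aligned per frame."""
    eps = model(z_t, cond, t)
    return (z_t - (1 - alpha_t).sqrt() * eps) / alpha_t.sqrt()

B, C, T, H, W = 1, 16, 8, 32, 32
model = JointRelightDenoiser(latent_dim=C)
z_t = torch.randn(B, 2 * C, T, H, W)   # noisy joint latents
cond = torch.randn(B, C, T, H, W)      # input video + raw reference cues
z0 = denoise_step(model, z_t, cond, t=torch.tensor(500), alpha_t=torch.tensor(0.5))
video_latents, env_latents = z0.chunk(2, dim=1)  # relit video / env maps
```

Because both streams pass through the same network at the same timestep, the predicted environment map for each frame stays tied to that frame's viewpoint, which is plausibly how the joint formulation enforces geometric-illumination alignment without known per-frame camera poses; the released code at the GitHub link above would replace this toy denoiser with a full video diffusion backbone.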


Get this paper in your agent:

hf papers read 2605.06658

Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 1

weiqingXiao/Relit-LiVE (Image-to-Video) • Updated 5 days ago • 2


Similar Articles

Lite3R: A Model-Agnostic Framework for Efficient Feed-Forward 3D Reconstruction

Hugging Face Daily Papers

Lite3R is a model-agnostic framework that improves the efficiency of transformer-based 3D reconstruction using sparse linear attention and FP8-aware quantization. It reduces latency and memory usage by up to 2.4x while maintaining geometric accuracy on backbones like VGGT and DA3-Large.

EasyVideoR1: Easier RL for Video Understanding

Hugging Face Daily Papers

EasyVideoR1 is an efficient reinforcement learning framework for training large vision-language models on video understanding tasks, featuring offline preprocessing with tensor caching for a 1.47x throughput improvement, a task-aware reward system covering 11 problem types, and evaluation across 22 video benchmarks. It also supports joint image-video training and a mixed offline-online data training paradigm.