WildRelight: A Real-World Benchmark and Physics-Guided Adaptation for Single-Image Relighting
Summary
This paper introduces WildRelight, a new real-world benchmark dataset for single-image relighting that addresses the gap between synthetic and natural scenes. It proposes a physics-guided adaptation framework using diffusion posterior sampling and test-time adaptation to improve model performance on real-world data.
Source: https://huggingface.co/papers/2605.11696
Abstract
The WildRelight dataset addresses the gap between synthetic and real-world single-image relighting by providing high-resolution outdoor scenes with aligned natural illumination, enabling physics-guided domain adaptation through diffusion posterior sampling and test-time adaptation.
Recent single-image relighting methods, powered by advanced generative models, have achieved impressive photorealism on synthetic benchmarks. However, their effectiveness in the complex visual landscape of the real world remains largely unverified. A critical gap exists, as current datasets are typically designed for multi-view reconstruction and fail to address the unique challenges of single-image relighting. To bridge this synthetic-to-real gap, we introduce WildRelight, the first in-the-wild dataset specifically created for evaluating single-image relighting models. WildRelight features a diverse collection of high-resolution outdoor scenes, captured under strictly aligned, temporally varying natural illuminations, each paired with a high-dynamic-range environment map. Using this data, we establish a rigorous benchmark revealing that state-of-the-art models trained on synthetic data suffer from severe domain shifts. The strictly aligned temporal structure of WildRelight enables a new paradigm for domain adaptation. We demonstrate this by introducing a physics-guided inference framework that leverages the captured natural light evolution as a self-supervised constraint. By integrating Diffusion Posterior Sampling (DPS) with temporal Sampling-Aware Test-Time Adaptation (TTA), we show that the dataset allows synthetic models to align with real-world statistics on the fly, transforming the intractable sim-to-real challenge into a tractable self-supervised task. The dataset and code will be made publicly available to foster robust, physically grounded relighting research.
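The abstract names Diffusion Posterior Sampling as the mechanism that steers a synthetic-data model toward observed real-world measurements. The paper's own implementation is not described here, but the general DPS idea can be sketched in a toy form: at each reverse-diffusion step, estimate the clean sample, measure its disagreement with the observation through a differentiable forward operator, and nudge the sample against that data-fidelity gradient. Everything below (`denoise`, `forward_op`, `y_obs`, `guidance_scale`) is an illustrative stand-in, not the paper's API.

```python
import numpy as np

# Toy 1-D sketch of the DPS guidance loop. In the paper's setting the
# denoiser would be a learned relighting diffusion model and the forward
# operator would encode the physics of the captured illumination; here
# both are trivial placeholders (assumptions, not the authors' method).

rng = np.random.default_rng(0)

def denoise(x_t, t):
    # Placeholder for the model's clean-sample estimate x0_hat:
    # simply shrink the noisy sample as noise level t grows.
    return x_t / (1.0 + t)

def forward_op(x):
    # Placeholder differentiable measurement operator A (identity here).
    return x

y_obs = np.array([0.5, -0.2, 0.1])   # "observed" real-world measurement
x_t = rng.standard_normal(3)          # start the chain from pure noise
guidance_scale = 0.1

for t in np.linspace(1.0, 0.01, 50):
    x0_hat = denoise(x_t, t)
    # Gradient of 0.5 * ||y - A(x0_hat)||^2 w.r.t. x0_hat; with A = identity
    # this is just the residual A(x0_hat) - y.
    grad = forward_op(x0_hat) - y_obs
    # DPS-style update: follow the denoiser's estimate, step against the
    # data-fidelity gradient, and re-inject a little noise scaled by t.
    x_t = x0_hat - guidance_scale * grad + 0.05 * np.sqrt(t) * rng.standard_normal(3)
```

The test-time adaptation component would additionally update model weights using the temporal illumination sequence as a self-supervised signal; that part is omitted here since the paper's loss is not specified in this summary.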
Get this paper in your agent:
`hf papers read 2605.11696`
Don't have the latest CLI? `curl -LsSf https://hf.co/cli/install.sh | bash`
Datasets citing this paper (1)
Lez/wildrelight (updated about 2 hours ago)
Similar Articles
Relit-LiVE: Relight Video by Jointly Learning Environment Video
This paper introduces Relit-LiVE, a novel video relighting framework that produces physically consistent results without requiring camera pose information by using raw reference images and joint environment video prediction.
Sparkle: Realizing Lively Instruction-Guided Video Background Replacement via Decoupled Guidance
This paper introduces Sparkle, a new dataset and benchmark for instruction-guided video background replacement, addressing the lack of high-quality training data in this domain. It proposes a scalable pipeline with decoupled guidance to generate realistic foreground-background interactions.
Lite3R: A Model-Agnostic Framework for Efficient Feed-Forward 3D Reconstruction
Lite3R is a model-agnostic framework that improves the efficiency of transformer-based 3D reconstruction using sparse linear attention and FP8-aware quantization. It reduces latency and memory usage by up to 2.4x while maintaining geometric accuracy on backbones like VGGT and DA3-Large.
HP-Edit: A Human-Preference Post-Training Framework for Image Editing
HP-Edit introduces a post-training framework that aligns diffusion-based image editing models with human preferences via RLHF, using a new 50K real-world dataset and an automatic VLM-based evaluator.
ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis
ReImagine introduces an image-first approach to controllable high-quality human video generation, combining SMPL-X motion guidance with video diffusion models to decouple appearance from temporal consistency.