UniRelight

AI Plus

🧠Knowledge

AI format

VAE (Variational Autoencoder) encoder-decoder pair, DiT (Diffusion Transformer) video model

Created time

Jul 7, 2025 3:19 PM

Highly Recommend

Platform

Website

Posted By

NVIDIA

URL

https://research.nvidia.com/labs/toronto-ai/UniRelight/

UniRelight: Learning Joint Decomposition and Synthesis for Video Relighting

UniRelight: A New Horizon for Video Relighting ✨

So, what exactly is UniRelight? Let's take a look together. 😊 It changes the lighting in a video almost like magic, and it could really change how we make and watch videos! 🎬

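1. What Makes UniRelight Different? 💡

UniRelight does more than adjust a video's brightness. It works out how shadows fall, how light reflects, and even how light passes through transparent objects such as glass, then re-renders the video realistically under a new lighting environment. 🤩 The biggest difference is that it separates the scene's intrinsic properties (albedo) from the lighting at the same time, something that used to be hard to do.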

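2. Limits of Existing Methods and How UniRelight Solves Them 🚧

Earlier video relighting methods had a few shortcomings. 😥 Data covering many different lighting conditions is hard to obtain, so models struggled to work well on arbitrary videos. And pipelines that handle lighting decomposition and re-synthesis as two separate stages let errors accumulate, which sometimes produced unnatural results. UniRelight tackles both problems! It uses a unified approach that estimates albedo (the scene's intrinsic color and texture) and performs relighting in a single pass, producing more realistic and natural video. ✨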

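3. Core Technique: Joint Estimation with a Video Diffusion Model 🤝

UniRelight's secret is a video diffusion model. 🎞️ Much like an AI that paints pictures, it draws on the model's strong generative ability to predict a video's albedo and its appearance under new lighting at the same time. This pushes the model toward a deeper understanding of the scene's structure, so it can produce good results even in environments it has never seen. 🤯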

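4. UniRelight's Performance and Results 🌟

Trained on both real videos and synthetic data, UniRelight produces video that is visually better and more temporally consistent than previous methods. What stands out most is how realistically it renders complex materials such as shiny metal and transparent glass! 💎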

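5. Applications: Illumination Augmentation 💡

UniRelight is useful for more than making videos look good. For example, for autonomous-vehicle development, driving footage can be relit to different conditions such as night or dusk to enrich training data. 🚗🌃 Its strong generalization is what makes this kind of data augmentation effective across varied scenarios. A smart and handy piece of technology, isn't it? 😊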

UniRelight is a relighting framework that jointly models the distribution of scene intrinsics and illumination. It enables high-quality relighting and intrinsic decomposition from a single input image or video, producing temporally consistent shadows, reflections, and transparency, and outperforms state-of-the-art methods.

Abstract

We address the challenge of relighting a single image or video, a task that demands precise scene intrinsic understanding and high-quality light transport synthesis. Existing end-to-end relighting models are often limited by the scarcity of paired multi-illumination data, restricting their ability to generalize across diverse scenes. Conversely, two-stage pipelines that combine inverse and forward rendering can mitigate data requirements but are susceptible to error accumulation and often fail to produce realistic outputs under complex lighting conditions or with sophisticated materials. In this work, we introduce a general-purpose approach that jointly estimates albedo and synthesizes relit outputs in a single pass, harnessing the generative capabilities of video diffusion models. This joint formulation enhances implicit scene comprehension and facilitates the creation of realistic lighting effects and intricate material interactions, such as shadows, reflections, and transparency. Trained on synthetic multi-illumination data and extensive automatically labeled real-world videos, our model demonstrates strong generalization across diverse domains and surpasses previous methods in both visual fidelity and temporal consistency.

Method overview. Given an input video and a target lighting configuration, our method jointly predicts a relit video and its corresponding albedo. We use a pretrained VAE encoder-decoder pair to map input and output videos to a latent space. The latents for the target relit video and albedo are concatenated along the temporal dimension with the encoded input video. Lighting features derived from the environment maps are concatenated along the channel dimension with the relit video latent. A finetuned DiT video model denoises the joint latent, enabling consistent generation of both relit appearance and intrinsic decomposition.
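The latent layout described in the method overview can be sketched with plain NumPy arrays. This is only an illustration of the two concatenations (temporal and channel), not the authors' implementation: the function name `build_joint_latent`, the tensor shapes, the block order, and the zero-filled lighting channels for the input and albedo blocks are all assumptions made here so the shapes line up.

```python
import numpy as np

def build_joint_latent(input_latent, relit_latent, albedo_latent, light_feats):
    """Assemble a joint latent of shape (3*T, C + C_light, H, W).

    Each latent has shape (T, C, H, W); light_feats has shape (T, C_light, H, W).
    """
    T, C, H, W = input_latent.shape
    assert relit_latent.shape == (T, C, H, W)
    assert albedo_latent.shape == (T, C, H, W)
    assert light_feats.shape[0] == T and light_feats.shape[2:] == (H, W)

    # Channel concatenation: lighting features are attached to the relit latent.
    relit_block = np.concatenate([relit_latent, light_feats], axis=1)

    # Assumption (not stated in the source): the other blocks get zero-filled
    # lighting channels so that all three blocks share one channel count.
    zeros = np.zeros_like(light_feats)
    input_block = np.concatenate([input_latent, zeros], axis=1)
    albedo_block = np.concatenate([albedo_latent, zeros], axis=1)

    # Temporal concatenation: stack the three blocks along the frame axis.
    # The block order is an illustrative choice.
    return np.concatenate([input_block, relit_block, albedo_block], axis=0)

# Hypothetical sizes, chosen only for the demo.
rng = np.random.default_rng(0)
T, C, C_light, H, W = 4, 16, 8, 6, 10
input_latent = rng.standard_normal((T, C, H, W))   # stands in for the VAE-encoded input video
relit_noise = rng.standard_normal((T, C, H, W))    # noisy latent for the relit target
albedo_noise = rng.standard_normal((T, C, H, W))   # noisy latent for the albedo target
light_feats = rng.standard_normal((T, C_light, H, W))

joint = build_joint_latent(input_latent, relit_noise, albedo_noise, light_feats)
print(joint.shape)  # (12, 24, 6, 10)
```

In a setup like this, the denoiser would see the input, relit, and albedo frames as one longer sequence, which is one way to read the claim that the joint latent enables consistent generation of both outputs.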

Motivation

Our key insight is to jointly model relighting and albedo estimation. Demodulation provides a strong prior for the relighting task, improving generalization and reducing shadow-baking artifacts.

This joint formulation encourages the model to learn an internal representation of scene structure, leading to improved generalization across diverse and unseen domains.
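A toy example may help show why clean albedo matters for avoiding shadow baking. It assumes the simplified intrinsic image model image = albedo × shading, which is a common textbook approximation and not the formulation used by UniRelight; the helper `relight_lambertian` and the 1×4 "scene" are made up for illustration.

```python
import numpy as np

def relight_lambertian(albedo, new_shading):
    """Recompose an image under the simplified model image = albedo * shading."""
    return albedo * new_shading

# Toy 1x4 scene: a uniform grey surface whose right half lies in shadow.
true_albedo = np.full((1, 4), 0.8)
old_shading = np.array([[1.0, 1.0, 0.25, 0.25]])
observed = true_albedo * old_shading

# Target lighting: evenly lit, no shadow.
new_shading = np.ones((1, 4))

# With a clean albedo, the old shadow disappears under the new lighting.
clean_relit = relight_lambertian(true_albedo, new_shading)

# If the observed image is mistaken for albedo, the old shadow is "baked in"
# and survives relighting.
baked_relit = relight_lambertian(observed, new_shading)

print(clean_relit)
print(baked_relit)
```

The second result keeps the dark right half even though the target lighting has no shadow there, which is the kind of artifact a good albedo estimate helps avoid.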

Joint Estimation

Albedo and relighting joint estimation. Our method produces high-quality albedo and relighting results with realistic specular highlights and shadows under target lighting conditions.

Relighting Results

UniRelight produces high-quality albedo and relighting results with realistic specular highlights and shadows under target lighting conditions on real-world videos.

Comparison

Qualitative comparison on in-the-wild data. Our method generates more plausible results than the baselines, with higher quality and a more realistic appearance. In particular, on complex materials such as anisotropic surfaces, glass, and transparent objects, the previous state-of-the-art method DiffusionRenderer struggles to represent the materials accurately, leading to suboptimal results.

Application: Illumination Augmentation

Our model's strong generalization capability enables effective data augmentation across different scenarios. We show several diverse samples of data generated by our model on driving scenes, including nighttime and dusk scenes, demonstrating that it accurately models the illumination distribution and can sample realistic relighting results under varying lighting conditions.
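One possible shape for such an augmentation loop is sketched below. It assumes the target lighting is supplied as an equirectangular environment map and varies it by rotating the map about the vertical axis; the source does not say how lighting was varied for the driving samples, so this is only an illustration. `rotate_env_map`, `augment_clip`, and `toy_relight` are hypothetical names, and `toy_relight` is a placeholder that stands in for a real relighting model.

```python
import numpy as np

def rotate_env_map(env_map, degrees):
    """Rotate an equirectangular environment map (H, W, 3) about the vertical axis.

    The horizontal axis of an equirectangular map is longitude, so this
    rotation is a circular shift of the columns. The sign convention is arbitrary.
    """
    width = env_map.shape[1]
    shift = int(round(degrees / 360.0 * width))
    return np.roll(env_map, shift, axis=1)

def augment_clip(clip, env_map, angles, relight_fn):
    """Return one relit copy of `clip` per lighting rotation angle."""
    return [relight_fn(clip, rotate_env_map(env_map, a)) for a in angles]

def toy_relight(clip, env_map):
    # Placeholder only, NOT a relighting model: scale the clip by the mean
    # brightness of the centre columns of the environment map.
    w = env_map.shape[1]
    centre = env_map[:, w // 4 : 3 * w // 4]
    return clip * centre.mean()

env_map = np.zeros((4, 8, 3))
env_map[:, 0] = 1.0            # a single bright column at longitude index 0
clip = np.ones((2, 4, 4, 3))   # (frames, height, width, rgb)

variants = augment_clip(clip, env_map, angles=[0, 90, 180], relight_fn=toy_relight)
print(len(variants))  # 3
```

Swapping `toy_relight` for a call into an actual relighting model would turn each source clip into several training clips under different lighting.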

Paper

UniRelight: Learning Joint Decomposition and Synthesis for Video Relighting

Kai He, Ruofan Liang, Jacob Munkberg, Jon Hasselgren, Nandita Vijaykumar, Alexander Keller, Sanja Fidler, Igor Gilitschenski†, Zan Gojcic†, Zian Wang†

BibTeX

@misc{he2025unirelight,
    title={UniRelight: Learning Joint Decomposition and Synthesis for Video Relighting},
    author={Kai He and Ruofan Liang and Jacob Munkberg and Jon Hasselgren and Nandita Vijaykumar
        and Alexander Keller and Sanja Fidler and Igor Gilitschenski and Zan Gojcic and Zian Wang},
    year={2025},
    eprint={2506.15673},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Acknowledgment

The authors thank Tianshi Cao and Huan Ling for their insightful discussions that contributed to this project.