💡Lumen: Consistent Video Relighting and Harmonious Background Replacement with Video Generative Models
Jianshu Zeng1,3*, Yuxuan Liu2*, Yutong Feng2†, Chenxuan Miao4, Zixiang Gao1, Jiwang Qu5, Jianzhang Zhang5, Bin Wang2, Kun Yuan1‡
1Peking University, 2Kunbyte AI, 3University of Chinese Academy of Sciences, 4Zhejiang University, 5Hangzhou Normal University
*Equal Contribution, †Project Leader, ‡Corresponding Author
Video relighting is a challenging yet valuable task: it aims to replace the background of a video while adjusting the foreground lighting accordingly, so that the two blend harmoniously. During this translation, it is essential to preserve the original properties of the foreground, e.g., its albedo, and to propagate consistent relighting across temporal frames. While previous research mainly relies on 3D simulation, recent works leverage the generalization ability of diffusion generative models to achieve learnable image relighting.
In this paper, we propose Lumen, an end-to-end video relighting framework developed on large-scale video generative models, which accepts flexible textual descriptions to control the lighting and background. Considering the scarcity of high-quality paired videos that share the same foreground under various lighting conditions, we construct a large-scale dataset mixing realistic and synthetic videos. For the synthetic domain, benefiting from the abundant 3D assets in the community, we leverage an advanced 3D rendering engine to curate video pairs in diverse environments. For the realistic domain, we adapt an HDR-based lighting simulation to compensate for the lack of paired in-the-wild videos.
Powered by this dataset, we design a joint training curriculum that effectively exploits the strengths of each domain, i.e., the physical consistency of synthetic videos and the generalized domain distribution of realistic videos. To implement this, we inject a domain-aware adapter into the model to decouple the learning of relighting from that of the domain appearance distribution. We construct a comprehensive benchmark to evaluate Lumen together with existing methods, covering both foreground preservation and video consistency. Experimental results demonstrate that Lumen effectively edits the input into cinematic relit videos with consistent lighting and strict foreground preservation.
Data preparation and examples of the two domains. (a) The 3D-rendered data combines various environments, characters, and animations to form paired videos with aligned foregrounds. (b) The realistic videos are transformed into a uniformly lit appearance and relit with HDR-based rendering.
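To make the HDR-based simulation concrete, the following is a minimal sketch of generic diffuse image-based relighting under an HDR environment map; it is not the paper's exact pipeline. It assumes a uniformly lit frame that approximates albedo and a per-pixel normal map, and all file names are hypothetical placeholders (reading .hdr/.exr may require an extra imageio plugin).

```python
import numpy as np
import imageio.v3 as iio

albedo = iio.imread("uniform_lit_frame.png").astype(np.float32) / 255.0  # (H, W, 3)
normals = iio.imread("normal_map.exr").astype(np.float32)                # (H, W, 3), unit vectors
env = iio.imread("environment.hdr").astype(np.float32)                   # equirectangular HDR map

# Subsample the environment map to a coarse grid of light directions.
Hs, Ws = 16, 32
env = env[:: env.shape[0] // Hs, :: env.shape[1] // Ws][:Hs, :Ws]
theta = (np.arange(Hs) + 0.5) / Hs * np.pi        # polar angle per row (y-up convention assumed)
phi = (np.arange(Ws) + 0.5) / Ws * 2 * np.pi      # azimuth per column
sin_t = np.sin(theta)

# Accumulate cosine-weighted (Lambertian) shading over all sampled directions.
shading = np.zeros_like(albedo)
for i in range(Hs):
    for j in range(Ws):
        l = np.array([sin_t[i] * np.cos(phi[j]), np.cos(theta[i]), sin_t[i] * np.sin(phi[j])])
        cos = np.clip(normals @ l, 0.0, None)             # (H, W) Lambert term
        shading += cos[..., None] * env[i, j] * sin_t[i]  # sin(theta) from the solid angle
shading *= 2 * np.pi / (Hs * Ws)  # dw = sin(t)*(pi/Hs)*(2*pi/Ws), divided by pi for albedo

relit = albedo * shading  # relit frame, prior to tone mapping
```

Swapping in different environment maps produces multiple consistently lit versions of the same foreground, which is the kind of pairing the realistic domain otherwise lacks.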
The framework of Lumen, built on a video generative model with a DiT architecture. The model consumes the concatenation of noisy tokens and the masked input video. An adapter module is injected into the backbone to decouple the style distribution of the 3D-rendered paired videos.
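As a rough illustration of the adapter described above, here is a minimal PyTorch sketch of a domain-aware adapter. The module name, sizes, and gating scheme are assumptions, not the exact Lumen implementation: the residual branch is zero-initialized and activated only for synthetic (3D-rendered) samples, so it can absorb the synthetic appearance distribution while the backbone learns relighting itself.

```python
import torch
import torch.nn as nn

class DomainAdapter(nn.Module):
    """Hypothetical domain-aware adapter for DiT tokens (a sketch)."""

    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.down = nn.Linear(dim, hidden)
        self.act = nn.GELU()
        self.up = nn.Linear(hidden, dim)
        # Zero-init the output projection so the adapter starts as identity.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor, is_synthetic: torch.Tensor) -> torch.Tensor:
        # x: (B, N, dim) DiT tokens; is_synthetic: (B,) in {0., 1.}.
        # The residual branch fires only on synthetic-domain samples.
        delta = self.up(self.act(self.down(x)))
        return x + is_synthetic.view(-1, 1, 1) * delta


tokens = torch.randn(2, 1024, 1536)            # hypothetical token shape
domain = torch.tensor([1.0, 0.0])              # sample 0 synthetic, sample 1 realistic
out = DomainAdapter(dim=1536)(tokens, domain)  # realistic sample passes through unchanged
```

At inference time on realistic inputs, the gate stays at zero and the adapter is a no-op, which is one simple way to keep the synthetic appearance from leaking into generated videos.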
conda create -n lumen python=3.10 -y
conda activate lumen
pip install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
We use Wan2.1-Fun-1.3B-Control from VideoX-Fun as our base model, which builds on Wan2.1. The main structural difference between the DiT of Wan2.1-Fun and that of Wan2.1 is that Wan2.1-Fun extends the channel dimension of the latent space from 16 to 36 (Wan2.1-Fun-Inp) or 48 (Wan2.1-Fun-Control), so that the condition video and the original video can be concatenated along the channel dimension as input, and the result video is then generated according to the text.
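The sketch below illustrates this channel-wise conditioning. The exact split of the 48 channels is an assumption here (16 noisy latent channels plus two groups of 16 conditioning channels), as are the latent grid sizes.

```python
import torch

B, T, H, W = 1, 21, 60, 104                     # hypothetical latent grid sizes
noisy_latents   = torch.randn(B, 16, T, H, W)   # latents being denoised
control_latents = torch.randn(B, 16, T, H, W)   # e.g., VAE-encoded condition video
masked_latents  = torch.randn(B, 16, T, H, W)   # e.g., VAE-encoded masked input video

# 16 + 16 + 16 = 48 input channels, matching Wan2.1-Fun-Control.
dit_input = torch.cat([noisy_latents, control_latents, masked_latents], dim=1)
print(dit_input.shape)  # torch.Size([1, 48, 21, 60, 104])
```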
You can download the Lumen weights from Kunbyte/Lumen. First, download the base model:
modelscope download --model PAI/Wan2.1-Fun-1.3B-Control --local_dir ckpt/Wan2.1-Fun-1.3B-Control
or
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download alibaba-pai/Wan2.1-Fun-1.3B-Control --local-dir ckpt/Wan2.1-Fun-1.3B-Control
Optionally, download the 14B base model as well (the --exclude patterns skip the VAE, T5, and CLIP weights):

modelscope download --model PAI/Wan2.1-Fun-14B-Control --local_dir ckpt/Wan2.1-Fun-14B-Control --exclude 'Wan2.1_VAE*' 'models_t5*' 'models_clip*'

Then download the Lumen weights:

huggingface-cli download Kunbyte/Lumen --local-dir ckpt/Lumen
The checkpoint directory is shown below.
Lumen/
└── ckpt/
    ├── Wan2.1-Fun-1.3B-Control/
    │   ├── diffusion_pytorch_model.safetensors
    │   ├── Wan2.1_VAE.pth
    │   ├── models_t5_umt5-xxl-enc-bf16.pth
    │   ├── models_clip_open-clip-xlm-roberta-large-vit-huge-14.pth
    │   └── ...
    ├── Wan2.1-Fun-14B-Control/
    │   └── diffusion_pytorch_model.safetensors
    └── Lumen/
        └── Lumen-T2V-1.3B-V1.0.ckpt
Run inference with the T2V model:

python infer_t2v.py

Or launch the demo app:

python app_lumen.py
- [2025.8.20] Release paper, project page, inference code, and models (Lumen-T2V-1.3B-V1.0).
- Release models: Lumen-T2V-14B-Lora, Lumen-I2V-1.3B/14B-Lora (coming soon).
- Release training code and training data.
If you find our work helpful, please consider citing:
@article{zeng2025lumen,
  title={Lumen: Consistent Video Relighting and Harmonious Background Replacement with Video Generative Models},
  author={Zeng, Jianshu and Liu, Yuxuan and Feng, Yutong and Miao, Chenxuan and Gao, Zixiang and Qu, Jiwang and Zhang, Jianzhang and Wang, Bin and Yuan, Kun},
  journal={arXiv preprint arXiv:2508.12945},
  year={2025},
  url={https://arxiv.org/abs/2508.12945},
}
We would like to thank the contributors to DiffSynth-Studio, VideoX-Fun, and other open-source projects for their open research and exploration.