Official implementation of FairHuman: Boosting Hand and Face Quality in Human Image Generation with Minimum Potential Delay Fairness in Diffusion Models
Yuxuan Wang1, Tianwei Cao1*, Huayu Zhang2, Zhongjiang He2, Kongming Liang1*, Zhanyu Ma1
1 Beijing University of Posts and Telecommunications, 2 Institute of Artificial Intelligence, China Telecom (TeleAI)
- [07/04/2025] 🔥 The arXiv paper of FairHuman is released.
- [07/03/2025] 🔥 The training code and inference code are released.
Although large-scale text-to-image models (e.g., diffusion models) have made significant progress, the generation of local details such as faces and hands still suffers from deficiencies due to insufficient training supervision. To address this, we propose FairHuman, a multi-objective fine-tuning framework that: 1) constructs a triple optimization objective comprising a global target (the default diffusion objective) and local targets (based on face/hand spatial priors); 2) employs the Minimum Potential Delay (MPD) fairness criterion to derive a fairness-aware parameter update strategy. This approach significantly improves local detail generation while maintaining overall quality, and it can be applied in diverse scenarios.
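The triple objective can be pictured with a small numeric sketch. Note the weighting rule below (each objective's weight grows with its current loss, normalized to sum to one) is an illustrative stand-in for a fairness-aware combination, not the paper's exact MPD derivation:

```python
# Toy sketch: combining the global (diffusion) loss with the local
# face/hand losses under a fairness-style weighting. The rule here is
# illustrative only -- lagging objectives get proportionally more weight.

def combine_losses(global_loss, face_loss, hand_loss):
    """Return a scalar training loss plus the per-objective weights."""
    losses = {"global": global_loss, "face": face_loss, "hand": hand_loss}
    total = sum(losses.values())
    # Objectives with larger current loss receive more weight, so no
    # single region's quality is sacrificed for the others.
    weights = {name: value / total for name, value in losses.items()}
    combined = sum(weights[name] * losses[name] for name in losses)
    return combined, weights

combined, weights = combine_losses(0.10, 0.40, 0.30)
print(weights)  # the hardest objective ("face") gets the largest weight
```

In the actual method the MPD criterion drives the parameter-update direction during fine-tuning; this snippet only conveys the intuition of balancing global and local targets.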
Install the requirements
# create a virtual environment with python >= 3.10 <= 3.12, like
conda create -n fairhuman python=3.10 -y
conda activate fairhuman
# then install the requirements as needed
pip install -r requirements.txt # legacy installation command
# or
conda env create -f environment.yml
# Install the rest of the dependencies
cd preprocessor/hamer
pip install -e .[all]
pip install -v -e third-party/ViTPose
Before getting started, organize your models as follows:
--models
  --base_model        # SD base model (if using local checkpoints)
  --finetuned_model   # finetuned models (e.g., LoRA, ControlNet, adapter)
  --hamer             # models for extracting control conditions
    --data            # put the MANO model here
    --hamer_ckpts     # put hamer.ckpt here
    --vitpose_ckpts   # put wholebody.pth here
  --vae               # VAE model (if using local checkpoints)
  --yolo              # detectors
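A quick sanity check that the layout above is in place can save a confusing stack trace later. This helper is not part of the repo; the file and directory names are taken from the layout above:

```python
import os

# Checkpoint files named explicitly in the layout above.
EXPECTED_FILES = [
    "models/hamer/hamer_ckpts/hamer.ckpt",
    "models/hamer/vitpose_ckpts/wholebody.pth",
]
# Directories that should exist (contents depend on your setup).
EXPECTED_DIRS = [
    "models/base_model",
    "models/finetuned_model",
    "models/vae",
    "models/yolo",
]

def missing_models(root="."):
    """Return the expected files/directories that are absent under root."""
    missing = [p for p in EXPECTED_FILES
               if not os.path.isfile(os.path.join(root, p))]
    missing += [d for d in EXPECTED_DIRS
                if not os.path.isdir(os.path.join(root, d))]
    return missing

if __name__ == "__main__":
    for path in missing_models():
        print(f"missing: {path}")
```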
Note: for the installation of hamer, please refer to https://github.com/geopavlakos/hamer
Start from the examples below. ✨
python inference/sdxl_inference.py --eval_txt_path "" --lora_path "" --lora_weight 0.3
python inference/sdxl_controlnet_inference.py --eval_txt_path "" --condition_path "" --controlnet_path ""
python inference/sdxl_adapter_inference.py --eval_txt_path "" --condition_path "" --adapter_path ""
Optional preparation: we also provide a pipeline for automatic post-refinement based on ControlNet.
python inference/sdxl_controlnet_refine.py --eval_txt_path "" --controlnet_path "" --target_imgs_path ""
LoRA finetuning
bash script/train_lora_sdxl_mpd_fair.sh
ControlNet finetuning
bash script/train_controlnet_sdxl_mpd_fair.sh
T2I-Adapter finetuning
bash script/train_t2iadapter_sdxl_mpd_fair.sh
Our custom dataset format is documented in
dataset/wholebody_dataset.py
and we provide examples of our data curation pipeline in
preprocessor/mask_annoation.py
preprocessor/preprocess.py
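To give a sense of what the local supervision looks like, here is a minimal, dependency-free sketch of turning a face/hand bounding box into a binary spatial mask. The real dataset class (`dataset/wholebody_dataset.py`) and annotation scripts above are the reference; the function below is only illustrative:

```python
def bbox_to_mask(height, width, bbox):
    """Rasterize an (x0, y0, x1, y1) pixel bbox into a binary H x W mask.

    Masks like this can restrict a diffusion loss to face/hand regions,
    mirroring the spatial priors used by the local objectives.
    """
    x0, y0, x1, y1 = bbox
    return [[1 if (x0 <= x < x1 and y0 <= y < y1) else 0
             for x in range(width)]
            for y in range(height)]

mask = bbox_to_mask(4, 6, (1, 1, 3, 3))  # a 2x2 "hand" region in a 4x6 image
```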
You can improve upon this framework to build a customized dataset for your own specific task and obtain more accurate annotations through advanced models.
Due to resource constraints, our code is primarily built upon SDXL. Given the transferability of our method, we recommend deploying it on more recent models, such as Flux, to achieve higher image quality.
If FairHuman is helpful, please help to ⭐ the repo.
- Our codebase builds heavily on diffusers