DCI-VTON-with-Integrated-LoRA-Support-for-Hand-Adjustment-Enhancement

Abstract: The ever-evolving world of virtual try-on is intriguing for anyone who follows the technology. This project builds upon the existing DCI-VTON model and introduces several improvements. While DCI-VTON aligns the clothing successfully, we observed distortions in the hands, and we aimed to improve the results by integrating our own solution to this problem.

It was not necessary to retrain the entire model from scratch to address the abnormalities detected in the hands. Instead, training focused on a small region using LoRA, a parameter-efficient fine-tuning method, and the project was developed by integrating this approach into the DCI-VTON pipeline.

DCI-VTON Installation

Installation

Diffusion Model

  1. Clone the repository
cd ./Documents/
git clone https://github.com/bcmi/DCI-VTON-Virtual-Try-On.git
cd DCI-VTON-Virtual-Try-On
  2. Install Python dependencies
conda env create -f environment.yaml
conda activate dci-vton
  3. Download the pretrained vgg checkpoint and put it in models/vgg/

Warping Module

  1. Clone the PF-AFN repository
git clone https://github.com/geyuying/PF-AFN.git
  2. Move the code to the corresponding directory
cp -r DCI-VTON-Virtual-Try-On/warp/train/* PF-AFN/PF-AFN_train/
cp -r DCI-VTON-Virtual-Try-On/warp/test/* PF-AFN/PF-AFN_test/

Data Preparation

VITON-HD

  1. Download the VITON-HD dataset (we downloaded it from this link: VITON-HD-Kaggle)
  2. Download pre-warped cloth image/mask from Google Drive or Baidu Cloud and put it under your VITON-HD dataset

After these steps, the folder structure should look like this (the unpaired-cloth* directories are only included in the test split):

├── VITON-HD
|   ├── test_pairs.txt
|   ├── train_pairs.txt
│   ├── [train | test]
|   |   ├── image
│   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
│   │   ├── cloth
│   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
│   │   ├── cloth-mask
│   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
│   │   ├── cloth-warp
│   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
│   │   ├── cloth-warp-mask
│   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
│   │   ├── unpaired-cloth-warp
│   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]
│   │   ├── unpaired-cloth-warp-mask
│   │   │   ├── [000006_00.jpg | 000008_00.jpg | ...]

Inference

VITON-HD

Please download the pretrained model from Google Drive or Baidu Cloud.

Warping Module

To test the warping module, first move warp_viton.pth to the checkpoints directory:

mv warp_viton.pth PF-AFN/PF-AFN_test/checkpoints

Then run the following command:

cd PF-AFN/PF-AFN_test
sh test_VITON.sh

After inference, you can put the results into the VITON-HD dataset for inference and training of the diffusion model.

Diffusion Model

To quickly test our diffusion model, run the following command:

python test.py --plms --gpu_id 0 \
--ddim_steps 100 \
--outdir results/viton \
--config configs/viton512.yaml \
--ckpt /CHECKPOINT_PATH/viton512.ckpt \
--dataroot /DATASET_PATH/ \
--n_samples 8 \
--seed 23 \
--scale 1 \
--H 512 \
--W 512 \
--unpaired

or simply run:

sh test.sh

Training

Warping Module

To train the warping module, just run the following commands:

cd PF-AFN/PF-AFN_train/
sh train_VITON.sh

Diffusion Model

We use the pretrained Paint-by-Example model as initialization. Please download the pretrained models from Google Drive and save the model to the checkpoints directory.

To train a new model on VITON-HD, you should first set the dataroot of the VITON-HD dataset in configs/viton512.yaml and then use main.py for training. For example:

python -u main.py \
--logdir models/dci-vton \
--pretrained_model checkpoints/model.ckpt \
--base configs/viton512.yaml \
--scale_lr False

or simply run:

sh train.sh

Acknowledgements

Our code is heavily borrowed from Paint-by-Example. We also thank PF-AFN, on which our warping module depends.

Citation

@inproceedings{gou2023taming,
  title={Taming the Power of Diffusion Models for High-Quality Virtual Try-On with Appearance Flow},
  author={Gou, Junhong and Sun, Siyu and Zhang, Jianfu and Si, Jianlou and Qian, Chen and Zhang, Liqing},
  booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
  year={2023}
}

LoRA Preparation

  1. First, create a new set of model images with clearly visible hands from the VITON-HD dataset. You can use the little_dataset_creation.py script for this.

    cd ./Documents/DCI-VTON-Virtual-Try-On/scripts
    python little_dataset_creation.py

    Apply this script to the following five directories:

    image/
    openpose_json/
    image-parse-v3/   
    image-densepose/
    agnostic-mask/
    

    After these steps, the folder structure should look like this:

    ./little_dataset/
    ├── agnostic-mask/
    ├── openpose_json/
    ├── image/
    ├── image-densepose/
    └── image-parse-v3/
    
  2. After creating the dataset, create the hand masks for the model images. You can use the hand_mask_creation.py script for this (a minimal sketch of the idea is shown after this list).

    cd DCI-VTON-Virtual-Try-On/scripts
    python hand_mask_creation.py
  3. Create a Python virtual environment and activate it. Then install the Kohya_ss training framework and make sure it contains the sd-scripts folder. You can do all of this by following the commands below.

    cd ~/Documents/
    python3 -m venv kohya_env 
    source kohya_env/bin/activate # Activate virtual environment
    
    git clone https://github.com/bmaltais/kohya_ss 
    cd kohya_ss
    git clone https://github.com/kohya-ss/sd-scripts.git
    pip install -r requirements.txt
    
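
The hand masks in step 2 can, for example, be derived from the OpenPose hand keypoints. The snippet below is only a minimal sketch of that idea, not the project's hand_mask_creation.py; it assumes the JSON files contain hand keypoints in the standard OpenPose layout (hand_left_keypoints_2d / hand_right_keypoints_2d as flat [x, y, confidence] triplets) and uses hypothetical directory names that you should adjust to your setup.

import json
import os

import cv2
import numpy as np

KP_DIR = "./little_dataset/openpose_json"   # assumed keypoint directory
IMG_DIR = "./little_dataset/image"          # assumed image directory
OUT_DIR = "./little_dataset/hand-mask"      # assumed output directory
os.makedirs(OUT_DIR, exist_ok=True)

for name in sorted(os.listdir(KP_DIR)):
    if not name.endswith("_keypoints.json"):
        continue
    with open(os.path.join(KP_DIR, name)) as f:
        pose = json.load(f)
    img_name = name.replace("_keypoints.json", ".jpg")
    img = cv2.imread(os.path.join(IMG_DIR, img_name))
    if img is None or not pose.get("people"):
        continue
    mask = np.zeros(img.shape[:2], dtype=np.uint8)
    person = pose["people"][0]
    for key in ("hand_left_keypoints_2d", "hand_right_keypoints_2d"):
        pts = np.array(person.get(key, []), dtype=np.float32).reshape(-1, 3)
        for x, y, conf in pts:
            if conf > 0.1:  # keep only reasonably confident keypoints
                cv2.circle(mask, (int(x), int(y)), 12, 255, -1)
    # dilate so the mask covers the whole hand, not only the keypoint dots
    mask = cv2.dilate(mask, np.ones((9, 9), np.uint8), iterations=2)
    cv2.imwrite(os.path.join(OUT_DIR, img_name.replace(".jpg", ".png")), mask)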

LoRA Training

For the training phase, first make sure you are in the kohya_ss directory in your terminal. Then, run the train_network.py script located in the sd-scripts folder (or in the root directory of kohya_ss), adjusting the parameters below according to your own dataset.

In this step, you need to create a folder named img. Inside it, create another folder containing the original model images and name it {repeat-count}_hands, where the numeric prefix is the per-image repeat count expected by Kohya. The train_data_dir parameter should be set to the path of the img folder; otherwise, you may run into errors later.
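
A minimal sketch of building that layout from the little_dataset created earlier is shown below; the repeat count of 20 and the source and destination paths are assumptions, so adjust them to your own setup.

import os
import shutil

SRC = "./little_dataset/image"   # assumed location of the curated model images
DST = "./img/20_hands"           # <repeat-count>_<class-name>; 20 is an assumed repeat count
os.makedirs(DST, exist_ok=True)

for name in os.listdir(SRC):
    if name.lower().endswith((".jpg", ".png")):
        shutil.copy2(os.path.join(SRC, name), os.path.join(DST, name))
print(f"Copied training images into {DST}; point --train_data_dir at ./img")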

python sd-scripts/train_network.py \
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
--train_data_dir="Path For Your Own Training Data Folder" \
--output_dir="Path For Your Own Output Folder" \
--resolution=512 \
--network_dim=64 \
--network_alpha=32 \
--train_batch_size=4 \
--max_train_steps=1500 \
--save_every_n_epochs=100 \
--mixed_precision=fp16 \
--xformers \
--gradient_checkpointing \
--gradient_accumulation_steps=4 \
--network_module=networks.lora \
--enable_bucket

After the training phase, verify that weights with the .safetensors extension have been created, then run the infer.py script for inference, adjusting the parameters below according to your data (make sure you are in the correct directory).

PYTHONPATH=$PWD python infer.py \
--config "$PWD/configs/viton512.yaml" \
--input_dir "*The Folder With Original Images*" \
--output_dir "*Folder Path For The Inference Results*" \
--checkpoint "$PWD/ldm/models/checkpoints/viton512_v2.ckpt" \
--lora_scale 0.8 \
--hand_correction True \
--hand_mask_dir "*Your Path To Hand Masks Directory*"
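
If you are curious what a flag like --lora_scale typically controls, the sketch below shows the common way kohya-style LoRA weights are folded into a base model's linear layers, W' = W + scale * (alpha / rank) * up @ down. This is only an illustrative sketch under those assumptions, not the project's infer.py; key_map is a hypothetical mapping from LoRA key prefixes to the base checkpoint's weight names.

import torch
from safetensors.torch import load_file

def merge_lora_into_state_dict(base_sd, lora_path, key_map, scale=0.8):
    # Fold kohya-style LoRA pairs (lora_down / lora_up) into the base weights.
    lora_sd = load_file(lora_path)
    for key, down in lora_sd.items():
        if not key.endswith(".lora_down.weight") or down.dim() != 2:
            continue  # only plain linear LoRA weights are handled in this sketch
        prefix = key[: -len(".lora_down.weight")]
        up = lora_sd[prefix + ".lora_up.weight"]
        rank = down.shape[0]
        alpha = float(lora_sd.get(prefix + ".alpha", torch.tensor(float(rank))))
        # W' = W + scale * (alpha / rank) * up @ down
        delta = scale * (alpha / rank) * (up.float() @ down.float())
        target = key_map[prefix]  # hypothetical: maps LoRA prefix -> base weight name
        base_sd[target] = base_sd[target] + delta.to(base_sd[target].dtype)
    return base_sd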

You've completed all the preparations! Now it's time to see the results. Update the directory paths in test.py from the original project to match your setup and run it. For a quick test, you can lower --ddim_steps (for example, to 10) and --n_samples (for example, to 1 or 2). The results are waiting for you in the output directory (results/viton in the command below)!

python test.py --plms --gpu_id 0 \
--ddim_steps 100 \
--outdir results/viton \
--config configs/viton512.yaml \
--ckpt /CHECKPOINT_PATH/viton512.ckpt \
--dataroot /DATASET_PATH/ \
--n_samples 8 \
--seed 23 \
--scale 1 \
--H 512 \
--W 512 \
--unpaired

Results and Discussion

In the DCI-VTON project, which features strong alignment capabilities, our focus on hand enhancement was carried through every stage of the pipeline. However, our training hardware was limited, resulting in moderate performance (approximately a 50% success rate). The training was conducted on a small dataset of 385 samples, and the number of steps was set to the maximum our machine could handle, which was 6000 steps in our case. Performance can be improved by expanding the dataset and increasing the number of training steps. Since the training targets a small region, an ideal dataset size would be around 1000–2000 samples, and the number of epochs should be adjusted to the dataset size to avoid underfitting or overfitting.

A few examples of the results we obtained are given below. The left side displays the original DCI-VTON output, whereas the right side shows the output generated by the LoRA-integrated DCI-VTON model.

About

Undergraduate completion project. Our method introduces LoRA-based fine-tuning in DCI-VTON specifically for hand enhancement. Teammate: @ElifAltunsu Advisor: @ekural
