
Commit 9aff91c

lkk12014402 authored and chensuyue committed
diffusion model example (#1404)
(cherry picked from commit 4247fd3)
1 parent 7164e32 commit 9aff91c

File tree: 6 files changed, +1295 −0 lines changed

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
## An example using the Textual Inversion method to personalize text2image

**Note**: integration of INC into this example is in progress.

[Textual inversion](https://arxiv.org/abs/2208.01618) is a method to personalize text2image models like Stable Diffusion on your own images. _By using just 3-5 images, new concepts can be taught to Stable Diffusion, personalizing the model on your own images._

The `textual_inversion.py` script shows how to implement the training procedure and adapt it for Stable Diffusion.
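At its core, textual inversion adds one placeholder token to the tokenizer and optimizes only that token's embedding while the rest of the model stays frozen. The following is a minimal, illustrative sketch of that setup (not the actual training script; it assumes the standard `transformers` CLIP classes that Stable Diffusion uses):

```python
# Illustrative sketch of the textual-inversion setup (simplified, not the
# actual training script): add a placeholder token and train only its embedding.
from transformers import CLIPTextModel, CLIPTokenizer

model_name = "CompVis/stable-diffusion-v1-4"
tokenizer = CLIPTokenizer.from_pretrained(model_name, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_name, subfolder="text_encoder")

# Register the new concept token and grow the embedding table to match.
tokenizer.add_tokens("nezha")
text_encoder.resize_token_embeddings(len(tokenizer))

# Initialize the new embedding from a semantically related word ("cartoon").
placeholder_id = tokenizer.convert_tokens_to_ids("nezha")
initializer_id = tokenizer.convert_tokens_to_ids("cartoon")
embeddings = text_encoder.get_input_embeddings().weight.data
embeddings[placeholder_id] = embeddings[initializer_id].clone()

# Freeze the text encoder except for the embedding table; the training loop
# then updates only the row belonging to the placeholder token.
text_encoder.requires_grad_(False)
text_encoder.get_input_embeddings().requires_grad_(True)
```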
### Installing the dependencies

Before running the scripts, make sure to install the library's training dependencies:

```bash
pip install -r requirements.txt
```
### Nezha cartoon example

You need to accept the model license before downloading or using the weights. In this example we'll use model version `v1-4`, so you'll need to visit [its card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree.

You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).

Run the following command to authenticate your token:

```bash
huggingface-cli login
```
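If you'd rather authenticate from Python (for example in a notebook), the `huggingface_hub` library that ships with the CLI exposes the same login flow; the token string below is a placeholder:

```python
# Programmatic alternative to `huggingface-cli login`; the token below is a
# placeholder -- paste your own from https://huggingface.co/settings/tokens.
from huggingface_hub import login

login(token="hf_...")
```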
If you have already cloned the repo, then you won't need to go through these steps.

<br>

Now let's get our dataset. We use just one picture of Nezha, a screenshot taken at `52'51` of the movie `Nezha: Birth of the Demon Child`, and save it to the `./nezha` directory. The picture is shown below:

![nezha](./nezha/1.jpg)
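As a quick, optional sanity check (a sketch, assuming Pillow is installed), you can confirm the picture loads cleanly before training; the script resizes it to `--resolution` anyway:

```python
# Optional sanity check: make sure the training image opens and is RGB.
from PIL import Image

img = Image.open("./nezha/1.jpg").convert("RGB")
print(img.size)  # the training script resizes it to --resolution (512x512)
```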
#### Fine-tune on CPU using IPEX

The following script shows how to run the training on CPU with BF16:

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATA_DIR="./nezha"

# --use_bf16 enables BF16 training
python textual_inversion_ipex.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$DATA_DIR \
  --learnable_property="object" \
  --placeholder_token="nezha" --initializer_token="cartoon" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --use_bf16 \
  --max_train_steps=3000 \
  --learning_rate=5.0e-04 --scale_lr \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --output_dir="nezha_output"
```
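Below is a minimal sketch of the kind of IPEX BF16 setup the `--use_bf16` flag presumably enables inside `textual_inversion_ipex.py` (illustrative only; the stand-in `torch.nn.Linear` model is hypothetical):

```python
# Sketch of IPEX BF16 training on CPU (illustrative; the real script optimizes
# the Stable Diffusion model components, not this stand-in linear layer).
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Linear(768, 768)  # hypothetical stand-in for the trained model
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=5.0e-04)

# ipex.optimize casts weights and prepares the optimizer for BF16 on CPU.
model, optimizer = ipex.optimize(model, dtype=torch.bfloat16, optimizer=optimizer)

# The forward pass runs under CPU autocast in BF16; backward stays outside.
with torch.cpu.amp.autocast(dtype=torch.bfloat16):
    loss = model(torch.randn(4, 768)).pow(2).mean()
loss.backward()
optimizer.step()
```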
#### Fine-tune on GPU using Accelerate

Initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:

```bash
accelerate config
```
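For a non-interactive setup (e.g. in CI), recent accelerate releases can also write a default config from Python instead:

```python
# Non-interactive alternative to `accelerate config` (assumes a recent
# accelerate release that ships write_basic_config).
from accelerate.utils import write_basic_config

write_basic_config(mixed_precision="fp16")  # writes the default config file
```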
Then launch the training with:

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATA_DIR="./nezha"

accelerate launch textual_inversion.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$DATA_DIR \
  --learnable_property="object" \
  --placeholder_token="nezha" --initializer_token="cartoon" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=3000 \
  --learning_rate=5.0e-04 --scale_lr \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --output_dir="nezha_output"
```
### Inference

Once you have trained a model using the above command, inference can be done simply using the `StableDiffusionPipeline`. Make sure to include the `placeholder_token` in your prompt.

```python
import os

import torch
from diffusers import StableDiffusionPipeline

model_id = "nezha_output"

# run on GPU in FP16
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# or run on CPU in FP32
# pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float)

prompt = "a graffiti in a wall with a nezha on it"

image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]

# make sure the output directory exists before saving
os.makedirs("./generated_images", exist_ok=True)
image.save("./generated_images/graffiti.png")
```
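To make results reproducible, or to sample several candidates from one prompt, you can pass a seeded `torch.Generator` and batch the prompt (a sketch using the same pipeline as above):

```python
# Sample several seeded candidates from the fine-tuned model.
import os

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("nezha_output", torch_dtype=torch.float16).to("cuda")

# A fixed seed makes the same prompt reproduce the same images.
generator = torch.Generator(device="cuda").manual_seed(42)

prompt = "a graffiti in a wall with a nezha on it"
images = pipe([prompt] * 4, num_inference_steps=50, guidance_scale=7.5, generator=generator).images

os.makedirs("./generated_images", exist_ok=True)
for i, img in enumerate(images):
    img.save(f"./generated_images/graffiti_{i}.png")
```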
One of the inference results is shown below:

![nezha](./generated_images/graffiti.png)
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
diffusers==0.4.1
accelerate
torchvision
transformers>=4.21.0
ftfy
tensorboard
modelcards
intel_extension_for_pytorch
