My summary tweet is found [here](https://twitter.com/mk1stats/status/16428655051).
left: LoRA, right: SVDiff
Compared with LoRA, SVDiff has 0.5 M fewer trainable parameters, and the checkpoint file is only 1.2 MB (LoRA: 3.1 MB)!!
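If you want to check that footprint yourself, the minimal sketch below counts the parameters stored in a saved spectral-shifts checkpoint (`spectral_shifts.safetensors` is a placeholder for whatever path your training run produced):

```python
# Minimal sketch: count the parameters in a saved spectral-shifts checkpoint.
# The file name is a placeholder for your own checkpoint path.
from safetensors.torch import load_file

state_dict = load_file("spectral_shifts.safetensors")
num_params = sum(t.numel() for t in state_dict.values())
print(f"spectral-shift parameters: {num_params / 1e6:.2f} M")
```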
![kumamon](assets/kumamon.png)
## Updates
### 2023.4.11
- Released v0.2.0 (please see [here](https://github.com/mkshing/svdiff-pytorch/releases/tag/v0.2.0) for the details)
- Add [Single Image Editing](#single-image-editing)

![chair-result](assets/chair-result.png)
<br>"photo of a ~~pink~~ blue chair with black legs"
## Installation
```
$ pip install svdiff-pytorch
```

Or, manually:

```
$ git clone https://github.com/mkshing/svdiff-pytorch
$ cd svdiff-pytorch
$ pip install -r requirements.txt
```
## Single-Subject Generation
"Single-Subject Generation" is domain tuning on a single object or concept using 3-5 images (see Section 4.1 of the paper).

### Training
According to the paper, the learning rate for SVDiff needs to be about 1000 times larger than the learning rate used in normal fine-tuning.
```bash
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"

accelerate launch train_svdiff.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=1 \
--learning_rate=1e-3 \
--learning_rate_1d=1e-6 \
--train_text_encoder \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=200 \
--max_train_steps=500
```
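The unusually large learning rate makes sense once you see what is actually trained: SVDiff freezes the pretrained weights and optimizes only a 1-D vector of singular-value shifts per weight matrix. Below is a toy sketch of that reparameterization, based on the update rule in the paper (illustrative only, not the repo's actual module):

```python
import torch

# Toy sketch of the SVDiff reparameterization (not the repo's actual code):
# only the 1-D vector `delta` is trainable; U, S, Vh stay frozen.
W = torch.randn(320, 320)  # stands in for a frozen pretrained weight matrix
U, S, Vh = torch.linalg.svd(W, full_matrices=False)
delta = torch.zeros_like(S, requires_grad=True)  # the "spectral shifts"

def updated_weight() -> torch.Tensor:
    # W' = U @ diag(ReLU(S + delta)) @ Vh, following the paper
    return U @ torch.diag(torch.relu(S + delta)) @ Vh
```

Because `delta` has one entry per singular value rather than per weight, the saved checkpoint stays in the ~1 MB range quoted above.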
### Inference

```python
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
import torch
from svdiff_pytorch import load_unet_for_svdiff, load_text_encoder_for_svdiff
pretrained_model_name_or_path = "runwayml/stable-diffusion-v1-5"
spectral_shifts_ckpt_dir = "ckpt-dir-path"
unet = load_unet_for_svdiff(pretrained_model_name_or_path, spectral_shifts_ckpt=spectral_shifts_ckpt_dir, subfolder="unet")
text_encoder = load_text_encoder_for_svdiff(pretrained_model_name_or_path, spectral_shifts_ckpt=spectral_shifts_ckpt_dir, subfolder="text_encoder")
# load pipe
pipe = StableDiffusionPipeline.from_pretrained(
    pretrained_model_name_or_path,
    unet=unet,
    text_encoder=text_encoder,
)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

image = pipe("A picture of a sks dog in a bucket", num_inference_steps=25).images[0]
```

You can use the following CLI too! Once it's done, you will see `grid.png` for the results.
```bash
python inference.py \
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
--spectral_shifts_ckpt="ckpt-dir-path" \
--prompt="A picture of a sks dog in a bucket" \
--scheduler_type="dpm_solver++" \
--num_inference_steps=25 \
--num_images_per_prompt=2
```
### Gradio
You can also try SVDiff-pytorch in a UI with [gradio](https://gradio.app/). This demo supports both training and inference!
[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/svdiff-library/SVDiff-Training-UI)

Or, run the demo locally:

```
$ export HF_TOKEN="YOUR_HF_TOKEN_HERE"
$ python app.py
```
## Single Image Editing
### Training
In Single Image Editing, your instance prompt should be just the description of your input image **without the identifier**.

```bash
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export INSTANCE_DIR="dir-path-to-input-image"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"

accelerate launch train_svdiff.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="photo of a pink chair with black legs" \
--class_prompt="photo of a chair" \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=1 \
--learning_rate=1e-3 \
--learning_rate_1d=1e-6 \
--train_text_encoder \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=200 \
--max_train_steps=500
```

### Inference

```python
import torch
from PIL import Image
from diffusers import DDIMScheduler
from svdiff_pytorch import load_unet_for_svdiff, load_text_encoder_for_svdiff, StableDiffusionPipelineWithDDIMInversion

pretrained_model_name_or_path = "runwayml/stable-diffusion-v1-5"
spectral_shifts_ckpt_dir = "ckpt-dir-path"
image = "path-to-image"
source_prompt = "prompt-for-image"
target_prompt = "prompt-you-want-to-generate"

unet = load_unet_for_svdiff(pretrained_model_name_or_path, spectral_shifts_ckpt=spectral_shifts_ckpt_dir, subfolder="unet")
text_encoder = load_text_encoder_for_svdiff(pretrained_model_name_or_path, spectral_shifts_ckpt=spectral_shifts_ckpt_dir, subfolder="text_encoder")
# load pipe
pipe = StableDiffusionPipelineWithDDIMInversion.from_pretrained(
    pretrained_model_name_or_path,
    unet=unet,
    text_encoder=text_encoder,
)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

# (optional) DDIM inversion
# if you skip it, set inv_latents = None
image = Image.open(image).convert("RGB").resize((512, 512))
# SVDiff uses guidance scale = 1 for DDIM inversion
inv_latents = pipe.invert(source_prompt, image=image, guidance_scale=1.0).latents

image = pipe(target_prompt, latents=inv_latents).images[0]
```
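As the comment above notes, the inversion step is optional: pass `latents=None` and the pipeline samples from random noise instead of the inverted latents, which usually tracks the input image's layout less faithfully.

```python
# Variant without DDIM inversion: start from random noise instead of
# the inverted latents of the input image.
image_no_inversion = pipe(target_prompt, latents=None).images[0]
```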
## Additional Features
### Spectral Shift Scaling
![scale](assets/scale.png)
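Scaling the learned spectral shifts dials the strength of the personalization up or down. If you want to experiment outside the provided scripts, one possible approach is to rescale the saved shift tensors before loading them; this sketch assumes the checkpoint holds only plain shift tensors, and both file names are placeholders:

```python
# Hedged sketch: rescale the learned spectral shifts offline.
# Since W' = U @ diag(ReLU(S + scale * delta)) @ Vh, scaling delta directly
# weakens (< 1.0) or strengthens (> 1.0) the personalization.
from safetensors.torch import load_file, save_file

scale = 0.8  # < 1.0 weakens, > 1.0 strengthens
state_dict = load_file("spectral_shifts.safetensors")  # placeholder path
scaled = {name: tensor * scale for name, tensor in state_dict.items()}
save_file(scaled, "spectral_shifts_scaled.safetensors")  # load this instead
```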
### Faster Training with ToMe
Add `--enable_tome_merging` to your training arguments!

## TODO
- [x] Training
- [x] Inference
- [x] Scaling spectral shifts
- [x] Support Single Image Editing
- [ ] Support multiple spectral shifts (Section 3.2)
- [ ] Cut-Mix-Unmix (Section 3.3)
- [ ] SVDiff + LoRA