# WD LLM Caption CLI

A Python-based CLI tool and a simple Gradio GUI for captioning images
with [WD series](https://huggingface.co/SmilingWolf), [joy-caption-pre-alpha](https://huggingface.co/spaces/fancyfeast/joy-caption-pre-alpha), [Llama 3.2 Vision Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct),
[Qwen2 VL Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct), [Mini-CPM V2.6](https://huggingface.co/openbmb/MiniCPM-V-2_6)
and [Florence-2](https://huggingface.co/microsoft/Florence-2-large) models.

<img alt="DEMO_GUI.png" src="DEMO/DEMO_GUI.png" width="700"/>
## Introduction

This tool can make a caption with danbooru style tags or a natural language description.
### New Changes:

2024.10.19: Add option to save WD tags and LLM captions in one file. (Only supported in CLI mode or GUI batch mode.)

2024.10.18: Add Joy-Caption Alpha One, Joy-Caption Alpha Two and Joy-Caption Alpha Two Llava support.
The GUI supports Joy formatted prompt inputs (only for Joy-Caption Alpha Two and Joy-Caption Alpha Two Llava).

2024.10.13: Add Florence-2 support.
The LLM now uses its own default generation params while `--llm_temperature` and `--llm_max_tokens` are 0.

2024.10.11: The GUI uses Gradio 5 now. Add Mini-CPM V2.6 support.

2024.10.09: Built as a wheel; you can now install this repo from PyPI.
```shell
# Install torch based on your GPU driver, e.g.
pip install torch==2.5.0 --index-url https://download.pytorch.org/whl/cu124
# Install via pip from pypi
pip install wd-llm-caption
# For CUDA 11.8
pip install -U -r requirements_onnx_cu118.txt
# For CUDA 12.X

# Run the CLI
wd-llm-caption --data_path your_data_path
# Run the GUI
wd-llm-caption-gui
```
2024.10.04: Add Qwen2 VL support.

2024.09.30: A simple GUI now runs through Gradio 😊
## Example

<img alt="DEMO_her.jpg" src="DEMO/DEMO_her.jpg" width="600" height="800"/>

### Standalone Inference
### Joy Caption models

| Model | Hugging Face Link | ModelScope Link |
|:---:|:---:|:---:|
| joy-caption-pre-alpha | [Hugging Face](https://huggingface.co/spaces/fancyfeast/joy-caption-pre-alpha) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/joy-caption-pre-alpha) |
| Joy-Caption-Alpha-One | [Hugging Face](https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-one) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-one) |
| Joy-Caption-Alpha-Two | [Hugging Face](https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-two) |
| Joy-Caption-Alpha-Two-Llava | [Hugging Face](https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/llama-joycaption-alpha-two-hf-llava) |
| siglip-so400m-patch14-384 (Google) | [Hugging Face](https://huggingface.co/google/siglip-so400m-patch14-384) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384) |
| Meta-Llama-3.1-8B | [Hugging Face](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B) |
| unsloth/Meta-Llama-3.1-8B-Instruct | [Hugging Face](https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct) |
| Llama-3.1-8B-Lexi-Uncensored-V2 | [Hugging Face](https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2) |

### Llama 3.2 Vision Instruct models
```shell
python -m venv .venv

# Install torch based on your GPU driver, e.g.
pip install torch==2.5.0 --index-url https://download.pytorch.org/whl/cu124

# Base dependencies; models for inference will download via python request libs.
# For WD Caption
```
If `queue`, all images will be captioned with the WD models first,
then all of them are captioned with the joy models, with the WD captions included in the joy user prompt.
Default is `sync`.
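The two run orders can be sketched as follows (illustrative Python; the function names, signatures, and the assumption that both modes feed WD tags to the LLM are mine, not the tool's actual internals):

```python
def caption_sync(images, wd_tag, llm_caption):
    # sync: finish both captions for one image before moving to the next
    results = []
    for img in images:
        tags = wd_tag(img)
        results.append((tags, llm_caption(img, tags)))
    return results

def caption_queue(images, wd_tag, llm_caption):
    # queue: WD-tag every image first, then run the LLM pass with those tags
    all_tags = [wd_tag(img) for img in images]
    return [(tags, llm_caption(img, tags)) for img, tags in zip(images, all_tags)]
```

Both modes produce the same (tags, caption) pairs; `queue` simply batches the whole WD pass before the LLM pass.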
`--caption_extension`

extension of the caption file, default is `.txt`.
If `caption_method` is not `wd+llm`, it will be the WD or LLM caption file extension.

`--save_caption_together`

Save WD tags and LLM captions in one file.

`--save_caption_together_seperator`

Separator between the WD tags and the LLM captions when they are saved in one file.
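A minimal sketch of what saving both captions together might look like (the merge logic and helper name below are assumptions for illustration; the real implementation lives in the tool):

```python
from pathlib import Path

# Hypothetical sketch: join WD tags and the LLM caption with the configured
# separator and write them to a single caption file.
def save_caption_together(wd_tags: str, llm_caption: str,
                          out_path: Path, separator: str = ", ") -> str:
    merged = wd_tags + separator + llm_caption
    out_path.write_text(merged, encoding="utf-8")
    return merged

print(save_caption_together(
    "1girl, solo, long hair",
    "A woman with long hair smiles at the viewer.",
    Path("demo_caption_example.txt"),
))
```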
`--image_size`

resize image to a suitable size, default is `1024`.
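As an illustration of such a resize (that the tool preserves aspect ratio and never upscales is an assumption of this sketch):

```python
# Illustrative aspect-preserving fit to --image_size (assumed behavior;
# the tool's actual resize strategy may differ).
def fit_to_size(width: int, height: int, target: int = 1024) -> tuple:
    longest = max(width, height)
    if longest <= target:  # this sketch never upscales
        return width, height
    scale = target / longest
    return round(width * scale), round(height * scale)

print(fit_to_size(2048, 1536))  # (1024, 768)
```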
load joy models using CPU.

`--llm_llm_dtype`

choose the joy LLM load dtype [`fp16`, `bf16`, `fp32`], default is `fp16`.
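These flag values presumably map onto torch dtypes; a sketch of such a mapping (illustrative only, not the tool's actual code):

```python
# Hypothetical mapping from the --llm_llm_dtype flag to a torch dtype name.
DTYPE_BY_FLAG = {"fp16": "float16", "bf16": "bfloat16", "fp32": "float32"}

def resolve_llm_dtype(flag: str = "fp16") -> str:
    # unknown values are rejected rather than silently defaulted
    if flag not in DTYPE_BY_FLAG:
        raise ValueError("unsupported --llm_llm_dtype value: " + repr(flag))
    return DTYPE_BY_FLAG[flag]

print(resolve_llm_dtype())  # float16, matching the documented default
```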
`--llm_llm_qnt`

enable quantization for the joy LLM [`none`, `4bit`, `8bit`], default is `none`.

`--llm_caption_extension`

extension of the caption file, default is `.llmcaption`.

`--llm_read_wd_caption`
max tokens for LLM model output, default is `0`, which means the LLM uses its own default value.
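The "0 means use the model's own default" convention for `--llm_temperature` and `--llm_max_tokens` can be sketched like this (illustrative; the names are not the tool's internals):

```python
# Hypothetical sketch: only pass generation params the user explicitly set;
# a value of 0 falls back to the model's own defaults.
def build_generate_kwargs(temperature: float = 0, max_tokens: int = 0) -> dict:
    kwargs = {}
    if temperature != 0:
        kwargs["temperature"] = temperature
    if max_tokens != 0:
        kwargs["max_new_tokens"] = max_tokens
    return kwargs

print(build_generate_kwargs())          # {} -> model defaults
print(build_generate_kwargs(0.7, 512))
```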
## Credits

Based on [SmilingWolf/wd-tagger models](https://huggingface.co/spaces/SmilingWolf/wd-tagger/blob/main/app.py), [fancyfeast/joy-caption models](https://huggingface.co/fancyfeast), [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct),
[Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct), [openbmb/Mini-CPM V2.6](https://huggingface.co/openbmb/MiniCPM-V-2_6)
and [microsoft/florence2](https://huggingface.co/collections/microsoft/florence-6669f44df0d87d9c3bfb76de).

Without their work (👏👏), this repo wouldn't exist.