
Commit 10c6ae0

Add Joy-Caption-Alpha-One, Joy-Caption-Alpha-Two, Joy-Caption-Alpha-Two-Llava Support
1 parent b35420c commit 10c6ae0

15 files changed: +1291 additions, -407 deletions

CHANGLOG.md

Lines changed: 5 additions & 4 deletions
@@ -1,12 +1,13 @@
 ### NEW

-1. Add Mini-CPM V2.6 Support.
-2. Add Florence2 Support.
+1. Add Joy Caption Alpha One, Joy-Caption Alpha Two, Joy-Caption Alpha Two Llava Support.
+2. GUI support Joy formated prompt inputs (Only for Joy-Caption Alpha Two, Joy-Caption Alpha Two Llava).
+3. Add option to save WD tags and LLM Captions in one file.(Only support CLI mode or GUI batch mode.)

 ### CHANGE

-1. GUI using Gradio 5 now.
-2. Now LLM will use own default generate params while `--llm_temperature` and `llm_max_tokens` are 0.
+1. Upgrade some dependencies version.
+2. Remove `--llm_dtype` option `auto`(Avoid cause bugs)

 ### BUG FIX

DEMO/DEMO_GUI.png

-16.6 KB

README.md

Lines changed: 47 additions & 25 deletions
@@ -1,9 +1,11 @@
 # WD LLM Caption Cli

-A Python base cli tool for caption images
+A Python base cli tool and a simple gradio GUI for caption images
 with [WD series](https://huggingface.co/SmilingWolf), [joy-caption-pre-alpha](https://huggingface.co/spaces/fancyfeast/joy-caption-pre-alpha), [LLama3.2 Vision Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct),
 [Qwen2 VL Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct), [Mini-CPM V2.6](https://huggingface.co/openbmb/MiniCPM-V-2_6)
-and [Florence-2](https://huggingface.co/microsoft/Florence-2-large)models.
+and [Florence-2](https://huggingface.co/microsoft/Florence-2-large) models.
+
+<img alt="DEMO_her.jpg" src="DEMO/DEMO_GUI.png" width="700"/>

 ## Introduce

@@ -12,18 +14,23 @@ This tool can make a caption with danbooru style tags or a nature language descr

 ### New Changes:

-#### 2024.10.13: Add Florence2 Support. Now LLM will use own default generate params while `--llm_temperature` and
-`--llm_max_tokens` are 0.
+2024.10.19: Add option to save WD tags and LLM Captions in one file.(Only support CLI mode or GUI batch mode.)
+
+2024.10.18: Add Joy Caption Alpha One, Joy-Caption Alpha Two, Joy-Caption Alpha Two Llava Support.
+GUI support Joy formated prompt inputs (Only for Joy-Caption Alpha Two, Joy-Caption Alpha Two Llava).
+
+2024.10.13: Add Florence2 Support.
+Now LLM will use own default generate params while `--llm_temperature` and `--llm_max_tokens` are 0.

-#### 2024.10.11: GUI using Gradio 5 now. Add Mini-CPM V2.6 Support.
+2024.10.11: GUI using Gradio 5 now. Add Mini-CPM V2.6 Support.

-#### 2024.10.09: Build in wheel, now you install this repo from pypi.
+2024.10.09: Build in wheel, now you can install this repo from pypi.

 ```shell
 # Install torch base on your GPU driver. e.g.
-pip install torch==2.4.1 --index-url https://download.pytorch.org/whl/cu124
+pip install torch==2.5.0 --index-url https://download.pytorch.org/whl/cu124
 # Install via pip from pypi
-pip install wd_llm_caption
+pip install wd-llm-caption
 # For CUDA 11.8
 pip install -U -r requirements_onnx_cu118.txt
 # For CUDA 12.X
@@ -34,15 +41,13 @@ wd-llm-caption --data_path your_data_path
 wd-llm-caption-gui
 ```

-#### 2024.10.04: Add Qwen2 VL support.
+2024.10.04: Add Qwen2 VL support.

-#### 2024.09.30: A simple gui run through gradio now😊
-
-<img alt="DEMO_her.jpg" src="DEMO/DEMO_GUI.png" width="300"/>
+2024.09.30: A simple gui run through gradio now😊

 ## Example

-<img alt="DEMO_her.jpg" src="DEMO/DEMO_her.jpg" width="300" height="400"/>
+<img alt="DEMO_her.jpg" src="DEMO/DEMO_her.jpg" width="600" height="800"/>

 ### Standalone Inference


@@ -167,12 +172,16 @@ place).

 ### Joy Caption models

-| Model | Hugging Face Link | ModelScope Link |
-|:---------------------------------:|:---------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------:|
-| joy-caption-pre-alpha | [Hugging Face](https://huggingface.co/spaces/fancyfeast/joy-caption-pre-alpha) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/joy-caption-pre-alpha) |
-| siglip-so400m-patch14-384(Google) | [Hugging Face](https://huggingface.co/google/siglip-so400m-patch14-384) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384) |
-| Meta-Llama-3.1-8B | [Hugging Face](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B) |
-| Llama-3.1-8B-Lexi-Uncensored-V2 | [Hugging Face](https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2) |
+| Model | Hugging Face Link | ModelScope Link |
+|:----------------------------------:|:-------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------:|
+| joy-caption-pre-alpha | [Hugging Face](https://huggingface.co/spaces/fancyfeast/joy-caption-pre-alpha) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/joy-caption-pre-alpha) |
+| Joy-Caption-Alpha-One | [Hugging Face](https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-one) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-one) |
+| Joy-Caption-Alpha-Two | [Hugging Face](https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-two) |
+| Joy-Caption-Alpha-Two-Llava | [Hugging Face](https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/llama-joycaption-alpha-two-hf-llava) |
+| siglip-so400m-patch14-384(Google) | [Hugging Face](https://huggingface.co/google/siglip-so400m-patch14-384) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384) |
+| Meta-Llama-3.1-8B | [Hugging Face](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B) |
+| unsloth/Meta-Llama-3.1-8B-Instruct | [Hugging Face](https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct) |
+| Llama-3.1-8B-Lexi-Uncensored-V2 | [Hugging Face](https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2) |

 ### Llama 3.2 Vision Instruct models

@@ -221,7 +230,7 @@ python -m venv .venv

 # Install torch
 # Install torch base on your GPU driver. e.g.
-pip install torch==2.4.1 --index-url https://download.pytorch.org/whl/cu124
+pip install torch==2.5.0 --index-url https://download.pytorch.org/whl/cu124

 # Base dependencies, models for inference will download via python request libs.
 # For WD Caption
@@ -376,6 +385,19 @@ if `queue`, all images will caption with wd models first,
 then caption all of them with joy models while wd captions in joy user prompt.
 default is `sync`.

+`--caption_extension`
+
+extension of caption file, default is `.txt`.
+If `caption_method` not `wd+llm`, it will be wd or llm caption file extension.
+
+`--save_caption_together`
+
+Save WD tags and LLM captions in one file.
+
+`--save_caption_together_seperator`
+
+Seperator between WD and LLM captions, if they are saved in one file.
+
 `--image_size`

 resize image to suitable, default is `1024`.
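
The newly documented options above can be combined in CLI batch mode. A minimal sketch, assuming `--save_caption_together` is a plain on/off switch and using placeholder values for the data path and separator (check the tool's help output for the exact syntax):

```shell
# Hypothetical invocation of the new combined-output options documented above.
# --save_caption_together is assumed to be a boolean switch; the separator value is illustrative.
wd-llm-caption --data_path your_data_path \
  --caption_extension .txt \
  --save_caption_together \
  --save_caption_together_seperator ", "
```
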
@@ -481,15 +503,15 @@ load joy models use cpu.

 `--llm_llm_dtype`

-choice joy llm load dtype[`auto`, `fp16`, `bf16", `fp32`], default is `auto`.
+choice joy llm load dtype[`fp16`, `bf16", `fp32`], default is `fp16`.

 `--llm_llm_qnt`

 Enable quantization for joy llm [`none`,`4bit`, `8bit`]. default is `none`.

 `--llm_caption_extension`

-extension of caption file, default is `.txt`
+extension of caption file, default is `.llmcaption`

 `--llm_read_wd_caption`
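
With the `auto` dtype removed in this commit, the joy LLM load precision must now be one of the listed values. A minimal sketch combining it with quantization, using the option names documented above and illustrative values:

```shell
# Hypothetical example: load the joy LLM in bf16 with 4-bit quantization and keep
# the new default .llmcaption extension for its output files. Values are illustrative.
wd-llm-caption --data_path your_data_path \
  --llm_llm_dtype bf16 \
  --llm_llm_qnt 4bit \
  --llm_caption_extension .llmcaption
```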

@@ -516,7 +538,7 @@ max tokens for LLM model output, default is `0`, means use llm own default value
 ## Credits

 Base
-on [SmilingWolf/wd-tagger](https://huggingface.co/spaces/SmilingWolf/wd-tagger/blob/main/app.py), [joy-caption-pre-alpha](https://huggingface.co/spaces/fancyfeast/joy-caption-pre-alpha), [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct),
-[Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)
-and [Mini-CPM V2.6](https://huggingface.co/openbmb/MiniCPM-V-2_6)
+on [SmilingWolf/wd-tagger models](https://huggingface.co/spaces/SmilingWolf/wd-tagger/blob/main/app.py), [fancyfeast/joy-caption models](https://huggingface.co/fancyfeast), [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct),
+[Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct), [openbmb/Mini-CPM V2.6](https://huggingface.co/openbmb/MiniCPM-V-2_6)
+and [microsoft/florence2](https://huggingface.co/collections/microsoft/florence-6669f44df0d87d9c3bfb76de).
 Without their works(👏👏), this repo won't exist.

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-v0.1.3-alpha
+v0.1.4-alpha

pyproject.toml

Lines changed: 6 additions & 6 deletions
@@ -67,21 +67,21 @@ classifiers = [
 dependencies = [
     "numpy>=1.26.4,<2.0.0",
     "opencv-python-headless==4.10.0.84",
-    "pillow==10.4.0",
+    "pillow>=10.4.0",
     "requests==2.32.3",
     "tqdm==4.66.5",
     "accelerate>=0.34.2",
     "bitsandbytes>=0.42.0",
-    "peft==0.13.2",
+    # "peft==0.13.2",
     "sentencepiece==0.2.0",
     "transformers==4.45.2",
-    "timm==1.0.9",
+    "timm==1.0.11",
     "torch>=2.1.0",
     "onnx==1.17.0",
     "onnxruntime==1.19.2",
-    "huggingface_hub>=0.25.1",
-    "modelscope>=1.18.1",
-    "gradio>=5.0.2"
+    "huggingface_hub>=0.26.0",
+    "modelscope>=1.19.0",
+    "gradio>=5.1.0"
 ]

 [project.urls]

requirements.txt

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
-numpy>=1.26.4
+numpy>=1.26.4,<2.0.0
 opencv-python-headless==4.10.0.84
-pillow==10.4.0
+pillow>=10.4.0
 requests==2.32.3
 tqdm==4.66.5

requirements_gui.txt

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-gradio>=5.0.2
+gradio>=5.1.0

requirements_huggingface.txt

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-huggingface_hub==0.25.1
+huggingface_hub==0.25.2

requirements_llm.txt

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 accelerate==0.34.2
 bitsandbytes==0.44.1
-peft==0.13.2
+# peft==0.13.2
 sentencepiece==0.2.0
 transformers==4.45.2
-timm==1.0.9
+timm==1.0.11
 -r requirements.txt

requirements_modelscope.txt

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-modelscope==1.18.1
+modelscope>=1.19.0
