Commit b35420c

Let the LLM use its own default generate params when --llm_temperature and --llm_max_tokens are 0.
1 parent 6dc8898 commit b35420c

File tree

7 files changed: +123 -27 lines changed

CHANGLOG.md

Lines changed: 13 additions & 0 deletions

@@ -0,0 +1,13 @@
+### NEW
+
+1. Add Mini-CPM V2.6 support.
+2. Add Florence-2 support.
+
+### CHANGE
+
+1. The GUI now uses Gradio 5.
+2. The LLM now uses its own default generate params when `--llm_temperature` and `--llm_max_tokens` are 0.
+
+### BUG FIX
+
+1. Fix minor bugs.

README.md

Lines changed: 30 additions & 6 deletions

@@ -2,8 +2,8 @@
 
 A Python base cli tool for caption images
 with [WD series](https://huggingface.co/SmilingWolf), [joy-caption-pre-alpha](https://huggingface.co/spaces/fancyfeast/joy-caption-pre-alpha), [LLama3.2 Vision Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct),
-[Qwen2 VL Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)
-and [Mini-CPM V2.6](https://huggingface.co/openbmb/MiniCPM-V-2_6) models.
+[Qwen2 VL Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct), [Mini-CPM V2.6](https://huggingface.co/openbmb/MiniCPM-V-2_6)
+and [Florence-2](https://huggingface.co/microsoft/Florence-2-large) models.
 
 ## Introduce
 
@@ -12,6 +12,9 @@ This tool can make a caption with danbooru style tags or a nature language descr
 
 ### New Changes:
 
+#### 2024.10.13: Add Florence-2 support. The LLM now uses its own default generate params when
+`--llm_temperature` and `--llm_max_tokens` are 0.
+
 #### 2024.10.11: GUI using Gradio 5 now. Add Mini-CPM V2.6 Support.
 
 #### 2024.10.09: Build in wheel, now you install this repo from pypi.
@@ -41,7 +44,9 @@ wd-llm-caption-gui
 
 <img alt="DEMO_her.jpg" src="DEMO/DEMO_her.jpg" width="300" height="400"/>
 
-#### WD Caption
+### Standalone Inference
+
+#### WD Tags
 
 Use wd-eva02-large-tagger-v3
 
@@ -87,6 +92,16 @@ Default Mini-CPM V2.6 7B, no quantization
 The image depicts a humanoid robot with a human-like appearance, standing on a balcony railing at night. The robot has a sleek, white and black body with visible mechanical joints and components, suggesting advanced technology. Its pose is confident, with one hand resting on the railing and the other hanging by its side. The robot has long, straight, platinum blonde hair that falls over its shoulders. The background features a cityscape with illuminated buildings and a prominent tower, suggesting an urban setting. The lighting is dramatic, highlighting the robot against the darker backdrop of the night sky. The overall atmosphere is one of futuristic sophistication.
 ```
 
+#### Florence 2 large
+
+Default Florence 2 large, no quantization
+
+```text
+The image is a promotional poster for an AIGC work by DukeG. It features a young woman with long blonde hair, standing on a rooftop with a city skyline in the background. She is wearing a futuristic-looking outfit with a white and black color scheme. The outfit has a high neckline and long sleeves, and the woman is posing with one hand on her hip and the other resting on the railing. The text on the poster reads "Publish on 2024.07.30" and "Generated by Stable Diffusion" with the text "Tuned by Adobe Photoshop".
+```
+
+### WD+LLM Inference
+
 #### Joy Caption with WD
 
 Use wd-eva02-large-tagger-v3 and LLama3.1 8B, no quantization.
@@ -180,6 +195,15 @@ place).
 |:-------------:|:------------------------------------------------------------:|:--------------------------------------------------------------------:|
 | MiniCPM-V-2_6 | [Hugging Face](https://huggingface.co/openbmb/MiniCPM-V-2_6) | [ModelScope](https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6) |
 
+### Florence-2 models
+
+| Model               | Hugging Face Link                                                     | ModelScope Link                                                                   |
+|:-------------------:|:---------------------------------------------------------------------:|:----------------------------------------------------------------------------------:|
+| Florence-2-large    | [Hugging Face](https://huggingface.co/microsoft/Florence-2-large)    | [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large)    |
+| Florence-2-base     | [Hugging Face](https://huggingface.co/microsoft/Florence-2-base)     | [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base)     |
+| Florence-2-large-ft | [Hugging Face](https://huggingface.co/microsoft/Florence-2-large-ft) | [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/Florence-2-large-ft) |
+| Florence-2-base-ft  | [Hugging Face](https://huggingface.co/microsoft/Florence-2-base-ft)  | [ModelScope](https://www.modelscope.cn/models/AI-ModelScope/Florence-2-base-ft)  |
+
 ## Installation
 
 Python 3.10 works fine.
@@ -437,7 +461,7 @@ e.g., `character_name_(series)` will be expanded to `character_name, series`.
 
 `--llm_choice`
 
-select llm models[`joy`, `llama`, `qwen`, `minicpm`], default is `llama`.
+select llm models [`joy`, `llama`, `qwen`, `minicpm`, `florence`], default is `llama`.
 
 `--llm_config`
 
@@ -481,11 +505,11 @@ user prompt for caption.
 
 `--llm_temperature`
 
-temperature for joy LLM model, default is `0.5`.
+temperature for the LLM model, default is `0`, which means the LLM uses its own default value.
 
 `--llm_max_tokens`
 
-max tokens for joy LLM model output, default is `300`.
+max tokens for the LLM model output, default is `0`, which means the LLM uses its own default value.
 
 </details>
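The two flags above share one convention: `0` is a sentinel meaning "not set". A minimal sketch of that semantic, using a hypothetical `resolve_or_default` helper that is not part of this repo:

```python
# Hypothetical helper illustrating the documented `0` sentinel; not repo code.
def resolve_or_default(cli_value: float, model_default: float) -> float:
    """Return the model's own default when the CLI value is the 0 sentinel."""
    return model_default if cli_value == 0 else cli_value

# --llm_temperature 0 falls back to whatever the chosen LLM defaults to,
# while any explicit non-zero value passes through unchanged.
assert resolve_or_default(0, 0.5) == 0.5
assert resolve_or_default(0.9, 0.5) == 0.9
```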

VERSION

Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
-v0.1.2-alpha
+v0.1.3-alpha

pyproject.toml

Lines changed: 3 additions & 3 deletions

@@ -50,13 +50,13 @@ dynamic = ["version"]
 authors = [
     { name = "DukeG", email = "fireicewolf@gmail.com" },
 ]
-description = "A Python base cli tool for caption images with WD series, Joy-caption-pre-alpha, meta Llama 3.2 Vision Instruct, Qwen2 VL Instruct and Mini-CPM V2.6 models."
+description = "A Python base cli tool for caption images with WD series, Joy-caption-pre-alpha, meta Llama 3.2 Vision Instruct, Qwen2 VL Instruct, Mini-CPM V2.6 and Florence-2 models."
 readme = "README.md"
-keywords = ["image-caption", "WD", "Llama 3.2 Vision Instruct", "Qwen2 VL Instruct", "Mini-CPM V2.6", "Joy Caption Alpha"]
+keywords = ["Image Caption", "WD", "Llama 3.2 Vision Instruct", "Joy Caption Alpha", "Qwen2 VL Instruct", "Mini-CPM V2.6", "Florence-2"]
 license = { file = 'LICENSE' }
 requires-python = ">=3.10"
 classifiers = [
-    "Development Status :: 5 - Production/Stable",
+    "Development Status :: 3 - Alpha",
     "Intended Audience :: Developers",
     "Intended Audience :: Science/Research",
     "License :: OSI Approved :: Apache Software License",

wd_llm_caption/caption.py

Lines changed: 7 additions & 5 deletions

@@ -377,14 +377,16 @@ def run_inference(
                 pbar.set_description('Processing with Qwen model...')
             elif self.use_minicpm:
                 pbar.set_description('Processing with Mini-CPM model...')
+            elif self.use_florence:
+                pbar.set_description('Processing with Florence model...')
             self.my_llm.inference()
             pbar.update(1)
 
         pbar.close()
     else:
         if self.use_wd:
             self.my_tagger.inference()
-        elif self.use_joy or self.use_llama or self.use_qwen:
+        elif self.use_joy or self.use_llama or self.use_qwen or self.use_minicpm or self.use_florence:
             self.my_llm.inference()
 
     total_inference_time = time.monotonic() - start_inference_time
@@ -695,14 +697,14 @@ def setup_args() -> argparse.Namespace:
     llm_args.add_argument(
         '--llm_temperature',
         type=float,
-        default=0.5,
-        help='temperature for LLM model, default is `0.5`.'
+        default=0,
+        help='temperature for LLM model, default is `0`, which means the LLM uses its own default value.'
     )
     llm_args.add_argument(
         '--llm_max_tokens',
         type=int,
-        default=300,
-        help='max tokens for LLM model output, default is `300`.'
+        default=0,
+        help='max tokens for LLM model output, default is `0`, which means the LLM uses its own default value.'
     )
 
     gradio_args = args.add_argument_group("Gradio dummy args, no effects")

wd_llm_caption/gui.py

Lines changed: 5 additions & 3 deletions

@@ -41,6 +41,8 @@ def gui_setup_args():
     parser.add_argument('--inbrowser', action='store_true', help="auto open in browser")
     parser.add_argument('--log_level', type=str, choices=['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL'],
                         default='INFO', help="set log level, default is `INFO`")
+    parser.add_argument('--models_save_path', type=str, default=caption.DEFAULT_MODELS_SAVE_PATH,
+                        help='path to save models, default is `models`.')
 
     return parser.parse_args()
 
@@ -165,9 +167,9 @@ def llm_choice_visibility(caption_method_radio):
                                      value=caption.DEFAULT_USER_PROMPT_WITH_WD)
 
         llm_temperature = gr.Slider(label="temperature for LLM model",
-                                    minimum=0.1, maximum=1.0, value=0.5, step=0.1)
+                                    minimum=0, maximum=1.0, value=0, step=0.1)
         llm_max_tokens = gr.Slider(label="max token for LLM model",
-                                   minimum=1, maximum=1024, value=300, step=1)
+                                   minimum=0, maximum=2048, value=0, step=1)
 
         with gr.Group():
             gr.Markdown("<center>Common Settings</center>")
@@ -422,6 +424,7 @@ def caption_models_load(
         os.environ["HF_TOKEN"] = str(huggingface_token_value)
 
     get_gradio_args = gui_setup_args()
+    args.models_save_path = str(get_gradio_args.models_save_path)
    args.log_level = str(get_gradio_args.log_level)
     args.caption_method = str(caption_method_value).lower()
     args.llm_choice = str(llm_choice_value).lower()
@@ -450,7 +453,6 @@ def caption_models_load(
         CAPTION_FN.set_logger(args)
 
     caption_init = CAPTION_FN
-
     args.wd_force_use_cpu = bool(wd_force_use_cpu_value)
 
     args.llm_use_cpu = bool(llm_use_cpu_value)
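For reference, a standalone Gradio snippet mirroring the two sliders above, where `value=0` stands for "use the model's own default". This is a sketch under that assumption, not the repo's full GUI:

```python
import gradio as gr

# Sliders now start at the 0 sentinel instead of forcing a concrete value.
with gr.Blocks() as demo:
    llm_temperature = gr.Slider(label="temperature for LLM model",
                                minimum=0, maximum=1.0, value=0, step=0.1)
    llm_max_tokens = gr.Slider(label="max token for LLM model",
                               minimum=0, maximum=2048, value=0, step=1)

# demo.launch()  # uncomment to serve the two sliders locally
```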

wd_llm_caption/utils/inference.py

Lines changed: 64 additions & 9 deletions

@@ -298,8 +298,8 @@ def get_caption(
             image: Image.Image,
             system_prompt: str,
             user_prompt: str,
-            temperature: float = 0.5,
-            max_new_tokens: int = 300,
+            temperature: float = 0,
+            max_new_tokens: int = 0,
     ) -> str:
         # Import torch
         try:
@@ -355,8 +355,16 @@ def get_caption(
             ], dim=1).to(device)
             attention_mask = torch.ones_like(input_ids)
             # Generate caption
-            self.logger.debug(f'LLM temperature is {temperature}')
-            self.logger.debug(f'LLM max_new_tokens is {max_new_tokens}')
+            if temperature == 0:
+                temperature = 0.5
+                self.logger.warning(f'LLM temperature not set, using default value {temperature}')
+            else:
+                self.logger.debug(f'LLM temperature is {temperature}')
+            if max_new_tokens == 0:
+                max_new_tokens = 300
+                self.logger.warning(f'LLM max_new_tokens not set, using default value {max_new_tokens}')
+            else:
+                self.logger.debug(f'LLM max_new_tokens is {max_new_tokens}')
             generate_ids = self.llm.generate(input_ids,
                                              inputs_embeds=inputs_embeds,
                                              attention_mask=attention_mask,
@@ -378,9 +386,37 @@ def get_caption(
             self.logger.debug(f'Using system prompt:{system_prompt}')
             self.logger.debug(f'Using user prompt:{user_prompt}')
             messages = [{'role': 'user', 'content': [image, f'{user_prompt}']}]
+            if temperature == 0 and max_new_tokens == 0:
+                max_new_tokens = 2048
+                self.logger.warning(f'LLM temperature and max_new_tokens not set, only '
+                                    f'using default max_new_tokens value {max_new_tokens}')
+                params = {
+                    'num_beams': 3,
+                    'repetition_penalty': 1.2,
+                    'max_new_tokens': max_new_tokens
+                }
+            else:
+                if temperature == 0:
+                    temperature = 0.7
+                    self.logger.warning(f'LLM temperature not set, using default value {temperature}')
+                else:
+                    self.logger.debug(f'LLM temperature is {temperature}')
+                if max_new_tokens == 0:
+                    max_new_tokens = 2048
+                    self.logger.warning(f'LLM max_new_tokens not set, using default value {max_new_tokens}')
+                else:
+                    self.logger.debug(f'LLM max_new_tokens is {max_new_tokens}')
+                params = {
+                    'top_p': 0.8,
+                    'top_k': 100,
+                    'temperature': temperature,
+                    'repetition_penalty': 1.05,
+                    'max_new_tokens': max_new_tokens
+                }
+            params['max_inp_length'] = 4352
             content = self.llm.chat(image=image, msgs=messages, tokenizer=self.llm_tokenizer,
                                     system_prompt=system_prompt if system_prompt else None,
-                                    sampling=False, stream=False)
+                                    sampling=False, stream=False, **params)
         elif self.models_type == "florence":
             self.logger.warning(f"Florence models don't support system prompt or user prompt!")
             self.logger.warning(f"Florence models don't support temperature or max tokens!")
@@ -433,10 +469,29 @@ def run_inference(task_prompt, text_input=None):
             # Generate caption
             self.logger.debug(f'LLM temperature is {temperature}')
             self.logger.debug(f'LLM max_new_tokens is {max_new_tokens}')
-            output = self.llm.generate(**inputs,
-                                       max_new_tokens=max_new_tokens,
-                                       do_sample=True, top_k=10,
-                                       temperature=temperature)
+            if temperature == 0 and max_new_tokens == 0:
+                max_new_tokens = 300
+                self.logger.warning(f'LLM temperature and max_new_tokens not set, only '
+                                    f'using default max_new_tokens value {max_new_tokens}')
+                params = {}
+            else:
+                if temperature == 0:
+                    temperature = 0.5
+                    self.logger.warning(f'LLM temperature not set, using default value {temperature}')
+                else:
+                    self.logger.debug(f'LLM temperature is {temperature}')
+                if max_new_tokens == 0:
+                    max_new_tokens = 300
+                    self.logger.warning(f'LLM max_new_tokens not set, using default value {max_new_tokens}')
+                else:
+                    self.logger.debug(f'LLM max_new_tokens is {max_new_tokens}')
+                params = {
+                    'do_sample': True,
+                    'top_k': 10,
+                    'temperature': temperature,
+                }
+
+            output = self.llm.generate(**inputs, max_new_tokens=max_new_tokens, **params)
             content = self.llm_processor.decode(output[0][inputs["input_ids"].shape[-1]:],
                                                 skip_special_tokens=True, clean_up_tokenization_spaces=True)
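The zero-sentinel fallback above is repeated once per model branch, each with its own defaults (0.5/300 for the Llama-style branch, 0.7/2048 for Mini-CPM). One possible consolidation, sketched with a hypothetical helper that is not part of this commit:

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical refactor: resolve both sentinel values in one place,
# parameterized by each model branch's own defaults.
def resolve_generate_params(temperature: float, max_new_tokens: int,
                            default_temperature: float,
                            default_max_new_tokens: int) -> tuple[float, int]:
    """Treat 0 as 'not set' and substitute the model's own defaults."""
    if temperature == 0:
        temperature = default_temperature
        logger.warning(f'LLM temperature not set, using default value {temperature}')
    if max_new_tokens == 0:
        max_new_tokens = default_max_new_tokens
        logger.warning(f'LLM max_new_tokens not set, using default value {max_new_tokens}')
    return temperature, max_new_tokens

# e.g. the Llama-style branch would call:
# temperature, max_new_tokens = resolve_generate_params(temperature, max_new_tokens, 0.5, 300)
```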
