Description
Audio generation with the Chatterbox extension works fine the first time, but after that it stops functioning.
For example, one of the following happens:
Clicking the "Generate" button does nothing: there is no change in the UI and no output in the console log.
or
Refreshing the browser results in no UI appearing at all.
{'text': 'Chatterbox is an expressive text-to-speech model with reference audio support for voice cloning.', 'exaggeration': 0.5, 'cfg_weight': 0.5, 'temperature': 0.8, 'audio_prompt_path': None, 'seed': '4030282067', 'device': 'auto', 'dtype': 'float32', 'model_name': 'just_a_placeholder', 'chunked': False, 'cpu_offload': False, 'cache_voice': False, 'desired_length': 200, 'max_length': 300, 'halve_first_chunk': False, 'initial_forward_pass_backend': 'eager', 'generate_token_backend': 'cudagraphs-manual', 'max_new_tokens': 1000, 'max_cache_len': 1500}
Generating: '''Chatterbox is an expressive text-to-speech model w...'''
Chatterbox(
exaggeration=0.5,
cfg_weight=0.5,
temperature=0.8,
audio_prompt_path=None,
seed='4030282067',
device='auto',
dtype='float32',
model_name='just_a_placeholder',
chunked=False,
cpu_offload=False,
cache_voice=False,
desired_length=200,
max_length=300,
halve_first_chunk=False,
initial_forward_pass_backend='eager',
generate_token_backend='cudagraphs-manual',
max_new_tokens=1000,
max_cache_len=1500
)
Using device: cuda
Loading model 'Chatterbox on cuda with torch.float32'...
G:\Project\Python\webuitts\installer_files\env\lib\site-packages\diffusers\models\lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
  deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
loaded PerthNet (Implicit) at step 250,000
Moving model to cuda, torch.float32
Generating chunk: Chatterbox is an expressive text-to-speech model with reference audio support for voice cloning.
Estimated token count: 130
Input embeds shape before padding: torch.Size([2, 101, 1024])
Sampling: 0%| | 0/1000 [00:00<?, ?it/s]Capturing CUDA graph for bucket 250 (max_position: 250)
Sampling: 14%|█████████▋ | 140/1000 [00:03<00:19, 44.38it/s]
Generated in 31.108 seconds
Saving generation to outputs\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a.wav
Saving metadata to outputs\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a.json
Average execution time: 31.129
Saving waveform plot to outputs\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a.png
Saving generation to outputs\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a.ogg
Saved generation to outputs\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a.ogg
Saving generation to outputs\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a.flac
Saved generation to outputs\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a.flac
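
One way to narrow this down might be to repeat the run with only the token-generation backend changed, since the log shows a CUDA graph being captured on the first generation. The sketch below parses the parameter dict printed at the top of the log and builds a variant that differs in `generate_token_backend` alone. It assumes `'eager'` is also an accepted value for `generate_token_backend` (the log only shows it for `initial_forward_pass_backend`), and `with_backend` is a hypothetical helper written for this comparison, not part of the extension.

```python
import ast

# Parameter dict copied verbatim from the first line of the log above.
LOGGED_PARAMS = (
    "{'text': 'Chatterbox is an expressive text-to-speech model with "
    "reference audio support for voice cloning.', 'exaggeration': 0.5, "
    "'cfg_weight': 0.5, 'temperature': 0.8, 'audio_prompt_path': None, "
    "'seed': '4030282067', 'device': 'auto', 'dtype': 'float32', "
    "'model_name': 'just_a_placeholder', 'chunked': False, "
    "'cpu_offload': False, 'cache_voice': False, 'desired_length': 200, "
    "'max_length': 300, 'halve_first_chunk': False, "
    "'initial_forward_pass_backend': 'eager', "
    "'generate_token_backend': 'cudagraphs-manual', "
    "'max_new_tokens': 1000, 'max_cache_len': 1500}"
)


def with_backend(params: dict, backend: str) -> dict:
    """Hypothetical helper: copy params, replacing generate_token_backend."""
    changed = dict(params)  # shallow copy so the logged values stay untouched
    changed["generate_token_backend"] = backend
    return changed


# The log line is a Python dict repr, so literal_eval can parse it safely.
params = ast.literal_eval(LOGGED_PARAMS)

# Assumption: 'eager' is valid here; the log only confirms it for the
# initial forward pass backend.
eager_params = with_backend(params, "eager")

# Show exactly which settings differ between the two runs.
diff = {
    key: (params[key], eager_params[key])
    for key in params
    if params[key] != eager_params[key]
}
print(diff)
```

If the second generation succeeds with the changed backend, that would point toward the CUDA graph path; if it still hangs, the cause is more likely elsewhere.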