I think Chatterbox in TTS crashes after the first generation

Whenever I generate audio with the Chatterbox extension, the first generation works fine. After that, the extension stops functioning.

For example:
Clicking the “Generate” button does nothing: the UI does not react and nothing is written to the console log.
or
Refreshing the browser results in no UI appearing.




Console log:
{'text': 'Chatterbox is an expressive text-to-speech model with reference audio support for voice cloning.', 'exaggeration': 0.5, 'cfg_weight': 0.5, 'temperature': 0.8, 'audio_prompt_path': None, 'seed': '4030282067', 'device': 'auto', 'dtype': 'float32', 'model_name': 'just_a_placeholder', 'chunked': False, 'cpu_offload': False, 'cache_voice': False, 'desired_length': 200, 'max_length': 300, 'halve_first_chunk': False, 'initial_forward_pass_backend': 'eager', 'generate_token_backend': 'cudagraphs-manual', 'max_new_tokens': 1000, 'max_cache_len': 1500}
Generating: '''Chatterbox is an expressive text-to-speech model w...'''
Chatterbox(
    exaggeration=0.5,
    cfg_weight=0.5,
    temperature=0.8,
    audio_prompt_path=None,
    seed='4030282067',
    device='auto',
    dtype='float32',
    model_name='just_a_placeholder',
    chunked=False,
    cpu_offload=False,
    cache_voice=False,
    desired_length=200,
    max_length=300,
    halve_first_chunk=False,
    initial_forward_pass_backend='eager',
    generate_token_backend='cudagraphs-manual',
    max_new_tokens=1000,
    max_cache_len=1500
)
Using device: cuda
Loading model 'Chatterbox on cuda with torch.float32'...
G:\Project\Python\webuitts\installer_files\env\lib\site-packages\diffusers\models\lora.py:393: FutureWarning: `LoRACompatibleLinear` is deprecated and will be removed in version 1.0.0. Use of `LoRACompatibleLinear` is deprecated. Please switch to PEFT backend by installing PEFT: `pip install peft`.
  deprecate("LoRACompatibleLinear", "1.0.0", deprecation_message)
loaded PerthNet (Implicit) at step 250,000
Moving model to cuda, torch.float32
Generating chunk: Chatterbox is an expressive text-to-speech model with reference audio support for voice cloning.
Estimated token count: 130
Input embeds shape before padding: torch.Size([2, 101, 1024])
Sampling:   0%|                                                                               | 0/1000 [00:00<?, ?it/s]Capturing CUDA graph for bucket 250 (max_position: 250)
Sampling:  14%|█████████▋                                                           | 140/1000 [00:03<00:19, 44.38it/s]
Generated in 31.108 seconds
Saving generation to outputs\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a.wav
Saving metadata to outputs\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a.json
Average execution time: 31.129
Saving waveform plot to outputs\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a.png
Saving generation to outputs\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a.ogg
Saved generation to outputs\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a.ogg
Saving generation to outputs\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a.flac
Saved generation to outputs\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a\2025-08-29_05-27-36__chatterbox__Chatterbox_is_a.flac

Issue #558