[BUG] Runtime error when trying to load Qwen3 32B #784

@umar-mq

Description

OS

Windows

GPU Library

CUDA 12.x

Python version

3.12

PyTorch version

2.6.0+cu124

Model

https://huggingface.co/CAPsMANyo/Qwen3-32B_exl2/tree/4.25

Describe the bug

When trying to load the new Qwen3 32B model, the module load completes and then the process immediately crashes with the error shown in the logs below.

Reproduction steps

Just run start.sh with this model configured; it crashes during load.
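
For reference, a minimal standalone repro outside TabbyAPI can help isolate the failure. The sketch below is illustrative and not part of the original report: it uses exllamav2's documented load API with the model path taken from the logs, and takes the simpler autosplit path rather than the tensor-parallel path TabbyAPI chose.

```python
# Hypothetical repro sketch, not from the original report.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache

model_dir = "/home/thomas/text-generation-webui/models/Qwen3-32B-exl"

config = ExLlamaV2Config(model_dir)   # parses config.json in the model dir
model = ExLlamaV2(config)

# TabbyAPI loaded with tensor parallel; autosplit is the simpler loader and
# should exercise the same forward() code if the architecture is at fault.
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache, progress=True)

print("model loaded")
```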

Expected behavior

The model was expected to load successfully.

Logs

Activating venv
pip 24.3.1 from /home/thomas/tabbyAPI_NEW/venv/lib/python3.12/site-packages/pip (python 3.12)
Loaded your saved preferences from `start_options.json`
Starting TabbyAPI...
2025-04-30 13:18:44.754 INFO:     ExllamaV2 version: 0.2.8
2025-04-30 13:18:44.786 WARNING:  Disabling authentication makes your instance vulnerable. Set the `disable_auth` flag to False in config.yml if you want to share this instance with others.
2025-04-30 13:18:44.787 INFO:     Generation logging is enabled for: prompts
2025-04-30 13:18:44.817 WARNING:  Draft model is disabled because a model name wasn't provided. Please check your config.yml!
2025-04-30 13:18:44.818 WARNING:  An unsupported GPU is found in this configuration. Switching to compatibility mode.
2025-04-30 13:18:44.818 WARNING:  This disables parallel batching and features that rely on it (ex. CFG).
2025-04-30 13:18:44.818 WARNING:  To disable compatability mode, all GPUs must be ampere (30 series) or newer. AMD GPUs are not supported.
2025-04-30 13:18:44.819 INFO:     Attempting to load a prompt template if present.
2025-04-30 13:18:44.820 WARNING:  TemplateLoadError: Model JSON path "/home/thomas/text-generation-webui/models/Qwen3-32B-exl/chat_template.json" not found.
2025-04-30 13:18:44.842 INFO:     Using template "from_tokenizer_config" for chat completions.
2025-04-30 13:18:45.388 INFO:     Loading model: /home/thomas/text-generation-webui/models/Qwen3-32B-exl
2025-04-30 13:18:45.388 INFO:     Loading with tensor parallel
Loading model modules ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 131/131 0:00:00
Traceback (most recent call last):
  File "/home/thomas/tabbyAPI_NEW/start.py", line 291, in <module>
    entrypoint(args, parser)
  File "/home/thomas/tabbyAPI_NEW/main.py", line 166, in entrypoint
    asyncio.run(entrypoint_async())
  File "/home/thomas/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/home/thomas/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/home/thomas/tabbyAPI_NEW/main.py", line 71, in entrypoint_async
    await model.load_model(
  File "/home/thomas/tabbyAPI_NEW/common/model.py", line 112, in load_model
    async for _ in load_model_gen(model_path, **kwargs):
  File "/home/thomas/tabbyAPI_NEW/common/model.py", line 90, in load_model_gen
    async for module, modules in load_status:
  File "/home/thomas/tabbyAPI_NEW/backends/exllamav2/model.py", line 570, in load_gen
    async for value in iterate_in_threadpool(model_load_generator):
  File "/home/thomas/tabbyAPI_NEW/common/concurrency.py", line 30, in iterate_in_threadpool
    yield await asyncio.to_thread(gen_next, generator)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/thomas/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/thomas/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/concurrent/futures/thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/thomas/tabbyAPI_NEW/common/concurrency.py", line 20, in gen_next
    return next(generator)
           ^^^^^^^^^^^^^^^
  File "/home/thomas/tabbyAPI_NEW/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 57, in generator_context
    response = gen.send(request)
               ^^^^^^^^^^^^^^^^^
  File "/home/thomas/tabbyAPI_NEW/backends/exllamav2/model.py", line 731, in load_model_sync
    self.model.forward(input_ids, cache=self.cache, preprocess_only=True)
  File "/home/thomas/tabbyAPI_NEW/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/thomas/tabbyAPI_NEW/venv/lib/python3.12/site-packages/exllamav2/model.py", line 900, in forward
    r = self.forward_chunk(
        ^^^^^^^^^^^^^^^^^^^
  File "/home/thomas/tabbyAPI_NEW/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^
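
The traceback above is cut off before the final exception message. Since the logs show ExllamaV2 0.2.8, which appears to predate the Qwen3 release, one quick check is whether the installed build recognizes the architecture the model declares. The snippet below is a hedged diagnostic sketch, not from the original report; only the model path is taken from the logs.

```python
# Hypothetical diagnostic, not part of the original report.
# Prints the installed exllamav2 version and the model's declared
# architecture; a build that predates the architecture may fall back to a
# default config and crash during the first forward() pass, as seen above.
import json
import importlib.metadata

print("exllamav2:", importlib.metadata.version("exllamav2"))

model_dir = "/home/thomas/text-generation-webui/models/Qwen3-32B-exl"
with open(f"{model_dir}/config.json") as f:
    print("architectures:", json.load(f).get("architectures"))
```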

Additional context

No response

Acknowledgements

  • I have looked for similar issues before submitting this one.
  • I understand the developers of this program are human, and I will ask my questions politely.
  • I understand that the developers have lives and my issue will be answered when possible.
