Open
Labels: bug (Something isn't working)
Description
OS
Linux (the paths and start.sh in the logs below indicate Linux, not Windows)
GPU Library
CUDA 12.x
Python version
3.12
Pytorch version
2.6.0+cu124
Model
https://huggingface.co/CAPsMANyo/Qwen3-32B_exl2/tree/4.25
Describe the bug
When trying to load the new Qwen3 32B model, the module load completes and the server then immediately crashes with the traceback below.
Reproduction steps
Run start.sh with this model configured; the crash occurs during the model load (see the config sketch below).
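For completeness, here is a sketch of the config.yml entries this setup appears to use. The field names follow TabbyAPI's sample config, and the values are inferred from the log output further down, so treat this as an approximation rather than the exact config in use:

```yaml
# Hypothetical config.yml excerpt, reconstructed from the log output.
# Field names follow TabbyAPI's sample config and may need adjusting.
network:
  disable_auth: true          # matches the authentication warning in the log

logging:
  log_prompt: true            # "Generation logging is enabled for: prompts"

model:
  model_dir: /home/thomas/text-generation-webui/models
  model_name: Qwen3-32B-exl   # directory from the "Loading model" log line
  tensor_parallel: true       # "Loading with tensor parallel"
```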
Expected behavior
The model was expected to load successfully and begin serving.
Logs
Activating venv
pip 24.3.1 from /home/thomas/tabbyAPI_NEW/venv/lib/python3.12/site-packages/pip (python 3.12)
Loaded your saved preferences from `start_options.json`
Starting TabbyAPI...
2025-04-30 13:18:44.754 INFO: ExllamaV2 version: 0.2.8
2025-04-30 13:18:44.786 WARNING: Disabling authentication makes your instance vulnerable. Set the `disable_auth` flag to False in config.yml if you want to share this instance with others.
2025-04-30 13:18:44.787 INFO: Generation logging is enabled for: prompts
2025-04-30 13:18:44.817 WARNING: Draft model is disabled because a model name wasn't provided. Please check your config.yml!
2025-04-30 13:18:44.818 WARNING: An unsupported GPU is found in this configuration. Switching to compatibility mode.
2025-04-30 13:18:44.818 WARNING: This disables parallel batching and features that rely on it (ex. CFG).
2025-04-30 13:18:44.818 WARNING: To disable compatability mode, all GPUs must be ampere (30 series) or newer. AMD GPUs are not supported.
2025-04-30 13:18:44.819 INFO: Attempting to load a prompt template if present.
2025-04-30 13:18:44.820 WARNING: TemplateLoadError: Model JSON path "/home/thomas/text-generation-webui/models/Qwen3-32B-exl/chat_template.json" not found.
2025-04-30 13:18:44.842 INFO: Using template "from_tokenizer_config" for chat completions.
2025-04-30 13:18:45.388 INFO: Loading model: /home/thomas/text-generation-webui/models/Qwen3-32B-exl
2025-04-30 13:18:45.388 INFO: Loading with tensor parallel
Loading model modules ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 131/131 0:00:00
Traceback (most recent call last):
File "/home/thomas/tabbyAPI_NEW/start.py", line 291, in <module>
entrypoint(args, parser)
File "/home/thomas/tabbyAPI_NEW/main.py", line 166, in entrypoint
asyncio.run(entrypoint_async())
File "/home/thomas/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 195, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/home/thomas/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/home/thomas/tabbyAPI_NEW/main.py", line 71, in entrypoint_async
await model.load_model(
File "/home/thomas/tabbyAPI_NEW/common/model.py", line 112, in load_model
async for _ in load_model_gen(model_path, **kwargs):
File "/home/thomas/tabbyAPI_NEW/common/model.py", line 90, in load_model_gen
async for module, modules in load_status:
File "/home/thomas/tabbyAPI_NEW/backends/exllamav2/model.py", line 570, in load_gen
async for value in iterate_in_threadpool(model_load_generator):
File "/home/thomas/tabbyAPI_NEW/common/concurrency.py", line 30, in iterate_in_threadpool
yield await asyncio.to_thread(gen_next, generator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/thomas/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/thomas/.local/share/uv/python/cpython-3.12.9-linux-x86_64-gnu/lib/python3.12/concurrent/futures/thread.py", line 59, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/thomas/tabbyAPI_NEW/common/concurrency.py", line 20, in gen_next
return next(generator)
^^^^^^^^^^^^^^^
File "/home/thomas/tabbyAPI_NEW/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 57, in generator_context
response = gen.send(request)
^^^^^^^^^^^^^^^^^
File "/home/thomas/tabbyAPI_NEW/backends/exllamav2/model.py", line 731, in load_model_sync
self.model.forward(input_ids, cache=self.cache, preprocess_only=True)
File "/home/thomas/tabbyAPI_NEW/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/thomas/tabbyAPI_NEW/venv/lib/python3.12/site-packages/exllamav2/model.py", line 900, in forward
r = self.forward_chunk(
^^^^^^^^^^^^^^^^^^^
File "/home/thomas/tabbyAPI_NEW/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^
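To help isolate whether the crash comes from TabbyAPI or from exllamav2 itself, a minimal script can replay the same `forward(..., preprocess_only=True)` call from the bottom of the traceback directly through exllamav2. This is a sketch, not the project's own repro: the model path is taken from the log, the API usage assumes exllamav2 0.2.8 (the version shown in the log), and it deliberately uses autosplit loading instead of tensor parallel as a first A/B test:

```python
# Minimal isolation sketch, assuming exllamav2 0.2.8 (version from the log).
# Loads the same model directly and replays the preprocess-only forward pass
# that appears at the bottom of the traceback.
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Cache,
    ExLlamaV2Config,
    ExLlamaV2Tokenizer,
)

MODEL_DIR = "/home/thomas/text-generation-webui/models/Qwen3-32B-exl"  # path from the log

config = ExLlamaV2Config(MODEL_DIR)
model = ExLlamaV2(config)

# Autosplit instead of tensor parallel: if this load succeeds while
# TabbyAPI's TP load crashes, the problem likely sits in the TP path.
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)
input_ids = tokenizer.encode("Hello")

# Same call as backends/exllamav2/model.py line 731 in the traceback.
model.forward(input_ids, cache=cache, preprocess_only=True)
print("forward(preprocess_only=True) completed without crashing")
```

If this script crashes the same way, the bug is reproducible in exllamav2 alone and could be reported there; if it only crashes when loaded with tensor parallel, that narrows the problem to the TP load path.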
Additional context
No response
Acknowledgements
- I have looked for similar issues before submitting this one.
- I understand the developers of this program are human, and I will ask my questions politely.
- I understand that the developers have lives and my issue will be answered when possible.