Replies: 2 comments 5 replies
- I know that the 1080 Ti was Pascal, but are there any chat models that will run on Pascal, or is Volta (and later) a requirement?
  
- Can you please provide me with a little more info regarding OobaBooga? How to set it up, etc.?
  
            
  
-
When I updated the custom_setting.py file (copied from settings.py) with:

```python
# Path to chatbot model - download from HuggingFace at runtime by default (gets cached)
chatbot_model_path: str = 'TheBloke/vicuna-7b-v1.5-GPTQ'
support_chatbot: bool = True
```

and restarted WIS, I noticed these messages:

```
...
willow-inference-server-wis-1 | [2023-11-02 13:49:52 +0000] [99] [INFO] CUDA: Detected 1 device(s)
willow-inference-server-wis-1 | [2023-11-02 13:49:52 +0000] [99] [INFO] CUDA: Device 0 name: NVIDIA GeForce GTX 1080 Ti
willow-inference-server-wis-1 | [2023-11-02 13:49:52 +0000] [99] [INFO] CUDA: Device 0 capability: 61
willow-inference-server-wis-1 | [2023-11-02 13:49:53 +0000] [99] [INFO] CUDA: Device 0 total memory: 11711873024 bytes
willow-inference-server-wis-1 | [2023-11-02 13:49:53 +0000] [99] [INFO] CUDA: Device 0 free memory: 11214716928 bytes
willow-inference-server-wis-1 | [2023-11-02 13:49:53 +0000] [99] [WARNING] CUDA: Device 0 is pre-Volta, forcing int8
willow-inference-server-wis-1 | [2023-11-02 13:49:53 +0000] [99] [WARNING] CUDA: Device 0 is pre-Volta, disabling chatbot
```

Does this mean the GTX 1080 Ti is not compatible with the chatbot functions?
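For context, the `capability: 61` in the log is the CUDA compute capability printed as major*10 + minor: 6.1 is Pascal (GTX 1080 Ti), while Volta starts at 7.0. A minimal sketch of that check (an illustration of the log's logic, not WIS's actual code):

```python
# Illustration only, not WIS's actual implementation: the log encodes CUDA
# compute capability as major*10 + minor (e.g. "capability: 61" = 6.1, Pascal).
def is_pre_volta(capability: int) -> bool:
    """Volta is compute capability 7.0; WIS's log shows it disabling the
    chatbot and forcing int8 on anything below that."""
    return capability < 70

print(is_pre_volta(61))  # GTX 1080 Ti (Pascal, 6.1) -> True, chatbot disabled
print(is_pre_volta(70))  # a Volta-class GPU (7.0) -> False
```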