Some Questions before getting started #68
Hi! First off, thanks for what looks like an incredible tool. Seriously, this is truly impressive work. I have a few questions I was hoping you could answer before I dive in.
Thanks again for such a cool project. I look forward to bugging you with more questions as I get used to it!

Seth
First of all - I'm cracking up at the Coral support! Good one!

GPUs have schedulers and memory allocation much like a CPU does - they just don't have the concept of swap space (there are some approaches to shuffle memory between VRAM and RAM, but let's ignore that for now). As long as everything fits in VRAM you can have any number of applications share the same GPU and it will figure out the scheduling just like a CPU. A Pascal-series card with 8GB of VRAM is what I consider the "five year minimum" supported GPU for Willow, which is why I generally recommend the GTX 1070/Tesla P4.

That said, an RTX 3090 is of course better in every way. First, it's significantly faster - you can view the benchmarks in the README to get an idea of how much faster. Second, looking at nvidia-smi on one of my RTX 3090 WIS instances with WIS running, WIS (with all three Whisper models and TTS loaded) consumes 4.7GB of memory. We're going to optimize the TTS models to bring that down and allow more granular model loading, but this is why I suggest GPUs with 8GB VRAM - plenty of room to grow.

In terms of the RTX 3090, you can easily run Stable Diffusion (approximately 4GB VRAM on my hosted instance) and a bunch of other CUDA applications alongside WIS. In fact, with 24GB of VRAM you can run WIS, Vicuna 13B, and Stable Diffusion concurrently on an RTX 3090 (as one example of using the VRAM).

WIS is a perfect example of an application suited to sharing a GPU - wake word speech recognition is a 100x-a-day task (or whatever) and the GPU is idle otherwise. Additionally, most command-based speech recognition sessions are so fast and use so little data that on something like an RTX 3090 we barely hit 30% GPU utilization for a couple hundred milliseconds.

Wake words... I'm learning that this is an area where it will be impossible to make everyone happy. Training a real wake word that actually works is an expensive (at least tens of thousands of dollars) and time-consuming task. No matter how many times I try to explain it, a portion of the open source community simply refuses to understand the fundamental, practical issues related to wake words. They are free to try putting together a Raspberry Pi and one of the open source "wake engines" that "support" custom wake words and experience for themselves how atrocious that is. It simply does not work, and I highly recommend they attempt it as I and many others have over the past decade. I guarantee the vast majority will be thrilled to come back to Willow and just get over saying one of our supported wake words.

My plan is to start Kickstarters for the top X wake words suggested/voted on by the community. The ones that raise enough to cover the process will go into Willow. Basically, if you want one of the candidate wake words, put your money where your mouth is - a very apropos expression for a speech-related project. That said, "Hi Willow"/"Hey Willow" will almost certainly be included in the candidates.

However, there's another thing they seem to refuse to understand. I get suggestions for "Computer" a la Star Trek. That is a firm NO - the false activations will be off the charts. They also like to suggest words with 1-2 syllables. That is a firm NO as well (the commercial standard is a three-syllable minimum). I would really hope certain elements of the open source community could try to understand that this is now a 10-year-old field that has demonstrated real-world success and feasibility.

We know what we're doing by now, yet somehow they seem insistent on making the same mistakes we've already learned lessons from over the past decade. I am not going to let Willow go to the graveyard of failed speech projects that refuse to learn from these past mistakes. It's time to accept reality, because that graveyard is full.

Thank you and you're welcome!

Kristian
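As a rough, hands-on illustration of the VRAM-headroom point above, here is a minimal sketch, assuming a CUDA-capable host with PyTorch installed. The ~4GB Stable Diffusion figure is taken from the reply above; the function name and the 20% slack factor are illustrative assumptions, not anything from Willow/WIS itself.

```python
# Minimal sketch: check whether an additional workload is likely to fit in VRAM
# before co-locating it with WIS on the same GPU. The free/total readings are
# real queries; the workload size and slack factor are illustrative assumptions.
import torch

GiB = 1024 ** 3

def vram_headroom_gib(device: int = 0) -> float:
    """Return free VRAM on the given CUDA device, in GiB."""
    free_bytes, _total_bytes = torch.cuda.mem_get_info(device)
    return free_bytes / GiB

if __name__ == "__main__":
    if not torch.cuda.is_available():
        raise SystemExit("No CUDA device visible")

    free_gib = vram_headroom_gib(0)
    # Hypothetical footprint of the extra workload you want to co-locate,
    # e.g. a Stable Diffusion instance (~4 GiB, per the thread above).
    extra_workload_gib = 4.0

    if free_gib > extra_workload_gib * 1.2:  # keep ~20% slack for activations
        print(f"{free_gib:.1f} GiB free - co-locating should be fine")
    else:
        print(f"Only {free_gib:.1f} GiB free - risk of CUDA out-of-memory")
```

The same idea applies to any mix of CUDA applications: as long as the sum of their footprints stays comfortably under total VRAM, the GPU scheduler handles the rest.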
Great thread. Kudos to @kristiankielhofner for the wonderfully thorough replies.

@lordratner If you can spare the dough, I strongly suggest getting a 3090, especially if you're going to work with LLMs. VRAM isn't something you can upgrade, and as soon as you start exploring all the things you can run, you're very likely going to want more. The 3090 and 4090 are the only consumer-level cards with 24GB of VRAM, and inference is only about 30% faster on the 4090 despite it being twice as expensive. Also, 3090s are coming down to ~$600 on eBay or FB Marketplace if you're patient.

Once you go down from there, you're looking at 16GB max, which is a major drop, and the prices aren't much cheaper at that point. Taking one more step down in terms of VRAM, from what I've seen on Reddit's gaming forums a lot of people say the 3060 12GB is a viable option at about $250. If you're price-constrained, this would be a reasonable choice.

I'd personally try to get a 30-series card, as these will have the longest life. CUDA seems to have decent support going back as far as Pascal (P4 / P40), and Turing (20-series) does have Tensor Cores, but the 30s really seem to be the best bet for price/performance/longevity.
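To complement the hardware advice above, here is a minimal sketch, again assuming PyTorch with CUDA, that reports each visible GPU's compute capability and VRAM and flags whether it meets the "Pascal series + 8GB VRAM" guideline from earlier in the thread. The threshold constants mirror that guideline; nothing else comes from Willow itself.

```python
# Minimal sketch: report each visible GPU's name, compute capability and VRAM,
# and flag whether it meets the "Pascal + 8GB" guideline discussed above.
import torch

GiB = 1024 ** 3
MIN_COMPUTE_MAJOR = 6   # Pascal cards are compute capability 6.x
MIN_VRAM_GIB = 8

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    vram_gib = props.total_memory / GiB
    ok = props.major >= MIN_COMPUTE_MAJOR and vram_gib >= MIN_VRAM_GIB
    print(f"GPU {i}: {props.name}, sm_{props.major}{props.minor}, "
          f"{vram_gib:.1f} GiB -> {'meets' if ok else 'below'} the guideline")
```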