Some Questions before getting started #68
Hi! First off, thanks for what looks like an incredible tool. Seriously, this is truly impressive work. I have a few questions I was hoping you could answer before I dive in.
Thanks again for such a cool project. I look forward to bugging you with more questions as I get used to it!

Seth
First of all - I'm cracking up at the Coral support! Good one!

GPUs have schedulers and memory allocation much like a CPU does - they just don't have the concept of swap space (there are some approaches to shuffle memory between VRAM and RAM, but let's ignore that for now). As long as everything fits in VRAM you can have any number of applications share the same GPU and it will figure out the scheduling just like a CPU. A Pascal-series card with 8GB of VRAM is what I consider the "five year minimum" supported GPU for Willow, which is why I generally recommend the GTX 1070/Tesla P4.

That said, an RTX 3090 is of course better in every way. First, it's significantly faster - you can view the benchmarks in the README to get an idea of how much faster. Second, looking at nvidia-smi on one of my RTX 3090 WIS instances with WIS running, WIS (with all three Whisper models and TTS loaded) consumes 4.7GB of memory. We're going to optimize the TTS models to bring that down and allow more granular model loading, but this is why I suggest GPUs with 8GB VRAM - plenty of room to grow.

In terms of the RTX 3090, you can easily run Stable Diffusion (approximately 4GB VRAM on my hosted instance) and a bunch of other CUDA applications alongside WIS. In fact, with 24GB of VRAM you can run WIS, Vicuna 13B, and Stable Diffusion concurrently on an RTX 3090 (as one example of using the VRAM).

WIS is a perfect example of an application suited to sharing a GPU - wake word speech recognition is a 100x-a-day task (or whatever) and the GPU is idle otherwise. Additionally, most command-based speech recognition sessions are so fast and use so little data that on something like an RTX 3090 we barely hit 30% GPU utilization for a couple hundred milliseconds.

Wake words... I'm learning that this is an area where it will be impossible to make everyone happy. Training a real wake word that actually works is an expensive (at least tens of thousands of dollars) and time-consuming task. No matter how many times I try to explain it, a portion of the open source community simply refuses to understand the fundamental, practical issues related to wake words. They are free to try putting together a Raspberry Pi and one of the open source "wake engines" that "support" custom wake words and experience for themselves how atrocious that is. It simply does not work, and I highly recommend they attempt it as I and many others have over the past decade. I guarantee the vast majority will be thrilled to come back to Willow and just get over saying one of our supported wake words.

My plan is to start Kickstarters for the top X wake words suggested/voted on by the community. The ones that raise enough to cover the process will go into Willow. Basically, if you want one of the candidate wake words, put your money where your mouth is - a very apropos expression for a speech-related project. That said, "Hi Willow"/"Hey Willow" will almost certainly be included in the candidates.

However, there's another thing they seem to refuse to understand. I get suggestions for "Computer" a la Star Trek. That is a firm NO - the false activations will be off the charts. They also like to suggest words with 1-2 syllables. That is a firm NO as well (the commercial standard is a three-syllable minimum). I would really hope certain elements of the open source community could try to understand that this is now a 10-year-old field that has demonstrated real-world success and feasibility.

We know what we're doing by now, yet somehow they seem insistent on making the same mistakes we've already learned lessons from over the past decade. I am not going to let Willow go to the graveyard of failed speech projects that refuse to learn from these past mistakes. It's time to accept reality, because that graveyard is full.

Thank you and you're welcome!

Kristian
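As a rough, hands-on illustration of the VRAM-headroom point above, here is a minimal sketch, assuming a CUDA-capable host with PyTorch installed. The ~4GB Stable Diffusion figure is taken from the reply above; the function name and the 20% slack factor are illustrative assumptions, not anything from Willow/WIS itself.

```python
# Minimal sketch: check whether an additional workload is likely to fit in VRAM
# before co-locating it with WIS on the same GPU. The free/total readings are
# real queries; the workload size and slack factor are illustrative assumptions.
import torch

GiB = 1024 ** 3

def vram_headroom_gib(device: int = 0) -> float:
    """Return free VRAM on the given CUDA device, in GiB."""
    free_bytes, _total_bytes = torch.cuda.mem_get_info(device)
    return free_bytes / GiB

if __name__ == "__main__":
    if not torch.cuda.is_available():
        raise SystemExit("No CUDA device visible")

    free_gib = vram_headroom_gib(0)
    # Hypothetical footprint of the extra workload you want to co-locate,
    # e.g. a Stable Diffusion instance (~4 GiB, per the thread above).
    extra_workload_gib = 4.0

    if free_gib > extra_workload_gib * 1.2:  # keep ~20% slack for activations
        print(f"{free_gib:.1f} GiB free - co-locating should be fine")
    else:
        print(f"Only {free_gib:.1f} GiB free - risk of CUDA out-of-memory")
```

The same idea applies to any mix of CUDA applications: as long as the sum of their footprints stays comfortably under total VRAM, the GPU scheduler handles the rest.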
Great thread. Kudos to @kristiankielhofner for the wonderfully thorough replies.

@lordratner If you can spare the dough, I strongly suggest getting a 3090, especially if you're going to work with LLMs. VRAM isn't something you can upgrade, and as soon as you start exploring all the things you can run, you're very likely going to want more. The 3090 and 4090 are the only consumer-level cards with 24GB of VRAM, and inference is only about 30% faster on the 4090 despite it being twice as expensive. Also, 3090s are coming down to ~$600 on eBay or FB Marketplace if you're patient.

Once you go down from there, you're looking at 16GB max, which is a major drop, and the prices aren't much cheaper at that point. Taking one more step down in terms of VRAM, from what I've seen on Reddit's gaming forums a lot of people say the 3060 12GB is a viable option at about $250. If you're price-constrained, this would be a reasonable choice.

I'd personally try to get a 30-series card, as these will have the longest life. CUDA seems to have decent support going back as far as Pascal (P4 / P40), and Turing (20-series) does have Tensor Cores, but the 30s really seem to be the best bet for price/performance/longevity.
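To complement the hardware advice above, here is a minimal sketch, again assuming PyTorch with CUDA, that reports each visible GPU's compute capability and VRAM and flags whether it meets the "Pascal series + 8GB VRAM" guideline from earlier in the thread. The threshold constants mirror that guideline; nothing else comes from Willow itself.

```python
# Minimal sketch: report each visible GPU's name, compute capability and VRAM,
# and flag whether it meets the "Pascal + 8GB" guideline discussed above.
import torch

GiB = 1024 ** 3
MIN_COMPUTE_MAJOR = 6   # Pascal cards are compute capability 6.x
MIN_VRAM_GIB = 8

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    vram_gib = props.total_memory / GiB
    ok = props.major >= MIN_COMPUTE_MAJOR and vram_gib >= MIN_VRAM_GIB
    print(f"GPU {i}: {props.name}, sm_{props.major}{props.minor}, "
          f"{vram_gib:.1f} GiB -> {'meets' if ok else 'below'} the guideline")
```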