Improvement ideas:
- Adapt infer_postprocessing/vocoder to fall back to the CPU device when CUDA is not available, so that CUDA_VISIBLE_DEVICES="" can be used to avoid OOM. (Implemented in "Allow to change the device used in postprocessing" #21.)
- Check whether the vocoder uses a 32-bit model and could easily be adapted to fp16.
- Check whether there are further optimizations for the vocoder, e.g. processing the input in chunks or possibly also using 8-bit models.
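To make the chunking idea concrete, here is a minimal sketch of splitting a long input into fixed-size windows and decoding them one at a time, so that only one window's activations are in memory at once. It is not taken from the repository: `chunk_bounds`, `decode_in_chunks` and the `decode` callable are hypothetical names, and plain lists stand in for tensors. A real vocoder would probably also need some overlap or crossfade at the chunk boundaries to avoid audible seams.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def chunk_bounds(total: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) index pairs covering range(total) in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total, chunk_size):
        # The last chunk may be shorter than chunk_size.
        yield start, min(start + chunk_size, total)


def decode_in_chunks(
    frames: Sequence,
    decode: Callable[[Sequence], List],
    chunk_size: int,
) -> List:
    """Apply `decode` (a stand-in for the vocoder call) chunk by chunk."""
    output: List = []
    for start, end in chunk_bounds(len(frames), chunk_size):
        # Only this slice is handed to the decoder at any one time.
        output.extend(decode(frames[start:end]))
    return output
```

With tensors, the same index pairs could be used to slice along the time axis, and the per-chunk outputs concatenated afterwards.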
When creating a long song, I get an OOM during postprocessing in soundstream_hubert_new.
I wonder whether the postprocessing could be chunked. I am also not sure whether the model is loaded in fp16 or in full precision.
When trying CPU only, I also found that some torch calls in the vocoder module need map_location to be set. I will probably look into this, at least for a CPU fallback, but the cuda_idx parameter should probably be used there as well.
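A CPU fallback along these lines might look like the following sketch. The helper names `pick_device` and `load_checkpoint` are hypothetical and not part of the repository; the sketch only assumes that the checkpoint is a file readable by `torch.load`. Passing `map_location` lets a checkpoint that was saved from a GPU be loaded when no CUDA device is visible.

```python
import torch


def pick_device(cuda_idx: int = 0) -> torch.device:
    """Use the requested CUDA device if CUDA is available, otherwise the CPU."""
    if torch.cuda.is_available():
        return torch.device(f"cuda:{cuda_idx}")
    # Also reached when CUDA_VISIBLE_DEVICES="" hides all GPUs.
    return torch.device("cpu")


def load_checkpoint(path: str, device: torch.device):
    """Load a checkpoint and remap its tensors onto `device`."""
    return torch.load(path, map_location=device)
```

Running with CUDA_VISIBLE_DEVICES="" should then make `pick_device()` return the CPU device, trading speed for avoiding the GPU OOM.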