OOM in infer_postprocessing.py

Improvement ideas:
- [x] Adapt infer_postprocessing/vocoder to fall back to the CPU device when CUDA is not available, so that `CUDA_VISIBLE_DEVICES=""` can be used to avoid the OOM. (implemented in #21)
- [ ] Check whether the vocoder uses a 32-bit model and could easily be adapted to fp16.
- [ ] Check whether there are further optimizations for the vocoder, e.g. processing the input in chunks or possibly using 8-bit models.
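
For the fp16 item, a quick way to see which precision a loaded model actually uses is to count its parameters per dtype. This is only a sketch: `parameter_dtypes` is a hypothetical helper, and the small `torch.nn.Linear` stands in for the vocoder, whose real loading code is not shown here.

```python
from collections import Counter

import torch


def parameter_dtypes(model: torch.nn.Module) -> dict:
    """Return the number of parameter elements per dtype (hypothetical helper)."""
    counts = Counter()
    for param in model.parameters():
        counts[param.dtype] += param.numel()
    return dict(counts)


# Stand-in for the vocoder; the real model would be loaded by the repo's own code.
toy_model = torch.nn.Linear(4, 2)
print(parameter_dtypes(toy_model))

# .half() converts floating-point parameters to fp16 in place and returns the module.
print(parameter_dtypes(toy_model.half()))
```

If the vocoder reports only `torch.float32`, calling `.half()` on it would roughly halve the memory taken by the weights, but whether the output quality survives that is something that would have to be tested.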

When generating a long song, I get an out-of-memory (OOM) error during postprocessing in soundstream_hubert_new.

I wonder if the postprocessing could be chunked. I am also not sure if the model is loaded in fp16 or full precision.
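
To make the chunking idea concrete, here is a minimal sketch of the index arithmetic, written in plain Python so it does not depend on the actual decoder. `chunk_bounds` and `process_in_chunks` are hypothetical names, and the sketch assumes the per-chunk function can be applied independently to each slice, which may not hold for the real model.

```python
def chunk_bounds(total, chunk_size, overlap=0):
    """Return (start, end) index pairs that together cover range(total)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    bounds = []
    start = 0
    while start < total:
        end = min(start + chunk_size, total)
        bounds.append((start, end))
        if end == total:
            break  # the last chunk reached the end of the input
        start += step
    return bounds


def process_in_chunks(frames, fn, chunk_size):
    """Apply fn to consecutive, non-overlapping slices and concatenate the results."""
    out = []
    for start, end in chunk_bounds(len(frames), chunk_size):
        out.extend(fn(frames[start:end]))
    return out


# Toy usage: doubling each value stands in for the expensive decode step.
print(process_in_chunks(list(range(10)), lambda chunk: [x * 2 for x in chunk], 4))
```

With tensors, the same bounds could be used to slice along the time axis so that only one chunk is on the GPU at a time. A decoder with a receptive field across frames would probably need the `overlap` argument plus some trimming or cross-fading at the seams to avoid audible artifacts.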

When trying with CPU only, I also found that some torch calls in the vocoder module need `map_location` to be set. I will probably look into this, at least for a CPU fallback, but the cuda_idx parameter should probably be used there as well.
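
A rough sketch of what such a fallback could look like is below. `device_name` and `load_checkpoint` are hypothetical helpers, not functions from the repository, and the checkpoint path is whatever the vocoder module currently passes to `torch.load`.

```python
def device_name(cuda_available: bool, cuda_idx: int = 0) -> str:
    """Pick the device string: the requested GPU if CUDA is usable, else the CPU."""
    return f"cuda:{cuda_idx}" if cuda_available else "cpu"


def load_checkpoint(path: str, cuda_idx: int = 0):
    """Load a checkpoint onto the selected device (hypothetical helper)."""
    # Imported here so device_name() stays usable on machines without torch.
    import torch

    device = torch.device(device_name(torch.cuda.is_available(), cuda_idx))
    # map_location remaps tensors saved from a GPU, so loading also works on CPU.
    state = torch.load(path, map_location=device)
    return state, device


print(device_name(False))    # what a CPU-only run would select
print(device_name(True, 1))  # what a run with cuda_idx=1 would select
```

With `CUDA_VISIBLE_DEVICES=""`, `torch.cuda.is_available()` returns `False`, so this would select the CPU without any further flags.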

OOM in infer_postprocessing.py #19
