OOM in infer_postprocessing.py #19

@allo-

Description

Improvement ideas:

  • Adapt infer_postprocessing/vocoder to fall back to the CPU when CUDA is not available, so that CUDA_VISIBLE_DEVICES="" can be used to avoid the OOM (implemented in Allow to change the device used in postprocessing #21); see the sketch after this list.
  • Check whether the vocoder uses a 32-bit model that could easily be adapted to fp16.
  • Check whether the vocoder can be optimized further, e.g. by processing the input in chunks or possibly by using 8-bit models.
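A minimal sketch of what the device fallback could look like; the helper name select_device and the cuda_idx default are my own, not taken from the repo:

```python
import torch

def select_device(cuda_idx: int = 0) -> torch.device:
    # Fall back to the CPU when no CUDA device is visible, so that
    # CUDA_VISIBLE_DEVICES="" forces postprocessing off the GPU.
    if torch.cuda.is_available():
        return torch.device(f"cuda:{cuda_idx}")
    return torch.device("cpu")

device = select_device()
# vocoder = vocoder.to(device)
# If the checkpoint turns out to be fp32, halving it may already avoid the OOM:
# vocoder = vocoder.half() if device.type == "cuda" else vocoder
```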

When creating a long song, I get an OOM during postprocessing in soundstream_hubert_new.

I wonder whether the postprocessing could be chunked. I am also not sure whether the model is loaded in fp16 or in full precision.
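A rough sketch of chunked decoding, assuming the vocoder maps a code sequence to a waveform along the last dimension; the interface and the chunk_len value are assumptions, and without overlap between chunks there may be audible seams at the boundaries:

```python
import torch

def decode_in_chunks(vocoder, codes: torch.Tensor, chunk_len: int = 2048) -> torch.Tensor:
    # Decode fixed-size slices and concatenate the results, trading a
    # little speed for a much smaller peak memory footprint.
    pieces = []
    with torch.no_grad():
        for start in range(0, codes.shape[-1], chunk_len):
            chunk = codes[..., start : start + chunk_len]
            pieces.append(vocoder(chunk).cpu())  # move each result off-GPU right away
    return torch.cat(pieces, dim=-1)
```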

When trying with CPU only, I also found that some torch calls in the vocoder module need map_location to be set. I will probably look into this for at least a CPU fallback, but the cuda_idx parameter should probably also be respected there.
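For reference, the usual fix is to pass map_location when loading the checkpoint; the path below is a placeholder:

```python
import torch

device = torch.device("cpu")  # or torch.device(f"cuda:{cuda_idx}")
# Without map_location, torch.load tries to restore tensors onto the device
# they were saved from, which fails for CUDA checkpoints on a CPU-only machine.
state_dict = torch.load("checkpoint.pth", map_location=device)
```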
