Issues with 0.6B model using either EXL2 or EXL2ASYNC #91

@zeropointnine

Description

I've been using the 1.0B model with the EXL2 backend to great effect, but am having issues with the 0.6B model.

Using `Backend.EXL2` with the 0.6B model throws an error: `RuntimeError: torch.cat(): expected a non-empty list of Tensors`
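For reference, that exact message is what PyTorch raises when `torch.cat` is handed an empty list. A minimal standalone reproduction (the guess that the EXL2 backend is building an empty tensor list somewhere for this model is an assumption, not something confirmed here):

```python
import torch

# torch.cat requires at least one tensor; an empty list raises the
# same RuntimeError reported above.
try:
    torch.cat([])
except RuntimeError as e:
    print(e)  # torch.cat(): expected a non-empty list of Tensors
```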

Using `Backend.EXL2ASYNC` with the 0.6B model does work, but for some reason VRAM usage spills over significantly into shared system memory, making it impractical.

My `ModelConfig` settings:

```python
MODEL_CONFIG = outetts.ModelConfig(
    model_path=r"C:\Users\me\.cache\huggingface\hub\models--OuteAI--OuteTTS-1.0-0.6B\snapshots\12345",
    interface_version=outetts.InterfaceVersion.V3,
    backend=outetts.Backend.EXL2,  # or EXL2ASYNC
    device="cuda",
    dtype=torch.bfloat16,
)
```

Running Windows 11 on an RTX 3080 Ti, with the latest versions of the outetts and exllamav2 libraries.
