Could the quantization compressor run with a cuda device? #389

@ColinPeppler

Description

Hey folks, I was running NVFP4A16 quantization on Llama 4 on an 8xB200 node using llm-compressor.

It had almost finished after 4+ hours when my process crashed! I then realized it was running on CPU, so I switched it to CUDA with all 8 GPUs and it ran blazing fast.

So I thought to myself: why doesn't `exec_device` have an option to use CUDA? See the sketch below the link. Thanks!

https://github.com/neuralmagic/compressed-tensors/blame/40ec65b878fa7996b9deb22e242f633b0d6ec338/src/compressed_tensors/compressors/model_compressors/model_compressor.py#L398
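For context, here's a minimal sketch of the kind of option I'm imagining. The function name, signature, and body are purely illustrative and not the actual compressed-tensors API; the point is just letting the caller choose the execution device instead of hardcoding CPU:

```python
import torch

def compress_module_weights(module: torch.nn.Module, exec_device: str = "cpu") -> None:
    # Hypothetical sketch: run the compression math on `exec_device`
    # (e.g. "cuda") instead of always on CPU, then move the module back
    # to wherever it originally lived so GPU memory isn't held afterwards.
    original_device = next(module.parameters()).device
    module.to(exec_device)
    # ... quantize / compress the weights here ...
    module.to(original_device)
```

Callers on a multi-GPU box could then pass `exec_device="cuda"` while everyone else keeps the current CPU default.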
