Could the quantization compressor run with a cuda device? #389

@ColinPeppler

Description

Hey folks, I was running NVFP4A16 quantization on Llama 4 on an 8xB200 node using llm-compressor.

It had almost finished after 4+ hours when my process crashed! I then realized it was running on CPU, so I switched it to CUDA with all 8 GPUs and it ran blazing fast.

So I thought to myself: why doesn't `exec_device` have an option to use CUDA? See the sketch below the link. Thanks!

https://github.com/neuralmagic/compressed-tensors/blame/40ec65b878fa7996b9deb22e242f633b0d6ec338/src/compressed_tensors/compressors/model_compressors/model_compressor.py#L398
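For context, here's a minimal sketch of the kind of option I'm imagining. The function name, signature, and body are purely illustrative and not the actual compressed-tensors API; the point is just letting the caller choose the execution device instead of hardcoding CPU:

```python
import torch

def compress_module_weights(module: torch.nn.Module, exec_device: str = "cpu") -> None:
    # Hypothetical sketch: run the compression math on `exec_device`
    # (e.g. "cuda") instead of always on CPU, then move the module back
    # to wherever it originally lived so GPU memory isn't held afterwards.
    original_device = next(module.parameters()).device
    module.to(exec_device)
    # ... quantize / compress the weights here ...
    module.to(original_device)
```

Callers on a multi-GPU box could then pass `exec_device="cuda"` while everyone else keeps the current CPU default.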
