Hey folks, I was running NVFP4A16 quantization on Llama4 on an 8xB200 node using llm-compressor.
It had almost finished after 4+ hours when my process crashed! I then realized it was running on the CPU, so I switched it to CUDA with all 8 GPUs and it ran blazing fast.
So I thought to myself: why doesn't exec_device have an option to use CUDA? Thanks!