I've discovered an additional issue, which is that even if you run with the `TORCH_BLAS_PREFER_HIPBLASLT=0` workaround, it will still fail with the same errors when using FP8 quantization. Here, 3/8 of the threads fail to load. Presumably, the FP8 quantization kernels require hipBLASLt and so can't run with the current bug? BTW, surprisingly when testing at
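For reference, a minimal sketch of the failing FP8 configuration using vLLM's offline `LLM` API; the model name, prompt, and entry point here are placeholder assumptions for illustration, not the exact command from this report:

```python
# Rough reproduction sketch (placeholders, not a verbatim repro): FP8
# quantization at tensor parallel size 8 via vLLM's offline LLM API on a
# gfx942 (MI300-class) node.
from vllm import LLM, SamplingParams

# Even with TORCH_BLAS_PREFER_HIPBLASLT=0 exported, the FP8 GEMM path still
# appears to go through hipBLASLt, so the same load errors show up here.
llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # placeholder model
    quantization="fp8",
    tensor_parallel_size=8,
)
out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```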
-
FYI, I've filed a curious bug I've encountered with PyTorch, but I at least want to mention it here (I won't create a duplicate issue yet unless they triage it as not their problem): pytorch/pytorch#137695

Basically, running the latest vLLM (HEAD) and the PyTorch nightly it depends on, hipBLASLt is used by default, and it works for `-tp 1` to `-tp 4`, but at `-tp 8` it consistently starts to report errors loading `TensileLibrary_lazy_gfx942.dat`. The workaround is to use `TORCH_BLAS_PREFER_HIPBLASLT=0`, and at least for tp 1-4, this is slightly faster anyway in my vLLM benchmark_throughput testing.

Leaving this here to potentially save some people some hair-pulling, as it took me a while to debug (since it works on lower tps).
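A minimal sketch of applying the workaround, assuming the variable is set before PyTorch initializes its BLAS backend; exporting `TORCH_BLAS_PREFER_HIPBLASLT=0` in the shell before launching the server or benchmark script is the safest way to make sure spawned tensor-parallel workers also see it. The model name and prompt below are placeholders:

```python
# Minimal workaround sketch (model name is a placeholder): disable the
# hipBLASLt preference before torch / vLLM are imported so GEMMs fall back to
# rocBLAS. Exporting the variable in the shell before launching has the same
# effect and more reliably reaches spawned tensor-parallel workers.
import os
os.environ["TORCH_BLAS_PREFER_HIPBLASLT"] = "0"

from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # placeholder model
    tensor_parallel_size=8,  # the size where the TensileLibrary errors appeared
)
out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```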