Potential Hardware Failure when running vllm #7682
Closed
NicolasDrapier announced in Q&A
Replies: 0 comments
Description:
I've noticed some strange behavior when using vllm. After running a few requests on the server, my hardware sometimes crashes completely, with all four physical power supplies shutting off. I followed the hardware testing procedures as outlined in this guide, and all tests passed successfully.
However, in some cases not all four power supplies shut off; instead, only one or two of them physically turn off. I tried running the same workload with text-generation-inference, and while I did not encounter a full server crash, some of the power supplies still tripped.
My Questions:
Setup:
The command I use:
Any guidance or suggestions would be greatly appreciated!