[WIP]🚨 set dtype=float16 for CPU as well #266
base: main
Conversation
👋 Hi! Thank you for contributing to vLLM support on Spyre.
Force-pushed from bc83e4b to 3190d8d
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
```diff
@@ -228,7 +229,8 @@ def generate_hf_output(
     if not isinstance(max_new_tokens, list):
         max_new_tokens = [max_new_tokens] * len(prompts)

-    hf_model = AutoModelForCausalLM.from_pretrained(model)
+    hf_model = AutoModelForCausalLM.from_pretrained(model,
+                                                    torch_dtype=torch.float16)
```
not sure if this should be `float16` or `bfloat16`
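For context on the question above: `float16` and `bfloat16` differ mainly in exponent range, which is what can bite when casting model weights. A minimal, dependency-free sketch of each format's largest finite value, using only Python's `struct` half-precision support (the byte patterns below encode the maxima directly; this is illustrative, not code from the PR):

```python
import struct

# Largest finite float16 (IEEE 754 half precision): bit pattern 0x7BFF.
fp16_max = struct.unpack("<e", b"\xff\x7b")[0]

# bfloat16 is the top 16 bits of a float32, so its largest finite value
# equals the float32 with bit pattern 0x7F7F0000 (same 8-bit exponent
# field as float32, truncated mantissa).
bf16_max = struct.unpack("<f", b"\x00\x00\x7f\x7f")[0]

print(fp16_max)  # 65504.0
print(bf16_max)  # ~3.39e38
```

Activations or logits above 65504 overflow to `inf` in `float16`, while `bfloat16` keeps float32's exponent range at the cost of mantissa precision, which is why the choice can matter even for CPU-only test runs.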
Description
Use `float16` for CPU to try and speed up tests.
Related Issues