Inference speed worse on AMD CPU than on Intel CPU #119
Unanswered · CrazyChildren asked this question in Q&A
Replies: 1 comment
-
@CrazyChildren one quick check to verify if this is indeed due to bf16 (which is the likely case) is to load the model in fp32. Here's the relevant code:

import pandas as pd  # requires: pip install pandas
import torch
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-small",
    device_map="cuda",  # use "cpu" for CPU inference and "mps" for Apple Silicon
    torch_dtype=torch.float32,
)
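To make the fp32-vs-bf16 comparison concrete, a minimal timing sketch (not part of the original reply) that continues from the snippet above could look like the following; the sine-wave context and the timing harness are assumptions for illustration, and only pipeline.predict is the actual Chronos call:

import time

# Synthetic context mirroring the question's setup (context_len = 70, predict_len = 1).
context = torch.sin(torch.arange(70, dtype=torch.float32) / 5.0)

start = time.perf_counter()
forecast = pipeline.predict(context, prediction_length=1)  # [batch, num_samples, prediction_length]
elapsed = time.perf_counter() - start
print(f"forecast shape: {tuple(forecast.shape)}, elapsed: {elapsed:.3f}s")

Running this once with torch_dtype=torch.float32 and once with the default dtype should show whether bf16 is responsible for the slowdown.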
-
I tested Chronos with an Intel Core CPU (Mac Pro), Linux with an Intel CPU (server), and Linux with an AMD CPU (server), running the same code. Inference on the AMD CPU appears to be roughly 30x slower.
On the Intel CPUs it takes approximately 0.7 s with batch_num = 1, predict_len = 1, context_len = 70.
On the AMD CPU, however, it takes about 30 s.
I don't know whether this is specific to my setup, but I found reports that enabling AMP on an AMD CPU by autocasting to bfloat16 causes a large performance drop: Bfloat16 CPU inference speed is too slow on AMD cpu
I'm quite a newbie with PyTorch, so if someone finds a solution, please post it here. Thanks.
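In case it helps isolate the issue, here is a small standalone sketch (not from the original post) that times a plain matmul in float32 vs bfloat16 on CPU; if the bfloat16 run is dramatically slower, the slowdown comes from the CPU's bf16 kernels rather than from Chronos itself. The matrix size and iteration count are arbitrary:

import time

import torch

def bench(dtype, n=1024, iters=20):
    # Time an n x n matmul on CPU in the given dtype, averaged over iters runs.
    a = torch.randn(n, n, dtype=dtype)
    b = torch.randn(n, n, dtype=dtype)
    torch.mm(a, b)  # warm-up
    start = time.perf_counter()
    for _ in range(iters):
        torch.mm(a, b)
    return (time.perf_counter() - start) / iters

for dtype in (torch.float32, torch.bfloat16):
    print(f"{dtype}: {bench(dtype) * 1000:.1f} ms per matmul")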