Hi, I'd like a methodology for calculating time to first token (TTFT) and time per output token (TPOT) given a CPU's specs. Would the calculations be similar to the ones I use for GPUs?
Just for clarity, given model specs and GPU specs, I perform the following calculations:
Model params:
s : sequence length
b : batch size
h : hidden dimension
L : number of transformer layers
N : model parameters
GPU params:
FLOP rate: GPU FLOPs rate. For A100, this is FLOPs/second.
HBM rate: GPU High Bandwidth Memory (HBM) rate. For A100, this is TB/second.
Assuming 16-bit precision, i.e. 2 bytes per parameter:
Prefill Compute = 2 x N x b x s (FLOPs)
Decode Compute = 2 x N x b x 1 (FLOPs)
Prefill Memory = Decode Memory = 2 x N (bytes)
TTFT = (Prefill Compute / FLOP rate) + (Prefill Memory / HBM rate)
TPOT = (Decode Compute / FLOP rate) + (Decode Memory / HBM rate)
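
To make the arithmetic concrete, here is a minimal Python sketch of the same formulas. The `Hardware` class, the `estimate_latency` function, and the numbers passed in (a 7e9-parameter model, 100e12 FLOPs/second, 1e12 bytes/second) are hypothetical placeholders of mine, not A100 or CPU figures. The function only needs a peak FLOP rate and a memory bandwidth, so it says nothing about whether this additive model actually holds on a CPU.

```python
from dataclasses import dataclass

BYTES_PER_PARAM = 2  # 16-bit precision, as assumed above


@dataclass
class Hardware:
    flop_rate: float  # peak compute rate, FLOPs/second
    mem_rate: float   # memory bandwidth, bytes/second


def estimate_latency(n_params, batch, seq_len, hw):
    """Return (ttft, tpot) in seconds using the additive model above."""
    prefill_compute = 2 * n_params * batch * seq_len  # FLOPs for the whole prompt
    decode_compute = 2 * n_params * batch * 1         # FLOPs for one new token
    weight_bytes = BYTES_PER_PARAM * n_params         # weights read once per pass
    ttft = prefill_compute / hw.flop_rate + weight_bytes / hw.mem_rate
    tpot = decode_compute / hw.flop_rate + weight_bytes / hw.mem_rate
    return ttft, tpot


# Hypothetical placeholder values, chosen only to be round numbers.
hw = Hardware(flop_rate=100e12, mem_rate=1e12)
ttft, tpot = estimate_latency(n_params=7e9, batch=1, seq_len=1024, hw=hw)
print(f"TTFT ~ {ttft:.5f} s, TPOT ~ {tpot:.5f} s")
```

With these placeholder inputs the prefill estimate is dominated by the compute term and the decode estimate by the memory term.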