Falcon 180B sample and prompt eval time #3237

gileneusz · 2023-09-17T22:30:52Z

I saw Georgi's test of raw Q4 Falcon 180B, run without speculative sampling on an M2 Ultra (192 GB) with a prompt of ~24 tokens. It reported the following timings:

https://twitter.com/ggerganov/status/1699791226780975439

Load time: 7060.78 ms
Sample time: 381.81 ms / 256 runs
Prompt eval time: 808.11 ms / 24 tokens
Eval time: 40479.05 ms / 255 runs
Total time: 41788.00 ms

I'm curious: would these timings, especially sample time and prompt eval time, increase substantially with a much longer prompt and context, say 2000-4000 tokens? If anyone has data or experience with this, I'd appreciate your input.
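For reference, here is a rough sketch that turns the numbers above into per-token rates and then extrapolates prompt eval time to 2000 and 4000 tokens. The extrapolation is only an assumption on my part: it treats prompt eval cost as strictly linear in prompt length, which ignores batching efficiency and the growing attention cost at long context, so the real figures may differ in either direction. The helper names (`per_unit`, `naive_prompt_eval_seconds`) are mine and not part of llama.cpp.

```python
# Timings copied from the output quoted above: (milliseconds, count).
timings = {
    "sample": (381.81, 256),       # 256 runs
    "prompt eval": (808.11, 24),   # 24 tokens
    "eval": (40479.05, 255),       # 255 runs
}


def per_unit(ms, count):
    """Return (ms per token or run, tokens or runs per second)."""
    ms_each = ms / count
    return ms_each, 1000.0 / ms_each


def naive_prompt_eval_seconds(n_tokens):
    """ASSUMPTION: prompt eval scales linearly with prompt length.

    This is a back-of-the-envelope guess, not a measurement. Real long-context
    runs can be faster per token (better batching) or slower (attention cost).
    """
    ms_per_token, _ = per_unit(*timings["prompt eval"])
    return n_tokens * ms_per_token / 1000.0


if __name__ == "__main__":
    for name, (ms, count) in timings.items():
        ms_each, per_sec = per_unit(ms, count)
        print(f"{name:12s} {ms_each:8.2f} ms each  {per_sec:8.2f} per second")

    for n in (2000, 4000):
        print(f"naive prompt eval for {n} tokens: "
              f"{naive_prompt_eval_seconds(n):.1f} s")
```

From the quoted run this works out to roughly 33.7 ms per prompt token and about 158.7 ms per generated token (around 6.3 tokens per second). The long-prompt numbers it prints are exactly what I'd like someone to confirm or correct with real measurements.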

Replies: 0 comments
