Your current environment
🐛 Describe the bug
I used the following command to profile vLLM:
nsys profile -w true -t cuda,nvtx,osrt,cudnn,cublas -s cpu -o mistral_vllm -f true -x true --cuda-graph-trace node python3 vllm_mistral.py

First question:
What is the fill_reverse_indices_kernel?

Second question:
I think the red block is one iteration, which generates one token. For the green block, however, the time gap is 85 ms. Why does this gap exist, and why is it so large? I expected the next iteration, which generates the next token, to start right after the green block.
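To put a number on gaps like this one, it can help to list the idle periods between consecutive GPU kernels rather than reading them off the timeline. The sketch below is a minimal, hypothetical helper: `find_idle_gaps` is not part of nsys or vLLM, and the timestamps are synthetic values chosen only to reproduce an 85 ms gap, not data from the profile above. It assumes the kernel start and end times (in nanoseconds) have already been extracted from the report by some other means.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start_ns, end_ns) of one kernel


def find_idle_gaps(intervals: List[Interval], min_gap_ns: int) -> List[Tuple[int, int, int]]:
    """Return (gap_start, gap_end, gap_length) for every idle period
    of at least min_gap_ns between kernels. Hypothetical helper."""
    gaps = []
    busy_until = None  # latest end time seen so far
    for start, end in sorted(intervals):
        if busy_until is not None and start - busy_until >= min_gap_ns:
            gaps.append((busy_until, start, start - busy_until))
        # Track the running maximum so overlapping kernels are handled.
        busy_until = end if busy_until is None else max(busy_until, end)
    return gaps


# Synthetic timestamps (assumption, for illustration only).
kernels = [(0, 2_000_000), (2_100_000, 4_000_000), (89_000_000, 91_000_000)]
gaps = find_idle_gaps(kernels, min_gap_ns=10_000_000)  # report gaps >= 10 ms
for gap_start, gap_end, length in gaps:
    print(f"idle from {gap_start} ns to {gap_end} ns ({length / 1e6:.1f} ms)")
```

If a gap of this size shows up between every pair of iterations, the CPU-side rows of the timeline (OS runtime and Python activity) for the same time window are the natural place to look next, since the GPU is idle during that period.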
