How to support model with dynamic inference graph #9295
Unanswered · RunningLeon asked this question in Q&A
Hi, thanks for your attention. I want to support a model in llama.cpp whose compute graph differs between the prefill and decoding stages. Does llama.cpp support a dynamic inference graph (e.g. skipping some layers during the prefill stage)? If so, how can this be done?
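
For context: llama.cpp constructs a fresh ggml compute graph for each `llama_decode` call, so in principle the graph can depend on runtime state such as whether the batch is a multi-token prefill or a single-token decode step. Below is a minimal, self-contained ggml sketch illustrating the idea, not llama.cpp's actual model-building code; the `is_prefill` flag, the layer-skip condition, and the toy weights are assumptions for illustration only.

```cpp
// Minimal sketch: build a different ggml graph depending on a runtime flag.
// Assumes a toy stack of n_layer matmul "layers"; one layer is skipped
// when is_prefill is true. Not llama.cpp's real graph builder.
#include "ggml.h"
#include <cstdio>
#include <vector>

int main() {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    const int n_embd  = 4;
    const int n_layer = 3;

    // toy per-layer weights (stand-ins for real model tensors)
    std::vector<struct ggml_tensor *> w(n_layer);
    for (int il = 0; il < n_layer; ++il) {
        w[il] = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, n_embd, n_embd);
        ggml_set_f32(w[il], 0.1f);
    }

    struct ggml_tensor * inp = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n_embd);
    ggml_set_f32(inp, 1.0f);

    // The graph is rebuilt per batch, so it can branch on runtime state.
    // In llama.cpp terms this could be something like batch.n_tokens > 1.
    const bool is_prefill = true; // assumed flag for illustration

    struct ggml_tensor * cur = inp;
    for (int il = 0; il < n_layer; ++il) {
        if (is_prefill && il == 1) {
            continue; // skip this layer during prefill only
        }
        cur = ggml_mul_mat(ctx, w[il], cur);
    }

    // build and run the graph that was actually constructed
    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, cur);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads=*/1);

    printf("out[0] = %f\n", ggml_get_f32_1d(cur, 0));

    ggml_free(ctx);
    return 0;
}
```

In llama.cpp itself, the analogous place would be the per-architecture graph-building code, where the layer loop could consult the current batch before adding a layer's ops; treat that as a direction to explore rather than a confirmed, supported API.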