llama.cpp inference code/compute flow #8747
Closed
kaizizzzzzz started this conversation in General
Replies: 0
This project is comprehensive and complex. I'm interested in the inference part, especially low-bit quantized inference. I did find some CUDA kernels for the k-quant formats, but I couldn't locate the end-to-end inference compute flow/code in llama.cpp (i.e., the path where those CUDA kernels replace the generic ones). Could anyone point me to it? Thanks!