Where to insert code to deploy to custom accelerator? #6070
-
I have a custom accelerator that can do matrix multiplications, with an associated C++/C API. It accepts bfloat16 inputs and int4 weights. Can someone help me figure out the best place to insert this API? I am thinking ggml-quant.c. Additionally, the accelerator requires that weights be preloaded into off-chip memory, so it would be nice if there were a pass where I could find all the matrix multiplication ops, preload the weight tensors, and cache an association between each op instance and its device buffer. Any guidance or ideas on this would be greatly appreciated!
-
Does it support only matrix multiplications, or can it potentially do all the other ops as well?
Look into the way the OpenCL backend is implemented. If you need to upload the weights to the accelerator, that is currently the easiest way to do it. Ideally, you would create a full backend implementing the ggml-backend interface, but that's not really an option at the moment for backends that can only do matrix multiplication.