Question Regarding Assigning Backend Memory for loading Data on Tensor for llama.cpp model #11993
akapoor3518
started this conversation in
General
Hi @ggerganov,

I am trying to run TinyLlama, which has around 75 tensors. My custom backend (custom GPU hardware) currently supports only ADD and MUL (scalar or vector), so I am running two backends: my GPU backend and the CPU backend. Any tensor that runs on my hardware must use my custom (GPU) memory, not CPU memory, because each such tensor is stored as a custom header followed by the data, with `tensor->data` pointing at the data. My kernels need that header, so in my backend's `init_tensor` I do:

```c
tensor->data = (void *)((char *)tensor->data + sizeof(tensor_data_header));
```

Then in my backend's graph compute I do the following:

```c
void *p = tensor->data;
custom_header *hp = (custom_header *)p;
--hp; // hp now points at my header, where I fill in the custom details
```

But since this memory (`tensor->data`) was allocated by the CPU, I crash here. How do I make sure `tensor->data` is allocated from GPU memory for the ops my backend supports, i.e. for a given node tensor and its leaf tensors (`node->src[0]`, `node->src[1]`)?
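To make the layout concrete, here is a standalone sketch of the header-before-data scheme described above (the `tensor_data_header` fields here are made up for illustration; the real layout is backend-specific):

```c
#include <stddef.h>

// hypothetical header layout; the real tensor_data_header is backend-specific
typedef struct {
    unsigned magic;
    size_t   nbytes;
} tensor_data_header;

// what init_tensor does: move the data pointer just past the header
static void *data_of(void *base) {
    return (char *)base + sizeof(tensor_data_header);
}

// what graph compute does: step back from the data pointer to the header
static tensor_data_header *header_of(void *data) {
    tensor_data_header *hp = (tensor_data_header *)data;
    --hp; // one header-sized step back, same as the --hp above
    return hp;
}
```

This arithmetic only works if the header and the data came from the same allocation, which is exactly why the buffer has to come from the custom backend rather than the CPU one.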
There is no documentation for this, so I have been going through the code, but it would help if you could explain how to enforce that a tensor (and its related leaf nodes) uses GPU memory for the ops my backend supports.
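For context, my understanding from reading the code is that the scheduler (`ggml_backend_sched`) decides which backend owns each node by asking the backends whether they support the op, and then allocates that node's buffers from the owning backend's buffer type. A mocked standalone sketch of the kind of check my backend would report (the op enum here is a stand-in, not the real `ggml_op`):

```c
#include <stdbool.h>

// mocked subset of the op enum, for illustration only
typedef enum { DEMO_OP_ADD, DEMO_OP_MUL, DEMO_OP_MUL_MAT, DEMO_OP_SOFT_MAX } demo_op;

// the custom backend claims only the ops its kernels implement;
// every other node should then fall back to the CPU backend
static bool custom_supports_op(demo_op op) {
    switch (op) {
        case DEMO_OP_ADD:
        case DEMO_OP_MUL:
            return true;
        default:
            return false;
    }
}
```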
Some more information: I have parts of the following API implemented. In `ggml_backend_custom_buffer_type_get_alloc_size` I return:

```c
return sizeof(tensor_data_header) + ggml_nbytes(tensor);
```
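One aside on this scheme (my own note, not from the code above): if `sizeof(tensor_data_header)` is not a multiple of the tensor alignment the buffer type reports, offsetting `tensor->data` by the raw header size will misalign the payload. A standalone sketch of padding the header up to the alignment:

```c
#include <stddef.h>

// round x up to the next multiple of align (align must be a power of two)
static size_t align_up(size_t x, size_t align) {
    return (x + align - 1) & ~(align - 1);
}

// hypothetical helper: pad the header so the data that follows stays aligned
static size_t padded_header_size(size_t header_size, size_t alignment) {
    return align_up(header_size, alignment);
}
```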
## Backend setup

```c
model.backend = ggml_backend_custom_init();

// load the data into the tensors
ggml_backend_tensor_set(model.a, a, 0, ggml_nbytes(model.a));
ggml_backend_tensor_set(model.b, b, 0, ggml_nbytes(model.b));

// build the operation nodes
ggml_build_forward_expand(gf, result);

// ... then compute the graph and validate the result
```
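One detail about `ggml_backend_tensor_set` with this layout: because `tensor->data` already points past the header, the buffer's copy path needs no extra adjustment, and copying at `data + offset` never touches the header bytes. A standalone sketch of that invariant (names hypothetical):

```c
#include <stdlib.h>
#include <string.h>

// hypothetical header; the real layout is backend-specific
typedef struct { unsigned magic; } hdr_t;

// hypothetical copy body: data already points past the header,
// so writing at data + offset lands in the payload, not the header
static void demo_set_tensor(void *data, const void *src, size_t offset, size_t size) {
    memcpy((char *)data + offset, src, size);
}

// allocate a header+payload block and return the payload (data) pointer
static void *demo_alloc(size_t nbytes) {
    char *base = calloc(1, sizeof(hdr_t) + nbytes);
    ((hdr_t *)base)->magic = 0xC0DEu;
    return base + sizeof(hdr_t);
}
```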
Thanks in advance,
Anoop