I'm working on a backend for Tenstorrent's devices. Due to their design, it's not always possible to map a buffer to a pointer in the host address space (e.g., the allocated memory may live on a chip connected via Ethernet). This limitation prompted me to look into the SYCL split device backend and how they deal with the situation. They defer the actual allocation to That works for some time. I got LLMs working and coherent. However, I quickly noticed memory leaking on the device. GGML is calling After looking at the And a related question: why is GGML trying to reinitialize the tensors? It doesn't make much sense. It is also awkward that GGML knows to reuse the base address, but
Replies: 1 comment 3 replies
You need to keep track of the tensor extras and free all of them when `reset` is called:

```c
// (optional) reset any internal state due to tensor initialization, such as tensor extras
void (*reset) (ggml_backend_buffer_t buffer);
```

Most likely you will want to keep a pool of available extras and reuse them to avoid expensive allocations when initializing tensors.
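To illustrate the pooling idea, here is a minimal sketch of a per-buffer pool of extras. It is not taken from any existing backend: `tensor_extra`, `buffer_context`, `acquire`, and the `device_handle` field are hypothetical names, and the sketch deliberately avoids the ggml headers so that it stays self-contained. The assumption is that the backend's tensor-initialization hook would call `acquire()` and its `reset` hook would call `reset()`.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical per-tensor extra: a stand-in for whatever device-side
// state a backend would attach to a tensor.
struct tensor_extra {
    void * device_handle = nullptr; // placeholder for a real device allocation
    size_t size          = 0;
};

// Hypothetical per-buffer context that owns every extra created for the buffer.
struct buffer_context {
    std::vector<std::unique_ptr<tensor_extra>> extras; // all extras ever created
    size_t used = 0;                                   // extras[0..used) are handed out

    // Intended to be called when a tensor is initialized:
    // reuse a pooled extra if one is free, otherwise allocate a new one.
    tensor_extra * acquire() {
        if (used == extras.size()) {
            extras.push_back(std::make_unique<tensor_extra>());
        }
        return extras[used++].get();
    }

    // Intended to be called from the buffer's reset hook:
    // return every extra to the pool without freeing the host objects.
    void reset() {
        for (size_t i = 0; i < used; ++i) {
            // A real backend would release the device memory behind
            // device_handle here; this sketch only clears the fields.
            *extras[i] = tensor_extra{};
        }
        used = 0;
    }
};
```

With this layout, repeated initialization after a `reset()` hands back the same host-side objects instead of allocating new ones, so the number of extras is bounded by the largest number of tensors live in the buffer at once. Re-initializing the same tensor without an intervening reset would still take a new slot, so a real backend may want to check whether the tensor already carries an extra first.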