Place to deinitialize tensor->extra after graph execution #10429

marty1885 · 2024-11-20T12:31:11Z

I'm working on a backend for Tenstorrent's devices. Due to their design, it's not always possible to map a buffer to a pointer in the host address space (e.g. the allocated memory may live on a chip connected via Ethernet). This limitation prompted me to look at how the SYCL split device backend deals with the situation: it defers the actual allocation to init_tensor and returns an ad-hoc base address to make GGML happy.

That worked for a while: I got LLMs running and producing coherent output. However, I quickly noticed that I was leaking memory on the device. GGML calls init_tensor multiple times with brand-new tensors (same name and base address, but tensor->extra is wiped; these are all tensors used to communicate with the CPU backend, as I'm still lacking some operator support). This causes my backend to allocate multiple times for the same tensor.

After looking at the ggml_backend_buffer_i and other structures, I don't see a place to deallocate tensor->extra to avoid the leak. I believe the SYCL split device backend would experience the same problem, though I do not have the setup to test it.

A related question: why does GGML reinitialize the tensors? It doesn't make much sense to me. It is also awkward that GGML knows to reuse the base address, yet extra is reset.

Answered by slaren

slaren · 2024-11-20T12:38:01Z

slaren
Nov 20, 2024
Maintainer

You need to keep track of the tensor extras and free all of them when reset is called. Most likely you will want to keep a pool of available extras and reuse them to avoid expensive allocations when initializing tensors.

        // (optional) reset any internal state due to tensor initialization, such as tensor extras
        void         (*reset)        (ggml_backend_buffer_t buffer);
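The pooling approach described above could look roughly like the following. This is only a minimal sketch under stated assumptions, not how any existing backend does it: `fake_tensor`, `tensor_extra`, and `buffer_context` are hypothetical stand-ins rather than real ggml types, and the device allocation is reduced to a flag. In a real backend the two methods would presumably be called from the buffer's `init_tensor` and `reset` callbacks, with the context stored in the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for ggml_tensor; only the fields the sketch needs.
struct fake_tensor {
    void * extra  = nullptr;
    size_t nbytes = 0;
};

// Hypothetical per-tensor device state that would live in tensor->extra.
struct tensor_extra {
    size_t device_bytes = 0;     // size of the device allocation this extra describes
    bool   allocated    = false; // stands in for a real device allocation handle
};

// Hypothetical buffer context: owns every extra it ever hands out.
struct buffer_context {
    std::vector<std::unique_ptr<tensor_extra>> pool; // all extras created so far
    size_t in_use = 0;                               // pool[0..in_use) are handed out

    // Take an extra from the pool, growing the pool only when it is exhausted.
    tensor_extra * acquire() {
        if (in_use == pool.size()) {
            pool.push_back(std::make_unique<tensor_extra>());
        }
        return pool[in_use++].get();
    }

    // Would be called from the buffer's init_tensor callback.
    void init_tensor(fake_tensor & t) {
        tensor_extra * e = acquire();
        e->device_bytes = t.nbytes; // a real backend would allocate device memory here
        e->allocated    = true;
        t.extra = e;
    }

    // Would be called from the buffer's reset callback: every extra handed out
    // by this buffer becomes available again; nothing is leaked.
    void reset() {
        for (size_t i = 0; i < in_use; ++i) {
            pool[i]->allocated = false; // a real backend would release or recycle here
        }
        in_use = 0;
    }
};
```

Because each buffer context owns its own pool, a reset only touches extras that were handed out for tensors initialized in that buffer. After a reset, the next `init_tensor` call reuses an existing extra object instead of creating a new one.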

3 replies

marty1885 Nov 20, 2024
Author

I have looked into reset, but it doesn't tell me which tensors can safely be sent back to the pool. I assume it causes all the tensors, including the weights, to be reset?

slaren Nov 20, 2024
Maintainer

Only the tensors allocated in that buffer. This will only be called for compute buffers.

marty1885 Nov 20, 2024
Author

Thank you!
