Hi,
I'm trying to modify the KV cache so that each token is its own tensor, by adding a second `k_l` and `v_l`, each a vector of `ggml_tensor` pointers (a double-pointer array). That way each token could be stored on a different device. When the entire cache is needed, I concatenate the per-token tensors by using the original `k_l` and `v_l` as copy buffers.
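The layout described above can be modeled in plain C, outside ggml: each token's K row is a separately allocated block reached through an array of pointers, and a concat step copies the rows into one contiguous buffer that plays the role of the original `k_l`. The names `per_token_cache` and `concat_tokens` are hypothetical, and ordinary host memory stands in for per-device tensors, so this is only a sketch of the idea, not ggml code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-token cache: one separately allocated row per token,
   standing in for one ggml_tensor per token, so each row could in
   principle live on a different device. */
typedef struct {
    float **k_tok; /* k_tok[i] points to n_embd floats for token i */
    int n_tok;     /* number of tokens currently stored */
    int n_embd;    /* row length in elements */
} per_token_cache;

/* Concatenate all per-token rows into one contiguous [n_tok][n_embd]
   buffer, which plays the role of the original k_l copy buffer. */
void concat_tokens(const per_token_cache *c, float *dst) {
    assert(c != NULL && dst != NULL);
    for (int i = 0; i < c->n_tok; ++i) {
        /* Row i starts i * n_embd elements into the destination. */
        memcpy(dst + (size_t)i * (size_t)c->n_embd, c->k_tok[i],
               (size_t)c->n_embd * sizeof(float));
    }
}
```

In ggml terms the same step is one view into the contiguous tensor plus one copy per token, per layer, for both K and V, so the number of tensors and graph nodes grows with the context length.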
The main thing confusing me is why I keep running out of memory from the memory pool, and how I should know ahead of time how much memory I need. So far I have just kept making it larger until the error message stopped appearing.
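A rough way to size the pool, assuming the context allocates tensor data itself (`no_alloc = false` in `ggml_init_params`): every tensor costs `ggml_tensor_overhead()` bytes of metadata plus its data rounded up to the memory alignment, and one tensor per token means `2 * n_layer * n_ctx` tensors instead of `2 * n_layer`. The overhead and alignment constants below are illustrative placeholders, not values taken from ggml; in real code, query `ggml_tensor_overhead()` at runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative placeholders only: query ggml_tensor_overhead() at runtime
   for the real per-tensor metadata cost; the alignment is also assumed. */
#define ASSUMED_TENSOR_OVERHEAD ((size_t)368)
#define ASSUMED_MEM_ALIGN       ((size_t)16)

/* Round n up to the next multiple of align. */
static size_t pad_to(size_t n, size_t align) {
    assert(align > 0);
    return (n + align - 1) / align * align;
}

/* Rough pool size for per-token K and V tensors across all layers,
   assuming the context allocates tensor data (no_alloc == false). */
size_t estimate_per_token_kv_bytes(size_t n_layer, size_t n_ctx,
                                   size_t n_embd, size_t elem_size) {
    size_t n_tensors = 2 * n_layer * n_ctx; /* one K and one V per token per layer */
    size_t per_tensor = ASSUMED_TENSOR_OVERHEAD
                      + pad_to(n_embd * elem_size, ASSUMED_MEM_ALIGN);
    return n_tensors * per_tensor;
}
```

For example, with 32 layers, a 512-token context, and 4096-wide f16 rows, that is 32768 tensors instead of 64, so the metadata overhead alone grows by a factor of 512. The views and copy ops created while concatenating also consume metadata in whatever context the graph is built in, so that context needs headroom proportional to the number of per-token views as well.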
I'm also getting an `EXC_BAD_ACCESS` segfault from `llm_build_kv` according to lldb, and I'm not sure what is causing it. Is it because I'm making the context too large? I doubt it, since I'd expect that to print an error instead. It's probably how I take the view and copy into it.

The code is disgusting because I'm just trying this idea out. I'd appreciate any guidance. Here are the rough draft changes I've made if you'd like to take a peek.
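Since the crash is suspected to come from taking a view and copying into it, one thing worth checking: `ggml_view_1d` takes its `offset` argument in bytes while `ne0` is an element count, so an offset computed as an element index lands in the wrong place, and a slot index past the end of the buffer writes out of bounds, which can surface as `EXC_BAD_ACCESS`. The plain-C helpers below (`token_slot_offset`, `view_in_bounds`, `copy_token_into_slot`, all hypothetical) sketch the byte arithmetic and a bounds check on a host buffer; they do not call ggml.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offset of token slot i in a contiguous [n_ctx][n_embd] buffer.
   ggml_view_1d's offset argument is in bytes, so scale by elem_size. */
size_t token_slot_offset(size_t i, size_t n_embd, size_t elem_size) {
    return i * n_embd * elem_size;
}

/* True if n_elems elements starting at offset_bytes fit inside a buffer
   of total_bytes bytes. */
bool view_in_bounds(size_t offset_bytes, size_t n_elems, size_t elem_size,
                    size_t total_bytes) {
    if (offset_bytes > total_bytes) {
        return false;
    }
    return n_elems * elem_size <= total_bytes - offset_bytes;
}

/* Copy one token's row into slot i, asserting the write stays in bounds
   instead of silently writing past the end of dst. */
void copy_token_into_slot(uint8_t *dst, size_t dst_bytes, const void *src,
                          size_t i, size_t n_embd, size_t elem_size) {
    size_t off = token_slot_offset(i, n_embd, elem_size);
    assert(view_in_bounds(off, n_embd, elem_size, dst_bytes));
    memcpy(dst + off, src, n_embd * elem_size);
}
```

Running `view_in_bounds` on the same numbers you pass to the view (slot index, row length, and the destination's `ggml_nbytes`) is a cheap way to rule this cause in or out before looking elsewhere.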
Thanks!! Any comments are appreciated.