n_tokens <= n_batch, beta, conversation history #176
-
Todo: explore, or integrate with: "If your application is GPL 3.0 compliant, feel free to take inspiration from how that can go here: https://github.com/nathanlesage/local-chat"
-
@giladgd how does node-llama-cpp manage a long conversation history if it is longer than the model's context (with v3/beta)?
-
Thanks @giladgd for your example. Calling session.dispose() gave me DisposedError: Object is disposed, so I had to create a new context instead. Here is what works for me with gpu: false.
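A minimal sketch of such a setup, assuming the v3 beta API (getLlama, loadModel, createContext, LlamaChatSession) and a hypothetical model path:

```ts
import path from "path";
import {fileURLToPath} from "url";
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

// force the CPU backend, avoiding the Vulkan/radv path entirely
const llama = await getLlama({gpu: false});

const model = await llama.loadModel({
    // hypothetical model path
    modelPath: path.join(__dirname, "models", "model.gguf")
});

// one fresh context per conversation; dispose it when the conversation ends,
// but don't keep using a session after its context is disposed (DisposedError)
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

const answer = await session.prompt("Hi there");
console.log(answer);

await context.dispose();
```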
I think it would be a good idea to be OpenAI API compatible, using role: system, role: user, role: assistant, and a content field for each message.
Translation from one template to another was done with TemplateChatWrapper in v2, but I don't know if that's possible in v3? A mapping sketch follows below.
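A hedged sketch of such a mapping, assuming the v3 beta exports a ChatHistoryItem type (with type: "system" | "user" | "model", and the model response as an array) and that LlamaChatSession accepts it via setChatHistory; the helper name is made up:

```ts
import type {ChatHistoryItem} from "node-llama-cpp";

// OpenAI-style message, as in the suggestion above
type OpenAiMessage = {
    role: "system" | "user" | "assistant",
    content: string
};

// hypothetical helper: map OpenAI roles onto the assumed v3 history shape
function toChatHistory(messages: OpenAiMessage[]): ChatHistoryItem[] {
    return messages.map((message) => {
        if (message.role === "system")
            return {type: "system", text: message.content};
        else if (message.role === "user")
            return {type: "user", text: message.content};

        // "assistant" maps to "model"; response is an array in v3
        return {type: "model", response: [message.content]};
    });
}

// usage sketch, given a session created as in the snippet above:
// session.setChatHistory(toChatHistory(messages));
```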
-
And why is the response an array containing only one text?
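For reference, a sketch of the shape in question, assuming the v3 beta history types: a plain reply fills the array with a single string, but the array form appears to leave room for multiple response segments (e.g. text interleaved with function calls):

```ts
// assumed shape of a v3 beta chat history
const history = [
    {type: "system", text: "You are a helpful assistant."},
    {type: "user", text: "Hi"},
    // a simple reply is a single-element array; more elements would be
    // needed when a response mixes text segments with function calls
    {type: "model", response: ["Hello! How can I help you?"]}
];
```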
-
Transferred from #105 (comment), for the n_tokens <= n_batch issue of @scenaristeur.
I have tried to migrate to "node-llama-cpp": "^3.0.0-beta.13", but now I get a crash on my IdeaPad laptop (https://www.google.com/search?client=firefox-b-lm&q=ideapad+3+15alc6) (no GPU, AMD Ryzen 5000, 16-core CPU / 16 GB RAM).
It worked like a charm with "node-llama-cpp": "^2.8.8" (I had no memory issues apart from n_tokens <= n_batch with a long conversationHistory), but now it crashes even with a small conversationHistory, with "radv/amdgpu: Not enough memory for command submission."
With this usage: https://github.com/scenaristeur/igora/blob/node_llama_cpp_v3_beta/src/mcConnector/index.js
With v2.8.8 I had https://github.com/scenaristeur/igora/blob/3342a1a48172eae1d31489e33a64fe025e1cb522/src/mcConnector/index.js
and it works until token.length is about 300 (328 is OK, 536 fails).
With more tokens, I get n_tokens <= n_batch.
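For the n_tokens <= n_batch error itself, one hedged workaround sketch, assuming batchSize is accepted at context creation (it was an option on v2's LlamaContext): the batch must be at least as large as the biggest chunk of tokens evaluated at once, so matching it to the context size should avoid the assert.

```ts
// sketch: make the batch as large as the context so a long history
// can be evaluated in one go; batchSize as a createContext option is
// an assumption carried over from the v2 LlamaContext options
const context = await model.createContext({
    contextSize: 4096,
    batchSize: 4096
});
```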
It's a 16-core CPU-only machine, no GPU; I'll try getLlama with gpu: false. Perhaps I installed some Vulkan tools while trying out some LLMs, but this machine is CPU only.
Thanks.
It works with gpu: false, but I've lost the conversationHistory. How should conversation history be handled in the beta version? I'm working on a server where there can be multiple sessions, each with its own history. In what format should the history be injected into a session, and into which class?
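A hedged sketch of one way that could look in the v3 beta, assuming LlamaChatSession exposes getChatHistory()/setChatHistory(); the in-memory store, the function name, and the model path are made up for illustration:

```ts
import {getLlama, LlamaChatSession, type ChatHistoryItem} from "node-llama-cpp";

// hypothetical in-memory store: one saved history per server-side session id
const histories = new Map<string, ChatHistoryItem[]>();

const llama = await getLlama({gpu: false});
const model = await llama.loadModel({modelPath: "model.gguf"}); // hypothetical path

// hypothetical server handler: restore a user's history, prompt, save it back
async function promptForUser(sessionId: string, text: string) {
    // a fresh context + session per request
    const context = await model.createContext();
    const session = new LlamaChatSession({
        contextSequence: context.getSequence()
    });

    const saved = histories.get(sessionId);
    if (saved != null)
        session.setChatHistory(saved);

    const answer = await session.prompt(text);

    // persist the updated history, then free the context
    histories.set(sessionId, session.getChatHistory());
    await context.dispose();

    return answer;
}
```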