Using local models, my agent is very slow when it's processing the previous action + observation.
I've run the same model with the same parameters using llama.cpp (the ./main script) on my agent's prompt minus the Final Answer: to see how long prompt processing and generation should take at the last step. It was very fast, within a few seconds.
This led me to suspect Agents aren't caching the prompt tokens to avoid re-processing them, which leads to the lag between text generations. Is this true?
And if so, are there any downsides to saving the prompt tokens in the agent and appending the agent/tool outputs to them?
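To illustrate what I mean, here's a minimal sketch of the "grow the prompt and reuse the cached prefix" idea, assuming llama-cpp-python (Llama, LlamaRAMCache, set_cache); the model path, run_tool, and the prompt layout are placeholders, not my actual setup:

```python
from llama_cpp import Llama, LlamaRAMCache

# Hypothetical model path and context size.
llm = Llama(model_path="./models/model.gguf", n_ctx=4096)
# Keep the KV state between calls so a shared prompt prefix isn't re-evaluated.
llm.set_cache(LlamaRAMCache(capacity_bytes=2 << 30))

def run_tool(action_text: str) -> str:
    # Placeholder for the agent's tool dispatch.
    return "tool output"

prompt = "You are an agent. Use tools to answer the question.\n"
max_steps = 5

for step in range(max_steps):
    out = llm(prompt, max_tokens=256, stop=["Observation:"])
    action_text = out["choices"][0]["text"]
    if "Final Answer:" in action_text:
        break
    observation = run_tool(action_text)
    # Append to the existing prompt instead of rebuilding it, so the
    # cached prefix still matches and only the new suffix is processed.
    prompt += action_text + "\nObservation: " + observation + "\n"
```

This is the behaviour I'd expect the agent to get for free if it kept the earlier prompt tokens around between steps.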