set_llm_cache doesn't work #25077
-
To resolve the issue, the example code with SQLiteCache should look like this:

```python
import datetime

from langchain_openai import ChatOpenAI
from langchain.globals import set_llm_cache
from langchain_community.cache import SQLiteCache
from langchain_community.callbacks import get_openai_callback
from langchain_core.utils.utils import version

print(f'{version("langchain")=}')
print(f'{version("langchain_community")=}')
print(f'{version("langchain_core")=}')

# Set the global cache
set_llm_cache(SQLiteCache(database_path=".langchain.db"))

llm = ChatOpenAI(
    model_name="gpt-4o-mini",
    max_tokens=3500,
    temperature=0,
    verbose=False,
)

start_time = datetime.datetime.now()
with get_openai_callback() as cb:
    response = llm.invoke("Tell me a joke!")
end_time = datetime.datetime.now()
print(response)
print(f"LLM Time: {end_time - start_time}")
print(f"Tokens used: {cb.prompt_tokens=} {cb.completion_tokens=} [{cb.total_cost=:.6f}]")

# Run the same query again to check if it hits the cache
start_time = datetime.datetime.now()
with get_openai_callback() as cb:
    response = llm.invoke("Tell me a joke!")
end_time = datetime.datetime.now()
print(response)
print(f"LLM Time: {end_time - start_time}")
print(f"Tokens used: {cb.prompt_tokens=} {cb.completion_tokens=} [{cb.total_cost=:.6f}]")
```

Key Points to Check
By following these steps, you should be able to use the cache correctly and prevent repeated calls to the LLM for the same prompt [1][2][3][4].
-
@dosu, sorry, I can't find any difference between your code and mine. It's exactly the same, and it doesn't work for me.
-
@dosu, I enabled debug mode and verbose, but there is nothing about the cache in the output.
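For reference, this is roughly how the global debug and verbose switches are enabled (set_debug and set_verbose live in langchain.globals). With set_debug(True) the LLM start/end events are printed, but as far as I know cache lookups are not logged as separate events, which would explain why nothing about the cache shows up:

```python
from langchain.globals import set_debug, set_verbose

# Global switches: debug prints chain/LLM start and end events,
# verbose enables the more compact verbose output.
set_debug(True)
set_verbose(True)
```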
-
I see that DebugSQLiteCache can find the value in the cache.
Interestingly, the value is stored together with usage_metadata, so it comes back from the cache with the same token usage as the original non-cached call. That information is not correct, because no tokens were actually used; the response was served from the cache.
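DebugSQLiteCache is not shown in the thread; presumably it is a small subclass along these lines (the class name and logging here are just an illustration) that makes hits and misses visible by overriding lookup:

```python
from typing import Optional, Sequence

from langchain_community.cache import SQLiteCache
from langchain_core.outputs import Generation


class DebugSQLiteCache(SQLiteCache):
    """SQLiteCache that logs every lookup so cache hits become visible."""

    def lookup(self, prompt: str, llm_string: str) -> Optional[Sequence[Generation]]:
        result = super().lookup(prompt, llm_string)
        print(f"[cache] lookup -> {'HIT' if result else 'MISS'}")
        return result
```

Registered with `set_llm_cache(DebugSQLiteCache(database_path=".langchain.db"))`, it prints one line per lookup.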
-
Yes! That is the root of the problem.
Now it's clear that the first call went to the LLM and the subsequent ones came from the cache, but the cached response carries incorrect information about used tokens and cost, because it was copied from the original request.
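A minimal sketch of that symptom, assuming an OpenAI key is configured and the SQLiteCache from the example above has been registered: the second, cached response reports the same usage_metadata as the first one.

```python
first = llm.invoke("Tell me a joke!")   # real API call, tokens consumed
second = llm.invoke("Tell me a joke!")  # answered from the SQLite cache

# Both print the same token counts, even though the second invocation
# consumed no tokens at the API.
print(first.usage_metadata)
print(second.usage_metadata)
```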
-
The only remaining question is how to tell whether an answer was generated or restored from the cache, because I always see a token count and a cost.
-
@dosu, it looks like I can only fork the code, because these methods are in BaseChatModel.
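One possible workaround that avoids forking BaseChatModel is a cache subclass that strips usage_metadata from cached generations before returning them. This is only a sketch: the class name is made up, and whether the OpenAI callback then reports zero tokens depends on where it reads the usage from in your versions, so treat it as an experiment rather than a fix.

```python
from typing import Optional, Sequence

from langchain_community.cache import SQLiteCache
from langchain_core.outputs import Generation


class NoUsageSQLiteCache(SQLiteCache):
    """Hypothetical cache that drops usage_metadata from cached chat generations."""

    def lookup(self, prompt: str, llm_string: str) -> Optional[Sequence[Generation]]:
        generations = super().lookup(prompt, llm_string)
        if generations is None:
            return None
        for gen in generations:
            message = getattr(gen, "message", None)  # set on ChatGeneration
            if message is not None and getattr(message, "usage_metadata", None):
                message.usage_metadata = None
        return generations
```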
-
How can we know whether the cache was used and which cache was used?
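For the "which cache" part, the globally registered cache can be inspected directly; for the "was it used" part, a logging wrapper like the DebugSQLiteCache sketch above is one option. A minimal sketch:

```python
from langchain.globals import get_llm_cache

cache = get_llm_cache()
if cache is None:
    print("No global LLM cache is configured")
else:
    print(f"Global LLM cache: {type(cache).__name__}")
```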
-
Checked other resources
Commit to Help
Example Code
Description
I call set_llm_cache(SQLiteCache()) and expect the query with the same prompt to be read from the cache, but instead I see a call to the LLM and tokens used: the same call time, the same token count, and the same cost as the first call. In previous versions I saw a token count of 0, because the response came from the cache.
Output:
System Info
System Information
Package Information
Packages not installed (Not Necessarily a Problem)
The following packages were not found: