set_llm_cache doesn't work #25077
-
To resolve the issue, the example code with SQLiteCache should look like this:

```python
import datetime

from langchain_openai import ChatOpenAI
from langchain.globals import set_llm_cache
from langchain_community.cache import SQLiteCache
from langchain_community.callbacks import get_openai_callback
from langchain_core.utils.utils import version

print(f'{version("langchain")=}')
print(f'{version("langchain_community")=}')
print(f'{version("langchain_core")=}')

# Set the global cache
set_llm_cache(SQLiteCache(database_path=".langchain.db"))

llm = ChatOpenAI(
    model_name="gpt-4o-mini",
    max_tokens=3500,
    temperature=0,
    verbose=False,
)

start_time = datetime.datetime.now()
with get_openai_callback() as cb:
    response = llm.invoke("Tell me a joke!")
end_time = datetime.datetime.now()
print(response)
print(f"LLM Time: {end_time - start_time}")
print(f"Tokens used: {cb.prompt_tokens=} {cb.completion_tokens=} [{cb.total_cost=:.6f}]")

# Run the same query again to check if it hits the cache
start_time = datetime.datetime.now()
with get_openai_callback() as cb:
    response = llm.invoke("Tell me a joke!")
end_time = datetime.datetime.now()
print(response)
print(f"LLM Time: {end_time - start_time}")
print(f"Tokens used: {cb.prompt_tokens=} {cb.completion_tokens=} [{cb.total_cost=:.6f}]")
```

Key Points to Check
By following these steps, you should be able to use the cache correctly and prevent repeated calls to the LLM for the same prompt [1][2][3][4].
-
@dosu, sorry, I can't find any difference between your code and mine. It's exactly the same, and it doesn't work for me.
-
@dosu, I enabled debug mode and verbose, but there is nothing about the cache in the output.
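For reference, this is roughly how the global debug and verbose switches are enabled (set_debug and set_verbose live in langchain.globals). With set_debug(True) the LLM start/end events are printed, but as far as I know cache lookups are not logged as separate events, which would explain why nothing about the cache shows up:

```python
from langchain.globals import set_debug, set_verbose

# Global switches: debug prints chain/LLM start and end events,
# verbose enables the more compact verbose output.
set_debug(True)
set_verbose(True)
```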
-
I see that DebugSQLiteCache can find the value in the cache.
Interestingly, the value is stored together with usage_metadata, so it comes back from the cache with the same token usage as the original non-cached call. That information is not correct, because no tokens were actually used; the response was served from the cache.
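DebugSQLiteCache is not shown in the thread; presumably it is a small subclass along these lines (the class name and logging here are just an illustration) that makes hits and misses visible by overriding lookup:

```python
from typing import Optional, Sequence

from langchain_community.cache import SQLiteCache
from langchain_core.outputs import Generation


class DebugSQLiteCache(SQLiteCache):
    """SQLiteCache that logs every lookup so cache hits become visible."""

    def lookup(self, prompt: str, llm_string: str) -> Optional[Sequence[Generation]]:
        result = super().lookup(prompt, llm_string)
        print(f"[cache] lookup -> {'HIT' if result else 'MISS'}")
        return result
```

Registered with `set_llm_cache(DebugSQLiteCache(database_path=".langchain.db"))`, it prints one line per lookup.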
-
Yes! That is the root of the problem.
Now it's clear that the first call went to the LLM and the subsequent ones came from the cache, but the cached response carries incorrect information about used tokens and cost, because it was copied from the original request.
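A minimal sketch of that symptom, assuming an OpenAI key is configured and the SQLiteCache from the example above has been registered: the second, cached response reports the same usage_metadata as the first one.

```python
first = llm.invoke("Tell me a joke!")   # real API call, tokens consumed
second = llm.invoke("Tell me a joke!")  # answered from the SQLite cache

# Both print the same token counts, even though the second invocation
# consumed no tokens at the API.
print(first.usage_metadata)
print(second.usage_metadata)
```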
-
The only remaining question is how to tell whether an answer was generated or restored from the cache, because I always see a token count and a cost.
-
@dosu, it looks like I can only fork the code, because these methods are in BaseChatModel.
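One possible workaround that avoids forking BaseChatModel is a cache subclass that strips usage_metadata from cached generations before returning them. This is only a sketch: the class name is made up, and whether the OpenAI callback then reports zero tokens depends on where it reads the usage from in your versions, so treat it as an experiment rather than a fix.

```python
from typing import Optional, Sequence

from langchain_community.cache import SQLiteCache
from langchain_core.outputs import Generation


class NoUsageSQLiteCache(SQLiteCache):
    """Hypothetical cache that drops usage_metadata from cached chat generations."""

    def lookup(self, prompt: str, llm_string: str) -> Optional[Sequence[Generation]]:
        generations = super().lookup(prompt, llm_string)
        if generations is None:
            return None
        for gen in generations:
            message = getattr(gen, "message", None)  # set on ChatGeneration
            if message is not None and getattr(message, "usage_metadata", None):
                message.usage_metadata = None
        return generations
```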
-
How can we know whether the cache was used and which cache was used?
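For the "which cache" part, the globally registered cache can be inspected directly; for the "was it used" part, a logging wrapper like the DebugSQLiteCache sketch above is one option. A minimal sketch:

```python
from langchain.globals import get_llm_cache

cache = get_llm_cache()
if cache is None:
    print("No global LLM cache is configured")
else:
    print(f"Global LLM cache: {type(cache).__name__}")
```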
-
Checked other resources
Commit to Help
Example Code
Description
I call set_llm_cache(SQLiteCache()) and expect the query with the same prompt to be read from the cache, but instead I see a call to the LLM and tokens used: the same call time, the same token count, and the same cost as the first call. In previous versions I saw a token count of 0, because the response came from the cache.
Output:
System Info
System Information
Package Information
Packages not installed (Not Necessarily a Problem)
The following packages were not found: