Ollama has two endpoints: `/api/chat` and `/api/generate`. As stated in this Ollama GitHub issue, `/api/chat` remembers history, while `/api/generate` does not.

LangChain has two interfaces for Ollama: `ChatOllama` and `OllamaLLM`. It is easy to show that the former calls the `/api/chat` endpoint:
```python
from langchain_ollama import ChatOllama

llm = ChatOllama(model="deepseek-r1", temperature=0.0)
llm.invoke("What is the meaning of life?")
# logs HTTP Request: POST http://127.0.0.1:11434/api/chat "HTTP/1.1 200 OK"
```
While the latter calls the `/api/generate` endpoint:
```python
from langchain_ollama import OllamaLLM

llm = OllamaLLM(model="gemma3:1b-it-qat", temperature=0.0)
llm.invoke("What is the meaning of life?")
# logs HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"
```
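For reference, the two endpoints also take different request bodies. A minimal sketch of the payloads (illustrative only — nothing is sent here, and only the fields relevant to this question are shown):

```python
import json

# /api/chat takes a list of role-tagged messages, so the caller decides
# on every request which earlier turns (if any) to resend.
chat_payload = {
    "model": "deepseek-r1",
    "messages": [{"role": "user", "content": "What is the meaning of life?"}],
}

# /api/generate takes a single prompt string, with no message list.
generate_payload = {
    "model": "gemma3:1b-it-qat",
    "prompt": "What is the meaning of life?",
}

print(json.dumps(chat_payload, indent=2))
```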
The LangChain documentation discourages the use of the latter model:

> However, LangChain also has implementations of older LLMs that do not follow the chat model interface and instead use an interface that takes a string as input and returns a string as output. These models are typically named without the "Chat" prefix (e.g., Ollama, Anthropic, OpenAI, etc.). These models implement the BaseLLM interface and may be named with the "LLM" suffix (e.g., OllamaLLM, AnthropicLLM, OpenAILLM, etc.). Generally, users should not use these models.

Even the RAG tutorial uses a chat model.
I'm building a RAG where I ask the same question for different documents.
So I loop over the documents: for each one, I read it, clear the vector store, add the document to the vector store, ask the question, and repeat with the next document.
However, when I'm asking a question about document B, I don't want the answer to be tainted by the previous answer about document A. That's why I clear the vector store before adding the new document.
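The loop I described can be sketched like this (a stand-alone sketch: the vector store and the LLM call are plain-Python stubs, not real LangChain or Ollama objects, and `ask_llm` is a hypothetical placeholder):

```python
# Minimal sketch of the per-document RAG loop described above.
documents = {
    "doc_a.txt": "Document A text ...",
    "doc_b.txt": "Document B text ...",
}
question = "What is the meaning of life?"

def ask_llm(question: str, context: str) -> str:
    # Stub for a single stateless LLM call: the model sees only what is
    # in this prompt, nothing from earlier loop iterations.
    return f"Answer based on: {context!r}"

answers = {}
vector_store = []  # stub: a real store would hold embedded chunks

for name, text in documents.items():
    vector_store.clear()               # clean the store between documents
    vector_store.append(text)          # index only the current document
    context = " ".join(vector_store)   # stub retrieval: everything in the store
    answers[name] = ask_llm(question, context)

# Each answer is derived only from its own document's context.
```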
But on the LLM side, I feel the "traditional" LLM model (i.e. the `/api/generate` endpoint) would be better, since the `/api/chat` endpoint would somehow "remember" the previous conversations, and so the previous document.
What am I missing?
System Info
System Information
OS: Windows
OS Version: 10.0.26100
Python Version: 3.12.10 (main, Apr 9 2025, 04:06:22) [MSC v.1943 64 bit (AMD64)]
aiohttp<4.0.0,>=3.8.3: Installed. No version info available.
async-timeout<5.0.0,>=4.0.0;: Installed. No version info available.
dataclasses-json<0.7,>=0.5.7: Installed. No version info available.
httpx: 0.28.1
httpx-sse<1.0.0,>=0.4.0: Installed. No version info available.
httpx>=0.25.2: Installed. No version info available.
huggingface-hub>=0.30.2: Installed. No version info available.
jsonpatch<2.0,>=1.33: Installed. No version info available.
langchain-anthropic;: Installed. No version info available.
langchain-aws;: Installed. No version info available.
langchain-azure-ai;: Installed. No version info available.
langchain-cohere;: Installed. No version info available.
langchain-community;: Installed. No version info available.
langchain-core<1.0.0,>=0.3.51: Installed. No version info available.
langchain-core<1.0.0,>=0.3.58: Installed. No version info available.
langchain-core<1.0.0,>=0.3.59: Installed. No version info available.
langchain-core<1.0.0,>=0.3.60: Installed. No version info available.
langchain-deepseek;: Installed. No version info available.
langchain-fireworks;: Installed. No version info available.
langchain-google-genai;: Installed. No version info available.
langchain-google-vertexai;: Installed. No version info available.
langchain-groq;: Installed. No version info available.
langchain-huggingface;: Installed. No version info available.
langchain-mistralai;: Installed. No version info available.
langchain-ollama;: Installed. No version info available.
langchain-openai;: Installed. No version info available.
langchain-perplexity;: Installed. No version info available.
langchain-text-splitters<1.0.0,>=0.3.8: Installed. No version info available.
langchain-together;: Installed. No version info available.
langchain-xai;: Installed. No version info available.
langchain<1.0.0,>=0.3.25: Installed. No version info available.
langsmith-pyo3: Installed. No version info available.
langsmith<0.4,>=0.1.125: Installed. No version info available.
langsmith<0.4,>=0.1.126: Installed. No version info available.
langsmith<0.4,>=0.1.17: Installed. No version info available.
numpy>=1.26.2;: Installed. No version info available.
numpy>=2.1.0;: Installed. No version info available.
ollama<1.0.0,>=0.4.8: Installed. No version info available.
openai-agents: Installed. No version info available.
opentelemetry-api: 1.33.1
opentelemetry-exporter-otlp-proto-http: Installed. No version info available.
opentelemetry-sdk: 1.33.1
orjson: 3.10.18
orjson>=3.10.1: Installed. No version info available.
packaging: 24.2
packaging<25,>=23.2: Installed. No version info available.
pydantic: 2.11.5
pydantic-settings<3.0.0,>=2.4.0: Installed. No version info available.
pydantic<3.0.0,>=2.7.4: Installed. No version info available.
pydantic>=2.7.4: Installed. No version info available.
pytest: Installed. No version info available.
PyYAML>=5.3: Installed. No version info available.
requests: 2.32.3
requests-toolbelt: 1.0.0
requests<3,>=2: Installed. No version info available.
rich: 14.0.0
sentence-transformers>=2.6.0: Installed. No version info available.
SQLAlchemy<3,>=1.4: Installed. No version info available.
tenacity!=8.4.0,<10,>=8.1.0: Installed. No version info available.
tenacity!=8.4.0,<10.0.0,>=8.1.0: Installed. No version info available.
tokenizers>=0.19.1: Installed. No version info available.
transformers>=4.39.0: Installed. No version info available.
typing-extensions>=4.7: Installed. No version info available.
zstandard: 0.23.0