Ollama has two endpoints: `/api/chat` and `/api/generate`. As stated in this Ollama GitHub issue, `/api/chat` remembers history, while `/api/generate` does not.

LangChain has two interfaces for Ollama: `ChatOllama` and `OllamaLLM`. It is easy to show that the former calls the `/api/chat` endpoint:
```python
from langchain_ollama import ChatOllama

llm = ChatOllama(model="deepseek-r1", temperature=0.0)
llm.invoke("What is the meaning of life?")
# logs HTTP Request: POST http://127.0.0.1:11434/api/chat "HTTP/1.1 200 OK"
```
While the latter calls the `/api/generate` endpoint:
```python
from langchain_ollama import OllamaLLM

llm = OllamaLLM(model="gemma3:1b-it-qat", temperature=0.0)
llm.invoke("What is the meaning of life?")
# logs HTTP Request: POST http://127.0.0.1:11434/api/generate "HTTP/1.1 200 OK"
```
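For reference, the two endpoints also take different request bodies. A minimal sketch of the payloads (illustrative only — nothing is sent here, and only the fields relevant to this question are shown):

```python
import json

# /api/chat takes a list of role-tagged messages, so the caller decides
# on every request which earlier turns (if any) to resend.
chat_payload = {
    "model": "deepseek-r1",
    "messages": [{"role": "user", "content": "What is the meaning of life?"}],
}

# /api/generate takes a single prompt string, with no message list.
generate_payload = {
    "model": "gemma3:1b-it-qat",
    "prompt": "What is the meaning of life?",
}

print(json.dumps(chat_payload, indent=2))
```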
The LangChain documentation discourages the use of the latter model:

> However, LangChain also has implementations of older LLMs that do not follow the chat model interface and instead use an interface that takes a string as input and returns a string as output. These models are typically named without the "Chat" prefix (e.g., Ollama, Anthropic, OpenAI, etc.). These models implement the BaseLLM interface and may be named with the "LLM" suffix (e.g., OllamaLLM, AnthropicLLM, OpenAILLM, etc.). Generally, users should not use these models.

Even the RAG tutorial uses a chat model.
I'm building a RAG where I ask the same question for different documents.
So I loop over the documents: for each one, I read it, clear the vector store, add the document to the vector store, ask the question, and repeat with the next document.
However, when I'm asking a question about document B, I don't want the answer to be tainted by the previous answer about document A. That's why I clear the vector store before adding the new document.
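The loop I described can be sketched like this (a stand-alone sketch: the vector store and the LLM call are plain-Python stubs, not real LangChain or Ollama objects, and `ask_llm` is a hypothetical placeholder):

```python
# Minimal sketch of the per-document RAG loop described above.
documents = {
    "doc_a.txt": "Document A text ...",
    "doc_b.txt": "Document B text ...",
}
question = "What is the meaning of life?"

def ask_llm(question: str, context: str) -> str:
    # Stub for a single stateless LLM call: the model sees only what is
    # in this prompt, nothing from earlier loop iterations.
    return f"Answer based on: {context!r}"

answers = {}
vector_store = []  # stub: a real store would hold embedded chunks

for name, text in documents.items():
    vector_store.clear()               # clean the store between documents
    vector_store.append(text)          # index only the current document
    context = " ".join(vector_store)   # stub retrieval: everything in the store
    answers[name] = ask_llm(question, context)

# Each answer is derived only from its own document's context.
```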
But on the LLM side, I feel the "traditional" LLM model (i.e. the `/api/generate` endpoint) would be better, since the `/api/chat` endpoint would somehow "remember" the previous conversations, and so the previous document.
What am I missing?
System Info
System Information
OS: Windows
OS Version: 10.0.26100
Python Version: 3.12.10 (main, Apr 9 2025, 04:06:22) [MSC v.1943 64 bit (AMD64)]
aiohttp<4.0.0,>=3.8.3: Installed. No version info available.
async-timeout<5.0.0,>=4.0.0;: Installed. No version info available.
dataclasses-json<0.7,>=0.5.7: Installed. No version info available.
httpx: 0.28.1
httpx-sse<1.0.0,>=0.4.0: Installed. No version info available.
httpx>=0.25.2: Installed. No version info available.
huggingface-hub>=0.30.2: Installed. No version info available.
jsonpatch<2.0,>=1.33: Installed. No version info available.
langchain-anthropic;: Installed. No version info available.
langchain-aws;: Installed. No version info available.
langchain-azure-ai;: Installed. No version info available.
langchain-cohere;: Installed. No version info available.
langchain-community;: Installed. No version info available.
langchain-core<1.0.0,>=0.3.51: Installed. No version info available.
langchain-core<1.0.0,>=0.3.58: Installed. No version info available.
langchain-core<1.0.0,>=0.3.59: Installed. No version info available.
langchain-core<1.0.0,>=0.3.60: Installed. No version info available.
langchain-deepseek;: Installed. No version info available.
langchain-fireworks;: Installed. No version info available.
langchain-google-genai;: Installed. No version info available.
langchain-google-vertexai;: Installed. No version info available.
langchain-groq;: Installed. No version info available.
langchain-huggingface;: Installed. No version info available.
langchain-mistralai;: Installed. No version info available.
langchain-ollama;: Installed. No version info available.
langchain-openai;: Installed. No version info available.
langchain-perplexity;: Installed. No version info available.
langchain-text-splitters<1.0.0,>=0.3.8: Installed. No version info available.
langchain-together;: Installed. No version info available.
langchain-xai;: Installed. No version info available.
langchain<1.0.0,>=0.3.25: Installed. No version info available.
langsmith-pyo3: Installed. No version info available.
langsmith<0.4,>=0.1.125: Installed. No version info available.
langsmith<0.4,>=0.1.126: Installed. No version info available.
langsmith<0.4,>=0.1.17: Installed. No version info available.
numpy>=1.26.2;: Installed. No version info available.
numpy>=2.1.0;: Installed. No version info available.
ollama<1.0.0,>=0.4.8: Installed. No version info available.
openai-agents: Installed. No version info available.
opentelemetry-api: 1.33.1
opentelemetry-exporter-otlp-proto-http: Installed. No version info available.
opentelemetry-sdk: 1.33.1
orjson: 3.10.18
orjson>=3.10.1: Installed. No version info available.
packaging: 24.2
packaging<25,>=23.2: Installed. No version info available.
pydantic: 2.11.5
pydantic-settings<3.0.0,>=2.4.0: Installed. No version info available.
pydantic<3.0.0,>=2.7.4: Installed. No version info available.
pydantic>=2.7.4: Installed. No version info available.
pytest: Installed. No version info available.
PyYAML>=5.3: Installed. No version info available.
requests: 2.32.3
requests-toolbelt: 1.0.0
requests<3,>=2: Installed. No version info available.
rich: 14.0.0
sentence-transformers>=2.6.0: Installed. No version info available.
SQLAlchemy<3,>=1.4: Installed. No version info available.
tenacity!=8.4.0,<10,>=8.1.0: Installed. No version info available.
tenacity!=8.4.0,<10.0.0,>=8.1.0: Installed. No version info available.
tokenizers>=0.19.1: Installed. No version info available.
transformers>=4.39.0: Installed. No version info available.
typing-extensions>=4.7: Installed. No version info available.
zstandard: 0.23.0