Getting the documents used by an Agent #26227
Replies: 1 comment 3 replies
To access the metadata of the documents an agent uses in the middle of processing with LangChain, you can build a chain that returns the retrieved documents together with the answer:

from operator import itemgetter
from langchain.schema.runnable import RunnableMap, RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# `retriever`, `prompt`, `llm`, and `format_docs` are assumed to be defined elsewhere.
rag_chain_from_docs = (
    {
        "context": lambda input: format_docs(input["documents"]),
        "question": itemgetter("question"),
    }
    | prompt
    | llm
    | StrOutputParser()
)
rag_chain_with_source = RunnableMap(
    {"documents": retriever, "question": RunnablePassthrough()}
) | {
    # Keep only the metadata of each retrieved document.
    "documents": lambda input: [doc.metadata for doc in input["documents"]],
    "answer": rag_chain_from_docs,
}
rag_chain_with_source.invoke("<your question>")

This code returns the metadata of the retrieved documents along with the answer to your question. Additionally, if your model includes response metadata, you can read it from the response_metadata attribute of the returned message:

from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4-turbo")
msg = llm.invoke([("human", "What's the oldest known example of cuneiform")])
print(msg.response_metadata)

This approach can be adapted for other model providers such as Anthropic, Google VertexAI, Bedrock, MistralAI, Groq, TogetherAI, and FireworksAI by using their respective libraries and models. For example, when using FireworksAI, you can set up the agent and retriever as follows:

from langchain_community.chat_models.fireworks import ChatFireworks
from langchain_core.retrievers import BaseRetriever
from langchain_core.documents import Document
# Imports the snippet relies on; exact paths can differ between LangChain versions.
from langchain import hub
from langchain.agents import AgentExecutor
from langchain.agents.format_scratchpad import format_log_to_str
from langchain.agents.output_parsers import ReActJsonSingleInputOutputParser
from langchain.tools.render import render_text_description
from langchain.tools.retriever import create_retriever_tool

class ArxivRetriever(BaseRetriever):
    def _get_relevant_documents(self, query: str, *, run_manager):
        # Implementation to retrieve documents from Arxiv
        pass

arxiv_tool = create_retriever_tool(ArxivRetriever(), "arxiv", "Arxiv retriever description")
tools = [arxiv_tool]
llm = ChatFireworks(model="accounts/fireworks/models/mixtral-8x7b-instruct", model_kwargs={"temperature": 0, "max_tokens": 2048, "top_p": 1}, cache=True)
prompt = hub.pull("hwchase17/react-json").partial(tools=render_text_description(tools), tool_names=", ".join([t.name for t in tools]))
agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_log_to_str(x["intermediate_steps"]),
    }
    | prompt
    | llm.bind(stop=["\nObservation"])
    | ReActJsonSingleInputOutputParser()
)
# InputType is not defined in this snippet; it is assumed to be a model describing the executor's input.
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, handle_parsing_errors=True).with_types(input_type=InputType)

This setup allows you to retrieve and process document metadata during the agent's execution. For a more detailed example, refer to the LangChain documentation on citations, which demonstrates how to create a chain that returns both the answer and the retrieved documents:

from operator import itemgetter
from typing import List
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import (
    RunnableLambda,
    RunnableParallel,
    RunnablePassthrough,
)

# `prompt`, `llm`, and the retriever `wiki` are assumed to be defined elsewhere.
def format_docs(docs: List[Document]) -> str:
    formatted = [
        f"Article Title: {doc.metadata['title']}\nArticle Snippet: {doc.page_content}"
        for doc in docs
    ]
    return "\n\n" + "\n\n".join(formatted)

format = itemgetter("docs") | RunnableLambda(format_docs)
answer = prompt | llm | StrOutputParser()
chain = (
    RunnableParallel(question=RunnablePassthrough(), docs=wiki)
    .assign(context=format)
    .assign(answer=answer)
    .pick(["answer", "docs"])
)
chain.invoke("How fast are cheetahs?")

This example shows how to format the retrieved documents and include their metadata in the processing chain [1][2][3][4].
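Once a chain returns the documents next to the answer, listing the URLs that were used is a small post-processing step. The following is a minimal sketch of that step in plain Python. The Doc dataclass is a hypothetical stand-in for a LangChain Document, unique_sources is an illustrative helper rather than a LangChain function, and the result dictionary is hand-written sample data shaped like the output of a chain that picks "answer" and "docs".

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (content plus metadata)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def unique_sources(docs: List[Doc], key: str = "source") -> List[str]:
    """Return the distinct metadata[key] values in first-seen order."""
    seen: List[str] = []
    for doc in docs:
        value = doc.metadata.get(key)
        if value is not None and value not in seen:
            seen.append(value)
    return seen


# Sample data (assumption): shaped like {"answer": ..., "docs": [...]}.
result = {
    "answer": "<answer text>",
    "docs": [
        Doc("snippet 1", {"source": "https://www.example1.com/a", "title": "A"}),
        Doc("snippet 2", {"source": "https://www.example2.com/b", "title": "B"}),
        Doc("snippet 3", {"source": "https://www.example1.com/a", "title": "A"}),
    ],
}

print(unique_sources(result["docs"]))
```

With real LangChain documents the same loop should apply, since they also expose a metadata dictionary, provided the loader stored the URL under the "source" key as in the question's example.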
Example Code
Description
The code above works flawlessly: when I invoke the agent that uses the model above, it "reads" the two example webpages, creates documents for them, and searches them. However, I need to know which documents the agent used so that I can tell which URL(s) were used. If I run something like

vectordb.similarity_search('gifts for boys')

I get the documents with the metadata I want:

Document(metadata={'source': 'https://www.example1.com/gifts', 'title': '22 Best Gifts for Teens, According to Experts (and Teens).', 'description': 'Find the best Christmas...
However, if I run the agent, I don't have access to the documents; I only get the agent's final answer.
The code above is simplified (I am actually using LangGraph), but the question is the same: how do I access the document metadata in the "middle" of the processing? I tried using callbacks, but I had no luck, and they don't seem to be the right solution.
I spent days on this, reading everything I could, but I couldn't figure out how to do it.
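One way to think about the problem, sketched here in plain Python rather than with LangChain or LangGraph APIs: the agent only sees the text a tool returns, so the full documents have to be kept in a side channel at the moment the search runs. In the sketch below, Doc, RecordingSearch, and fake_search are all hypothetical names; fake_search stands in for a call such as vectordb.similarity_search, and the wrapper is what would be handed to the agent as its search tool.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class RecordingSearch:
    """Wraps a search function and remembers every document it returns."""

    def __init__(self, search: Callable[[str], List[Doc]]) -> None:
        self._search = search
        self.seen: List[Doc] = []

    def __call__(self, query: str) -> str:
        docs = self._search(query)
        self.seen.extend(docs)  # side channel: keep the full documents
        # The agent only receives this plain-text rendering.
        return "\n\n".join(d.page_content for d in docs)


def fake_search(query: str) -> List[Doc]:
    """Hypothetical stand-in for vectordb.similarity_search."""
    return [Doc(f"Result for {query!r}", {"source": "https://www.example1.com/gifts"})]


search_tool = RecordingSearch(fake_search)
tool_output = search_tool("gifts for boys")  # what the agent would call
sources = [d.metadata["source"] for d in search_tool.seen]
print(sources)
```

After the agent finishes, search_tool.seen holds every document that any tool call returned, which is a superset of what the model actually relied on for its final answer. Creating one wrapper per request keeps results from different runs apart.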
System Info
Langchain version:
Platform: Mac
Python version: 3.11.8