tobias-gp/pinecone-langchain-hybrid

LangChain Retrieval Tool for Pinecone Hybrid Search with Ingest

I couldn't find much documentation on implementing a hybrid search retriever for the Pinecone vector database in combination with LangChain. If you are not using LangChain, you can still use the retriever on its own (see below).

I hope that this example helps others to implement a high-quality semantic search! Get in touch to buy me a coffee ;)

Prerequisites

  • A Pinecone index with 1536 dimensions (the output size of OpenAI text-embedding-3-small) and dotproduct as the similarity metric. The free starter subscription is sufficient.
  • An OpenAI API key with access to GPT-4. If you are using Azure, you will have to modify the instantiation of OpenAIEmbeddings.
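If you prefer to create the index from code rather than the Pinecone console, the following is a minimal sketch using the Pinecone Python client. The helper name create_hybrid_index and the serverless cloud and region are assumptions for illustration and are not part of this repository. Only the dimension and metric come from the prerequisites above.

```python
# Settings stated in the prerequisites above.
INDEX_SETTINGS = {"dimension": 1536, "metric": "dotproduct"}


def create_hybrid_index(api_key: str, index_name: str) -> None:
    """Create a serverless index suitable for hybrid search (hypothetical helper)."""
    # Imported lazily so this module loads even without the pinecone package.
    from pinecone import Pinecone, ServerlessSpec

    pc = Pinecone(api_key=api_key)
    pc.create_index(
        name=index_name,
        # Cloud and region are assumptions; pick whatever your plan supports.
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
        **INDEX_SETTINGS,
    )
```

Calling create_hybrid_index requires a valid API key and network access, so run it once during setup rather than on every start.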

Preparation

  1. Create a new virtual environment or conda environment and install dependencies:
pip install -r requirements.txt
  2. Set environment variables:
export PINECONE_INDEX="index_name"
export PINECONE_API_KEY="your_pinecone_api_key"
export OPENAI_API_KEY="your_openai_api_key"
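A missing variable usually only surfaces as an error deep inside a client library, so it can help to check all three up front. The snippet below is a small sketch for that. The function missing_env_vars is a hypothetical helper, not something this repository provides.

```python
import os

# The three variables exported in the step above.
REQUIRED_VARS = ("PINECONE_INDEX", "PINECONE_API_KEY", "OPENAI_API_KEY")


def missing_env_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping; pass nothing to check the real environment.
example_env = {"PINECONE_INDEX": "index_name", "OPENAI_API_KEY": "sk-..."}
print(missing_env_vars(example_env))
```

With the example mapping, only PINECONE_API_KEY is reported as missing.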

Ingester

The folder documents already contains an example document. To upload chunks, run:

python -m pinecone_langchain_hybrid.uploader

You can modify the ingester to add additional parsers or folders. Currently, there's only one default parser provided as an example.
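To give an idea of what an additional parser might do before upload, here is a minimal sketch of fixed-size chunking with overlap. The function chunk_text and its parameters are assumptions for illustration. The repository's default parser may split documents differently.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut
    at a boundary still appears intact in one of the two chunks.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece.strip():  # skip whitespace-only chunks
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break  # the last chunk already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

Character-based splitting is the simplest option. Splitting on paragraphs or tokens generally gives cleaner chunks but needs more code.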

Retriever

Importing the Retriever

You can import the retriever in your own project or run it directly.

Running the Retriever

This is intended for testing only. Modify the main method as needed, then run:

python -m pinecone_langchain_hybrid.retrievers

Using the Retriever in Python

You can use the retriever in Python by importing the corresponding LangChain tool:

from pinecone_langchain_hybrid.retrievers import DocumentsPineconeRetrieverTool

tool = DocumentsPineconeRetrieverTool()

# Named tool_input to avoid shadowing Python's built-in input()
tool_input = {
    "query": "Which features can I use for emotion detection?"
}

documents_prompt = tool.invoke(input=tool_input)
print(documents_prompt)

Alternatively, you can skip the LangChain invoke interface and call retrieve_documents on the tool directly:

from pinecone_langchain_hybrid.retrievers import DocumentsPineconeRetrieverTool

tool = DocumentsPineconeRetrieverTool(top_k=5, alpha=0.5)
docs = tool.retrieve_documents(query="Which features can I use for emotion detection?")

print(docs)
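The alpha parameter above controls the balance between dense (semantic) and sparse (keyword) matching. The sketch below shows the convex weighting commonly used with Pinecone hybrid queries, where alpha=1 is purely dense and alpha=0 is purely sparse. Whether this repository applies alpha in exactly this way is an assumption, and hybrid_scale is a hypothetical helper name.

```python
def hybrid_scale(dense, sparse, alpha):
    """Weight a dense vector by alpha and a sparse vector by (1 - alpha).

    dense:  list of floats
    sparse: dict with "indices" and "values" lists of equal length
    alpha:  float in [0, 1]; 1 means dense only, 0 means sparse only
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")

    scaled_dense = [value * alpha for value in dense]
    scaled_sparse = {
        "indices": list(sparse["indices"]),
        "values": [value * (1.0 - alpha) for value in sparse["values"]],
    }
    return scaled_dense, scaled_sparse


dense_vec = [1.0, 2.0]
sparse_vec = {"indices": [3, 7], "values": [4.0, 2.0]}
print(hybrid_scale(dense_vec, sparse_vec, alpha=0.5))
```

Because the index uses dotproduct similarity, scaling the query vectors this way scales each component's contribution to the final score by the same factor.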

Contributing

Feel free to submit issues or pull requests if you find any bugs or have feature requests.

License

This project is licensed under the MIT License. See the LICENSE file for details.
