Commit 7c30250

Add external vector index support for Pinecone (neo4j#52)

* Added pinecone-client to poetry
* Copied weaviate populate_dbs.py to pinecone_e2e
* Updated tests/e2e/pinecone_e2e/populate_dbs.py to work with Pinecone
* Copied the Weaviate retriever to use as a base for the Pinecone retriever
* Added first draft of PineconeNeo4jRetriever
* Added input validation to PineconeNeo4jRetriever
* Renamed PineconeModel to PineconeClientModel
* Added Pinecone retriever unit tests
* Added E2E tests for pinecone retriever
* Updated pinecone retriever to work with new return types
* Updated E2E tests to work with new types
* Added __init__.py to E2E tests
* Update src/neo4j_genai/retrievers/external/pinecone/pinecone.py

  Co-authored-by: willtai <william.tai@neo4j.com>

* Updated Pinecone retriever with new Exceptions
* Added Pinecone example scripts
* Moved Pinecone Pydantic types to their own file
* Ran import organizer on examples/weaviate/text_search_local_embedder.py
* Added docs for PineconeNeo4jRetriever
* Pinecone E2E test fix
* Pinecone E2E test update
* Pinecone E2E test fix
* Moved populate_neo4j to its own file
* Moved populate_neo4j to utils
* Updated PineconeNeo4jRetriever import in test_pinecone_e2e.py
* Removed copyright from Pinecone and Weaviate examples
* Refactored E2E test utils
* Small fix to tests/e2e/utils.py
* Ran poetry lock --no-update
* Small formatting changes
* More formatting changes
* Moved embedding_biology.py to the examples folder
* mypy fixes
* Removed unused imports
* More mypy fixes
* More mypy fixes
* Added .venv to .gitignore
* Fixed mypy pytest conflict

---------

Co-authored-by: willtai <william.tai@neo4j.com>
1 parent f8dc542 commit 7c30250

35 files changed (+1033, -221 lines)

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -10,3 +10,4 @@ docs/build/
 .vscode/
 .python-version
 .DS_Store
+.venv

docs/source/api.rst

Lines changed: 7 additions & 0 deletions
@@ -57,6 +57,13 @@ WeaviateNeo4jRetriever
     :members:
 
 
+PineconeNeo4jRetriever
+======================
+
+.. autoclass:: neo4j_genai.retrievers.external.pinecone.PineconeNeo4jRetriever
+    :members:
+
+
 ******
 Errors
 ******

examples/__init__.py

Whitespace-only changes.

examples/pinecone/README.md

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+### Usage Instructions
+
+You will need both a Pinecone vector database and a Neo4j database to use this retriever.
+
+### Writing Test Data
+
+Update `NEO4J_AUTH`, `NEO4J_URL`, and `PC_API_KEY` variables in the `tests/e2e/pinecone_e2e/populate_dbs.py` script then run this from the project root to write test data to both dbs.
+
+```
+poetry run python tests/e2e/pinecone_e2e/populate_dbs.py
+```
+
+### Install Pinecone client
+
+You need to install the `pinecone-client` package to use this retriever.
+
+```bash
+pip install pinecone-client
+```
+
+### Search
+Update the `NEO4J_AUTH`, `NEO4J_URL`, and `PC_API_KEY` variables in each file then run one of the following from the project root to test the retriever.
+
+```
+# Search by vector
+poetry run python -m examples.pinecone.vector_search
+
+# Search by text, with embeddings generated locally
+poetry run python -m examples.pinecone.text_search
+```
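The README's test-data step only makes sense if both databases share a key. Judging by the `id_property_neo4j="id"` argument in the examples, the retriever presumably takes the ids Pinecone returns and looks up Neo4j nodes whose `id` property has the same value. The following is a minimal sketch, not the contents of `populate_dbs.py`, that builds both write payloads from one list of records so the keys cannot drift apart. The `Record` type, the `Question` label, and the `question`/`answer` properties are hypothetical. The `{"id": ..., "values": ...}` dicts are the shape the Pinecone client's `Index.upsert(vectors=...)` accepts. Nothing here touches the network.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical test record; `id` is the key shared by both databases."""

    id: str
    question: str
    answer: str
    embedding: list[float]


# Hypothetical Cypher write; the label and property names are assumptions.
CYPHER_WRITE = (
    "UNWIND $rows AS row "
    "MERGE (q:Question {id: row.id}) "
    "SET q.question = row.question, q.answer = row.answer"
)


def pinecone_vectors(records: list[Record]) -> list[dict]:
    """Vectors in the {"id", "values"} dict form used for a Pinecone upsert."""
    return [{"id": r.id, "values": r.embedding} for r in records]


def neo4j_rows(records: list[Record]) -> list[dict]:
    """Row parameters for CYPHER_WRITE; in this sketch embeddings go to Pinecone only."""
    return [{"id": r.id, "question": r.question, "answer": r.answer} for r in records]


records = [
    Record("q1", "Study of living things", "biology", [0.1, 0.9, 0.0]),
    Record("q2", "Study of the stars", "astronomy", [0.8, 0.1, 0.1]),
]
vectors = pinecone_vectors(records)
rows = neo4j_rows(records)
```

With live services, `vectors` would be passed to `Pinecone(PC_API_KEY).Index("jeopardy").upsert(vectors=vectors)` and `rows` to `driver.execute_query(CYPHER_WRITE, rows=rows)`. Deriving both lists from one source is the design point; the exact schema is an assumption.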

examples/pinecone/__init__.py

Whitespace-only changes.

examples/pinecone/text_search.py

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+from langchain_community.embeddings import HuggingFaceEmbeddings
+from neo4j import GraphDatabase
+from neo4j_genai.retrievers.external.pinecone import PineconeNeo4jRetriever
+from pinecone import Pinecone
+
+NEO4J_AUTH = ("neo4j", "password")
+NEO4J_URL = "neo4j://localhost:7687"
+PC_API_KEY = "API_KEY"
+
+
+def main() -> None:
+    with GraphDatabase.driver(NEO4J_URL, auth=NEO4J_AUTH) as neo4j_driver:
+        pc_client = Pinecone(PC_API_KEY)
+        embedder = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
+
+        retriever = PineconeNeo4jRetriever(
+            driver=neo4j_driver,
+            client=pc_client,
+            index_name="jeopardy",
+            id_property_neo4j="id",
+            embedder=embedder,  # type: ignore
+        )
+
+        res = retriever.search(query_text="biology", top_k=2)
+        print(res)
+
+
+if __name__ == "__main__":
+    main()
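`text_search.py` passes `query_text` plus an `embedder`, while `vector_search.py` passes a precomputed `query_vector`. Below is a minimal sketch of one plausible way a search call could reconcile the two inputs. `resolve_query_vector`, `QueryEmbedder`, and `ToyEmbedder` are hypothetical names. The commit's own input validation appears to use Pydantic models (the message mentions `PineconeClientModel` and "Pinecone Pydantic types"), not hand-written checks like these. The sketch relies only on an `embed_query(text) -> list[float]` method, which LangChain embedding classes such as `HuggingFaceEmbeddings` provide. That structural match, rather than a shared base class, is presumably why the example needs `# type: ignore` on the `embedder=` argument.

```python
from typing import Optional, Protocol


class QueryEmbedder(Protocol):
    """Hypothetical structural type: any object with embed_query(str) -> list[float]."""

    def embed_query(self, text: str) -> list[float]: ...


def resolve_query_vector(
    query_vector: Optional[list[float]] = None,
    query_text: Optional[str] = None,
    embedder: Optional[QueryEmbedder] = None,
) -> list[float]:
    """Return the vector to send to the index, enforcing one plausible set of rules."""
    # Exactly one of query_vector / query_text must be given.
    if (query_vector is None) == (query_text is None):
        raise ValueError("Provide exactly one of query_vector or query_text.")
    if query_vector is not None:
        return query_vector
    # A text query can only become a vector if an embedder is available.
    if embedder is None:
        raise ValueError("query_text requires an embedder.")
    return embedder.embed_query(query_text)


class ToyEmbedder:
    """Deterministic stand-in for a real model: counts of 'a', 'b' and 'c'."""

    def embed_query(self, text: str) -> list[float]:
        return [float(text.count(ch)) for ch in "abc"]
```

For example, `resolve_query_vector(query_text="biology", embedder=ToyEmbedder())` returns `[0.0, 1.0, 0.0]`, and passing both or neither query argument raises `ValueError`.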

examples/pinecone/vector_search.py

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+from neo4j import GraphDatabase
+from neo4j_genai.retrievers.external.pinecone import PineconeNeo4jRetriever
+from pinecone import Pinecone
+
+from examples.embedding_biology import EMBEDDING_BIOLOGY
+
+NEO4J_AUTH = ("neo4j", "password")
+NEO4J_URL = "neo4j://localhost:7687"
+PC_API_KEY = "API_KEY"
+
+
+def main() -> None:
+    with GraphDatabase.driver(NEO4J_URL, auth=NEO4J_AUTH) as neo4j_driver:
+        pc_client = Pinecone(PC_API_KEY)
+        retriever = PineconeNeo4jRetriever(
+            driver=neo4j_driver,
+            client=pc_client,
+            index_name="jeopardy",
+            id_property_neo4j="id",
+        )
+
+        res = retriever.search(query_vector=EMBEDDING_BIOLOGY, top_k=2)
+        print(res)
+
+
+if __name__ == "__main__":
+    main()
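Both examples reduce to the same two steps. First, Pinecone returns the `top_k` closest vector ids with scores. Second, Neo4j supplies the nodes whose `id` property matches those ids. The sketch below imitates both steps in memory so it runs without either service. It assumes a cosine-similarity index, although a Pinecone index can be created with other metrics. `vector_top_k`, `join_with_nodes`, and `CYPHER_LOOKUP` are hypothetical names. The Cypher string shows one way the lookup could be written against a live database; it is not taken from the library.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def vector_top_k(
    index: dict[str, list[float]], query: list[float], top_k: int
) -> list[tuple[str, float]]:
    """Step 1, the part Pinecone performs: the top_k (id, score) pairs, best first."""
    # Query and stored vectors must have the same dimension.
    if any(len(vec) != len(query) for vec in index.values()):
        raise ValueError("query vector dimension does not match the index")
    scored = [(vid, cosine(vec, query)) for vid, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical Cypher for step 2 against a live database (not the library's query).
CYPHER_LOOKUP = (
    "UNWIND $matches AS m "
    "MATCH (node) WHERE node[$id_property] = m.id "
    "RETURN node, m.score AS score"
)


def join_with_nodes(
    matches: list[tuple[str, float]], nodes: list[dict], id_property: str = "id"
) -> list[tuple[dict, float]]:
    """Step 2, in memory: pair each matched id's score with the node that has that id."""
    by_id = {node[id_property]: node for node in nodes}
    # Ids with no corresponding node are dropped; the score order is preserved.
    return [(by_id[vid], score) for vid, score in matches if vid in by_id]


index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
nodes = [{"id": "a", "text": "A"}, {"id": "b", "text": "B"}, {"id": "c", "text": "C"}]
results = join_with_nodes(vector_top_k(index, [1.0, 0.1], top_k=2), nodes)
```

Here `results` holds the nodes for `"a"` and `"c"`, in that order. Against a live database, the same matches could be sent to `CYPHER_LOOKUP` as `matches=[{"id": vid, "score": s} for vid, s in ...]` with `id_property="id"`.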

examples/weaviate/README.md

Lines changed: 3 additions & 3 deletions
@@ -28,11 +28,11 @@ pip install weaviate-client
 
 ```
 # search by vector
-poetry run python examples/weaviate/vector_search.py
+poetry run python -m examples.weaviate.vector_search
 
 # search by text, with embeddings generated locally (via embedder argument)
-poetry run python examples/weaviate/text_search_local_embedder.py
+poetry run python -m examples.weaviate.text_search_local_embedder
 
 # search by text, with embeddings generated on the Weaviate side, via configured vectorizer
-poetry run python examples/weaviate/text_search_remote_embedder.py
+poetry run python -m examples.weaviate.text_search_remote_embedder
 ```

examples/weaviate/__init__.py

Whitespace-only changes.
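This commit switches the Weaviate README from `python examples/weaviate/vector_search.py` to `python -m examples.weaviate.vector_search` and adds `__init__.py` files under `examples/`. The commit does not say why. A likely reason is that the examples now import `examples.embedding_biology`. When Python runs a script by path, `sys.path[0]` is the script's own directory, whereas `-m` puts the current working directory there. As a result, the `examples` package only resolves in the second case. The sketch below checks this with a throwaway package tree in a temporary directory. The file contents are made up for the demonstration, and `-E` is passed so a `PYTHONPATH` set in the environment cannot mask the difference.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_python(args: list[str], cwd: Path) -> subprocess.CompletedProcess:
    """Run the current interpreter with -E (ignore PYTHON* env vars) in cwd."""
    return subprocess.run(
        [sys.executable, "-E", *args], cwd=cwd, capture_output=True, text=True
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "examples" / "weaviate").mkdir(parents=True)
    # Regular packages, mirroring the __init__.py files added in this commit.
    (root / "examples" / "__init__.py").write_text("")
    (root / "examples" / "weaviate" / "__init__.py").write_text("")
    # Made-up stand-ins for the real example modules.
    (root / "examples" / "embedding_biology.py").write_text("EMBEDDING_BIOLOGY = [0.1, 0.2]\n")
    (root / "examples" / "weaviate" / "vector_search.py").write_text(
        "from examples.embedding_biology import EMBEDDING_BIOLOGY\n"
        "print(len(EMBEDDING_BIOLOGY))\n"
    )

    # Old README style: sys.path[0] is examples/weaviate, so `examples` is not found.
    by_path = run_python(["examples/weaviate/vector_search.py"], cwd=root)
    # New README style: sys.path[0] is the project root, so the import resolves.
    as_module = run_python(["-m", "examples.weaviate.vector_search"], cwd=root)
```

`by_path` fails with a `ModuleNotFoundError`, while `as_module` exits cleanly and prints `2`.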
