Commit 18757ab

vgvoleg, azevaykin, and anton-bobkov authored and committed
langhain integration docs (#17160)
Co-authored-by: azevaykin <145343289+azevaykin@users.noreply.github.com> Co-authored-by: anton-bobkov <anton-bobkov@ydb.tech>
1 parent fac5e1c commit 18757ab

12 files changed: +557 -1 lines changed

ydb/docs/en/core/integrations/index.md

Lines changed: 4 additions & 0 deletions
@@ -37,6 +37,10 @@ In addition to its own native protocol, {{ ydb-name }} has a compatibility layer

 {% include notitle [Table of contents](orm/_includes/toc-table.md) %}

+## Vector search {#vectorsearch}
+
+{% include notitle [Table of contents](vectorsearch/_includes/toc-table.md) %}
+
 ## See also

 * [{#T}](../reference/ydb-sdk/index.md)

ydb/docs/en/core/integrations/toc_i.yaml

Lines changed: 6 additions & 1 deletion
@@ -28,4 +28,9 @@ items:
   href: orm/index.md
   include:
     mode: link
-    path: orm/toc-orm.yaml
+    path: orm/toc-orm.yaml
+- name: Vector search
+  href: vectorsearch/index.md
+  include:
+    mode: link
+    path: vectorsearch/toc-vectorsearch.yaml
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
# Vector search

| System | Instruction |
| --- | --- |
| [LangChain](https://python.langchain.com/docs/introduction/) | [Instruction](../langchain.md) |
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
# Vector search

{% include notitle [Table of contents](_includes/toc-table.md) %}
Lines changed: 258 additions & 0 deletions
@@ -0,0 +1,258 @@
# LangChain

Integrating {{ ydb-short-name }} with [LangChain](https://python.langchain.com/docs/introduction/) lets you use {{ ydb-short-name }} as a [vector store](https://python.langchain.com/docs/concepts/vectorstores/) for [RAG](https://python.langchain.com/docs/concepts/rag/) applications.

This integration allows developers to manage, query, and retrieve vectorized data, which underpins applications in natural language processing, search, and data analysis. With an embedding model, you can build systems that retrieve information based on semantic similarity rather than exact keyword matches.

## Setup {#setup}

To use this integration, install the following software:

- `langchain-ydb`

    To install `langchain-ydb`, run the following command:

    ```shell
    pip install -qU langchain-ydb
    ```

- An embedding model

    This tutorial uses `HuggingFaceEmbeddings`. To install this package, run the following command:

    ```shell
    pip install -qU langchain-huggingface
    ```

- A local {{ ydb-short-name }} instance

    For more information, see [{#T}](../../quickstart.md#install).

## Initialization {#initialization}

Creating a {{ ydb-short-name }} vector store requires an embedding model. This example uses `HuggingFaceEmbeddings`:

```python
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
```

Once the embedding model is created, the {{ ydb-short-name }} vector store can be initialized:

```python
from langchain_ydb.vectorstores import YDB, YDBSearchStrategy, YDBSettings

settings = YDBSettings(
    host="localhost",
    port=2136,
    database="/local",
    table="ydb_example",
    strategy=YDBSearchStrategy.COSINE_SIMILARITY,
)
vector_store = YDB(embeddings, config=settings)
```
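
The settings above select `YDBSearchStrategy.COSINE_SIMILARITY`. As a rough illustration of what a cosine similarity score means, the following standalone sketch computes it for toy vectors in plain Python. It shows only the mathematical definition and makes no claim about how `langchain-ydb` or {{ ydb-short-name }} computes scores internally; the helper name `cosine_similarity` and the sample vectors are made up for this example.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (illustrative values, not model output).
query = [1.0, 0.0, 1.0]
close = [2.0, 0.0, 2.0]  # same direction as the query, different length
far = [0.0, 1.0, 0.0]    # orthogonal to the query

print(round(cosine_similarity(query, close), 3))  # 1.0
print(round(cosine_similarity(query, far), 3))    # 0.0
```

Because the score depends only on direction, two texts whose embeddings point the same way score close to 1 regardless of vector length.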

## Manage Vector Store {#manage_vector_store}

After the vector store is created, you can add items to it and remove items from it.

### Add items to vector store {#add_items_to_vector_store}

The following code prepares the documents:

```python
from uuid import uuid4

from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
)

document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
)

document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={"source": "tweet"},
)

document_4 = Document(
    page_content="Robbers broke into the city bank and stole $1 million in cash.",
    metadata={"source": "news"},
)

document_5 = Document(
    page_content="Wow! That was an amazing movie. I can't wait to see it again.",
    metadata={"source": "tweet"},
)

document_6 = Document(
    page_content="Is the new iPhone worth the price? Read this review to find out.",
    metadata={"source": "website"},
)

document_7 = Document(
    page_content="The top 10 soccer players in the world right now.",
    metadata={"source": "website"},
)

document_8 = Document(
    page_content="LangGraph is the best framework for building stateful, agentic applications!",
    metadata={"source": "tweet"},
)

document_9 = Document(
    page_content="The stock market is down 500 points today due to fears of a recession.",
    metadata={"source": "news"},
)

document_10 = Document(
    page_content="I have a bad feeling I am going to get deleted :(",
    metadata={"source": "tweet"},
)

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]
uuids = [str(uuid4()) for _ in range(len(documents))]
```
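
The example above assigns random IDs with `uuid4`, so every run produces different IDs. If the same documents might be ingested more than once, one possible alternative is to derive each ID from the document itself. The sketch below uses only the standard library's `uuid.uuid5`; the namespace URL and the `stable_id` helper are hypothetical names introduced for illustration. How `langchain-ydb` treats an ID that already exists in the table is not covered here, so verify that behavior before relying on this approach.

```python
import uuid

# Hypothetical application namespace; any fixed UUID works as long as it never changes.
DOC_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/ydb-langchain-demo")


def stable_id(page_content: str, source: str) -> str:
    """Derive a repeatable ID from the document text and its source."""
    return str(uuid.uuid5(DOC_NAMESPACE, f"{source}\n{page_content}"))


first = stable_id("The top 10 soccer players in the world right now.", "website")
second = stable_id("The top 10 soccer players in the world right now.", "website")
other = stable_id("The top 10 soccer players in the world right now.", "news")

print(first == second)  # True: same input, same ID
print(first == other)   # False: different source, different ID
```

With such a helper, the `uuids` list could be built as `[stable_id(d.page_content, d.metadata["source"]) for d in documents]`.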

Items are added to the vector store using the `add_documents` method:

```python
vector_store.add_documents(documents=documents, ids=uuids)
```

Output:

```shell
Inserting data...: 100%|██████████| 10/10 [00:00<00:00, 14.67it/s]
['947be6aa-d489-44c5-910e-62e4d58d2ffb',
 '7a62904d-9db3-412b-83b6-f01b34dd7de3',
 'e5a49c64-c985-4ed7-ac58-5ffa31ade699',
 '99cf4104-36ab-4bd5-b0da-e210d260e512',
 '5810bcd0-b46e-443e-a663-e888c9e028d1',
 '190c193d-844e-4dbb-9a4b-b8f5f16cfae6',
 'f8912944-f80a-4178-954e-4595bf59e341',
 '34fc7b09-6000-42c9-95f7-7d49f430b904',
 '0f6b6783-f300-4a4d-bb04-8025c4dfd409',
 '46c37ba9-7cf2-4ac8-9bd1-d84e2cb1155c']
```

### Delete items from vector store {#delete_items_from_vector_store}

To delete items from the vector store by ID, use the `delete` method:

```python
vector_store.delete(ids=[uuids[-1]])
```

Output:

```shell
True
```

## Query Vector Store {#query_vector_store}

After creating the vector store and adding documents, you can query the store while running a chain or an agent.

### Query directly {#query_directly}

#### Similarity search

A simple similarity search can be performed as follows:

```python
results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy", k=2
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
```

Output:

```shell
* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]
* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]
```

#### Similarity search with score

To perform a similarity search that also returns the score of each result, use the following code:

```python
results = vector_store.similarity_search_with_score("Will it be hot tomorrow?", k=3)
for res, score in results:
    print(f"* [SIM={score:.3f}] {res.page_content} [{res.metadata}]")
```

Output:

```shell
* [SIM=0.595] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'source': 'news'}]
* [SIM=0.212] I had chocolate chip pancakes and scrambled eggs for breakfast this morning. [{'source': 'tweet'}]
* [SIM=0.118] Wow! That was an amazing movie. I can't wait to see it again. [{'source': 'tweet'}]
```

### Filtering {#filtering}

To restrict a search to documents whose metadata matches given values, pass a `filter`:

```python
results = vector_store.similarity_search_with_score(
    "What did I eat for breakfast?",
    k=4,
    filter={"source": "tweet"},
)
for res, _ in results:
    print(f"* {res.page_content} [{res.metadata}]")
```

Output:

```shell
* I had chocolate chip pancakes and scrambled eggs for breakfast this morning. [{'source': 'tweet'}]
* Wow! That was an amazing movie. I can't wait to see it again. [{'source': 'tweet'}]
* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]
* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]
```
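
Conceptually, a filtered similarity search keeps only the documents whose metadata matches the filter and then ranks the remaining ones by score. The in-memory sketch below illustrates that idea with toy two-dimensional vectors. It is not how {{ ydb-short-name }} executes the query; the `filtered_top_k` helper, the sample records, and their vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (text, metadata, toy embedding); the vectors are illustrative, not model output.
Record = Tuple[str, Dict[str, str], List[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filtered_top_k(
    query_vec: Sequence[float],
    records: List[Record],
    k: int,
    metadata_filter: Dict[str, str],
) -> List[Tuple[str, Dict[str, str], float]]:
    """Keep records matching every filter key, then return the k best by score."""
    matches = [
        (text, meta, cosine(query_vec, vec))
        for text, meta, vec in records
        if all(meta.get(key) == value for key, value in metadata_filter.items())
    ]
    matches.sort(key=lambda item: item[2], reverse=True)
    return matches[:k]


records: List[Record] = [
    ("pancakes for breakfast", {"source": "tweet"}, [0.9, 0.1]),
    ("weather forecast", {"source": "news"}, [0.8, 0.6]),
    ("amazing movie", {"source": "tweet"}, [0.1, 0.9]),
]

top = filtered_top_k([1.0, 0.0], records, k=2, metadata_filter={"source": "tweet"})
for text, meta, score in top:
    print(f"* [SIM={score:.3f}] {text} [{meta}]")
```

The `news` record is dropped before ranking, so it cannot appear in the results no matter how well it scores.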


### Query by turning into retriever {#query_by_turning_into_retriever}

The vector store can also be transformed into a retriever for easier use in chains.

The following code transforms the vector store into a retriever and invokes it with a simple query and a filter:

```python
retriever = vector_store.as_retriever(
    search_kwargs={"k": 2},
)
results = retriever.invoke(
    "Stealing from the bank is a crime", filter={"source": "news"}
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
```

Output:

```shell
* Robbers broke into the city bank and stole $1 million in cash. [{'source': 'news'}]
* The stock market is down 500 points today due to fears of a recession. [{'source': 'news'}]
```
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
items:
- name: LangChain
  href: langchain.md

ydb/docs/ru/core/integrations/index.md

Lines changed: 4 additions & 0 deletions
@@ -39,6 +39,10 @@

 {% include notitle [Table of contents](orm/_includes/toc-table.md) %}

+## Vector search {#vectorsearch}
+
+{% include notitle [Table of contents](vectorsearch/_includes/toc-table.md) %}
+
 ## See also

 * [{#T}](../reference/ydb-sdk/index.md)

ydb/docs/ru/core/integrations/toc_i.yaml

Lines changed: 5 additions & 0 deletions
@@ -29,3 +29,8 @@ items:
   include:
     mode: link
     path: orm/toc-orm.yaml
+- name: Vector search
+  href: vectorsearch/index.md
+  include:
+    mode: link
+    path: vectorsearch/toc-vectorsearch.yaml
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
# Vector search

| Tool | Instruction |
| --- | --- |
| [LangChain](https://python.langchain.com/docs/introduction/) | [Instruction](../langchain.md) |
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
# Vector search

{% include notitle [Table of contents](_includes/toc-table.md) %}

