Use TiDB Vector Search with Python.

- ## Installation
-
- ```bash
- pip install tidb-vector
- ```
-
## Usage

TiDB is a SQL database, so this package adds Vector Search capability to Python ORMs:

- [#SQLAlchemy](#sqlalchemy)
- - [#Django](#django)
- [#Peewee](#peewee)
+ - [#Django](#django)

Pick one that you are familiar with to get started. If you are not using any of them, we recommend [#SQLAlchemy](#sqlalchemy).

@@ -24,13 +18,18 @@ We also provide a Vector Search client for simple usage:

### SQLAlchemy

+ Install:
+
```bash
pip install tidb-vector sqlalchemy pymysql
```

+ Usage:
+
```python
- from sqlalchemy import Integer, Text, Column
+ from sqlalchemy import Integer, Column
from sqlalchemy import create_engine, select
+ from sqlalchemy.dialects.mysql import LONGTEXT
from sqlalchemy.orm import Session, declarative_base

import tidb_vector
@@ -44,15 +43,15 @@ Base = declarative_base()
class Doc(Base):
    __tablename__ = "doc"
    id = Column(Integer, primary_key=True)
-     embedding = Column(VectorType(3))  # Vector with 3 dimensions
-     content = Column(Text)
+     embedding = Column(VectorType(dim=3))
+     content = Column(LONGTEXT)


# Create empty table
Base.metadata.drop_all(engine)  # clean data from last run
Base.metadata.create_all(engine)

- # Create index using L2 distance
+ # Create index for L2 distance
adaptor = VectorAdaptor(engine)
adaptor.create_vector_index(
    Doc.embedding, tidb_vector.DistanceMetric.L2, skip_existing=True
@@ -69,7 +68,7 @@ with Session(engine) as session:
with Session(engine) as session:
    results = session.execute(
        select(Doc.id, Doc.content)
-         .order_by(Doc.embedding.cosine_distance([1, 2, 3]))
+         .order_by(Doc.embedding.l2_distance([1, 2, 3]))
        .limit(1)
    ).all()
    print(results)
@@ -78,16 +77,90 @@ with Session(engine) as session:
with Session(engine) as session:
    results = session.execute(
        select(Doc.id, Doc.content)
-         .where(Doc.id > 2)
-         .order_by(Doc.embedding.cosine_distance([1, 2, 3]))
+         .where(Doc.content == "dog")
+         .order_by(Doc.embedding.l2_distance([1, 2, 3]))
        .limit(1)
    ).all()
    print(results)
```

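The hunks above change the query from `cosine_distance` to `l2_distance`, presumably so that it uses the same metric as the L2 index. As a rough illustration of why the choice of metric matters, the following sketch ranks two made-up vectors against the query `[1, 2, 3]` in plain Python. The functions `l2_distance` and `cosine_distance` below are local helpers written for this example (they are not the `tidb_vector` column methods), and the candidate vectors are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; depends only on direction, not on length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query = [1, 2, 3]

# Hypothetical candidates: one points the same way as the query but is
# twice as long, the other is close to the query in absolute position.
candidates = {"scaled": [2, 4, 6], "nearby": [1, 2, 4]}

nearest_l2 = min(candidates, key=lambda k: l2_distance(candidates[k], query))
nearest_cos = min(candidates, key=lambda k: cosine_distance(candidates[k], query))

print("nearest by L2:", nearest_l2)
print("nearest by cosine:", nearest_cos)
```

The two metrics pick different nearest neighbors here, so the metric used in `order_by` decides which row comes back first.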
### Django

- To use vector field in Django, you need to use [`django-tidb`](https://github.com/pingcap/django-tidb?tab=readme-ov-file#vector-beta).
+ > [!TIP]
+ >
+ > Django is a full-featured web framework, not just an ORM. The following usage instructions are provided for existing Django users.
+ >
+ > For new users to get started, consider using SQLAlchemy or Peewee.
+
+ Install:
96
+
97
+ ``` bash
98
+ pip install ' django-tidb[vector]~=5.0.0' ' django~=5.0.0' mysqlclient
99
+ ```
100
+
101
+ Usage:
102
+
103
+ 1\. Configure ` django_tidb ` as engine, like:
104
+
105
+ ``` python
106
+ DATABASES = {
107
+ ' default' : {
108
+ ' ENGINE' : ' django_tidb' ,
109
+ ' NAME' : ' django' ,
110
+ ' USER' : ' root' ,
111
+ ' PASSWORD' : ' ' ,
112
+ ' HOST' : ' 127.0.0.1' ,
113
+ ' PORT' : 4000 ,
114
+ },
115
+ }
116
+ ```
117
+
+ 2\. Define a model with a vector field and vector index:
+
+ ```python
+ from django.db import models
+ from django_tidb.fields.vector import VectorField, VectorIndex, L2Distance
+
+ class Doc(models.Model):
+     id = models.IntegerField(primary_key=True)
+     embedding = VectorField(dimensions=3)
+     content = models.TextField()
+     class Meta:
+         indexes = [VectorIndex(L2Distance("embedding"), name="idx")]
+ ```
+
+ 3\. Insert data:
+
+ ```python
+ Doc.objects.create(id=1, content="dog", embedding=[1, 2, 1])
+ Doc.objects.create(id=2, content="fish", embedding=[1, 2, 4])
+ Doc.objects.create(id=3, content="tree", embedding=[1, 0, 0])
+ ```
+
+ 4\. Perform Vector Search for Top K=1:
+
+ ```python
+ queryset = (
+     Doc.objects
+     .order_by(L2Distance("embedding", [1, 2, 3]))
+     .values("id", "content")[:1]
+ )
+ print(queryset)
+ ```
+
+ 5\. Perform filtered Vector Search by adding a Where Clause:
+
+ ```python
+ queryset = (
+     Doc.objects
+     .filter(content="dog")
+     .order_by(L2Distance("embedding", [1, 2, 3]))
+     .values("id", "content")[:1]
+ )
+ print(queryset)
+ ```
+
+ For more details, see [django-tidb](https://github.com/pingcap/django-tidb?tab=readme-ov-file#vector-beta).

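To sanity-check what the Top K and filtered searches in the Django steps should return, the ranking can be reproduced without a database. This is a minimal sketch in plain Python that uses the three sample rows from the insert step and the query vector `[1, 2, 3]`; the `l2` and `top_k` helpers are hypothetical names introduced for this illustration and are not part of `django-tidb`.

```python
import math

# The three sample rows from the insert step: (id, content, embedding).
rows = [
    (1, "dog", [1, 2, 1]),
    (2, "fish", [1, 2, 4]),
    (3, "tree", [1, 0, 0]),
]
query = [1, 2, 3]


def l2(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(rows, query, k=1, content=None):
    """Optionally filter by content, then return the k closest (id, content) pairs."""
    matches = [r for r in rows if content is None or r[1] == content]
    ranked = sorted(matches, key=lambda r: l2(r[2], query))
    return [(r[0], r[1]) for r in ranked[:k]]


print(top_k(rows, query))                 # [(2, 'fish')]
print(top_k(rows, query, content="dog"))  # [(1, 'dog')]
```

The L2 distances to the query are 2 for "dog", 1 for "fish", and √13 for "tree", so the unfiltered search returns "fish" while the filtered one returns "dog".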
### Peewee