@@ -28,10 +28,12 @@ similarity of an embedding generated from some query text with embeddings stored
 or JSON fields, Redis can retrieve documents that closely match the query in terms
 of their meaning.

-In the example below, we use the
+The example below uses the
 [`sentence-transformers`](https://pypi.org/project/sentence-transformers/)
 library to generate vector embeddings to store and index with
-Redis Query Engine.
+Redis Query Engine. The code is first demonstrated for hash documents, with a
+separate section to explain the
+[differences with JSON documents](#differences-with-json-documents).

 ## Initialize

@@ -50,6 +52,7 @@ from sentence_transformers import SentenceTransformer
 from redis.commands.search.query import Query
 from redis.commands.search.field import TextField, TagField, VectorField
 from redis.commands.search.indexDefinition import IndexDefinition, IndexType
+from redis.commands.json.path import Path

 import numpy as np
 import redis
@@ -86,7 +89,7 @@ except redis.exceptions.ResponseError:
     pass
 ```

-Next, we create the index.
+Next, create the index.
 The schema in the example below specifies hash objects for storage and includes
 three fields: the text content to index, a
 [tag]({{< relref "/develop/interact/search-and-query/advanced-concepts/tags" >}})
@@ -127,10 +130,10 @@ Use the `model.encode()` method of `SentenceTransformer`
 as shown below to create the embedding that represents the `content` field.
 The `astype()` option that follows the `model.encode()` call specifies that
 we want a vector of `float32` values. The `tobytes()` option encodes the
-vector components together as a single binary string rather than the
-default Python list of `float` values.
-Use the binary string representation when you are indexing hash objects
-(as we are here), but use the default list of `float` for JSON objects.
+vector components together as a single binary string.
+Use the binary string representation when you are indexing hashes
+or running a query (but use a list of `float` for
+[JSON documents](#differences-with-json-documents)).

 ```python
 content = "That is a very happy person"
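The hunk above describes two encodings of one `float32` embedding: the binary string from `tobytes()` for hashes and query parameters, and the list of `float` from `tolist()` for JSON documents. The following sketch, which uses only NumPy and the standard library, shows how the two relate. It is illustrative only: a seeded random 384-component vector stands in for the output of `model.encode(...)` (384 is the `DIM` used in the JSON schema in this diff), so no model or Redis server is needed.

```python
import json

import numpy as np

# Stand-in for model.encode(content): a seeded random 384-component vector.
# (Assumption: the real embedding also has 384 components, matching DIM.)
rng = np.random.default_rng(seed=0)
embedding = rng.random(384).astype(np.float32)

# Hashes and query parameters: one binary string holding the raw float32
# components (4 bytes each, in the machine's native byte order).
as_bytes = embedding.tobytes()
print(type(as_bytes).__name__, len(as_bytes))  # bytes 1536

# The binary string decodes back to exactly the same float32 vector.
decoded = np.frombuffer(as_bytes, dtype=np.float32)
print(np.array_equal(decoded, embedding))  # True

# JSON documents: a plain Python list of float values.
as_list = embedding.tolist()
print(type(as_list).__name__, len(as_list), type(as_list[0]).__name__)  # list 384 float

# A list of floats serializes to JSON, but raw bytes do not.
json.dumps({"embedding": as_list})
try:
    json.dumps({"embedding": as_bytes})
except TypeError as err:
    print("bytes are not JSON serializable:", err)
```

Each `float32` component occupies 4 bytes, so the binary string for a 384-component vector is 1536 bytes long. The failed `json.dumps()` call is one way to see why a JSON document holds the list form rather than raw bytes.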
@@ -226,6 +229,116 @@ As you would expect, the result for `doc:0` with the content text *"That is a ve
 is the result that is most similar in meaning to the query text
 *"That is a happy person"*.

+## Differences with JSON documents
+
+Indexing JSON documents is similar to hash indexing, but there are some
+important differences. JSON allows much richer data modelling with nested fields, so
+you must supply a [path]({{< relref "/develop/data-types/json/path" >}}) in the schema
+to identify each field you want to index. However, you can declare a short alias for each
+of these paths (using the `as_name` keyword argument) to avoid typing it in full for
+every query. Also, you must specify `IndexType.JSON` when you create the index.
+
+The code below shows these differences, but the index is otherwise very similar to
+the one created previously for hashes:
+
+```py
+schema = (
+    TextField("$.content", as_name="content"),
+    TagField("$.genre", as_name="genre"),
+    VectorField(
+        "$.embedding", "HNSW", {
+            "TYPE": "FLOAT32",
+            "DIM": 384,
+            "DISTANCE_METRIC": "L2"
+        },
+        as_name="embedding"
+    )
+)
+
+r.ft("vector_json_idx").create_index(
+    schema,
+    definition=IndexDefinition(
+        prefix=["jdoc:"], index_type=IndexType.JSON
+    )
+)
+```
+
+Use [`json().set()`]({{< relref "/commands/json.set" >}}) to add the data
+instead of [`hset()`]({{< relref "/commands/hset" >}}). The dictionaries
+that specify the fields have the same structure as the ones used for `hset()`,
+but `json().set()` receives them in a positional argument instead of
+the `mapping` keyword argument.
+
+An important difference with JSON indexing is that the vectors are
+specified using lists instead of binary strings. Generate the list
+with the `tolist()` method instead of the `tobytes()` method that you
+would use for a hash.
+
+```py
+content = "That is a very happy person"
+
+r.json().set("jdoc:0", Path.root_path(), {
+    "content": content,
+    "genre": "persons",
+    "embedding": model.encode(content).astype(np.float32).tolist(),
+})
+
+content = "That is a happy dog"
+
+r.json().set("jdoc:1", Path.root_path(), {
+    "content": content,
+    "genre": "pets",
+    "embedding": model.encode(content).astype(np.float32).tolist(),
+})
+
+content = "Today is a sunny day"
+
+r.json().set("jdoc:2", Path.root_path(), {
+    "content": content,
+    "genre": "weather",
+    "embedding": model.encode(content).astype(np.float32).tolist(),
+})
+```
+
+The query is almost identical to the one for the hash documents. This
+demonstrates how the right choice of aliases for the JSON paths can
+save you from having to write complex queries. An important thing to notice
+is that the vector parameter for the query is still specified as a
+binary string (using the `tobytes()` method), even though the data for
+the `embedding` field of the JSON documents was specified as a list.
+
+```py
+q = Query(
+    "*=>[KNN 3 @embedding $vec AS vector_distance]"
+).return_field("vector_distance").return_field("content").dialect(2)
+
+query_text = "That is a happy person"
+
+res = r.ft("vector_json_idx").search(
+    q, query_params={
+        "vec": model.encode(query_text).astype(np.float32).tobytes()
+    }
+)
+```
+
+Apart from the `jdoc:` prefixes for the keys, the result from the JSON
+query is the same as for the hash query:
+
+```
+Result{
+    3 total,
+    docs: [
+        Document {
+            'id': 'jdoc:0',
+            'payload': None,
+            'vector_distance': '0.114169985056',
+            'content': 'That is a very happy person'
+        },
+        .
+        .
+        .
+```
+
 ## Learn more

 See
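Both the hash and JSON queries in this diff use `KNN 3 @embedding $vec AS vector_distance` against a vector field declared with `"DISTANCE_METRIC": "L2"`. As a rough mental model of what that clause asks for, the sketch below ranks stored vectors by Euclidean (L2) distance to a query vector with a brute-force NumPy search. It is an illustration under stated assumptions, not how Redis works internally: the `HNSW` index is an approximate search that does not scan every vector, the `knn` helper and the 3-component toy vectors are hypothetical, and the numeric distance Redis reports may be scaled differently (for example, squared) even where the ordering agrees.

```python
import numpy as np


def knn(query: np.ndarray, vectors: dict[str, np.ndarray], k: int) -> list[tuple[str, float]]:
    """Return the k (key, distance) pairs closest to query by Euclidean (L2) distance.

    Hypothetical brute-force helper: unlike an HNSW index, it checks every stored vector.
    """
    scored = [
        (key, float(np.linalg.norm(vec - query)))
        for key, vec in vectors.items()
    ]
    # A smaller L2 distance means a closer match, so sort in ascending order.
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical toy float32 vectors standing in for real 384-component embeddings.
docs = {
    "jdoc:0": np.array([0.9, 0.1, 0.0], dtype=np.float32),
    "jdoc:1": np.array([0.6, 0.5, 0.1], dtype=np.float32),
    "jdoc:2": np.array([0.0, 0.2, 0.9], dtype=np.float32),
}
query_vec = np.array([1.0, 0.0, 0.0], dtype=np.float32)

for key, dist in knn(query_vec, docs, k=3):
    print(f"{key}: {dist:.4f}")
```

With these toy vectors the keys print in the order `jdoc:0`, `jdoc:1`, `jdoc:2`, in the same way that the query results above list the closest document first.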