@@ -31,6 +31,9 @@ of their meaning.
The example below uses the [HuggingFace](https://huggingface.co/) model
[`all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
to generate the vector embeddings to store and index with Redis Query Engine.
+ The code is first demonstrated for hash documents, with a
+ separate section to explain the
+ [differences with JSON documents](#differences-with-json-documents).

## Initialize

@@ -155,7 +158,8 @@ array of `float` values. Note that if you are using
[JSON]({{< relref "/develop/data-types/json" >}})
objects to store your documents instead of hashes, then you should store
the `float` array directly without first converting it to a binary
- string.
+ string (see [Differences with JSON documents](#differences-with-json-documents)
+ below).

```php
$content = "That is a very happy person";
@@ -264,6 +268,134 @@ document)
is the result judged to be most similar in meaning to the query text
*"That is a happy person"*.

+ ## Differences with JSON documents
+
+ Indexing JSON documents is similar to hash indexing, but there are some
+ important differences. JSON allows much richer data modeling with nested fields, so
+ you must supply a [path]({{< relref "/develop/data-types/json/path" >}}) in the schema
+ to identify each field you want to index. However, you can declare a short alias for each
+ of these paths to avoid typing it in full for
+ every query. Also, you must specify `JSON` with the `on()` option when you create the index.
+
+ The code below shows these differences, but the index is otherwise very similar to
+ the one created previously for hashes:
+
+ ```php
+ $jsonSchema = [
+     new TextField("$.content", "content"),
+     new TagField("$.genre", "genre"),
+     new VectorField(
+         "$.embedding",
+         "HNSW",
+         [
+             "TYPE", "FLOAT32",
+             "DIM", 384,
+             "DISTANCE_METRIC", "L2"
+         ],
+         "embedding",
+     )
+ ];
+
+ $client->ftcreate("vector_json_idx", $jsonSchema,
+     (new CreateArguments())
+         ->on('JSON')
+         ->prefix(["jdoc:"])
+ );
+ ```
+
+ Use [`jsonset()`]({{< relref "/commands/json.set" >}}) to add the data
+ instead of [`hmset()`]({{< relref "/commands/hset" >}}). The arrays
+ that specify the fields have roughly the same structure as the ones used for
+ `hmset()`, but you should use the standard library function
+ [`json_encode()`](https://www.php.net/manual/en/function.json-encode.php)
+ to generate a JSON string representation of the array.
+
+ An important difference with JSON indexing is that the vectors are
+ specified using arrays instead of binary strings. Simply add the
+ embedding as an array field without using the `pack()` function as you
+ would with a hash.
+
+ ```php
+ $content = "That is a very happy person";
+ $emb = $extractor($content, normalize: true, pooling: 'mean');
+
+ $client->jsonset("jdoc:0", "$",
+     json_encode(
+         [
+             "content" => $content,
+             "genre" => "persons",
+             "embedding" => $emb[0]
+         ],
+         JSON_THROW_ON_ERROR
+     )
+ );
+
+ $content = "That is a happy dog";
+ $emb = $extractor($content, normalize: true, pooling: 'mean');
+
+ $client->jsonset("jdoc:1", "$",
+     json_encode(
+         [
+             "content" => $content,
+             "genre" => "pets",
+             "embedding" => $emb[0]
+         ],
+         JSON_THROW_ON_ERROR
+     )
+ );
+
+ $content = "Today is a sunny day";
+ $emb = $extractor($content, normalize: true, pooling: 'mean');
+
+ $client->jsonset("jdoc:2", "$",
+     json_encode(
+         [
+             "content" => $content,
+             "genre" => "weather",
+             "embedding" => $emb[0]
+         ],
+         JSON_THROW_ON_ERROR
+     )
+ );
+ ```
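
The two vector encodings described in this section can be compared without a Redis server. The following is a minimal sketch, not part of the change under review: it assumes a hypothetical four-element vector in place of a real 384-dimension embedding, and uses only the standard `json_encode()`, `pack()`, and `unpack()` functions. The `'g*'` format packs each element as a little-endian 32-bit float, which is the layout the `FLOAT32` type in the schema expects.

```php
<?php
// Hypothetical 4-element vector standing in for a 384-dimension embedding.
$vector = [0.5, -1.25, 2.0, 0.0];

// JSON documents: the vector stays a plain array inside the encoded document.
$jsonDoc = json_encode(
    ["content" => "example", "embedding" => $vector],
    JSON_THROW_ON_ERROR
);

// Hash documents and query parameters: pack the floats into a binary string.
// 'g' is a little-endian 32-bit float, so each element takes 4 bytes.
$binary = pack('g*', ...$vector);

// Decode both forms again to confirm they carry the same values.
$fromJson = json_decode($jsonDoc, true, 512, JSON_THROW_ON_ERROR)["embedding"];
$fromBinary = array_values(unpack('g*', $binary));

echo $jsonDoc, PHP_EOL;
echo "Binary length: ", strlen($binary), " bytes", PHP_EOL;
```

The values in this sketch are exactly representable as 32-bit floats, so both forms decode to the same numbers. Real embedding values are rounded to 32-bit precision when packed.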
+
+ The query is almost identical to the one for the hash documents. This
+ demonstrates how the right choice of aliases for the JSON paths can
+ save you having to write complex queries. An important thing to notice
+ is that the vector parameter for the query is still specified as a
+ binary string (using the `pack()` function), even though the data for
+ the `embedding` field of the JSON was specified as an array.
+
+ ```php
+ $queryText = "That is a happy person";
+ $queryEmb = $extractor($queryText, normalize: true, pooling: 'mean');
+
+ $result = $client->ftsearch(
+     "vector_json_idx",
+     '*=>[KNN 3 @embedding $vec AS vector_distance]',
+     (new SearchArguments())
+         ->addReturn(1, "vector_distance")
+         ->dialect("2")
+         ->params([
+             "vec", pack('g*', ...$queryEmb[0])
+         ])
+         ->sortBy("vector_distance")
+ );
+ ```
+
+ Apart from the `jdoc:` prefixes for the keys, the result from the JSON
+ query is the same as for hash:
+
+ ```
+ Number of results: 3
+ Key: jdoc:0
+ Field: vector_distance, Value: 3.76152896881
+ Key: jdoc:1
+ Field: vector_distance, Value: 18.6544265747
+ Key: jdoc:2
+ Field: vector_distance, Value: 44.6189727783
+ ```
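
The printing code that produces this listing is not shown in this part of the diff. As a rough sketch, assuming the client returns the raw search reply as a flat array of the form `[total, key, [field, value, ...], key, ...]`, a helper such as the hypothetical `formatSearchReply()` below could render it. The sample reply is hard-coded from the output above so that the sketch runs without a server.

```php
<?php
// Sample reply in the assumed shape: [total, key, [field, value, ...], ...].
$result = [
    3,
    "jdoc:0", ["vector_distance", "3.76152896881"],
    "jdoc:1", ["vector_distance", "18.6544265747"],
    "jdoc:2", ["vector_distance", "44.6189727783"],
];

// Hypothetical helper: turns the flat reply into printable lines.
function formatSearchReply(array $reply): string
{
    $lines = ["Number of results: " . $reply[0]];

    // Keys and their field lists alternate after the leading count.
    for ($i = 1; $i + 1 < count($reply); $i += 2) {
        $lines[] = "Key: " . $reply[$i];

        // Each field list alternates field name and value.
        $fields = $reply[$i + 1];
        for ($j = 0; $j + 1 < count($fields); $j += 2) {
            $lines[] = "Field: {$fields[$j]}, Value: {$fields[$j + 1]}";
        }
    }

    return implode(PHP_EOL, $lines);
}

echo formatSearchReply($result), PHP_EOL;
```

If the client is configured to return a different reply structure, the indexing in the loops would need to change accordingly.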
+
## Learn more

See