@@ -32,7 +32,9 @@ In the example below, we use [Microsoft.ML](https://dotnet.microsoft.com/en-us/a
to generate the vector embeddings to store and index with Redis Query Engine.
We also show how to adapt the code to use
[Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/embeddings?tabs=csharp)
-for the embeddings.
+for the embeddings. The code is first demonstrated for hash documents with a
+separate section to explain the
+[differences with JSON documents](#differences-with-json-documents).

## Initialize

@@ -89,7 +91,6 @@ using Azure;
using Azure.AI.OpenAI;
```

-
## Define a function to obtain the embedding model

{{< note >}}Ignore this step if you are using an Azure OpenAI
@@ -154,7 +155,9 @@ array as a `byte` string. To simplify this, we declare a
then encodes the returned `float` array as a `byte` string. If you are
storing your documents as JSON objects instead of hashes, then you should
use the `float` array for the embedding directly, without first converting
-it to a `byte` string.
+it to a `byte` string (see [Differences with JSON documents](#differences-with-json-documents)
+below).
+

```csharp
static byte[] GetEmbedding(
@@ -414,6 +417,156 @@ As you would expect, the result for `doc:1` with the content text
is the result that is most similar in meaning to the query text
*"That is a happy person"*.

+## Differences with JSON documents
+
+Indexing JSON documents is similar to hash indexing, but there are some
+important differences. JSON allows much richer data modeling with nested fields, so
+you must supply a [path]({{< relref "/develop/data-types/json/path" >}}) in the schema
+to identify each field you want to index. However, you can declare a short alias for each
+of these paths to avoid typing it in full for
+every query. Also, you must specify `IndexDataType.JSON` with the `On()` option when you
+create the index.
+
+The code below shows these differences, but the index is otherwise very similar to
+the one created previously for hashes:
+
+```cs
+var jsonSchema = new Schema()
+    .AddTextField(new FieldName("$.content", "content"))
+    .AddTagField(new FieldName("$.genre", "genre"))
+    .AddVectorField(
+        new FieldName("$.embedding", "embedding"),
+        VectorField.VectorAlgo.HNSW,
+        new Dictionary<string, object>()
+        {
+            ["TYPE"] = "FLOAT32",
+            ["DIM"] = "150",
+            ["DISTANCE_METRIC"] = "L2"
+        }
+    );
+
+
+db.FT().Create(
+    "vector_json_idx",
+    new FTCreateParams()
+        .On(IndexDataType.JSON)
+        .Prefix("jdoc:"),
+    jsonSchema
+);
+```
+
+An important difference with JSON indexing is that the vectors are
+specified using arrays of `float` instead of binary strings. This requires a modification
+to the `GetEmbedding()` function declared in
+[Define a function to generate an embedding](#define-a-function-to-generate-an-embedding)
+above:
+
+```cs
+static float[] GetFloatEmbedding(
+    PredictionEngine<TextData, TransformedTextData> model, string sentence
+)
+{
+    // Call the prediction API to convert the text into an embedding vector.
+    var data = new TextData()
+    {
+        Text = sentence
+    };
+
+    var prediction = model.Predict(data);
+
+    float[] floatArray = Array.ConvertAll(prediction.Features, x => (float)x);
+    return floatArray;
+}
+```
+
+You should make a similar modification to the `GetEmbeddingFromAzure()` function
+if you are using Azure OpenAI with JSON.
+
+Use [`JSON().Set()`]({{< relref "/commands/json.set" >}}) to add the data
+instead of [`HashSet()`]({{< relref "/commands/hset" >}}):
+
+```cs
+var jSentence1 = "That is a very happy person";
+
+var jdoc1 = new {
+    content = jSentence1,
+    genre = "persons",
+    embedding = GetFloatEmbedding(predEngine, jSentence1),
+};
+
+db.JSON().Set("jdoc:1", "$", jdoc1);
+
+var jSentence2 = "That is a happy dog";
+
+var jdoc2 = new {
+    content = jSentence2,
+    genre = "pets",
+    embedding = GetFloatEmbedding(predEngine, jSentence2),
+};
+
+db.JSON().Set("jdoc:2", "$", jdoc2);
+
+var jSentence3 = "Today is a sunny day";
+
+var jdoc3 = new {
+    content = jSentence3,
+    genre = "weather",
+    embedding = GetFloatEmbedding(predEngine, jSentence3),
+};
+
+db.JSON().Set("jdoc:3", "$", jdoc3);
+```
+
+The query is almost identical to the one for the hash documents. This
+demonstrates how the right choice of aliases for the JSON paths can
+save you having to write complex queries. The only significant difference is
+that the `FieldName` objects created for the `ReturnFields()` option must
+include the JSON path for the field.
+
+An important thing to notice
+is that the vector parameter for the query is still specified as a
+binary string (using the `GetEmbedding()` method), even though the data for
+the `embedding` field of the JSON was specified as a `float` array.
+
+```cs
+var jRes = db.FT().Search("vector_json_idx",
+    new Query("*=>[KNN 3 @embedding $query_vec AS score]")
+        .AddParam("query_vec", GetEmbedding(predEngine, "That is a happy person"))
+        .ReturnFields(
+            new FieldName("$.content", "content"),
+            new FieldName("$.score", "score")
+        )
+        .SetSortBy("score")
+        .Dialect(2));
+
+foreach (var doc in jRes.Documents) {
+    var props = doc.GetProperties();
+    var propText = string.Join(
+        ", ",
+        props.Select(p => $"{p.Key}: '{p.Value}'")
+    );
+
+    Console.WriteLine(
+        $"ID: {doc.Id}, Properties: [\n  {propText}\n]"
+    );
+}
+```
+
+Apart from the `jdoc:` prefixes for the keys, the result from the JSON
+query is the same as for the hash query:
+
+```
+ID: jdoc:1, Properties: [
+  score: '4.30777168274', content: 'That is a very happy person'
+]
+ID: jdoc:2, Properties: [
+  score: '25.9752807617', content: 'That is a happy dog'
+]
+ID: jdoc:3, Properties: [
+  score: '68.8638000488', content: 'Today is a sunny day'
+]
+```
+
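As a minimal sketch, separate from the changes in this diff, the following shows how one vector can take both forms used in the JSON section: a `float` array for the stored document and a binary string for the query parameter. It assumes the binary string is just the raw `FLOAT32` values in the machine's native byte order, which is what `Buffer.BlockCopy` produces. The names `VectorEncoding`, `ToBytes`, and `ToFloats` are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Linq;

public static class VectorEncoding
{
    // Pack a float array into raw bytes (4 bytes per value, native byte order).
    // Assumption: this matches the layout expected for a FLOAT32 query vector.
    public static byte[] ToBytes(float[] vector)
    {
        var bytes = new byte[vector.Length * sizeof(float)];
        Buffer.BlockCopy(vector, 0, bytes, 0, bytes.Length);
        return bytes;
    }

    // Unpack raw bytes back into a float array (the inverse of ToBytes).
    public static float[] ToFloats(byte[] bytes)
    {
        var vector = new float[bytes.Length / sizeof(float)];
        Buffer.BlockCopy(bytes, 0, vector, 0, bytes.Length);
        return vector;
    }

    public static void Main()
    {
        // A tiny stand-in for an embedding; real embeddings are much longer.
        float[] embedding = { 0.25f, -1.5f, 3.0f };

        // The float array is the form stored in the JSON document;
        // the byte array is the form passed as the query parameter.
        byte[] queryBlob = ToBytes(embedding);

        Console.WriteLine(queryBlob.Length);                              // 12
        Console.WriteLine(ToFloats(queryBlob).SequenceEqual(embedding));  // True
    }
}
```

The round trip is lossless because no numeric conversion takes place: the same four bytes per value are reinterpreted, not recomputed.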
## Learn more

See