Commit 1889ff4

Merge pull request #1471 from redis/DOC-5150-js-vec-json-examples
DOC-5150 and DOC-5151 JavaScript and C# vector json examples
2 parents 80c8445 + dec1f83 commit 1889ff4

2 files changed: +312 -102 lines changed

content/develop/clients/dotnet/vecsearch.md

Lines changed: 156 additions & 3 deletions
@@ -32,7 +32,9 @@ In the example below, we use [Microsoft.ML](https://dotnet.microsoft.com/en-us/a
 to generate the vector embeddings to store and index with Redis Query Engine.
 We also show how to adapt the code to use
 [Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/embeddings?tabs=csharp)
-for the embeddings.
+for the embeddings. The code is first demonstrated for hash documents with a
+separate section to explain the
+[differences with JSON documents](#differences-with-json-documents).
 
 ## Initialize
 
@@ -89,7 +91,6 @@ using Azure;
 using Azure.AI.OpenAI;
 ```
 
-
 ## Define a function to obtain the embedding model
 
 {{< note >}}Ignore this step if you are using an Azure OpenAI
@@ -154,7 +155,9 @@ array as a `byte` string. To simplify this, we declare a
 then encodes the returned `float` array as a `byte` string. If you are
 storing your documents as JSON objects instead of hashes, then you should
 use the `float` array for the embedding directly, without first converting
-it to a `byte` string.
+it to a `byte` string (see [Differences with JSON documents](#differences-with-json-documents)
+below).
+
 
 ```csharp
 static byte[] GetEmbedding(
@@ -414,6 +417,156 @@ As you would expect, the result for `doc:1` with the content text
 is the result that is most similar in meaning to the query text
 *"That is a happy person"*.
 
+## Differences with JSON documents
+
+Indexing JSON documents is similar to hash indexing, but there are some
+important differences. JSON allows much richer data modeling with nested fields, so
+you must supply a [path]({{< relref "/develop/data-types/json/path" >}}) in the schema
+to identify each field you want to index. However, you can declare a short alias for each
+of these paths to avoid typing it in full for
+every query. Also, you must specify `IndexDataType.JSON` with the `On()` option when you
+create the index.
+
+The code below shows these differences, but the index is otherwise very similar to
+the one created previously for hashes:
+
+```cs
+var jsonSchema = new Schema()
+    .AddTextField(new FieldName("$.content", "content"))
+    .AddTagField(new FieldName("$.genre", "genre"))
+    .AddVectorField(
+        new FieldName("$.embedding", "embedding"),
+        VectorField.VectorAlgo.HNSW,
+        new Dictionary<string, object>()
+        {
+            ["TYPE"] = "FLOAT32",
+            ["DIM"] = "150",
+            ["DISTANCE_METRIC"] = "L2"
+        }
+    );
+
+
+db.FT().Create(
+    "vector_json_idx",
+    new FTCreateParams()
+        .On(IndexDataType.JSON)
+        .Prefix("jdoc:"),
+    jsonSchema
+);
+```
+
+An important difference with JSON indexing is that the vectors are
+specified using arrays of `float` instead of binary strings. This requires a modification
+to the `GetEmbedding()` function declared in
+[Define a function to generate an embedding](#define-a-function-to-generate-an-embedding)
+above:
+
+```cs
+static float[] GetFloatEmbedding(
+    PredictionEngine<TextData, TransformedTextData> model, string sentence
+)
+{
+    // Call the prediction API to convert the text into embedding vector.
+    var data = new TextData()
+    {
+        Text = sentence
+    };
+
+    var prediction = model.Predict(data);
+
+    float[] floatArray = Array.ConvertAll(prediction.Features, x => (float)x);
+    return floatArray;
+}
+```
+
+You should make a similar modification to the `GetEmbeddingFromAzure()` function
+if you are using Azure OpenAI with JSON.
+
+Use [`JSON().Set()`]({{< relref "/commands/json.set" >}}) to add the data
+instead of [`HashSet()`]({{< relref "/commands/hset" >}}):
+
+```cs
+var jSentence1 = "That is a very happy person";
+
+var jdoc1 = new {
+    content = jSentence1,
+    genre = "persons",
+    embedding = GetFloatEmbedding(predEngine, jSentence1),
+};
+
+db.JSON().Set("jdoc:1", "$", jdoc1);
+
+var jSentence2 = "That is a happy dog";
+
+var jdoc2 = new {
+    content = jSentence2,
+    genre = "pets",
+    embedding = GetFloatEmbedding(predEngine, jSentence2),
+};
+
+db.JSON().Set("jdoc:2", "$", jdoc2);
+
+var jSentence3 = "Today is a sunny day";
+
+var jdoc3 = new {
+    content = jSentence3,
+    genre = "weather",
+    embedding = GetFloatEmbedding(predEngine, jSentence3),
+};
+
+db.JSON().Set("jdoc:3", "$", jdoc3);
+```
+
+The query is almost identical to the one for the hash documents. This
+demonstrates how the right choice of aliases for the JSON paths can
+save you having to write complex queries. The only significant difference is
+that the `FieldName` objects created for the `ReturnFields()` option must
+include the JSON path for the field.
+
+An important thing to notice
+is that the vector parameter for the query is still specified as a
+binary string (using the `GetEmbedding()` method), even though the data for
+the `embedding` field of the JSON was specified as a `float` array.
+
+```cs
+var jRes = db.FT().Search("vector_json_idx",
+    new Query("*=>[KNN 3 @embedding $query_vec AS score]")
+        .AddParam("query_vec", GetEmbedding(predEngine, "That is a happy person"))
+        .ReturnFields(
+            new FieldName("$.content", "content"),
+            new FieldName("$.score", "score")
+        )
+        .SetSortBy("score")
+        .Dialect(2));
+
+foreach (var doc in jRes.Documents) {
+    var props = doc.GetProperties();
+    var propText = string.Join(
+        ", ",
+        props.Select(p => $"{p.Key}: '{p.Value}'")
+    );
+
+    Console.WriteLine(
+        $"ID: {doc.Id}, Properties: [\n {propText}\n]"
+    );
+}
+```
+
+Apart from the `jdoc:` prefixes for the keys, the result from the JSON
+query is the same as for hash:
+
+```
+ID: jdoc:1, Properties: [
+ score: '4.30777168274', content: 'That is a very happy person'
+]
+ID: jdoc:2, Properties: [
+ score: '25.9752807617', content: 'That is a happy dog'
+]
+ID: jdoc:3, Properties: [
+ score: '68.8638000488', content: 'Today is a sunny day'
+]
+```
+
 ## Learn more
 
 See
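
The added section points out that JSON documents store the embedding as a `float` array while the query parameter is still passed as a binary string. The sketch below illustrates how the two representations could relate, assuming the binary string is simply the `FLOAT32` components packed four bytes each in the platform's native byte order (little-endian on most machines). The helper names `FloatsToBytes` and `BytesToFloats` are hypothetical and are not part of NRedisStack or the documented page; only `System.Buffer.BlockCopy` is a real API here.

```csharp
using System;

// Example vector standing in for a real embedding (values are arbitrary).
float[] embedding = { 0.25f, -1.5f, 3.0f };

// Pack the floats into a byte string, as a hash field or query parameter would need.
byte[] blob = FloatsToBytes(embedding);

// Unpack again to recover the float array form that a JSON document would hold.
float[] restored = BytesToFloats(blob);

Console.WriteLine(blob.Length);     // 3 floats * 4 bytes = 12
Console.WriteLine(restored.Length); // 3

// Hypothetical helper: copies the raw bytes of each float into a byte array.
static byte[] FloatsToBytes(float[] vector)
{
    var bytes = new byte[vector.Length * sizeof(float)];
    // BlockCopy counts in bytes, not elements.
    Buffer.BlockCopy(vector, 0, bytes, 0, bytes.Length);
    return bytes;
}

// Hypothetical helper: the inverse of FloatsToBytes.
static float[] BytesToFloats(byte[] bytes)
{
    var vector = new float[bytes.Length / sizeof(float)];
    Buffer.BlockCopy(bytes, 0, vector, 0, bytes.Length);
    return vector;
}
```

Under these assumptions, a round trip through both helpers returns the original values, which suggests why the same embedding model can feed both the JSON `float` array and the binary query parameter.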
