Commit 93fb85f

DOC-5153 added Go JSON examples for vector search
1 parent ec74293 commit 93fb85f

1 file changed

+124
-7
lines changed

content/develop/clients/go/vecsearch.md

Lines changed: 124 additions & 7 deletions
@@ -32,7 +32,9 @@ In the example below, we use the
3232
[`huggingfaceembedder`](https://pkg.go.dev/github.com/henomis/lingoose@v0.3.0/embedder/huggingface)
3333
package from the [`LinGoose`](https://pkg.go.dev/github.com/henomis/lingoose@v0.3.0)
3434
framework to generate vector embeddings to store and index with
35-
Redis Query Engine.
35+
Redis Query Engine. The code is first demonstrated for hash documents with a
36+
separate section to explain the
37+
[differences with JSON documents](#differences-with-json-documents).
3638

3739
## Initialize
3840

@@ -80,10 +82,10 @@ the embeddings for this example are both available for free.
8082

8183
The `huggingfaceembedder` model outputs the embeddings as a
8284
`[]float32` array. If you are storing your documents as
83-
[hash]({{< relref "/develop/data-types/hashes" >}}) objects
84-
(as we are in this example), then you must convert this array
85-
to a `byte` string before adding it as a hash field. In this example,
86-
we will use the function below to produce the `byte` string:
85+
[hash]({{< relref "/develop/data-types/hashes" >}}) objects, then you
86+
must convert this array to a `byte` string before adding it as a hash field.
87+
The function shown below uses Go's [`binary`](https://pkg.go.dev/encoding/binary)
88+
package to produce the `byte` string:
8789

8890
```go
8991
func floatsToBytes(fs []float32) []byte {
@@ -101,7 +103,8 @@ func floatsToBytes(fs []float32) []byte {
101103
Note that if you are using [JSON]({{< relref "/develop/data-types/json" >}})
102104
objects to store your documents instead of hashes, then you should store
103105
the `[]float32` array directly without first converting it to a `byte`
104-
string.
106+
string (see [Differences with JSON documents](#differences-with-json-documents)
107+
below).
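
As a minimal sketch of the hash-side conversion described above (not the `floatsToBytes()` function from this file, whose body falls outside the hunk), the following packs each `float32` into four bytes with `encoding/binary`. The name `encodeFloat32s` and the little-endian byte order are assumptions made for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeFloat32s is a hypothetical helper (not part of this commit).
// It packs each float32 into 4 bytes, little-endian, in input order.
func encodeFloat32s(fs []float32) []byte {
	buf := make([]byte, 4*len(fs))
	for i, f := range fs {
		// Float32bits exposes the IEEE 754 bit pattern as a uint32.
		binary.LittleEndian.PutUint32(buf[4*i:], math.Float32bits(f))
	}
	return buf
}

func main() {
	blob := encodeFloat32s([]float32{1.0, -2.0})
	fmt.Println(len(blob))    // 8: four bytes per float32
	fmt.Printf("% x\n", blob) // 00 00 80 3f 00 00 00 c0
}
```

The output length is always four times the number of floats, which is the property the hash field and the query parameter both rely on.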
105108

106109
## Create the index
107110

@@ -187,7 +190,7 @@ hf := huggingfaceembedder.New().
187190
## Add data
188191

189192
You can now supply the data objects, which will be indexed automatically
190-
when you add them with [`hset()`]({{< relref "/commands/hset" >}}), as long as
193+
when you add them with [`HSet()`]({{< relref "/commands/hset" >}}), as long as
191194
you use the `doc:` prefix specified in the index definition.
192195

193196
Use the `Embed()` method of `huggingfaceembedder`
@@ -310,6 +313,120 @@ As you would expect, the result for `doc:0` with the content text *"That is a ve
310313
is the result that is most similar in meaning to the query text
311314
*"That is a happy person"*.
312315

316+
## Differences with JSON documents
317+
318+
Indexing JSON documents is similar to hash indexing, but there are some
319+
important differences. JSON allows much richer data modelling with nested fields, so
320+
you must supply a [path]({{< relref "/develop/data-types/json/path" >}}) in the schema
321+
to identify each field you want to index. However, you can declare a short alias for each
322+
of these paths (using the `As` option) to avoid typing it in full for
323+
every query. Also, you must set `OnJSON` to `true` when you create the index.
324+
325+
The code below shows these differences, but the index is otherwise very similar to
326+
the one created previously for hashes:
327+
328+
```go
329+
_, err = rdb.FTCreate(ctx,
330+
    "vector_json_idx",
331+
    &redis.FTCreateOptions{
332+
        OnJSON: true,
333+
        Prefix: []any{"jdoc:"},
334+
    },
335+
    &redis.FieldSchema{
336+
        FieldName: "$.content",
337+
        As:        "content",
338+
        FieldType: redis.SearchFieldTypeText,
339+
    },
340+
    &redis.FieldSchema{
341+
        FieldName: "$.genre",
342+
        As:        "genre",
343+
        FieldType: redis.SearchFieldTypeTag,
344+
    },
345+
    &redis.FieldSchema{
346+
        FieldName: "$.embedding",
347+
        As:        "embedding",
348+
        FieldType: redis.SearchFieldTypeVector,
349+
        VectorArgs: &redis.FTVectorArgs{
350+
            HNSWOptions: &redis.FTHNSWOptions{
351+
                Dim:            384,
352+
                DistanceMetric: "L2",
353+
                Type:           "FLOAT32",
354+
            },
355+
        },
356+
    },
357+
).Result()
358+
```
359+
360+
Use [`JSONSet()`]({{< relref "/commands/json.set" >}}) to add the data
361+
instead of [`HSet()`]({{< relref "/commands/hset" >}}). The maps
362+
that specify the fields have the same structure as the ones used for `HSet()`.
363+
364+
An important difference with JSON indexing is that the vectors are
365+
specified using lists instead of binary strings. The loop below is similar
366+
to the one used previously to add the hash data, but it doesn't use the
367+
`floatsToBytes()` function to encode the `float32` array.
368+
369+
```go
370+
for i, emb := range embeddings {
371+
    _, err = rdb.JSONSet(ctx,
372+
        fmt.Sprintf("jdoc:%v", i),
373+
        "$",
374+
        map[string]any{
375+
            "content":   sentences[i],
376+
            "genre":     tags[i],
377+
            "embedding": emb.ToFloat32(),
378+
        },
379+
    ).Result()
380+
381+
    if err != nil {
382+
        panic(err)
383+
    }
384+
}
385+
```
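
To make "lists instead of binary strings" concrete, the sketch below marshals a document map with Go's `encoding/json`. That the client serializes the `JSONSet()` value this way is an assumption here, and the helper name `docToJSON` and the sample values are illustrative only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// docToJSON is a hypothetical helper (not part of this commit).
// It shows the JSON text produced for a document whose embedding
// is a plain []float32 rather than an encoded byte string.
func docToJSON(content, genre string, embedding []float32) (string, error) {
	b, err := json.Marshal(map[string]any{
		"content":   content,
		"genre":     genre,
		"embedding": embedding,
	})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Sample values only; a real embedding here has 384 elements.
	s, err := docToJSON("That is a happy dog", "pets", []float32{0.5, 1, -2})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
	// {"content":"That is a happy dog","embedding":[0.5,1,-2],"genre":"pets"}
}
```

The `embedding` value is an ordinary JSON array of numbers, which is the form the `$.embedding` path in the schema points at.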
386+
387+
The query is almost identical to the one for the hash documents. This
388+
demonstrates how the right choice of aliases for the JSON paths can
389+
save you having to write complex queries. An important thing to notice
390+
is that the vector parameter for the query is still specified as a
391+
binary string (using the `floatsToBytes()` function), even though the data for
392+
the `embedding` field of the JSON was specified as an array.
393+
394+
```go
395+
jsonQueryEmbedding, err := hf.Embed(ctx, []string{
396+
    "That is a happy person",
397+
})
398+
399+
if err != nil {
400+
    panic(err)
401+
}
402+
403+
jsonBuffer := floatsToBytes(jsonQueryEmbedding[0].ToFloat32())
404+
405+
jsonResults, err := rdb.FTSearchWithArgs(ctx,
406+
    "vector_json_idx",
407+
    "*=>[KNN 3 @embedding $vec AS vector_distance]",
408+
    &redis.FTSearchOptions{
409+
        Return: []redis.FTSearchReturn{
410+
            {FieldName: "vector_distance"},
411+
            {FieldName: "content"},
412+
        },
413+
        DialectVersion: 2,
414+
        Params: map[string]any{
415+
            "vec": jsonBuffer,
416+
        },
417+
    },
418+
).Result()
419+
```
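
Because the query vector is passed as a binary string, its length has to agree with the index definition: a `FLOAT32` field with `Dim: 384` expects 384 × 4 = 1536 bytes. The sketch below is a hypothetical pre-flight check that could catch a mismatched embedding model before the query is sent. The name `checkBlobSize` is an assumption and is not part of go-redis or of this commit.

```go
package main

import "fmt"

// float32Size is the number of bytes used by one FLOAT32 component.
const float32Size = 4

// checkBlobSize is a hypothetical helper: it reports an error when the
// encoded query vector does not hold exactly dim float32 values.
func checkBlobSize(blob []byte, dim int) error {
	want := dim * float32Size
	if len(blob) != want {
		return fmt.Errorf("query vector is %d bytes, index expects %d", len(blob), want)
	}
	return nil
}

func main() {
	good := make([]byte, 384*float32Size) // 1536 bytes
	fmt.Println(checkBlobSize(good, 384)) // <nil>

	bad := make([]byte, 768*float32Size)        // wrong model dimension
	fmt.Println(checkBlobSize(bad, 384) != nil) // true
}
```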
420+
421+
Apart from the `jdoc:` prefixes for the keys, the result from the JSON
422+
query is the same as for the hash query:
423+
424+
```
425+
ID: jdoc:0, Distance:0.114169843495, Content:'That is a very happy person'
426+
ID: jdoc:1, Distance:0.610845327377, Content:'That is a happy dog'
427+
ID: jdoc:2, Distance:1.48624765873, Content:'Today is a sunny day'
428+
```
429+
313430
## Learn more
314431

315432
See
