Commit b5d6963

DOC-5152 added Java vector index/query examples with JSON
1 parent 565bdf7 commit b5d6963

1 file changed: +148 -2 lines changed

content/develop/clients/jedis/vecsearch.md

Lines changed: 148 additions & 2 deletions
@@ -31,6 +31,9 @@ of their meaning.
In the example below, we use the [HuggingFace](https://huggingface.co/) model
[`all-mpnet-base-v2`](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
to generate the vector embeddings to store and index with Redis Query Engine.
The code is first demonstrated for hash documents, with a
separate section that explains the
[differences with JSON documents](#differences-with-json-documents).

## Initialize

@@ -75,6 +78,7 @@ import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Map;
import java.util.List;
import org.json.JSONObject;

// Tokenizer to generate the vector embeddings.
import ai.djl.huggingface.tokenizers.HuggingFaceTokenizer;
@@ -185,8 +189,9 @@ as shown below to create the embedding that represents the `content` field.
The `getIds()` method that follows `encode()` obtains the vector
of `long` values which we then convert to a `float` array stored as a `byte`
string using our helper method. Use the `byte` string representation when you are
-indexing hash objects (as we are here), but use the default list of `float` for
-JSON objects. Note that when we set the `embedding` field, we must use an overload
+indexing hash objects (as we are here), but use an array of `float` for
+JSON objects (see [Differences with JSON documents](#differences-with-json-documents)
+below). Note that when we set the `embedding` field, we must use an overload
of `hset()` that requires `byte` arrays for each of the key, the field name, and
the value, which is why we include the `getBytes()` calls on the strings.

@@ -281,6 +286,147 @@ For this model, the text *"That is a happy dog"*
is the result judged to be most similar in meaning to the query text
*"That is a happy person"*.

## Differences with JSON documents

Indexing JSON documents is similar to hash indexing, but there are some
important differences. JSON allows much richer data modelling with nested fields, so
you must supply a [path]({{< relref "/develop/data-types/json/path" >}}) in the schema
to identify each field you want to index. However, you can declare a short alias for each
of these paths (using the `as()` option) to avoid typing it in full for
every query. Also, you must specify `IndexDataType.JSON` when you create the index.

The code below shows these differences, but the index is otherwise very similar to
the one created previously for hashes:

```java
SchemaField[] jsonSchema = {
    // Each field is identified by a JSON path and given a short alias.
    TextField.of("$.content").as("content"),
    TagField.of("$.genre").as("genre"),
    VectorField.builder()
        .fieldName("$.embedding").as("embedding")
        .algorithm(VectorAlgorithm.HNSW)
        .attributes(
            Map.of(
                "TYPE", "FLOAT32",
                "DIM", 768,
                "DISTANCE_METRIC", "L2"
            )
        )
        .build()
};

// Index JSON documents whose keys start with "jdoc:".
jedis.ftCreate("vector_json_idx",
    FTCreateParams.createParams()
        .addPrefix("jdoc:")
        .on(IndexDataType.JSON),
    jsonSchema
);
```

An important difference with JSON indexing is that the vectors are
specified using arrays of `float` instead of binary strings. This requires
a modified version of the `longsToFloatsByteString()` method
used previously:

```java
// Converts the tokenizer's `long` IDs to a `float` array for a JSON document.
public static float[] longArrayToFloatArray(long[] input) {
    float[] floats = new float[input.length];
    for (int i = 0; i < input.length; i++) {
        floats[i] = input[i];
    }
    return floats;
}
```
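
To make the contrast between the two representations concrete, here is a minimal, self-contained sketch that places the JSON-style conversion next to a byte-string conversion. The body of `longsToFloatsByteString()` is not shown in this diff, so the version here is an assumption (little-endian `FLOAT32` values packed with `ByteBuffer`), and the class name and token IDs are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative sketch only: the class name and sample IDs are hypothetical.
class VectorEncodingSketch {
    // JSON form: one float per token ID, kept as a plain array.
    static float[] longArrayToFloatArray(long[] input) {
        float[] floats = new float[input.length];
        for (int i = 0; i < input.length; i++) {
            floats[i] = input[i]; // implicit widening from long to float
        }
        return floats;
    }

    // Hash/query form (assumed layout): the same floats packed as
    // little-endian 32-bit values, four bytes per element.
    static byte[] longsToFloatsByteString(long[] input) {
        float[] floats = longArrayToFloatArray(input);
        byte[] bytes = new byte[Float.BYTES * floats.length];
        ByteBuffer.wrap(bytes)
            .order(ByteOrder.LITTLE_ENDIAN)
            .asFloatBuffer()
            .put(floats);
        return bytes;
    }

    public static void main(String[] args) {
        long[] ids = {101L, 2008L, 2003L, 102L}; // made-up token IDs

        System.out.println(longArrayToFloatArray(ids).length);   // prints 4
        System.out.println(longsToFloatsByteString(ids).length); // prints 16

        // A float has a 24-bit significand, so values above 2^24 may round.
        System.out.println((float) 16_777_217L == 16_777_216f);  // prints true
    }
}
```

Both forms carry the same numbers; only the container differs. The last line is a reminder that the `long` to `float` conversion is exact only for magnitudes up to 2^24, which is unlikely to matter for typical tokenizer vocabularies but is worth knowing.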

Use [`jsonSet()`]({{< relref "/commands/json.set" >}}) to add the data
instead of [`hset()`]({{< relref "/commands/hset" >}}). Supply the data
as instances of `JSONObject` rather than the `Map` you would use for
hash objects.

```java
String jSentence1 = "That is a very happy person";

JSONObject jdoc1 = new JSONObject()
    .put("content", jSentence1)
    .put("genre", "persons")
    .put(
        "embedding",
        longArrayToFloatArray(
            sentenceTokenizer.encode(jSentence1).getIds()
        )
    );

jedis.jsonSet("jdoc:1", Path2.ROOT_PATH, jdoc1);

String jSentence2 = "That is a happy dog";

JSONObject jdoc2 = new JSONObject()
    .put("content", jSentence2)
    .put("genre", "pets")
    .put(
        "embedding",
        longArrayToFloatArray(
            sentenceTokenizer.encode(jSentence2).getIds()
        )
    );

jedis.jsonSet("jdoc:2", Path2.ROOT_PATH, jdoc2);

String jSentence3 = "Today is a sunny day";

JSONObject jdoc3 = new JSONObject()
    .put("content", jSentence3)
    .put("genre", "weather")
    .put(
        "embedding",
        longArrayToFloatArray(
            sentenceTokenizer.encode(jSentence3).getIds()
        )
    );

jedis.jsonSet("jdoc:3", Path2.ROOT_PATH, jdoc3);
```

The query is almost identical to the one for the hash documents. This
demonstrates how the right choice of aliases for the JSON paths can
save you having to write complex queries. An important thing to notice
is that the vector parameter for the query is still specified as a
binary string (using the `longsToFloatsByteString()` method), even though
the data for the `embedding` field of the JSON was specified as an array.

```java
String jSentence = "That is a happy person";

int jK = 3;
Query jq = new Query("*=>[KNN $K @embedding $BLOB AS distance]")
    .returnFields("content", "distance")
    .addParam("K", jK)
    .addParam(
        "BLOB",
        longsToFloatsByteString(
            sentenceTokenizer.encode(jSentence).getIds()
        )
    )
    .setSortBy("distance", true)
    .dialect(2);

// Execute the query
List<Document> jDocs = jedis
    .ftSearch("vector_json_idx", jq)
    .getDocuments();
```
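
The point above, that the query blob and the stored JSON array describe the same vector, can be checked without a Redis connection. The following is a rough sketch under the assumption that the blob is a sequence of little-endian `FLOAT32` values; the class and method names are hypothetical and are not part of Jedis.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Illustrative sketch only: names are hypothetical, layout is assumed.
class BlobRoundTripSketch {
    // Packs floats as little-endian 32-bit values (assumed query blob layout).
    static byte[] floatsToBlob(float[] values) {
        byte[] blob = new byte[Float.BYTES * values.length];
        ByteBuffer.wrap(blob)
            .order(ByteOrder.LITTLE_ENDIAN)
            .asFloatBuffer()
            .put(values);
        return blob;
    }

    // Unpacks a blob back into the float array it was built from.
    static float[] blobToFloats(byte[] blob) {
        if (blob.length % Float.BYTES != 0) {
            throw new IllegalArgumentException("blob length must be a multiple of 4");
        }
        float[] values = new float[blob.length / Float.BYTES];
        ByteBuffer.wrap(blob)
            .order(ByteOrder.LITTLE_ENDIAN)
            .asFloatBuffer()
            .get(values);
        return values;
    }

    public static void main(String[] args) {
        float[] asStoredInJson = {1.5f, -2.0f, 0.0f}; // made-up vector
        byte[] asSentInQuery = floatsToBlob(asStoredInJson);

        // Decoding the blob recovers exactly the array form.
        System.out.println(
            Arrays.equals(blobToFloats(asSentInQuery), asStoredInJson)
        ); // prints true

        // For an index declared with DIM 768, the blob would be 768 * 4 bytes.
        System.out.println(768 * Float.BYTES); // prints 3072
    }
}
```

A length check like the one in `blobToFloats()` can be a cheap guard before sending a query, since a blob whose size does not match the index's `DIM` and `TYPE` cannot describe a valid query vector.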

Apart from the `jdoc:` prefixes for the keys, the result from the JSON
query is the same as for the hash query:

```
Results:
ID: jdoc:2, Distance: 1411344, Content: That is a happy dog
ID: jdoc:1, Distance: 9301635, Content: That is a very happy person
ID: jdoc:3, Distance: 67178800, Content: Today is a sunny day
```
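
The whole-number distances in these results are what a squared Euclidean distance over integer-valued vectors would produce; for instance, 1411344 equals 1188 squared. The sketch below assumes, without confirmation from this page, that the `L2` metric reports the squared distance (no final square root). The class name and sample vectors are invented for illustration and are not the real token IDs.

```java
// Illustrative sketch only: assumes the L2 score is a squared distance.
class SquaredL2Sketch {
    // Sum of squared component differences between two equal-length vectors.
    static double squaredL2(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = (double) a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical vectors that differ in one component by 1188.
        float[] query = {1f, 2f, 3f};
        float[] doc = {1f, 2f, 1191f};
        System.out.println(squaredL2(query, doc)); // prints 1411344.0
    }
}
```

Under this assumption, a smaller value means the stored vector is closer to the query vector, which matches the ascending sort order requested by `setSortBy("distance", true)`.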

## Learn more

See
