Commit 565bdf7

Merge pull request #1477 from redis/DOC-5154-php-vector-json-examples
DOC-5154 added JSON vector search examples for PHP
2 parents 596ba23 + 70de160 commit 565bdf7

1 file changed: +133 -1 lines changed

content/develop/clients/php/vecsearch.md

Lines changed: 133 additions & 1 deletion
@@ -31,6 +31,9 @@ of their meaning.
 The example below uses the [HuggingFace](https://huggingface.co/) model
 [`all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
 to generate the vector embeddings to store and index with Redis Query Engine.
+The code is first demonstrated for hash documents with a
+separate section to explain the
+[differences with JSON documents](#differences-with-json-documents).
 
 ## Initialize
 
@@ -155,7 +158,8 @@ array of `float` values. Note that if you are using
 [JSON]({{< relref "/develop/data-types/json" >}})
 objects to store your documents instead of hashes, then you should store
 the `float` array directly without first converting it to a binary
-string.
+string (see [Differences with JSON documents](#differences-with-json-documents)
+below).
 
 ```php
 $content = "That is a very happy person";
@@ -264,6 +268,134 @@ document)
 is the result judged to be most similar in meaning to the query text
 *"That is a happy person"*.
 
+## Differences with JSON documents
+
+Indexing JSON documents is similar to hash indexing, but there are some
+important differences. JSON allows much richer data modeling with nested fields, so
+you must supply a [path]({{< relref "/develop/data-types/json/path" >}}) in the schema
+to identify each field you want to index. However, you can declare a short alias for each
+of these paths to avoid typing it in full for
+every query. Also, you must specify `JSON` with the `on()` option when you create the index.
+
+The code below shows these differences, but the index is otherwise very similar to
+the one created previously for hashes:
+
+```php
+$jsonSchema = [
+    new TextField("$.content", "content"),
+    new TagField("$.genre", "genre"),
+    new VectorField(
+        "$.embedding",
+        "HNSW",
+        [
+            "TYPE", "FLOAT32",
+            "DIM", 384,
+            "DISTANCE_METRIC", "L2"
+        ],
+        "embedding",
+    )
+];
+
+$client->ftcreate("vector_json_idx", $jsonSchema,
+    (new CreateArguments())
+        ->on('JSON')
+        ->prefix(["jdoc:"])
+);
+```
+
+Use [`jsonset()`]({{< relref "/commands/json.set" >}}) to add the data
+instead of [`hmset()`]({{< relref "/commands/hset" >}}). The arrays
+that specify the fields have roughly the same structure as the ones used for
+`hmset()` but you should use the standard library function
+[`json_encode()`](https://www.php.net/manual/en/function.json-encode.php)
+to generate a JSON string representation of the array.
+
+An important difference with JSON indexing is that the vectors are
+specified using arrays instead of binary strings. Simply add the
+embedding as an array field without using the `pack()` function as you
+would with a hash.
+
+```php
+$content = "That is a very happy person";
+$emb = $extractor($content, normalize: true, pooling: 'mean');
+
+$client->jsonset("jdoc:0", "$",
+    json_encode(
+        [
+            "content" => $content,
+            "genre" => "persons",
+            "embedding" => $emb[0]
+        ],
+        JSON_THROW_ON_ERROR
+    )
+);
+
+$content = "That is a happy dog";
+$emb = $extractor($content, normalize: true, pooling: 'mean');
+
+$client->jsonset("jdoc:1","$",
+    json_encode(
+        [
+            "content" => $content,
+            "genre" => "pets",
+            "embedding" => $emb[0]
+        ],
+        JSON_THROW_ON_ERROR
+    )
+);
+
+$content = "Today is a sunny day";
+$emb = $extractor($content, normalize: true, pooling: 'mean');
+
+$client->jsonset("jdoc:2", "$",
+    json_encode(
+        [
+            "content" => $content,
+            "genre" => "weather",
+            "embedding" => $emb[0]
+        ],
+        JSON_THROW_ON_ERROR
+    )
+);
+```
+
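As an aside on the storage difference described in this hunk, the following minimal sketch contrasts the two encodings for the same embedding. The three-element vector is a made-up stand-in for the 384-element model output, and no Redis connection is involved; only the standard `pack()` and `json_encode()` functions are used.

```php
<?php
// Hypothetical 3-element embedding standing in for the 384-element model output.
$embedding = [0.5, -0.25, 1.0];

// Hash documents: little-endian 32-bit floats packed into a binary string.
$binary = pack('g*', ...$embedding);

// JSON documents: the plain array, serialized together with the other fields.
$json = json_encode(
    ["content" => "example text", "embedding" => $embedding],
    JSON_THROW_ON_ERROR
);

echo strlen($binary), PHP_EOL; // 12 bytes: 3 floats x 4 bytes each
echo $json, PHP_EOL;           // human-readable JSON containing the array
```

The binary form is compact but opaque, while the JSON form keeps the vector readable as an ordinary array inside the document.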
+The query is almost identical to the one for the hash documents. This
+demonstrates how the right choice of aliases for the JSON paths can
+save you having to write complex queries. An important thing to notice
+is that the vector parameter for the query is still specified as a
+binary string (using the `pack()` function), even though the data for
+the `embedding` field of the JSON was specified as an array.
+
+```php
+$queryText = "That is a happy person";
+$queryEmb = $extractor($queryText, normalize: true, pooling: 'mean');
+
+$result = $client->ftsearch(
+    "vector_json_idx",
+    '*=>[KNN 3 @embedding $vec AS vector_distance]',
+    (new SearchArguments())
+        ->addReturn(1, "vector_distance")
+        ->dialect("2")
+        ->params([
+            "vec", pack('g*', ...$queryEmb[0])
+        ])
+        ->sortBy("vector_distance")
+);
+```
+
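The binary query parameter can be sanity-checked before it is sent. The sketch below assumes a hypothetical helper, `packQueryVector()` (not part of Predis), and uses a zero-filled vector in place of real model output. It checks the element count against the index's `DIM` value and packs the floats as little-endian FLOAT32.

```php
<?php
// Hypothetical helper (not part of Predis): pack a query embedding as
// little-endian FLOAT32 after checking that it matches the index dimension.
function packQueryVector(array $embedding, int $dim): string
{
    if (count($embedding) !== $dim) {
        throw new InvalidArgumentException(
            "Expected $dim elements, got " . count($embedding)
        );
    }
    return pack('g*', ...$embedding);
}

// Zero-filled stand-in for a real 384-element embedding.
$vec = packQueryVector(array_fill(0, 384, 0.0), 384);
echo strlen($vec), PHP_EOL; // 1536 bytes: 384 floats x 4 bytes each

// unpack() reverses the encoding; its result array is 1-indexed,
// so array_values() is used to get a normal list back.
$roundTrip = array_values(unpack('g*', pack('g*', 0.5, -0.25)));
```

A length check of this kind may help catch a mismatch between the embedding model and the `DIM` declared in the schema before the query reaches the server.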
+Apart from the `jdoc:` prefixes for the keys, the result from the JSON
+query is the same as for hash:
+
+```
+Number of results: 3
+Key: jdoc:0
+Field: vector_distance, Value: 3.76152896881
+Key: jdoc:1
+Field: vector_distance, Value: 18.6544265747
+Key: jdoc:2
+Field: vector_distance, Value: 44.6189727783
+```
+
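Output in the format shown above can be produced by walking the reply array. The sketch below assumes the raw RESP2-style reply shape (a count followed by alternating keys and flat field/value lists) and uses a hard-coded sample in place of a live `ftsearch()` call; the structure actually returned may differ with your Predis version and protocol settings. `formatSearchReply()` is a hypothetical helper, not a Predis function.

```php
<?php
// Hypothetical helper: format a raw FT.SEARCH-style reply as text lines.
// Assumed shape: [count, key1, [field, value, ...], key2, [field, value, ...], ...]
function formatSearchReply(array $reply): array
{
    $lines = ["Number of results: " . $reply[0]];
    for ($i = 1; $i + 1 < count($reply); $i += 2) {
        $lines[] = "Key: " . $reply[$i];
        $fields = $reply[$i + 1];
        for ($j = 0; $j + 1 < count($fields); $j += 2) {
            $lines[] = "Field: {$fields[$j]}, Value: {$fields[$j + 1]}";
        }
    }
    return $lines;
}

// Hard-coded sample mirroring the output in the diff (not a live query).
$sample = [
    3,
    "jdoc:0", ["vector_distance", "3.76152896881"],
    "jdoc:1", ["vector_distance", "18.6544265747"],
    "jdoc:2", ["vector_distance", "44.6189727783"],
];

echo implode(PHP_EOL, formatSearchReply($sample)), PHP_EOL;
```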
 ## Learn more
 
 See