Commit 73ce2fc

feat(redis): vectorstore custom schema (langchain-ai#8963)
1 parent dc2fae4 commit 73ce2fc

3 files changed

+1177
-9
lines changed

docs/core_docs/docs/integrations/vectorstores/redis.ipynb

Lines changed: 323 additions & 1 deletion
@@ -285,6 +285,328 @@
285285
"await retriever.invoke(\"biology\");"
286286
]
287287
},
288+
{
289+
"cell_type": "markdown",
290+
"id": "46781841",
291+
"metadata": {},
292+
"source": [
293+
"## Advanced Features\n",
294+
"\n",
295+
"### Custom Schema and Metadata Filtering\n",
296+
"\n",
297+
"The Redis vector store supports custom schema definitions for metadata fields, enabling more efficient filtering and searching. This feature allows you to define specific field types and validation rules for your metadata.\n",
298+
"\n",
299+
"#### Defining a Custom Schema\n",
300+
"\n",
301+
"You can define a custom schema when creating your vector store to specify field types, validation rules, and indexing options:\n"
302+
]
303+
},
304+
{
305+
"cell_type": "code",
306+
"execution_count": null,
307+
"id": "e31d7081",
308+
"metadata": {},
309+
"outputs": [],
310+
"source": [
311+
"import { RedisVectorStore, type RedisVectorStoreConfig } from \"@langchain/redis\";\n",
312+
"import { OpenAIEmbeddings } from \"@langchain/openai\";\n",
313+
"import { SchemaFieldTypes } from \"redis\";\n",
314+
"import { createClient } from \"redis\";\n",
315+
"\n",
316+
"const embeddings = new OpenAIEmbeddings({\n",
317+
" model: \"text-embedding-3-small\",\n",
318+
"});\n",
319+
"\n",
320+
"const client = createClient({\n",
321+
" url: process.env.REDIS_URL ?? \"redis://localhost:6379\",\n",
322+
"});\n",
323+
"await client.connect();\n",
324+
"\n",
325+
"// Define custom schema for metadata fields\n",
326+
"const customSchema: RedisVectorStoreConfig[\"customSchema\"] = {\n",
327+
" userId: { \n",
328+
" type: SchemaFieldTypes.TEXT, \n",
329+
" required: true,\n",
330+
" SORTABLE: true \n",
331+
" },\n",
332+
" category: { \n",
333+
" type: SchemaFieldTypes.TAG, \n",
334+
" SORTABLE: true,\n",
335+
" SEPARATOR: \",\" \n",
336+
" },\n",
337+
" score: { \n",
338+
" type: SchemaFieldTypes.NUMERIC, \n",
339+
" SORTABLE: true \n",
340+
" },\n",
341+
" tags: { \n",
342+
" type: SchemaFieldTypes.TAG, \n",
343+
" SEPARATOR: \",\",\n",
344+
" CASESENSITIVE: true \n",
345+
" },\n",
346+
" description: { \n",
347+
" type: SchemaFieldTypes.TEXT, \n",
348+
" NOSTEM: true,\n",
349+
" WEIGHT: 2.0 \n",
350+
" }\n",
351+
"};\n",
352+
"\n",
353+
"const vectorStoreWithSchema = new RedisVectorStore(embeddings, {\n",
354+
" redisClient: client,\n",
355+
" indexName: \"langchainjs-custom-schema\",\n",
356+
" customSchema\n",
357+
"});\n"
358+
]
359+
},
360+
{
361+
"cell_type": "markdown",
362+
"id": "002f7c10",
363+
"metadata": {},
364+
"source": [
365+
"#### Schema Field Types\n",
366+
"\n",
367+
"The custom schema supports three main field types:\n",
368+
"\n",
369+
"- **TEXT**: Full-text searchable fields with optional stemming, weighting, and sorting\n",
370+
"- **TAG**: Categorical fields for exact matching, with support for multiple values and custom separators\n",
371+
"- **NUMERIC**: Numeric fields supporting range queries and sorting\n",
372+
"\n",
373+
"#### Field Configuration Options\n",
374+
"\n",
375+
"Each field can be configured with various options:\n",
376+
"\n",
377+
"- `required`: Whether the field must be present in metadata (default: false)\n",
378+
"- `SORTABLE`: Enable sorting on this field (default: unset, so the field is not sortable)\n",
379+
"- `SEPARATOR`: For TAG fields, specify the separator for multiple values (default: \",\")\n",
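380+
"- `CASESENSITIVE`: For TAG fields, enable case-sensitive matching (pass the literal `true` to enable; omit the option otherwise, since `false` is not accepted)\n",
381+
"- `NOSTEM`: For TEXT fields, disable stemming (pass the literal `true` to enable; omit the option otherwise, since `false` is not accepted)\n",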
382+
"- `WEIGHT`: For TEXT fields, specify search weight (default: 1.0)\n"
383+
]
384+
},
385+
{
386+
"cell_type": "markdown",
387+
"id": "766e2a4c",
388+
"metadata": {},
389+
"source": [
390+
"#### Adding Documents with Schema Validation\n",
391+
"\n",
392+
"When using a custom schema, documents are automatically validated against the defined schema:\n"
393+
]
394+
},
395+
{
396+
"cell_type": "code",
397+
"execution_count": null,
398+
"id": "4e9fcb1d",
399+
"metadata": {},
400+
"outputs": [],
401+
"source": [
402+
"import type { Document } from \"@langchain/core/documents\";\n",
403+
"\n",
404+
"const documentsWithMetadata: Document[] = [\n",
405+
" {\n",
406+
" pageContent: \"Advanced JavaScript techniques for modern web development\",\n",
407+
" metadata: {\n",
408+
" userId: \"user123\",\n",
409+
" category: \"programming\",\n",
410+
" score: 95,\n",
411+
" tags: [\"javascript\", \"web-development\", \"frontend\"],\n",
412+
" description: \"Comprehensive guide to JavaScript best practices\"\n",
413+
" }\n",
414+
" },\n",
415+
" {\n",
416+
" pageContent: \"Machine learning fundamentals and applications\",\n",
417+
" metadata: {\n",
418+
" userId: \"user456\", \n",
419+
" category: \"ai\",\n",
420+
" score: 88,\n",
421+
" tags: [\"machine-learning\", \"python\", \"data-science\"],\n",
422+
" description: \"Introduction to ML concepts and practical applications\"\n",
423+
" }\n",
424+
" },\n",
425+
" {\n",
426+
" pageContent: \"Database optimization strategies for high performance\",\n",
427+
" metadata: {\n",
428+
" userId: \"user789\",\n",
429+
" category: \"database\",\n",
430+
" score: 92,\n",
431+
" tags: [\"database\", \"optimization\", \"performance\"],\n",
432+
" description: \"Advanced techniques for database performance tuning\"\n",
433+
" }\n",
434+
" }\n",
435+
"];\n",
436+
"\n",
437+
"// This will validate each document's metadata against the custom schema\n",
438+
"await vectorStoreWithSchema.addDocuments(documentsWithMetadata);\n"
439+
]
440+
},
441+
{
442+
"cell_type": "markdown",
443+
"id": "89919bf3",
444+
"metadata": {},
445+
"source": [
446+
"#### Advanced Similarity Search with Metadata Filtering\n",
447+
"\n",
448+
"With a custom schema in place, the `similaritySearchVectorWithScoreAndMetadata` method can combine vector similarity with filters on TAG, NUMERIC, and TEXT metadata fields:\n"
449+
]
450+
},
451+
{
452+
"cell_type": "code",
453+
"execution_count": null,
454+
"id": "5d7d8971",
455+
"metadata": {},
456+
"outputs": [],
457+
"source": [
458+
"// Search with TAG filtering\n",
459+
"const tagFilterResults = await vectorStoreWithSchema.similaritySearchVectorWithScoreAndMetadata(\n",
460+
" await embeddings.embedQuery(\"programming tutorial\"),\n",
461+
" 3,\n",
462+
" {\n",
463+
" category: \"programming\", // Exact tag match\n",
464+
" tags: [\"javascript\", \"frontend\"] // Multiple tag OR search\n",
465+
" }\n",
466+
");\n",
467+
"\n",
468+
"console.log(\"Tag filter results:\");\n",
469+
"for (const [doc, score] of tagFilterResults) {\n",
470+
" console.log(`* [SIM=${score.toFixed(3)}] ${doc.pageContent}`);\n",
471+
" console.log(` Metadata: ${JSON.stringify(doc.metadata)}`);\n",
472+
"}\n"
473+
]
474+
},
475+
{
476+
"cell_type": "code",
477+
"execution_count": null,
478+
"id": "be7d4a67",
479+
"metadata": {},
480+
"outputs": [],
481+
"source": [
482+
"// Search with NUMERIC range filtering\n",
483+
"const numericFilterResults = await vectorStoreWithSchema.similaritySearchVectorWithScoreAndMetadata(\n",
484+
" await embeddings.embedQuery(\"high quality content\"),\n",
485+
" 5,\n",
486+
" {\n",
487+
" score: { min: 90, max: 100 }, // Score between 90 and 100\n",
488+
" category: [\"programming\", \"ai\"] // Multiple categories\n",
489+
" }\n",
490+
");\n",
491+
"\n",
492+
"console.log(\"Numeric filter results:\");\n",
493+
"for (const [doc, score] of numericFilterResults) {\n",
494+
" console.log(`* [SIM=${score.toFixed(3)}] ${doc.pageContent}`);\n",
495+
" console.log(` Score: ${doc.metadata.score}, Category: ${doc.metadata.category}`);\n",
496+
"}\n"
497+
]
498+
},
499+
{
500+
"cell_type": "code",
501+
"execution_count": null,
502+
"id": "dffc7a45",
503+
"metadata": {},
504+
"outputs": [],
505+
"source": [
506+
"// Search with TEXT field filtering \n",
507+
"const textFilterResults = await vectorStoreWithSchema.similaritySearchVectorWithScoreAndMetadata(\n",
508+
" await embeddings.embedQuery(\"development guide\"),\n",
509+
" 3,\n",
510+
" {\n",
511+
" description: \"comprehensive guide\", // Text search in description field\n",
512+
" score: { min: 85 } // Minimum score of 85\n",
513+
" }\n",
514+
");\n",
515+
"\n",
516+
"console.log(\"Text filter results:\");\n",
517+
"for (const [doc, score] of textFilterResults) {\n",
518+
" console.log(`* [SIM=${score.toFixed(3)}] ${doc.pageContent}`);\n",
519+
" console.log(` Description: ${doc.metadata.description}`);\n",
520+
"}\n"
521+
]
522+
},
523+
{
524+
"cell_type": "markdown",
525+
"id": "81d59504",
526+
"metadata": {},
527+
"source": [
528+
"#### Numeric Range Query Options\n",
529+
"\n",
530+
"For numeric fields, you can specify various range queries:\n",
531+
"\n",
532+
"```typescript\n",
533+
"// Exact value match\n",
534+
"{ score: 95 }\n",
535+
"\n",
536+
"// Range with both min and max\n",
537+
"{ score: { min: 80, max: 100 } }\n",
538+
"\n",
539+
"// Only minimum value\n",
540+
"{ score: { min: 90 } }\n",
541+
"\n",
542+
"// Only maximum value \n",
543+
"{ score: { max: 95 } }\n",
544+
"```\n",
545+
"\n",
546+
"#### Error Handling and Validation\n",
547+
"\n",
548+
"Documents whose metadata does not match the custom schema are rejected with a descriptive error message:\n"
549+
]
550+
},
551+
{
552+
"cell_type": "code",
553+
"execution_count": null,
554+
"id": "ecfae15f",
555+
"metadata": {},
556+
"outputs": [],
557+
"source": [
558+
"try {\n",
559+
" // This will fail validation - missing required userId field\n",
560+
" const invalidDoc: Document = {\n",
561+
" pageContent: \"Some content without required metadata\",\n",
562+
" metadata: {\n",
563+
" category: \"test\",\n",
564+
" // Missing required userId field\n",
565+
" }\n",
566+
" };\n",
567+
" \n",
568+
" await vectorStoreWithSchema.addDocuments([invalidDoc]);\n",
569+
"} catch (error) {\n",
570+
" console.log(\"Validation error:\", error instanceof Error ? error.message : error);\n",
571+
" // Output: \"Required metadata field 'userId' is missing\"\n",
572+
"}\n",
573+
"\n",
574+
"try {\n",
575+
" // This will fail validation - wrong type for score field\n",
576+
" const wrongTypeDoc: Document = {\n",
577+
" pageContent: \"Content with wrong metadata type\",\n",
578+
" metadata: {\n",
579+
" userId: \"user123\",\n",
580+
" score: \"not-a-number\", // Should be number, not string\n",
581+
" }\n",
582+
" };\n",
583+
" \n",
584+
" await vectorStoreWithSchema.addDocuments([wrongTypeDoc]);\n",
585+
"} catch (error) {\n",
586+
" console.log(\"Type validation error:\", error instanceof Error ? error.message : error);\n",
587+
" // Output: \"Metadata field 'score' must be a number, got string\"\n",
588+
"}\n"
589+
]
590+
},
591+
{
592+
"cell_type": "markdown",
593+
"id": "0da8dc00",
594+
"metadata": {},
595+
"source": [
596+
"#### Performance Benefits\n",
597+
"\n",
598+
"Using custom schema provides several performance advantages:\n",
599+
"\n",
600+
"1. **Indexed Metadata Fields**: Individual metadata fields are indexed separately, enabling fast filtering\n",
601+
"2. **Type-Optimized Queries**: Numeric and tag fields use optimized query structures \n",
602+
"3. **Reduced Data Transfer**: Only relevant fields are returned in search results\n",
603+
"4. **Better Query Planning**: Redis can optimize queries based on field types and indexes\n",
604+
"\n",
605+
"#### Backward Compatibility\n",
606+
"\n",
607+
"The custom schema feature is fully backward compatible. Existing Redis vector stores without custom schemas will continue to work exactly as before. You can gradually migrate to custom schemas for new indexes or when rebuilding existing ones.\n"
608+
]
609+
},
288610
{
289611
"cell_type": "markdown",
290612
"id": "e2e0a211",
@@ -401,4 +723,4 @@
401723
},
402724
"nbformat": 4,
403725
"nbformat_minor": 5
404-
}
726+
}
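
The filter objects used in the notebook's similarity search cells (TAG values, NUMERIC ranges with `min`/`max`, TEXT terms) correspond closely to RediSearch query syntax. The sketch below shows one way such a filter could be turned into a query string. It is an illustration only, not the implementation in this commit: `buildFilterQuery`, `buildClause`, `escapeTag`, and the simplified `FieldType` map are hypothetical names, and TEXT values are passed through without escaping.

```typescript
// Hypothetical filter shapes, modeled on the notebook's examples.
type NumericRange = { min?: number; max?: number };
type FilterValue = string | number | string[] | NumericRange;
type FieldType = "TAG" | "NUMERIC" | "TEXT";

// Backslash-escape punctuation and whitespace, which RediSearch
// treats as special inside TAG values.
function escapeTag(value: string): string {
  return value.replace(/[^a-zA-Z0-9_]/g, (ch) => `\\${ch}`);
}

// Build the query clause for a single field.
function buildClause(field: string, type: FieldType, value: FilterValue): string {
  switch (type) {
    case "TAG": {
      // Several tag values are OR-ed together with "|".
      const values = Array.isArray(value) ? value : [String(value)];
      return `@${field}:{${values.map(escapeTag).join("|")}}`;
    }
    case "NUMERIC": {
      // A bare number is an exact match, written as a closed range.
      if (typeof value === "number") return `@${field}:[${value} ${value}]`;
      const { min, max } = value as NumericRange;
      return `@${field}:[${min ?? "-inf"} ${max ?? "+inf"}]`;
    }
    case "TEXT":
      // Full-text match restricted to one field (no escaping in this sketch).
      return `@${field}:(${String(value)})`;
  }
}

// Combine all clauses; a space between clauses means AND.
function buildFilterQuery(
  schema: Record<string, FieldType>,
  filter: Record<string, FilterValue>
): string {
  const clauses = Object.entries(filter).map(([field, value]) => {
    const type = schema[field];
    if (!type) throw new Error(`Field '${field}' is not in the schema`);
    return buildClause(field, type, value);
  });
  return clauses.length > 0 ? clauses.join(" ") : "*";
}

const schema: Record<string, FieldType> = {
  category: "TAG",
  tags: "TAG",
  score: "NUMERIC",
  description: "TEXT",
};

const query = buildFilterQuery(schema, {
  category: ["programming", "ai"],
  score: { min: 90, max: 100 },
});
console.log(query); // prints "@category:{programming|ai} @score:[90 100]"
```

In a hybrid vector query, a prefilter of this kind is placed ahead of the KNN clause, for example `(@category:{programming|ai} @score:[90 100])=>[KNN 5 @vector $BLOB]`, where `vector` and `BLOB` stand in for whatever field and parameter names the index actually uses.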
