Commit 212ae0b

Update documentation
1 parent 511bc3e commit 212ae0b

1 file changed: +136 -65 lines changed
docs/source/user_guide_kg_builder.rst

Lines changed: 136 additions & 65 deletions
@@ -21,7 +21,7 @@ A Knowledge Graph (KG) construction pipeline requires a few components (some of
 - **Data loader**: extract text from files (PDFs, ...).
 - **Text splitter**: split the text into smaller pieces of text (chunks), manageable by the LLM context window (token limit).
 - **Chunk embedder** (optional): compute the chunk embeddings.
-- **Schema builder**: provide a schema to ground the LLM extracted entities and relations and obtain an easily navigable KG.
+- **Schema builder**: provide a schema to ground the LLM extracted entities and relations and obtain an easily navigable KG. Schema can be provided manually or extracted automatically using LLMs.
 - **Lexical graph builder**: build the lexical graph (Document, Chunk and their relationships) (optional).
 - **Entity and relation extractor**: extract relevant entities and relations from the text.
 - **Knowledge Graph writer**: save the identified entities and relations.
@@ -75,10 +75,11 @@ Graph Schema
 
 It is possible to guide the LLM by supplying a list of entities, relationships,
 and instructions on how to connect them. However, note that the extracted graph
-may not fully adhere to these guidelines. Entities and relationships can be
-represented as either simple strings (for their labels) or dictionaries. If using
-a dictionary, it must include a label key and can optionally include description
-and properties keys, as shown below:
+may not fully adhere to these guidelines unless schema enforcement is enabled
+(see :ref:`Schema Enforcement Behaviour`). Entities and relationships can be represented
+as either simple strings (for their labels) or dictionaries. If using a dictionary,
+it must include a label key and can optionally include description and properties keys,
+as shown below:
 
 .. code:: python
 
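Note on the hunk above: the string-or-dictionary convention for entities and relationships can be illustrated with a small standalone sketch. It does not use the library; `normalize_spec` is a hypothetical helper, and the default values for the optional keys are assumptions made only for this illustration.

```python
from typing import Any, Union

# An entity or relation is either a bare label or a dict with a "label" key.
Spec = Union[str, dict[str, Any]]


def normalize_spec(spec: Spec) -> dict[str, Any]:
    """Return a dict with 'label', 'description' and 'properties' keys.

    Hypothetical helper: the empty defaults are an assumption of this sketch.
    """
    if isinstance(spec, str):
        spec = {"label": spec}
    if "label" not in spec:
        raise ValueError(f"missing required 'label' key: {spec!r}")
    return {
        "label": spec["label"],
        "description": spec.get("description", ""),
        "properties": list(spec.get("properties", [])),
    }


ENTITIES = [
    "Person",
    {"label": "House", "description": "Family the person belongs to"},
]
normalized = [normalize_spec(entity) for entity in ENTITIES]
```

Both spellings end up in the same shape, which is why the two forms can be mixed freely in one list.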
@@ -117,6 +118,18 @@ This schema information can be provided to the `SimpleKGBuilder` as demonstrated
 
 .. code:: python
 
+    # Using the schema parameter (recommended approach)
+    kg_builder = SimpleKGPipeline(
+        # ...
+        schema={
+            "entities": ENTITIES,
+            "relations": RELATIONS,
+            "potential_schema": POTENTIAL_SCHEMA
+        },
+        # ...
+    )
+
+    # Using individual schema parameters (deprecated approach)
     kg_builder = SimpleKGPipeline(
         # ...
         entities=ENTITIES,
@@ -125,6 +138,9 @@ This schema information can be provided to the `SimpleKGBuilder` as demonstrated
         # ...
     )
 
+.. note::
+    By default, if no schema is provided to the SimpleKGPipeline, automatic schema extraction will be performed using the LLM (see the :ref:`Automatic Schema Extraction with SchemaFromText` section).
+
 Extra configurations
 --------------------
 
@@ -412,41 +428,46 @@ within the configuration file.
         "neo4j_database": "myDb",
         "on_error": "IGNORE",
         "prompt_template": "...",
-        "entities": [
-            "Person",
-            {
-                "label": "House",
-                "description": "Family the person belongs to",
-                "properties": [
-                    {"name": "name", "type": "STRING"}
-                ]
-            },
-            {
-                "label": "Planet",
-                "properties": [
-                    {"name": "name", "type": "STRING"},
-                    {"name": "weather", "type": "STRING"}
-                ]
-            }
-        ],
-        "relations": [
-            "PARENT_OF",
-            {
-                "label": "HEIR_OF",
-                "description": "Used for inheritor relationship between father and sons"
-            },
-            {
-                "label": "RULES",
-                "properties": [
-                    {"name": "fromYear", "type": "INTEGER"}
-                ]
-            }
-        ],
-        "potential_schema": [
-            ["Person", "PARENT_OF", "Person"],
-            ["Person", "HEIR_OF", "House"],
-            ["House", "RULES", "Planet"]
-        ],
+
+        "schema": {
+            "entities": [
+                "Person",
+                {
+                    "label": "House",
+                    "description": "Family the person belongs to",
+                    "properties": [
+                        {"name": "name", "type": "STRING"}
+                    ]
+                },
+                {
+                    "label": "Planet",
+                    "properties": [
+                        {"name": "name", "type": "STRING"},
+                        {"name": "weather", "type": "STRING"}
+                    ]
+                }
+            ],
+            "relations": [
+                "PARENT_OF",
+                {
+                    "label": "HEIR_OF",
+                    "description": "Used for inheritor relationship between father and sons"
+                },
+                {
+                    "label": "RULES",
+                    "properties": [
+                        {"name": "fromYear", "type": "INTEGER"}
+                    ]
+                }
+            ],
+            "potential_schema": [
+                ["Person", "PARENT_OF", "Person"],
+                ["Person", "HEIR_OF", "House"],
+                ["House", "RULES", "Planet"]
+            ]
+        },
+        /* Control automatic schema extraction */
+        "auto_schema_extraction": false,
         "lexical_graph_config": {
             "chunk_node_label": "TextPart"
         }
@@ -462,31 +483,36 @@ or in YAML:
     neo4j_database: myDb
     on_error: IGNORE
     prompt_template: ...
-    entities:
-      - label: Person
-      - label: House
-        description: Family the person belongs to
-        properties:
-          - name: name
-            type: STRING
-      - label: Planet
-        properties:
-          - name: name
-            type: STRING
-          - name: weather
-            type: STRING
-    relations:
-      - label: PARENT_OF
-      - label: HEIR_OF
-        description: Used for inheritor relationship between father and sons
-      - label: RULES
-        properties:
-          - name: fromYear
-            type: INTEGER
-    potential_schema:
-      - ["Person", "PARENT_OF", "Person"]
-      - ["Person", "HEIR_OF", "House"]
-      - ["House", "RULES", "Planet"]
+
+    # Using the schema parameter (recommended approach)
+    schema:
+      entities:
+        - Person
+        - label: House
+          description: Family the person belongs to
+          properties:
+            - name: name
+              type: STRING
+        - label: Planet
+          properties:
+            - name: name
+              type: STRING
+            - name: weather
+              type: STRING
+      relations:
+        - PARENT_OF
+        - label: HEIR_OF
+          description: Used for inheritor relationship between father and sons
+        - label: RULES
+          properties:
+            - name: fromYear
+              type: INTEGER
+      potential_schema:
+        - ["Person", "PARENT_OF", "Person"]
+        - ["Person", "HEIR_OF", "House"]
+        - ["House", "RULES", "Planet"]
+    # Control automatic schema extraction
+    auto_schema_extraction: false
     lexical_graph_config:
       chunk_node_label: TextPart
 
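Note on the two configuration hunks above: both move the flat `entities`, `relations` and `potential_schema` keys under a single `schema` key. For anyone updating an existing configuration file, the move can be sketched as a plain dictionary transformation. The helper name `nest_schema_keys` is hypothetical, and the diff does not say whether the flat keys are still accepted in configuration files, so this is only an illustration of the reshaping.

```python
from typing import Any

# Keys that the diff relocates under "schema".
SCHEMA_KEYS = ("entities", "relations", "potential_schema")


def nest_schema_keys(config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of config with the flat schema keys moved under 'schema'.

    Hypothetical helper for illustration; it is not part of the library.
    """
    migrated = {k: v for k, v in config.items() if k not in SCHEMA_KEYS}
    flat = {k: config[k] for k in SCHEMA_KEYS if k in config}
    if flat:
        if "schema" in config:
            raise ValueError("config mixes flat schema keys with a 'schema' block")
        migrated["schema"] = flat
    return migrated


old_config = {
    "on_error": "IGNORE",
    "entities": ["Person", {"label": "House"}],
    "relations": ["PARENT_OF"],
    "potential_schema": [["Person", "PARENT_OF", "Person"]],
}
new_config = nest_schema_keys(old_config)
```

A configuration that already uses the nested form passes through unchanged, so the helper can be applied to files in either layout.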
@@ -791,6 +817,49 @@ Here is a code block illustrating these concepts:
 After validation, this schema is saved in a `SchemaConfig` object, whose dict representation is passed
 to the LLM.
 
+Automatic Schema Extraction with SchemaFromText
+----------------------------------------------
+.. _automatic-schema-extraction:
+
+Instead of manually defining the schema, you can use the `SchemaFromText` component to automatically extract a schema from your text using an LLM:
+
+.. code:: python
+
+    from neo4j_graphrag.experimental.components.schema import SchemaFromText
+    from neo4j_graphrag.llm import OpenAILLM
+
+    # Create the automatic schema extractor
+    schema_extractor = SchemaFromText(
+        llm=OpenAILLM(
+            model_name="gpt-4o",
+            model_params={
+                "max_tokens": 2000,
+                "response_format": {"type": "json_object"},
+            },
+        )
+    )
+
+    # Extract schema from text
+    schema_config = await schema_extractor.run(text="Your document text here...")
+
+    # Use the extracted schema with other components
+    extractor = LLMEntityRelationExtractor(llm=llm)
+    result = await extractor.run(chunks=chunks, schema=schema_config)
+
+The `SchemaFromText` component analyzes the text and identifies entity types, relationship types, and their property types. It creates a complete `SchemaConfig` object that can be used in the same way as a manually defined schema.
+
+You can also save and reload the extracted schema:
+
+.. code:: python
+
+    # Save the schema to JSON or YAML files
+    schema_config.store_as_json("my_schema.json")
+    schema_config.store_as_yaml("my_schema.yaml")
+
+    # Later, reload the schema from file
+    from neo4j_graphrag.experimental.components.schema import SchemaConfig
+    restored_schema = SchemaConfig.from_file("my_schema.json") # or my_schema.yaml
+
 
 Entity and Relation Extractor
 =============================
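Note on the hunk above: the save-and-reload step it documents amounts to a round trip through a file. The sketch below shows that round trip with the standard library only, on a plain dictionary. It is not the library's `SchemaConfig` implementation; `dump_schema` and `load_schema` are hypothetical names, and the schema contents are taken from the examples elsewhere in this diff.

```python
import json
import tempfile
from pathlib import Path
from typing import Any

schema: dict[str, Any] = {
    "entities": ["Person", {"label": "House"}],
    "relations": ["PARENT_OF", {"label": "HEIR_OF"}],
    "potential_schema": [
        ["Person", "PARENT_OF", "Person"],
        ["Person", "HEIR_OF", "House"],
    ],
}


def dump_schema(data: dict[str, Any], path: Path) -> None:
    """Write the schema dictionary to a JSON file (hypothetical helper)."""
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")


def load_schema(path: Path) -> dict[str, Any]:
    """Read a schema dictionary back from a JSON file (hypothetical helper)."""
    return json.loads(path.read_text(encoding="utf-8"))


# Round trip through a temporary file so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    schema_path = Path(tmp) / "my_schema.json"
    dump_schema(schema, schema_path)
    restored = load_schema(schema_path)
```

Storing the extracted schema this way makes it possible to review or hand-edit it once and reuse it across runs instead of calling the LLM again.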
@@ -832,6 +901,8 @@ The LLM to use can be customized, the only constraint is that it obeys the :ref:
 
 Schema Enforcement Behaviour
 ----------------------------
+.. _schema-enforcement-behaviour:
+
 By default, even if a schema is provided to guide the LLM in the entity and relation extraction, the LLM response is not validated against that schema.
 This behaviour can be changed by using the `enforce_schema` flag in the `LLMEntityRelationExtractor` constructor:
 
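Note on the hunk above: the diff does not show what enforcement actually does to the LLM output. One plausible reading, sketched here as an assumption and not as the library's implementation, is that extracted relationships are checked against the `potential_schema` triples and anything that does not match is dropped. `Relation` and `filter_by_schema` are hypothetical names.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """An extracted relationship, reduced to labels for this sketch."""
    start_label: str
    type: str
    end_label: str


def filter_by_schema(
    relations: list[Relation], potential_schema: list[list[str]]
) -> list[Relation]:
    """Keep only relations whose (start, type, end) triple is in the schema."""
    allowed = {tuple(triple) for triple in potential_schema}
    return [rel for rel in relations if tuple(rel) in allowed]


POTENTIAL_SCHEMA = [
    ["Person", "PARENT_OF", "Person"],
    ["Person", "HEIR_OF", "House"],
    ["House", "RULES", "Planet"],
]

extracted = [
    Relation("Person", "PARENT_OF", "Person"),  # matches the schema
    Relation("Planet", "RULES", "House"),       # direction reversed
    Relation("Person", "LIVES_ON", "Planet"),   # relationship type not declared
]
kept = filter_by_schema(extracted, POTENTIAL_SCHEMA)
```

Without a check of this kind, the schema only steers the prompt, which is why the documentation warns that the extracted graph may not adhere to it.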