New schema and pruning #347

Merged
merged 25 commits into from
Jun 11, 2025
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -15,7 +15,7 @@

#### Strict mode

- Strict mode in `SimpleKGPipeline`: now properties and relationships are pruned only if they are defined in the input schema.
- Strict mode in `SimpleKGPipeline`: the `enforce_schema` option is removed and replaced by schema-driven pruning.

#### Schema definition

104 changes: 65 additions & 39 deletions docs/source/user_guide_kg_builder.rst
@@ -73,10 +73,10 @@ Customizing the SimpleKGPipeline
Graph Schema
------------

It is possible to guide the LLM by supplying a list of node and relationship types,
and instructions on how to connect them (patterns). However, note that the extracted graph
may not fully adhere to these guidelines unless schema enforcement is enabled
(see :ref:`Schema Enforcement Behaviour`). Node and relationship types can be represented
It is possible to guide the LLM by supplying a list of node and relationship types
(optionally with a list of their expected properties)
and instructions on how to connect them (patterns).
Node and relationship types can be represented
as either simple strings (for their labels) or dictionaries. If using a dictionary,
it must include a label key and can optionally include description and properties keys,
as shown below:
@@ -90,7 +90,7 @@ as shown below:
# such as a description:
{"label": "House", "description": "Family the person belongs to"},
# or a list of properties the LLM will try to attach to the entity:
{"label": "Planet", "properties": [{"name": "weather", "type": "STRING"}]},
{"label": "Planet", "properties": [{"name": "name", "type": "STRING", "required": True}, {"name": "weather", "type": "STRING"}]},
]
# same thing for relationships:
RELATIONSHIP_TYPES = [
@@ -124,7 +124,8 @@ This schema information can be provided to the `SimpleKGBuilder` as demonstrated
schema={
"node_types": NODE_TYPES,
"relationship_types": RELATIONSHIP_TYPES,
"patterns": PATTERNS
"patterns": PATTERNS,
"additional_node_types": False,
},
# ...
)
@@ -145,7 +146,6 @@ They are also accessible via the `SimpleKGPipeline` interface.
# ...
prompt_template="",
lexical_graph_config=my_config,
enforce_schema="STRICT"
on_error="RAISE",
# ...
)
@@ -878,38 +878,6 @@ It can be used in this way:

The LLM to use can be customized, the only constraint is that it obeys the :ref:`LLMInterface <llminterface>`.

Schema Enforcement Behaviour
----------------------------
.. _schema-enforcement-behaviour:

By default, even if a schema is provided to guide the LLM in the entity and relation extraction, the LLM response is not validated against that schema.
This behaviour can be changed by using the `enforce_schema` flag in the `LLMEntityRelationExtractor` constructor:

.. code:: python

    from neo4j_graphrag.experimental.components.entity_relation_extractor import LLMEntityRelationExtractor
    from neo4j_graphrag.experimental.components.types import SchemaEnforcementMode

    extractor = LLMEntityRelationExtractor(
        # ...
        enforce_schema=SchemaEnforcementMode.STRICT,
    )

In this scenario, any extracted node/relation/property that is not part of the provided schema will be pruned.
Any relation whose start node or end node does not conform to the provided tuple in `potential_schema` will be pruned.
If a relation's start and end nodes are valid but its direction is incorrect, the direction will be inverted.
If a node is left with no properties, it will also be pruned.

.. note::

    If the input schema lacks a certain type of information, pruning is skipped.
    For example, if an entity is defined only by a label and has no properties,
    property pruning is not performed and all properties returned by the LLM are kept.


.. warning::

    Note that if the schema enforcement mode is on but the schema is not provided, no schema enforcement will be applied.

Error Behaviour
---------------
@@ -1017,6 +985,64 @@ If more customization is needed, it is possible to subclass the `EntityRelationE
See :ref:`entityrelationextractor`.


Schema Guidance and Graph Filtering
===================================

The provided schema serves as a guiding structure for the language model during graph construction. However, it does not impose strict constraints on the model's output. As a result, the model may generate additional node labels, relationship types, or properties that are not explicitly defined in the schema.

By default, all extracted elements (nodes, relationships, and properties) are retained in the constructed graph. This behavior can be configured using the schema options described below (see also :ref:`graphschema`).


Configuration Options
---------------------

- **Required Properties**
  Required properties may be specified at the node or relationship type level. Any extracted node or relationship missing one or more of its required properties will be pruned from the graph.

- **Additional Properties** *(default: True)*
  This node- or relationship-level option determines whether extra properties not listed in the schema should be retained.

  - If set to ``True`` (default), all extracted properties are retained.
  - If set to ``False``, only the properties defined in the schema are preserved; all others are removed.


.. note:: Node pruning

    If, after property pruning using the above rule, a node is left without any properties, it is removed from the graph.


- **Additional Node Types** *(default: True)*
  This schema-level option specifies whether node types not defined in the schema are included in the graph.

  - If set to ``True`` (default), such node types are retained.
  - If set to ``False``, nodes with undefined types are removed.

- **Additional Relationship Types** *(default: True)*
  This schema-level option specifies whether relationship types not defined in the schema are included in the graph.

  - If set to ``True`` (default), such relationships are retained.
  - If set to ``False``, relationships with undefined types are removed.

- **Additional Patterns** *(default: True)*
  This schema-level option determines whether relationship patterns not explicitly listed in the schema are allowed.

  - If set to ``True`` (default), all patterns are retained.
  - If set to ``False``, only patterns defined in the schema are kept. **Note:** ``additional_relationship_types`` must also be set to ``False``.
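
The node-level options above can be illustrated with a small, library-independent sketch. It is not the `GraphPruning` implementation: the `SCHEMA` dictionary, the `ADDITIONAL_NODE_TYPES` flag, and the `prune_node` helper are hypothetical names introduced here only to show how required properties, additional properties, and additional node types could interact. The order in which the checks run is also an assumption.

```python
from typing import Any, Optional

# Hypothetical, simplified stand-in for a schema (NOT the neo4j_graphrag classes).
SCHEMA = {
    "Person": {
        "properties": {"firstName", "lastName", "age"},
        "required": {"firstName", "lastName"},
        "additional_properties": False,
    },
}
# Hypothetical schema-level flag mirroring "Additional Node Types".
ADDITIONAL_NODE_TYPES = False


def prune_node(label: str, properties: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the properties to keep, or None if the node should be pruned."""
    node_type = SCHEMA.get(label)
    if node_type is None:
        # Label not defined in the schema: keep only if additional node types are allowed.
        return dict(properties) if ADDITIONAL_NODE_TYPES else None

    kept = dict(properties)
    if not node_type["additional_properties"]:
        # Drop every property that the schema does not list for this label.
        kept = {k: v for k, v in kept.items() if k in node_type["properties"]}

    # A node missing any required property is pruned.
    if not node_type["required"].issubset(kept):
        return None
    # A node left without any properties is pruned as well.
    if not kept:
        return None
    return kept


if __name__ == "__main__":
    print(prune_node("Person", {"firstName": "John", "lastName": "Doe", "occupation": "employee"}))
    print(prune_node("Person", {"firstName": "Jane"}))
    print(prune_node("Planet", {"name": "Earth"}))
```

In this sketch the first node keeps only `firstName` and `lastName`, the second is dropped because `lastName` is missing, and the third is dropped because `Planet` is not a schema-defined label.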



Enforcement rules
_________________

In addition to the user-defined configuration options described above,
the `GraphPruning` component performs the following cleanup operations:

- Nodes with missing required properties are pruned.
- Nodes with no remaining properties are pruned.
- Relationships with invalid source or target nodes (i.e., nodes no longer present in the graph) are pruned.
- Relationships with incorrect direction have their direction corrected.
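
The relationship-related rules can be sketched in the same spirit. This is a simplified illustration, not the library's code: `NODE_LABELS`, `PATTERNS`, and `clean_relationship` are hypothetical names, and the sketch assumes that both `additional_relationship_types` and `additional_patterns` are set to `False`, so any relationship that matches no pattern in either direction is dropped.

```python
from typing import Optional

# Hypothetical inputs: ids of the nodes that survived pruning, mapped to their labels.
NODE_LABELS = {"p1": "Person", "p2": "Person", "o1": "Organization"}
# Hypothetical allowed patterns: (start label, relationship type, end label).
PATTERNS = {
    ("Person", "KNOWS", "Person"),
    ("Person", "WORKS_FOR", "Organization"),
}

# A relationship is modelled here as (start_node_id, type, end_node_id).
Relationship = tuple[str, str, str]


def clean_relationship(rel: Relationship) -> Optional[Relationship]:
    """Return the relationship (possibly reversed), or None if it should be pruned."""
    start, rel_type, end = rel
    # Prune relationships whose start or end node is no longer in the graph.
    if start not in NODE_LABELS or end not in NODE_LABELS:
        return None

    start_label, end_label = NODE_LABELS[start], NODE_LABELS[end]
    if (start_label, rel_type, end_label) in PATTERNS:
        return rel
    # Valid endpoints but wrong direction: swap start and end.
    if (end_label, rel_type, start_label) in PATTERNS:
        return (end, rel_type, start)
    # No matching pattern in either direction: prune (additional patterns disallowed).
    return None


if __name__ == "__main__":
    print(clean_relationship(("p1", "KNOWS", "p2")))
    print(clean_relationship(("o1", "WORKS_FOR", "p1")))
    print(clean_relationship(("missing", "KNOWS", "p1")))
    print(clean_relationship(("p1", "PARENT_OF", "p2")))
```

Checking endpoint existence first keeps the direction logic simple, since labels only need to be looked up for nodes known to be present.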

.. _kg-writer-section:

Knowledge Graph Writer
1 change: 1 addition & 0 deletions examples/README.md
@@ -128,6 +128,7 @@ are listed in [the last section of this file](#customize).
- [LLM-based](./customize/build_graph/components/extractors/llm_entity_relation_extractor.py)
- [LLM-based with custom prompt](./customize/build_graph/components/extractors/llm_entity_relation_extractor_with_custom_prompt.py)
- [Custom](./customize/build_graph/components/extractors/custom_extractor.py)
- [Graph Pruner](./customize/build_graph/components/pruners/graph_pruner.py)
- Knowledge Graph Writer:
- [Neo4j writer](./customize/build_graph/components/writers/neo4j_writer.py)
- [Custom](./customize/build_graph/components/writers/custom_writer.py)
136 changes: 136 additions & 0 deletions examples/customize/build_graph/components/pruners/graph_pruner.py
@@ -0,0 +1,136 @@
"""This example demonstrates how to use the GraphPruner component."""

import asyncio

from neo4j_graphrag.experimental.components.graph_pruning import GraphPruning
from neo4j_graphrag.experimental.components.schema import (
    GraphSchema,
    NodeType,
    PropertyType,
    RelationshipType,
)
from neo4j_graphrag.experimental.components.types import (
    Neo4jGraph,
    Neo4jNode,
    Neo4jRelationship,
)

graph = Neo4jGraph(
    nodes=[
        Neo4jNode(
            id="Person/John",
            label="Person",
            properties={
                "firstName": "John",
                "lastName": "Doe",
                "occupation": "employee",
            },
        ),
        Neo4jNode(
            id="Person/Jane",
            label="Person",
            properties={
                "firstName": "Jane",
            },
        ),
        Neo4jNode(
            id="Person/Jack",
            label="Person",
            properties={"firstName": "Jack", "lastName": "Dae"},
        ),
        Neo4jNode(
            id="Organization/Corp1",
            label="Organization",
            properties={"name": "CorpA"},
        ),
    ],
    relationships=[
        Neo4jRelationship(
            start_node_id="Person/John",
            end_node_id="Person/Jack",
            type="KNOWS",
        ),
        Neo4jRelationship(
            start_node_id="Organization/CorpA",
            end_node_id="Person/Jack",
            type="WORKS_FOR",
        ),
        Neo4jRelationship(
            start_node_id="Person/John",
            end_node_id="Person/Jack",
            type="PARENT_OF",
        ),
    ],
)

schema = GraphSchema(
    node_types=(
        NodeType(
            label="Person",
            properties=[
                PropertyType(name="firstName", type="STRING", required=True),
                PropertyType(name="lastName", type="STRING", required=True),
                PropertyType(name="age", type="INTEGER"),
            ],
            additional_properties=False,
        ),
        NodeType(
            label="Organization",
            properties=[
                PropertyType(name="name", type="STRING", required=True),
                PropertyType(name="address", type="STRING"),
            ],
        ),
    ),
    relationship_types=(
        RelationshipType(
            label="WORKS_FOR",
            properties=[PropertyType(name="since", type="LOCAL_DATETIME")],
        ),
        RelationshipType(
            label="KNOWS",
        ),
    ),
    patterns=(
        ("Person", "KNOWS", "Person"),
        ("Person", "WORKS_FOR", "Organization"),
    ),
    additional_node_types=False,
    additional_relationship_types=False,
    additional_patterns=False,
)


async def main() -> None:
    pruner = GraphPruning()
    res = await pruner.run(graph, schema)
    print("=" * 20, "FINAL CLEANED GRAPH:", "=" * 20)
    print(res.graph)
    print("=" * 20, "PRUNED ITEM:", "=" * 20)
    print(res.pruning_stats)
    print("-" * 10, "PRUNED NODES:")
    for node in res.pruning_stats.pruned_nodes:
        print(
            node.item.label,
            "with properties",
            node.item.properties,
            "pruned because",
            node.pruned_reason,
            node.metadata,
        )
    print("-" * 10, "PRUNED RELATIONSHIPS:")
    for rel in res.pruning_stats.pruned_relationships:
        print(rel.item.type, "pruned because", rel.pruned_reason)
    print("-" * 10, "PRUNED PROPERTIES:")
    for prop in res.pruning_stats.pruned_properties:
        print(
            prop.item,
            "from node label",
            prop.label,
            "pruned because",
            prop.pruned_reason,
        )


if __name__ == "__main__":
    asyncio.run(main())