Commit df1c205

Add schema-database constraint validation to SchemaBuilder
1 parent 61c4f1a commit df1c205

6 files changed, +697 -46 lines changed


docs/source/api.rst

Lines changed: 9 additions & 0 deletions
@@ -475,6 +475,8 @@ Errors

 * :class:`neo4j_graphrag.exceptions.SchemaValidationError`

+* :class:`neo4j_graphrag.exceptions.SchemaDatabaseConflictError`
+
 * :class:`neo4j_graphrag.exceptions.PdfLoaderError`

 * :class:`neo4j_graphrag.exceptions.PromptMissingPlaceholderError`
@@ -604,6 +606,13 @@ SchemaValidationError
     :show-inheritance:


+SchemaDatabaseConflictError
+===========================
+
+.. autoclass:: neo4j_graphrag.exceptions.SchemaDatabaseConflictError
+    :show-inheritance:
+
+
 PdfLoaderError
 ==============

docs/source/user_guide_kg_builder.rst

Lines changed: 177 additions & 0 deletions
@@ -841,6 +841,183 @@ You can also save and reload the extracted schema:
     restored_schema = GraphSchema.from_file("my_schema.json") # or my_schema.yaml


+Schema-Database Constraint Validation
+=====================================
+
+When using ``SchemaBuilder`` with an existing Neo4j database that contains constraints,
+the component automatically validates that your user-defined schema is compatible with
+the database constraints. This validation helps ensure data consistency and prevents
+runtime errors when writing to the database.
+
+.. warning::
+
+    This validation is performed during schema building and will raise explicit errors
+    if conflicts are detected. No silent modifications are made to your schema.
+
+Validation Rules
+----------------
+
+The ``SchemaBuilder`` validates your schema against the following types of database constraints:
+
+**1. Missing Property Conflicts**
+
+If your schema defines an entity type (node or relationship) but omits properties that
+are required by database existence constraints, ``SchemaDatabaseConflictError`` is raised:
+
+.. code:: python
+
+    # Database has constraint: CREATE CONSTRAINT FOR (p:Person) REQUIRE p.email IS NOT NULL
+    # But your schema doesn't include the 'email' property
+
+    schema_builder = SchemaBuilder(driver=neo4j_driver)
+
+    # This will raise SchemaDatabaseConflictError
+    await schema_builder.run(
+        node_types=[
+            NodeType(
+                label="Person",
+                properties=[
+                    PropertyType(name="name", type="STRING")
+                    # Missing required 'email' property!
+                ]
+            )
+        ]
+    )
+
+**Error Resolution:** Add the missing properties to your schema or remove the database constraint.
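
The missing-property rule can be pictured as a set difference between what the database requires and what the schema declares. The following is a minimal sketch of that idea only, not the library's implementation: ``NodeSpec``, ``find_missing_properties`` and the sample data are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NodeSpec:
    """Hypothetical stand-in for a schema node type (not the library's NodeType)."""

    label: str
    properties: set[str] = field(default_factory=set)


def find_missing_properties(
    nodes: list[NodeSpec], required_by_db: dict[str, set[str]]
) -> dict[str, set[str]]:
    """Return, per label, the properties the database requires but the schema omits."""
    missing: dict[str, set[str]] = {}
    for node in nodes:
        gap = required_by_db.get(node.label, set()) - node.properties
        if gap:
            missing[node.label] = gap
    return missing


# Illustrative data: the database requires Person.email, the schema only declares 'name'.
schema_nodes = [NodeSpec(label="Person", properties={"name"})]
db_required = {"Person": {"email"}}
missing = find_missing_properties(schema_nodes, db_required)
print(missing)
```

A non-empty result corresponds to the situation in which the documentation above says an error is raised; an empty dictionary means no property is missing.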
+
+**2. Property Type Conflicts**
+
+If your schema defines a property type that conflicts with a database property type
+constraint, an error is raised:
+
+.. code:: python
+
+    # Database has constraint: CREATE CONSTRAINT FOR (p:Person) REQUIRE p.age IS :: INTEGER
+    # But your schema defines 'age' as STRING
+
+    # This will raise SchemaDatabaseConflictError
+    await schema_builder.run(
+        node_types=[
+            NodeType(
+                label="Person",
+                properties=[
+                    PropertyType(name="age", type="STRING")  # Conflicts with INTEGER constraint
+                ]
+            )
+        ]
+    )
+
+**Error Resolution:** Update property types to match database constraints or remove the database constraint.
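
A type conflict amounts to comparing, for each constrained property, the type the schema declares with the type the database enforces. The sketch below assumes both sides are already expressed as plain type-name strings keyed by ``(label, property)``; ``find_type_conflicts`` is a hypothetical helper, and a real check may also need to reconcile differences in how type names are spelled.

```python
def find_type_conflicts(
    schema_types: dict[tuple[str, str], str],
    db_types: dict[tuple[str, str], str],
) -> list[str]:
    """Describe every (label, property) whose schema type differs from the database type."""
    conflicts = []
    for key, db_type in sorted(db_types.items()):
        schema_type = schema_types.get(key)
        # A property absent from the schema is a different kind of conflict, so skip it here.
        if schema_type is not None and schema_type != db_type:
            label, prop = key
            conflicts.append(
                f"{label}.{prop}: schema has {schema_type}, database requires {db_type}"
            )
    return conflicts


# Illustrative data matching the example above: 'age' is STRING in the schema, INTEGER in the database.
conflicts = find_type_conflicts(
    schema_types={("Person", "age"): "STRING", ("Person", "name"): "STRING"},
    db_types={("Person", "age"): "INTEGER"},
)
print(conflicts)
```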
+
+**3. Missing Entity Type Conflicts**
+
+If database constraints reference entity types that are not defined in your schema and
+you have disabled additional types, an error is raised:
+
+.. code:: python
+
+    # Database has constraints on 'Company' nodes
+    # But your schema doesn't include Company and additional_node_types=False
+
+    # This will raise SchemaDatabaseConflictError
+    await schema_builder.run(
+        node_types=[NodeType(label="Person")],
+        additional_node_types=False  # Strict mode
+    )
+
+**Error Resolution:** Add the missing entity types to your schema or set ``additional_node_types=True``.
+
+**4. Additional Properties Conflicts**
+
+If an entity type has ``additional_properties=False`` but database constraints require
+properties that are not in your schema, an error is raised:
+
+.. code:: python
+
+    # Database requires 'email' property via existence constraint
+    # But your schema has additional_properties=False and doesn't include 'email'
+
+    # This will raise SchemaDatabaseConflictError
+    await schema_builder.run(
+        node_types=[
+            NodeType(
+                label="Person",
+                properties=[PropertyType(name="name", type="STRING")],
+                additional_properties=False  # Strict mode, but missing required 'email'
+            )
+        ]
+    )
+
+**Error Resolution:** Add missing properties to your schema or set ``additional_properties=True``.
+
+Schema Enhancement
+------------------
+
+When your schema is compatible with database constraints, the ``SchemaBuilder`` can
+enhance your schema by setting ``required=True`` on properties that have database
+existence constraints:
+
+.. code:: python
+
+    # Database has: CREATE CONSTRAINT FOR (p:Person) REQUIRE p.email IS NOT NULL
+
+    schema = await schema_builder.run(
+        node_types=[
+            NodeType(
+                label="Person",
+                properties=[
+                    PropertyType(name="name", type="STRING"),
+                    PropertyType(name="email", type="STRING", required=False)  # Initially optional
+                ]
+            )
+        ]
+    )
+
+    # After validation, the 'email' property will be enhanced to required=True
+    person_type = schema.node_type_from_label("Person")
+    email_prop = person_type.get_property_by_name("email")
+    assert email_prop.required is True  # Enhanced by database constraint
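
The enhancement step can be thought of as producing updated copies of the affected properties rather than mutating the caller's objects. This is a sketch under that assumption; ``PropSpec`` and ``mark_required`` are hypothetical and do not reflect how the library stores or updates ``PropertyType``.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PropSpec:
    """Hypothetical stand-in for a schema property (not the library's PropertyType)."""

    name: str
    type: str
    required: bool = False


def mark_required(props: list[PropSpec], required_names: set[str]) -> list[PropSpec]:
    """Return copies of props with required=True where an existence constraint applies."""
    return [
        replace(p, required=True) if p.name in required_names else p
        for p in props
    ]


# Illustrative data: the database has an existence constraint on 'email' only.
props = [PropSpec("name", "STRING"), PropSpec("email", "STRING", required=False)]
enhanced = mark_required(props, required_names={"email"})
print(enhanced)
```

Because ``PropSpec`` is frozen, the original list is left untouched, which makes it easy to compare the schema before and after enhancement.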
+
+Error Handling
+--------------
+
+All constraint conflicts raise ``SchemaDatabaseConflictError`` with detailed error
+messages explaining the conflict and suggesting resolutions:
+
+.. code:: python
+
+    from neo4j_graphrag.exceptions import SchemaDatabaseConflictError
+
+    try:
+        schema = await schema_builder.run(node_types=[...])
+    except SchemaDatabaseConflictError as e:
+        print(f"Schema conflict detected: {e}")
+        # Error message will indicate exactly which properties or types are missing
+        # and provide suggestions for resolution
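
Since this commit defines ``SchemaDatabaseConflictError`` as a subclass of ``SchemaValidationError``, an existing ``except SchemaValidationError`` handler would also catch the new error, and handler order matters when both are present. The sketch below uses local stand-in classes so that it runs without the library; that ``Neo4jGraphRagError`` derives from ``Exception`` is an assumption, and ``classify`` is a hypothetical helper.

```python
class Neo4jGraphRagError(Exception):
    """Local stand-in for the library's base error (assumed to derive from Exception)."""


class SchemaValidationError(Neo4jGraphRagError):
    """Local stand-in mirroring the parent class shown in this commit."""


class SchemaDatabaseConflictError(SchemaValidationError):
    """Local stand-in mirroring the class added in this commit."""


def classify(exc: Exception) -> str:
    """Report which handler catches exc when the most specific handler comes first."""
    try:
        raise exc
    except SchemaDatabaseConflictError:
        return "database conflict"
    except SchemaValidationError:
        return "schema validation"


# The subclass relationship means a parent-class handler also matches the new error.
caught_by_parent = isinstance(SchemaDatabaseConflictError("x"), SchemaValidationError)
print(classify(SchemaDatabaseConflictError("conflict")), caught_by_parent)
```

Listing the more specific handler first keeps database conflicts distinguishable from other schema validation failures.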
+
+Best Practices
+--------------
+
+1. **Review Database Constraints:** Before defining your schema, review existing
+   database constraints using:
+
+   .. code:: cypher
+
+       SHOW CONSTRAINTS
+
+2. **Start Permissive:** Begin with ``additional_node_types=True`` and
+   ``additional_properties=True`` to allow flexibility during development.
+
+3. **Iterative Refinement:** Use the error messages to iteratively refine your
+   schema until it's compatible with database constraints.
+
+4. **Constraint Alignment:** Ensure your schema property types match database
+   type constraints to avoid conflicts.
+
+5. **Required Properties:** Include all properties referenced by database existence
+   constraints in your schema definitions.
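
To act on the first and fifth practices, the rows returned by ``SHOW CONSTRAINTS`` can be reduced to the set of properties each label or relationship type must carry. The sketch below works on hard-coded, made-up rows; the column names (``type``, ``entityType``, ``labelsOrTypes``, ``properties``) and the ``*_PROPERTY_EXISTENCE`` type values are assumed to match the output of recent Neo4j versions and should be checked against your server. ``required_properties`` is a hypothetical helper, and key constraints, which also imply existence, are deliberately not handled.

```python
from collections import defaultdict

# Illustrative rows shaped like SHOW CONSTRAINTS output; the values are made up.
rows = [
    {"type": "NODE_PROPERTY_EXISTENCE", "entityType": "NODE",
     "labelsOrTypes": ["Person"], "properties": ["email"]},
    {"type": "UNIQUENESS", "entityType": "NODE",
     "labelsOrTypes": ["Person"], "properties": ["id"]},
    {"type": "RELATIONSHIP_PROPERTY_EXISTENCE", "entityType": "RELATIONSHIP",
     "labelsOrTypes": ["WORKS_AT"], "properties": ["since"]},
]


def required_properties(rows: list[dict]) -> dict[tuple[str, str], set[str]]:
    """Group existence-constrained properties by (entity kind, label or relationship type)."""
    required: dict[tuple[str, str], set[str]] = defaultdict(set)
    for row in rows:
        # Only existence constraints make a property mandatory in this sketch.
        if row["type"].endswith("PROPERTY_EXISTENCE"):
            for name in row["labelsOrTypes"]:
                required[(row["entityType"], name)].update(row["properties"])
    return dict(required)


required = required_properties(rows)
print(required)
```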
+
+
 Entity and Relation Extractor
 =============================

src/neo4j_graphrag/exceptions.py

Lines changed: 6 additions & 0 deletions
@@ -116,6 +116,12 @@ class SchemaValidationError(Neo4jGraphRagError):
     pass


+class SchemaDatabaseConflictError(SchemaValidationError):
+    """Exception raised when user schema conflicts with database constraints."""
+
+    pass
+
+
 class SchemaExtractionError(Neo4jGraphRagError):
     """Exception raised for errors in automatic schema extraction."""
