A Java library developed to parse and translate Regular Path Queries (RPQ), Conjunctive RPQ (CRPQ), and Union of CRPQ (UCRPQ) into Neo4j Cypher queries. It includes rewriting algorithms to make queries Cypher-compatible and tools for automaton analysis (determinization, minimization, aperiodicity).
This project originated as an internship (2017) focused on patching GMark, a research tool for generating and benchmarking graph database queries. At that time, GMark produced incorrect Cypher translations:
-
SPARQL / RPQ (GMark)
SELECT DISTINCT ?x0 ?x3 WHERE { ?x0 ((:pmakesPurchase/:ppurchaseFor)) ?x1 . ?x1 ((:ptitle/^:ppaymentAccepted/:poffers) | (:pcaption/^:pdescription/^:pincludes) | (:pcaption/^:ptext/:prelease/^:pvalidForm)) ?x2 . ?x2 ((:pincludes/:ptag) | (:pincludes/:ptag/^:ptag/:ptag) | (:pincludes/:ptag)) ?x3 . }
-
Cypher generated by GMark
MATCH (x0)-[:pmakesPurchase]->()-[:ppurchaseFor]->(x1), (x1)-[:ptitle]->()<-[:ppaymentAccepted]-()-[:poffers]->(x2), (x2)-[:pincludes]->()-[:ptag]->(x3) RETURN DISTINCT x0, x3;
-
Problem → The
(caption.description-.includes-)
and(caption.text-.release.validForm-)
branches were lost because Cypher unions (+
) were not supported.
This project fixes such issues by parsing RPQ properly and rewriting them into Cypher-compatible patterns using UNION
.
-
Grammar Parsing:
- Supports RPQ:
(x, (a.b + c*), y)
- Supports CRPQ:
(... ) & (...)
- Supports UCRPQ:
(...) | (...)
- Supports RPQ:
-
Cypher Translation:
- Handles atoms, concatenation, union (
+
), and Kleene star (*
). - Automatically rewrites unsupported patterns into UNION-based Cypher.
- Handles atoms, concatenation, union (
-
Automata & Algebra:
- Generates finite automata (Glushkov’s construction).
- Determinization and minimization (Brzozowski algorithm).
- Builds transition monoids and checks aperiodicity.
-
Test Suite:
- Unit tests for parser (
getLength
,getRename
,getInitaux
,getSuivant
). - Integration tests for translation.
- Unit tests for parser (
Input RPQ/CRPQ/UCRPQ → Parser → Expression Tree (RegExp) → Translator → Cypher Query
RPQ (RegExp) → Glushkov Automaton → Deterministic Automaton → Minimized Automaton
↓
Transition Monoid (aperiodicity)
Input:
x,(a+b)*.a-*.b*.(a+c)*,y
Output (simplified):
MATCH (x)-[:a|b*0..]->(y)
UNION
MATCH (x)<-[:a*0..]-(y)
UNION
MATCH (x)-[:b*0..]->(y)
...
Input:
(x,(a.b.c),y) & (x,a*,y)
Output:
MATCH (x)-[:a]->()-[:b]->()-[:c]->(y)
MATCH (x)-[:a*0..]->(y)
RETURN x, y
Input:
(x,(a.b.c),y) & (x,a*,y) | (x,(a*.(c+(b.d))),y)
Output:
MATCH (x)-[:a]->()-[:b]->()-[:c]->(y)
MATCH (x)-[:a*0..]->(y)
RETURN x, y
UNION
MATCH (x)-[:a*0..]->()-[:c|b.d]->(y)
RETURN x, y
- Parser →
Parser.java
(RPQ/CRPQ/UCRPQ grammar) - Expressions →
Atom.java
,Union.java
,Concatenation.java
,Star.java
- Rewriting →
RewritingRules.java
(makes queries Cypherable) - Query Models →
RPQ.java
,CRPQ.java
,UCRPQ.java
- Automaton & Monoids →
Automate.java
,Etat.java
,Transition.java
,MonoideTransition.java
- Tests & Demo →
Test.java
(integration),TestRegExp.java
(unit tests)
- Java 8+
- JUnit 5 (for tests)
- (Optional) Neo4j (to execute generated queries)
# compile
javac -d out $(find . -name "*.java")
# run demo
java -cp out main.Test
# run unit tests
java -jar junit-platform-console-standalone.jar --class-path out --scan-class-path
- Integrate directly with GMark.
- Extend to property constraints (nodes & relationships).
- Provide visualization of automata and query plans.