Skip to content

Parser and translator for Regular Path Queries (RPQ), Conjunctive RPQ (CRPQ) and Union of CRPQ (UCRPQ) into Neo4j Cypher. Includes rewriting to handle unsupported patterns, automaton-based analysis, and fixes issues found in GMark-generated Cypher queries.

Notifications You must be signed in to change notification settings

marc-treu/Java-RPQ-Parser

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 

Repository files navigation

RPQ → Cypher Translator

A Java library developed to parse and translate Regular Path Queries (RPQ), Conjunctive RPQ (CRPQ), and Union of CRPQ (UCRPQ) into Neo4j Cypher queries. It includes rewriting algorithms to make queries Cypher-compatible and tools for automaton analysis (determinization, minimization, aperiodicity).


Why This Project?

This project originated as an internship (2017) focused on patching GMark, a research tool for generating and benchmarking graph database queries. At that time, GMark produced incorrect Cypher translations:

  • SPARQL / RPQ (GMark)

    SELECT DISTINCT ?x0 ?x3 WHERE {
      ?x0 ((:pmakesPurchase/:ppurchaseFor)) ?x1 .
      ?x1 ((:ptitle/^:ppaymentAccepted/:poffers) |
           (:pcaption/^:pdescription/^:pincludes) |
           (:pcaption/^:ptext/:prelease/^:pvalidForm)) ?x2 .
      ?x2 ((:pincludes/:ptag) |
           (:pincludes/:ptag/^:ptag/:ptag) |
           (:pincludes/:ptag)) ?x3 .
    }
  • Cypher generated by GMark

    MATCH (x0)-[:pmakesPurchase]->()-[:ppurchaseFor]->(x1),
          (x1)-[:ptitle]->()<-[:ppaymentAccepted]-()-[:poffers]->(x2),
          (x2)-[:pincludes]->()-[:ptag]->(x3)
    RETURN DISTINCT x0, x3;
  • Problem → The (caption.description-.includes-) and (caption.text-.release.validForm-) branches were lost because Cypher unions (+) were not supported.

This project fixes such issues by parsing RPQ properly and rewriting them into Cypher-compatible patterns using UNION.


Features

  • Grammar Parsing:

    • Supports RPQ: (x, (a.b + c*), y)
    • Supports CRPQ: (... ) & (...)
    • Supports UCRPQ: (...) | (...)
  • Cypher Translation:

    • Handles atoms, concatenation, union (+), and Kleene star (*).
    • Automatically rewrites unsupported patterns into UNION-based Cypher.
  • Automata & Algebra:

    • Generates finite automata (Glushkov’s construction).
    • Determinization and minimization (Brzozowski algorithm).
    • Builds transition monoids and checks aperiodicity.
  • Test Suite:

    • Unit tests for parser (getLength, getRename, getInitaux, getSuivant).
    • Integration tests for translation.

How It Works

1. Parsing Pipeline

Input RPQ/CRPQ/UCRPQ → Parser → Expression Tree (RegExp) → Translator → Cypher Query

2. Automaton Construction

RPQ (RegExp) → Glushkov Automaton → Deterministic Automaton → Minimized Automaton
                                               ↓
                                    Transition Monoid (aperiodicity)

Example Outputs

RPQ

Input:

x,(a+b)*.a-*.b*.(a+c)*,y

Output (simplified):

MATCH (x)-[:a|b*0..]->(y)
UNION
MATCH (x)<-[:a*0..]-(y)
UNION
MATCH (x)-[:b*0..]->(y)
...

CRPQ

Input:

(x,(a.b.c),y) & (x,a*,y)

Output:

MATCH (x)-[:a]->()-[:b]->()-[:c]->(y)
MATCH (x)-[:a*0..]->(y)
RETURN x, y

UCRPQ

Input:

(x,(a.b.c),y) & (x,a*,y) | (x,(a*.(c+(b.d))),y)

Output:

MATCH (x)-[:a]->()-[:b]->()-[:c]->(y)
MATCH (x)-[:a*0..]->(y)
RETURN x, y
UNION
MATCH (x)-[:a*0..]->()-[:c|b.d]->(y)
RETURN x, y

Code Structure

  • ParserParser.java (RPQ/CRPQ/UCRPQ grammar)
  • ExpressionsAtom.java, Union.java, Concatenation.java, Star.java
  • RewritingRewritingRules.java (makes queries Cypherable)
  • Query ModelsRPQ.java, CRPQ.java, UCRPQ.java
  • Automaton & MonoidsAutomate.java, Etat.java, Transition.java, MonoideTransition.java
  • Tests & DemoTest.java (integration), TestRegExp.java (unit tests)

Requirements

  • Java 8+
  • JUnit 5 (for tests)
  • (Optional) Neo4j (to execute generated queries)

Build & Run

# compile
javac -d out $(find . -name "*.java")

# run demo
java -cp out main.Test

# run unit tests
java -jar junit-platform-console-standalone.jar --class-path out --scan-class-path

Next Steps

  • Integrate directly with GMark.
  • Extend to property constraints (nodes & relationships).
  • Provide visualization of automata and query plans.

About

Parser and translator for Regular Path Queries (RPQ), Conjunctive RPQ (CRPQ) and Union of CRPQ (UCRPQ) into Neo4j Cypher. Includes rewriting to handle unsupported patterns, automaton-based analysis, and fixes issues found in GMark-generated Cypher queries.

Topics

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages