Add experimental draft support for GPML-style graph query parsing #148


Draft · wants to merge 3 commits into base: main
15 changes: 15 additions & 0 deletions CHANGELOG.md
@@ -17,11 +17,25 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Adds the capability to export the playground session on the client side so that it can be fetched from another playground window.
- Adds a REST API, exposing `/parse` for parsing a query over an HTTP request.
- Containerization using Docker.
- An experimental (pending [#15](https://github.com/partiql/partiql-docs/issues/15)) embedding of a subset of
  the [GPML (Graph Pattern Matching Language)](https://arxiv.org/abs/2112.06217) graph query into the `FROM` clause.
  Its use within the grammar assumes that a new graph data type will be added to the PartiQL specification of data
  types, and it should be considered experimental until the semantics of the graph data type are specified. The
  embedding supports:
- basic and abbreviated node and edge patterns (section 4.1 of the GPML paper)
- concatenated path patterns (section 4.2 of the GPML paper)
- path variables (section 4.2 of the GPML paper)
- graph patterns (i.e., comma separated path patterns) (section 4.3 of the GPML paper)
- parenthesized patterns (section 4.4 of the GPML paper)
- path quantifiers (section 4.4 of the GPML paper)
- restrictors and selector (section 5.1 of the GPML paper)
- pre-filters and post-filters (section 5.2 of the GPML paper)
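The constructs listed above can be illustrated with a handful of example query strings. These are sketches of the surface syntax assembled from the examples in this PR's doc comments; the graph `g`, labels, and variables are hypothetical placeholders, not queries from the test suite:

```rust
// Illustrative PartiQL queries exercising the GPML subset listed above.
// Graph `g`, labels, and element variables are hypothetical placeholders.
fn example_match_queries() -> Vec<&'static str> {
    vec![
        // basic node and edge patterns (GPML paper, section 4.1)
        "SELECT a, b FROM (g MATCH (a) -[e:knows]-> (b))",
        // concatenated path pattern with a path variable (section 4.2)
        "SELECT p FROM (g MATCH p = (a) -[t]-> (b) <-[u]- (c))",
        // graph pattern, i.e. comma-separated path patterns (section 4.3)
        "SELECT * FROM (g MATCH (a) -> (b), (b) -> (c))",
        // path quantifier on an edge (section 4.4)
        "SELECT * FROM (g MATCH (x)->{2,5}(y))",
        // restrictor plus node pre-filter (sections 5.1 and 5.2)
        "SELECT * FROM (g MATCH TRAIL (a WHERE a.name = 'Alarm') -> (b))",
    ]
}
```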

### Fixes
- Fixes a bug with AST graph pan and zoom; before this change, pan and zoom were quite flaky and very hard to work with.
- Fixes the version value for the session and JSON output by ensuring it gets picked from the selected version in the UI.


## [0.1.0] - 2022-08-05
### Added
- Lexer & Parser for the majority of PartiQL query capabilities—see syntax [success](https://github.com/partiql/partiql-tests/tree/main/partiql-tests-data/success/syntax)
@@ -33,5 +47,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- PartiQL Playground proof of concept (POC)
- PartiQL CLI with REPL and query visualization features


[Unreleased]: https://github.com/partiql/partiql-lang-rust/compare/v0.1.0...HEAD
[0.1.0]: https://github.com/partiql/partiql-lang-rust/compare/v0.1.0
151 changes: 150 additions & 1 deletion partiql-ast/src/ast.rs
@@ -10,6 +10,7 @@
use rust_decimal::Decimal as RustDecimal;

use std::fmt;
use std::num::NonZeroU32;

#[cfg(feature = "serde")]
use serde::{Deserialize, Serialize};
@@ -207,6 +208,8 @@ pub enum Expr {
Like(AstNode<Like>),
Between(AstNode<Between>),
In(AstNode<In>),
/// <expr> MATCH <graph_pattern>
GraphMatch(AstNode<GraphMatch>),
Case(AstNode<Case>),
/// Constructors
Struct(AstNode<Struct>),
@@ -583,7 +586,153 @@ pub enum JoinSpec {
Natural,
}

/// `<expr> MATCH <graph_pattern>`
#[derive(Clone, Debug, PartialEq)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
pub struct GraphMatch {
pub expr: Box<Expr>,
pub graph_expr: Box<AstNode<GraphMatchExpr>>,
}

/// The direction of an edge
/// | Orientation | Edge pattern | Abbreviation |
/// |---------------------------+--------------+--------------|
/// | Pointing left | <−[ spec ]− | <− |
/// | Undirected | ~[ spec ]~ | ~ |
/// | Pointing right | −[ spec ]−> | −> |
/// | Left or undirected | <~[ spec ]~ | <~ |
/// | Undirected or right | ~[ spec ]~> | ~> |
/// | Left or right | <−[ spec ]−> | <−> |
/// | Left, undirected or right | −[ spec ]− | − |
///
/// Fig. 5. Table of edge patterns:
/// https://arxiv.org/abs/2112.06217
#[derive(Clone, Debug, PartialEq, Eq)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
pub enum GraphMatchDirection {
Left,
Undirected,
Right,
LeftOrUndirected,
UndirectedOrRight,
LeftOrRight,
LeftOrUndirectedOrRight,
}
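Each direction variant corresponds to one of the abbreviated edge patterns in the Fig. 5 table above. A self-contained sketch (the enum is re-declared locally in ASCII form rather than imported from `partiql-ast`; the helper function is hypothetical):

```rust
#[derive(Debug, PartialEq)]
enum GraphMatchDirection {
    Left,
    Undirected,
    Right,
    LeftOrUndirected,
    UndirectedOrRight,
    LeftOrRight,
    LeftOrUndirectedOrRight,
}

// Abbreviated edge-pattern spelling for each direction,
// per Fig. 5 of the GPML paper (ASCII approximations).
fn abbreviation(d: &GraphMatchDirection) -> &'static str {
    use GraphMatchDirection::*;
    match d {
        Left => "<-",
        Undirected => "~",
        Right => "->",
        LeftOrUndirected => "<~",
        UndirectedOrRight => "~>",
        LeftOrRight => "<->",
        LeftOrUndirectedOrRight => "-",
    }
}
```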

/// A part of a graph pattern
#[derive(Clone, Debug, PartialEq)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
pub enum GraphMatchPatternPart {
/// A single node in a graph pattern.
Node(AstNode<GraphMatchNode>),

/// A single edge in a graph pattern.
Edge(AstNode<GraphMatchEdge>),

/// A sub-pattern.
Pattern(AstNode<GraphMatchPattern>),
}

/// A quantifier for graph edges or patterns. (e.g., the `{2,5}` in `MATCH (x)->{2,5}(y)`)
#[derive(Clone, Debug, PartialEq, Eq)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
pub struct GraphMatchQuantifier {
pub lower: u32,
pub upper: Option<NonZeroU32>,
}
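A quantifier `{n,m}` carries a lower bound and an optional upper bound, with `None` meaning unbounded (as for `*` or `+`). A minimal sketch of checking whether a repetition count satisfies a quantifier, re-declaring the struct locally (the `satisfies` helper is an illustration, not part of this PR):

```rust
use std::num::NonZeroU32;

struct GraphMatchQuantifier {
    lower: u32,
    upper: Option<NonZeroU32>, // None => unbounded, e.g. `*` or `+`
}

// Does a path with `n` edge repetitions satisfy the quantifier?
fn satisfies(q: &GraphMatchQuantifier, n: u32) -> bool {
    n >= q.lower && q.upper.map_or(true, |u| n <= u.get())
}
```

For example, the `{2,5}` in `MATCH (x)->{2,5}(y)` becomes `lower: 2, upper: NonZeroU32::new(5)`.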

/// A path restrictor
/// | Keyword | Description
/// |----------------+--------------
/// | TRAIL | No repeated edges.
/// | ACYCLIC | No repeated nodes.
/// | SIMPLE | No repeated nodes, except that the first and last nodes may be the same.
///
/// Fig. 7. Table of restrictors:
/// https://arxiv.org/abs/2112.06217
#[derive(Clone, Debug, PartialEq, Eq)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
pub enum GraphMatchRestrictor {
Trail,
Acyclic,
Simple,
}

/// A single node in a graph pattern.
#[derive(Clone, Debug, PartialEq)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
pub struct GraphMatchNode {
/// an optional node pre-filter, e.g.: `WHERE c.name='Alarm'` in `MATCH (c WHERE c.name='Alarm')`
pub prefilter: Option<Box<Expr>>,
/// the optional element variable of the node match, e.g.: `x` in `MATCH (x)`
pub variable: Option<SymbolPrimitive>,
/// the optional label(s) to match for the node, e.g.: `Entity` in `MATCH (x:Entity)`
pub label: Option<Vec<SymbolPrimitive>>,
}

/// A single edge in a graph pattern.
#[derive(Clone, Debug, PartialEq, Eq)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
pub struct GraphMatchEdge {
/// edge direction
pub direction: GraphMatchDirection,
/// an optional quantifier for the edge match
pub quantifier: Option<AstNode<GraphMatchQuantifier>>,
/// an optional edge pre-filter, e.g.: `WHERE t.capacity>100` in `MATCH −[t:hasSupply WHERE t.capacity>100]−>`
pub prefilter: Option<Box<Expr>>,
/// the optional element variable of the edge match, e.g.: `t` in `MATCH −[t]−>`
pub variable: Option<SymbolPrimitive>,
/// the optional label(s) to match for the edge. e.g.: `Target` in `MATCH −[t:Target]−>`
pub label: Option<Vec<SymbolPrimitive>>,
}

/// A single graph match pattern.
#[derive(Clone, Debug, PartialEq)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
pub struct GraphMatchPattern {
pub restrictor: Option<GraphMatchRestrictor>,
/// an optional quantifier for the entire pattern match
pub quantifier: Option<AstNode<GraphMatchQuantifier>>,
/// an optional pattern pre-filter, e.g.: `WHERE a.name=b.name` in `MATCH [(a)->(b) WHERE a.name=b.name]`
pub prefilter: Option<Box<Expr>>,
/// the optional element variable of the pattern, e.g.: `p` in `MATCH p = (a) −[t]−> (b)`
pub variable: Option<SymbolPrimitive>,
/// the ordered pattern parts
pub parts: Vec<GraphMatchPatternPart>,
}

/// A path selector
/// | Keyword
/// |------------------
/// | ANY SHORTEST
/// | ALL SHORTEST
/// | ANY
/// | ANY k
/// | SHORTEST k
/// | SHORTEST k GROUP
///
/// Fig. 8. Table of selectors:
/// https://arxiv.org/abs/2112.06217
#[derive(Clone, Debug, PartialEq, Eq)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
pub enum GraphMatchSelector {
AnyShortest,
AllShortest,
Any,
AnyK(NonZeroU32),
ShortestK(NonZeroU32),
ShortestKGroup(NonZeroU32),
}
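Each variant maps back to one of the keyword forms in the Fig. 8 table. A local sketch rendering a selector as its surface keywords (the enum is re-declared here and the `keywords` helper is hypothetical):

```rust
use std::num::NonZeroU32;

#[derive(Debug)]
enum GraphMatchSelector {
    AnyShortest,
    AllShortest,
    Any,
    AnyK(NonZeroU32),
    ShortestK(NonZeroU32),
    ShortestKGroup(NonZeroU32),
}

// Surface keyword form for each selector, per Fig. 8 of the GPML paper.
fn keywords(s: &GraphMatchSelector) -> String {
    use GraphMatchSelector::*;
    match s {
        AnyShortest => "ANY SHORTEST".to_string(),
        AllShortest => "ALL SHORTEST".to_string(),
        Any => "ANY".to_string(),
        AnyK(k) => format!("ANY {}", k),
        ShortestK(k) => format!("SHORTEST {}", k),
        ShortestKGroup(k) => format!("SHORTEST {} GROUP", k),
    }
}
```

`NonZeroU32` encodes that `k` must be at least 1, so `SHORTEST 0` is unrepresentable by construction.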

/// A graph match clause as defined in GPML
/// See https://arxiv.org/abs/2112.06217
#[derive(Clone, Debug, PartialEq)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
pub struct GraphMatchExpr {
pub selector: Option<GraphMatchSelector>,
pub patterns: Vec<AstNode<GraphMatchPattern>>,
}

/// GROUP BY <grouping_strategy> <group_key_list>... \[AS <symbol>\]
#[derive(Clone, Debug, PartialEq)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
pub struct GroupByExpr {
17 changes: 17 additions & 0 deletions partiql-parser/benches/bench_parse.rs
@@ -34,6 +34,20 @@ const Q_COMPLEX_FEXPR: &str = r#"
AS deltas FROM SOURCE_VIEW_DELTA_FULL_TRANSACTIONS delta_full_transactions
"#;

const Q_COMPLEX_MATCH: &str = r#"
SELECT (
SELECT numRec, data
FROM
(deltaGraph MATCH (t) -[:hasChange]-> (dt), (dt) -[:checkPointedBy]-> (t1)),
(
SELECT foo(u.id), bar(review), rindex
FROM delta.data as u CROSS JOIN UNPIVOT u.reviews as review AT rindex
) as data,
delta.numRec as numRec
)
AS deltas FROM SOURCE_VIEW_DELTA_FULL_TRANSACTIONS delta_full_transactions
"#;

fn parse_bench(c: &mut Criterion) {
fn parse(text: &str) -> ParserResult {
Parser::default().parse(text)
@@ -45,6 +59,9 @@ fn parse_bench(c: &mut Criterion) {
c.bench_function("parse-complex-fexpr", |b| {
b.iter(|| parse(black_box(Q_COMPLEX_FEXPR)))
});
c.bench_function("parse-complex-match", |b| {
b.iter(|| parse(black_box(Q_COMPLEX_MATCH)))
});
}

criterion_group! {
36 changes: 32 additions & 4 deletions partiql-parser/src/lexer.rs
@@ -467,6 +467,8 @@ pub enum Token<'input> {
Caret,
#[token(".")]
Period,
#[token("~")]
Tilde,
#[token("||")]
DblPipe,

@@ -512,10 +514,14 @@ pub enum Token<'input> {
// Keywords
#[regex("(?i:All)")]
All,
#[regex("(?i:Acyclic)")]
Acyclic,
#[regex("(?i:Asc)")]
Asc,
#[regex("(?i:And)")]
And,
#[regex("(?i:Any)")]
Any,
#[regex("(?i:As)")]
As,
#[regex("(?i:At)")]
@@ -576,6 +582,8 @@ pub enum Token<'input> {
Like,
#[regex("(?i:Limit)")]
Limit,
#[regex("(?i:Match)")]
Match,
#[regex("(?i:Missing)")]
Missing,
#[regex("(?i:Natural)")]
@@ -612,8 +620,14 @@ pub enum Token<'input> {
Time,
#[regex("(?i:Timestamp)")]
Timestamp,
#[regex("(?i:Simple)")]
Simple,
#[regex("(?i:Shortest)")]
Shortest,
#[regex("(?i:Then)")]
Then,
#[regex("(?i:Trail)")]
Trail,
#[regex("(?i:True)")]
True,
#[regex("(?i:Union)")]
@@ -642,9 +656,11 @@ impl<'input> Token<'input> {
pub fn is_keyword(&self) -> bool {
matches!(
self,
Token::All
Token::Acyclic
| Token::All
| Token::Asc
| Token::And
| Token::Any
| Token::As
| Token::At
| Token::Between
@@ -671,6 +687,7 @@ impl<'input> Token<'input> {
| Token::Left
| Token::Like
| Token::Limit
| Token::Match
| Token::Missing
| Token::Natural
| Token::Not
@@ -689,7 +706,10 @@ impl<'input> Token<'input> {
| Token::Table
| Token::Time
| Token::Timestamp
| Token::Simple
| Token::Shortest
| Token::Then
| Token::Trail
| Token::Union
| Token::Unpivot
| Token::Using
@@ -736,6 +756,7 @@ impl<'input> fmt::Display for Token<'input> {
Token::Slash => write!(f, "/"),
Token::Caret => write!(f, "^"),
Token::Period => write!(f, "."),
Token::Tilde => write!(f, "~"),
Token::DblPipe => write!(f, "||"),
Token::UnquotedIdent(id) => write!(f, "<{}:UNQUOTED_IDENT>", id),
Token::QuotedIdent(id) => write!(f, "<{}:QUOTED_IDENT>", id),
@@ -748,9 +769,11 @@ impl<'input> fmt::Display for Token<'input> {
Token::EmbeddedIonQuote => write!(f, "<ION>"),
Token::Ion(txt) => write!(f, "<{}:ION>", txt),

Token::All
Token::Acyclic
| Token::All
| Token::Asc
| Token::And
| Token::Any
| Token::As
| Token::At
| Token::Between
@@ -781,6 +804,7 @@ impl<'input> fmt::Display for Token<'input> {
| Token::Left
| Token::Like
| Token::Limit
| Token::Match
| Token::Missing
| Token::Natural
| Token::Not
@@ -799,7 +823,10 @@ impl<'input> fmt::Display for Token<'input> {
| Token::Table
| Token::Time
| Token::Timestamp
| Token::Simple
| Token::Shortest
| Token::Then
| Token::Trail
| Token::True
| Token::Union
| Token::Unpivot
@@ -836,7 +863,8 @@ mod tests {
"WiTH Where Value uSiNg Unpivot UNION True Select right Preserve pivoT Outer Order Or \
On Offset Nulls Null Not Natural Missing Limit Like Left Lateral Last Join \
Intersect Is Inner In Having Group From For Full First False Except Escape Desc \
Cross Table Time Timestamp Date By Between At As And Asc All Values Case When Then Else End";
Cross Table Time Timestamp Date By Between At As And Asc All Values Case When Then Else End \
Match Any Shortest Trail Acyclic Simple";
let symbols = symbols.split(' ').chain(primitives.split(' '));
let keywords = keywords.split(' ');

@@ -858,7 +886,7 @@ mod tests {
"<unquoted_atident:UNQUOTED_ATIDENT>", "GROUP", "<quoted_atident:QUOTED_ATIDENT>",
"FROM", "FOR", "FULL", "FIRST", "FALSE", "EXCEPT", "ESCAPE", "DESC", "CROSS", "TABLE",
"TIME", "TIMESTAMP", "DATE", "BY", "BETWEEN", "AT", "AS", "AND", "ASC", "ALL", "VALUES",
"CASE", "WHEN", "THEN", "ELSE", "END"
"CASE", "WHEN", "THEN", "ELSE", "END", "MATCH", "ANY", "SHORTEST", "TRAIL", "ACYCLIC", "SIMPLE"
];
let displayed = toks
.into_iter()