An (unfinished) jq-style cli tool - primarily written to get hands on experience with writing data-oriented parsers, and having fun with stack-based filter evaluation. It leverages serde to support a wide range of file formats easily1.
Expression parsing is done via a Pratt parser, taking reference from matklad's article on Simple but Powerful Pratt Parsing. The resulting AST structure is represented implicitly, with no nested heap allocations:
enum TokenTree {
Atom(AtomIndex),
UnaryOp(Op),
BinaryOp(Op),
/// '|' syntax
Pipe(TTIndex, TTIndex),
CollectObject(usize),
/// Function call (`f(..)`)
Call,
}
// Op is a simple enum for each operator type
// TTIndex & AtomIndex are wrappers around u16 to avoid indexing the wrong vector.
Atoms are stored separately in a single vector, which we index into via u16
's,
allowing us to share one heap allocation for all atoms, and also letting us only
2 bytes per atom token, rather than (usually) 4 or 8 bytes if we used native
pointers.
Keeping the tree structure implicit works because the AST is so simple. We can
evaluate expressions from start to finish, pulling from a stack to get operator
arguments, and pushing our results back to that stack. (e.g. BinaryOp
will
always2 pop the last two tokens)
starq operators & functions can generate more than 1 output for each input.
For example, we can pass a single array into starq:
$ echo '[0,1,2]' | starq -c '.'
[0.0,1.0,2.0]
But by iterating (.[]
) over that array, we create a generator for each item in
that array - effectively 'exploding' that array into individual items.
$ echo '[0,1,2]' | starq -c '.[]'
0.0
1.0
2.0
This lets us do operations on each item in that array,
$ echo '[0,1,2]' | starq -c '.[] * 5'
0.0
5.0
10.0
and we can collect generators back into arrays:
$ echo '[0,1,2]' | starq -c '.[] * 5 | [.]'
[0.0,5.0,10.0]
We represent generators as a 'marker' atom, that holds the size of the generator:
enum Atom {
Null,
// ...
Number(f64),
Bool(bool),
// ...
Generator(usize),
}
In the context of an actively evaluating expression, a Generator(n)
atom must always
be preceeded by n
atoms in the stack. That way, whenever an operator is
pulling from the stack for arguments, it knows to pop all the contents of any generators it meets.
How an operator actually deals with generators depends on whether it's binary or unary, and in the case of binary operators, which of the two arguments are/aren't generators. Thankfully, most3 operators within the same category behave the same around generators, so we can 'unfold' generators before delegating to the actual operator implementation.
For unary operators, we just need to call the underlying operator multiple times - one for each element in the generator.
For binary operators, things are a little more intricate:
- If only one of the arguments is a generator, we run the underlying operator once for each item in the generator.
- If both arguments are generators, we need to run the underlying operator for
each pair in
the cartesian product of
our two generators.
- e.g passing
(1,2)
&(3,5)
to the multiplication operator gives us:1 * 3 = 3
1 * 5 = 5
2 * 3 = 6
2 * 5 = 10
- e.g passing
Currently, numbers in starq are always represented as 64-bit floats like jq, although this may change in the future.
Indexing into an array will silently truncate the given index's fractional value:
$ starq -n '[0,1,2] | .[1.5662]'
1.0
starq intends to mostly support jq's syntax, but will extend/replace language features wherever I personally don't like it.
The following is an overview of:
- jq features supported by starq
- jq features not yet implemented, or not planned
(thanks to 01mf02's jaq for this list)
- Identity (
.
) - Recursion (
..
) - Basic data types (null, bool, string, number, array, object)
- If-then-else (
if .. then .. else .. end
) - Folding (
reduce .[] as $x (0; . + $x)
,foreach .[] as $x (0; . + $x; . + .)
) - Try-catch (
try .. catch ..
) - Breaking (
label $x | f | ., break $x
) - String interpolation (
"The successor of \(.) is \(.+1).
) - Format strings (
@json
,@text
,@csv
, etc)
- Array/object indexing (
.[1]
,.foo
) - Iterating over arrays/objects (
.[]
) - Optional indexing/iteration (
.foo?
,.[]?
) - Array slicing (
.[3:7]
,.[0:-1]
) - String slicing
- Composition (
|
) - Variable binding (
. as $foo | $foo
) - Pattern binding (
. as {a: [$foo, {("b", "c"): $bar, $baz}]} | $foo, $bar, $baz
) - Concatenation (
,
) - Plain assignment (
=
) - Update assignment (
|=
) - Arithmetic update assignment (
+=
,-=
, etc) - Alternative operator (
//
) - Logic (
or
,and
) - Comparison (
<
,==
, etc) - Arithmetic (
+
,-
,*
,%
, etc) - Negation (
-
)
Only a handful of filters are implemented:
sqrt
,log
,abs
,round
,floor
sin
,cos
,tan
keys
,values
Notably, these filters are implemented as intrinsics, unlike jq - which defines them via more basic filters. This is temporary until those more basic filters are implemented in starq.
Footnotes
-
Only JSON support is implemented as of now though :] ↩
-
If there are any generators as operator arguments, it has to also pop the atoms involved in those generators ↩
-
Currently, the concat (
,
) & array literal ([.]
) operators are the exceptions, since they uniquely interact with generators (concatenation creates a generator, array literals can collect a generator into an array). ↩