Skip to content

alanpq/starq

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

starq (starq)

An (unfinished) jq-style cli tool - primarily written to get hands on experience with writing data-oriented parsers, and having fun with stack-based filter evaluation. It leverages serde to support a wide range of file formats easily1.

Implementation

Expression parsing is done via a Pratt parser, taking reference from matklad's article on Simple but Powerful Pratt Parsing. The resulting AST structure is represented implicitly, with no nested heap allocations:

enum TokenTree {
    Atom(AtomIndex),
    UnaryOp(Op),
    BinaryOp(Op),
    /// '|' syntax
    Pipe(TTIndex, TTIndex),
    CollectObject(usize),
    /// Function call (`f(..)`)
    Call,
}
// Op is a simple enum for each operator type
// TTIndex & AtomIndex are wrappers around u16 to avoid indexing the wrong vector.

Atoms are stored separately in a single vector, which we index into via u16's, allowing us to share one heap allocation for all atoms, and also letting us only 2 bytes per atom token, rather than (usually) 4 or 8 bytes if we used native pointers.

Keeping the tree structure implicit works because the AST is so simple. We can evaluate expressions from start to finish, pulling from a stack to get operator arguments, and pushing our results back to that stack. (e.g. BinaryOp will always2 pop the last two tokens)

Generators

starq operators & functions can generate more than 1 output for each input.

For example, we can pass a single array into starq:

$ echo '[0,1,2]' | starq -c '.'
[0.0,1.0,2.0]

But by iterating (.[]) over that array, we create a generator for each item in that array - effectively 'exploding' that array into individual items.

$ echo '[0,1,2]' | starq -c '.[]'
0.0
1.0
2.0

This lets us do operations on each item in that array,

$ echo '[0,1,2]' | starq -c '.[] * 5'
0.0
5.0
10.0

and we can collect generators back into arrays:

$ echo '[0,1,2]' | starq -c '.[] * 5 | [.]'
[0.0,5.0,10.0]

We represent generators as a 'marker' atom, that holds the size of the generator:

enum Atom {
    Null,
    // ...
    Number(f64),
    Bool(bool),
    // ...
    Generator(usize),
}

In the context of an actively evaluating expression, a Generator(n) atom must always be preceeded by n atoms in the stack. That way, whenever an operator is pulling from the stack for arguments, it knows to pop all the contents of any generators it meets.

How an operator actually deals with generators depends on whether it's binary or unary, and in the case of binary operators, which of the two arguments are/aren't generators. Thankfully, most3 operators within the same category behave the same around generators, so we can 'unfold' generators before delegating to the actual operator implementation.

For unary operators, we just need to call the underlying operator multiple times - one for each element in the generator.

For binary operators, things are a little more intricate:

  • If only one of the arguments is a generator, we run the underlying operator once for each item in the generator.
  • If both arguments are generators, we need to run the underlying operator for each pair in the cartesian product of our two generators.
    • e.g passing (1,2) & (3,5) to the multiplication operator gives us:
      • 1 * 3 = 3
      • 1 * 5 = 5
      • 2 * 3 = 6
      • 2 * 5 = 10

Numbers

Currently, numbers in starq are always represented as 64-bit floats like jq, although this may change in the future.

Indexing into an array will silently truncate the given index's fractional value:

$ starq -n '[0,1,2] | .[1.5662]'
1.0

Features

starq intends to mostly support jq's syntax, but will extend/replace language features wherever I personally don't like it.

The following is an overview of:

  • jq features supported by starq
  • jq features not yet implemented, or not planned

(thanks to 01mf02's jaq for this list)

Basics

  • Identity (.)
  • Recursion (..)
  • Basic data types (null, bool, string, number, array, object)
  • If-then-else (if .. then .. else .. end)
  • Folding (reduce .[] as $x (0; . + $x), foreach .[] as $x (0; . + $x; . + .))
  • Try-catch (try .. catch ..)
  • Breaking (label $x | f | ., break $x)
  • String interpolation ("The successor of \(.) is \(.+1).)
  • Format strings (@json, @text, @csv, etc)

Paths

  • Array/object indexing (.[1], .foo)
  • Iterating over arrays/objects (.[])
  • Optional indexing/iteration (.foo?, .[]?)
  • Array slicing (.[3:7], .[0:-1])
  • String slicing

Operators

  • Composition (|)
  • Variable binding (. as $foo | $foo)
  • Pattern binding (. as {a: [$foo, {("b", "c"): $bar, $baz}]} | $foo, $bar, $baz)
  • Concatenation (,)
  • Plain assignment (=)
  • Update assignment (|=)
  • Arithmetic update assignment (+=, -=, etc)
  • Alternative operator (//)
  • Logic (or, and)
  • Comparison (<, ==, etc)
  • Arithmetic (+, -, *, %, etc)
  • Negation (-)

Filters

Only a handful of filters are implemented:

  • sqrt, log, abs, round, floor
  • sin, cos, tan
  • keys, values

Notably, these filters are implemented as intrinsics, unlike jq - which defines them via more basic filters. This is temporary until those more basic filters are implemented in starq.

Footnotes

  1. Only JSON support is implemented as of now though :]

  2. If there are any generators as operator arguments, it has to also pop the atoms involved in those generators

  3. Currently, the concat (,) & array literal ([.]) operators are the exceptions, since they uniquely interact with generators (concatenation creates a generator, array literals can collect a generator into an array).

About

*q - jq-style cli tool for JSON, YAML, TOML, etc

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published