Who's a good format descriptor?
Pup is a library to define format descriptors which can be interpreted both as a parser and as a pretty-printer. This obviates the common need to maintain separate parsers and pretty-printers.
The Pup library is focused on ergonomics: the goal is for grammars to be barely noisier than, say, a Megaparsec parser.
For instance, for a type
data T = C Int Bool | D Char Bool Int
A Megaparsec parser could look like
C <$ chunk "C" <* space1 <*> int <* space1 <*> bool
<|> D <$ chunk "D" <* space1 <*> anySingle <* space1 <*> bool <* space1 <*> int
A Pup format descriptor for the same type could look like
#C <* chunk "C" <* space1 <*> int <* space1 <*> bool
<|> #D <* chunk "D" <* space1 <*> anySingle <* space1 <*> bool <* space1 <*> int
But the Pup descriptor can pretty-print in addition to parsing.
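For contrast, the hand-maintained approach that Pup is meant to replace can be sketched using only `base`. This is an illustration rather than Pup or Megaparsec code: it assumes `Text.ParserCombinators.ReadP` as a stand-in for Megaparsec, and `int`, `bool`, `space1`, `tParser`, `parseT` and `printT` are hypothetical helpers written for this example.

```haskell
import Control.Applicative ((<|>))
import Data.Char (isDigit, isSpace)
import Text.ParserCombinators.ReadP (ReadP, get, munch1, readP_to_S, string)

data T = C Int Bool | D Char Bool Int
  deriving (Eq, Show)

-- Hypothetical helper: a non-negative decimal integer.
int :: ReadP Int
int = read <$> munch1 isDigit

-- Hypothetical helper: the literals "True" and "False".
bool :: ReadP Bool
bool = (True <$ string "True") <|> (False <$ string "False")

-- Hypothetical helper: one or more whitespace characters.
space1 :: ReadP ()
space1 = () <$ munch1 isSpace

-- The parser, with the same shape as the Megaparsec one above.
tParser :: ReadP T
tParser =
      C <$ string "C" <* space1 <*> int <* space1 <*> bool
  <|> D <$ string "D" <* space1 <*> get <* space1 <*> bool <* space1 <*> int

-- Run the parser, accepting only parses that consume the whole input.
parseT :: String -> Maybe T
parseT s = case [t | (t, "") <- readP_to_S tParser s] of
  [t] -> Just t
  _   -> Nothing

-- The printer is written separately and must be kept in sync by hand.
printT :: T -> String
printT (C n b)   = unwords ["C", show n, show b]
printT (D c b n) = unwords ["D", [c], show b, show n]
```

Nothing in the types ties `printT` to `tParser`: if a field is reordered in one but not the other, the round trip `parseT (printT t) == Just t` silently breaks. A single format descriptor removes that second definition.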
Our article Invertible Syntax without the Tuples (published version (TBA), extended version with appendices) goes over a lot of the design decisions which went into the Pup library, in particular how it compares with previous approaches.
Here are some highlights:
- Pup uses indexed monads; the indices represent a stack in continuation-passing
form. For instance a format descriptor for natural numbers has type
    nat :: … => m (Int -> r) r Int
We read m (Int -> r) r as meaning that the printer for nat reads (and pops) an Int off the stack; the final Int means that the parser for nat returns an Int (after consuming some input).
- Pup doesn't implement its own parsers and printers. Instead format descriptors rely on backends. Currently, Pup provides a Megaparsec backend for parsing, and a Prettyprinter backend for printing.
- Pup's type-class based interface makes it extensible. You can swap backends for parsers and printers independently. Pup even supports, at least in principle, format descriptors which can be interpreted in several parser or printer backends, should that make sense.
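The stack reading of the indices in the first highlight can be made concrete with a small, self-contained model. This is a sketch of the idea only, not Pup's implementation or API: the `Format` record, `lit`, `seqWith`, `parseAll` and `printAll` are hypothetical names, the parser is a naive `String -> Maybe` function, and sequencing pairs results with a tuple purely to keep the example short.

```haskell
import Data.Char (isDigit)
import Data.List (stripPrefix)

-- A toy descriptor: i is the printer's stack before, j the stack after,
-- and a is the type of value the parser returns.
data Format i j a = Format
  { parseF :: String -> Maybe (a, String)
  , printF :: (String -> j) -> String -> i
  }

-- The printer pops an Int off the stack; the parser returns an Int.
nat :: Format (Int -> r) r Int
nat = Format
  { parseF = \s -> case span isDigit s of
      ([], _)    -> Nothing
      (ds, rest) -> Just (read ds, rest)
  , printF = \k acc n -> k (acc ++ show n)
  }

-- A literal leaves the stack untouched and returns nothing of interest.
lit :: String -> Format r r ()
lit t = Format
  { parseF = \s -> (,) () <$> stripPrefix t s
  , printF = \k acc -> k (acc ++ t)
  }

-- Sequencing threads the input through both parsers and composes the
-- printers, so the intermediate stack j lines up in the types.
seqWith :: (a -> b -> c) -> Format i j a -> Format j l b -> Format i l c
seqWith f (Format pa ua) (Format pb ub) = Format
  { parseF = \s -> do
      (a, s')  <- pa s
      (b, s'') <- pb s'
      pure (f a b, s'')
  , printF = ua . ub
  }

-- Two numbers separated by one space: the printer pops two Ints.
twoNats :: Format (Int -> Int -> r) r (Int, Int)
twoNats = seqWith (,) (seqWith const nat (lit " ")) nat

-- Parse, requiring that all input is consumed.
parseAll :: Format i j a -> String -> Maybe a
parseAll f s = case parseF f s of
  Just (a, "") -> Just a
  _            -> Nothing

-- Print, starting from an empty output and ending with the final string.
printAll :: Format i String a -> i
printAll f = printF f id ""
```

In this model `printAll twoNats` has type `Int -> Int -> String`, so `printAll twoNats 1 2` yields `"1 2"`, while `parseAll twoNats "1 2"` yields `Just (1, 2)`. The arguments the printer expects are exactly the stack entries named in the first index.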
Pup ostensibly stands for Parser-UnParser (“unparsing” is a term coined by Danvy in his Functional unparsing, to which this library is indebted; the definition of “unparsing” is nebulous but it can be understood as pretty-printing).
But this library is also meant to build upon the prior work of, and be a more practical version of, the Cassette library. Cassette is sometimes stylised as K7 (after the French pronunciation of both words). And pups are K9.
Now you know.