-
Notifications
You must be signed in to change notification settings - Fork 46
Links Desugaring Pipeline
Desugaring pipeline defined in frontend.ml. There are separate pipelines for
compilation and REPL.
Compilation pipeline is invoked from Loader.read_file_source and from
bin/links.ml via bin/driver.ml. At a high level the invocation starts with
let sugar, pos_context = ModuleUtils.try_parse_file filename`
try_parse_file calls Parse.read (via Parse.parse_file), which returns a
tuple of type 'result * SourceCode.source_code. Parse.read directly invokes
the Parser.file, which is an entry point for the parsing of source
files. 'result of parsing a file is of type (Sugartypes.binding list * Sugartypes.phrase option), which is a list of declarations in a file paired
with optional expression. SourceCode.source_code is an object for handling
source file (extracting line ranges, etc.).
Output of a compilation pipeline is of type ((Sugartypes.program * Types.datatype * Types.typing_environment) * string list). Importantly,
Sugartypes.program is the desugared program and string list is a list of FFI
files.
Program compilation pipeline is defined as a function that takes tyenv,
pos_context and program. pos_context is the same SourceCode.source_code
object that was returned by parsing a file; program is a tuple of declarations
and an optional phrase.
-
Resolve positions by replacing
(start, finish, None)with(start, finish, Some pos_context):(ResolvePositions.resolve_positions pos_context)#program program -
Desugar alien blocks:
DesugarAlienBlocks.transform_alien_blocks program -
Desugar modules (optional, only if modules enabled). Returns a program with desugared modules paired with a list of filenames for FFI bindings (filenames stored in
Foreignnodes). Note:DesugarModules.flatten_simpleandDesugarAlienBlocks.flatten_simpleare identical, butflatten_bindingsare not. -
Check correctness of XML quasi quotes:
CheckXmlQuasiquotes.checker#program programRaises an error if something is wrong. If everything correct, the result is discarded.
-
Check correctness of settings for session exceptions, handlers, and relational lenses:
DesugarSessionExceptions.settings_check programExperimentalExtensions.check#program
Main desugaring pipeline is a chained sequence of calls to desugar modules:
((( (* Early source transformations on the AST *)
ExperimentalExtensions.check#program
->- before_typing_ext session_exceptions DesugarSessionExceptions.wrap_linear_handlers
->- DesugarHandlers.desugar_handlers_early#program
->- DesugarLAttributes.desugar_lattributes#program
->- RefineBindings.refine_bindings#program
->- DesugarDatatypes.program tyenv.Types.tycon_env
(* Typechecking the AST *)
->- TypeSugar.Check.program tyenv
(* Typability preserving late source transformations *)
->- after_typing ((DesugarCP.desugar_cp tyenv)#program ->- snd3)
->- after_typing ((DesugarInners.desugar_inners tyenv)#program ->- snd3)
->- after_typing_ext session_exceptions ((DesugarSessionExceptions.insert_toplevel_handlers tyenv)#program ->- snd3)
->- after_typing_ext session_exceptions ((DesugarSessionExceptions.desugar_session_exceptions tyenv)#program ->- snd3)
->- after_typing ((DesugarProcesses.desugar_processes tyenv)#program ->- snd3)
->- after_typing ((DesugarDbs.desugar_dbs tyenv)#program ->- snd3)
->- after_typing ((DesugarFors.desugar_fors tyenv)#program ->- snd3)
->- after_typing ((DesugarRegexes.desugar_regexes tyenv)#program ->- snd3)
->- after_typing ((DesugarFormlets.desugar_formlets tyenv)#program ->- snd3)
->- after_typing ((DesugarPages.desugar_pages tyenv)#program ->- snd3)
->- after_typing ((DesugarFuns.desugar_funs tyenv)#program ->- snd3))
program), ffi_files)Meaning of combinators:
-
->-chains calls one by one -
snd3returns second component of triple -
after_typingcalls a function on a first component of a triple -
_extruns a pass conditionally, if a given extension is enabled.
Importantly, many desugaring transformations derive from
TransformSugar.transform, which defines the type of program method as:
program -> ('self_type * program * Types.datatype option)
So a single step in the late desugaring pipeline, e.g.:
after_typing ((DesugarFors.desugar_fors tyenv)#program ->- snd3)
takes a (Sugartypes.program * Types.datatype * Types.typing_environment)
triple as an input and calls a desugaring on the first component
(after_typing). That call returns a triple, of which we take the second
component (snd3).
In a mail to Links-dev on 22/11/2018 I asked about the details of Links compilation pipeline and Daniel provided a detailed response, which I quote below:
There are two distinct notions of desugaring in Links: 1) source-to-source transformation, and 2) translation between two separate languages. To the best of my knowledge, there are three separate phases that performs source-to-source transformations, and two separate phases that translates between internal languages. A bird's eye view of the compilation pipeline might look something like (where the pseudo types in brackets denotes a pseudo signature for the respective phase):
- Parsing textual source into an AST [Textual source -> Sugartypes]
- Early source transformations on the AST [Sugartypes -> Sugartypes]
- Typechecking the AST [Sugartypes * type_env -> Sugartypes * type * type_env]
- Typability preserving late source transformations [Sugartypes * type * type_env -> Sugartypes * type * type_env]
- Desugaring of the AST into IR [env * Sugartypes -> Ir * name_env] 5.5. (Optional) Typability preserving source transformations (or optimisations) on IR [type_env * Ir -> Ir * type]
- Evaluation directly on the IR... [ Value.env * IR -> Value.t * Value.env ] (Value.env maps IR names to abstract machine Value terms)
- or desugaring of IR into CODE (the JavaScript intermediate language) [ Value.env * IR -> CODE * value_env ] (value_env maps IR names to JavaScript strings)
The early source-to-source transformation phase identifies and groups recursive functions, expands type aliases, fills in kind information on type variables, etc. Sometimes the phase is also exploited to perform some purely syntactic elaboration (e.g. the phrase
handler h(x) { M }is elaborated intofun h(x) { handle(x()) { M } }).Furthermore the transformations performed in the second phase are untyped, whilst the transformations performed in the fourth phase are (intended) to preserve typability, in particular, such transformations are obliged to return the type of the transformed expression which may involve a transformation on types.
The fifth phase is akin to the desugaring phase from AST to Core that you are familiar with in GHC. Although, in Links this phase performs name resolution too.
It is worth noting that some of the transformations in phase 2 and 4 are unhygienic as they depend on surface names and sometimes the particular source order of functions in prelude (the source order sensitivity might be due to the grouping hack in refineBindings.ml).
And some more information from a follow-up email:
What is the intended behaviour if a fragment of code that is being transformed is incorrect? Is it expected that an error is raised in such a case? Or are such errors ignored and caught later on?
I suppose the intention is that the type checker will catch any semantic error. The ad-hoc nature of the early transformations make it is difficult to debug errors arising from them.
Now I'm wondering how typechecking works without names being resolved first...
I suppose technically names are resolved "on-the-fly", but they are not assigned unique names. The type checker passes an environment that contains an immutable map from surface names (strings) to types (specifically the types defined in types.ml).
Would it improve matters if we had a separate phase 1.5 that resolved names?
It would be more robust to resolve names earlier, and move the few special functions in the prelude into the compiler itself. Then we could use a structured API to query the resolved names of these built-ins. It would also allow us to get rid of the type-patching hack that runs after type checking. This hack patches up the inferred types of the special functions in the prelude, for example the wild effect is dropped in the signature of
concatMapsuch that it can be compiled to the database.
Entry point: DesugarAlienBlocks.transform_alien_blocks program
Invariants before: Positions resolved to (start, finish, Some pos_context), where pos_context is a SourceCode.source_code
object returned by parsing.
Invariants after: All AlienBlock bindings replaced with Foreign
bindings.
Program is a list of bindings and an optional phrase. This pass builds a new
list of bindings and keeps the phrase. A list of bindings inside an
AlienBlock gets flattened to a list of Foreign binding, so:
alien javascript "test.js" {
setTitle : (String) ~> ()
alertBox : (String) ~> ()
}
gets replaced with:
alien javascript "test.js" setTitle : (String) ~> ()
alien javascript "test.js" alertBox : (String) ~> ()
Bindings inside Module and Blocks get flattened in the same way, but they
are obviously kept inside their respective modules/blocks and not floated to the
top level. flatten_simple is responsible for block flattening. As pointed out
earlier the same block flattening mechanism is used in DesugarModules.
Note: Skipping a more detailed description due to ongoing work on the module system, which will very likely change the behaviour of current desugaring.
Entry point: DesugarModules.desugarModules prog_with_deps
Invariants before:
Invariants after: All Module and QualifiedImport bindings are removed.
There are definitely some extra invariants here (c.f. resolve function).
Entry point: DesugarSessionExceptions.wrap_linear_handlers
Invariants before:
Invariants after: TryInOtherwise phrases wrapped inside a switch
statement.
This transformation replaces try L as x in M otherwise N with:
switch (try L as x in Just(x) otherwise Nothing) {
case Just(x) -> M
case Nothing -> N
}
There's a detailed comment in the source code describing the rationale behind this transformation, but I don't understand it.
Entry point: DesugarHandlers.desugar_handlers_early#program
Invariants before:
Invariants after:
Entry point: DesugarLAttributes.desugar_lattributes#program
Invariants before: attributes starting with "l:" appear only as "l:action", "l:name" or "l:on*" attributes in XML forms or as "l:href" in link () XML tags.
At the beginning of the source file there is a comment that says "l:href" and "l:action" attributes should be disallowed in client-side XML, but I don't see any code that would enforce that in any way or that would cause errors because of this. Desugared attribute functions receive Server location though.
Invariants after: No Xml phrasenodes contain attributes with names that start with "l:". This is enforced in replace_lattrs` function that runs a
check after desugaring and raises an error if it finds any attribute with name
starting with "l:".
I don't really understand the details of what the transformation generates, since I'm not familiar with Links web programming model.
Question: Where does the "l:" prefix come from? Is this a special convention introduced in Links?
Note: discarding of source positions is definitely too eager in desugar_form
and desugar_lnames. We traverse XML children nodes to modify the attributes.
These new attributes should have dummy positions, but the XML children nodes
should maintain their existing positions. Reported as #480.
Note: from a quick glimpse it seems that this is responsible for identifying groups of mutually recursive bindings. Since there is an ongoing work on this (#457) I'm skipping this module for now since it is likely to change.
Entry point: RefineBindings.refine_bindings#program
Invariants before: Funs are absent in the AST. They are not generated by
the parser but introduced in this step.
Invariants after:
Note: Daniel mentioned that he's working on changing this module as part of work on the module system. Skipping for now.
Entry point: DesugarDatatypes.program tyenv.Types.tycon_env
Invariants before:
Invariants after:
Entry point: TypeSugar.Check.program tyenv
Invariants before: Links Prelude is loaded and typechecked, with the results
stored in tyenv.
Invariants after: (Presumed) Program should be semantically correct.
Typechecking algorithm takes type environment (contains Prelude), bindings and
body. Bindings are typechecked and results are contained in a new environment
tyenv'. Body is typechecked in an extended environment (tyenv + tyenv').
Result of typechecking is (bindings, Some body), typ, tyenv'. Observation: we
only return tyenv', i.e. environment generated by typechecking bindings. This
does NOT contain original environment passed to the typechecking function. In
other words, typing environments don't accumulate.
Entry point: (DesugarCP.desugar_cp tyenv)#program
Invariants before:
Invariants after: CP nodes are absent from the AST, mostly replaced with
a block containing a binding (or a group of bindings) and an expression.
Entry point: (DesugarInners.desugar_inners tyenv)#program
Invariants before: binders in Funs declarations are typed (second binder
component Types.datatype option is Some); (outer, extras) field exists
(see below).
Invariants after: extras in Funs declarations are None (see below).
Also, extras are unbound in the resulting environment because "any existing
functions with the same name will now be shadowed by the functions defined in
this binding - and hence will not need any extra type variables adding".
Comment in the source code says:
Recursive functions must be used monomorphically inside their function bodies (unless annotated with a polymorphic type), but they may still be used polymorphically outside (after having been generalised). This pass unifies the inner and outer types of functions, which is necessary for converting to the IR, which stores a single type for each recursive function.
Traverses TAppl, named InfixAppl and UnaryAppl, Spawn, Escape and
Block phrasenodes but doesn't modify them, extends only the typing
environment. Also traverses groups of function bindings (Funs). Fun
bindings are ignored - presumably because we are only interested in recursive
bindings (does this mean Fun binding cannot be recursive?).
Note: I don't really understand the meaning of arguments to Funs constructor,
in particular the (Types.datatype * Types.quantifier option list) option bit,
which is referred to as Some (outer, extras). What are outer and extras?
Entry point: (DesugarSessionExceptions.insert_toplevel_handlers tyenv)#program
Invariants before:
Invariants after: bodies of Spawn pharsenodes (except for Wait) are
enclosed inside a TryOrOtherwise. TODO: write down the exact nature of the
transformation.
Entry point: (DesugarSessionExceptions.desugar_session_exceptions tyenv)#program
Invariants before:
Invariants after: Raise and TryInOtherwise phrasenodes are removed from
AST. Raise is replaced with DoOperation, TryInOtherwise with a
constructed deep Handle.
Entry point: (DesugarProcesses.desugar_processes tyenv)#program
Invariants before:
Invariants after: Spawn phrasenodes are replaced with calls to builtin
functions spawnWait/spawnAt/spawnAngelAt. Receive prasenodes are
replaced with Switch phrasenodes.
Entry point: (DesugarDbs.desugar_dbs tyenv)#program
Invariants before:
Invariants after:
Entry point: (DesugarFors.desugar_fors tyenv)#program
Invariants before:
Invariants after: Iteration phrasenode and all iterpatt (comprehension
generators) are absent from the AST.
Desugars for comprehensions into primitives (for -> concatMap, order by
-> sortBy, where -> if then else).
Entry point: (DesugarRegexes.desugar_regexes tyenv)#program
Invariants before: Regex phrasenodes appear only as the right argument of
RegexMatch binop. And the other way around: if anything else than a Regex
appear as a rhs of RegexMatch an exception is raised.
Invariants after: No Regex phrasenodes and no RegexMatch binops are
present in the tree, which implies that expressions of regex AST are erased
completely. All Regexes are converted to phrases. Only variables appear in
regexes. All spliced expressions are replaced with let-bindings placed in a
block with the regex they are used in.
Entry point: (DesugarFormlets.desugar_formlets tyenv)#program
Invariants before: Formlet body contains only: FormBinding, Xml,
TextNode or Block phrases.
Invariants after: No Formlet phrasenodes in AST. TextNode and
FormBinding that were located inside Formlet body are eliminated as well.
Entry point: (DesugarPages.desugar_pages tyenv)#program
Invariants before: Page contains only: FormletPlacement,
PagePlacement, Xml, TextNode or Block phrases.
Invariants after: No Page phrasenodes in AST. FormletPlacement and
PagePlacement placed directly under Page are also eliminated. I wonder
whether the intention to eliminate FormletPlacement and PagePlacement
phrasenodes completely. Currently this is not enforced, as hinted by a TODO
note. Firstly, such nodes might appear outside of Page phrasenode since
primary_expression production in the parser admits XML expressions.
Secondly, they might appear somewhere in a Block expression. Note that
Block expressions can only appear under a Page expressions if they have
been introduced by desugaring - they cannot be created by the parser.
Entry point: (DesugarFuns.desugar_funs tyenv)#program
Invariants before:
Invariants after: Section (Project e) are eliminated by turning them into
function bindings enclosed inside a block:
(.l)
-->
{ fun f(x:r) {x.l}; f }
Note that this is different from what the source code comment in DesugarFuns
says. FunLit (anonymous functions) are named, uncurried, and eliminated in
the same way (again, comment says something else):
fun [qs](xs1)...(xsk) {e}
-->
{fun f[qs](xs1) {
fun f1[](xs2) {
...
fun fk[](xsk) {
e
}; fk
...
}; f1
}; f}
Function bindings (Fun, Funs) are uncurried in the same way.