Skip to content
This repository was archived by the owner on Oct 28, 2022. It is now read-only.

The Monomorphizer

Vaivaswatha N edited this page Aug 20, 2020 · 11 revisions

Monomorphizer in the Scilla Compiler

This document describes the monomorphizer [3] (specialization of polymorphic expressions) in the Scilla compiler. The compiler pass consists of the following components: (1) AST initialization (2) Type-flow analysis (3) Monomorphization

AST Initialization

We associate with each AST node, a TFA (type-flow-analysis) index. This index is the offset into a global array of tfa_els (short for TFA element). Each tfa_el contains the data-flow information we're tracking for that AST node.

Type-flow Analysis

The goal of the type-flow analysis is to compute, for each tfun 'A expression, the set of closed types that may be substituted for 'A at runtime (via the @ operator). To achieve this, the analysis keeps track of

  1. The flow of closed types from the argument of a type instantiation expression @foo, to the type variable 'A of the tfun 'A expressions that foo may resolve to at runtime.
  2. The flow of tfun expressions.
  3. Control-flow analysis: the flow of fun expressions to improve the precision of the analysis.

(2) is carried out using the same constraints as done for control-flow analysis (3) is performed without context sensitivity. (1) is context sensitive.

Analysis data element tfa_el

  (* The context of free variables in a (type) closure. *)
  type context_env = Context.t IntMap.t

  (* We track the flow of Funs and TFuns as a pair of tfa_data index
   * and the context environment of its free type variables. *)
  type clo_env = int * context_env

  type rev_ref = ExprRef of expr_annot ref | VarRef of eannot Identifier.t

  (* Data element propagated in the type-flow analysis. *)
  type tfa_el = {
    (* The TFun expressions that reach a program point or variable. *)
    reaching_tfuns : CloSet.t;
    (* The Fun expressions that reach a program point or variable. *)
    reaching_funs : CloSet.t;
    (* The closed types that reach a type variable, in a given context. *)
    reaching_ctyps : TypSet.t Context.Map.t;
    (* A back reference to who this information belongs to. *)
    elof : rev_ref;
    (* The free type variables at a TFun.
     * Useful in building the context_env for reaching_tfuns. *)
    free_tvars : int list;
    (* Is fun being analyzed already? *)
    on_analysis_stack : bool;
  }

The terms context environment is used as in [1]. The flow of both tfun and fun expressions track the flow of clo_env, which is a pair of the closure itself (identified by its TFA index) and the environment for its free type variables.

The Analysis

The analysis is an AST walk, repeated until there is no change to the tfa_el associated with any of the AST nodes. TFA elements are updated during the analysis by an include_in : ~a:tfa_el -> ~b:tfa_el -> (tfa_bol * bool) function which includes the sets in b into that of a. This is the data-flow analysis merge operation (compiler text books typically use a lattice meet operation to represent this, while program analyses texts denote it as a lattice join operation). The include_in function simply performs set inclusion for each of the three constituents of the analysis and returns if anything changed in a.

The environment for the analysis itself is defined as

  type tfa_env = {
    (* The contexts of currently free type variables. *)
    ctx_env : context_env;
    (* The current calling context. *)
    cctx : Context.t;
  }

I next outline what the analysis does for relevant AST node types. In each case, whenever a set inclusion is computed, changes, if any are noted, and will lead to one more iteration of the analysis. For an expression e:

  1. App (f, actual_args): Three operations are performed at function applications. For each tfun expression f' that may flow into f:
    1. The TFA information from the actual arguments are included in the TFA information of the corresponding formal arguments of f'.
    2. The body of f' is analyzed.
    3. The TFA information from the body of f' is included in the TFA information of function application we are currently analyzing.
  2. TFun _ | Fun _ : The expresssion e is included in its own corresponding TFA information (reaching_tfuns and reaching_funs respectively), with the right context environment for the free type variables (recall that we track the reaching closures as a pair of the closure itself and the context environment for its free type variables).
  3. TApp (tf, targs):
    1. For each free type variable 'A in the current context, the set of types that may flow into 'A is queried.
    2. For each targ in targs, substitute the free type variables in targ with the set of types that may flow into them. This involves computing all possible permutations into a final set of closed types targ_specls for each targ. For example, if we have @foo 'X -> 'Y and in the current context, 'X has Int32, Int64 flowing into it, and 'Y has String and ByStr20 flowing into it, then the actual parameter (there is only one in this example) will have the set of closed types targ_specls computed as Int32 -> String, Int32 -> ByStr20, Int64 -> String and Int64 -> ByStr20.
    3. The current context is appened with this TApp site
    4. apply env tf [targ :: remaining_args]: For each tfun 'A expression tf' that flows into tf, do the following:
      1. targ's specializations computed earlier (targ_specls) is included into the formal argument 'A.
      2. tf''s sub-expression tf'_subexpr is analyzed, with an updated current context that includes 'A as a free type variable.
      3. The environment is updated to include 'A as a free type variable. Let's call this env'.
      4. apply recursively: apply env' tf'_subexpr remaining_args.
      5. As a precision improvement heuristic, before applying tf'_subexpr, we remove those reaching_tfuns in it whose context environments are incompatible with the current one. This is helpful because reaching_tfuns is not context sensitive.

Monomorphization (Specialization of TFun expressions)

References:

  1. Principles of Program Analysis by Flemming Nielson, Hanne R. Nielson, Chris Hankin.
  2. An Efficient Type- and Control-Flow Analysis for System F - Adsit and Fluet.
  3. https://github.com/Zilliqa/scilla-compiler/blob/master/src/astlowering/Monomorphize.ml
Clone this wiki locally