Source processing flow
An overview of the operations fortran-src does on Fortran source.
The online fortran-src Haddock documentation is a useful reference while reading this page; each module mentioned here has its own documentation page there. Many of the types mentioned here are defined in Language.Fortran.AST.
At the highest level, fortran-src parses Fortran source code in Language.Fortran.Parser. This module exports various functions for parsing Fortran, along with primitives for defining your own parser.
Language.Fortran.Parser.byVer selects the default parser for the given Fortran version (see Language.Fortran.Version). A default parser does the following:
- Lex & parse (simultaneous)
- Post-parse transform
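The "parse, then transform" shape of a default parser can be sketched with toy types. This is a minimal sketch, not fortran-src's API: `Version`, `Ast`, `parseFor`, `transformsFor` and `defaultParserFor` are hypothetical stand-ins, the "parser" just splits lines, and the version-to-transformation mapping is made up for illustration.

```haskell
-- Hypothetical, simplified stand-ins: these are NOT fortran-src's types.
data Version = F77 | F90 deriving (Eq, Show)

-- A toy "AST": just the list of non-empty source lines.
newtype Ast = Ast [String] deriving (Eq, Show)

type Transformation = Ast -> Ast

-- Step 1 (toy "lex & parse"): split the source into non-empty lines.
parseFor :: Version -> String -> Either String Ast
parseFor _ src
  | null stmts = Left "empty program"
  | otherwise  = Right (Ast stmts)
  where
    stmts = filter (not . null) (lines src)

-- Step 2 (toy): which post-parse transformations run depends on the version.
-- This mapping is invented for the example.
transformsFor :: Version -> [Transformation]
transformsFor F77 = [tagGrouped]
transformsFor F90 = [tagGrouped, tagDisambiguated]

-- Placeholder transformations that only record that they ran.
tagGrouped, tagDisambiguated :: Transformation
tagGrouped (Ast ss) = Ast (ss ++ ["<grouped>"])
tagDisambiguated (Ast ss) = Ast (ss ++ ["<disambiguated>"])

-- A toy "default parser": parse, then apply the version's transformations in order.
defaultParserFor :: Version -> String -> Either String Ast
defaultParserFor v src = do
  ast <- parseFor v src
  pure (foldl (\a t -> t a) ast (transformsFor v))
```

Keeping the transformation list separate from the parser mirrors the design described here: the same parsed tree can be post-processed differently depending on the Fortran version.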
A successful initial parse generates a ProgramFile. There are also "smaller" parsers which parse only part of a ProgramFile, such as a single Block or Expression.
After parsing, you may wish to run analyses on the code. (The variable renaming pass in Language.Fortran.Analysis.Renaming is generally run before any of these.) fortran-src provides 3 top-level analyses:
- Type analysis: Language.Fortran.Analysis.Types
- Basic block analysis: Language.Fortran.Analysis.BBlocks
- Dataflow analysis (requires basic blocks): Language.Fortran.Analysis.DataFlow
The lex & parse step is handled by the Happy parser generator. See e.g. Language.Fortran.Parser.Free.Fortran90: programParser is a magical definition, generated for us by Happy. Lexing occurs alongside parsing. Parsers inside Language.Fortran.Parser.Free use free-form Fortran syntax (and the appropriate lexer), while those in Parser.Fixed use fixed-form.
Due to Fortran's idiosyncrasies, some syntax is ambiguous until the AST is inspected more closely. Rather than resolving this in the parser (awkward with a parser generator like Happy), these passes are separated out into a set of post-parse transformations. By default, which transformations are applied depends on the Fortran version being parsed. These transformations include:
- Function call disambiguation: Language.Fortran.Transformation.Disambiguation.Function
  - Subscript syntax is ambiguous: f(x) may mean calling function f with argument x, or accessing the element at index x in array f. Disambiguating requires determining whether f is a function (perhaps an intrinsic) or an array.
- Turn statement-based syntax blocks into actual delimited blocks: Language.Fortran.Transformation.Grouping
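The subscript ambiguity can be illustrated with a toy expression type. This is a minimal sketch under stated assumptions: `Expr`, its constructors and `disambiguate` are hypothetical and much simpler than fortran-src's Expression type, and the set of array names is assumed to be known already (the real pass derives that information from the program itself).

```haskell
import qualified Data.Set as Set

-- Hypothetical, simplified expression type (not fortran-src's Expression).
data Expr
  = Var String
  | Subscript String [Expr]     -- what the parser produces for f(x): ambiguous
  | FunctionCall String [Expr]  -- resolved: call of function f
  | ArrayIndex String [Expr]    -- resolved: element access into array f
  deriving (Eq, Show)

-- Rewrite ambiguous subscripts bottom-up: names known to be arrays become
-- array accesses; any other name (e.g. an intrinsic) becomes a function call.
disambiguate :: Set.Set String -> Expr -> Expr
disambiguate arrays = go
  where
    go (Subscript f args)
      | f `Set.member` arrays = ArrayIndex f (map go args)
      | otherwise             = FunctionCall f (map go args)
    go (FunctionCall f args) = FunctionCall f (map go args)
    go (ArrayIndex f args)   = ArrayIndex f (map go args)
    go e@(Var _)             = e
```

For example, with `a` declared as an array, `f(a(i))` becomes a call of `f` whose argument is an element of `a`.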
Post-parse transformation alters the AST, but not node annotations. Internally, it does the following:
- perform temporary renaming pass
- perform temporary type analysis pass
- perform transformations in sequence with annotations from temporary passes
- discard analysis and renamings, return transformed AST
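The four steps above can be modelled with an annotation slot that temporarily carries extra information. This is a minimal sketch, not the library's implementation: `Node`, `TempInfo`, `annotateTemp`, `markIntegers` and `postParseTransform` are hypothetical, the "type analysis" is just Fortran's default implicit typing rule (names starting with I to N are INTEGER), and the transformation itself is a toy.

```haskell
import Data.Char (toUpper)

-- Hypothetical toy node: an annotation slot plus a variable name.
data Node a = Node a String deriving (Eq, Show)

-- Temporary information standing in for the renaming and type passes.
data TempInfo = TempInfo { uniqueName :: String, guessedType :: String }
  deriving (Eq, Show)

-- Steps 1-2 (toy): temporary renaming and type analysis, stored next to
-- the caller's original annotation rather than replacing it.
annotateTemp :: [Node a] -> [Node (a, TempInfo)]
annotateTemp ns =
  [ Node (a, TempInfo (n ++ "_" ++ show i) (implicitType n)) n
  | (i, Node a n) <- zip [1 :: Int ..] ns ]
  where
    implicitType (c:_) | toUpper c `elem` "IJKLMN" = "INTEGER"
    implicitType _ = "REAL"

-- Step 3 (toy transformation): reads the temporary type info to alter the AST.
markIntegers :: [Node (a, TempInfo)] -> [Node (a, TempInfo)]
markIntegers = map step
  where
    step (Node ann@(_, info) n)
      | guessedType info == "INTEGER" = Node ann (map toUpper n)
      | otherwise                     = Node ann n

-- Step 4: discard the temporary info; the AST is transformed, but the
-- original annotations come back unchanged.
postParseTransform :: [Node a] -> [Node a]
postParseTransform = map dropTemp . markIntegers . annotateTemp
  where
    dropTemp (Node (a, _) n) = Node a n
```

The type of `postParseTransform` makes the guarantee visible: whatever annotation type goes in comes back out, so the temporary analysis results cannot leak to the caller.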
The renaming pass (Language.Fortran.Analysis.Renaming) analyses variables across the whole AST and produces unique names, which allow later analyses to ignore scoping. The original names are retained.
- analyseRenames generates these renamings and places them in the relevant nodes' annotations.
- rename then substitutes these renamings in, over the original names.
- unrename puts the original names back.
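The three-stage design (record renamings in annotations, substitute them, restore the originals) can be sketched on a toy program representation. The functions below are hypothetical "Toy" counterparts, not the library's analyseRenames, rename and unrename, and the unit-prefixed naming scheme is made up for illustration.

```haskell
-- Hypothetical toy variable: original name, current name, and an
-- annotation slot for the unique name.
data Var = Var { original :: String, current :: String, uniq :: Maybe String }
  deriving (Eq, Show)

-- Hypothetical toy program unit: a name and the variables it mentions.
data ProgUnit = ProgUnit String [Var] deriving (Eq, Show)

-- Like analyseRenames (toy): store a scope-qualified unique name in each
-- variable's annotation without touching the names themselves.
analyseRenamesToy :: [ProgUnit] -> [ProgUnit]
analyseRenamesToy = map annotate
  where
    annotate (ProgUnit u vs) = ProgUnit u [ v { uniq = Just (u ++ "_" ++ original v) } | v <- vs ]

-- Like rename (toy): substitute the unique names over the current names.
renameToy :: [ProgUnit] -> [ProgUnit]
renameToy = map (\(ProgUnit u vs) -> ProgUnit u [ maybe v (\n -> v { current = n }) (uniq v) | v <- vs ])

-- Like unrename (toy): restore the original names, which were kept all along.
unrenameToy :: [ProgUnit] -> [ProgUnit]
unrenameToy = map (\(ProgUnit u vs) -> ProgUnit u [ v { current = original v } | v <- vs ])
```

After renaming, two different units that both declare `x` refer to distinct names, so later analyses can treat every name as global.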
Type analysis (Language.Fortran.Analysis.Types) depends on some post-parse transformations for correctness (subscript disambiguation and intrinsics disambiguation).
With type checking enabled (--typecheck on the CLI, analyseTypes in the library), after parsing it:
- Gathers type information over 4 separate traversals of the entire ProgramFile
- Uses the gathered information to annotate Expressions and ProgramUnits: this involves evaluating expression types, e.g. real + int = real
- Types are a mix of BaseType plus other syntax tags, and fortran-vars SemanticTypes, which include kind
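The "real + int = real" rule is an instance of Fortran's numeric type promotion. The sketch below shows bottom-up expression typing on a toy type; `Ty`, `Expr`, `typeOf` and `arith` are hypothetical, and the sketch ignores kinds, which fortran-vars SemanticTypes do track.

```haskell
-- Hypothetical, simplified types (fortran-src's BaseType and fortran-vars'
-- SemanticType are richer and carry kinds).
data Ty = TInteger | TReal | TComplex | TLogical deriving (Eq, Ord, Show)

data Expr
  = IntLit Integer
  | RealLit Double
  | LogLit Bool
  | Add Expr Expr
  | Mul Expr Expr
  | And Expr Expr
  deriving Show

numeric :: Ty -> Bool
numeric t = t `elem` [TInteger, TReal, TComplex]

-- Type an expression bottom-up, reporting mismatches as Left.
typeOf :: Expr -> Either String Ty
typeOf (IntLit _)  = Right TInteger
typeOf (RealLit _) = Right TReal
typeOf (LogLit _)  = Right TLogical
typeOf (Add a b)   = arith a b
typeOf (Mul a b)   = arith a b
typeOf (And a b)   = do
  ta <- typeOf a
  tb <- typeOf b
  if ta == TLogical && tb == TLogical
    then Right TLogical
    else Left "non-logical operand to .AND."

-- Numeric promotion: the result is the "larger" operand type, using the
-- derived ordering INTEGER < REAL < COMPLEX (only numeric types reach max).
arith :: Expr -> Expr -> Either String Ty
arith a b = do
  ta <- typeOf a
  tb <- typeOf b
  if numeric ta && numeric tb
    then Right (max ta tb)
    else Left "non-numeric operand"
```

Here `typeOf (Add (RealLit 1.5) (IntLit 2))` gives `Right TReal`, matching real + int = real.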
Basic block analysis (Language.Fortran.Analysis.BBlocks) analyses control flow and produces basic block graphs. Each ProgramUnit has its basic block graph inserted into its annotation.
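Basic block construction follows a standard recipe: a new block starts at every jump target and a block ends after every jump; edges come from fall-through and jump targets. This is a minimal sketch of that textbook technique on hypothetical toy statements (`Stmt`, `basicBlocks`, `edges`), not fortran-src's BBlocks implementation.

```haskell
-- Hypothetical toy statements (not fortran-src's Statement type).
data Stmt
  = Assign String        -- straight-line statement
  | Label Int            -- labelled target, e.g. "10 CONTINUE"
  | Goto Int             -- unconditional jump
  | IfGoto String Int    -- conditional jump, e.g. "IF (c) GOTO 10"
  deriving (Eq, Show)

isJump :: Stmt -> Bool
isJump (Goto _)     = True
isJump (IfGoto _ _) = True
isJump _            = False

-- Split statements into basic blocks: a label starts a new block and a
-- jump ends the current one. Empty blocks are dropped.
basicBlocks :: [Stmt] -> [[Stmt]]
basicBlocks = filter (not . null) . go []
  where
    go cur [] = [reverse cur]
    go cur (s@(Label _) : rest) = reverse cur : go [s] rest
    go cur (s : rest)
      | isJump s  = reverse (s : cur) : go [] rest
      | otherwise = go (s : cur) rest

-- Successor edges between blocks (0-indexed): jump targets plus fall-through.
edges :: [[Stmt]] -> [(Int, Int)]
edges bs = concat (zipWith succs [0 ..] bs)
  where
    n = length bs
    labelIx l = [ i | (i, Label l' : _) <- zip [0 ..] bs, l' == l ]
    succs i b = case last b of
        Goto l     -> [ (i, j) | j <- labelIx l ]
        IfGoto _ l -> [ (i, j) | j <- labelIx l ] ++ fallthrough
        _          -> fallthrough
      where
        fallthrough = [ (i, i + 1) | i + 1 < n ]
```

For a loop such as `x = ...; 10 CONTINUE; y = ...; IF (y) GOTO 10; z = ...` this yields three blocks, with the middle block looping back to itself.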
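Dataflow analysis (Language.Fortran.Analysis.DataFlow) builds on these basic block graphs and includes:
- Live variable analysis
- Constant expression analysis
  - Gathers explicit constants (PARAMETER variables)
  - Evaluates a handful of intrinsics and binops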
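Both analyses can be sketched with standard techniques on toy data. This is a minimal sketch, not the DataFlow module's code: `Block`, `liveIn`, `CExpr`, `evalConst` and `gatherParams` are hypothetical; liveness uses the textbook backward fixpoint over a block graph, and the constant evaluator handles only addition, multiplication and the ABS intrinsic on integers.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Hypothetical per-block summary: variables read before any write, and
-- variables written.
data Block = Block { uses :: Set.Set String, defs :: Set.Set String }

-- Live variables, iterated to a fixpoint:
--   liveOut(b) = union of liveIn(s) for each successor s
--   liveIn(b)  = uses(b) `union` (liveOut(b) minus defs(b))
liveIn :: Map.Map Int Block -> Map.Map Int [Int] -> Map.Map Int (Set.Set String)
liveIn blocks succs = go (Map.map (const Set.empty) blocks)
  where
    go cur
      | next == cur = cur
      | otherwise   = go next
      where
        next = Map.mapWithKey (step cur) blocks
    step cur k b =
      let out = Set.unions [ Map.findWithDefault Set.empty s cur
                           | s <- Map.findWithDefault [] k succs ]
      in uses b `Set.union` (out `Set.difference` defs b)

-- Hypothetical integer constant expressions.
data CExpr = CInt Integer | CVar String | CAdd CExpr CExpr | CMul CExpr CExpr | CAbs CExpr

-- Evaluate using known constants; Nothing if any part is not constant.
evalConst :: Map.Map String Integer -> CExpr -> Maybe Integer
evalConst _ (CInt n)     = Just n
evalConst env (CVar v)   = Map.lookup v env
evalConst env (CAdd a b) = (+) <$> evalConst env a <*> evalConst env b
evalConst env (CMul a b) = (*) <$> evalConst env a <*> evalConst env b
evalConst env (CAbs a)   = abs <$> evalConst env a

-- Gather PARAMETER-style constants in declaration order, so later ones may
-- refer to earlier ones; non-constant definitions are skipped.
gatherParams :: [(String, CExpr)] -> Map.Map String Integer
gatherParams = foldl add Map.empty
  where
    add env (name, e) = maybe env (\v -> Map.insert name v env) (evalConst env e)
```

The liveness loop terminates because each iteration can only grow the live sets, and there are finitely many variables.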
Known issues and possible improvements:
- The implicit type analysis performed during post-parse transformation isn't ideal. We should be able to identify the exact set of syntax analysis required to perform these post-parse transformations, and do only that.
  - Or perhaps these transformations could take place in-line, inside the main syntax analysis?
  - It gets even worse as we do more work during type analysis; perhaps we could make these analyses configurable to work around that.
- Parts of the constant expression analysis should be done earlier. Explicit constant variables (PARAMETERs) may be used in types as kind parameters, so we should gather that info during type analysis. #192
- Constant expression analysis could be expanded to cover more operations, and have behaviour closer to Fortran compilers in use. #192