Functionality (and library structure)
fortran-src provides the following functions over Fortran source code:
- lexing and parsing to an expressive abstract syntax tree;
- static analysis;
- pretty printing;
- "reprinting", or patching sections of source code without removing secondary notation such as comments;
- exporting to JSON.
fortran-src is primarily a Haskell library, but it also packages a command-line tool for running and inspecting analyses. By exporting parsed code to JSON, the parsing and standard analyses that fortran-src provides may be utilized by non-Haskell tools.
The library's top-level module is called Language.Fortran. As such, all submodules are within that namespace.
Static analysis of Fortran requires a choice in the lexing and parsing front end: either to take the approach of many compilers, allowing an amalgam of features (e.g., gfortran with its hand-written parser), or to enforce language standards at the exclusion of some code that is accepted by major compilers. fortran-src takes roughly the latter approach, though it also has an extended Fortran 77 mode for supporting legacy extensions influenced by vendor-specific compilers that have been popular in the past.
Furthermore, the Fortran language has evolved through two broad syntactic forms:
- fixed source form, used by the FORTRAN 66 and FORTRAN 77 standards, where each line of source code follows a strict format (motivated by its original use with punched cards). The first 6 columns of a line are reserved for labels and continuation markers. The character C in column 1 indicates a comment line to be ignored by the compiler; otherwise the statement itself begins at column 7.
- free source form, first specified in Fortran 90 and used by subsequent versions of the standard, which has fewer restrictions on line format and a different method of encoding line continuations.
Therefore, two lexers are provided: the fixed form lexer, for the earlier versions of the language (FORTRAN 66 and FORTRAN 77, plus the additional Legacy and Extended modes), and the free form lexer, for Fortran 90 onwards. The lexers are auto-generated via the alex tool.
The fixed form lexer (Language.Fortran.Parser.Fixed.Lexer) handles the expectation that the first 6 columns of a line are reserved for code labels and continuation line markers, with code starting at column 7, and with comment lines starting with C in the first column. Only the first 72 columns are scanned (i.e., anything after is ignored).
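The column rules above can be made concrete with a small sketch. It is a simplified model written for illustration, not the alex-generated lexer: the FixedLine type and classifyFixed function are hypothetical names, only the C comment marker described above is recognised, and a continuation is assumed to be any character in column 6 other than a space or 0.

```haskell
import Data.Char (isSpace)

-- Hypothetical, simplified model of one fixed-form source line.
data FixedLine
  = Blank
  | Comment String                   -- text following the 'C' in column 1
  | Continuation String              -- code of a continuation line
  | Statement (Maybe String) String  -- optional label (columns 1-5) and code
  deriving (Eq, Show)

-- Classify a single line, looking only at the first 72 columns.
classifyFixed :: String -> FixedLine
classifyFixed raw
  | all isSpace line    = Blank
  | take 1 line == "C"  = Comment (drop 1 line)
  | isContinuation mark = Continuation code
  | otherwise           = Statement label code
  where
    line           = take 72 raw          -- columns 73 onwards are ignored
    (prefix, code) = splitAt 6 line       -- code starts at column 7
    mark           = drop 5 prefix        -- column 6, if present
    label = case filter (not . isSpace) (take 5 prefix) of
      "" -> Nothing
      l  -> Just l
    -- Assumption: any column-6 character other than ' ' or '0' continues
    -- the previous statement.
    isContinuation [c] = c /= ' ' && c /= '0'
    isContinuation _   = False
```

For instance, classifyFixed applied to a line holding the label 10 in columns 4-5 and X = 1 from column 7 yields a Statement carrying the label and the code separately.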
The free form lexer (Language.Fortran.Parser.Free.Lexer) is less constrained but still has to manage continuation-line markers which break statements across multiple lines.
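As a rough illustration of what managing continuation markers involves, the sketch below joins free-form physical lines into logical statements. It assumes the common convention that a trailing & continues a statement and that the following line may begin with an optional &; it deliberately ignores comments and string literals, which a real lexer must handle. The function names are hypothetical and are not part of fortran-src.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Join physical lines into logical statements (simplified sketch).
joinContinuations :: [String] -> [String]
joinContinuations = go Nothing
  where
    -- 'pending' holds the text of a statement that is still open.
    go pending [] = maybe [] (: []) pending
    go pending (l : ls) =
      let current = case pending of
            Nothing  -> l
            Just acc -> acc ++ stripLeadingAmp l
       in case stripTrailingAmp current of
            Just open -> go (Just open) ls        -- statement continues
            Nothing   -> current : go Nothing ls  -- statement is complete

-- If the line ends with '&' (ignoring trailing spaces), return the text
-- before the marker.
stripTrailingAmp :: String -> Maybe String
stripTrailingAmp s = case reverse (dropWhileEnd isSpace s) of
  '&' : rest -> Just (reverse rest)
  _          -> Nothing

-- Drop an optional leading '&' (and the spaces before it) from a
-- continuation line.
stripLeadingAmp :: String -> String
stripLeadingAmp s = case dropWhile isSpace s of
  '&' : rest -> rest
  _          -> s
```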
fortran-src then defines one parser per supported standard (with the exception of
FORTRAN 77, for which we define extra parsers handling non-standard extended features).
Each parser uses the source form that its standard specifies.
Later Fortran standards such as Fortran 2003 are generally comparable to Fortran 90, but with additional syntactic constructs. The fortran-src parsers reflect this, gating certain features by the language standard being parsed. Parsers are grouped by fixed or free form; thus parsers for FORTRAN 66 and FORTRAN 77 are within the Language.Fortran.Parser.Fixed namespace and the rest are within Language.Fortran.Parser.Free. A top-level module (Language.Fortran.Parser) provides a unified point of access to the underlying parsers.
The suite of parsers is automatically generated from attribute grammar definitions in the Bison format, via the happy tool.
CPP (the C pre-processor) can be run prior to lexing or parsing.
The parsers all share a common abstract syntax tree (AST) representation (Language.Fortran.AST) via a group of mutually-recursive data types. All such data types are parametric, parameterised by the type of "annotations" that can be stored in the nodes of the tree. For example, the top level of the AST is the ProgramFile a type, which comprises a list of ProgramUnit a values, parameterised by the annotation type a (the generic type parameter).
The annotation facility is useful, for example, for collecting information about types within the nodes of the tree, or for flagging whether a particular node of the tree has been rewritten or refactored.
The Annotated class, of which all AST data types are instances, provides functions to extract, set and modify annotations:

    class Annotated f where
      getAnnotation :: f a -> a
      setAnnotation :: a -> f a -> f a
      modifyAnnotation :: (a -> a) -> f a -> f a
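To show how such an interface might be used, the sketch below re-declares the class locally (so that it compiles without fortran-src) and gives an instance for a toy expression type. The Expr type and the markRewritten helper are hypothetical and stand in for the real AST types in Language.Fortran.AST; the Bool annotation models the "has this node been rewritten" flag mentioned above.

```haskell
-- Local copy of the class shown above, so the sketch is self-contained.
class Annotated f where
  getAnnotation    :: f a -> a
  setAnnotation    :: a -> f a -> f a
  modifyAnnotation :: (a -> a) -> f a -> f a

-- Toy expression node, NOT the real Language.Fortran.AST type.
-- Every constructor stores its annotation as the first field.
data Expr a
  = Lit a Int
  | Add a (Expr a) (Expr a)
  deriving (Eq, Show)

instance Annotated Expr where
  getAnnotation (Lit a _)   = a
  getAnnotation (Add a _ _) = a

  setAnnotation a (Lit _ n)   = Lit a n
  setAnnotation a (Add _ l r) = Add a l r

  modifyAnnotation f e = setAnnotation (f (getAnnotation e)) e

-- Flag a node as rewritten; only the annotation of the node itself
-- changes, its children are left untouched.
markRewritten :: Expr Bool -> Expr Bool
markRewritten = setAnnotation True
```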
Some simple transformations are provided on ASTs:
- Grouping transformation, turning unstructured ASTs into structured ASTs (Language.Fortran.Transformation.Grouping);
- Disambiguation of array indexing vs. function calls, which share the same syntax in Fortran (Language.Fortran.Transformation.Disambiguation), and of intrinsic calls from regular function calls (Language.Fortran.Transformation.Disambiguation.Intrinsic); e.g. a(i) is both the syntax for indexing array a at index i and for calling a function named a with argument i;
- Fresh name transformation (obeying scoping) (Language.Fortran.Analysis.Renaming).
All of these transformations are applied to the ASTs following parsing (with some slight permutations on the grouping transformations depending on whether the code is FORTRAN 66 or not).
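The a(i) ambiguity can be illustrated with a toy AST. This sketch assumes that the set of names declared as arrays is already known and resolves every ambiguous node accordingly; it is not the algorithm used in Language.Fortran.Transformation.Disambiguation, and the Expr type and disambiguate function are hypothetical.

```haskell
-- Toy expression type; 'Ambiguous' is what a parser can produce for a(i)
-- before it knows whether a is an array or a function.
data Expr
  = Var String
  | IntLit Int
  | Ambiguous String [Expr]
  | Subscript String [Expr]  -- array indexing
  | Call String [Expr]       -- function call
  deriving (Eq, Show)

-- Resolve ambiguous nodes given the names known to be arrays.
-- Assumption: any name that is not a known array is a function.
disambiguate :: [String] -> Expr -> Expr
disambiguate arrays = go
  where
    go (Ambiguous name args)
      | name `elem` arrays = Subscript name (map go args)
      | otherwise          = Call name (map go args)
    go (Subscript name args) = Subscript name (map go args)
    go (Call name args)      = Call name (map go args)
    go e                     = e
```

With a declared as an array, the expression a(f(i)) resolves to a subscript of a whose index is a call to f.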
The list below summarises the current static analysis techniques available within fortran-src (grouped under Language.Fortran.Analysis).
- Control-flow analysis (building a super graph) (Language.Fortran.Analysis.BBlocks);
- General data flow analyses (Language.Fortran.Analysis.DataFlow), including:
  - Reaching definitions;
  - Def-use/use-def;
  - Constant evaluation;
  - Constant propagation;
  - Live variable analysis;
  - Induction variable analysis;
- Type analysis (Language.Fortran.Analysis.Types);
- Module graph analysis (Language.Fortran.Analysis.ModGraph).
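To give a flavour of one of the data flow analyses listed above, here is a minimal sketch of live variable analysis restricted to a straight-line sequence of assignments. The analyses in Language.Fortran.Analysis.DataFlow operate on basic-block graphs; this example makes the simplifying assumption that there is no branching, and the Stmt type and liveBefore function are hypothetical.

```haskell
import Data.List (nub, union)

-- An assignment: the variable defined and the variables read by the
-- right-hand side.
data Stmt = Assign String [String]

-- Given the variables live at the end of the block, compute the set of
-- variables live immediately before each statement (backward pass).
liveBefore :: [String] -> [Stmt] -> [[String]]
liveBefore liveOut stmts = init (scanr step liveOut stmts)
  where
    -- live-in = uses, plus everything live afterwards except the target
    step (Assign def uses) liveAfter =
      nub uses `union` filter (/= def) liveAfter
```

For the block t = a + b; u = t * 2; v = a + 1 with only u live at the end, v never appears in any live set, so the last assignment is dead code under these assumptions.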
A representation, abstracted away from the details of the syntax tree, is provided for evaluation of expressions and for semantic analysis (Language.Fortran.Repr). Constant expression evaluation (Language.Fortran.Repr.Eval.Value) leverages this representation and enables some symbolic manipulation too, essentially providing some partial evaluation.
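The idea of partial evaluation can be sketched on a toy expression language: subexpressions whose operands are all known are folded to constants, and the rest is left symbolic. This is an illustration only; the Expr type and simplify function are hypothetical and do not reflect the representation in Language.Fortran.Repr.

```haskell
-- Toy integer expression language.
data Expr
  = Lit Int
  | Var String
  | Add Expr Expr
  | Mul Expr Expr
  deriving (Eq, Show)

-- Fold constants given the values of some variables; unknown variables
-- stay symbolic, so the result may be only partially evaluated.
simplify :: [(String, Int)] -> Expr -> Expr
simplify env = go
  where
    go (Var v) = maybe (Var v) Lit (lookup v env)
    go (Add l r) = case (go l, go r) of
      (Lit a, Lit b) -> Lit (a + b)
      (l', r')       -> Add l' r'
    go (Mul l r) = case (go l, go r) of
      (Lit a, Lit b) -> Lit (a * b)
      (l', r')       -> Mul l' r'
    go e = e
```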
A commonly required feature of language tools is to generate source code. We thus provide pretty printing features to generate textual source code from the internal AST (Language.Fortran.PrettyPrint).
Furthermore, fortran-src provides a diff-like patching feature for (unparsed) Fortran source code that accounts for the fixed form style, handling the fixed form lexing of lines, and comments in its application of patches (Language.Fortran.Rewriter). This aids in the development of refactoring tools.
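The core of span-based patching can be sketched as follows. This is a deliberately simplified model and not the Language.Fortran.Rewriter API: patches are addressed by character offsets, are assumed not to overlap, and no fixed form column handling is attempted. The Patch type and applyPatches function are hypothetical.

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))

-- Replace the half-open character range [start, end) with newText.
data Patch = Patch
  { start   :: Int
  , end     :: Int
  , newText :: String
  }

-- Apply non-overlapping patches to unparsed source text. Patches are
-- applied from the end of the text backwards, so that the offsets of
-- earlier patches remain valid. Text outside the patched ranges,
-- including comments, is copied through unchanged.
applyPatches :: [Patch] -> String -> String
applyPatches patches src = foldl applyOne src (sortOn (Down . start) patches)
  where
    applyOne s (Patch a b new) = take a s ++ new ++ drop b s
```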
The associated CamFort package (https://github.com/camfort/camfort), which builds heavily on fortran-src, provides a related "reprinting" algorithm (@clarke2017scrap)
that fuses a depth-first traversal of the AST with a textual diff algorithm
on the original source code. The reprinter is parameterised by reprintings
which hook into each node and allow nodes which have been refactored by CamFort
to have the pretty printer applied to them. The resulting outputs from each
node are stitched into the position from which they originated in the
input source file. This further enables the
development of refactoring tools that need to perform transformations on source code text.