Skip to content

improve specification of non-constant array shape declarations and pdt initialization #200

Open
@sigfig

Description

@sigfig

fortran as specified has some semantic gaps in the interpretation of the restricted scalar expressions used in procedure interfaces, the interpretation of automatic array declarations using the dimension attribute, and the interpretation of parameterized derived type initialization in type declarations. this seems to be a substantial conformance issue, gfortran ifort and cuda fortran all have noticeable differences in how these language features are interpreted. i've found a few cases in gfortran and ifort where fortran 2003 compliant code will cause crashes during compilation. this is especially noticeable when using type parameters, pdts with integer, len parameters are largely unusable in programs where they would otherwise provide a lot of benefit, due to incomplete implementations and lack of testing. recent specification changes are no help here, after reviewing the semantics in fortran 2018, i have no idea how to implement these features myself without ignoring most of the rules.

on the fortran language spec side i believe there's a straightforward remedy for making compliance feasible:

  1. split the roles of specification expressions, further constraining the expressions that can be used in "shape expressions"
  2. merge all the existing rules for non-constant array shapes, coarray shapes, and integer, len type parameters into one section on resolving dynamic and assumed shapes
  3. introduce an operational interpretation of dynamic shape expressions into the spec

part 1 only entails removing the already unusable primaries, intrinisics, and inquiries from usage in shape expressions. part 2 is a spec simplification, no semantics are changed, they're just deduplicated and the intended interpretation is made more explicit, without adding any new concepts.

part 3 is the substantial change, to be useful it requires a non-trivial amount of reinterpretation and the addition of many new rules.

my proposal to make this easy in both specification and implementation is to introduce just one new semantic construction that corresponds to an integer constraint interpretation of the scalar expressions used in procedure interfaces. this is opposed to the current imperative expression evaluation interpretation of the same syntax.

this addition would require adding a notion of a "shape invariant", a few different classes of shape invariants, and resolution rules distinct from expression evaluation.

that seems to be a much broader scope of changes than other proposals, but it does seem like a constraint interpretation fits a lot closer to the existing semantics of shape expressions in fortran. as a trivial example, a matmul interface:

function f(x, y, z, l, r) result(o)
  integer :: x, y, z
  real, dimension(x, y) :: l
  real, dimension(y, z) :: r
  real, dimension(x, z) :: o
end function f

the required shape validation checks could be evaluated from top to bottom, from bottom to top, or in parallel, with no meaningful differences. we don't really have expression reduction available here either, either the interface is satisfiable or it isn't, there are no intermediate states in which a subset of this interface could be meaningfully satisfiable. if we introduced data dependencies or inequalities, we would arrive at an ordering, but it won't necessarily be an ordering over a set of statements, or over expressions. any ordering induced by an interface is an arbitrary lattice over variable relations. these look a lot more like simultaneous constraints than sequential statements.

this reinterpretation has a substantial upside with respect to specification effort: pdt initialization constraints can always be transcluded into the constraint set of the interface that declares the type, no extra rules are needed. additionally, the admissable intrinsic functions and requirements on specification functions admit easy constraint propagation. on the implementation side, the consequences of this sort of change unquestionably adds complexity to a compiler. the material benefit for compiler projects is that pdts and statement expressions are no longer a murky area, and users can successfully write a parameterized matrix type without filing bug reports.

this presentation is eliding over many caveats you may be able to notice. i've drafted a proposal document that elaborates on the shape invariant representation, static semantics, and implementation strategies, but it's super long so i didn't want to bomb the proposals repo before getting initial feedback. fortran makes this approach mostly easy but it does throw in some quirks, let me know if any of you have better ideas.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Clause 9Standard Clause 9: Use of data objects

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions