Skip to content
Chris Nuernberger edited this page Feb 27, 2018 · 10 revisions

thesis

  • the organization of computations and data for a given algorithm is constrained by a fundamental tension between parallelism, locality, and redundant computation of shared values

    • This is addressed by a systematic model of schedules.
  • Stencil computation - repeated application of kernel across a dataset.

  • loop fusion

  • loop fusion + tiling slides

  • this is also why libraries of optimized code cannot deliver efficient performance when building real image processing pipelines: individually optimized subroutines do not compose into an optimized whole, since they cannot reorganize computation for locality or parallel execution across function boundaries.

  • I believe the right way to program image processing pipelines is to separate the intrinsic algorithm—what is computed—from the concerns of efficiently organizing it for machine execution—decisions about storage and the ordering of computation.

  • In particular, recursion is only allowed within a single function, using update definitions, and the recursion is bounded to a fixed depth before it begins by an explicit reduction domain. As a result, Halide’s language of algorithms is not Turing-complete, but is amenable to extensive analysis and transformation.

Clone this wiki locally