-
Notifications
You must be signed in to change notification settings - Fork 3
halide_notes
-
the organization of computations and data for a given algorithm is constrained by a fundamental tension between parallelism, locality, and redundant computation of shared values
- This is addressed by a systematic model of schedules.
-
Stencil computation - repeated application of kernel across a dataset.
-
this is also why libraries of optimized code cannot deliver efficient performance when building real image processing pipelines: individually optimized subroutines do not compose into an optimized whole, since they cannot reorganize computation for locality or parallel execution across function boundaries.
-
I believe the right way to program image processing pipelines is to separate the intrinsic algorithm—what is computed—from the concerns of efficiently organizing it for machine execution—decisions about storage and the ordering of computation.
-
In particular, recursion is only allowed within a single function, using update definitions, and the recursion is bounded to a fixed depth before it begins by an explicit reduction domain. As a result, Halide’s language of algorithms is not Turing-complete, but is amenable to extensive analysis and transformation.