Description
Running a set of multiple operations in parallel ("operator-level parallelization"), as shown in Figure 1, has the potential to reduce inference time. Using draft PR #2756, we confirmed that this parallelization reduces the inference time of some real models. We will split that PR into smaller PRs so they can be reviewed step by step. This issue describes the overall plan and tracks the status of those PRs.
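To make the idea concrete, here is a minimal Python sketch of operator-level parallelization. It is only an illustration and is not taken from onnx-mlir: the two branches, the helper names (`branch_a`, `branch_b`, `run_parallel`), and the assumption that the branches do not depend on each other are all hypothetical. Python threads will not show a real speedup for this pure-Python math; the point is the fork/join structure.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul(a, b):
    """Naive matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in m]


def add(a, b):
    """Element-wise sum of two matrices with the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def branch_a(x, w):
    # Hypothetical first branch of the graph: MatMul followed by Relu.
    return relu(matmul(x, w))


def branch_b(x, w):
    # Hypothetical second branch: a single MatMul.
    return matmul(x, w)


def run_sequential(x, w_a, w_b):
    # Baseline: the two branches run one after the other.
    return add(branch_a(x, w_a), branch_b(x, w_b))


def run_parallel(x, w_a, w_b):
    # The branches share only the read-only input x, so they can be
    # submitted concurrently and joined before the final Add.
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_a = pool.submit(branch_a, x, w_a)
        fut_b = pool.submit(branch_b, x, w_b)
        return add(fut_a.result(), fut_b.result())
```

Both functions compute the same result; only the scheduling of the two branches differs.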
We introduced two new operations, ONNXParallelOp and ONNXForkOp (PR #2810). These operations are lowered to KrnlParallelOp, KrnlIterateOp, and SCFIfOp. We will create a subsequent PR for the lowering pass for ONNXParallelOp and ONNXForkOp. With this approach, we can use the existing OpenMP implementation in onnx-mlir and meet the requirement, described in issue #2497, to use a common framework for threading.
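One way to picture a lowering built from a parallel loop, an iteration, and conditionals is a parallel loop over fork indices in which each iteration runs only the body whose index matches. The Python sketch below models that control flow. It is an assumption based on the op names above, not the actual lowering from the PRs, and `parallel_region` and `fork_bodies` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_region(fork_bodies):
    """Run each fork body once, concurrently, and return results in fork order."""

    def iteration(i):
        # Stand-in for a chain of conditionals: iteration i executes
        # only the fork body whose index equals i.
        for k, body in enumerate(fork_bodies):
            if i == k:
                return body()
        raise IndexError(i)

    # Stand-in for a parallel loop over the fork indices 0..N-1.
    workers = max(1, len(fork_bodies))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(iteration, range(len(fork_bodies))))
```

For example, `parallel_region([lambda: 1 + 1, lambda: "x" * 3])` runs the two bodies on separate workers and returns their results in fork order.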
- PR #2810: Definition and shape inference for ONNXParallelOp and ONNXForkOp (ready for review)
:
:
| Figure 1. Operator-level parallelization | Figure 2. Implementation |
|---|---|
| ![]() | ![]() |