RFC-0031: ZenDNN Integration (PR #52 by naveenthangudu, targeting pytorch:master)
# ZenDNN Integration

## **Author**
* @naveenthangudu

## **Summary**
This document proposes an approach for integrating the ZenDNN library into PyTorch. The integration will enable inference optimizations for deep learning workloads on AMD CPUs.
## **Highlights**
* Default build behavior remains unaltered.
* A pre-build step generates the ZenDNN integration code.
* The ZenDNN integration is optimized for inference workloads.
* A new opt-in build flag (environment variable), `USE_ZENDNN`, will be added.
* No submodules are added. When `USE_ZENDNN` is set to 1, the following repositories are downloaded into the third_party folder:
  * [ZenDNN](https://github.com/amd/ZenDNN)
  * [AOCL BLIS](https://github.com/amd/blis)
* ZenDNN optimizations are added as JIT passes on the frozen graph under the `optimize_for_inference` API.
* Support will be added for the fp32 and bf16 data types.
## **Motivation**
The main goal of this integration is to realize optimal inference performance on AMD CPUs for PyTorch. ZenDNN is a library that enables performance improvements on AMD CPU architectures; the repository and further details are available [here](https://github.com/amd/ZenDNN). To highlight the potential uplift of ZenDNN vs. oneDNN on AMD CPUs, we benchmarked our fully integrated ZenDNN version of PyTorch v1.12 on a Genoa CPU (4th Generation AMD EPYC™ processors).
### **Benchmarking Configuration**

| Configuration               | Value |
|-----------------------------|-------|
| PyTorch version             | v1.12 |
| CPU                         | 4th Generation AMD EPYC™ Genoa CPU |
| Number of physical cores    | 96 |
| NUMA nodes per socket (NPS) | 1 |
| Model data type             | FP32 |
### **Latency Performance (batch_size = 1)**

<img src="./RFC-0031-assets/Latency.svg" width=40%>

### **Throughput Performance (batch_size = 640)**

<img src="./RFC-0031-assets/Throughput.svg" width=40%>
## **Proposed Implementation**

### **Overview**
This approach follows the [hipify](https://github.com/ROCm-Developer-Tools/HIPIFY) approach used by the AMD ROCm team to integrate ROCm libraries into PyTorch. By leveraging the shared APIs between oneDNN and ZenDNN, a pre-build step generates ZenDNN integration code wherever possible. Because the build flag is opt-in, default build behavior is unaltered, and the required repositories are downloaded only when the flag is enabled.
### **Code generation script**
* Based on the hipify tool already present in PyTorch.
* Implemented in Python and run as a pre-build step.
* Takes folders or files as input and generates the substituted code.
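As a rough illustration of what such a pre-build substitution step might look like, here is a minimal Python sketch in the spirit of hipify. The substitution table below is hypothetical; the real tool would derive its mappings from the shared oneDNN/ZenDNN API surface.

```python
# Hypothetical name mappings, for illustration only; the actual tool's
# substitution table would be derived from the shared oneDNN/ZenDNN APIs.
SUBSTITUTIONS = {
    "mkldnn": "zendnn",
    "onednn": "zendnn",
}

def zendnnify(source: str) -> str:
    """Apply the substitution table to the text of one source file."""
    for old, new in SUBSTITUTIONS.items():
        source = source.replace(old, new)
    return source

def zendnnify_file(path: str) -> None:
    """Rewrite a file in place, as a pre-build step would."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(zendnnify(text))
```

Plain substring replacement, rather than a syntax-aware rewrite, is what makes this viable as a lightweight pre-build step: the two libraries' shared API naming means the substituted code compiles against ZenDNN headers directly.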
### **Build Infrastructure**
* Support for the opt-in flag (environment variable) `USE_ZENDNN` in the Linux build path.
* The PyTorch build reverts to its default behavior if `USE_ZENDNN` is set to zero or unset.
* When `USE_ZENDNN` is set to 1:
  * The ZenDNN and AOCL BLIS repositories are downloaded into the third_party folder.
  * AOCL BLIS and ZenDNN are built into PyTorch.
  * The oneDNN path is disabled.
* The BLIS library is built purely as a dependency for ZenDNN acceleration; MKL remains the default BLAS library.
### **Graph Optimizations**
As part of this integration, the following optimizations will be added to PyTorch. Since optimization is an ongoing process, more optimizations will be added in the future.
* Fusion of ops. Examples include:
  * Conv, Bias, and ReLU (CBR) fusions
  * Conv and Add fusions (fusing the residual addition into the conv)
* In-place concatenation (Inception)
* Reorder optimizations
* In-place unary elementwise operations
* Memory pool optimizations
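To make the fusion idea concrete, here is a pure-Python sketch (not ZenDNN code) showing that a fused Conv+Bias+ReLU kernel produces results identical to the separate ops while never materializing the intermediate tensor:

```python
def conv1d(x, w, b):
    # Valid 1-D convolution with a single filter plus bias.
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) + b
            for i in range(len(x) - k + 1)]

def relu(v):
    return [max(t, 0.0) for t in v]

def cbr_unfused(x, w, b):
    # Two passes: conv+bias writes an intermediate tensor, then a
    # separate ReLU pass re-reads and re-writes it.
    return relu(conv1d(x, w, b))

def cbr_fused(x, w, b):
    # One pass: ReLU is applied to each output element as it is
    # produced, so the intermediate tensor is never materialized.
    k = len(w)
    return [max(sum(w[j] * x[i + j] for j in range(k)) + b, 0.0)
            for i in range(len(x) - k + 1)]
```

The same arithmetic is performed in both cases, so a fusion pass is a pure memory-traffic optimization here; the real ZenDNN kernels apply the same idea to NCHW tensors with vectorized inner loops.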
As a few of these optimizations are specific to inference-only workloads, our graph optimizations are applied to frozen graphs. They are added under the `optimize_for_inference` API. The figure below illustrates ZenDNN graph optimizations in the PyTorch workflow.

 
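From the user's perspective the entry point is the standard TorchScript flow: script, then call `optimize_for_inference`, which freezes the graph and runs inference-only passes. A minimal sketch (the module and shapes are illustrative; the ZenDNN fusion passes would run inside `optimize_for_inference` only in a build with `USE_ZENDNN=1`):

```python
import torch

class TinyConvNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        # A Conv -> Bias -> ReLU pattern, the target of CBR fusion.
        return self.relu(self.conv(x))

model = torch.jit.script(TinyConvNet().eval())
# Freezes the graph and runs inference-only optimization passes; in a
# ZenDNN-enabled build, the fusions described above apply at this point.
optimized = torch.jit.optimize_for_inference(model)
out = optimized(torch.randn(1, 3, 8, 8))
```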
## **Open Questions**
The following questions will impact the design and implementation of the ZenDNN integration.

* Can ZenDNN have a new dispatch key added specifically for it?
* Can ZenDNN have its own backend?
* Can we add a ZenDNN tensor layout to the existing tensor layouts?
## **Next Steps**
* First PR: code generation tool and build infrastructure changes, with support for eager mode.
* Second PR: a few graph optimizations.
* Third PR: unit tests and CI deployment.
* Further PRs: more optimizations and PyTorch 2.0 feature integration.