We find that many types of computing resources (such as CUDA GPUs and FPGAs) suffer from a parallel-waiting problem, which hurts deep learning inference applications that are both computationally intensive and latency-sensitive. One way to address this is to intercept API calls at the hardware driver layer, as GPU virtualization does, but that greatly reduces generality and over-couples the system. We therefore start from the model instead: we build a generic allocator that masks the driver-layer scheduling, aiming to achieve better service latency for each request.
We tested our program with:
- GTX-2080Ti (10-core GPU)
- gcc/g++ v8.4.0
- grpc (GitHub commit 8f6ae3599f247c3e0de604b5321538b99f3d68a3)
- protobuf v3.22.2 (installed from the grpc source tree)
- onnxruntime-gpu v1.12.1
- Required C++ compiler flags:
  - `-std=c++17`
  - `-lstdc++fs`
  - `-lonnxruntime`
  - `-lprotobuf`
  - `-lpthread`
- Add `-DPARALLER_MODE` if you only want to disable our allocator mechanism.
- The nlohmann::json library must be installed.
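For reference, a manual invocation combining the flags above might look like the following (the source file name, output name, and library search path are placeholders; the project itself builds with CMake as shown below):

```shell
# hypothetical single-file build showing the required flags;
# adjust -I/-L paths to where onnxruntime and protobuf are installed
g++ -std=c++17 main.cpp -o allocator \
    -L/usr/local/lib \
    -lstdc++fs -lonnxruntime -lprotobuf -lpthread
```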
You can build multiple versions by setting a compiler flag: `DLIR_MODE` (the default), `BNST_MODE`, `FIFO_MODE`, `OYST_MODE`, and `PARALLER_MODE` are available.
- `DLIR_MODE`: DLIR mode; allows auto-split and sorting.
- `BNST_MODE`: similar to `DLIR_MODE`, but splitting is not allowed.
- `OYST_MODE`: similar to `DLIR_MODE`, but splitting is forced.
- `PARALLER_MODE`: runs all kinds of tasks in multiple processes.
- `FIFO_MODE`: runs tasks in FIFO order.
- Compile `DLIR_MODE` as an example:

```shell
git clone git@github.com:EdgeScheduler/DLIR-Allocator.git
cd DLIR-Allocator
mkdir -p build && cd build
cmake ../ -DCOMPILE_MODE="DLIR_MODE"
make
# the binary is produced at DLIR-Allocator/bin/release/DLIR-Allocator
```
- Alternatively, you can compile all versions by running the build script directly:

```shell
git clone git@github.com:EdgeScheduler/DLIR-Allocator.git
cd DLIR-Allocator
./scripts/build.sh
# all binaries are produced at DLIR-Allocator/bin/release/*-Allocator
```
- Operating System
  - Linux (tested on Ubuntu)
- Hardware Support
  - CUDA-GPU (tested on GTX 2080Ti and Tesla T4)
  - to-do:
    - Mali-GPU
    - FPGA
    - DSP
Relationship with OnnxSplitRunner

To eliminate the negative effects of Python's fake multi-threading (caused by the GIL), we eventually decided to refactor the code in C++. The original Python project can still be found at: https://github.com/EdgeScheduler/OnnxSplitRunner