This repository contains the code for "SoMa: Identifying, Exploring, and Understanding the DRAM Communication Scheduling Space for DNN Accelerators". Please follow the instructions in the AE Appendix of our corresponding HPCA 2025 paper.
- First, you will need to install some Python libraries:

  ```bash
  pip install -r requirements.txt
  ```

  or

  ```bash
  conda install --file requirements.txt
  ```
- Optional: we use OpenMP for multi-threaded search. If you do not want this, simply comment out `-fopenmp` in the `Makefile`.
To build the project, run all experiments, and collect the results:

```bash
./build.sh
./run.sh --eta
./get_results.sh
```
After running `build.sh`, you can execute a single experiment using the command:

```bash
./build/soma 108 2 512 1 8 1 8 4 256 512 849779186 results/dse
```

The arguments are, in order:
- Network (`108`): Specifies the neural network to be used (full list below).
- Baseline Type (`2`): Must be `2`; other values are not supported.
- Sequence Length (`512`): Relevant for LLMs; ignored for CNNs.
- Number of Segments (`1`): Used when a network is too large and needs partitioning for scheduling.
- L2 Buffer Size (`8` MB): Defines the L2 buffer size in megabytes.
- Batch Size (`1`): Specifies the number of input samples processed at once.
- DRAM Bandwidth Ratio (`8`): Ratio of DRAM bandwidth (GB/s) to computational power (TOPS). Default is `1`.
- PE Array Dimension (`4`): Typically ranges from `4` to `16`.
- L2 Buffer Bandwidth (`256` GB/s): Specifies the bandwidth of the L2 buffer.
- MAC Units per PE (`512`): Determines the number of multiply-accumulate (MAC) units per PE. TOPS is calculated as `TOPS = 2 * mac_num * PE_ARRAY_Dim^2 / 1024` (see the worked example after this list).
- Random Seed (`849779186`): Used for the random number generator.
- Results Folder (`results/dse`): Specifies where the experiment results are stored.
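As a quick sanity check, here is the arithmetic for the example command above (a minimal worked example using only the parameter values and formulas listed; nothing here is a measured result):

```bash
# TOPS = 2 * mac_num * PE_ARRAY_Dim^2 / 1024
echo $((2 * 512 * 4 * 4 / 1024))   # mac_num=512, PE dim=4  -> 16 TOPS
# DRAM bandwidth (GB/s) = DRAM bandwidth ratio * TOPS
echo $((8 * 16))                   # ratio=8, 16 TOPS       -> 128 GB/s
```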
The full list of supported networks:

- `0`: Darknet19
- `1`: VGG19
- `2`: ResNet50
- `3`: GoogLeNet
- `4`: ResNet101
- `5`: DenseNet
- `6`: Inception-ResNet-V1
- `7`: GNMT
- `8`: LSTM
- `9`: ZFNet
- `10`: Transformer
- `11`: Transformer Cell
- `12`: PNASNet
- `13`: ResNeXt50
- `14`: ResNet152
- `15`: Transformer Big Cell
- `16`: RetinaNet-ResNet50
- `17`: U-Net
- `18`: RandWire Small
- `19`: RandWire Large
- `101`: GPT-J 6B (Decode)
- `102`: GPT-J 6B (Prefill)
- `103`: LLaMa 2 70B (Decode)
- `104`: LLaMa 2 70B (Prefill)
- `105`: BERT Base
- `106`: BERT Large
- `107`: GPT-2 Small (Decode)
- `108`: GPT-2 Small (Prefill)
- `109`: GPT-2 XL (Decode)
- `110`: GPT-2 XL (Prefill)
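For example, to run the same configuration on GPT-2 XL in prefill mode, only the network ID changes to `110`; the remaining arguments below simply reuse the example values above and may need adjusting (e.g., the segment count) for larger models:

```bash
./build/soma 110 2 512 1 8 1 8 4 256 512 849779186 results/dse
```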
For unsupported models, the program will throw an error: `Model not supported`.
The repository is organized as follows:

- Header files for the project.
- Python scripts used for processing and analyzing the experiment results.
- C++ source code implementing the SoMa framework.
Note on GPT: for GPT-2 we actually explore only one block, so the data-processing script multiplies the corresponding latency by the number of blocks. For GPT-2 Small in the DSE experiments, the number of blocks is 12.
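As a minimal illustration of that convention (the single-block latency below is a made-up placeholder, not a measured result):

```bash
# GPT-2 Small has 12 blocks, so the processing script scales the
# single-block latency by the block count.
single_block_ms=3    # hypothetical placeholder value
num_blocks=12
echo $((single_block_ms * num_blocks))   # -> 36 (ms end-to-end)
```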
If you encounter any issues during the AE process, please contact:
Jingwei Cai (Tsinghua University) 1148821791@qq.com
Xuan Wang (Xi'an Jiaotong University, Institute for Interdisciplinary Information Core Technology) wangxuanxjtu@163.com