This repository contains the framework introduced in the paper BufferProspector [Wei25]_. The Python tool allocates otherwise-unused buffer capacity to the ofmaps that must be buffered due to the timing mismatch of layer-pipeline mapping. See our DAC'25 paper [Wei25]_ for details.

This framework is built on top of the Tangram framework [Gao19]_ [Gao17]_.

If you use this tool in your work, we kindly request that you cite our paper below, and send us a citation of your work.

- Wei et al., "BufferProspector: Discovering and Exploiting Untapped Buffer Resources in Many-Core DNN Accelerators", in DAC, June 2025.
Installation follows the same procedure as for the Tangram framework. To make it easier, the main Python scripts already add the BufferProspector path to ``PYTHONPATH``, so one only needs to take care of the dependencies during installation.
One can install the dependencies with pip::

    > pip install -r requirements.txt
To reproduce the experiments in BufferProspector [Wei25]_, simply run::

    > cd nn_dataflow/tools
    > python exp_overall.py
    > python exp_dse.py

The results are written to the folders ``01_overall`` and ``02_DSE`` under ``nn_dataflow/tools``.
Each result consists of three files:

- The ``*.json`` file contains the searched scheme and its performance, latency, buffer usage, etc.
- The ``*.dat`` file contains the dumped ``NNDataflowScheme`` object and the segment information, so the object can be read back into Python for debugging and inspection (see the sketch after this list).
- The ``*.txt`` file contains the log of the experiment, including the DP-searched segments and their costs.
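A minimal sketch of reading one result back, assuming the ``*.dat`` file is a standard Python pickle dump and using an illustrative file name; the actual JSON keys and file names depend on the generated output::

    import json
    import pickle

    # Illustrative file name; substitute one of the generated results.
    with open('01_overall/example_result.json') as f:
        result = json.load(f)
    print(sorted(result.keys()))  # e.g. the searched scheme, latency, buffer usage, ...

    # Assumption: the *.dat file is a plain pickle dump; the exact contents
    # (NNDataflowScheme object and segment information) depend on how the
    # exp_*.py scripts dumped them.
    with open('01_overall/example_result.dat', 'rb') as f:
        scheme = pickle.load(f)
    print(type(scheme))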
The ``*.dat`` file is produced only when the experiment finishes, and its existence is used as a flag in the ``exp_*.py`` scripts: if the ``*.dat`` file already exists, the related experiment is skipped to avoid running it a second time.
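The pattern amounts to an existence check before each run; a minimal sketch of the idea (the actual ``exp_*.py`` scripts may organize it differently)::

    import os

    def maybe_run(dat_path, run_experiment):
        """Run the experiment only if its *.dat output does not exist yet."""
        if os.path.exists(dat_path):
            print('Skipping: {} already exists.'.format(dat_path))
            return
        run_experiment()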
After the scripts have finished, run ``data.py`` under the ``nn_dataflow/tools`` folder to gather the related statistics::

    > # cd nn_dataflow/tools
    > python ./data.py

The final result is generated into the ``res.csv`` file.
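The CSV can then be post-processed with standard tools; a minimal sketch using the Python ``csv`` module (the column names depend on the gathered statistics and are not listed here)::

    import csv

    with open('res.csv', newline='') as f:
        rows = list(csv.DictReader(f))
    print('{} rows'.format(len(rows)))
    if rows:
        print('columns:', list(rows[0].keys()))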
Notes:

- The ``exp_*.py`` scripts can run the experiments in parallel. One can change the ``run_sequential`` call into ``run_threads`` to enable parallel execution. However, the memory consumption of the program is large, and if there is not enough memory (at least 64 GB for LLMs), running the experiments sequentially is recommended. The hyperparameters that control the parallel execution are in ``run_multithread.py``; please refer to that script for further information.
- We have also provided the expected results of the experiments in the ``nn_dataflow/DAC_exp`` folder. One can directly move them into the ``nn_dataflow/tools`` folder and run ``data.py`` to obtain the results (a possible command sequence is sketched below).
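One possible command sequence for the second note, assuming ``DAC_exp`` contains the ``01_overall`` and ``02_DSE`` result folders (adjust the paths if the layout differs)::

    > # run from the repository root
    > cp -r nn_dataflow/DAC_exp/* nn_dataflow/tools/
    > cd nn_dataflow/tools
    > python ./data.py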
BufferProspector is free software; you can redistribute it and/or modify it under the terms of the `BSD License <LICENSE>`__ as published by the Open Source Initiative, revised version.
.. [Wei25] Wei, Cai, Gao, Peng, Wu, Shi, and Ma, *BufferProspector: Discovering and Exploiting Untapped Buffer Resources in Many-Core DNN Accelerators*, in DAC, June 2025.

.. [Gao19] Gao, Yang, Pu, Horowitz, and Kozyrakis, `TANGRAM: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators <https://dl.acm.org/citation.cfm?id=3297858.3304014>`__, in ASPLOS, April 2019.

.. [Gao17] Gao, Pu, Yang, Horowitz, and Kozyrakis, `TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory <https://dl.acm.org/citation.cfm?id=3037697.3037702>`__, in ASPLOS, April 2017.