- Figure 1: Application background of binary code understanding.
- Figure 2: An overview of the benchmark dataset construction process.
- Figure 3: An overview of the evaluation process.
More details can be found in our paper.
```bash
conda create -n binaryllmEval python=3.8.0
conda activate binaryllmEval
pip install -r requirements.txt
```

We provide scripts to run inference with locally deployed LLMs and to call ChatGPT via its API.
```bash
CUDA_VISIBLE_DEVICES=0 python infer_llama.py
```
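For the ChatGPT baseline, inference goes through the OpenAI API instead of a local model. As a reference point only, here is a minimal sketch of such a call using the official `openai` Python SDK (>= 1.0); the model name and prompt are illustrative assumptions, not the repository's actual script, whose prompts live in `utils.py`.

```python
# Minimal sketch of calling ChatGPT via the OpenAI API (openai >= 1.0).
# Model name and prompt are illustrative; the repo's real prompts are
# defined in utils.py and its API script may be structured differently.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def query_chatgpt(prompt, model="gpt-3.5-turbo"):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,  # deterministic decoding for fair evaluation
    )
    return response.choices[0].message.content


print(query_chatgpt("Summarize what this decompiled function does: ..."))
```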
The evaluation data is in the `dataset` folder, and the specific prompts are provided in the `utils.py` file.
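To illustrate how a sample is turned into a model input, the sketch below loads one record and fills a prompt template. The file name, JSON field, and template wording here are hypothetical; consult `dataset/` for the real layout and `utils.py` for the exact prompts.

```python
# Illustrative only: the path, field name, and template are hypothetical;
# the actual data layout is in dataset/ and the actual prompts in utils.py.
import json

PROMPT_TEMPLATE = (
    "Here is a decompiled function:\n{code}\n"
    "Please summarize its functionality in one sentence."
)

with open("dataset/summarization_test.jsonl") as f:  # hypothetical file name
    sample = json.loads(f.readline())

prompt = PROMPT_TEMPLATE.format(code=sample["pseudo_code"])  # hypothetical field
print(prompt)
```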
Calculate the Precision, Recall, and F1-score metrics for the function name recovery task:

```bash
python cal_funcname_metrics.py
```
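Function name recovery is typically scored by token-level overlap between the predicted and ground-truth names. The sketch below shows that standard formulation; the tokenization rule (splitting on underscores and camelCase) is an assumption, and `cal_funcname_metrics.py` remains the authoritative implementation.

```python
# Token-level Precision/Recall/F1 for function name recovery.
# The tokenization rule is an assumption; cal_funcname_metrics.py defines
# the exact computation used in the paper.
import re


def name_tokens(name):
    # Split on underscores/punctuation, then on camelCase boundaries:
    # "parse_httpHeader" -> ["parse", "http", "header"]
    tokens = []
    for part in re.split(r"[_\W]+", name):
        tokens += re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", part)
    return [t.lower() for t in tokens if t]


def prf1(pred, truth):
    p_tok, t_tok = name_tokens(pred), name_tokens(truth)
    common = len(set(p_tok) & set(t_tok))
    precision = common / len(p_tok) if p_tok else 0.0
    recall = common / len(t_tok) if t_tok else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


print(prf1("read_file_header", "parse_file_header"))  # ~(0.67, 0.67, 0.67)
```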
Calculate the BLEU-4, METEOR, and Rouge-L metrics for the binary code summarization task:

```bash
python cal_summarization_metrics.py
```
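For reference, all three metrics can be computed per prediction/reference pair with off-the-shelf libraries. The library choices (`nltk` for BLEU-4 and METEOR, Google's `rouge-score` for Rouge-L) and the smoothing setting below are assumptions; `cal_summarization_metrics.py` defines the settings actually used.

```python
# BLEU-4 / METEOR / ROUGE-L for one prediction/reference pair.
# Requires: pip install nltk rouge-score
#           python -c "import nltk; nltk.download('wordnet')"  # for METEOR
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score
from rouge_score import rouge_scorer

reference = "writes the buffer contents to the log file"
prediction = "writes the buffer to a log file"
ref_tok, pred_tok = reference.split(), prediction.split()

bleu4 = sentence_bleu(
    [ref_tok], pred_tok,
    weights=(0.25, 0.25, 0.25, 0.25),  # equal 1- to 4-gram weights
    smoothing_function=SmoothingFunction().method1,
)
meteor = meteor_score([ref_tok], pred_tok)  # recent nltk expects tokenized input
rouge_l = rouge_scorer.RougeScorer(["rougeL"]).score(reference, prediction)["rougeL"].fmeasure

print(f"BLEU-4: {bleu4:.4f}  METEOR: {meteor:.4f}  ROUGE-L: {rouge_l:.4f}")
```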
If you find our work helpful, please cite:

```bibtex
@article{shang2025empirical,
  title={An Empirical Study on the Effectiveness of Large Language Models for Binary Code Understanding},
  author={Shang, Xiuwei and Fu, Zhenkan and Cheng, Shaoyin and Chen, Guoqiang and Li, Gangyang and Hu, Li and Zhang, Weiming and Yu, Nenghai},
  journal={arXiv preprint arXiv:2504.21803},
  year={2025}
}
```


