BinMetric: A Comprehensive Binary Analysis Benchmark for Large Language Models

Dependencies

Pull the Docker image:

docker pull dockcross/linux-x64:latest

Install the PyPI dependencies:

pip install -r requirements.txt

Usage for users

from evaluator.data import read_problems, write_samples, generate_one_prompt
from user_impl_script import generate_one_completion  # your own model wrapper

problems = read_problems()
# Assumption: each problem is a dict that carries its own "task_id" field.
samples = [
    dict(task_id=problem["task_id"], completion=generate_one_completion(generate_one_prompt(problem)))
    for problem in problems
]
write_samples(samples)
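
The snippet above imports generate_one_completion from user_impl_script, a module you write yourself. A minimal sketch of such a module might look like the following. It assumes the prompt returned by generate_one_prompt is a plain string and that the completion should also be a plain string; query_model is a hypothetical placeholder, not part of BinMetric, and should be replaced with a call to your own model or API.

```python
# user_impl_script.py (hypothetical sketch, not shipped with BinMetric)


def query_model(prompt: str) -> str:
    """Placeholder backend: swap in a call to your own model or API."""
    return "TODO: model output for a prompt of %d characters" % len(prompt)


def generate_one_completion(prompt: str) -> str:
    """Return the model's answer for one prompt as a plain string.

    Assumption: the prompt is a string and the benchmark expects a string back.
    """
    completion = query_model(prompt)
    # Strip surrounding whitespace so stray newlines do not end up in the sample.
    return completion.strip()
```

Keeping the backend call in a separate function makes it easy to switch between a local model and a remote API without touching the code that builds the samples.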

Data Preprocessing

Extract function names and addresses from the binaries:

python ext_idb_and_nameaddr.py

Extract additional per-function information from the binaries:

python ext_func.py

Inference

We provide scripts for running inference with locally deployed LLMs and for calling ChatGPT/GPT-4 through the API. For example, to run the local Llama script on the first GPU:

CUDA_VISIBLE_DEVICES=0 python infer_llama.py

Evaluation

python evaluation.py --prediction_file ./queryllm/Llama-2-7b-chat-hf_prediction.json --problem_file ./problem_data.json
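
When several models have been run, the same command can be repeated once per prediction file. The sketch below wraps that loop in Python. It uses only the two flags shown above; the assumption that prediction files live in ./queryllm and end in _prediction.json is inferred from the single example path and may not match your setup, and both helper names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_eval_command(prediction_file, problem_file="./problem_data.json"):
    """Build the evaluation.py command line for one prediction file."""
    return [
        sys.executable,
        "evaluation.py",
        "--prediction_file",
        str(prediction_file),
        "--problem_file",
        str(problem_file),
    ]


def evaluate_all(prediction_dir="./queryllm", pattern="*_prediction.json"):
    """Run evaluation.py once per matching prediction file (assumed naming)."""
    for prediction_file in sorted(Path(prediction_dir).glob(pattern)):
        # check=True stops at the first failing evaluation run.
        subprocess.run(build_eval_command(prediction_file), check=True)
```

Calling evaluate_all() from the repository root would then evaluate every matching file in turn.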

