This repository contains the code for the EMNLP 2025 paper *AutoCVSS: Assessing the Performance of LLMs for Automated Software Vulnerability Scoring*.

If you use AutoCVSS for your research, please cite our paper:
```bibtex
@inproceedings{sanvito2025autocvss,
  title     = {AutoCVSS: Assessing the Performance of LLMs for Automated Software Vulnerability Scoring},
  author    = {Sanvito, Davide and Arriciati, Giovanni and Siracusano, Giuseppe and Bifulco, Roberto and Carminati, Michele},
  booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track},
  month     = nov,
  year      = {2025},
  publisher = {Association for Computational Linguistics},
}
```

AutoCVSS uses Python 3.12: ensure you have it on your system.
```bash
# sudo apt update; sudo apt install software-properties-common  # (if `add-apt-repository` is not found)
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install -y python3.12
```

This project's dependencies are managed with Poetry. We suggest using pipx to install Poetry.
```bash
sudo apt update
sudo apt install -y pipx
pipx ensurepath
pipx install poetry
```

To set up the project environment and install all necessary dependencies, run the following command:
```bash
cd AutoCVSS
poetry install
```

A valid OpenAI API key must be provided in the configuration file `.keys.ini`; `.keys.ini.sample` provides an example:
```bash
# cd AutoCVSS
cp .keys.ini.sample .keys.ini
nano .keys.ini
```

- A valid OpenAI API key must be provided in the `OpenAI_auth_params` section (a hedged sample is shown after this list).
- The `Langfuse` section can optionally be uncommented and configured to enable the integration with Langfuse, a state-of-the-art tool for LLM observability.
- For local LLMs, assuming they are served through an OpenAI-compatible API (e.g. via vLLM/ollama), you can configure the endpoints and model names in the `autocvss/connector/llm.py` file based on your local deployment (see the sketch after this list):
  - LLaMA3: `get_llama31_api_model_name()` and `get_llama31_client()`
  - DeepSeek-R1: `get_ollama_deepseek_r1_70b_api_model_name()` and `get_ollama_deepseek_r1_70b_client()`
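For reference, here is a minimal sketch of how `.keys.ini` might be laid out. The section names (`OpenAI_auth_params`, `Langfuse`) come from the notes above, but the individual key names are assumptions for illustration — always start from `.keys.ini.sample`:

```ini
; Hypothetical layout: section names from this README, key names are assumptions.
; Copy .keys.ini.sample for the authoritative field names.
[OpenAI_auth_params]
api_key = sk-...

; Optional: uncomment and fill in to enable the Langfuse integration.
; [Langfuse]
; public_key = pk-lf-...
; secret_key = sk-lf-...
; host = https://cloud.langfuse.com
```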
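Likewise, a minimal sketch of how the LLaMA3 connector functions in `autocvss/connector/llm.py` might be adapted for a local deployment — the function names come from the list above, while the base URL, port, and model identifier are assumptions about your own vLLM setup:

```python
# Hypothetical adaptation of the LLaMA3 connector functions: the endpoint and
# model name below are assumptions about a local vLLM deployment, not the
# repository's actual values.
from openai import OpenAI

def get_llama31_api_model_name() -> str:
    # Must match the model identifier your local server exposes (assumption).
    return "meta-llama/Llama-3.1-70B-Instruct"

def get_llama31_client() -> OpenAI:
    # vLLM serves an OpenAI-compatible API; local servers typically ignore the key.
    return OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
```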
Before running the experiments, run the following commands to download the CVE data from NVD.
```bash
# cd AutoCVSS
cd data
for year in {2022..2024}; do
    if [ ! -f nvdcve-2.0-${year}.json.zip ]; then
        wget https://nvd.nist.gov/feeds/json/cve/2.0/nvdcve-2.0-${year}.json.zip
        unzip nvdcve-2.0-${year}.json.zip
    fi
done
poetry run python process_nvd_data.py
```

Your data directory should now look as follows:
```
.
├── v31_full_dataset
│   ├── dataset_test_df.csv
│   ├── dataset_train_df.csv
│   ├── test_set_cve_ids.csv
│   └── train_set_cve_ids.csv
├── v31_low_resource_dataset
│   ├── dataset_test_df.csv
│   ├── dataset_train_df.csv
│   ├── test_set_cve_ids.csv
│   └── train_set_cve_ids.csv
└── v40_38_samples_by_first
    ├── dataset_from_NVD.csv
    └── test_set_cve_ids.csv
```
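As a quick sanity check of the processed data — a hedged snippet, run from the repository root; it makes no assumptions about the CSV column layout and only prints what it finds:

```python
# Sanity-check the processed dataset (path taken from the tree above).
import pandas as pd

df = pd.read_csv("data/v31_full_dataset/dataset_test_df.csv")
print(df.shape)             # number of test samples and columns
print(df.columns.tolist())  # column names produced by process_nvd_data.py
```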
You are now ready to execute the experiments:

- The `notebooks` directory includes the Jupyter notebooks to evaluate LLMs for the prediction of CVSS scores.
- In each notebook you should select the data scenario and LLM configuration to be tested by providing (a hedged example follows this list):
  - `DATASET_NAME` and `CVSS_METRICS`
  - `EXPERIMENT_NAME`
- ⚠️ By default, the notebooks only test the first 3 samples: comment out the following line to run tests on the entire test set!

  ```diff
  - dataset_test_df = dataset_test_df.head(3)
  + # dataset_test_df = dataset_test_df.head(3)
  ```
- Optionally, the notebooks can be run directly from the shell with the 3 bash scripts provided in the root of this repository:
  - `run__evaluation_zeroshot_STD_DTD.sh`
  - `run__evaluation_zeroshot_FVP.sh`
  - `run__evaluation_fewshots_STD_DTD.sh`
- Finally, the tables with the summary of the results can be visualized with the `parse_results.ipynb` notebook.
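To make the configuration step above concrete, here is a hedged sketch of what the configuration cell of a notebook might contain. Only the variable names and the `DATASET_NAME` values (the `data/` subdirectories created earlier) are grounded in this README; the other values are illustrative assumptions:

```python
# Hypothetical notebook configuration cell — check the notebooks for the
# actual accepted values; everything except the variable names and the
# dataset directory names is an assumption.
DATASET_NAME = "v31_full_dataset"   # or "v31_low_resource_dataset", "v40_38_samples_by_first"
CVSS_METRICS = "v3.1"               # CVSS version whose metrics are predicted (assumption)
EXPERIMENT_NAME = "zeroshot-gpt4"   # free-form label identifying this run (assumption)
```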