Bug severity classification is critical for prioritizing software issues effectively. This project leverages Large Language Models (LLMs), specifically CodeBERT, to classify bug reports by severity using contextual embeddings of bug descriptions. A contrastive learning strategy is applied to shape the embedding space, improving separation between severity classes and leading to better classification accuracy.
Our approach was evaluated on the NASA PITS and Mozilla bug datasets and compared with traditional embedding-based models such as Doc2Vec. Results show that fine-tuning LLMs with contrastive learning improves performance, especially on diverse and imbalanced datasets.
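As a rough illustration, the sketch below shows how contextual embeddings of bug descriptions can be generated with the publicly available `microsoft/codebert-base` checkpoint. The checkpoint name, [CLS]-token pooling, and maximum sequence length are assumptions and may differ from the exact setup used in the paper.

```python
# Minimal sketch: embed bug descriptions with CodeBERT (assumptions noted above).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")
model.eval()

def embed(descriptions, max_length=256):
    """Return one embedding vector per bug description."""
    batch = tokenizer(descriptions, padding=True, truncation=True,
                      max_length=max_length, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**batch)
    # Use the [CLS] (first-token) hidden state as the report embedding.
    return outputs.last_hidden_state[:, 0, :]

vectors = embed(["App crashes on startup when the config file is missing."])
print(vectors.shape)  # torch.Size([1, 768])
```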
- Contrastive learning using NT-Xent loss to structure the embedding space (see the sketch after this list)
- Embedding generation using CodeBERT (RoBERTa-based LLM)
- Fully connected classification head trained on bug report embeddings
- Evaluation across intra- and cross-project datasets
- Comparison with non-contrastive LLM and traditional Doc2Vec models
- t-SNE visualization for understanding embedding separability
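The sketch below illustrates two of the items above: the NT-Xent (normalized temperature-scaled cross-entropy) loss and a fully connected classification head over the report embeddings. The pairing scheme (two views per report), temperature value, and layer sizes are assumptions for illustration, not the exact configuration from the paper.

```python
# Minimal sketch of NT-Xent contrastive loss and a small classification head.
import torch
import torch.nn.functional as F

def nt_xent_loss(z_i, z_j, temperature=0.5):
    """z_i, z_j: (N, d) embeddings of two views of the same N bug reports."""
    z = torch.cat([z_i, z_j], dim=0)        # (2N, d)
    z = F.normalize(z, dim=1)               # cosine similarity via dot product
    sim = z @ z.t() / temperature           # (2N, 2N) similarity logits
    n = z_i.size(0)
    # Mask out self-similarity so a sample is never its own negative.
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))
    # The positive for each sample is its counterpart in the other view.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

class SeverityHead(torch.nn.Module):
    """Fully connected head on top of 768-d embeddings (layer sizes are assumptions)."""
    def __init__(self, dim=768, num_classes=5):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, 256),
            torch.nn.ReLU(),
            torch.nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.net(x)
```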
- NASA PITS Dataset (5 projects, 3,282 combined bug reports)
- Mozilla Bug Reports (9,998 bug reports across multiple severity levels)
Only the bug descriptions and severity levels are used for training. The models are tested in both same-project and cross-project settings.
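For illustration, the snippet below contrasts the two evaluation settings. The file name, column names, and project identifiers are hypothetical; they only show how the splits differ.

```python
# Minimal sketch of intra-project vs. cross-project splits (hypothetical columns).
import pandas as pd
from sklearn.model_selection import train_test_split

reports = pd.read_csv("bug_reports.csv")  # hypothetical file with project, description, severity

# Intra-project: train and test on reports from the same project.
project_a = reports[reports["project"] == "PITS-A"]
train_a, test_a = train_test_split(
    project_a, test_size=0.2, stratify=project_a["severity"], random_state=42)

# Cross-project: train on one project's reports, test on another's.
train_cross = reports[reports["project"] == "PITS-A"]
test_cross = reports[reports["project"] == "PITS-B"]
```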
We evaluate the following:
- LLM + Contrastive Learning (ours)
- LLM without Contrastive Learning
- Doc2Vec + MLP (baseline from prior work)
Key metrics are Accuracy and F1-score; the latter is especially relevant for imbalanced datasets.
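For reference, a minimal sketch of computing these metrics with scikit-learn. The use of macro-averaged F1 and the label values are assumptions; macro averaging is shown because it weights minority severity classes equally.

```python
# Minimal sketch of the evaluation metrics (macro-F1 averaging is an assumption).
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 2, 1, 2, 0]   # hypothetical severity labels
y_pred = [0, 2, 2, 2, 0]   # hypothetical model predictions

print("Accuracy:", accuracy_score(y_true, y_pred))
print("F1 (macro):", f1_score(y_true, y_pred, average="macro"))
```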
This work originated as a project in the graduate course Applications of LLMs to Software Engineering at Ontario Tech University. The project was developed by Mosarrat Rumman, Emon Roy, and Anushka Zaman under the supervision of Professor Jeremy Bradbury. It was later extended into a research paper submitted to the COMPSAC 2025 SETA track.
Initial implementation by Mosarrat Rumman.
Currently owned and maintained under the Software Engineering & Education Research Lab (SEER Lab).
A Contrastive Learning Approach to Bug Severity Classification with Large Language Model Embeddings
Accepted to COMPSAC 2025, SETA Track
📄 Link to the paper will be announced soon.
Citation details will be added upon publication.
For questions, feedback, or collaboration inquiries, please contact:
- Mosarrat Rumman – GitHub Profile, mosarrat.rumman@ontariotechu.net
- Software Quality Research Lab (SQRL) – www.sqrlab.ca
- Professor Jeremy Bradbury – jeremy.bradbury@ontariotechu.ca