Code smells serve as indicators of underlying quality issues that negatively impact software maintainability. Refactoring is a widely recognized technique for improving code quality by restructuring code. This study aims to shed light, to the best of our knowledge for the first time in the literature, on code smells and the types of refactorings that one can apply to address such smells. In this empirical study, we aim to investigate the evolution of code smells and the impact of applied refactoring techniques to offer developers actionable advice to make better decisions about when and how to refactor code. This study examines
- Windows/Linux
- Java runtime v17
- Executables:
- Python 3.11+
- GitPython
- pandas
- numpy
- scikit-learn
- openai
- tiktoken
- seaborn
- matplotlib
- lifelines
- chardet
Any machine with at least 4 CPU cores and 16 GB of memory should be sufficient to run the analysis.
(Note: The steps provided are for Linux-based systems.)
To use the same repositories as analyzed in the study, the repositories can be downloaded from Zenodo.
To reproduce from scratch, the following steps are required:
Steps to reproduce
1 Dependencies and Environment Set-Up
2 Repository selection
3 Data preparation
4 Data analysis
5 Post-processing
6 Manual analysis
7 Configuration
- Clone this repository to a local folder and 'cd' into the folder.
- (optional) Set up a virtual environment:
python -m venv <venv_name>
source <venv_name>/bin/activate
- Install all the dependencies:
pip install -r requirements.txt
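Before moving on, it can help to confirm that the environment matches the requirements listed earlier. The following is a minimal sketch, not part of the repository: the helper names `missing_modules` and `environment_report` are hypothetical, and the check only tests whether a `java` executable is on the PATH, not whether it is version 17.

```python
import importlib.util
import shutil
import sys

# Import names for the Python packages listed in the requirements.
# GitPython is imported as "git" and scikit-learn as "sklearn".
REQUIRED_MODULES = [
    "git", "pandas", "numpy", "sklearn", "openai", "tiktoken",
    "seaborn", "matplotlib", "lifelines", "chardet",
]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def environment_report():
    """Collect basic facts about the interpreter, Java, and installed packages."""
    return {
        # The requirements ask for Python 3.11 or newer.
        "python_ok": sys.version_info >= (3, 11),
        # Presence check only; the Java version itself is not inspected here.
        "java_on_path": shutil.which("java") is not None,
        "missing": missing_modules(REQUIRED_MODULES),
    }
```

Calling `environment_report()` inside the activated virtual environment returns a dictionary; an empty `missing` list suggests that `pip install -r requirements.txt` completed for the packages named above.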
The list of repositories shortlisted for the study can be found in the file bin/data/corpus_specs.json. The file contains the following fields:
- parameters: The parameters used to filter the repositories.
- items: The list of repositories that were selected based on the parameters.
Note: The index of a repository in the items list is used as <repo_idx> in the further analysis.
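To look up the <repo_idx> of a repository, the items list can be enumerated. This is a sketch under two assumptions that the text above does not confirm: that items is a JSON array, and that indices are zero-based. The function names are hypothetical, and the structure of each entry is left opaque because it is not described here.

```python
import json
from pathlib import Path

# Path taken from the text above, relative to the repository root.
SPEC_PATH = Path("bin/data/corpus_specs.json")


def load_corpus_specs(path=SPEC_PATH):
    """Load the corpus specification file as a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def indexed_items(specs):
    """Pair each entry of `items` with its position in the list.

    Assumption: positions start at 0, matching Python list indexing.
    """
    return list(enumerate(specs["items"]))
```

For example, `for idx, item in indexed_items(load_corpus_specs()): print(idx, item)` prints every shortlisted repository next to the index that would be passed to the scripts.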
- Code smell detection:
python3 scripts/data_generation.py designite <repo_idx>
- Refactoring identification:
python3 scripts/data_generation.py refminer <repo_idx>
After the data generation, the following steps are performed:
- Individual repository analysis:
python3 scripts/analysis.py <repo_idx>
- Aggregate analysis (Corpus level):
python3 scripts/analysis.py
This will generate the smells and refactorings collocation mapping for the entire corpus.
python3 scripts/postprocess.py
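The commands above can be chained for several repositories with a small driver. The sketch below is not part of the repository: `pipeline_commands` and `run_pipeline` are hypothetical helpers, and the ordering simply follows the order in which the steps are listed here (smell detection, refactoring identification, per-repository analysis, aggregate analysis, post-processing).

```python
import subprocess


def pipeline_commands(repo_indices):
    """Build the command lines for the given repository indices, in listed order."""
    indices = [str(i) for i in repo_indices]
    commands = []
    # Data preparation: code smell detection, then refactoring identification.
    for idx in indices:
        commands.append(["python3", "scripts/data_generation.py", "designite", idx])
        commands.append(["python3", "scripts/data_generation.py", "refminer", idx])
    # Data analysis: one run per repository, then the corpus-level run.
    for idx in indices:
        commands.append(["python3", "scripts/analysis.py", idx])
    commands.append(["python3", "scripts/analysis.py"])
    # Post-processing over the whole corpus.
    commands.append(["python3", "scripts/postprocess.py"])
    return commands


def run_pipeline(repo_indices, dry_run=True):
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    commands = pipeline_commands(repo_indices)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError if a step exits with an error.
            subprocess.run(command, check=True)
    return commands
```

Running `run_pipeline([0, 1])` from the repository root only prints the eight commands; passing `dry_run=False` executes them in sequence.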
To conduct the manual analysis, use the respective modules in scripts/manual_analysis.py and scripts/llm_analysis.py as required.
If needed, edit the scripts/config.py file to change the configuration of the analysis.