Code smell evolution and refactoring: Empirical study

Code smells are indicators of underlying quality issues that negatively impact software maintainability. Refactoring is a widely recognized technique for improving code quality by restructuring code. To the best of our knowledge, this study is the first in the literature to shed light on code smells and the types of refactorings that can be applied to address them. In this empirical study, we investigate the evolution of code smells and the impact of applied refactoring techniques, with the goal of offering developers actionable advice on when and how to refactor code. We examine 87 open-source Java repositories to investigate the lifespan of code smells, the effectiveness of refactorings in resolving them, and their broader relationship with refactoring practices. Our findings provide a detailed mapping between specific refactoring techniques and various design and implementation smell categories. We combine automated detection with manual analysis to deepen our understanding of the interactions between code smells and refactoring.

Approach Overview

Computational requirements

Software Requirements

- Windows/Linux
- Java runtime v17
- Executables:
  - DesigniteJava v2.5.9
  - RefactoringMiner v3.0.9
- Python 3.11+
  - GitPython
  - pandas
  - numpy
  - scikit-learn
  - openai
  - tiktoken
  - seaborn
  - matplotlib
  - lifelines
  - chardet
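
Before running anything, it can help to confirm that the listed requirements are actually present. The following is a minimal sketch, not part of this repository: the function names are hypothetical, and the check only looks for a `java` executable on the PATH without verifying its version or the DesigniteJava and RefactoringMiner executables.

```python
import importlib.util
import shutil
import sys

# Import names for the Python packages listed above. Some distributions are
# imported under a different name (GitPython -> git, scikit-learn -> sklearn).
REQUIRED_MODULES = [
    "git", "pandas", "numpy", "sklearn", "openai",
    "tiktoken", "seaborn", "matplotlib", "lifelines", "chardet",
]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment():
    """Collect readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < (3, 11):
        problems.append("Python 3.11+ is required")
    if shutil.which("java") is None:
        problems.append("no 'java' executable found on PATH")
    problems += [
        f"missing Python package: {name}"
        for name in missing_modules(REQUIRED_MODULES)
    ]
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Running the script prints one line per detected problem and nothing when all checks pass.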

Memory and Runtime Requirements

A machine with at least 4 CPU cores and 16 GB of memory should be sufficient to run the analysis.

Steps to reproduce

(Note: The steps below are written for Linux-based systems.)

The repositories analyzed in the study can be downloaded from Zenodo.

To reproduce from scratch, the following steps are required:

Dependencies and Environment Set-Up

Clone this repository to a local folder and `cd` into it.

(Optional) Set up a virtual environment:

python -m venv <venv_name>
source <venv_name>/bin/activate

Install all the dependencies
```
pip install -r requirements.txt
```

Repository selection

The list of repositories shortlisted for the study can be found in the file bin/data/corpus_specs.json. The file contains the following fields:

parameters: The parameters used to filter the repositories.
items: The list of repositories that were selected based on the parameters.

Note: The index of a repository in the items list is used in the following steps as <repo_idx>.
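
To look up the index of a given repository, the specification file can be read directly. This is a minimal sketch under assumptions: `load_corpus` and `list_repo_indices` are hypothetical helpers, `items` is assumed to be a JSON array, and indices are assumed to be zero-based list positions. The shape of each entry is not documented here, so entries are printed as-is.

```python
import json
from pathlib import Path

SPECS_PATH = Path("bin/data/corpus_specs.json")


def load_corpus(specs_path=SPECS_PATH):
    """Return (parameters, items) from the corpus specification file."""
    with Path(specs_path).open(encoding="utf-8") as fh:
        specs = json.load(fh)
    return specs["parameters"], specs["items"]


def list_repo_indices(items):
    """Pair each entry with its list position (assumed to be <repo_idx>)."""
    return list(enumerate(items))


if __name__ == "__main__" and SPECS_PATH.exists():
    _, items = load_corpus()
    for repo_idx, item in list_repo_indices(items):
        print(repo_idx, item)
```

Run from the repository root, this prints each index next to its entry.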

Data preparation

Code smell detection:

python3 scripts/data_generation.py designite <repo_idx>

Refactoring identification:

python3 scripts/data_generation.py refminer <repo_idx>

Data analysis

After the data generation, the following steps are performed:

Individual repository analysis:

python3 scripts/analysis.py <repo_idx>

Aggregate analysis (Corpus level):

python3 scripts/analysis.py

This will generate the smells and refactorings collocation mapping for the entire corpus.
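
How the mapping is computed is not described here. Purely as an illustration of the general idea of a collocation count, the sketch below tallies how often a smell type and a refactoring type appear in the same commit. The `(commit_id, type_name)` record shape and the function name are hypothetical and do not reflect the actual output of scripts/analysis.py.

```python
from collections import Counter


def collocation_counts(smell_events, refactoring_events):
    """Count (smell, refactoring) pairs that share a commit.

    Both arguments are iterables of (commit_id, type_name) tuples.
    This record shape is an assumption made for illustration only.
    """
    # Group refactoring types by the commit in which they were detected.
    refs_by_commit = {}
    for commit, ref_type in refactoring_events:
        refs_by_commit.setdefault(commit, []).append(ref_type)

    # Pair every smell event with each refactoring in the same commit.
    counts = Counter()
    for commit, smell in smell_events:
        for ref_type in refs_by_commit.get(commit, []):
            counts[(smell, ref_type)] += 1
    return counts


if __name__ == "__main__":
    # Toy data, not results from the study.
    smells = [("c1", "Long Method"), ("c2", "Long Method")]
    refactorings = [("c1", "Extract Method"), ("c2", "Extract Method")]
    for pair, n in collocation_counts(smells, refactorings).most_common():
        print(pair, n)
```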

Post-processing

python3 scripts/postprocess.py
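
The commands above can be chained for several repositories with a small driver. This is a sketch, not a script shipped with the repository: `pipeline_commands` and `run_pipeline` are hypothetical names, the commands are assumed to be run from the repository root, and running the per-repository analysis before the corpus-level analysis simply follows the order in which the steps are listed.

```python
import subprocess


def pipeline_commands(repo_indices, python="python3"):
    """Build the documented commands in the order they are listed above."""
    indices = list(repo_indices)
    commands = []
    for idx in indices:
        # Data preparation: smell detection, then refactoring identification.
        commands.append([python, "scripts/data_generation.py", "designite", str(idx)])
        commands.append([python, "scripts/data_generation.py", "refminer", str(idx)])
    for idx in indices:
        # Individual repository analysis.
        commands.append([python, "scripts/analysis.py", str(idx)])
    commands.append([python, "scripts/analysis.py"])  # corpus-level analysis
    commands.append([python, "scripts/postprocess.py"])
    return commands


def run_pipeline(repo_indices, dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for cmd in pipeline_commands(repo_indices):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
```

Calling `run_pipeline(range(3))` only prints the commands for the first three indices; pass `dry_run=False` to execute them.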

Manual analysis

To conduct the manual analysis, use the relevant modules in scripts/manual_analysis.py and scripts/llm_analysis.py as required.

Configuration

Edit scripts/config.py to change the configuration of the analysis if needed.

License

Apache-2.0
