
EarnGH/Idiomatic-Refactoring


Python Code Refactoring Analysis with ChatGPT

This project analyzes Python code refactoring using ChatGPT, focusing on Pythonic idioms such as list comprehensions. It addresses three research questions (RQ1–RQ3) related to refactoring consistency, feature differences, and reasoning references.


📦 Installation

  1. Clone this repository.
  2. Install dependencies:
pip install -r requirements.txt
  3. Create a .env file in the project root containing your OpenAI API key:
OPENAI_API_KEY=[your api key]
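This README does not say how the scripts read the key. The following is a minimal sketch, assuming they expect OPENAI_API_KEY in the process environment after .env has been loaded. The project may instead rely on a package such as python-dotenv, and load_env_file is a hypothetical stdlib-only helper that shows what that loading step does:

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Hypothetical helper: parse KEY=VALUE lines from a .env file into os.environ.

    Blank lines and lines starting with '#' are skipped, and surrounding quotes
    on values are stripped. Variables already set in the environment are kept.
    """
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded  # no .env file: nothing to load
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        loaded[key] = value
        os.environ.setdefault(key, value)  # do not override an existing value
    return loaded
```

After calling load_env_file(), os.environ.get("OPENAI_API_KEY") should return the key if the .env file is set up as described above.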

📂 Dataset

The dataset is located at:

csv_files/code_review_total_code_900.csv

📥 Fetching Files

Some files are already stored in:

downloaded_files/

To fetch additional files:

  • Open fetch_files.py
  • Adjust the configuration (currently set to fetch only list comprehension idiom files).

Run:

python fetch_files.py

🤖 Running Inference with ChatGPT

To generate ChatGPT refactorings:

python inference.py

In inference.py you can adjust:

  • Number of iterations
  • Selection criteria (e.g., files with < 10k characters)
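The character-length selection criterion can be illustrated with a short sketch. It assumes the fetched sources are .py files under downloaded_files/. The function name select_small_files and the MAX_CHARS constant are hypothetical and do not come from inference.py:

```python
from pathlib import Path

# Hypothetical threshold mirroring the "< 10k characters" example above.
MAX_CHARS = 10_000


def select_small_files(root: str = "downloaded_files", max_chars: int = MAX_CHARS) -> list[Path]:
    """Return .py files under `root` whose source is shorter than `max_chars` characters."""
    selected = []
    for path in sorted(Path(root).rglob("*.py")):
        source = path.read_text(encoding="utf-8", errors="replace")
        if len(source) < max_chars:
            selected.append(path)
    return selected
```

The threshold counts characters rather than lines. That is a rough proxy for how much of the model's context window a file will consume.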

After running inference.py, extract all code feature metrics by running:

python analyze_python_files_ast.py

This will create:

csv_files/base_code_summary.csv
csv_files/selected_base_code_summary.csv
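As the script name suggests, analyze_python_files_ast.py extracts metrics from the abstract syntax tree. A minimal sketch of AST-based feature counting follows. The feature set in FEATURE_NODES and the extract_features function are assumptions for illustration, and the real script may compute different columns:

```python
import ast
from collections import Counter

# Hypothetical feature set; the real script may extract different metrics.
FEATURE_NODES = {
    "list_comprehensions": ast.ListComp,
    "for_loops": ast.For,
    "function_defs": ast.FunctionDef,
    "if_statements": ast.If,
}


def extract_features(source: str) -> dict[str, int]:
    """Count selected AST node types plus non-blank lines in a Python source string."""
    tree = ast.parse(source)
    counts = Counter()
    for node in ast.walk(tree):
        for name, node_type in FEATURE_NODES.items():
            if isinstance(node, node_type):
                counts[name] += 1
    features = {name: counts[name] for name in FEATURE_NODES}
    features["non_blank_lines"] = sum(1 for line in source.splitlines() if line.strip())
    return features
```

The for and if clauses inside a comprehension are parsed as part of the comprehension node rather than as ast.For or ast.If. As a result, converting a loop into a list comprehension shows up as fewer for_loops and if_statements and one more list_comprehensions.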

📊 Research Questions

RQ1: How many refactorings involve list comprehensions?

Run:

python compare_iterations.py

Output:

csv_files/compare_iterations_list_comps.csv
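One way to decide whether a refactoring "involves" a list comprehension is to compare comprehension counts before and after. The sketch below assumes that criterion, counting a refactoring when it adds at least one list comprehension. The criterion and all function names are hypothetical, and compare_iterations.py may define it differently:

```python
import ast


def count_list_comps(source: str) -> int:
    """Number of list comprehension nodes in a Python source string."""
    return sum(isinstance(node, ast.ListComp) for node in ast.walk(ast.parse(source)))


def involves_list_comp(original: str, refactored: str) -> bool:
    """Hypothetical criterion: the refactoring adds at least one list comprehension."""
    return count_list_comps(refactored) > count_list_comps(original)


def summarize_iterations(original: str, iterations: list[str]) -> dict[str, int]:
    """Tally how many refactoring iterations of one file introduce list comprehensions."""
    hits = sum(involves_list_comp(original, refactored) for refactored in iterations)
    return {"iterations": len(iterations), "with_new_list_comp": hits}
```

Running the same file through several iterations and tallying the results gives a simple measure of how consistently the model applies the idiom.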

RQ2: What are the feature differences between original and AI-refactored code?

Run:

python compare_features.py

Outputs:

csv_files/selected_comparison.csv
csv_files/selected_comparison_p_values.csv
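This README does not name the statistical test behind selected_comparison_p_values.csv. As a stand-in, the sketch below uses an exact two-sided sign test on paired per-file metrics, which only needs the standard library. This is a swapped-in technique for illustration and not necessarily what compare_features.py uses, and the function name sign_test is hypothetical:

```python
from math import comb


def sign_test(original: list[float], refactored: list[float]) -> tuple[int, int, float]:
    """Exact two-sided sign test on paired per-file metric values.

    Pairs with no change (ties) are dropped. Returns (increases, decreases, p_value).
    """
    if len(original) != len(refactored):
        raise ValueError("original and refactored must be paired")
    diffs = [r - o for o, r in zip(original, refactored) if r != o]
    n = len(diffs)
    increases = sum(d > 0 for d in diffs)
    decreases = n - increases
    if n == 0:
        return 0, 0, 1.0
    k = min(increases, decreases)
    # P(X <= k) under Binomial(n, 0.5), doubled for a two-sided test and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return increases, decreases, min(1.0, 2 * tail)
```

The sign test uses only the direction of each change, so it makes no assumption about how the metric differences are distributed. A rank-based test such as the Wilcoxon signed-rank test uses more information and is a common alternative.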

RQ3: Does AI-generated reasoning reference specific code elements?

  • The raw reasoning file is:
result/[file_name]/reasoning.txt
  • For more readable output, run:
python convert_readme.py

This converts reasoning.txt into a Markdown-formatted file.
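One rough way to check whether reasoning "references specific code elements" is to intersect identifiers from the original code with the words in reasoning.txt. A minimal sketch of that idea follows. The approach and the function names are assumptions for illustration and do not describe the project's actual RQ3 method:

```python
import ast
import re


def code_identifiers(source: str) -> set[str]:
    """Collect function, class, argument, and variable names from Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
        elif isinstance(node, ast.Name):
            names.add(node.id)
    return names


def referenced_identifiers(source: str, reasoning: str) -> set[str]:
    """Identifiers from the code that appear as whole words in the reasoning text."""
    words = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", reasoning))
    return code_identifiers(source) & words
```

Whole-word matching avoids hits inside longer words. However, very short names such as i or a can still collide with ordinary English, so a manual check of the matches is advisable.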


📜 Additional Utilities

  • add_original.py
    Adds the original source file alongside its refactored counterpart in the result folder for side-by-side comparison.
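A minimal sketch of that copy step follows. It assumes originals live in downloaded_files/ and results in result/[file_name]/, as described above. The destination name original_<file_name> and the function signature are hypothetical, and add_original.py may name or place the file differently:

```python
import shutil
from pathlib import Path


def add_original(file_name: str, source_dir: str = "downloaded_files", result_dir: str = "result") -> Path:
    """Copy <source_dir>/<file_name> into <result_dir>/<file_name>/ as original_<file_name>.

    The destination naming is a hypothetical guess at how results are organized.
    """
    src = Path(source_dir) / file_name
    dest_dir = Path(result_dir) / file_name
    dest_dir.mkdir(parents=True, exist_ok=True)  # create the result folder if missing
    dest = dest_dir / f"original_{Path(file_name).name}"
    shutil.copy2(src, dest)  # copy2 also preserves file metadata such as timestamps
    return dest
```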

📁 Project Structure

csv_files/                 # CSV datasets and results
downloaded_files/          # Source files fetched from repositories
junk/                      # Temporary files
plots/                     # Visualizations
result/                    # AI-generated outputs and reasoning
requirements.txt           # Dependencies
fetch_files.py             # Fetches source files
inference.py               # Runs refactoring inference
analyze_python_files_ast.py # Extracts code metrics
compare_iterations.py      # RQ1 analysis
compare_features.py        # RQ2 analysis
convert_readme.py          # Formats reasoning output
add_original.py            # Adds original code to results


About

Idiomatic Refactoring: List Comprehension
