Source Code Automated Refactoring Toolkit (CodART) is a refactoring engine with the ability to perform many-objective program transformation and optimization. We have currently focused on automating the various refactoring operations for Java source codes. A complete list of refactoring supported by CodART can be found at CodART refactorings list.
The CodART project is under active development. The current version of CodART works fine on our benchmark projects. To understand how CodART works, read the CodART white-paper.
Your contributions to the project and your comments in the discussion section would be welcomed.
Also, feel free to email and ask any question:
m-zakeri[at]live[dot]com
.
Overall system architecture showing containerized services
Reinforcement learning components and data flow
Machine learning training pipeline workflow
CodART (Source Code Automated Refactoring Toolkit) is a multi-objective program transformation and optimization engine that combines search-based software engineering (SBSE) with automated refactoring operations to improve Java source code quality. The system includes a modern web-based interface, containerized deployment with Docker, reinforcement learning capabilities using PPO algorithms, and advanced machine learning models for testability prediction and intelligent code refactoring.
Key Innovation: CodART integrates traditional search-based refactoring with modern reinforcement learning to create an intelligent system that learns optimal refactoring sequences, making it unique in the automated software refactoring domain.
- Automated Java Refactoring: Supports 40+ refactoring operations including Extract Class, Move Method, Extract Interface, and more
- Multi-Objective Optimization: Uses NSGA-II and NSGA-III algorithms to optimize 8+ QMOOD quality metrics simultaneously
- Reinforcement Learning: PPO (Proximal Policy Optimization) algorithm for intelligent refactoring sequence generation
- Testability Prediction: Advanced ML models (RandomForest, GradientBoosting, MLP, VotingRegressor) predict code testability using 262+ source code metrics
- Code Smell Detection: PMD 7.11.0 integration with custom rulesets for automated quality analysis
- SciTools Understand Integration: Professional code analysis engine for parsing and metrics computation
- Web-based UI: Modern React interface with real-time progress tracking and project management
- Containerized Architecture: Docker-based deployment with microservices including API, UI, MinIO, Redis, RabbitMQ
- ANTLR4-based Parsing: Three grammar variants optimized for different parsing performance needs
- Benchmark Integration: 14 benchmark projects for testing and validation including JSON, JFreeChart, Weka
- Docker and Docker Compose (Latest versions)
- System Requirements: At least 8GB RAM and 4 CPU cores (12GB+ recommended for large projects)
- SciTools Understand License: Professional license required for code analysis
- Academic licenses available for research purposes
- License activation requires internet connectivity
- Storage: Minimum 20GB free disk space for containers and project data
git clone https://github.com/m-zakeri/CodART.git
cd CodART
Create a .env
file in the project root:
# Project Configuration
PROJECT_ROOT_DIR="/opt/projects"
UDB_ROOT_DIR="/opt/understand_dbs"
BENCHMARK_INDEX=2 # Index from codart/config.py (0-13)
# Search Algorithm Settings
POPULATION_SIZE=15
MAX_ITERATIONS=15
PROBLEM=2 # 0: Simple Genetic, 1: NSGA-II, 2: NSGA-III
NUMBER_OBJECTIVES=8 # QMOOD metrics count
MUTATION_PROBABILITY=0.2
CROSSOVER_PROBABILITY=0.8
# Warm Start Options (Optional)
WARM_START=1 # Enable warm start from previous results
INIT_POP_FILE="/path/to/initial_population.csv" # Optional
CSV_ROOT_DIR="/path/to/jdeodorant_csv" # Optional
# MinIO Credentials (Change for production)
MINIO_ACCESS_KEY=00jFBl7n9Jn0ex0XL7m1
MINIO_SECRET_KEY=kYfujzkdSGjXKLN9oQhPDIVgRUaZRijvj1yaXmIZ
# Experimental Settings (Optional)
USE_CPP_BACKEND=0 # Enable C++ parser backend for performance
EXPERIMENTER="Your Name" # For research tracking
DESCRIPTION="Experiment description" # For result documentation
# Build and start all services
docker-compose up --build
# Or run in background
docker-compose up -d --build
- Web Interface: http://localhost:3000 (React UI)
- API Documentation: http://localhost:8000/docs (FastAPI Swagger)
- MinIO Console: http://localhost:9001 (minioadmin/minioadmin)
- RabbitMQ Management: http://localhost:15672 (guest/guest)
- Redis CLI:
docker exec -it codart_redis_1 redis-cli
(Direct access)
# Verify all services are running
docker-compose ps
# Check SciTools Understand license
docker exec -it codart_api_1 und license
# Upload a test project via web interface or API
curl -X POST "http://localhost:8000/projects/upload" \
-F "file=@your_project.zip" \
-F "project_name=TestProject"
- FastAPI Backend: RESTful API for all operations
- Celery Worker: Handles ML training and analysis tasks
- SciTools Understand: Code parsing and analysis engine
- PMD Integration: Code smell detection with custom rulesets
- Combined Architecture: API and worker run in same container for license sharing
- React Frontend: Modern web interface with real-time updates
- Project Management: Upload and manage Java projects
- ML Training Interface: Configure and monitor training sessions
- Task Monitoring: Real-time progress tracking with localStorage persistence
- MinIO: Object storage for models, reports, and temporary files
- Redis: Task result backend and caching
- Docker Volumes: Persistent data storage
- RabbitMQ: Async task processing with queues:
ml_training
: Machine learning training tasksml_evaluation
: Model evaluation taskscelery
: General background tasks
- PMD 7.11.0: Integrated static analysis tool
- Custom Rulesets: Tailored rules for design patterns, complexity, and best practices
- Automated Detection: GodClass, LawOfDemeter, CyclomaticComplexity, etc.
- CSV Reporting: Structured output for refactoring candidate selection
- MinIO Storage: Cloud-based report archival and retrieval
- ML Models: 7 different model types (RandomForest, GradientBoosting, MLP, etc.)
- Metric Analysis: 262 comprehensive source code metrics
- Real-time Prediction: Integration with refactoring operations
- Model Variants: Lightweight (68 metrics), Ultra-light (10 metrics), Design-based
- Distributed Training: Celery-based ML pipeline with model versioning
CodART/
├── application/ # FastAPI web service and APIs
│ ├── controllers/ # REST API endpoints
│ │ ├── learning_controller_testability.py
│ │ ├── project_management_controller.py
│ │ ├── rl/ # Reinforcement learning endpoints
│ │ └── reporter/ # Export and download controllers
│ ├── services/ # Business logic services
│ │ ├── minio_training_controller.py
│ │ └── config_integration.py
│ ├── celery_workers/ # Background task processors
│ │ ├── ml_training_task.py
│ │ └── model_prediction_task.py
│ └── main.py # FastAPI application entry point
├── codart/ # Core refactoring engine
│ ├── gen/ # ANTLR4-generated parsers
│ │ ├── JavaParserLabeled.py # Labeled grammar (preferred)
│ │ ├── JavaParserLabeledVisitor.py
│ │ └── JavaParserLabeledListener.py
│ ├── refactorings/ # 40+ refactoring implementations
│ │ ├── extract_class.py, extract_method.py
│ │ ├── move_method.py, move_field.py
│ │ ├── pullup_method.py, pushdown_method.py
│ │ └── handler.py # Refactoring registry
│ ├── metrics/ # Quality metrics computation
│ │ ├── qmood.py # QMOOD metrics (8 objectives)
│ │ ├── testability_prediction.py
│ │ └── learner_testability/ # ML models for testability
│ ├── smells/ # Code smell detection
│ │ ├── long_method.py
│ │ └── map_smell_refactoring.py
│ ├── sbse/ # Search-based software engineering
│ │ ├── search_based_refactoring2.py # Main SBSE engine
│ │ └── simple_genetics.py
│ ├── learner/ # Machine learning components
│ │ ├── genetic.py # Genetic algorithms
│ │ ├── alpha_zero_MCTS.py # Monte Carlo Tree Search
│ │ └── sbr_initializer/ # RL environment setup
│ └── utility/ # Common utilities
│ └── setup_understand.py
├── ui/ # React frontend application
│ ├── src/ # React source code
│ ├── public/ # Static assets
│ ├── Dockerfile # UI container build
│ └── nginx.conf # Production web server config
├── benchmark_projects/ # Test projects (14 Java projects)
├── tests/ # Individual refactoring test cases
├── pmd/ # PMD 7.11.0 code analysis tool
│ ├── bin/pmd # PMD executable
│ ├── rules/custom.xml # Custom rulesets
│ └── lib/ # PMD dependencies
├── scitools/ # SciTools Understand installation
│ ├── bin/ # Understand binaries
│ └── plugins/ # Analysis plugins
├── grammars/ # ANTLR4 grammar files
│ ├── JavaParserLabeled.g4 # Preferred fast grammar
│ ├── JavaParser.g4 # Original fast grammar
│ └── Java9_v2.g4 # Legacy slow grammar
├── docker-compose.yml # Multi-service orchestration
├── Dockerfile.api # API container build
├── Dockerfile.base # Base image with dependencies
└── requirements.txt # Python dependencies
- Project Upload: Upload Java projects (ZIP format) via web interface
- Understand Database Creation: Automatic
.und
database generation for code analysis - PMD Code Smell Detection: Automated smell detection using custom rulesets
- ML Training Configuration: Configure RL training parameters and objectives
- Training Execution: Monitor real-time progress with Celery task tracking
- Model Evaluation: View training metrics and model performance
- Results Download: Export trained models, reports, and refactored code
# Direct refactoring execution with SciTools Understand
python codart/refactoring_cli.py \
--udb_path "/path/to/project.und" \
--file_path "/path/to/SourceClass.java" \
--source_class "ClassName" \
--moved_methods "method1,method2" \
--core 0 # 0=Understand, 1=OpenUnderstand
# Search-based multi-objective optimization
python codart/sbse/search_based_refactoring2.py
# Individual refactoring testing
python tests/extract_method/test_1.py
# Testability prediction
python codart/metrics/testability_prediction.py --project-path /path/to/project
# Code smell detection with PMD
./pmd/bin/pmd check -d /path/to/source -R pmd/rules/custom.xml -f csv
# Upload and analyze project
curl -X POST "http://localhost:8000/projects/upload" \
-F "file=@project.zip" \
-F "project_name=MyProject"
# Start ML training with full configuration
curl -X POST "http://localhost:8000/ml-training/train" \
-H "Content-Type: application/json" \
-d '{
"project_id": "123",
"config": {
"population_size": 15,
"max_iterations": 20,
"problem_type": 2,
"objectives": ["ANA", "CAMC", "CIS", "DAM", "DCC", "DSC", "MFA", "MOA"]
}
}'
# Monitor training progress
curl "http://localhost:8000/tasks/{task_id}/status"
# Download results
curl "http://localhost:8000/projects/{project_id}/download/models" -o models.zip
# Get testability prediction
curl -X POST "http://localhost:8000/testability/predict" \
-H "Content-Type: application/json" \
-d '{"project_path": "/opt/projects/MyProject", "model_type": "voting_regressor"}'
CodART implements comprehensive testability prediction using multiple ML approaches:
- RandomForestRegressor: Primary ensemble model for robust predictions
- GradientBoostingRegressor: High-accuracy gradient-based learning
- MLPRegressor: Neural network for complex pattern recognition
- VotingRegressor: Ensemble combining top 3 models for optimal accuracy
- Package Metrics (59): Module-level design quality indicators
- Class Lexical Metrics (17): Code complexity and readability measures
- Class Ordinary Metrics (186): Comprehensive structural analysis
- Total: 262 source code metrics for comprehensive analysis
- Full Model: 262 metrics for maximum accuracy
- Lightweight: 68 metrics for fast real-time prediction
- Ultra-light: 10 most important metrics for instant feedback
- Design-based: Graph network analysis using NetworkX
Integrated PMD 7.11.0 provides automated code quality analysis:
- Design Issues: GodClass, LawOfDemeter, CyclomaticComplexity
- Best Practices: LooseCoupling, UnusedPrivateMethod
- Code Style: UnnecessaryModifier, ProperLogger
- Complexity: NPathComplexity, CognitiveComplexity
- Refactoring Guidance: PMD results guide candidate selection
- Real-time Analysis: Automated execution on project upload
- Cloud Storage: Results archived in MinIO for persistent access
- Custom Rules: Tailored ruleset for refactoring-specific analysis
The system uses Proximal Policy Optimization (PPO) to learn optimal refactoring sequences:
- Environment:
RefactoringSequenceEnvironment
simulates code transformation - State: Current code metrics and smell indicators
- Actions: Available refactoring operations
- Rewards: Multi-objective improvement in quality metrics
- Training: Experience replay with policy and value networks
The system optimizes for 8 design quality objectives:
- ANA (Average Number of Ancestors)
- CAMC (Cohesion Among Methods in Class)
- CIS (Class Interface Size)
- DAM (Data Access Metric)
- DCC (Direct Class Coupling)
- DSC (Design Size in Classes)
- MFA (Measure of Functional Abstraction)
- MOA (Measure of Aggregation)
Structural Refactorings:
- Extract Class, Extract Method, Extract Interface
- Move Method, Move Field, Move Class
- Inline Class, Collapse Hierarchy
Access Control:
- Increase/Decrease Field/Method Visibility
- Encapsulate Field
Inheritance Operations:
- Pull Up Method/Field/Constructor
- Push Down Method/Field
- Make Class Abstract/Concrete/Final
Code Quality:
- Rename Class/Method/Field/Package
- Remove Dead Code
- Replace Conditional with Polymorphism
# Core Paths (Container)
PROJECT_ROOT_DIR="/opt/projects" # Java projects storage
UDB_ROOT_DIR="/opt/understand_dbs" # Understand database files
CSV_ROOT_DIR="/opt/csv_reports" # PMD analysis reports
# SciTools Understand Configuration
STILICENSE="/root/.config/SciTools/License.conf"
STIHOME="/app/scitools" # Understand installation
STIDOSUTILDIR="/root/.config/SciTools" # License directory
UNDERSTAND_API_LICENSE="/root/.local/share/SciTools/Understand/python_api.cfg"
# PMD Configuration
PMD_PATH="/app/pmd/bin/pmd" # PMD executable
PMD_RULESET="/app/pmd/rules/custom.xml" # Custom analysis rules
PMD_CACHE_DIR="/app/pmd/cache" # PMD cache directory
# SBSE Algorithm Configuration
POPULATION_SIZE=15 # GA population size
MAX_ITERATIONS=15 # Maximum generations
NGEN=10 # Alternative iteration setting
PROBLEM=2 # 0=GA, 1=NSGA-II, 2=NSGA-III
NUMBER_OBJECTIVES=8 # QMOOD metrics count
MUTATION_PROBABILITY=0.2 # Mutation rate
CROSSOVER_PROBABILITY=0.8 # Crossover rate
LOWER_BAND=15 # Lower bound for metrics
UPPER_BAND=50 # Upper bound for metrics
# Warm Start Configuration
WARM_START=1 # Enable warm start
INIT_POP_FILE="" # Initial population file
RESUME_EXECUTION="" # Resume from checkpoint
# Service URLs (Container Network)
CELERY_BROKER_URL="amqp://guest:guest@rabbitmq:5672//"
CELERY_RESULT_BACKEND="redis://redis:6379/0"
MINIO_ENDPOINT="minio:9000"
MINIO_ACCESS_KEY="00jFBl7n9Jn0ex0XL7m1"
MINIO_SECRET_KEY="kYfujzkdSGjXKLN9oQhPDIVgRUaZRijvj1yaXmIZ"
# Performance Options
USE_CPP_BACKEND=0 # Enable C++ parser (faster)
QT_QPA_PLATFORM="offscreen" # Headless Qt for Understand
# Research Tracking (Optional)
EXPERIMENTER="Researcher Name"
SCRIPT="search_based_refactoring2.py"
DESCRIPTION="Experiment description"
The system includes 14 benchmark projects (defined in codart/config.py
):
Index | Project | Description | Size |
---|---|---|---|
0 | JSON20201115 | JSON parsing library | Small |
1 | JFreeChart | Chart generation library | Large |
2 | Weka | Machine learning toolkit | Large |
3 | FreeMind | Mind mapping software | Medium |
4 | Commons-codec | Apache commons codec | Small |
5 | JRDF | RDF framework | Medium |
6 | JMetal | Multi-objective optimization | Medium |
7 | AntApache | Build automation tool | Large |
8-13 | Additional projects | Various Java applications | Varies |
Configuration: Set BENCHMARK_INDEX
(0-13) in .env
file or codart/config.py
.
Project Structure: Each benchmark includes:
- Source code in standard Maven/Gradle structure
- Pre-generated
.und
database file - PMD analysis reports (CSV format)
- Initial metrics baseline
- Code smell detection results
# Install Python dependencies
pip install -r requirements.txt
# Setup SciTools Understand (Local Installation)
export PYTHONPATH="/opt/scitools/bin/linux64/Python:$PYTHONPATH"
export PATH="/opt/scitools/bin/linux64:$PATH"
export LD_LIBRARY_PATH="/opt/scitools/bin/linux64:$LD_LIBRARY_PATH"
# Activate Understand license
und -setofflinereplycode YOUR_LICENSE_CODE
# Install PMD (if not using Docker)
wget https://github.com/pmd/pmd/releases/download/pmd_releases%2F7.11.0/pmd-bin-7.11.0.zip
unzip pmd-bin-7.11.0.zip -d /opt/pmd
# Start development services
# Terminal 1: API server
uvicorn application.main:app --reload --host 0.0.0.0 --port 8000
# Terminal 2: Celery worker
celery -A application.celery_workers.ml_training_task worker --loglevel=info
# Terminal 3: UI development server
cd ui && npm install && npm start
# Terminal 4: Redis (if not using Docker)
redis-server
# Terminal 5: RabbitMQ (if not using Docker)
rabbitmq-server
# Generate parser from grammar (requires ANTLR4)
cd grammars
antlr4 -Dlanguage=Python3 JavaParserLabeled.g4 -visitor -listener
mv *.py ../codart/gen/
# Test grammar parsing speed
python tests/grammar_speed_tests/test_performance.py
-
Create refactoring module in
codart/refactorings/
:from codart.gen.JavaParserLabeledListener import JavaParserLabeledListener class MyRefactoring(JavaParserLabeledListener): def __init__(self, source_class, target_info): self.source_class = source_class # Implementation details
-
Inherit from appropriate base class:
JavaParserLabeledListener
(recommended)JavaParserLabeledVisitor
(for complex traversals)- Import from
codart.gen.JavaLabled
package
-
Add comprehensive tests in
tests/
directory:tests/my_refactoring/ ├── test_1.py # Main test script ├── input.java # Test input code ├── expected.java # Expected output └── README.md # Test documentation
-
Register refactoring in
codart/refactorings/handler.py
:from .my_refactoring import MyRefactoring REFACTORING_REGISTRY = { 'my_refactoring': MyRefactoring, # ... other refactorings }
-
Test on benchmark projects:
python codart/sbse/search_based_refactoring2.py
-
Update documentation with refactoring details and examples
# Run individual refactoring tests
python tests/extract_method/test_1.py
python tests/move_method/test_move_method.py
python tests/pullup_method/test_pullup.py
# Test specific refactoring with custom input
python -c "from codart.refactorings.extract_class import ExtractClass; \
ec = ExtractClass('input.java', 'SourceClass', ['field1', 'method1']); \
ec.do_refactor()"
# Run all tests in a category
find tests/ -name "test_*.py" -exec python {} \;
# Test on benchmark projects (full SBSE)
python codart/sbse/search_based_refactoring2.py
# Test PMD integration
./pmd/bin/pmd check -d benchmark_projects/JSON20201115/src \
-R pmd/rules/custom.xml -f csv -r results.csv
# Test Understand integration
python codart/utility/understand_install_test.py
# Test ML components
python codart/metrics/testability_prediction.py --test
# Performance testing
python tests/grammar_speed_tests/benchmark_parsing.py
SciTools License Error:
# Check license status
docker exec -it codart_api_1 und license
# Reactivate license
docker exec -it codart_api_1 /app/activate_license.sh
Memory Issues:
- Increase container memory limit in docker-compose.yml
- Reduce population size in configuration
- Use smaller benchmark projects for testing
Build Failures:
# Clean rebuild
docker-compose down -v
docker-compose build --no-cache
docker-compose up
- Use fast grammar
JavaParserLabeled.g4
for new development - Enable C++ backend for faster parsing (optional)
- Configure appropriate population size based on available resources
- Use SSD storage for Docker volumes
We welcome contributions! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
- Follow existing code patterns and naming conventions
- Use
JavaParserLabeled.g4
for new refactoring implementations - Test on individual files before benchmark projects
- Document new refactoring operations
- Follow security best practices
If you use CodART in your research, please cite:
@misc{codart2024,
title={CodART: Source Code Automated Refactoring Toolkit},
author={Zakeri, Morteza and contributors},
year={2024},
url={https://github.com/m-zakeri/CodART}
}
This project is licensed under the MIT License - see the LICENSE file for details.
- Official Documentation
- Refactoring Catalog
- Code Smells Reference
- Benchmark Projects
- API Documentation (when running locally)
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: m-zakeri[at]live[dot]com
CodART is actively developed at IUST Reverse Engineering Laboratory