Classical-AutoML

An AutoML framework for classical machine learning algorithms, automating model selection and hyperparameter tuning through Bayesian optimization, portfolio-based meta-learning, and multi-fidelity evaluation using Successive Halving.

🚀 Highlights

🔍 Model and Hyperparameter Selection: Automatically selects the best model and configuration for a given dataset using SMBO (Sequential Model-Based Optimization).
🧠 Meta-Learning with Portfolio Construction: Warm-starts the optimization using a greedy portfolio of configurations collected across datasets.
⚖️ Multi-Fidelity Evaluation: Uses Successive Halving to allocate more resources to promising candidates and reduce evaluation cost.
⚙️ Focus on Classical ML Models: Supports a curated set of efficient, non-deep-learning models — including Random Forests, Extra Trees, MLP, Stochastic Gradient Descent, Passive Aggressive, and Histogram-based Gradient Boosting.
🖧 Parallelized Meta-Learning and Benchmarking with Ray: Ray parallelizes the large-scale evaluation of configurations across datasets, both during portfolio construction (meta-learning) and during benchmarking runs, which significantly speeds up experimentation.
🤝 Competitive Benchmarking: Evaluated against Auto-sklearn 2.0 and achieves competitive accuracy across test datasets.
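
As a rough illustration of the Successive Halving idea listed above, the sketch below scores every candidate at a small budget, keeps the best fraction, and re-evaluates the survivors at a larger budget. It is not the framework's implementation: the reduction factor eta=3, the toy_evaluate function, and the "quality" field are assumptions made for this example. Only the names min_budget and max_budget mirror the AutoClassifier arguments.

```python
import math


def successive_halving(configs, evaluate, min_budget, max_budget, eta=3):
    """Return the best surviving config, keeping the top 1/eta at each rung."""
    survivors = list(configs)
    budget = min_budget
    while True:
        # Score every surviving config at the current budget (higher is better).
        scored = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        if len(scored) == 1 or budget >= max_budget:
            return scored[0]
        # Keep the best 1/eta of the candidates, but never fewer than one.
        survivors = scored[: max(1, len(scored) // eta)]
        # Give the survivors eta times more budget, capped at max_budget.
        budget = min(budget * eta, max_budget)


def toy_evaluate(config, budget):
    # Hypothetical learning curve: the score approaches config["quality"]
    # as the budget grows. A real evaluator would train and validate a model.
    return config["quality"] * (1.0 - math.exp(-budget / 50.0))


# Nine made-up candidates; "quality" stands in for their attainable accuracy.
qualities = [0.62, 0.71, 0.55, 0.90, 0.68, 0.80, 0.59, 0.75, 0.66]
configs = [{"id": i, "quality": q} for i, q in enumerate(qualities)]

best = successive_halving(configs, toy_evaluate, min_budget=10, max_budget=200)
print(best)
```

With these toy inputs, nine candidates are evaluated at budget 10, three at budget 30, and a single survivor remains after that, so most of the cheap evaluations are spent on filtering rather than on full training.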

⚙️ Ray Cluster Setup & Dashboard Access

This project uses Ray to parallelize configuration evaluations during meta-learning and benchmarking.

To simplify setup, a scripts/start_cluster.sh script is provided to automatically:

Start a Ray head node
Connect and start Ray worker nodes via SSH
Activate virtual environments and configure PYTHONPATH on all nodes

📌 Before running the script, ensure passwordless SSH access from the head node to all worker nodes.

📌 You must customize the VENV_PATH, PROJECT_PATH, HEAD_IP, and WORKER_NODES variables in the script based on your system and cluster configuration.
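
To make the role of these four variables more concrete, here is a dry-run sketch in Python that only builds and prints the kind of commands such a script might issue. It is an assumption about the general shape, not a copy of scripts/start_cluster.sh: the placeholder paths and IP addresses, the "user@host" format of WORKER_NODES, and the use of Ray's default port 6379 are all hypothetical. The ray start flags shown (--head, --port, --dashboard-host, --address) are standard Ray CLI options.

```python
import shlex

# Placeholder values; the names mirror the variables the script asks you to set.
VENV_PATH = "/path/to/venv"
PROJECT_PATH = "/path/to/Classical-AutoML"
HEAD_IP = "192.0.2.10"
WORKER_NODES = ["user@192.0.2.11", "user@192.0.2.12"]  # assumed format
RAY_PORT = 6379  # assumption: Ray's default port


def node_prefix():
    # Activate the virtual environment and expose the project on PYTHONPATH.
    return f"source {VENV_PATH}/bin/activate && export PYTHONPATH={PROJECT_PATH}"


def head_command():
    # Start the head node and make the dashboard reachable from other machines.
    return (
        f"{node_prefix()} && "
        f"ray start --head --port={RAY_PORT} --dashboard-host=0.0.0.0"
    )


def worker_command(worker):
    # Run the same environment setup remotely, then join the head node.
    remote = f"{node_prefix()} && ray start --address={HEAD_IP}:{RAY_PORT}"
    return f"ssh {worker} {shlex.quote(remote)}"


# Dry run: print the commands instead of executing them.
print(head_command())
for worker in WORKER_NODES:
    print(worker_command(worker))
```

Printing the commands first is a cheap way to check paths and addresses before anything is started on the cluster.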

🔐 SSH Setup (One-Time)

On the head node:

ssh-keygen                         # Press Enter to accept default
ssh-copy-id username@<worker_ip>   # Repeat for each worker IP

Once the cluster has started, the Ray dashboard is available at:

http://<HEAD_IP>:8265

▶️ Running the Project

Requirements:

The requirements.txt file lists all Python dependencies for the project. Add any further dependencies you need to this file. Install them with:

    pip install -r requirements.txt

To use the AutoML framework on your own dataset:

1. Import and Initialize

from backend.autoclassifier import AutoClassifier

clf = AutoClassifier(
    seed=42,
    walltime_limit=300,   # seconds
    min_budget=10,
    max_budget=200
)

2. Fit the model on training data

clf.fit(X_train, y_train)

Make sure your dataset is a pandas.DataFrame. The framework handles categorical encoding and scaling internally.

3. Predict on test data

y_pred = clf.predict(X_test)

4. Best configuration

best_config = clf.best_config
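
The four steps above assume that X_train, y_train, and X_test already exist as pandas objects. A minimal sketch of preparing them from a single table follows. The toy columns, the "label" column name, and the 75/25 split are assumptions for illustration, and passing the target as a pandas Series is likewise assumed rather than documented here.

```python
import pandas as pd

# Hypothetical toy table with one categorical and two numeric feature columns.
df = pd.DataFrame({
    "color": ["red", "blue", "green", "blue", "red", "green", "blue", "red"],
    "width": [1.2, 3.4, 2.2, 3.1, 1.0, 2.5, 3.3, 1.1],
    "height": [0.5, 1.5, 1.0, 1.4, 0.4, 1.1, 1.6, 0.6],
    "label": [0, 1, 1, 1, 0, 1, 1, 0],
})

# Hold out 25% of the rows for testing; the fixed seed keeps the split reproducible.
train = df.sample(frac=0.75, random_state=42)
test = df.drop(train.index)

# Separate the features (kept as DataFrames) from the target column.
X_train, y_train = train.drop(columns="label"), train["label"]
X_test, y_test = test.drop(columns="label"), test["label"]

print(X_train.shape, X_test.shape)
```

The categorical column is left as raw strings on purpose, since encoding is described as being handled inside the framework.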

🗂️ Project Structure

    backend
    │   autoclassifier.py
    │   autosklearn_benchmark.py
    │   benchmark.py
    │   config.json
    │   config.py
    │   Optimizer.py
    │   plots.py
    │   test.py
    │   
    ├───meta_learning
    │       best_candidate_run.py
    │       candidates.py
    │       candidates._openml.py
    │       performance_matrix.py
    │       portfolio.py
    │       preprocess.py
    │
    └───pipelines
            BasePipeline.py
            BuildConfigurations.py
            ExtraTreesPipeline.py
            HistGradientBoostingPipeline.py
            MLPPipeline.py
            PassiveAggressivePipeline.py
            PipelineRegistry.py
            RandomForestPipeline.py
            SgdPipeline.py
            util.py

📊 Results

The framework was evaluated on benchmark datasets from OpenML under a strict 10-minute runtime constraint per task. The experiments focused on two aspects: the impact of meta-learning and the comparison against Auto-sklearn 2.0.

🔍 Meta-Learning Impact

On ~80% of datasets, meta-learning either improved test accuracy or matched baseline performance (within 1% difference).
This demonstrates that portfolio-based initialization is especially effective under tight budget constraints.
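
For readers unfamiliar with portfolio-based initialization, the sketch below shows one common way to build a portfolio greedily from a performance matrix: repeatedly add the configuration that most improves the mean best-in-portfolio accuracy across the meta-datasets. This formulation is an assumption, since this README only states that the portfolio is greedy. The configuration names and accuracy values are made up and are not results from this project.

```python
def greedy_portfolio(perf, size):
    """Pick up to `size` configs; perf maps config name -> per-dataset accuracies."""
    n_datasets = len(next(iter(perf.values())))
    portfolio = []
    # Best accuracy reached on each dataset by any config already in the portfolio.
    best_so_far = [0.0] * n_datasets

    while len(portfolio) < min(size, len(perf)):
        def score_if_added(name):
            # Mean per-dataset accuracy of the portfolio if `name` joined it.
            merged = [max(b, s) for b, s in zip(best_so_far, perf[name])]
            return sum(merged) / n_datasets

        candidates = [name for name in perf if name not in portfolio]
        chosen = max(candidates, key=score_if_added)
        portfolio.append(chosen)
        best_so_far = [max(b, s) for b, s in zip(best_so_far, perf[chosen])]

    return portfolio


# Hypothetical performance matrix: 4 configurations evaluated on 3 datasets.
perf = {
    "rf_a": [0.90, 0.60, 0.70],
    "sgd_b": [0.65, 0.88, 0.72],
    "mlp_c": [0.85, 0.62, 0.74],
    "hgb_d": [0.70, 0.70, 0.91],
}

print(greedy_portfolio(perf, size=3))
```

Note that the second pick is the configuration that best complements the first one, not the one with the second-highest average, which is what makes a small portfolio cover diverse datasets.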

🆚 Comparison with Auto-sklearn 2.0

The framework outperformed Auto-sklearn 2.0 on ~50% of the datasets.
On another ~30% of datasets, performance differed by less than 1%.
In the rare cases where accuracy was lower, the cause was training timeouts on large datasets.

⚙️ Experimental Setup

Max runtime: 10 minutes per task
Budgets: min_budget = 10, max_budget = 500
Experiments parallelized using Ray across 7 machines (164 CPUs)

📄 Read the full report for methodology, configuration details, and accuracy plots.

🎥 Demo

Watch the project in action! Check out the demo video:

Click the image or this link to watch the demo.

tejakiransirivella/Classical-AutoML
