This project implements the Hill Climbing algorithm for optimization and feature selection tasks. It includes two main applications:
- Hill Climbing for Optimization: A basic hill climbing algorithm that finds the minimum of an objective function (e.g., a quadratic function).
- Feature Selection via Hill Climbing: The hill climbing algorithm is used to select the best subset of features for training a Decision Tree Classifier using a synthetic classification dataset.
- Hill Climbing Algorithm: The main optimization algorithm used in the project is hill climbing, which iteratively improves solutions by evaluating and selecting neighboring candidates based on their cost or score.
- Objective Function: The objective functions evaluate the cost or error of candidate solutions. For the optimization task, the goal is to minimize the result (e.g., `x^2`); for feature selection, the function evaluates model performance on the selected features.
- Feature Selection: The project also includes a feature selection component, which evaluates subsets of features and uses hill climbing to find the subset that maximizes classification performance.
- Stochastic Search: The hill climbing algorithm incorporates a stochastic element through random mutations, which introduces variability in the search process.
- Objective Function: Functions that compute the cost or error of a given solution.
- Hill Climbing: The search algorithm that iteratively explores solutions to optimize the objective.
- Feature Selection: A process that uses hill climbing to select an optimal subset of features for classification tasks.
- Data Generation: The project uses synthetic datasets, generated with `sklearn.datasets.make_classification`, to simulate classification tasks with various feature subsets.
- Model Evaluation: A Decision Tree Classifier from scikit-learn is used to evaluate the performance of each selected feature subset.
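As a minimal sketch of these two pieces (the dataset parameters here are illustrative, not taken from the project's scripts), a synthetic dataset can be generated and a baseline Decision Tree evaluated on all features like so:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification dataset (parameters are illustrative)
X, y = make_classification(n_samples=1000, n_features=100,
                           n_informative=10, random_state=7)

# Baseline: a Decision Tree evaluated on ALL features via cross-validation.
# Feature selection then tries to beat (or match with fewer features) this score.
scores = cross_val_score(DecisionTreeClassifier(random_state=7), X, y, cv=3)
print(f"Mean accuracy with all {X.shape[1]} features: {scores.mean():.4f}")
```

This baseline score gives a reference point against which feature subsets found by hill climbing can be compared.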
The following Python libraries are required for this project:
- `numpy`
- `matplotlib`
- `scikit-learn`
To install the dependencies, you can use pip:
pip install numpy matplotlib scikit-learn
The algorithm starts by generating a random initial solution and evaluating its fitness using the objective function. It then explores neighboring solutions (mutated candidates) and accepts those that improve the current solution. This process continues for a specified number of iterations or until no improvement is found for a given number of iterations (patience).
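The loop described above can be sketched as follows (function names, bounds, and parameter values here are illustrative, not the project's actual code):

```python
import numpy as np

def objective(x):
    """Quadratic objective to minimize: f(x) = x^2."""
    return x[0] ** 2.0

def hill_climbing(objective, bounds, n_iterations, step_size, patience, seed=0):
    rng = np.random.default_rng(seed)
    # Random initial solution within the bounds
    solution = bounds[:, 0] + rng.random(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    solution_eval = objective(solution)
    stall = 0
    for i in range(n_iterations):
        # Mutate the current solution with Gaussian noise
        candidate = solution + rng.normal(0.0, step_size, size=len(bounds))
        candidate_eval = objective(candidate)
        if candidate_eval < solution_eval:  # accept only improvements
            solution, solution_eval = candidate, candidate_eval
            stall = 0
            print(f">{i} f({solution}) = {solution_eval:.5f}")
        else:
            stall += 1
            if stall >= patience:  # no improvement for `patience` steps: stop
                break
    return solution, solution_eval

best, score = hill_climbing(objective, np.asarray([[-5.0, 5.0]]), 1000, 0.1, 100)
print(f"Best: f({best}) = {score:.5f}")
```

The `patience` counter implements the early-stopping condition: the search ends once a run of consecutive candidates fails to improve on the current solution.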
In this task, the objective function evaluates the performance of a classifier (Decision Tree) trained on a selected subset of features. The hill climbing algorithm repeatedly mutates the current subset, evaluates each candidate, and keeps those that improve performance, converging on the best feature subset found.
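A compact sketch of this search, assuming a boolean mask representation for the feature subset and a per-bit flip probability for mutation (both are modeling choices for illustration, not necessarily the project's exact implementation):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

def evaluate_subset(X_train, X_test, y_train, y_test, mask):
    """Objective: accuracy of a Decision Tree trained on the masked features."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:          # an empty subset cannot be trained on
        return 0.0
    model = DecisionTreeClassifier(random_state=1)
    model.fit(X_train[:, idx], y_train)
    return accuracy_score(y_test, model.predict(X_test[:, idx]))

def mutate(mask, rng, p=0.05):
    """Flip each feature's include/exclude bit with probability p."""
    flips = rng.random(len(mask)) < p
    return np.where(flips, ~mask, mask)

X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=1)

rng = np.random.default_rng(1)
mask = rng.random(X.shape[1]) < 0.5   # random initial feature mask
score = evaluate_subset(X_train, X_test, y_train, y_test, mask)
for _ in range(100):
    candidate = mutate(mask, rng)
    candidate_score = evaluate_subset(X_train, X_test, y_train, y_test, candidate)
    if candidate_score >= score:      # accept ties to allow sideways moves
        mask, score = candidate, candidate_score
print(f"{int(mask.sum())} features selected, score {score:.4f}")
```

Accepting ties (`>=`) lets the search drift across plateaus where many subsets score identically, which is common with small test sets.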
The optimization part of the project solves a simple quadratic optimization problem. To run the optimization:
python hill_climbing_optimization.py
This will execute the hill climbing algorithm to minimize the function `f(x) = x^2` and display the optimization progress.
The feature selection task uses hill climbing to select the best feature subset for a Decision Tree Classifier. To run the feature selection:
python hill_climbing_feature_selection.py
This will execute the hill climbing algorithm for feature selection on a synthetic classification dataset, reporting the best feature subset and its performance.
For the optimization problem, the output will show the progress of the hill climbing algorithm as it iterates over possible solutions:
>0 f([0.5]) = 0.25000
>1 f([0.2]) = 0.04000
...
For feature selection, the output will display the feature subset selection process:
Initial solution: 50 features, Score: 0.8430
Iteration 001: 40 features, Score: 0.8450
Iteration 002: 42 features, Score: 0.8475
...
Done!
Best Solution: 42 features selected, Score: 0.8475
Selected Features: [1, 3, 5, 7, 9, 11, 13, 15, 17]