Releases: rodrigo-arenas/Sklearn-genetic-opt

0.12.0

23 Jul 16:49
9641e53

This release includes:

Features:

  • Support for outlier detection algorithms, by @XBastille

0.11.1

17 Sep 16:58
f9a643a

Bug Fixes:

  • Fixed a bug that would generate AttributeError: 'GASearchCV' object has no attribute 'creator'

0.11.0

12 Sep 23:26
1314a7c

Features:

  • Added a parameter use_cache, which defaults to True. When enabled, the algorithm will skip re-evaluating solutions that have already been evaluated, retrieving the performance metrics from the cache instead.
    If use_cache is set to False, the algorithm will always re-evaluate solutions, even if they have been seen before, to obtain fresh performance metrics.
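    The caching behavior can be sketched as a wrapper around the fitness function; this is an illustrative sketch only, keyed on the hyperparameter values, and is not the library's internal implementation:

    ```python
    # Illustrative sketch of fitness caching, not GASearchCV's internal code.
    def make_cached_evaluator(evaluate, use_cache=True):
        """Wrap a fitness function so repeated hyperparameter sets are not re-scored."""
        cache = {}

        def evaluator(params):
            # Dicts are unhashable, so key the cache on a sorted tuple of items.
            key = tuple(sorted(params.items()))
            if use_cache and key in cache:
                return cache[key]
            score = evaluate(params)
            cache[key] = score
            return score

        return evaluator
    ```

    With use_cache=False the wrapper degenerates to calling evaluate every time, matching the "always re-evaluate" behavior described above.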

  • Added a parameter in GASearchCV named warm_start_configs, which defaults to None. This is a list of predefined hyperparameter configurations used to seed the initial population. Each element in the list is a dictionary whose keys are hyperparameter names and whose values are the hyperparameter values to be used for that individual.

    Example:

    warm_start_configs = [
        {"min_weight_fraction_leaf": 0.02, "bootstrap": True, "max_depth": None, "n_estimators": 100},
        {"min_weight_fraction_leaf": 0.4, "bootstrap": True, "max_depth": 5, "n_estimators": 200},
    ]

The genetic algorithm will initialize part of the population with these configurations to warm-start the optimization process. The remaining individuals in the population will be initialized randomly according to the defined hyperparameter space.

This parameter is useful when prior knowledge of good hyperparameter configurations exists, allowing the algorithm to focus on refining known good solutions while still exploring new areas of the hyperparameter space. If set to None, the entire population will be initialized randomly.

  • Introduced a novelty search strategy to the GASearchCV class. This strategy rewards solutions that are more distinct from others in the population by incorporating a novelty score into the fitness evaluation. The novelty score encourages exploration and promotes diversity, reducing the risk of premature convergence to local optima.

    * Novelty Score: Calculated based on the distance between an individual and its nearest neighbors in the population. Individuals with higher novelty scores are more distinct from the rest of the population.
    
    * Fitness Evaluation: The overall fitness is now a combination of the traditional performance score and the novelty score, allowing the algorithm to balance between exploiting known good solutions and exploring new, diverse ones.
    
    * Improved Exploration: This strategy helps explore new areas of the hyperparameter space, increasing the likelihood of discovering better solutions and avoiding local optima.
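    The novelty mechanic described above can be sketched as follows; the distance metric, neighbor count, and blending weight here are assumptions for illustration, not values taken from the library:

    ```python
    # Illustrative novelty score: mean Euclidean distance from an individual
    # to its k nearest neighbors in the population.
    def novelty_score(individual, population, k=3):
        distances = sorted(
            sum((a - b) ** 2 for a, b in zip(individual, other)) ** 0.5
            for other in population
            if other is not individual
        )
        nearest = distances[:k]
        return sum(nearest) / len(nearest)

    # Blend the cross-validation performance with the novelty score so the
    # search balances exploitation against exploration.
    def combined_fitness(performance, novelty, novelty_weight=0.2):
        return (1 - novelty_weight) * performance + novelty_weight * novelty
    ```

    Individuals far from their nearest neighbors get a higher novelty score, so a distinct but slightly weaker solution can still outrank a redundant one.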
    

API Changes:

  • Dropped support for Python 3.8

0.10.1

15 Mar 01:37
c04408d

This is a small release with a minor bug fix.

Features:

  • Install TensorFlow when using pip install sklearn-genetic-opt[all]

Bug Fixes:

  • Fixed a bug that wouldn’t allow cloning the GA classes when used inside a pipeline

0.10.0

15 Feb 02:43
3afd98b

This release brings support for Python 3.10; it also comes with several API updates and algorithm optimizations.

API Changes:

  • GAFeatureSelectionCV now mimics the scikit-learn FeatureSelection algorithms API instead of the Grid Search API; this enables easier use as a selection method and brings it closer to the scikit-learn API
  • Improved GAFeatureSelectionCV candidate generation when max_features is set; it also ensures that at least one feature is selected
  • crossover_probability and mutation_probability are now correctly passed to the mate and mutation functions inside GAFeatureSelectionCV
  • Dropped support for Python 3.7 and added support for Python 3.10
  • Updated the most important packages from dev-requirements.txt to more recent versions
  • Updated deprecated functions in tests
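The scikit-learn feature-selection API shape mentioned above (fit, get_support, transform) can be sketched with a toy selector; the selection rule used here (keep non-constant columns) is a stand-in for illustration, not the genetic search:

```python
# Toy selector following the scikit-learn feature-selection API shape:
# fit() learns a boolean support mask, get_support() exposes it, and
# transform() keeps only the selected columns.
class ToySelector:
    def fit(self, X, y=None):
        # Stand-in rule: keep columns that are not constant.
        self.support_ = [len(set(col)) > 1 for col in zip(*X)]
        return self

    def get_support(self):
        return self.support_

    def transform(self, X):
        return [
            [v for v, keep in zip(row, self.support_) if keep]
            for row in X
        ]
```

Following this interface is what lets such a selector slot into scikit-learn pipelines the same way built-in selection methods do.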

Thanks to everyone who contributed ideas and suggestions.

0.9.0

06 Jun 22:46
5ca13c4

This release comes with new features and general performance improvements

Features:

  • Introducing Adaptive Schedulers to enable adaptive mutation and crossover probabilities; currently, supported schedulers are: ConstantAdapter, ExponentialAdapter, InverseAdapter, and PotentialAdapter

  • Added a random_state parameter (default=None) to the Continuous, Categorical, and Integer classes from the space module, to fix the random seed during hyperparameter sampling.
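The adaptive-scheduler idea can be sketched with an exponential decay; the constructor arguments and stepping interface below are assumptions for illustration, not the adapters' exact API:

```python
import math

# Minimal sketch of an exponentially decaying schedule for a probability
# such as mutation_probability or crossover_probability.
class ExponentialDecaySchedule:
    def __init__(self, initial_value, end_value, rate):
        self.initial_value = initial_value
        self.end_value = end_value
        self.rate = rate
        self.generation = 0

    def step(self):
        # Decay from initial_value toward end_value as generations advance.
        value = self.end_value + (self.initial_value - self.end_value) * math.exp(
            -self.rate * self.generation
        )
        self.generation += 1
        return value
```

Pairing a decaying mutation schedule with an increasing crossover schedule (or vice versa) lets the search shift gradually from exploration to exploitation.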

API Changes:

  • Changed the default values of mutation_probability and crossover_probability to 0.8 and 0.2, respectively.

  • The weighted_choice function used in GAFeatureSelectionCV was re-written to give more probability to a number of features closer to the max_features parameter

  • Removed unused and broken function plot_parallel_coordinates()

Bug Fixes:

  • Now, when using the plot_search_space() function, all the parameters get cast as np.float64 to avoid errors on the seaborn package while plotting bool values.

0.8.1

09 Mar 19:34
667e396

This release changes how the initial population is sampled when the max_features parameter of class GAFeatureSelectionCV is set: solutions with fewer than max_features features are now given more probability.
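One plausible way to bias the sampling, shown here only as a sketch (not the library's exact scheme), is to keep each feature with probability max_features / n_features, so the expected number of selected features equals max_features:

```python
import random

# Sketch of biased feature-mask sampling for the initial population.
def sample_mask(n_features, max_features, rng=random):
    p = max_features / n_features
    mask = [rng.random() < p for _ in range(n_features)]
    if not any(mask):
        # Guarantee at least one selected feature.
        mask[rng.randrange(n_features)] = True
    return mask
```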

0.8.0

05 Jan 02:43
f21ff28

This release comes with some requested features and enhancements.

Features:

  • Class GAFeatureSelectionCV now has a parameter called max_features, int, default=None. If it's not None, it will penalize individuals with more features than max_features, putting a "soft" upper bound to the number of features to be selected.

  • Classes GASearchCV and GAFeatureSelectionCV now support multi-metric evaluation the same way scikit-learn does; you will see this reflected on the logbook and cv_results_ objects, where now you get results for each metric. As in scikit-learn, if multi-metric is used, the refit parameter must be a str specifying the metric to evaluate the cv-scores.

  • Training gracefully stops if interrupted by some of these exceptions: KeyboardInterrupt, SystemExit, StopIteration.
    When one of these exceptions is raised, the model finishes the current generation and saves the current best model. It only works if at least one generation has been completed.
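The "soft" upper bound described above can be sketched as a fitness penalty; the penalty form and weight here are assumptions for illustration, not the library's internals:

```python
# Sketch of a soft max_features bound: individuals exceeding the limit keep
# a valid fitness but are penalized in proportion to the excess.
def penalized_fitness(cv_score, n_selected, max_features=None, penalty=0.05):
    if max_features is None or n_selected <= max_features:
        return cv_score
    return cv_score - penalty * (n_selected - max_features)
```

Because the bound is soft, a solution with a few extra features can still win if its cross-validation score gain outweighs the penalty.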

API Changes:

  • The following parameters changed their default values to create more extensive and different models with better results:

    • population_size from 10 to 50

    • generations from 40 to 80

    • mutation_probability from 0.1 to 0.2

Docs:

  • A new notebook called Iris_multimetric was added to showcase the new multi-metric capabilities.

0.7.0

17 Nov 23:15
a2c6c29

This is an exciting release! It introduces feature selection capabilities to the package.

Features:

  • GAFeatureSelectionCV class for feature selection along with any scikit-learn classifier or regressor. It optimizes the cv-score while minimizing the number of features to select. This class is compatible with the mlflow and tensorboard integration, the Callbacks, and the plot_fitness_evolution function.

API Changes:

  • The mlflow module was renamed to mlflow_log to avoid unexpected name-resolution errors

0.6.1

04 Aug 16:07
e733cb7

This is a minor release that fixes a couple of bugs and adds some minor options.

Features:

  • Added the parameter generations to DeltaThreshold. Now it compares the maximum and minimum values of a metric from the last generations, instead of just the current and previous ones. The default value is 2, so the behavior remains the same as in previous versions.
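The windowed stopping rule can be sketched as follows; the function and argument names are illustrative, not the callback's exact interface:

```python
# Sketch of a DeltaThreshold-style check: stop when the spread (max - min)
# of the metric over the last `generations` values falls below `threshold`.
def delta_threshold_reached(history, threshold, generations=2):
    if len(history) < generations:
        return False
    window = history[-generations:]
    return (max(window) - min(window)) < threshold
```

With generations=2 this reduces to comparing only the current and previous values, which is why the default preserves the old behavior.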

Bug Fixes:

  • When a param_grid of length 1 is provided, a user warning is now raised instead of an error. Internally, the crossover operation is swapped to DEAP's tools.cxSimulatedBinaryBounded.
  • When using the Continuous class with boundaries lower and upper, a uniform distribution with limits [lower, lower + upper] was sampled; it is now properly sampled over [lower, upper].