
why is the Optuna CMA-ES sampler better than my custom cmaes code? #163

@FlorinAndrei

Description


Summary of the Question

I have a dataset with the features in the X dataframe and the target in the y series. I am trying to select the features in X such that, when fitting a model, some objective function is optimized. Specifically, the model is linear regression and the objective is BIC (the Bayesian Information Criterion): I'm trying to select the features in X so as to minimize the model's BIC.
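
For reference, here is a minimal sketch of the objective in isolation; the synthetic data and column names are made up for illustration:

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic stand-in for the real dataset, just to show the objective.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 5)), columns=[f'x{i}' for i in range(5)])
X['const'] = 1.0
y = X['x0'] + 0.5 * X['x1'] + rng.normal(size=200)

# Fit OLS on one candidate feature subset and read off the BIC to be minimized.
lin_mod = sm.OLS(y, X[['const', 'x0', 'x1']], hasconst=True).fit()
print(lin_mod.bic)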

X has a very large number of features, so an exhaustive search of all feature combinations is not feasible. But if you create a list of binary flags [1, 0, 0, 1, 1, 1, 0, 1, 0, 0, ...], one flag for each feature (0 means the feature is not selected, 1 means it is selected), the problem becomes one of hyperparameter optimization: find the binary flag values that minimize BIC by selecting the best features.
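
As a toy illustration of the encoding (the feature names are placeholders):

import numpy as np

features = ['x0', 'x1', 'x2', 'x3', 'x4']
flags = np.array([1, 0, 0, 1, 1])

# Each 0/1 vector maps to one feature subset, so the full search space
# has 2 ** len(features) combinations.
selected = [f for f, keep in zip(features, flags) if keep == 1]
print(selected)            # ['x0', 'x3', 'x4']
print(2 ** len(features))  # 32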

I've tried to use Optuna with CmaEsSampler(). I've also tried the cmaes library directly.

For some reason, Optuna with CmaEsSampler() finds a better solution (lower BIC) and does not get stuck, but it is slow. The cmaes library, at least in my implementation, finds a slightly worse solution (slightly higher BIC) and appears to get stuck in a local minimum, but it iterates much faster.

I would like to use cmaes directly because it's so much faster, but I can't make it overcome the local minimum. What am I missing?

Detailed Explanation

Optuna code:

import copy
import random

import optuna
import statsmodels.api as sm


def fs_objective(trial, X, y, features):
    # Shuffle a copy of the feature list so the per-trial suggestion order varies.
    features = copy.deepcopy(features)
    random.shuffle(features)
    # One binary flag per feature; flagged features enter the fit alongside the constant.
    features_use = ['const'] + [f for f in features if trial.suggest_int(f, 0, 1) == 1]
    lin_mod = sm.OLS(y, X[features_use], hasconst=True).fit()
    return lin_mod.bic

features_select = [f for f in X.columns if f != 'const']
fs_sampler = optuna.samplers.CmaEsSampler(n_startup_trials=1, seed=0, with_margin=True)
study = optuna.create_study(sampler=fs_sampler, study_name=study_name, direction='minimize')
# No n_trials or timeout is given, so this runs until interrupted.
study.optimize(lambda trial: fs_objective(trial, X, y, features_select))
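
As written, optimize() runs until interrupted; for a bounded comparison run I would cap it like this (the n_trials value is an arbitrary choice):

study.optimize(lambda trial: fs_objective(trial, X, y, features_select), n_trials=2000)
print(study.best_value, study.best_trial.number)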

My cmaes code, inspired by this example: https://github.com/CyberAgentAILab/cmaes/blob/main/examples/cmaes_with_margin_binary.py

import numpy as np
import statsmodels.api as sm
from cmaes import CMAwM
from tqdm import tqdm


def cma_objective(fs):
    # fs is a vector of 0/1 flags, one per candidate feature.
    features_use = ['const'] + [f for i, f in enumerate(features_select) if fs[i] == 1]
    lin_mod = sm.OLS(y, X[features_use], hasconst=True).fit()
    return lin_mod.bic

features_select = [f for f in X.columns if f != 'const']
# Each coordinate is an integer in {0, 1} with step 1 (binary variables for CMAwM).
cma_bounds = np.tile([0, 1], (len(features_select), 1))
cma_steps = np.ones(len(features_select))
optimizer = CMAwM(mean=np.zeros(len(features_select)), sigma=2.0, bounds=cma_bounds, steps=cma_steps, seed=0)
pop_size = optimizer.population_size

gen_max = 10000
best_value = np.inf
best_gen = 0
best_sol_raw = None
history_values = np.full((gen_max,), np.nan)
history_values_best = np.full((gen_max,), np.nan)

for generation in tqdm(range(gen_max)):
    sol = []
    solutions = []
    vals = np.full((pop_size,), np.nan)

    for i in range(pop_size):
        # ask() returns the discretized vector for evaluation and the raw vector for tell().
        fs_for_eval, fs_for_tell = optimizer.ask()
        solutions.append(fs_for_eval)
        value = cma_objective(fs_for_eval)
        vals[i] = value
        sol.append((fs_for_tell, value))
    optimizer.tell(sol)

    best_value_gen = vals.min()
    if best_value_gen < best_value:
        best_value = best_value_gen
        best_gen = generation
        best_sol_raw = solutions[np.argmin(vals)]
        print(f'gen: {best_gen:5n}, new best objective: {best_value:.4f}')
    history_values[generation] = best_value_gen
    history_values_best[generation] = best_value

    if optimizer.should_stop():
        break
gen_completed = generation
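
After the loop, the stored best solution can be decoded back into feature names (best_sol_raw holds the discretized 0/1 vector returned by ask()):

best_features = ['const'] + [
    f for i, f in enumerate(features_select) if best_sol_raw[i] == 1
]
print(f'best BIC {best_value:.4f} at generation {best_gen}, '
      f'{len(best_features) - 1} features selected')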

Full code: this is the notebook with all the code, both Optuna and cmaes, along with other things I've attempted, and all required context (data loading, etc.):

https://github.com/FlorinAndrei/feature_selection/blob/main/feature_selection.ipynb

Context and Environment

Python 3.11.7
cmaes 0.10.0
optuna 3.5.0
Ubuntu 22.04

Additional Information

Optuna history:

[plot: Optuna optimization history]

cmaes history:

[plot: cmaes optimization history]

Why is there so much more variance in the Optuna trials? And why is Optuna able to maintain that variance across many trials? It seems like the Optuna code would keep finding even better combinations if I let it run longer.
