
why is the Optuna CMA-ES sampler better than my custom cmaes code? #163

@FlorinAndrei

Description


Summary of the Question

I have a dataset with the features in the X dataframe and the target in the y series. I am trying to select the features in X such that, when fitting a model, some objective function is optimized. Specifically, the model is linear regression and the objective is BIC (the Bayesian Information Criterion): I'm trying to select the features in X so as to minimize the model's BIC.
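
For reference, here is a minimal sketch of the objective in isolation; the synthetic data and column names are made up for illustration:

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic stand-in for the real dataset, just to show the objective.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 5)), columns=[f'x{i}' for i in range(5)])
X['const'] = 1.0
y = X['x0'] + 0.5 * X['x1'] + rng.normal(size=200)

# Fit OLS on one candidate feature subset and read off the BIC to be minimized.
lin_mod = sm.OLS(y, X[['const', 'x0', 'x1']], hasconst=True).fit()
print(lin_mod.bic)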

X has a very large number of features, so an exhaustive search of all feature combinations is not feasible. But if you create a list of binary flags [1, 0, 0, 1, 1, 1, 0, 1, 0, 0, ...], one flag for each feature (0 means the feature is not selected, 1 means it is selected), the problem becomes one of hyperparameter optimization: find the binary flag values that minimize BIC by selecting the best features.
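
As a toy illustration of the encoding (the feature names are placeholders):

import numpy as np

features = ['x0', 'x1', 'x2', 'x3', 'x4']
flags = np.array([1, 0, 0, 1, 1])

# Each 0/1 vector maps to one feature subset, so the full search space
# has 2 ** len(features) combinations.
selected = [f for f, keep in zip(features, flags) if keep == 1]
print(selected)            # ['x0', 'x3', 'x4']
print(2 ** len(features))  # 32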

I've tried to use Optuna with CmaEsSampler(). I've also tried the cmaes library directly.

For some reason, Optuna with CmaEsSampler() finds a better solution (lower BIC) and does not get stuck, but it is slow. The cmaes library, at least in my implementation, finds a slightly worse solution (slightly higher BIC) and appears to get stuck in a local minimum, but it iterates much faster.

I would like to use cmaes directly because it's so much faster, but I can't make it overcome the local minimum. What am I missing?

Detailed Explanation

Optuna code:

import copy
import random

import optuna
import statsmodels.api as sm


def fs_objective(trial, X, y, features):
    # Shuffle a copy of the feature list so the per-trial suggestion order varies.
    features = copy.deepcopy(features)
    random.shuffle(features)
    # One binary flag per feature; flagged features enter the fit alongside the constant.
    features_use = ['const'] + [f for f in features if trial.suggest_int(f, 0, 1) == 1]
    lin_mod = sm.OLS(y, X[features_use], hasconst=True).fit()
    return lin_mod.bic

features_select = [f for f in X.columns if f != 'const']
fs_sampler = optuna.samplers.CmaEsSampler(n_startup_trials=1, seed=0, with_margin=True)
study = optuna.create_study(sampler=fs_sampler, study_name=study_name, direction='minimize')
# No n_trials or timeout is given, so this runs until interrupted.
study.optimize(lambda trial: fs_objective(trial, X, y, features_select))
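
As written, optimize() runs until interrupted; for a bounded comparison run I would cap it like this (the n_trials value is an arbitrary choice):

study.optimize(lambda trial: fs_objective(trial, X, y, features_select), n_trials=2000)
print(study.best_value, study.best_trial.number)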

My cmaes code, inspired by this example: https://github.com/CyberAgentAILab/cmaes/blob/main/examples/cmaes_with_margin_binary.py

import numpy as np
import statsmodels.api as sm
from cmaes import CMAwM
from tqdm import tqdm


def cma_objective(fs):
    # fs is a vector of 0/1 flags, one per candidate feature.
    features_use = ['const'] + [f for i, f in enumerate(features_select) if fs[i] == 1]
    lin_mod = sm.OLS(y, X[features_use], hasconst=True).fit()
    return lin_mod.bic

features_select = [f for f in X.columns if f != 'const']
# Each coordinate is an integer in {0, 1} with step 1 (binary variables for CMAwM).
cma_bounds = np.tile([0, 1], (len(features_select), 1))
cma_steps = np.ones(len(features_select))
optimizer = CMAwM(mean=np.zeros(len(features_select)), sigma=2.0, bounds=cma_bounds, steps=cma_steps, seed=0)
pop_size = optimizer.population_size

gen_max = 10000
best_value = np.inf
best_gen = 0
best_sol_raw = None
history_values = np.full((gen_max,), np.nan)
history_values_best = np.full((gen_max,), np.nan)

for generation in tqdm(range(gen_max)):
    sol = []
    solutions = []
    vals = np.full((pop_size,), np.nan)

    for i in range(pop_size):
        # ask() returns the discretized vector for evaluation and the raw vector for tell().
        fs_for_eval, fs_for_tell = optimizer.ask()
        solutions.append(fs_for_eval)
        value = cma_objective(fs_for_eval)
        vals[i] = value
        sol.append((fs_for_tell, value))
    optimizer.tell(sol)

    best_value_gen = vals.min()
    if best_value_gen < best_value:
        best_value = best_value_gen
        best_gen = generation
        best_sol_raw = solutions[np.argmin(vals)]
        print(f'gen: {best_gen:5n}, new best objective: {best_value:.4f}')
    history_values[generation] = best_value_gen
    history_values_best[generation] = best_value

    if optimizer.should_stop():
        break
gen_completed = generation
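
After the loop, the stored best solution can be decoded back into feature names (best_sol_raw holds the discretized 0/1 vector returned by ask()):

best_features = ['const'] + [
    f for i, f in enumerate(features_select) if best_sol_raw[i] == 1
]
print(f'best BIC {best_value:.4f} at generation {best_gen}, '
      f'{len(best_features) - 1} features selected')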

Full code: this is the notebook with all the code, both Optuna and cmaes, along with other things I've attempted, and all required context (data loading, etc.):

https://github.com/FlorinAndrei/feature_selection/blob/main/feature_selection.ipynb

Context and Environment

Python 3.11.7
cmaes 0.10.0
optuna 3.5.0
Ubuntu 22.04

Additional Information

Optuna history:

[plot: Optuna optimization history]

cmaes history:

[plot: cmaes optimization history]

Why is there so much more variance in the Optuna trials? And why is Optuna able to maintain that variance across many trials? It seems like the Optuna code would keep finding even better combinations if I let it run longer.
