Description
Summary of the Question
I have a dataset with features in the X dataframe and the target in the y series. I am trying to select the features in X such that, when fitting a model, I reach the extremum of some objective function. Specifically, the model is linear regression and the objective is BIC (Bayesian Information Criterion): I'm trying to select the features in X so as to minimize the model's BIC.
X has a very large number of features, so an exhaustive search of all feature combinations is not feasible. But if you create a list of binary flags [1, 0, 0, 1, 1, 1, 0, 1, 0, 0, ...], one flag per feature (0 means the feature is not selected, 1 means it is selected), then the problem becomes one of hyperparameter optimization: find the binary flag values that minimize BIC, i.e. select the best subset of features.
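A minimal sketch of that encoding (illustrative only; it assumes X already contains a 'const' column, as in the code below, and `bic_for_mask` is just a made-up helper name):

```python
import numpy as np
import statsmodels.api as sm

def bic_for_mask(mask, X, y):
    # mask[i] == 1 means the i-th candidate feature is included in the regression
    candidates = [c for c in X.columns if c != 'const']
    cols = ['const'] + [c for c, flag in zip(candidates, mask) if flag == 1]
    return sm.OLS(y, X[cols], hasconst=True).fit().bic

# e.g. evaluate a random flag vector over all candidate features:
# mask = np.random.randint(0, 2, size=X.shape[1] - 1)
# print(bic_for_mask(mask, X, y))
```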
I've tried to use Optuna with CmaEsSampler(). I've also tried the cmaes library directly.
For some reason, Optuna with CmaEsSampler() finds a better solution (lower BIC) and does not get stuck, but it is slow. The cmaes library, at least in my implementation, only finds a slightly worse solution (slightly higher BIC) and appears to get stuck in a local minimum, but it iterates much faster.
I would like to use cmaes directly because it's so much faster, but I can't make it overcome the local minimum. What am I missing?
Detailed Explanation
Optuna code:

```python
import copy
import random
import optuna
import statsmodels.api as sm

# X, y and study_name are defined earlier in the notebook
def fs_objective(trial, X, y, features):
    features = copy.deepcopy(features)
    random.shuffle(features)
    features_use = ['const'] + [f for f in features if trial.suggest_int(f, 0, 1) == 1]
    lin_mod = sm.OLS(y, X[features_use], hasconst=True).fit()
    return lin_mod.bic

features_select = [f for f in X.columns if f != 'const']
fs_sampler = optuna.samplers.CmaEsSampler(n_startup_trials=1, seed=0, with_margin=True)
study = optuna.create_study(sampler=fs_sampler, study_name=study_name, direction='minimize')
study.optimize(lambda trial: fs_objective(trial, X, y, features_select))
```
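Since each feature name is used as the parameter name in trial.suggest_int, the selected subset can be read back from the best trial once the study stops (a minimal sketch, not part of the snippet above):

```python
best_features = ['const'] + [f for f in features_select if study.best_params[f] == 1]
print(f'best BIC: {study.best_value:.4f}, features selected: {len(best_features)}')
```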
My cmaes code, inspired by this example: https://github.com/CyberAgentAILab/cmaes/blob/main/examples/cmaes_with_margin_binary.py

```python
import numpy as np
import statsmodels.api as sm
from cmaes import CMAwM
from tqdm import tqdm

# X and y are defined earlier in the notebook
def cma_objective(fs):
    features_use = ['const'] + [f for i, f in enumerate(features_select) if fs[i] == 1]
    lin_mod = sm.OLS(y, X[features_use], hasconst=True).fit()
    return lin_mod.bic

features_select = [f for f in X.columns if f != 'const']
cma_bounds = np.tile([0, 1], (len(features_select), 1))
cma_steps = np.ones(len(features_select))
optimizer = CMAwM(mean=np.zeros(len(features_select)), sigma=2.0, bounds=cma_bounds, steps=cma_steps, seed=0)
pop_size = optimizer.population_size

gen_max = 10000
best_value = np.inf
best_gen = 0
best_sol_raw = None
history_values = np.full((gen_max,), np.nan)
history_values_best = np.full((gen_max,), np.nan)

for generation in tqdm(range(gen_max)):
    best_value_gen = np.inf
    sol = []
    solutions = []
    vals = np.full((pop_size,), np.nan)
    for i in range(optimizer.population_size):
        fs_for_eval, fs_for_tell = optimizer.ask()
        solutions.append(fs_for_eval)
        value = cma_objective(fs_for_eval)
        vals[i] = value
        sol.append((fs_for_tell, value))
    optimizer.tell(sol)
    best_value_gen = vals.min()
    if best_value_gen < best_value:
        best_value = best_value_gen
        best_gen = generation
        best_sol_raw = solutions[np.argmin(vals)]
        print(f'gen: {best_gen:5n}, new best objective: {best_value:.4f}')
    history_values[generation] = best_value_gen
    history_values_best[generation] = best_value
    if optimizer.should_stop():
        break

gen_completed = generation
```

Full code - this is the notebook with all the code, both Optuna and cmaes, along with other things I've attempted, and all required context (data loading, etc.):
https://github.com/FlorinAndrei/feature_selection/blob/main/feature_selection.ipynb
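For symmetry with the Optuna version, the selected subset can be recovered from best_sol_raw at the end of the cmaes run, since it holds the flag vector (already discretized by CMAwM) of the best evaluation found. A minimal sketch, not part of the snippet above:

```python
best_features = ['const'] + [f for i, f in enumerate(features_select) if best_sol_raw[i] == 1]
print(f'best BIC: {best_value:.4f}, features selected: {len(best_features)}')
```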
Context and Environment
Python 3.11.7
cmaes 0.10.0
optuna 3.5.0
Ubuntu 22.04
Additional Information
Optuna history: [history plot]

cmaes history: [history plot]
Why is there so much more variance in the Optuna trials? Why is it able to maintain that variance across many trials? It seems like the Optuna code would continue to find even better combinations if I let it run longer.

