Unexplained running out of RAM memory with ensemble voting classifier model #295

@apavlo89

Description

Hello,

I am experiencing an issue with high RAM usage when running ExplainerDashboard for a VotingClassifier ensemble model. The model is trained on a ~2k-row dataset with 400 features in total; however, each classifier in the VotingClassifier uses only a subset of this feature set. My goal is to understand how my algorithm makes predictions on a dataset for which I do not yet know the label outcomes. Despite the dataset to be explained being relatively small (28 rows), RAM usage spikes to over 51 GB. Keep in mind I am running this in Google Colab. I suspect this might be related to how ExplainerDashboard handles ensemble models or to the computation of SHAP values for such complex models, or it might just be a bug. Below is a simplified version of my setup:

Model Setup

from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
# Other imports...

# Define pipelines for individual models (example)
lr_pipeline = Pipeline([...])
xgb_pipeline = Pipeline([...])
# Other pipelines...

# VotingClassifier ensemble
eclf = VotingClassifier(
    estimators=[
        ('lr', lr_pipeline),
        ('xgb', xgb_pipeline),
        # Other models...
    ],
    voting='soft'
)

eclf.fit(X_train, y_train)
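
For context, each pipeline restricts the data to its own feature subset before its classifier is fitted. A minimal sketch of that structure is below; the column names, the selection method (ColumnTransformer with 'passthrough'), and the estimator settings are hypothetical placeholders rather than my real configuration, and it assumes X_train is a pandas DataFrame:

from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

# Hypothetical feature subsets taken from the 400 available columns
lr_features = ['feat_1', 'feat_2', 'feat_3']
xgb_features = ['feat_4', 'feat_5', 'feat_6']

# Each pipeline first keeps only its own columns, then fits its classifier
lr_pipeline = Pipeline([
    ('select', ColumnTransformer([('keep', 'passthrough', lr_features)])),
    ('scale', StandardScaler()),
    ('clf', LogisticRegression(max_iter=1000)),
])

xgb_pipeline = Pipeline([
    ('select', ColumnTransformer([('keep', 'passthrough', xgb_features)])),
    ('clf', XGBClassifier()),
])

These pipelines are then combined into the soft-voting ensemble exactly as in the snippet above.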


from explainerdashboard import ClassifierExplainer, ExplainerDashboard
from pyngrok import ngrok  # assuming pyngrok here; this import was missing from my snippet

# Initialize the Explainer without target labels
explainer = ClassifierExplainer(eclf, future_predict, shap='kernel', model_output='probability')

# Create and run the dashboard
dashboard = ExplainerDashboard(explainer)
dashboard.run(port=8050)
ngrok_tunnel = ngrok.connect(8050)
print('Public URL:', ngrok_tunnel.public_url)

This is the output:

WARNING: For shap='kernel', shap interaction values can unfortunately not be calculated!
Note: shap values for shap='kernel' normally get calculated against X_background, but paramater X_background=None, so setting X_background=shap.sample(X, 50)...
Generating self.shap_explainer = shap.KernelExplainer(model, X, link='identity')
Building ExplainerDashboard..
Detected google colab environment, setting mode='external'
No y labels were passed to the Explainer, so setting model_summary=False...
For this type of model and model_output interactions don't work, so setting shap_interaction=False...
The explainer object has no decision_trees property. so setting decision_trees=False...
Generating layout...
Calculating shap values...
/usr/local/lib/python3.10/dist-packages/dash/dash.py:538: UserWarning:

JupyterDash is deprecated, use Dash instead.
See https://dash.plotly.com/dash-in-jupyter for more details.

Within just a few seconds of running the code, memory usage exceeds the available RAM and the session crashes.
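
For reference, the log above says that with X_background=None the kernel explainer falls back to X_background=shap.sample(X, 50). Below is a minimal sketch of how I understand an explicit (smaller) background set can be passed instead via the X_background parameter; the sample size of 20 is an arbitrary value chosen for illustration, and I have not verified whether this avoids the memory spike:

import shap
from explainerdashboard import ClassifierExplainer

# Hypothetical: supply a small explicit background sample rather than relying
# on the default shap.sample(X, 50) fallback reported in the log
X_background = shap.sample(X_train, 20)

explainer = ClassifierExplainer(
    eclf,
    future_predict,
    shap='kernel',
    X_background=X_background,
    model_output='probability',
)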
