Unexplained running out of RAM memory with ensemble voting classifier model #295

@apavlo89

Description

Hello,

I am experiencing an issue with high RAM usage when running ExplainerDashboard for a VotingClassifier ensemble model. The model is trained on a ~2k-row dataset with 400 features in total; however, each classifier in the VotingClassifier uses only a subset of this feature set. My goal is to understand how my algorithm makes predictions on a dataset for which I do not yet know the label outcomes. Despite the dataset to be explained being relatively small (28 rows), RAM usage spikes to over 51 GB. Keep in mind I am running this in Google Colab. I suspect this might be related to how ExplainerDashboard handles ensemble models or to the computation of SHAP values for such complex models, or it might just be a bug. Below is a simplified version of my setup:

Model Setup

from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
# Other imports...

# Define pipelines for individual models (example)
lr_pipeline = Pipeline([...])
xgb_pipeline = Pipeline([...])
# Other pipelines...

# VotingClassifier ensemble
eclf = VotingClassifier(
    estimators=[
        ('lr', lr_pipeline),
        ('xgb', xgb_pipeline),
        # Other models...
    ],
    voting='soft'
)

eclf.fit(X_train, y_train)
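
For context, each pipeline restricts the data to its own feature subset before its classifier is fitted. A minimal sketch of that structure is below; the column names, the selection method (ColumnTransformer with 'passthrough'), and the estimator settings are hypothetical placeholders rather than my real configuration, and it assumes X_train is a pandas DataFrame:

from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

# Hypothetical feature subsets taken from the 400 available columns
lr_features = ['feat_1', 'feat_2', 'feat_3']
xgb_features = ['feat_4', 'feat_5', 'feat_6']

# Each pipeline first keeps only its own columns, then fits its classifier
lr_pipeline = Pipeline([
    ('select', ColumnTransformer([('keep', 'passthrough', lr_features)])),
    ('scale', StandardScaler()),
    ('clf', LogisticRegression(max_iter=1000)),
])

xgb_pipeline = Pipeline([
    ('select', ColumnTransformer([('keep', 'passthrough', xgb_features)])),
    ('clf', XGBClassifier()),
])

These pipelines are then combined into the soft-voting ensemble exactly as in the snippet above.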


from explainerdashboard import ClassifierExplainer, ExplainerDashboard
from pyngrok import ngrok  # assuming pyngrok here; this import was missing from my snippet

# Initialize the Explainer without target labels
explainer = ClassifierExplainer(eclf, future_predict, shap='kernel', model_output='probability')

# Create and run the dashboard
dashboard = ExplainerDashboard(explainer)
dashboard.run(port=8050)
ngrok_tunnel = ngrok.connect(8050)
print('Public URL:', ngrok_tunnel.public_url)

This is the output:

WARNING: For shap='kernel', shap interaction values can unfortunately not be calculated!
Note: shap values for shap='kernel' normally get calculated against X_background, but paramater X_background=None, so setting X_background=shap.sample(X, 50)...
Generating self.shap_explainer = shap.KernelExplainer(model, X, link='identity')
Building ExplainerDashboard..
Detected google colab environment, setting mode='external'
No y labels were passed to the Explainer, so setting model_summary=False...
For this type of model and model_output interactions don't work, so setting shap_interaction=False...
The explainer object has no decision_trees property. so setting decision_trees=False...
Generating layout...
Calculating shap values...
/usr/local/lib/python3.10/dist-packages/dash/dash.py:538: UserWarning:

JupyterDash is deprecated, use Dash instead.
See https://dash.plotly.com/dash-in-jupyter for more details.

Within just a few seconds of running the code, memory usage exceeds the available RAM and the session crashes.
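
For reference, the log above says that with X_background=None the kernel explainer falls back to X_background=shap.sample(X, 50). Below is a minimal sketch of how I understand an explicit (smaller) background set can be passed instead via the X_background parameter; the sample size of 20 is an arbitrary value chosen for illustration, and I have not verified whether this avoids the memory spike:

import shap
from explainerdashboard import ClassifierExplainer

# Hypothetical: supply a small explicit background sample rather than relying
# on the default shap.sample(X, 50) fallback reported in the log
X_background = shap.sample(X_train, 20)

explainer = ClassifierExplainer(
    eclf,
    future_predict,
    shap='kernel',
    X_background=X_background,
    model_output='probability',
)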
