
Shapley-Based Visualization for Prediction Analysis #623

@guillaume-vignal

Description


Currently, Shapash lets us visualize model predictions and errors through the True values vs. Predicted values plot, rendered as a scatter plot for regression problems and as a violin plot (similar to a confusion matrix) for classification problems. This visualization gives an overall view of how accurate the predictions are and in what proportion. It also lets us pick specific cases and analyze them further with the local plot.

[Images: True vs. Predicted scatter plot (regression), violin plot (classification), and local plot]
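For context, here is a minimal sketch of how these plots are produced today, assuming a fitted model and test data; the `scatter_plot_prediction` method name reflects recent Shapash releases and should be checked against the installed version:

```python
# Minimal sketch of the existing workflow (assumed names: model, X_test, y_test).
from shapash import SmartExplainer

xpl = SmartExplainer(model=model)
# Passing y_target lets Shapash compare true values against predictions.
xpl.compile(x=X_test, y_target=y_test)

# True values vs. Predicted values: scatter plot for regression,
# violin plot for classification (method name assumed from recent releases).
xpl.plot.scatter_plot_prediction()
```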

However, this approach focuses solely on whether a prediction is correct or incorrect based on the target variable. It does not consider why a prediction is well-estimated or misestimated. Two well-predicted individuals may be correctly classified for entirely different reasons.

Feature Proposal

We propose a new visualization based on Shapley values to better understand why certain predictions are accurate and others are not. The idea is to project the Shapley values of each instance into a 2D space, giving a more interpretable view of how different factors influence predictions.

This visualization would include:

  1. Shapley Projection Plot

    • Each instance is projected into a 2D space based on its Shapley values, using a dimensionality-reduction technique such as UMAP.
    • Points are colored based on their prediction (for regression) or probability score (for classification).
    • A similar plot is created using true values instead of predictions.
  2. Error-Based Shapley Projection Plot

    • A variation of the same plot but colored based on prediction error, enabling users to identify areas where the model struggles the most.

We envision something like the following mockup; a rough code sketch of the idea follows it.

[Mockup: proposed Shapley projection plot]
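As a starting point for discussion, here is a minimal sketch of the idea outside Shapash, assuming a fitted regressor `model`, features `X`, targets `y_true`, and the shap, umap-learn, and plotly packages (all of these names are placeholders, not existing Shapash API):

```python
# Hypothetical sketch of the proposed Shapley projection plots.
import numpy as np
import shap
import umap
import plotly.express as px

# 1. Compute Shapley values for every instance (TreeExplainer shown; any
#    explainer producing an (n_samples, n_features) array would work).
#    For a classifier, select the array for the class of interest.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # shape: (n_samples, n_features)

# 2. Project the Shapley values into 2D with UMAP.
embedding = umap.UMAP(n_components=2, random_state=42).fit_transform(shap_values)

# 3. Shapley projection plot: color each point by its prediction
#    (regression) or probability score (classification).
y_pred = model.predict(X)
fig = px.scatter(
    x=embedding[:, 0], y=embedding[:, 1], color=y_pred,
    labels={"x": "UMAP 1", "y": "UMAP 2", "color": "Prediction"},
    title="Shapley projection colored by prediction",
)
fig.show()

# 4. Error-based variant: same embedding, colored by prediction error,
#    to highlight regions where the model struggles.
errors = np.abs(y_true - y_pred)
fig_err = px.scatter(
    x=embedding[:, 0], y=embedding[:, 1], color=errors,
    labels={"x": "UMAP 1", "y": "UMAP 2", "color": "|error|"},
    title="Shapley projection colored by absolute error",
)
fig_err.show()
```

The same embedding could also be colored by true values, making it easy to compare the three views side by side, as described in the proposal.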

Expected Benefits

  • Identify clusters of predictions and analyze whether misclassified instances share similar characteristics.
  • Select cases with diverse feature importance by interacting with different areas of the projection space.
  • Compare prediction distributions with true values to highlight zones of high uncertainty.
  • Enhance model interpretability by visualizing why predictions are correct or incorrect.

This feature would greatly improve how users analyze and understand their model’s behavior, providing a more detailed, interpretable, and interactive approach to prediction analysis in Shapash.

Would love to hear feedback and suggestions on how to improve this proposal! 🚀
