Description
Currently, Shapash allows us to visualize model predictions and errors using the True values vs. Predicted values plot. This is available as a scatter plot for regression problems and a violin plot (similar to a confusion matrix) for classification problems. This visualization gives an overall view of how accurate the predictions are and in what proportion. It also enables picking specific cases to analyze them further using the local plot.
However, this approach focuses solely on whether a prediction is correct or incorrect based on the target variable. It does not consider why a prediction is well-estimated or misestimated. Two well-predicted individuals may be correctly classified for entirely different reasons.
Feature Proposal
We propose a new visualization based on Shapley values to better understand why certain predictions are accurate or not. The idea is to project the Shapley values of each instance into a 2D space, allowing for a more interpretable visualization of how different factors influence predictions.
This visualization would include:
- Shapley Projection Plot
  - Each instance is projected into a 2D space (UMAP) based on its Shapley values.
  - Points are colored by the prediction (for regression) or the probability score (for classification).
  - A similar plot is created using the true values instead of the predictions.
- Error-Based Shapley Projection Plot
  - A variation of the same plot, colored by the prediction error, enabling users to identify the areas where the model struggles the most.
In our mind, it would look something like this:
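To make the idea more concrete, here is a rough, standalone sketch of how such a plot could be prototyped outside Shapash. It assumes the `shap` package for the Shapley values and `umap-learn` for the 2D projection; the model, dataset, and variable names are purely illustrative and not part of any existing Shapash API:

```python
# Illustrative prototype of the proposed Shapley projection plots.
# Assumptions: shap for contributions, umap-learn for the projection,
# a toy regression model; nothing here is current Shapash code.
import numpy as np
import shap
import umap
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)
y_pred = model.predict(X)

# Shapley values: one contribution per instance and per feature.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # shape (n_samples, n_features)

# Project the Shapley value matrix into 2D.
embedding = umap.UMAP(n_components=2, random_state=0).fit_transform(shap_values)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Left: Shapley projection colored by the prediction.
sc0 = axes[0].scatter(embedding[:, 0], embedding[:, 1], c=y_pred, cmap="viridis", s=10)
axes[0].set_title("Shapley projection (colored by prediction)")
fig.colorbar(sc0, ax=axes[0])

# Right: the error-based variant, colored by absolute prediction error.
error = np.abs(y - y_pred)
sc1 = axes[1].scatter(embedding[:, 0], embedding[:, 1], c=error, cmap="Reds", s=10)
axes[1].set_title("Shapley projection (colored by absolute error)")
fig.colorbar(sc1, ax=axes[1])

plt.show()
```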

Expected Benefits
- Identify clusters of predictions and analyze whether misclassified instances share similar characteristics.
- Select cases with diverse feature importances by interacting with different areas of the projection space (see the interactive sketch after this list).
- Compare prediction distributions with true values to highlight zones of high uncertainty.
- Enhance model interpretability by visualizing why predictions are correct or incorrect.
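For the interactive selection mentioned above, a minimal sketch with Plotly (which Shapash already uses for its plots) could look like the following. The `embedding`, `y_pred`, and `error` variables come from the previous snippet, and the hover data carries the instance index so that selected points could then be inspected with the existing local plot:

```python
# Hedged sketch of the interactive side: a Plotly scatter of the projection,
# where hovering over a point exposes its instance index and prediction.
import pandas as pd
import plotly.express as px

proj_df = pd.DataFrame({
    "x": embedding[:, 0],
    "y": embedding[:, 1],
    "prediction": y_pred,
    "abs_error": error,
})

fig = px.scatter(
    proj_df.reset_index(),  # keeps the instance index as a hoverable column
    x="x", y="y",
    color="abs_error",
    hover_data=["index", "prediction"],
    title="Error-based Shapley projection (hover to pick instances)",
)
fig.show()
```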
This feature would greatly improve how users analyze and understand their model’s behavior, providing a more detailed, interpretable, and interactive approach to prediction analysis in Shapash.
Would love to hear feedback and suggestions on how to improve this proposal! 🚀