
Shapley-Based Visualization for Prediction Analysis #623

@guillaume-vignal

Description


Currently, Shapash lets us visualize model predictions and errors through the True values vs. Predicted values plot, rendered as a scatter plot for regression problems and as a violin plot (similar to a confusion matrix) for classification problems. This visualization gives an overall view of how accurate the predictions are and in what proportion. It also lets us pick specific cases and analyze them further with the local plot.

[Images: True vs. Predicted scatter plot (regression), violin plot (classification), and local plot]
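For context, here is a minimal sketch of how these plots are produced today, assuming a fitted model and test data; the `scatter_plot_prediction` method name reflects recent Shapash releases and should be checked against the installed version:

```python
# Minimal sketch of the existing workflow (assumed names: model, X_test, y_test).
from shapash import SmartExplainer

xpl = SmartExplainer(model=model)
# Passing y_target lets Shapash compare true values against predictions.
xpl.compile(x=X_test, y_target=y_test)

# True values vs. Predicted values: scatter plot for regression,
# violin plot for classification (method name assumed from recent releases).
xpl.plot.scatter_plot_prediction()
```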

However, this approach focuses solely on whether a prediction is correct or incorrect based on the target variable. It does not consider why a prediction is well-estimated or misestimated. Two well-predicted individuals may be correctly classified for entirely different reasons.

Feature Proposal

We propose a new visualization based on Shapley values to better understand why certain predictions are accurate and others are not. The idea is to project the Shapley values of each instance into a 2D space, giving a more interpretable view of how different factors influence predictions.

This visualization would include:

  1. Shapley Projection Plot

    • Each instance is projected into a 2D space based on its Shapley values, using a dimensionality-reduction technique such as UMAP.
    • Points are colored based on their prediction (for regression) or probability score (for classification).
    • A similar plot is created using true values instead of predictions.
  2. Error-Based Shapley Projection Plot

    • A variation of the same plot but colored based on prediction error, enabling users to identify areas where the model struggles the most.

We envision something like the following mockup; a rough code sketch of the idea follows it.

[Mockup: proposed Shapley projection plot]
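As a starting point for discussion, here is a minimal sketch of the idea outside Shapash, assuming a fitted regressor `model`, features `X`, targets `y_true`, and the shap, umap-learn, and plotly packages (all of these names are placeholders, not existing Shapash API):

```python
# Hypothetical sketch of the proposed Shapley projection plots.
import numpy as np
import shap
import umap
import plotly.express as px

# 1. Compute Shapley values for every instance (TreeExplainer shown; any
#    explainer producing an (n_samples, n_features) array would work).
#    For a classifier, select the array for the class of interest.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # shape: (n_samples, n_features)

# 2. Project the Shapley values into 2D with UMAP.
embedding = umap.UMAP(n_components=2, random_state=42).fit_transform(shap_values)

# 3. Shapley projection plot: color each point by its prediction
#    (regression) or probability score (classification).
y_pred = model.predict(X)
fig = px.scatter(
    x=embedding[:, 0], y=embedding[:, 1], color=y_pred,
    labels={"x": "UMAP 1", "y": "UMAP 2", "color": "Prediction"},
    title="Shapley projection colored by prediction",
)
fig.show()

# 4. Error-based variant: same embedding, colored by prediction error,
#    to highlight regions where the model struggles.
errors = np.abs(y_true - y_pred)
fig_err = px.scatter(
    x=embedding[:, 0], y=embedding[:, 1], color=errors,
    labels={"x": "UMAP 1", "y": "UMAP 2", "color": "|error|"},
    title="Shapley projection colored by absolute error",
)
fig_err.show()
```

The same embedding could also be colored by true values, making it easy to compare the three views side by side, as described in the proposal.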

Expected Benefits

  • Identify clusters of predictions and analyze whether misclassified instances share similar characteristics.
  • Select cases with diverse feature importance by interacting with different areas of the projection space.
  • Compare prediction distributions with true values to highlight zones of high uncertainty.
  • Enhance model interpretability by visualizing why predictions are correct or incorrect.

This feature would greatly improve how users analyze and understand their model’s behavior, providing a more detailed, interpretable, and interactive approach to prediction analysis in Shapash.

Would love to hear feedback and suggestions on how to improve this proposal! 🚀
