Shapley-Based Correlation Matrix #624

@guillaume-vignal

Description

Shapash currently provides a correlation plot that shows pairwise relationships between variables in the training dataset. The plot displays only highly correlated variables and groups them into zones for readability. It is a two-dimensional matrix in which each cell encodes the correlation between two variables through color intensity.

[Image: the current Shapash correlation plot]

Feature Proposal

We propose an enhanced correlation matrix computed on Shapley values instead of raw feature values. This would let users analyze how the features’ contributions to predictions vary together, rather than only how their values do.

Key Enhancements

  1. Shapley Value Correlation Matrix

    • Instead of computing correlations on feature values, correlations will be computed on the features’ Shapley values.
    • This captures relationships between features in terms of their impact on the model’s predictions, rather than their raw statistical correlation.
  2. Shapley-Weighted Correlations

    • The correlation should be weighted, row by row, by the absolute values of the Shapley attributions.
    • If two features agree on 99% of the rows but their Shapley values are close to zero there, that agreement says little about the model and should barely count.
    • The remaining 1% of rows, where the Shapley values are significant, should drive the correlation score.
  3. Consistent Aesthetics & UX

    • The visualization should maintain the same look and feel as the existing correlation plot.
    • Color mapping should be adjusted to reflect the new correlation computation method.
    • The user should be able to interact with the visualization in the same way as the original plot.
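To make enhancements 1 and 2 concrete, here is a minimal sketch of one possible computation. It is not existing Shapash code: the function names `weighted_corr` and `shap_weighted_corr_matrix` are hypothetical, and the per-row weight (the sum of the two absolute Shapley values) is only one assumption about what "weighted by the absolute values" could mean. A product or a maximum would be equally plausible choices to discuss.

```python
import numpy as np


def weighted_corr(x, y, w):
    """Weighted Pearson correlation of x and y with non-negative row weights w."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    total = w.sum()
    if total <= 0:
        return np.nan  # no row carries any weight
    mean_x = np.sum(w * x) / total
    mean_y = np.sum(w * y) / total
    cov = np.sum(w * (x - mean_x) * (y - mean_y)) / total
    var_x = np.sum(w * (x - mean_x) ** 2) / total
    var_y = np.sum(w * (y - mean_y) ** 2) / total
    denom = np.sqrt(var_x * var_y)
    return cov / denom if denom > 0 else np.nan


def shap_weighted_corr_matrix(contrib):
    """Correlation matrix of Shapley values, weighted by attribution magnitude.

    contrib: array of shape (n_rows, n_features) holding one Shapley value
    per row and feature.
    Assumption: the weight of a row for the pair (i, j) is |phi_i| + |phi_j|,
    so rows where both attributions are near zero barely contribute.
    """
    contrib = np.asarray(contrib, dtype=float)
    abs_contrib = np.abs(contrib)
    n_features = contrib.shape[1]
    # Diagonal is set to 1 by convention, even for a constant column.
    matrix = np.eye(n_features)
    for i in range(n_features):
        for j in range(i + 1, n_features):
            weights = abs_contrib[:, i] + abs_contrib[:, j]
            value = weighted_corr(contrib[:, i], contrib[:, j], weights)
            matrix[i, j] = matrix[j, i] = value
    return matrix
```

The input would be whatever array of per-row Shapley contributions is available for the explained dataset. The result is a symmetric matrix with the same shape as the existing correlation matrix, so it could in principle feed the same plotting code.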

Expected Benefits

  • Helps understand which features influence predictions similarly, rather than just being statistically correlated.
  • Avoids misleading conclusions drawn from raw feature correlations by focusing on how the features’ impacts correlate.
  • Provides better insights into feature interactions in the context of model interpretability.
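A small synthetic illustration of the first two benefits, under stated assumptions: the data, the coefficients, and the linear model are invented for the example, and the Shapley values use the closed form often applied to linear models, `coef * (x - mean)`, which treats features as independent. Two features that are strongly positively correlated in value end up negatively correlated in contribution, because the model uses them with opposite signs.

```python
import numpy as np

rng = np.random.default_rng(42)
n_rows = 5000

# Two features with a strong positive correlation (about 0.9) in raw values.
x1 = rng.normal(size=n_rows)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=n_rows)
X = np.column_stack([x1, x2])

# Hypothetical linear model that uses the two features with opposite signs.
coefs = np.array([1.5, -2.0])

# Assumed closed-form attribution for a linear model: coef * (x - mean).
contrib = coefs * (X - X.mean(axis=0))

raw_corr = np.corrcoef(X, rowvar=False)[0, 1]
contrib_corr = np.corrcoef(contrib, rowvar=False)[0, 1]

print(f"correlation of raw values:     {raw_corr:+.3f}")
print(f"correlation of Shapley values: {contrib_corr:+.3f}")
```

The raw-value matrix would show these features as near duplicates, while the Shapley-based matrix shows that they push predictions in opposite directions.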

This feature would enhance Shapash’s explainability tools by allowing users to visualize correlations in a way that aligns more closely with model decision-making, rather than just dataset structure.

Looking forward to feedback and suggestions! 🚀

Labels: enhancement (New feature or request)
