A powerful, interactive dashboard for performing and visualizing regression analysis on various datasets. Built with Dash and Python, this application provides an intuitive interface for exploring relationships between variables, conducting statistical analysis, and visualizing results.
- Interactive Dataset Selection: Choose from multiple pre-configured datasets
- Dynamic Predictor Selection: Select which variables to include in your regression model
- Comprehensive Statistical Analysis:
  - Full regression model summary with coefficients and statistics
  - Type I (Sequential) ANOVA analysis
  - Correlation matrix visualization
- Visual Diagnostics:
  - Residual plots for model validation
  - Interactive correlation heatmap
- Interactive Data Table:
  - Sort and filter capabilities
  - Multi-row selection
  - Pagination
- Clone this repository:
    git clone <repository-url>
    cd <repository-directory>
- Install required dependencies:
    pip install -r requirements.txt
- Start the dashboard:
    python dash_mls.py
- Open your web browser and navigate to:
    http://localhost:8050
- Using the dashboard:
  - Select a dataset from the dropdown menu
  - Choose predictors using the checklist
  - Examine the regression results in the left panel
  - Explore the visualizations in the right panel
  - Use the interactive data table at the bottom for detailed data exploration
The application uses a config.yaml file to manage datasets and their properties. Each dataset should be configured with:
datasets:
  dataset_name:
    file: "path/to/data.csv"
    description: "Dataset Description"
    dependent_var: "target_variable"
    label_var: "label_column"
    predictors:
      - name: "predictor1"
        label: "Predictor 1 Label"
      - name: "predictor2"
        label: "Predictor 2 Label"
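As a rough illustration of how a file with this layout could be read, the sketch below parses an inline YAML string with PyYAML and checks that each dataset carries the expected keys. The dataset name `cars`, the file path, and the helper `parse_config` are hypothetical; treating `label_var` as optional is also an assumption, not something the project states.

```python
import yaml  # PyYAML

# Hypothetical example that follows the key layout of the template above.
CONFIG_TEXT = """
datasets:
  cars:
    file: "data/cars.csv"
    description: "Example car data"
    dependent_var: "mpg"
    label_var: "model"
    predictors:
      - name: "weight"
        label: "Weight"
      - name: "hp"
        label: "Horsepower"
"""

# Keys assumed to be mandatory; label_var is treated as optional here.
REQUIRED_KEYS = {"file", "description", "dependent_var", "predictors"}


def parse_config(text):
    """Parse YAML text and check that every dataset has the required keys."""
    config = yaml.safe_load(text)
    datasets = config["datasets"]
    for name, spec in datasets.items():
        missing = REQUIRED_KEYS - spec.keys()
        if missing:
            raise ValueError(f"dataset {name!r} is missing keys: {sorted(missing)}")
    return datasets


datasets = parse_config(CONFIG_TEXT)
predictor_names = [p["name"] for p in datasets["cars"]["predictors"]]
print(predictor_names)
```

Validating the keys at load time surfaces a misconfigured dataset before any callback tries to use it.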
- Binary variables are automatically detected and converted to 0/1 encoding
- Binary predictors are labeled with the category coded as 1 (e.g., "Sex (M=1)"); the other category serves as the reference level
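The binary handling described above could look roughly like the following pandas sketch. The function name `encode_binary_columns` and the rule that the alphabetically later level is coded as 1 are assumptions made for illustration; the application may choose the coded level differently.

```python
import pandas as pd


def encode_binary_columns(df):
    """Recode two-level non-numeric columns as 0/1 and build display labels."""
    out = df.copy()
    labels = {}
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            continue  # numeric columns are left untouched
        levels = sorted(df[col].dropna().unique())
        if len(levels) == 2:
            zero_level, one_level = levels  # assumption: later level is coded 1
            out[col] = df[col].map({zero_level: 0, one_level: 1})
            labels[col] = f"{col} ({one_level}=1)"
    return out, labels


df = pd.DataFrame({"Sex": ["M", "F", "M"], "Age": [30, 25, 41]})
encoded, labels = encode_binary_columns(df)
print(encoded)
print(labels)
```

Using `map` rather than a boolean comparison keeps missing values missing instead of silently coding them as 0.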
- Displays comprehensive model statistics
- Shows coefficient estimates, standard errors, and p-values
- Includes R-squared, adjusted R-squared, and F-statistic
- Shows sequential contribution of each predictor
- Includes:
  - Sum of Squares
  - Degrees of Freedom
  - Mean Square
  - F-value
  - p-value
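The model summary and the sequential ANOVA table described above can be reproduced with statsmodels. The sketch below uses synthetic data and hypothetical column names (`y`, `x1`, `x2`); it is meant to show where each reported quantity comes from, not how the dashboard itself is implemented.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Synthetic data standing in for a configured dataset.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = 1.0 + 2.0 * df["x1"] - 0.5 * df["x2"] + rng.normal(scale=0.5, size=n)

model = smf.ols("y ~ x1 + x2", data=df).fit()

# Coefficient estimates, standard errors, and p-values.
coef_table = pd.DataFrame(
    {"coef": model.params, "std_err": model.bse, "p_value": model.pvalues}
)

# Overall fit statistics.
fit_stats = {
    "r_squared": model.rsquared,
    "adj_r_squared": model.rsquared_adj,
    "f_statistic": model.fvalue,
}

# Type I (sequential) ANOVA: each row is the contribution of a predictor
# given the predictors listed before it in the formula.
anova = anova_lm(model, typ=1)

print(coef_table)
print(fit_stats)
print(anova)
```

Because Type I sums of squares are sequential, reordering the predictors in the formula generally changes the rows of the ANOVA table, while the coefficient table stays the same.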
- Visualizes model residuals vs. fitted values
- Helps identify potential issues with model assumptions
- Interactive hover information for each point
- Heatmap showing correlations between all variables
- Interactive visualization with exact correlation values
- Uses red-blue color scale for positive/negative correlations
- dash_mls.py: Main application entry point
- config.yaml: Dataset and configuration management
- utils/: Helper functions and utilities
  - data_processing.py: Data preprocessing and transformation
  - statistics.py: Statistical analysis functions
  - visualization.py: Plot generation utilities
- Data Loading Layer
  - YAML configuration parsing
  - Dataset loading and validation
  - Automatic type detection and conversion
- Processing Layer
  - Data preprocessing and cleaning
  - Feature engineering
  - Statistical computations
- Visualization Layer
  - Interactive Plotly components
  - Dynamic layout generation
  - Real-time updates
- UI Layer
  - Dash components and callbacks
  - Layout management
  - User interaction handling
- Handles dataset loading and preprocessing
- Manages data transformations
- Provides data access to other components
- Performs statistical analysis
- Generates model summaries
- Computes ANOVA tables
- Creates and updates plots
- Manages plot layouts
- Handles interactive features
The application uses a hierarchical callback structure:
- Dataset Selection Callbacks
  - Update available predictors
  - Reset analysis state
  - Load new data
- Analysis Callbacks
  - Update regression results
  - Generate visualizations
  - Compute statistics
- UI Interaction Callbacks
  - Handle user selections
  - Update table views
  - Manage plot interactions
- Uses Dash's built-in state management
- Maintains session-specific data
- Handles data persistence between callbacks
- Implements lazy loading for large datasets
- Uses caching for expensive computations
- Optimizes plot rendering for large datasets
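The caching mentioned above could be as simple as memoizing the expensive step on its inputs. This sketch uses `functools.lru_cache` from the standard library; whether the application uses this mechanism or a dedicated caching layer is not stated, and `fit_summary` is a hypothetical stand-in for a real model fit.

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts how often the expensive step actually runs


@lru_cache(maxsize=32)
def fit_summary(dataset_name, predictors):
    """Stand-in for an expensive model fit; `predictors` must be a tuple."""
    CALLS["count"] += 1
    return f"{dataset_name}: y ~ " + " + ".join(predictors)


def cached_summary(dataset_name, predictors):
    # Lists are not hashable, so convert to a tuple. The order is kept on
    # purpose: a sequential (Type I) ANOVA depends on predictor order.
    return fit_summary(dataset_name, tuple(predictors))


first = cached_summary("cars", ["weight", "hp"])
second = cached_summary("cars", ["weight", "hp"])   # served from the cache
third = cached_summary("cars", ["hp", "weight"])    # different order, new entry
print(CALLS["count"])
```

An in-process cache like this is per worker; a deployment with several worker processes would need a shared cache to get the same benefit.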
- Filtering: Filter rows based on any column value
- Sorting: Sort by multiple columns
- Selection: Select multiple rows for detailed examination
- Pagination: Navigate through large datasets easily
- Python 3.7+
- Dash
- Pandas
- NumPy
- Statsmodels
- Plotly
- PyYAML
- Fork the repository
- Create your feature branch
- Commit your changes
- Push to the branch
- Create a new Pull Request
A dashboard for causal discovery analysis. Built with Dash and Python, this application provides an intuitive interface for exploring causal relationships between variables using the PC, FCI, GES, and GRaSP algorithms.
- Multiple Causal Discovery Algorithms:
  - PC Algorithm: Constraint-based learning for causal structure
  - FCI Algorithm: Handles unobserved confounders
  - GES Algorithm: Score-based Bayesian network learning
  - GRaSP Algorithm: Permutation-based sparse structure learning
- Interactive Variable Selection: Choose which variables to include in the analysis
- Algorithm Information Panel: Detailed explanation of each algorithm's:
  - Assumptions
  - Strengths
  - Limitations
  - Parameters
- Visual Causal Graphs: Clear visualization of discovered causal relationships
- Interactive Data Table: Explore the underlying data
- Start the dashboard:
    python dash_cd.py
- Open your web browser and navigate to:
    http://localhost:8052
- Using the dashboard:
  - Select a dataset from the dropdown menu
  - Choose an algorithm (GES is selected by default)
  - Select variables for analysis
  - Adjust the significance level (α) if needed
  - Examine the causal graph and algorithm information
  - Use the data table for detailed data exploration
- Constraint-based approach using conditional independence tests
- Fast and efficient for high-dimensional data
- Assumes no unobserved confounders
- Uses significance level (α) for independence testing
- Extension of PC that handles unobserved confounders
- Produces partial ancestral graphs
- More robust but computationally intensive
- Uses significance level (α) for independence testing
- Score-based approach using Bayesian network learning
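- Handles non-linear relationships only when paired with a non-linear (generalized) score function
- With the standard Gaussian BIC score, assumes linear relationships and a multivariate normal distribution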
- No significance level parameter needed
- Permutation-based approach for sparse structure learning
- Robust to assumption violations
- Can handle non-linear relationships
- No significance level parameter needed
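For the two constraint-based algorithms above (PC and FCI), the significance level α enters through a conditional independence test. A common choice for continuous data is the Fisher z test on partial correlations; the sketch below implements it with NumPy as an illustration. Whether the dashboard uses this particular test is an assumption, and `fisher_z_pvalue` is a hypothetical helper, not part of any library.

```python
import math

import numpy as np


def fisher_z_pvalue(data, i, j, cond=()):
    """p-value for H0: column i is independent of column j given columns `cond`."""
    idx = [i, j, *cond]
    corr = np.corrcoef(data[:, idx], rowvar=False)
    precision = np.linalg.inv(corr)
    # Partial correlation of i and j given cond, read off the precision matrix.
    r = -precision[0, 1] / math.sqrt(precision[0, 0] * precision[1, 1])
    r = max(min(r, 0.999999), -0.999999)  # keep the log below finite
    n = data.shape[0]
    z = 0.5 * math.log((1 + r) / (1 - r))  # Fisher z-transform
    stat = math.sqrt(n - len(cond) - 3) * abs(z)
    # Two-sided p-value under the standard normal distribution.
    return math.erfc(stat / math.sqrt(2))


# Synthetic chain X -> Y -> Z: X and Z are dependent, but independent given Y.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = x + rng.normal(size=n)
z = y + rng.normal(size=n)
data = np.column_stack([x, y, z])

alpha = 0.05
p_marginal = fisher_z_pvalue(data, 0, 2)
p_given_y = fisher_z_pvalue(data, 0, 2, cond=(1,))
print(p_marginal < alpha, p_given_y < alpha)
```

A constraint-based search removes the edge between two variables once it finds a conditioning set with a p-value above α, so a smaller α tends to remove more edges and produce sparser graphs.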
The application uses the same config.yaml file as the regression dashboard. Each dataset should be configured with:
datasets:
  dataset_name:
    file: "path/to/data.csv"
    description: "Dataset Description"
    dependent_var: "target_variable"
    predictors:
      - name: "predictor1"
        label: "Predictor 1 Label"
      - name: "predictor2"
        label: "Predictor 2 Label"
- Automatically handles missing values
- Converts binary variables to 0/1 encoding
- Ensures numeric data for analysis
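The three preprocessing steps above might be combined as in the following pandas sketch. Dropping incomplete rows is only one possible way to handle missing values and is an assumption here, as is the helper name `prepare_numeric`.

```python
import pandas as pd


def prepare_numeric(df):
    """Drop incomplete rows, recode two-level text columns, keep numeric data."""
    out = df.dropna().copy()  # assumption: missing values are handled by row deletion
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            continue
        levels = sorted(out[col].unique())
        if len(levels) == 2:
            out[col] = out[col].map({levels[0]: 0, levels[1]: 1})
    # Columns that are still non-numeric (e.g. free-text labels) are dropped.
    return out.select_dtypes(include="number").astype(float)


raw = pd.DataFrame(
    {
        "name": ["a", "b", "c", "d"],
        "sex": ["M", "F", "M", "F"],
        "x": [1.0, None, 3.0, 4.0],
    }
)
clean = prepare_numeric(raw)
matrix = clean.to_numpy()  # plain float array for the discovery algorithm
print(clean)
```

Causal discovery routines generally expect a complete numeric matrix, which is why the final step converts everything to floats.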
- Handles dataset loading and preprocessing
- Manages variable type detection and conversion
- Ensures data compatibility with selected algorithm
- Implements multiple causal discovery algorithms
- Handles graph visualization
- Manages algorithm-specific parameters
- Coordinates all components
- Manages user interactions
- Provides algorithm information
- Handles layout and styling
- Dataset Selection
  - Updates available variables
  - Resets analysis state
  - Loads new data
- Variable Selection
  - Updates causal graph
  - Updates algorithm information
  - Maintains state consistency
- Algorithm Selection
  - Updates visualization method
  - Updates parameter controls
  - Updates algorithm information
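The "updates parameter controls" step for algorithm selection can be driven by a small lookup table. The sketch below encodes only what this document states (PC and FCI take a significance level, GES and GRaSP do not, GES is the default); the dictionary keys and the function `alpha_control_state` are hypothetical names.

```python
# Algorithm metadata taken from the descriptions in this document.
ALGORITHMS = {
    "pc": {"label": "PC", "uses_alpha": True},
    "fci": {"label": "FCI", "uses_alpha": True},
    "ges": {"label": "GES", "uses_alpha": False},
    "grasp": {"label": "GRaSP", "uses_alpha": False},
}
DEFAULT_ALGORITHM = "ges"


def alpha_control_state(algorithm):
    """Properties for the significance-level input given the chosen algorithm."""
    uses_alpha = ALGORITHMS[algorithm]["uses_alpha"]
    return {
        "disabled": not uses_alpha,
        "hint": "Significance level (α)" if uses_alpha else "Not used by this algorithm",
    }


for key in ALGORITHMS:
    print(key, alpha_control_state(key))
```

In a Dash app, a callback on the algorithm dropdown could return the `disabled` value to the α input so that the control is greyed out for the score-based and permutation-based methods.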
The project includes an interactive dashboard (dash_viz.py) that provides visual analysis of the datasets. The dashboard features:
- Dataset selection dropdown to switch between different datasets
- Variable selection checklist to choose which variables to visualize
- Interactive scatterplot matrix showing relationships between selected variables
- Data preview table showing the raw data
- Hover information including labels (e.g., state names, car names) when available
To run the dashboard:
    python dash_viz.py
The dashboard will be available at http://localhost:8052. You can:
- Select a dataset from the dropdown menu
- Choose variables to visualize from the checklist
- Interact with the scatterplot matrix by hovering over points
- View the raw data in the preview table below
- Scatterplot Matrix: Shows pairwise relationships between selected variables
- Histograms: Displayed on the diagonal of the scatterplot matrix
- Interactive Hover: Shows detailed information including labels when available
- Responsive Layout: Adapts to different screen sizes
- Data Preview: Allows quick inspection of the raw data