Automated Regression Workflow for Interpretable Deflection Prediction in Bio-Inspired Laminated Composite Plates
This repository contains the code and data associated with the research paper:
"Automated Regression Workflow for Interpretable Deflection Prediction in Bio-Inspired Laminated Composite Plates" Shakti P. Padhy, Shubham Saurabh, Raj Kiran, Nhon Nguyen-Thanh
This paper is accepted for publication and currently in the process of publishing online in Frontiers of Structural and Civil Engineering.
If you used the database or the codes for your research, please consider citing our GitHub repository:
```bibtex
@software{padhy_2025_15927988,
  author    = {Padhy, Shakti Prasad and
               Saurabh, Shubham and
               Choudhary, Krishana and
               Kiran, Raj and
               Nguyen-Thanh, Nhon},
  title     = {{Data and codes for Automated Regression Workflow
                for Interpretable Deflection Prediction in
                Bio-Inspired Laminated Composite Plates: Data
                and codes release v1.0.0}},
  month     = jul,
  year      = 2025,
  publisher = {Zenodo},
  version   = {v1.0.0},
  doi       = {10.5281/zenodo.15927988},
  url       = {https://doi.org/10.5281/zenodo.15927988}
}
```
Keywords: Automated Regression Workflow; Interpretable Machine Learning; Bio-inspired laminated composites; Finite element analysis
This project focuses on developing an Automated Regression Workflow (ARW) in Python to streamline the process of building, optimizing, and interpreting machine learning models for predicting the deflection of bio-inspired laminated composite plates. The workflow automates hyperparameter optimization, model training, and performance evaluation, making it highly efficient and reproducible.
The core of this project is the Automated Regression Workflow (ARW), developed in Python. The workflow condenses roughly 300 lines of detailed Python code, covering complete development, hyperparameter optimization, and evaluation of ML regression models, into a concise 10-20 lines of user-defined code.
The Automated Regression Workflow (ARW) is implemented in the `mlregworkflow.py` script. Here is a brief overview of its key steps and how they are applied in this work:
1. **Data Preprocessing:**
   - The raw dataset (`Bioinspired composite_loading-25.csv`) is loaded using pandas.
   - Categorical features (`Loading Type`, `Constraint`, `Configuration`) are identified and converted into numerical format using one-hot encoding (`pd.get_dummies`).
   - Numerical features (`Number of Layer`, `E1/E2`, `a/h`) are separated from the target variable (`Deflection`).
   - All features (`X`) and the target (`y`) are then scaled using `StandardScaler` to normalize their ranges.
2. **Data Splitting:**
   - The preprocessed data is split into training and testing sets using `train_test_split` (80% for training, 20% for testing) to ensure unbiased model evaluation.
3. **Model Definition:**
   - Six machine learning regression models are defined: Linear Regression (Linear), Support Vector Regression (SVR), Gradient Boosting Regression (GBR), Random Forest Regression (RFR), eXtreme Gradient Boosting Regression (XGBR), and Neural Network Regression (NNR).
   - For each model except Linear Regression, a hyperparameter search space is defined using `skopt.space.Real`, `skopt.space.Integer`, and `skopt.space.Categorical`. This includes parameters such as `C`, `epsilon`, and `kernel` for SVR; `n_estimators`, `loss`, `learning_rate`, `alpha`, and `max_depth` for GBR; and `hidden_layers`, `units`, `activation`, and `learning_rate` for the neural network (see the sketch after this list).
4. **Model Optimization and Evaluation:**
   - The `optimize_model` function (from `mlregworkflow.py`) is the central component of this step.
   - It uses Bayesian Optimization (BO) via `skopt.gp_minimize` to systematically search for the optimal hyperparameters of each model, aiming to maximize the $R^2$ score (see the sketch after this list).
   - A robust 5-fold cross-validation (`KFold`) is integrated into the optimization process to ensure reliable and generalized model performance. For the neural network, a manual cross-validation loop (`cross_val_nn`) is implemented due to its specific training requirements.
   - After optimization, each model is trained on the full training set with its best parameters and evaluated on both the training and unseen testing data.
   - Performance metrics ($R^2$, Mean Absolute Error (MAE), Root Mean Squared Error (RMSE)) are computed for both sets.
5. **Automated Visualizations and Results Output:**
   - For each model, diagnostic plots (predicted vs. true values) are automatically generated and saved to a `plots` directory.
   - A `results.csv` file is generated, summarizing the performance of all models, including their optimized hyperparameters and evaluation metrics.
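As a minimal sketch of how the search spaces (step 3) and the Bayesian Optimization loop (step 4) fit together, the following shows a 5-fold cross-validated objective for GBR minimized with `skopt.gp_minimize`. The bounds mirror the search space described above, but the helper names (`space`, `objective`) and settings such as `n_calls` are illustrative and assume `X_train` and `y_train` from the 80/20 split; the actual `optimize_model` implementation may differ:

```python
from skopt import gp_minimize
from skopt.space import Categorical, Integer, Real
from skopt.utils import use_named_args
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_score

# Illustrative GBR search space built from skopt.space dimensions
space = [
    Integer(3, 10, name="n_estimators"),
    Categorical(["squared_error", "absolute_error", "huber", "quantile"], name="loss"),
    Real(0.005, 0.9, name="learning_rate", prior="log-uniform"),
    Real(0.005, 0.9, name="alpha", prior="log-uniform"),
    Integer(1, 7, name="max_depth"),
]

@use_named_args(space)
def objective(**params):
    model = GradientBoostingRegressor(**params)
    cv = KFold(n_splits=5, shuffle=True, random_state=42)
    # gp_minimize minimizes, so return the negated mean cross-validated R^2
    return -cross_val_score(model, X_train, y_train, cv=cv, scoring="r2").mean()

result = gp_minimize(objective, space, n_calls=50, random_state=42)
best_params = dict(zip([dim.name for dim in space], result.x))
```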
- **Bayesian Optimization (BO):** Utilizes `Scikit-Optimize` for systematic identification of optimal hyperparameters, maximizing the coefficient of determination ($R^2$).
- **Cross-Validation:** Integrates a robust 5-fold cross-validation process for reliable and generalized model performance.
- **Train-Test Split:** Employs an 80-20 train-test data split for unbiased assessment of predictive accuracy.
- **Automated Visualizations:** Generates diagnostic plots (predicted vs. true values) for the training and testing datasets, stored in a `plots` directory.
- **Performance Metrics Output:** Writes a `results.csv` file containing model names, optimized hyperparameters, and performance metrics ($R^2$, MAE, RMSE).
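For reference, the three reported metrics map onto standard scikit-learn calls; this is a generic sketch with placeholder arrays `y_true` and `y_pred`, not a verbatim excerpt from `mlregworkflow.py`:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# y_true: ground-truth deflections, y_pred: model predictions
r2 = r2_score(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
```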
Six different regression models were employed to predict deflection values:
- Linear Regression (Linear)
- Support Vector Regression (SVR)
- Gradient Boosting Regression (GBR)
- Random Forest Regression (RFR)
- eXtreme Gradient Boosting Regression (XGBR)
- Neural Network Regression (NNR)
Hyperparameter optimization was performed for all models (except Linear Regression) using the ARW.
To understand the influence of individual features on deflection predictions, SHAP (SHapley Additive exPlanations) analysis via the `shap` library and LIME (Local Interpretable Model-Agnostic Explanations) analysis via the `lime` library were applied to the best-performing model.
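A minimal sketch of how such an analysis might look for the tuned XGBR model, assuming `best_model`, `X_train`, and `X_test` are available as pandas DataFrames (the exact calls used in the notebook may differ):

```python
import shap
from lime.lime_tabular import LimeTabularExplainer

# Global feature attributions with SHAP (TreeExplainer suits tree ensembles such as XGBR)
explainer = shap.TreeExplainer(best_model)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test)

# Local explanation of a single prediction with LIME
lime_explainer = LimeTabularExplainer(
    X_train.values,
    feature_names=X_train.columns.tolist(),
    mode="regression",
)
explanation = lime_explainer.explain_instance(X_test.values[0], best_model.predict)
print(explanation.as_list())
```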
The study utilizes a comprehensive dataset of 672 datapoints generated through finite element simulations. The input features include:
- Layup Configurations: Helicoidal Recursive (HR1, HR2, HR3), Helicoidal Exponential (HE1, HE2, HE3), Helicoidal Semicircular (HS1, HS2, HS3), Linear Helicoidal (LH1, LH2, LH3), Fibonacci Helicoidal (FH), and Quasi-Isotropic (QI).
- Elasticity Ratio: $E_1/E_2$ (10 and 40)
- Loading Types: Uniformly Distributed Load (UDL) and Sinusoidal Distributed Load (SSL)
- Boundary Conditions: Simply Supported (SSSS) or Clamped (CCCC)
- Length-to-Thickness Ratio: a/h (10 and 100)
- Number of Layers: 12, 16, and 20
Categorical variables (configurations, loading types, boundary conditions) were one-hot encoded.
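A minimal sketch of this preprocessing, assuming the column names listed above and the 80/20 split used by the workflow (the notebook's exact code may differ):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = pd.read_csv("data/Bioinspired composite_loading-25.csv")

# One-hot encode the categorical variables
data = pd.get_dummies(data, columns=["Loading Type", "Constraint", "Configuration"])

# Separate the features from the target, then scale both
X = data.drop("Deflection", axis=1)
y = data[["Deflection"]]
X_scaled = StandardScaler().fit_transform(X)
y_scaled = StandardScaler().fit_transform(y)

# 80/20 train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y_scaled, test_size=0.2, random_state=42
)
```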
- **Model Performance:** All models (except Linear Regression) achieved high fidelity ($R^2 \approx 0.996$–$0.998$; MAE $\approx 0.010$–$0.015$; RMSE $\approx 0.014$–$0.021$).
- **Superior Model:** The eXtreme Gradient Boosting Regression (XGBR) model demonstrated superior performance, achieving the highest $R^2$ (0.999) and the lowest MAE (0.010) and RMSE (0.013) on the test dataset.
- **Key Influential Factors (SHAP & LIME):**
  - Boundary Conditions (`Constraint_CCCC`): the most influential factor; clamped boundary conditions significantly reduce predicted deflection.
  - Ratio of Elastic Moduli ($E_1/E_2$): the second most critical factor; larger ratios (stiffer material distribution) lead to reduced deflection.
  - Aspect Ratio (a/h) and Loading Type (`Loading Type_SSL`): showed moderate influence.
  - Configuration Encodings and Number of Layers: exhibited minimal global impact on predictions.
These findings provide crucial insights for optimizing the design parameters of bio-inspired composite structures.
To use the Automated Regression Workflow, follow these steps:

1. **Clone the repository:**

   ```bash
   git clone https://github.com/Shakti-95/ARW-Interpretable-ML-Composite-Deflection.git
   cd ARW-Interpretable-ML-Composite-Deflection
   ```

2. **Install dependencies:**
   It is recommended to use a virtual environment.

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
   pip install -r requirements.txt
   ```

3. **Prepare your data:**
   Ensure your dataset is in a suitable format (e.g., CSV). The provided dataset can be used as input.

4. **Run the workflow:**
   Place the `mlregworkflow.py` script (which contains the `run_workflow` function) in your working directory.

   ```python
   # Example usage in your main script (e.g., `main.py`)
   import pandas as pd
   from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
   from sklearn.linear_model import LinearRegression
   from sklearn.neural_network import MLPRegressor
   from sklearn.svm import SVR
   from xgboost import XGBRegressor

   from mlregworkflow import run_workflow

   # Load your dataset
   # Replace 'your_data.csv' with the actual path to your dataset
   data = pd.read_csv('your_data.csv')

   # Define your features (X) and target (y)
   # Adjust column names as per your dataset
   X = data.drop('deflection_output_column', axis=1)
   y = data['deflection_output_column']

   # Define the models and their hyperparameter search spaces
   models_to_run = {
       "SVR": {
           "model": SVR(),
           "params": {
               "kernel": ["linear", "rbf"],
               "epsilon": [0.01, 1.0],
               "C": [0.1, 10.0]
           }
       },
       "GBR": {
           "model": GradientBoostingRegressor(),
           "params": {
               "n_estimators": [3, 10],
               "loss": ["squared_error", "absolute_error", "huber", "quantile"],
               "learning_rate": [0.005, 0.9],
               "alpha": [0.005, 0.9],  # sampled with a log-uniform prior
               "max_depth": [1, 7]
           }
       },
       "RFR": {
           "model": RandomForestRegressor(),
           "params": {
               "n_estimators": [3, 10],
               "max_depth": [3, 10]
           }
       },
       "XGBR": {
           "model": XGBRegressor(eval_metric='rmse'),  # eval_metric set explicitly for newer XGBoost versions
           "params": {
               "eta": [0.005, 1],
               "n_estimators": [1, 5],   # matches the paper's search space
               "max_depth": [1, 7],      # matches the paper's search space
               "subsample": [0.1, 0.9],
               "colsample_bytree": [0.005, 0.9]
           }
       },
       "NeuralNetwork": {
           "model": MLPRegressor(max_iter=100, batch_size=32),  # fixed epochs and batch size as per the paper
           "params": {
               "hidden_layer_sizes": [(32,), (64,), (128,), (32, 32), (64, 64), (128, 128)],  # hidden_layers and units
               "activation": ["relu", "tanh"],
               "learning_rate_init": [1e-6, 1e-2]  # corresponds to learning_rate in the paper
           }
       },
       "Linear": {
           # Linear Regression does not require hyperparameter tuning
           "model": LinearRegression(),
           "params": {}
       }
   }

   # Run the automated workflow
   run_workflow(X, y, models_to_run)
   ```
```text
.
├── data/
│   └── Bioinspired composite_loading-25.csv  # The finite element generated dataset
├── plots/                 # Plots automatically generated by the ARW
├── models/                # Stores trained neural network models (.h5 files)
├── mlregworkflow.py       # The Automated Regression Workflow script
├── Bioinspired ML.ipynb   # Jupyter notebook with the overall project code and analysis
├── results.csv            # Results automatically generated by the ARW
├── requirements.txt       # Python dependencies
├── README.md              # This file
└── LICENSE                # MIT License file
```
We welcome contributions to this project. Please feel free to fork the repository, make changes, and submit pull requests.
This project is licensed under the MIT License. See the LICENSE file for details.
For any questions or inquiries, please contact:
- Dr. Shakti Prasad Padhy: padhy.shaktiprasad@gmail.com, shaktippadhy@tamu.edu
- Dr. Raj Kiran: raj@iitmandi.ac.in
- Dr. Nhon Nguyen-Thanh: nguyenthanhnhon@tdtu.edu.vn