☀️This project is a machine learning pipeline designed to predict the efficiency of solar panels based on sensor and operational data. The goal is to help optimize solar panel performance, anticipate maintenance needs, and improve energy yield by providing accurate efficiency predictions.

Gopal-dutta/Solar_eff_model

Solar Panel Efficiency Prediction Model

Overview

This project is a machine learning pipeline designed to predict the efficiency of solar panels based on sensor and operational data. The goal is to help optimize solar panel performance, anticipate maintenance needs, and improve energy yield by providing accurate efficiency predictions.


Why was this project developed?

  • Renewable energy optimization: Solar panel efficiency can be affected by many factors (weather, age, soiling, etc.). Predicting efficiency helps operators maximize output and plan maintenance.
  • Data-driven insights: By leveraging advanced machine learning, we can uncover hidden patterns and relationships in solar panel data.
  • Automation: The pipeline automates feature selection, hyperparameter tuning, and model ensembling for robust, production-ready predictions.

What does the project do?

  • Loads and preprocesses solar panel data (from CSV files)
  • Selects the most important features using LightGBM
  • Tunes model hyperparameters using Optuna (Bayesian optimization)
  • Trains a stacking ensemble of multiple models (LightGBM, XGBoost, CatBoost, RandomForest)
  • Evaluates model performance with metrics like RMSE, MAE, R², etc.
  • Saves the trained model and selected features for reproducible predictions
  • Generates predictions for new, unseen test data
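The evaluation step in the list above names RMSE, MAE, and R². As a minimal, dependency-free sketch of what those metrics compute (the repository's own code in model_evaluation.py is not shown here and may well use scikit-learn's metric functions instead), the helper `regression_metrics` and the sample values below are hypothetical:

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R^2 for two equal-length lists of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    mae = sum(abs(e) for e in errors) / n         # mean absolute error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R^2 is undefined when the targets are constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"rmse": math.sqrt(ss_res / n), "mae": mae, "r2": r2}


# Made-up efficiency values, for illustration only.
actual = [0.50, 0.60, 0.70, 0.80]
predicted = [0.52, 0.58, 0.71, 0.79]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

RMSE penalizes large errors more heavily than MAE, so reporting both gives a sense of whether a few panels are predicted badly or the error is spread evenly.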

Tech Stack

1. Python

  • The main programming language for data science and machine learning.

2. Pandas & NumPy

  • For data manipulation, cleaning, and numerical operations.

3. scikit-learn

  • For model selection, feature selection, stacking ensemble, and evaluation metrics.
  • Key features used:
    • SelectFromModel: Feature selection based on model importance.
    • StackingRegressor: Combines multiple models for improved performance.
    • train_test_split, KFold: For splitting data and cross-validation.
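The scikit-learn pieces listed above can be combined roughly as follows. This is a sketch under stated assumptions, not the project's actual code: it uses synthetic data from `make_regression`, and it substitutes scikit-learn's `RandomForestRegressor` and `Ridge` for the LightGBM, XGBoost, and CatBoost learners so that it runs with scikit-learn alone. The `"median"` threshold and the choice of final estimator are also assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, train_test_split

# Synthetic stand-in for the solar panel data: 20 features, 5 informative.
X, y = make_regression(
    n_samples=300, n_features=20, n_informative=5, noise=0.1, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Keep the features whose importance is at or above the median importance.
selector = SelectFromModel(
    RandomForestRegressor(n_estimators=50, random_state=0), threshold="median"
)
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_val_sel = selector.transform(X_val)

# Base learners produce out-of-fold predictions (5-fold CV) that the
# final estimator is trained on.
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("ridge", Ridge(alpha=1.0)),
    ],
    final_estimator=Ridge(alpha=1.0),
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
stack.fit(X_train_sel, y_train)

val_pred = stack.predict(X_val_sel)
rmse = float(np.sqrt(mean_squared_error(y_val, val_pred)))
print(f"kept {X_train_sel.shape[1]} of {X.shape[1]} features, RMSE={rmse:.3f}")
```

Fitting the selector on the training split only, then applying it to the validation split, keeps validation data from leaking into feature selection.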

4. LightGBM

  • Fast, efficient gradient boosting framework.
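  • Key hyperparameters and training options:
    • learning_rate: Shrinkage applied to each tree's contribution; smaller values usually need more trees.
    • num_leaves: Maximum number of leaves per tree (the main control on complexity).
    • max_depth: Maximum tree depth (-1 means no limit).
    • n_estimators: Number of boosting rounds (trees).
    • min_child_samples: Minimum number of samples in a leaf.
    • subsample, colsample_bytree: Row/column sampling for regularization (row subsampling only takes effect when subsample_freq is greater than 0).
    • reg_alpha, reg_lambda: L1/L2 regularization.
    • categorical_feature: Passed when fitting rather than set as a model hyperparameter; enables native handling of categorical columns.
    • early_stopping: A training callback that stops boosting when the validation score stops improving.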

5. XGBoost

  • Another high-performance gradient boosting library.
  • Used as a base learner in the stacking ensemble.

6. CatBoost

  • Gradient boosting with excellent categorical feature support.
  • Used as a base learner in the stacking ensemble.

7. Optuna

  • Automated hyperparameter optimization framework.
  • Features:
    • Bayesian optimization for efficient search (the default sampler is the Tree-structured Parzen Estimator, TPE).
    • Parallel/distributed search support.
    • Easy integration with scikit-learn and LightGBM.

8. Joblib

  • For saving and loading models and feature lists.
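The save-and-reload round trip might look like the sketch below. The file names match those under models/ in the project structure, but the `Ridge` model, the feature names, and the temporary directory are stand-ins chosen for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import Ridge

# Small stand-in model and hypothetical feature names.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5])
features = ["irradiance", "temperature", "humidity"]
model = Ridge(alpha=1.0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "ensemble_model.pkl")
    features_path = os.path.join(tmp_dir, "selected_features.pkl")

    # Persist the fitted model and the feature list side by side.
    joblib.dump(model, model_path)
    joblib.dump(features, features_path)

    # Reload both, as a prediction script would before scoring new data.
    loaded_model = joblib.load(model_path)
    loaded_features = joblib.load(features_path)

same_predictions = bool(np.allclose(model.predict(X), loaded_model.predict(X)))
print(loaded_features, same_predictions)
```

Saving the feature list alongside the model lets prediction code select the same columns, in the same order, that the model was trained on.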

9. Matplotlib & Seaborn

  • For data visualization and exploratory data analysis (EDA).

Project Structure

dataset/
    Clean_X_Train.csv
    Clean_Test_Data.csv
src/
    modelling/
        model_training.py
        model_evaluation.py
    utils/
        visualization.py
    reports/
        evaluation_report.csv
        final_submission.csv
    models/
        ensemble_model.pkl
        selected_features.pkl
main.py
predict.py
requirements.txt
README.md

How to Run

  1. Install dependencies:

    pip install -r requirements.txt
  2. Prepare your data:

    • Place cleaned train/test CSVs in the dataset/ folder.
  3. Train and evaluate the model:

    python main.py
  4. Generate predictions for new data:

    python predict.py

Customization

  • Feature engineering: Add new features in your preprocessing scripts for better performance.
  • Model tuning: Adjust Optuna search space or stacking ensemble in model_training.py.
  • Evaluation: Extend model_evaluation.py for more metrics or plots.

Authors

  • Team Elytra: Gopal Dutta, Chaitany Agrawal

License

This project is for educational and research purposes.
