This project is a machine learning pipeline designed to predict the efficiency of solar panels based on sensor and operational data. The goal is to help optimize solar panel performance, anticipate maintenance needs, and improve energy yield by providing accurate efficiency predictions.
- Renewable energy optimization: Solar panel efficiency can be affected by many factors (weather, age, soiling, etc.). Predicting efficiency helps operators maximize output and plan maintenance.
- Data-driven insights: By leveraging advanced machine learning, we can uncover hidden patterns and relationships in solar panel data.
- Automation: The pipeline automates feature selection, hyperparameter tuning, and model ensembling for robust, production-ready predictions.
- Loads and preprocesses solar panel data (from CSV files)
- Selects the most important features using LightGBM
- Tunes model hyperparameters using Optuna (Bayesian optimization)
- Trains a stacking ensemble of multiple models (LightGBM, XGBoost, CatBoost, RandomForest)
- Evaluates model performance with metrics such as RMSE, MAE, and R²
- Saves the trained model and selected features for reproducible predictions
- Generates predictions for new, unseen test data
- Python: The main programming language for data science and machine learning.
- pandas & NumPy: For data manipulation, cleaning, and numerical operations.
- scikit-learn: For model selection, feature selection, the stacking ensemble, and evaluation metrics.
- Key features used:
  - `SelectFromModel`: Feature selection based on model importance.
  - `StackingRegressor`: Combines multiple models for improved performance.
  - `train_test_split`, `KFold`: For splitting data and cross-validation.
- LightGBM: A fast, efficient gradient boosting framework.
- Key hyperparameters:
  - `learning_rate`: Controls how much the model learns in each iteration.
  - `num_leaves`: Number of leaves in one tree (controls complexity).
  - `max_depth`: Maximum tree depth.
  - `n_estimators`: Number of boosting rounds (trees).
  - `min_child_samples`: Minimum samples in a leaf.
  - `subsample`, `colsample_bytree`: Row/column sampling for regularization.
  - `reg_alpha`, `reg_lambda`: L1/L2 regularization.
  - `categorical_feature`: Native support for categorical columns.
  - `early_stopping`: Stops training when the validation score stops improving.
- XGBoost: Another high-performance gradient boosting library.
- Used as a base learner in the stacking ensemble.
- CatBoost: Gradient boosting with excellent categorical feature support.
- Used as a base learner in the stacking ensemble.
- Optuna: An automated hyperparameter optimization framework.
- Features:
- Bayesian optimization for efficient search.
- Parallel/distributed search support.
- Easy integration with scikit-learn and LightGBM.
- joblib: For saving and loading models and feature lists.
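Assuming joblib is the serializer behind the `.pkl` artifacts in `models/`, persisting and reloading might look like this sketch (the model and feature names are hypothetical):

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=50, n_features=5, random_state=0)
model = Ridge().fit(X, y)
selected_features = ["irradiance", "temperature", "humidity"]  # hypothetical names

# Persist the model and the feature list side by side, as the pipeline does
out_dir = Path(tempfile.mkdtemp())
joblib.dump(model, out_dir / "ensemble_model.pkl")
joblib.dump(selected_features, out_dir / "selected_features.pkl")

# Reload for reproducible predictions
loaded_model = joblib.load(out_dir / "ensemble_model.pkl")
loaded_features = joblib.load(out_dir / "selected_features.pkl")
print(loaded_features)
```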
- Matplotlib & Seaborn: For data visualization and exploratory data analysis (EDA).
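Assuming a Matplotlib-based EDA step, a tiny headless plotting sketch (the efficiency values are synthetic, not from the real dataset):

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
efficiency = rng.normal(loc=18.0, scale=2.0, size=500)  # hypothetical efficiency values (%)

# Histogram of the target distribution -- a typical first EDA plot
fig, ax = plt.subplots()
ax.hist(efficiency, bins=30)
ax.set_xlabel("Panel efficiency (%)")
ax.set_ylabel("Count")
ax.set_title("Distribution of panel efficiency")
fig.savefig(Path(tempfile.mkdtemp()) / "efficiency_hist.png")
```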
Project structure:

```
dataset/
    Clean_X_Train.csv
    Clean_Test_Data.csv
src/
    modelling/
        model_training.py
        model_evaluation.py
    utils/
        visualization.py
reports/
    evaluation_report.csv
    final_submission.csv
models/
    ensemble_model.pkl
    selected_features.pkl
main.py
predict.py
requirements.txt
README.md
```
1. Install dependencies: `pip install -r requirements.txt`
2. Prepare your data: place the cleaned train/test CSVs in the `dataset/` folder.
3. Train and evaluate the model: `python main.py`
4. Generate predictions for new data: `python predict.py`
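`predict.py` presumably follows the shape below: load the saved model and feature list, score the cleaned test data, and write a submission CSV. Paths mirror the repository layout above, but the setup block, the Ridge model, and all feature names are stand-ins, not this project's actual code.

```python
import tempfile
from pathlib import Path

import joblib
import pandas as pd
from sklearn.linear_model import Ridge

# --- setup: stand-in artifacts (the real repo ships these under models/) ---
root = Path(tempfile.mkdtemp())
(root / "models").mkdir()
(root / "reports").mkdir()
features = ["irradiance", "temperature"]      # hypothetical feature names
train = pd.DataFrame({"irradiance": [800, 900, 1000, 650],
                      "temperature": [25, 30, 35, 20]})
target = pd.Series([18.5, 18.1, 17.6, 18.9])  # hypothetical efficiency (%)
joblib.dump(Ridge().fit(train[features], target), root / "models" / "ensemble_model.pkl")
joblib.dump(features, root / "models" / "selected_features.pkl")

# --- the predict flow: load artifacts, score new data, write a submission ---
model = joblib.load(root / "models" / "ensemble_model.pkl")
selected = joblib.load(root / "models" / "selected_features.pkl")
test_df = pd.DataFrame({"irradiance": [750, 880], "temperature": [22, 28]})
test_df["predicted_efficiency"] = model.predict(test_df[selected])
test_df.to_csv(root / "reports" / "final_submission.csv", index=False)
print(test_df["predicted_efficiency"].round(2).tolist())
```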
- Feature engineering: Add new features in your preprocessing scripts for better performance.
- Model tuning: Adjust the Optuna search space or the stacking ensemble in `model_training.py`.
- Evaluation: Extend `model_evaluation.py` with more metrics or plots.
- Team Elytra
  - Gopal Dutta
  - Chaitany Agrawal
This project is for educational and research purposes.