This repository demonstrates a complete end-to-end MLOps pipeline for building a weather forecasting model. By leveraging tools like DVC, Airflow, and MLFlow, we’ve created an automated, scalable, and reproducible workflow for collecting data, training models, and monitoring performance.
- Project Overview
- Features
- Architecture
- Getting Started
- Usage
- MLFlow Integration
- Key Tools
- Results
- Contributing
- License
The primary goal of this project is to:
- Collect live weather data 🌐.
- Preprocess and clean the data 🧹.
- Train machine learning models to predict weather conditions 🤖.
- Automate workflows for scalability and reproducibility ⚙️.
- Track models and experiments for easy version control 📊.
This project adopts MLOps best practices to ensure a seamless integration of machine learning into production workflows.
- Automated Data Collection: Fetch real-time weather data using APIs.
- Data Versioning: Track datasets with DVC to ensure reproducibility.
- Workflow Automation: Manage and automate tasks with Airflow DAGs.
- Model Versioning and Experiment Tracking: Use MLFlow for logging experiments and comparing model performance.
- Scalability: Modular design for easy extension and deployment.
Here’s an overview of the pipeline architecture:
graph TD;
A[Fetch Weather Data 🌦️] --> B[Preprocess Data 🧹]
B --> C[Train ML Model 🤖]
C --> D[Evaluate and Monitor Results 📊]
Each stage is modular and automated using Airflow, with DVC and MLFlow ensuring version control and tracking.
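As a concrete illustration, here is a minimal sketch of how those four stages could be wired together as an Airflow DAG. The DAG id `weather_pipeline` and the task callables below are illustrative placeholders, not the repository's actual module names.

```python
# dags/weather_pipeline.py -- illustrative sketch only; the real DAG may differ.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


# Placeholder callables; in practice these would call the project's own code.
def fetch_weather_data():
    print("Fetching live weather data...")


def preprocess_data():
    print("Cleaning and preprocessing data...")


def train_model():
    print("Training the forecasting model...")


def evaluate_model():
    print("Evaluating and logging results...")


with DAG(
    dag_id="weather_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    fetch = PythonOperator(task_id="fetch_weather_data", python_callable=fetch_weather_data)
    preprocess = PythonOperator(task_id="preprocess_data", python_callable=preprocess_data)
    train = PythonOperator(task_id="train_model", python_callable=train_model)
    evaluate = PythonOperator(task_id="evaluate_model", python_callable=evaluate_model)

    # Mirror the diagram: fetch -> preprocess -> train -> evaluate.
    fetch >> preprocess >> train >> evaluate
```

Keeping each callable a thin wrapper around the project's own fetch/preprocess/train/evaluate code keeps the DAG file small and the stages independently testable.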
- Python 3.8+
- Git
- DVC
- Apache Airflow
- MLFlow
- Clone the repository:
git clone https://github.com/your-repo/weather-mlops.git
cd weather-mlops
- Install dependencies:
pip install -r requirements.txt
- Initialize DVC (see the data-tracking example after these steps):
dvc init
- Configure Airflow:
airflow db init
airflow users create --username admin --password admin --firstname Admin --lastname User --role Admin --email admin@example.com
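Once DVC is initialized, datasets can be put under version control alongside the Git history. A minimal sketch (the file path and remote location are assumptions, not the repository's actual layout):

```bash
# Track a dataset with DVC; Git stores only the small .dvc pointer file.
dvc add data/raw/weather.csv
git add data/raw/weather.csv.dvc .gitignore
git commit -m "Track raw weather data with DVC"

# Optional: configure a default remote and push the data itself there.
dvc remote add -d storage /path/to/dvc-storage
dvc push
```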
Use the provided Python script to fetch live weather data:
python fetch_data.py --api_key YOUR_API_KEY --city "New York"
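For readers curious what such a script does internally, the sketch below is one plausible shape for it. It is not the repository's actual `fetch_data.py`: the API endpoint (an OpenWeatherMap-style URL), the response fields, and the output path are all assumptions.

```python
# Illustrative sketch of a weather fetcher; the real fetch_data.py may differ.
import argparse
import csv
import os
from datetime import datetime, timezone

import requests

API_URL = "https://api.openweathermap.org/data/2.5/weather"  # assumed endpoint


def fetch(api_key: str, city: str) -> dict:
    """Call the weather API and return the parsed JSON payload."""
    response = requests.get(
        API_URL,
        params={"q": city, "appid": api_key, "units": "metric"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()


def main() -> None:
    parser = argparse.ArgumentParser(description="Fetch live weather data.")
    parser.add_argument("--api_key", required=True)
    parser.add_argument("--city", required=True)
    args = parser.parse_args()

    payload = fetch(args.api_key, args.city)
    row = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "city": args.city,
        "temperature": payload["main"]["temp"],
        "humidity": payload["main"]["humidity"],
        "pressure": payload["main"]["pressure"],
        "wind_speed": payload["wind"]["speed"],
    }

    # Append the observation to a CSV that DVC can track.
    os.makedirs("data/raw", exist_ok=True)
    with open("data/raw/weather.csv", "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=row.keys())
        if f.tell() == 0:
            writer.writeheader()
        writer.writerow(row)


if __name__ == "__main__":
    main()
```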
Start the Airflow scheduler and webserver:
airflow scheduler
airflow webserver
Access the Airflow UI at http://localhost:8080, enable the DAG, and watch the pipeline run! 🎡
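The DAG can also be unpaused and triggered from the command line instead of the UI. The DAG id `weather_pipeline` below is an assumed name; use whatever id `airflow dags list` reports for your checkout.

```bash
# Assumes the DAG file sits on Airflow's DAGs path (e.g. $AIRFLOW_HOME/dags/).
airflow dags list
airflow dags unpause weather_pipeline
airflow dags trigger weather_pipeline
```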
Train models while logging parameters and metrics to MLFlow:
python train_model.py
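As a rough picture of what such a script can look like when scikit-learn and MLFlow are combined, here is a hedged sketch. The dataset path, feature columns, and model choice are assumptions, not the repository's actual `train_model.py`:

```python
# Illustrative training sketch; the real train_model.py may differ.
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

mlflow.set_experiment("Weather Forecasting")

# Assumed preprocessed dataset and column names.
data = pd.read_csv("data/processed/weather.csv")
X = data[["humidity", "pressure", "wind_speed"]]
y = data["temperature"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

with mlflow.start_run():
    params = {"n_estimators": 200, "max_depth": 8}
    model = RandomForestRegressor(**params, random_state=42)
    model.fit(X_train, y_train)

    rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5

    mlflow.log_params(params)
    mlflow.log_metric("rmse", rmse)
    mlflow.sklearn.log_model(model, "model")  # store the model as a run artifact
```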
Launch the MLFlow UI to view experiment results:
mlflow ui
Access it at http://localhost:5000.
MLFlow is integrated into the pipeline for:
- Logging model parameters (e.g., learning rate, batch size).
- Tracking metrics (e.g., accuracy, RMSE).
- Managing model versions and artifacts.
- Comparing experiment results.
Here’s how the pipeline logs experiments to MLFlow:
- Initialization: MLFlow is initialized with a remote tracking URI or a local directory.
import mlflow
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("Weather Forecasting")
- Logging Parameters and Metrics: During training, parameters (e.g., hyperparameters) and metrics (e.g., validation accuracy) are logged:
with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.001)
    mlflow.log_metric("rmse", 2.3)
- Storing Artifacts: Model artifacts (e.g., trained model files) are saved and versioned:
mlflow.log_artifact("models/weather_model.pkl")
- Model Registry: The best-performing model is registered so it can be promoted to production (a short usage sketch follows this list):
mlflow.register_model("runs:/<run_id>/model", "WeatherModel")
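Once a version is registered, it can be promoted and then loaded by stage. A minimal sketch, assuming the registered model is named `WeatherModel`; the version number is illustrative:

```python
import mlflow.pyfunc
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Promote version 1 of the registered model to the Production stage.
client.transition_model_version_stage(name="WeatherModel", version="1", stage="Production")

# Downstream code can then load whatever is currently in Production.
model = mlflow.pyfunc.load_model("models:/WeatherModel/Production")
```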
The MLFlow UI provides a comprehensive interface to:
- Compare experiments.
- Visualize metrics.
- Manage model versions.
To launch the UI:
mlflow ui
Access it at http://localhost:5000.
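Runs can also be compared programmatically rather than in the browser. A small sketch, assuming the `Weather Forecasting` experiment name used above (`mlflow.search_runs` with `experiment_names` requires a reasonably recent MLFlow release):

```python
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")

# One row per run, with params.* and metrics.* columns, best RMSE first.
runs = mlflow.search_runs(
    experiment_names=["Weather Forecasting"],
    order_by=["metrics.rmse ASC"],
)
print(runs[["run_id", "params.learning_rate", "metrics.rmse"]].head())
```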
- DVC: Tracks datasets and models, ensuring reproducibility.
- Airflow: Automates pipeline tasks.
- MLFlow: Logs and tracks experiments, parameters, and artifacts.
- Weather API: Provides live weather data.
- Scikit-learn: Used for model training.
The pipeline outputs include:
- Cleaned Datasets: Version-controlled and stored using DVC.
- Trained Models: Versioned with DVC for reproducibility.
- Experiment Logs: Detailed metrics, parameters, and artifacts tracked using MLFlow.
- Automated Pipeline: Tasks run seamlessly using Airflow.
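Because datasets and models are tracked with DVC, an earlier state of the pipeline can be restored from Git history. A minimal sketch (the tag name is illustrative):

```bash
git checkout v1.0   # illustrative tag or commit
dvc checkout        # restore the matching data and model versions from the DVC cache
dvc pull            # or fetch them from remote storage if they are not cached locally
```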
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch:
git checkout -b feature-branch
- Commit your changes:
git commit -m "Add feature"
- Push to the branch:
git push origin feature-branch
- Submit a pull request.
This project is licensed under the MIT License. Feel free to use and modify it as you like. 🎉
If you found this project helpful, feel free to:
- 🌟 Star the repo!
Happy coding! 🚀