This project is a fully Dockerized ETL (Extract, Transform, Load) pipeline built with Apache Airflow. It automates the daily collection of real-time weather data from the OpenWeather API, transforms it into a structured format, and loads the output into an AWS S3 bucket.
This project demonstrates:
- Production Docker Setup with multi-service architecture
- Real API Integration with proper error handling
- Data Engineering Patterns including ETL, validation, and monitoring
- Custom Airflow Components (operators, hooks, sensors)
- Clean Code Practices with PEP 8 compliance and documentation
- Scalable Architecture ready for cloud deployment
Perfect for showcasing data engineering skills in interviews and portfolios.
- Download this folder to your local machine
- Install Docker Desktop on your computer
- Get OpenWeatherMap API key (free at https://openweathermap.org/api)
- Run the project with one command
- Docker Desktop installed
- OpenWeatherMap API key (free)
- 4GB RAM available for Docker
- Sign up at https://openweathermap.org/api
- Go to "API Keys" section
- Copy your API key
# Create .env file with your API key
echo "OPENWEATHER_API_KEY=your_api_key_here" > .env
# Start all services
docker-compose up -d
# View logs
docker-compose logs -f
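
Inside the containers, the pipeline code is expected to read the key from the environment. A minimal sketch, assuming docker-compose forwards OPENWEATHER_API_KEY from `.env` into the Airflow services (e.g. via an `environment:` or `env_file:` entry in docker-compose.yml):

```python
import os

# Assumes docker-compose forwards OPENWEATHER_API_KEY from .env into the
# Airflow containers (e.g. via an environment: or env_file: entry).
API_KEY = os.environ.get("OPENWEATHER_API_KEY")
if not API_KEY:
    raise RuntimeError("OPENWEATHER_API_KEY is not set - check .env and docker-compose.yml")
```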
- URL: http://localhost:8080
- Username: admin
- Password: admin
- Daily Weather Data Fetching from OpenWeatherMap (see the DAG sketch after this list)
- Data Transformation with Pandas
- Custom Airflow Hooks and Operators
- S3 Upload (future: Azure SQL insert)
- Dockerized Deployment
- Retry Logic & Monitoring
- Data Validation with Logging
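
As a rough illustration of how these pieces fit together, here is a minimal DAG skeleton in the style this project uses. The schedule, retry settings, and task bodies are placeholders, not the exact code in dags/weather_etl_dag.py:

```python
from datetime import datetime, timedelta

from airflow.decorators import dag, task


@dag(
    dag_id="simple_weather_etl",
    schedule="@daily",                       # daily fetch, per the feature list above
    start_date=datetime(2025, 1, 1),
    catchup=False,
    default_args={
        "retries": 2,                        # retry logic for transient API/S3 failures
        "retry_delay": timedelta(minutes=5),
    },
    tags=["weather", "etl"],
)
def simple_weather_etl():
    @task
    def extract() -> list[dict]:
        # Call the OpenWeather API for each configured city
        # (see the hook sketch further down).
        raise NotImplementedError("placeholder: fetch raw weather JSON")

    @task
    def transform(raw_records: list[dict]) -> str:
        # Flatten and validate the payloads with Pandas, write a timestamped CSV,
        # and return its local path.
        raise NotImplementedError("placeholder: clean and save CSV")

    @task
    def load(csv_path: str) -> None:
        # Upload the CSV to S3 (or keep it locally during development).
        raise NotImplementedError("placeholder: upload to S3")

    load(transform(extract()))


simple_weather_etl()
```

The actual tasks in the repo implement the hook call, the Pandas transform, and the S3 upload described elsewhere in this README.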
- Apache Airflow 2.10.4
- Python 3.11
- Docker / Docker Compose
- AWS S3 (current target)
- OpenWeather API
- Pandas
docker-local-setup/
├── docker-compose.yml       # Docker services configuration
├── Dockerfile               # Airflow image with dependencies
├── requirements.txt         # Python dependencies
├── dags/                    # Airflow DAGs
│   └── weather_etl_dag.py   # Main weather pipeline
├── plugins/                 # Custom operators and hooks
│   ├── custom_operators/    # Data quality operators
│   └── custom_hooks/        # Weather API hook (sketched below)
├── utils/                   # Data cleaning functions
└── data/                    # Sample data files and output
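
The plugins/custom_hooks/ package wraps the OpenWeather API behind an Airflow hook. A minimal sketch of what such a hook could look like, assuming the key comes from the OPENWEATHER_API_KEY environment variable; the class and method names here are illustrative, not the repo's exact code:

```python
import os

import requests
from airflow.hooks.base import BaseHook


class WeatherApiHook(BaseHook):
    """Illustrative wrapper around the OpenWeather 'current weather' endpoint."""

    BASE_URL = "https://api.openweathermap.org/data/2.5/weather"

    def __init__(self, api_key: str | None = None) -> None:
        super().__init__()
        # Fall back to the environment variable injected from the .env file.
        self.api_key = api_key or os.environ["OPENWEATHER_API_KEY"]

    def get_current_weather(self, city: str) -> dict:
        """Fetch current weather for one city; HTTP errors propagate to Airflow retries."""
        response = requests.get(
            self.BASE_URL,
            params={"q": city, "appid": self.api_key, "units": "metric"},
            timeout=10,
        )
        response.raise_for_status()
        return response.json()
```

A task (or one of the custom operators) can then call WeatherApiHook().get_current_weather("London") and let the DAG's retry settings absorb transient API failures.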
- Fetch weather for multiple cities
- Clean + transform with Pandas (see the sketch after this list)
- Save locally or upload to AWS S3
- (Future) Insert into Azure SQL
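
A hedged sketch of the clean/transform step, using the standard fields of the OpenWeather response; the exact columns and validation rules in utils/ may differ:

```python
import logging
from datetime import datetime, timezone

import pandas as pd

logger = logging.getLogger(__name__)

COLUMNS = ["city", "temperature_c", "humidity_pct", "conditions", "fetched_at_utc"]


def transform_weather_records(raw_records: list[dict]) -> pd.DataFrame:
    """Flatten raw OpenWeather JSON payloads into a tidy DataFrame."""
    rows = [
        {
            "city": record.get("name"),
            "temperature_c": record.get("main", {}).get("temp"),
            "humidity_pct": record.get("main", {}).get("humidity"),
            "conditions": (record.get("weather") or [{}])[0].get("description"),
            "fetched_at_utc": datetime.now(timezone.utc).isoformat(),
        }
        for record in raw_records
    ]
    df = pd.DataFrame(rows, columns=COLUMNS)

    # Simple validation with logging: drop rows missing a city or temperature.
    before = len(df)
    df = df.dropna(subset=["city", "temperature_c"])
    if len(df) < before:
        logger.warning("Dropped %d invalid record(s) during validation", before - len(df))

    return df
```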
- Open http://localhost:8080
- Login with admin/admin
- Enable the "simple_weather_etl" DAG
- Click "Trigger DAG" to run manually
- Watch real weather data being processed!
- Grid View: See task status and dependencies
- Graph View: Visualize workflow structure
- Logs: Check detailed execution logs
- Data Lineage: Track data transformations
# Stop all services
docker-compose down

# Stop services and remove volumes (resets the Airflow metadata database)
docker-compose down -v

# Start everything again
docker-compose up -d
- CSV filename: weather_data_2025-08-01_07-00-00.csv (see the naming sketch below)
- S3 key: daily_weather/weather_data_*.csv
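
One way the timestamped filename and S3 key could be built; this is a sketch matching the example above, not necessarily the exact format string used in the pipeline:

```python
from datetime import datetime, timezone

# For a run at 2025-08-01 07:00:00 UTC this yields:
#   weather_data_2025-08-01_07-00-00.csv
#   daily_weather/weather_data_2025-08-01_07-00-00.csv
run_ts = datetime.now(timezone.utc).strftime("%Y-%m-%d_%H-%M-%S")
csv_filename = f"weather_data_{run_ts}.csv"
s3_key = f"daily_weather/{csv_filename}"
```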
- Secrets stored via .env
- No keys hardcoded
- Safe for cloud deployment
These improvements are planned or under consideration:
- Upload to Azure SQL Database instead of exporting to CSV:
  - Store weather data directly in a cloud-hosted SQL database (Azure SQL or PostgreSQL on Azure).
  - Replace or complement the S3 upload operator.
- Cloud Hosting:
  - Host the full Dockerized Airflow environment on a cloud platform such as Azure Web Apps, AWS ECS/Fargate, or Google Cloud Run.
  - Use managed databases and blob storage for scalability.
- CI/CD Integration:
  - Add GitHub Actions for test and deployment automation.
- Slack/Email Alerts:
  - Notify on DAG failures or successes (a hedged callback sketch follows below).
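
For the alerting item, a hedged sketch of how a Slack notification could be wired in via Airflow's on_failure_callback; the SLACK_WEBHOOK_URL variable and the message format are placeholders, not part of the current setup:

```python
import os

import requests


def notify_slack_on_failure(context: dict) -> None:
    """Airflow on_failure_callback: post a short message to a Slack incoming webhook."""
    webhook_url = os.environ.get("SLACK_WEBHOOK_URL")  # hypothetical env var, not yet in .env
    if not webhook_url:
        return
    ti = context["task_instance"]
    message = f"DAG {ti.dag_id} task {ti.task_id} failed on {context['ds']}"
    requests.post(webhook_url, json={"text": message}, timeout=10)


# Once implemented, attach it via default_args, e.g.:
# default_args = {"on_failure_callback": notify_slack_on_failure}
```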