This project implements a data pipeline with Apache Airflow that gathers information about items published on the MercadoLibre e-commerce site, stores it in a database, and sends email alerts based on specific criteria. The pipeline:
- Fetches product data from MercadoLibre's API
- Stores the data in a PostgreSQL database
- Sends email alerts for high-value inventory items
- Uses Apache Airflow for orchestration
Requirements:
- Docker and Docker Compose
- Python 3.8+
- MercadoLibre API credentials
- Gmail account with App Password for email notifications
Mercado-Libre-Data-Pipeline-Challenge/
├── dags/
│ └── postgres.py # Main Airflow DAG definition
├── plugins/
│ ├── operators/
│ │ └── PostgresFileOperator.py # Custom Airflow operator
│ └── tmp/
│ ├── api_fetch.py # MercadoLibre API interaction script
│ └── file.tsv # Temporary data storage
├── docker-compose.yaml # Docker services configuration
├── .env # Environment variables
└── README.md
- Create a `.env` file with the following variables:

  ```
  BEARER_TOKEN=your_mercadolibre_token
  EMAIL_PASSWORD=your_gmail_app_password
  ```
- Set up Airflow variables in the UI:
  - `project_path`: /opt/airflow
  - `BEARER_TOKEN`: Your MercadoLibre API token
  - `EMAIL_PASSWORD`: Your Gmail app password
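As a rough sketch of how the helper scripts might consume these settings (assuming they read from environment variables; the exact mechanism in `api_fetch.py` may differ, and `load_config` is a hypothetical helper, not part of the project):

```python
import os

def load_config():
    """Read the pipeline's secrets from environment variables.

    Fails fast with a clear error if a variable is missing, so a
    misconfigured .env file is caught before the pipeline runs.
    """
    config = {}
    for key in ("BEARER_TOKEN", "EMAIL_PASSWORD"):
        try:
            config[key] = os.environ[key]
        except KeyError:
            raise KeyError(f"Missing required environment variable: {key}")
    return config

# Placeholder values so the sketch runs outside the container.
os.environ.setdefault("BEARER_TOKEN", "example-token")
os.environ.setdefault("EMAIL_PASSWORD", "example-password")
print(sorted(load_config()))  # ['BEARER_TOKEN', 'EMAIL_PASSWORD']
```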
The project uses PostgreSQL for data storage. Tables are created automatically by the pipeline.
Schema:
```sql
create table if not exists mercado_libre_data (
    id varchar(100),
    site_id varchar(100),
    title varchar(100),
    price varchar(100),
    available_quantity varchar(100),
    thumbnail varchar(100),
    created_date varchar(100),
    primary key(id, created_date)
);
```
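For a quick local check of the schema (using SQLite purely for illustration; the pipeline runs this DDL against PostgreSQL, whose type handling differs), note that every column is text, so numeric fields are inserted as strings:

```python
import sqlite3

# In-memory database stands in for PostgreSQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table if not exists mercado_libre_data (
        id varchar(100),
        site_id varchar(100),
        title varchar(100),
        price varchar(100),
        available_quantity varchar(100),
        thumbnail varchar(100),
        created_date varchar(100),
        primary key(id, created_date)
    )
""")
# Price and quantity go in as strings, matching the varchar schema.
conn.execute(
    "insert into mercado_libre_data values (?, ?, ?, ?, ?, ?, ?)",
    ("MLA123", "MLA", "Example item", "1500.0", "10",
     "http://example.com/thumb.jpg", "2023-01-01"),
)
rows = conn.execute("select count(*) from mercado_libre_data").fetchall()
print(rows)  # [(1,)]
```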
- Start the Docker containers:

  ```
  docker-compose up -d
  ```

- Access the Airflow UI at http://localhost:8080
Default credentials:
- Username: airflow
- Password: airflow
- Enable the "postgres" DAG in the Airflow UI
The pipeline consists of four main tasks:
- `create_table`: Creates the PostgreSQL table if it doesn't exist
- `consulting_API`: Fetches data from the MercadoLibre API
- `insert_data`: Loads the fetched data into PostgreSQL
- `reading_data`: Queries high-value items and sends email alerts
Task Dependencies:
```
create_table >> consulting_API >> insert_data >> reading_data
```
The pipeline sends email alerts for items with:
- Available quantity > 0
- Total value (price × quantity) > 7,000,000
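Because price and quantity are stored as text, the values must be cast before multiplying. A minimal sketch of the selection logic (the actual `reading_data` task may implement this as a SQL query instead; `high_value_items` is a hypothetical helper):

```python
def high_value_items(rows, threshold=7_000_000):
    """Return items with stock whose total value exceeds the threshold.

    rows: iterable of (id, title, price, available_quantity), where
    price and available_quantity arrive as strings from the database.
    """
    alerts = []
    for item_id, title, price, quantity in rows:
        qty = int(quantity)
        total = float(price) * qty
        if qty > 0 and total > threshold:
            alerts.append((item_id, title, total))
    return alerts

rows = [
    ("MLA1", "Tractor", "800000", "10"),  # total 8,000,000 -> alert
    ("MLA2", "Phone", "350000", "20"),    # total 7,000,000 -> not strictly greater
    ("MLA3", "Crane", "9000000", "0"),    # no stock -> no alert
]
print(high_value_items(rows))  # [('MLA1', 'Tractor', 8000000.0)]
```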
Email Configuration:
- Sender: Gmail account with App Password
- Protocol: SMTP over SSL (port 465)
- Content: List of matching items with details
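A hedged sketch of how such an alert email could be assembled and sent with Python's standard library over SMTP/SSL on port 465, as configured above (the addresses, subject line, and body format here are placeholders, not taken from the project):

```python
import smtplib
from email.mime.text import MIMEText

def build_alert_email(items, sender, recipient):
    """Format matching items into a plain-text alert message."""
    lines = [f"{item_id}: {title} (total value {total:,.0f})"
             for item_id, title, total in items]
    msg = MIMEText("\n".join(lines))
    msg["Subject"] = "High-value inventory alert"
    msg["From"] = sender
    msg["To"] = recipient
    return msg

def send_alert(msg, password):
    # SMTP over SSL on port 465, matching the configuration above.
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(msg["From"], password)
        server.send_message(msg)

# Build (but do not send) a sample message.
msg = build_alert_email(
    [("MLA1", "Tractor", 8000000.0)],
    sender="pipeline@gmail.com",
    recipient="alerts@example.com",
)
print(msg["Subject"])  # High-value inventory alert
```

Keeping message construction separate from sending makes the formatting testable without network access or real credentials.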
While the core functionality of the data pipeline has been implemented, there are some limitations and areas for future improvement:
- Bonus Features: Deployability, unit/E2E testing, additional metadata, data lineage information, and automation have not been implemented in this version of the project.
- Code Quality: The code can be further improved for readability, maintainability, and adherence to coding best practices; refactor and optimize as needed.
- Enhancements: Explore opportunities to extend the pipeline's functionality, add features, or integrate with additional services.
- User Documentation: Provide comprehensive documentation for setting up and configuring the pipeline, making it accessible to a wider audience.