Welcome to the Airflow Tutorial Project! This repository contains various examples of Apache Airflow DAGs demonstrating different features and functionalities. Each DAG is designed to help you understand and implement specific use cases in Airflow.
This project contains a series of Directed Acyclic Graphs (DAGs) created with Apache Airflow to illustrate various concepts and features. Whether you are new to Airflow or looking to expand your knowledge, these examples will provide valuable insights into building and managing workflows.
Before you start, ensure you have the following installed:
- Python 3.6+
- Apache Airflow 2.x
- Docker (optional, for certain examples)
-
Clone the repository:
git clone https://github.com/stbbash/airflow_tutorial.git cd airflow_tutorial
-
Set up a virtual environment (optional but recommended):
python -m venv venv source venv/bin/activate # On Windows, use
venv\Scripts\activate
-
Install the required packages:
pip install -r requirements.txt
-
Initialize the Airflow database:
airflow db init
-
Start the Airflow web server and scheduler:
airflow webserver --port 8080 airflow scheduler
airflow_tutorial/ ├── create_dag_with_python_operator.py ├── dag_with_catchup_backfill.py ├── dag_with_cron_expression.py ├── dag_with_minio_s3.py ├── dag_with_postgres_hook.py ├── dag_with_postgres_operator.py ├── dag_with_python_dependencies.py ├── dag_with_taskflow_api.py ├── our_first_dag.py └── README.md
This DAG demonstrates how to create a simple DAG using the PythonOperator. The task runs a Python function that prints a message.
This DAG illustrates how to configure a DAG to handle catchup and backfill. It ensures that all missed intervals are processed when the DAG is first deployed.
This DAG shows how to schedule tasks using cron expressions. It runs tasks at specific intervals defined by a cron schedule.
This DAG integrates with MinIO, an S3-compatible object storage. It demonstrates how to upload and download files using Airflow.
This DAG uses the PostgresHook to interact with a PostgreSQL database. It runs SQL queries and retrieves data within Airflow tasks.
This DAG demonstrates the use of the PostgresOperator to execute SQL commands in a PostgreSQL database.
This DAG shows how to handle Python dependencies within tasks. It demonstrates the use of Python virtual environments or Docker to manage dependencies.
This DAG uses the TaskFlow API introduced in Airflow 2.0. It provides a cleaner and more intuitive way to define tasks and manage dependencies.
This is a simple introductory DAG that serves as a starting point for those new to Airflow. It includes basic tasks and dependencies to get you started.
Contributions are welcome! If you have any suggestions or improvements, feel free to open an issue or submit a pull request.