Skip to content

stbbash/airflow_tutorial

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Airflow Tutorial Project

Welcome to the Airflow Tutorial Project! This repository contains various examples of Apache Airflow DAGs demonstrating different features and functionalities. Each DAG is designed to help you understand and implement specific use cases in Airflow.

Table of Contents

Introduction

This project contains a series of Directed Acyclic Graphs (DAGs) created with Apache Airflow to illustrate various concepts and features. Whether you are new to Airflow or looking to expand your knowledge, these examples will provide valuable insights into building and managing workflows.

Prerequisites

Before you start, ensure you have the following installed:

  • Python 3.6+
  • Apache Airflow 2.x
  • Docker (optional, for certain examples)

Installation

  1. Clone the repository:

    git clone https://github.com/stbbash/airflow_tutorial.git cd airflow_tutorial

  2. Set up a virtual environment (optional but recommended):

    python -m venv venv source venv/bin/activate # On Windows, use venv\Scripts\activate

  3. Install the required packages:

    pip install -r requirements.txt

  4. Initialize the Airflow database:

    airflow db init

  5. Start the Airflow web server and scheduler:

    airflow webserver --port 8080 airflow scheduler

Project Structure

airflow_tutorial/ ├── create_dag_with_python_operator.py ├── dag_with_catchup_backfill.py ├── dag_with_cron_expression.py ├── dag_with_minio_s3.py ├── dag_with_postgres_hook.py ├── dag_with_postgres_operator.py ├── dag_with_python_dependencies.py ├── dag_with_taskflow_api.py ├── our_first_dag.py └── README.md

DAGs

create_dag_with_python_operator.py

This DAG demonstrates how to create a simple DAG using the PythonOperator. The task runs a Python function that prints a message.

dag_with_catchup_backfill.py

This DAG illustrates how to configure a DAG to handle catchup and backfill. It ensures that all missed intervals are processed when the DAG is first deployed.

dag_with_cron_expression.py

This DAG shows how to schedule tasks using cron expressions. It runs tasks at specific intervals defined by a cron schedule.

dag_with_minio_s3.py

This DAG integrates with MinIO, an S3-compatible object storage. It demonstrates how to upload and download files using Airflow.

dag_with_postgres_hook.py

This DAG uses the PostgresHook to interact with a PostgreSQL database. It runs SQL queries and retrieves data within Airflow tasks.

dag_with_postgres_operator.py

This DAG demonstrates the use of the PostgresOperator to execute SQL commands in a PostgreSQL database.

dag_with_python_dependencies.py

This DAG shows how to handle Python dependencies within tasks. It demonstrates the use of Python virtual environments or Docker to manage dependencies.

dag_with_taskflow_api.py

This DAG uses the TaskFlow API introduced in Airflow 2.0. It provides a cleaner and more intuitive way to define tasks and manage dependencies.

our_first_dag.py

This is a simple introductory DAG that serves as a starting point for those new to Airflow. It includes basic tasks and dependencies to get you started.

Contributing

Contributions are welcome! If you have any suggestions or improvements, feel free to open an issue or submit a pull request.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published