
🌬️ Basics of Airflow for Modern AI and MLOps

This repository contains the example code and setup instructions for the "Airflow Basics for Modern AI and MLOps" tutorial. It will guide you through setting up Apache Airflow using Docker and running a basic example DAG.

👩‍💻 Installation & Setup

Follow these steps to get your Airflow environment up and running.

1️⃣ Fork / Clone this Repository

First, get the tutorial example code onto your local machine:

git clone https://github.com/mlrepa/airflow-for-modern-ai-and-mlops.git
cd airflow-for-modern-ai-and-mlops

🚀 Launch Airflow

You can launch Airflow in either of two ways:

  • Standalone (local Python environment)
  • Docker Compose (containers)

1. Run Airflow Standalone (Local Installation)

For development or learning purposes, you can run Airflow in standalone mode directly on your machine. This method is simpler but less production-ready than the Docker Compose approach.

Step 1: Install Airflow Locally

If you haven't installed Airflow locally, you can do so using the development environment setup:

# Create virtual environment and install dependencies
uv venv .venv --python 3.12
source .venv/bin/activate
uv sync
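
To confirm the installation succeeded before moving on, you can run a quick version check (this assumes Apache Airflow is declared as a dependency in this project's pyproject.toml and was installed by uv sync):

# Sanity check: prints the installed Airflow version if the environment is set up correctly
airflow version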

Step 2: Run Airflow Standalone

The standalone command initializes the database, creates an admin user, and starts all Airflow components for you.

export AIRFLOW__CORE__DAGS_FOLDER=$(pwd)/dags
airflow standalone

📝 Note: The admin username and password will be displayed in the command output. Look for lines like:

standalone | Airflow is ready
standalone | Login with username: admin  password: [generated_password]

Step 3: Configure DAGs Folder (Optional)

To make DAGs from this repository's dags/ folder visible in your local Airflow instance:

  1. Stop Airflow by pressing Ctrl+C in the terminal where airflow standalone is running

  2. Update the configuration in ~/airflow/airflow.cfg:

    # Find the [core] section and update the dags_folder setting
    dags_folder = [PATH_TO_YOUR_REPO]/dags
  3. Restart Airflow:

    airflow standalone
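
As an alternative sketch (assuming you run the commands from the repository root with the virtual environment activated), you can skip editing airflow.cfg and rely on the environment variable from Step 2, then confirm the example DAGs are registered:

# Alternative to editing airflow.cfg: use the environment variable from Step 2
export AIRFLOW__CORE__DAGS_FOLDER=$(pwd)/dags
airflow standalone

# Then, in a second terminal with the same variable exported and the virtualenv active,
# confirm the example DAGs are visible:
airflow dags list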

Access the Web Interface

  • URL: http://localhost:8080
  • Credentials: Use the admin username and password shown in the startup output
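
If you prefer to check from the command line first, a plain HTTP request confirms the web server is responding (this is just a connectivity check, not an Airflow-specific health endpoint):

# Should return an HTTP response, e.g. 200 or a redirect to the login page
curl -I http://localhost:8080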

2. Run Airflow in Docker ⚙️

Step 1: Set Up Environment Variables

For Linux users, it's crucial to set the host user ID to ensure correct file permissions for files created by Airflow within the Docker containers.

echo -e "AIRFLOW_UID=$(id -u)" > .env

👉 Tip: This step helps avoid permission issues where files created by the Airflow containers in the mounted folders (e.g., dags/, logs/) end up owned by root and become inaccessible to your host user.

Step 2: Initialize the Database

This step sets up the Airflow metadata database and creates the initial admin user.

docker compose up airflow-init

⚠️ Default Credentials: The account created by airflow-init has the login: airflow and password: airflow. Remember to change these for any non-local or production environment!
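
If you would rather not use the default account at all, the official docker-compose.yaml reads the initial admin credentials from environment variables. A minimal sketch, assuming your compose file passes these variables to the airflow-init service (check docker-compose.yaml to confirm), is to add them to .env before running the init step:

# Optional: set the initial admin credentials used by airflow-init
# (re-run 'docker compose up airflow-init' if you change these afterwards)
echo "_AIRFLOW_WWW_USER_USERNAME=admin" >> .env
echo "_AIRFLOW_WWW_USER_PASSWORD=change-me" >> .env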

Step 3: Run with Docker Compose

With the environment initialized, you can now start all the Airflow services:

docker compose up -d

The -d flag runs the containers in detached mode (in the background).

🔍 Understanding the Docker Compose Services

The docker-compose.yaml file defines several services that work together:

  • airflow-webserver: The Airflow User Interface (UI). Accessible at http://localhost:8080.
  • airflow-scheduler: The core Airflow component that monitors DAGs and triggers task runs. It doesn't have exposed ports.
  • airflow-worker (if defined): Executes the tasks scheduled by the scheduler (common in CeleryExecutor setups).
  • airflow-triggerer (if defined): Manages deferrable tasks.
  • postgres (or similar): The PostgreSQL database used by Airflow to store its metadata. Accessible (e.g., for DB tools) typically on localhost:5432.
  • redis (if defined): Often used as a message broker for CeleryExecutor.

For more details on the official Docker Compose setup, refer to the Airflow Docker Documentation.
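
If you are unsure which of these services your particular compose file actually defines, you can ask Docker Compose directly:

# List the services defined in the docker-compose.yaml used by this project
docker compose config --services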

To check the status of your running containers and ensure they are healthy:

docker ps
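
A Compose-aware alternative is docker compose ps, which scopes the output to this project and shows health status; if a service looks unhealthy, its logs are usually the quickest diagnostic:

# Project-scoped status, including health checks
docker compose ps

# Follow the scheduler logs (replace the service name if yours differs)
docker compose logs -f airflow-scheduler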

📺 Accessing Your Airflow Environment

Once Airflow is running, you can interact with it in several ways:

1️⃣ Via the Web Interface (UI)

This is the most common way to monitor and manage your DAGs.

2️⃣ Using the Command Line Interface (CLI)

You can execute Airflow CLI commands directly within the running Airflow containers.

Option A: Using docker compose run (for one-off commands)

This starts a new, temporary container to run the command. Note that docker compose run takes a service name (e.g., airflow-scheduler), not a container name; replace it with the name of your scheduler or webserver service if it differs (check docker-compose.yaml). The scheduler service is often a good target for CLI commands.

# Example: Get Airflow version and configuration info
docker compose run airflow-scheduler airflow info
# Example: List DAGs
docker compose run airflow-scheduler airflow dags list

👉 Note: Long names like airflow-1-get-started-airflow-apiserver-1 are container names generated by Docker Compose, not service names. With the official compose file, the service names to use here are usually airflow-scheduler or airflow-webserver.

Option B: Executing commands in an existing container

This opens an interactive shell inside a running container (e.g., the webserver or scheduler).

# Get a bash shell inside the webserver container
docker exec -ti airflow-1-get-started-airflow-webserver-1 /bin/bash
# Or, more commonly with the official compose file:
# docker exec -ti <your_project_name>-airflow-webserver-1 /bin/bash

# Once inside, you can run Airflow commands directly:
# airflow dags list
# airflow tasks test your_dag_id your_task_id YYYY-MM-DD
# exit

(Replace <your_project_name>-airflow-webserver-1 with the actual name from docker ps)
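
If you are not sure of the exact container name, you can list the running container names and filter them; the grep pattern below assumes your webserver service name contains "webserver":

# Find the webserver container name to use with docker exec
docker ps --format '{{.Names}}' | grep webserver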

🧹 Cleaning Up Your Environment

When you're done, you can stop and remove the Airflow containers and associated resources.

Stop the Cluster:

docker compose down

Full Cleanup (Stop, Remove Containers, Volumes, and Orphans):

This command is useful if you want to reset your environment completely, including the database data.

docker compose down -v --remove-orphans

⚠️ Data Loss: Using the -v flag will remove the Docker volumes, including your Airflow metadata database. This means all DAG run history, connections, etc., will be lost.
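
If you want to see what would be deleted first, you can list the project's named volumes; Compose usually prefixes volume names with the project (folder) name, so adjust the pattern below if your project is named differently:

# Inspect the volumes created for this project before removing them
docker volume ls | grep airflow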

Full Cleanup including Images (Use with Caution):

This will also remove the Docker images that were downloaded or built for this setup.

docker compose down --volumes --rmi all
