* docs: pod overrides and other improvements
* recognise->recognize
* use tabs in the installation instructions
* more wording changes
* more improvements
* Apply suggestions from code review
---------
Co-authored-by: Malte Sander <malte.sander.it@gmail.com>
docs/modules/airflow/pages/getting_started/first_steps.adoc (33 additions, 23 deletions)

@@ -1,17 +1,19 @@
= First steps
:description: Set up an Apache Airflow cluster using Stackable Operator, PostgreSQL, and Redis. Run and monitor example workflows (DAGs) via the web UI or command line.
-Once you have followed the steps in the xref:getting_started/installation.adoc[] section to install the Operator and its dependencies, you will now deploy a Airflow cluster and its dependencies. Afterwards you can <<_verify_that_it_works, verify that it works>> by running and tracking an example DAG.
+Once you have followed the steps in the xref:getting_started/installation.adoc[] section to install the Operator and its dependencies, deploy an Airflow cluster and its dependencies.
+Afterward you can <<_verify_that_it_works, verify that it works>> by running and tracking an example DAG.
== Setup
-As we have installed the external dependencies required by Airflow (Postgresql and Redis) we can now install the Airflow cluster itself.
+With the external dependencies required by Airflow (Postgresql and Redis) installed, install the Airflow Stacklet itself.
Supported versions for PostgreSQL and Redis can be found in the https://airflow.apache.org/docs/apache-airflow/stable/installation/prerequisites.html#prerequisites[Airflow documentation].
== Secret with Airflow credentials
-A secret with the necessary credentials must be created, this entails database connection credentials as well as an admin account for Airflow itself. Create a file called `airflow-credentials.yaml`:
+Create a Secret with the necessary credentials; this entails database connection credentials as well as an admin account for Airflow itself.
-The `connections.secretKey` will be used for securely signing the session cookies and can be used for any other security related needs by extensions. It should be a long random string of bytes.
+The `connections.secretKey` is used for securely signing the session cookies and can be used for any other security related needs by extensions.
+It should be a long random string of bytes.
`connections.sqlalchemyDatabaseUri` must contain the connection string to the SQL database storing the Airflow metadata.
-`connections.celeryResultBackend` must contain the connection string to the SQL database storing the job metadata (in the example above we are using the same postgresql database for both).
+`connections.celeryResultBackend` must contain the connection string to the SQL database storing the job metadata (the example above uses the same PostgreSQL database for both).
`connections.celeryBrokerUrl` must contain the connection string to the Redis instance used for queuing the jobs submitted to the airflow executor(s).
The `adminUser` fields are used to create an admin user.
-Please note that the admin user will be disabled if you use a non-default authentication mechanism like LDAP.
+
+NOTE: The admin user is disabled if you use a non-default authentication mechanism like LDAP.
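The Secret manifest itself is elided from this diff. As a rough sketch of how the keys described above could be wired together (the Secret name, the admin-user sub-fields and all connection values are illustrative assumptions, not the exact example from the guide):

[source,bash]
----
# Hypothetical airflow-credentials Secret applied inline; adjust names, passwords
# and hostnames to match your own PostgreSQL and Redis installation.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: simple-airflow-credentials
type: Opaque
stringData:
  adminUser.username: airflow
  adminUser.firstname: Airflow
  adminUser.lastname: Admin
  adminUser.email: airflow@example.com
  adminUser.password: airflow
  connections.secretKey: thisISaSECRET_1234
  connections.sqlalchemyDatabaseUri: postgresql+psycopg2://airflow:airflow@airflow-postgresql/airflow
  connections.celeryResultBackend: db+postgresql://airflow:airflow@airflow-postgresql/airflow
  connections.celeryBrokerUrl: redis://:redis@airflow-redis-master:6379/0
EOF
----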
== Airflow
An Airflow cluster is made up of three components:
-- `webserver`: this provides the main UI for user-interaction
-- `executors`: the `CeleryExecutor` or `KubernetesExecutor` nodes over which the job workload will be distributed by the scheduler
-- `scheduler`: responsible for triggering jobs and persisting their metadata to the backend database
+* `webserver`: this provides the main UI for user-interaction
+* `executors`: the CeleryExecutor or KubernetesExecutor nodes over which the job workload is distributed by the scheduler
+* `scheduler`: responsible for triggering jobs and persisting their metadata to the backend database
Create a file named `airflow.yaml` with the following contents:
-- `metadata.name` contains the name of the Airflow cluster.
-- the product version of the Docker image provided by Stackable must be set in `spec.image.productVersion`.
-- `spec.celeryExecutors`: deploy executors managed by Airflow's Celery engine. Alternatively you can use `kuberenetesExectors` that will use Airflow's Kubernetes engine for executor management. For more information see https://airflow.apache.org/docs/apache-airflow/stable/executor/index.html#executor-types).
-- the `spec.clusterConfig.loadExamples` key is optional and defaults to `false`. It is set to `true` here as the example DAGs will be used when verifying the installation.
-- the `spec.clusterConfig.exposeConfig` key is optional and defaults to `false`. It is set to `true` only as an aid to verify the configuration and should never be used as such in anything other than test or demo clusters.
-- the previously created secret must be referenced in `spec.clusterConfig.credentialsSecret`.
-
-NOTE: Please note that the version you need to specify for `spec.image.productVersion` is the desired version of Apache Airflow. You can optionally specify the `spec.image.stackableVersion` to a certain release like `23.11.0` but it is recommended to leave it out and use the default provided by the operator. For a list of available versions please check our https://repo.stackable.tech/#browse/browse:docker:v2%2Fstackable%airflow%2Ftags[image registry].
+* `metadata.name` contains the name of the Airflow cluster.
+* the product version of the Docker image provided by Stackable must be set in `spec.image.productVersion`.
+* `spec.celeryExecutors`: deploy executors managed by Airflow's Celery engine.
+Alternatively you can use `kubernetesExecutors`, which use Airflow's Kubernetes engine for executor management.
+For more information see https://airflow.apache.org/docs/apache-airflow/stable/executor/index.html#executor-types.
+* the `spec.clusterConfig.loadExamples` key is optional and defaults to `false`.
+It is set to `true` here as the example DAGs are used when verifying the installation.
+* the `spec.clusterConfig.exposeConfig` key is optional and defaults to `false`. It is set to `true` only as an aid to verify the configuration and should never be used in anything other than test or demo clusters.
+* the previously created secret must be referenced in `spec.clusterConfig.credentialsSecret`.
+
+NOTE: The version you need to specify for `spec.image.productVersion` is the desired version of Apache Airflow.
+You can optionally pin `spec.image.stackableVersion` to a certain release like `23.11.0`, but it is recommended to leave it out and use the default provided by the operator.
+Check our https://repo.stackable.tech/#browse/browse:docker:v2%2Fstackable%airflow%2Ftags[image registry] for a list of available versions.
It should generally be safe to simply use the latest version that is available.
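The `airflow.yaml` manifest itself is elided from this diff. A minimal sketch assembled from the keys in the list above might look as follows; the apiVersion, the role/roleGroup layout and the replica counts are assumptions borrowed from how other Stackable operators are configured, and the product version is only an example:

[source,bash]
----
# Hypothetical airflow.yaml -- field names follow the bullet list above,
# values are placeholders.
cat > airflow.yaml <<'EOF'
apiVersion: airflow.stackable.tech/v1alpha1
kind: AirflowCluster
metadata:
  name: airflow
spec:
  image:
    productVersion: "2.6.1"
  clusterConfig:
    loadExamples: true
    exposeConfig: true
    credentialsSecret: simple-airflow-credentials
  webservers:
    roleGroups:
      default:
        replicas: 1
  celeryExecutors:
    roleGroups:
      default:
        replicas: 2
  schedulers:
    roleGroups:
      default:
        replicas: 1
EOF

# Apply the manifest to the cluster.
kubectl apply -f airflow.yaml
----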
-This will create the actual Airflow cluster.
+This creates the actual Airflow cluster.
After a while, all the Pods in the StatefulSets should be ready:
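The readiness check used in the guide is not part of this hunk; one way to watch for it, assuming the conventional instance label set by the operator:

[source,bash]
----
# Watch the StatefulSet Pods come up; the label selector is an assumed convention.
kubectl get pods -l app.kubernetes.io/instance=airflow --watch
----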
@@ -99,11 +108,11 @@ The Webserver UI can now be opened in the browser with `http://localhost:8080`.
-Select one of these DAGs by clicking on the name in the left-hand column e.g. `example_trigger_target_dag`. Click on the arrow in the top right of the screen, select "Trigger DAG" and the DAG nodes will be automatically highlighted as the job works through its phases.
+Select one of these DAGs by clicking on the name in the left-hand column, e.g. `example_trigger_target_dag`. Click on the arrow in the top right of the screen, select "Trigger DAG" and the DAG nodes are automatically highlighted as the job works through its phases.
image::getting_started/airflow_running.png[Airflow DAG in action]
@@ -117,15 +126,16 @@ If you prefer to interact directly with the API instead of using the web interfa
-A DAG can then be triggered by providing the DAG name (in this case, `example_trigger_target_dag`). The response identifies the DAG identifier, which we can parse out of the JSON like this:
+A DAG can then be triggered by providing the DAG name (in this case, `example_trigger_target_dag`).
+The response identifies the DAG identifier, which can be parsed out of the JSON like this:
-If we read this identifier into a variable such as `dag_id` (or replace it manually in the command) we can run this command to access the status of the DAG run:
+If this identifier is stored in a variable such as `dag_id` (or replaced manually in the command) you can run this command to access the status of the DAG run:
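The curl commands themselves are elided in this hunk. A hedged sketch using the stable Airflow REST API (host, port and credentials are assumptions, and `jq` is used for JSON parsing):

[source,bash]
----
# Trigger the DAG and capture the run identifier from the JSON response.
dag_id=$(curl -s -u airflow:airflow \
  -H 'Content-Type: application/json' \
  -d '{}' \
  http://localhost:8080/api/v1/dags/example_trigger_target_dag/dagRuns \
  | jq -r '.dag_run_id')

# Check the status of that DAG run.
curl -s -u airflow:airflow \
  http://localhost:8080/api/v1/dags/example_trigger_target_dag/dagRuns/"$dag_id" \
  | jq -r '.state'
----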
docs/modules/airflow/pages/getting_started/index.adoc (4 additions, 4 deletions)

@@ -1,18 +1,18 @@
= Getting started
:description: Get started with the Stackable Operator for Apache Airflow by installing the operator, SQL database, and Redis, then setting up and running your first DAG.
-This guide will get you started with Airflow using the Stackable Operator.
-It will guide you through the installation of the Operator as well as an SQL database and Redis instance for trial usage, setting up your first Airflow cluster and connecting to it, and viewing and running one of the example workflows (called DAGs = Direct Acyclic Graphs).
+This guide gets you started with Airflow using the Stackable Operator.
+It guides you through the installation of the Operator as well as an SQL database and Redis instance for trial usage, setting up your first Airflow cluster and connecting to it, and viewing and running one of the example workflows (called DAGs = Directed Acyclic Graphs).
== Prerequisites for this guide
-You will need:
+You need:
* a Kubernetes cluster
* kubectl
* Helm
-Resource sizing depends on cluster type(s), usage and scope, but as a starting point we recommend a minimum of the following resources for this operator:
+Resource sizing depends on cluster type(s), usage and scope, but as a minimum starting point the following resources are recommended for this operator:
docs/modules/airflow/pages/getting_started/installation.adoc

:description: Install the Stackable operator for Apache Airflow with PostgreSQL, Redis, and required components using Helm or stackablectl.
+:kind: https://kind.sigs.k8s.io/
-On this page you will install the Stackable Airflow Operator, the software that Airflow depends on - Postgresql and Redis - as well as the commons, secret and listener operator which are required by all Stackable Operators.
+Install the Stackable Airflow operator, the software that Airflow depends on -- PostgreSQL and Redis -- as well as the commons, secret and listener operators which are required by all Stackable operators.
-== Required external components: Postgresql and Redis
+== Required external components: PostgreSQL and Redis
-Postgresql is required by Airflow to store metadata about DAG runs, and Redis is required by the Celery executor to schedule and/or queue DAG jobs. They are components that may well already be available for customers, in which case we treat them here as pre-requisites for an airflow cluster and hence as part of the installation process. These components will be installed using Helm. Note that specific versions are declared:
+PostgreSQL is required by Airflow to store metadata about DAG runs, and Redis is required by the Celery executor to schedule and/or queue DAG jobs.
+They are components that may well already be available for customers, in which case they are treated as prerequisites for an Airflow cluster and hence as part of the installation process.
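The Helm commands for these components fall outside this hunk. For orientation, an install along these lines is typical (chart choice, passwords and release names are assumptions; check the Airflow prerequisites page for supported versions):

[source,bash]
----
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install --wait airflow-postgresql bitnami/postgresql \
  --set auth.username=airflow \
  --set auth.password=airflow \
  --set auth.database=airflow
helm install --wait airflow-redis bitnami/redis \
  --set auth.password=redis \
  --set replica.replicaCount=1
----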
-WARNING: Do not use this setup in production! Supported databases and versions are listed on the xref:required-external-components.adoc[required external components] page for this Operator. Please follow the instructions of those components for a production setup.
+WARNING: Do not use this setup in production!
+Supported databases and versions are listed on the xref:required-external-components.adoc[required external components] page for this operator.
+Follow the instructions of those components for a production setup.
-== Stackable Operators
+== Stackable operators
-There are 2 ways to run Stackable Operators
+There are multiple ways to install the Stackable operator for Apache Airflow.
+xref:management:stackablectl:index.adoc[] is the preferred way, but Helm is also supported.
+OpenShift users may prefer installing the operator from the RedHat Certified Operator catalog using the OpenShift web console.
-1. Using xref:management:stackablectl:index.adoc[]
-
-2. Using Helm
-
-=== stackablectl
-
-stackablectl is the command line tool to interact with Stackable operators and our recommended way to install Operators.
+[tabs]
+====
+stackablectl::
++
+--
+stackablectl is the command line tool to interact with Stackable operators and our recommended way to install operators.
Follow the xref:management:stackablectl:installation.adoc[installation steps] for your platform.
-After you have installed stackablectl run the following command to install all Operators necessary for Airflow:
+After you have installed stackablectl run the following command to install all operators necessary for Airflow:
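The actual command is elided from this diff; it typically takes the form below (the exact operator list and any version pinning are assumptions):

[source,bash]
----
# Install the Airflow operator together with the operators it depends on.
stackablectl operator install commons secret listener airflow
----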
-Helm will deploy the Operators in a Kubernetes Deployment and apply the CRDs for the Airflow cluster (as well as the
-CRDs for the required operators). You are now ready to deploy Apache Airflow in Kubernetes.
+Helm deploys the operators in a Kubernetes Deployment and applies the CRDs for the Airflow cluster (as well as the CRDs for the required operators).
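The Helm install commands themselves are also outside this hunk; they usually follow this pattern (the repository alias and chart names are assumptions based on the Stackable Helm repository):

[source,bash]
----
helm repo add stackable-stable https://repo.stackable.tech/repository/helm-stable/
helm repo update
helm install commons-operator stackable-stable/commons-operator
helm install secret-operator stackable-stable/secret-operator
helm install listener-operator stackable-stable/listener-operator
helm install airflow-operator stackable-stable/airflow-operator
----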
+--
+====
== What's next
-xref:getting_started/first_steps.adoc[Set up an Airflow cluster] and its dependencies and
-xref:getting_started/first_steps.adoc#_verify_that_it_works[verify that it works] by inspecting and running an example
-DAG.
+xref:getting_started/first_steps.adoc[Set up an Airflow cluster] and its dependencies and xref:getting_started/first_steps.adoc#_verify_that_it_works[verify that it works] by inspecting and running an example DAG.