docs/modules/airflow/pages/getting_started/first_steps.adoc (1 addition & 0 deletions)

@@ -1,4 +1,5 @@
 = First steps
+:description: Set up an Apache Airflow cluster using Stackable Operator, PostgreSQL, and Redis. Run and monitor example workflows (DAGs) via the web UI or command line.
 
 Once you have followed the steps in the xref:getting_started/installation.adoc[] section to install the Operator and its dependencies, you will now deploy an Airflow cluster and its dependencies. Afterwards you can <<_verify_that_it_works, verify that it works>> by running and tracking an example DAG.
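
For illustration of the "run and track an example DAG" step mentioned above, here is a minimal sketch (not taken from the Stackable documentation) that triggers and follows a DAG run over Airflow's stable REST API instead of the web UI. The base URL, the basic-auth credentials and the DAG id are assumptions for a locally port-forwarded webserver.

[source,python]
----
# Minimal sketch; assumes a port-forwarded webserver at localhost:8080,
# basic auth with airflow/airflow, and an unpaused example DAG.
import time

import requests

BASE = "http://localhost:8080/api/v1"
AUTH = ("airflow", "airflow")
DAG_ID = "example_trigger_target_dag"

# Trigger a new run with an explicit run id so it is easy to poll afterwards.
run_id = f"manual_sketch_{int(time.time())}"
resp = requests.post(
    f"{BASE}/dags/{DAG_ID}/dagRuns",
    auth=AUTH,
    json={"dag_run_id": run_id, "conf": {}},
)
resp.raise_for_status()

# Poll the run until it reaches a terminal state.
while True:
    state = requests.get(f"{BASE}/dags/{DAG_ID}/dagRuns/{run_id}", auth=AUTH).json()["state"]
    print("DAG run state:", state)
    if state in ("success", "failed"):
        break
    time.sleep(5)
----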

docs/modules/airflow/pages/getting_started/index.adoc (3 additions & 1 deletion)

@@ -1,6 +1,8 @@
 = Getting started
+:description: Get started with the Stackable Operator for Apache Airflow by installing the operator, SQL database, and Redis, then setting up and running your first DAG.
 
-This guide will get you started with Airflow using the Stackable Operator. It will guide you through the installation of the Operator as well as an SQL database and Redis instance for trial usage, setting up your first Airflow cluster and connecting to it, and viewing and running one of the example workflows (called DAGs = Direct Acyclic Graphs).
+This guide will get you started with Airflow using the Stackable Operator.
+It will guide you through the installation of the Operator as well as an SQL database and Redis instance for trial usage, setting up your first Airflow cluster and connecting to it, and viewing and running one of the example workflows (called DAGs = Directed Acyclic Graphs).

docs/modules/airflow/pages/getting_started/installation.adoc (1 addition & 0 deletions)

@@ -1,4 +1,5 @@
 = Installation
+:description: Install the Stackable operator for Apache Airflow with PostgreSQL, Redis, and required components using Helm or stackablectl.
 
 On this page you will install the Stackable Airflow Operator, the software that Airflow depends on - PostgreSQL and Redis - as well as the commons, secret and listener operators which are required by all Stackable Operators.

docs/modules/airflow/pages/index.adoc (2 additions & 2 deletions)

@@ -1,6 +1,6 @@
 = Stackable Operator for Apache Airflow
-:description: The Stackable Operator for Apache Airflow is a Kubernetes operator that can manage Apache Airflow clusters. Learn about its features, resources, dependencies and demos, and see the list of supported Airflow versions.
+:description: The Stackable Operator for Apache Airflow manages Airflow clusters on Kubernetes, supporting custom workflows, executors, and external databases for efficient orchestration.

docs/modules/airflow/pages/required-external-components.adoc (3 additions & 1 deletion)

@@ -1,6 +1,8 @@
 = Required external components
+:description: Airflow requires PostgreSQL, MySQL, or SQLite for database support, and Redis for Celery executors. MSSQL has experimental support.
 
-Airflow requires an SQL database to operate. The https://airflow.apache.org/docs/apache-airflow/stable/installation/prerequisites.html[Airflow documentation] specifies:
+Airflow requires an SQL database to operate.
+The https://airflow.apache.org/docs/apache-airflow/stable/installation/prerequisites.html[Airflow documentation] specifies:
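
As a worked illustration of the database requirement above (not part of the diff): Airflow reads its metadata-database connection as a SQLAlchemy URI, for example via the `AIRFLOW__DATABASE__SQL_ALCHEMY_CONN` environment variable in Airflow 2.3+. The host, credentials and database name below are assumptions.

[source,python]
----
# Sketch of the SQLAlchemy URI Airflow expects for its metadata database.
# Host, credentials and database name are illustrative assumptions.
from urllib.parse import quote_plus

user = "airflow"
password = quote_plus("s3cr3t!")  # escape characters that are not URI-safe
host, port, db = "airflow-postgresql", 5432, "airflow"

# PostgreSQL via the psycopg2 driver; a MySQL backend would use e.g. "mysql+mysqldb://...".
sql_alchemy_conn = f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{db}"
print(sql_alchemy_conn)
----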

docs/modules/airflow/pages/usage-guide/applying-custom-resources.adoc (17 additions & 7 deletions)

@@ -1,6 +1,10 @@
 = Applying Custom Resources
+:description: Learn to apply custom resources in Airflow, such as Spark jobs, using Kubernetes connections, roles, and modular DAGs with git-sync integration.
 
-Airflow can be used to apply custom resources from within a cluster. An example of this could be a SparkApplication job that is to be triggered by Airflow. The steps below describe how this can be done. The DAG will consist of modularized python files and will be provisioned using the git-sync facility.
+Airflow can be used to apply custom resources from within a cluster.
+An example of this could be a SparkApplication job that is to be triggered by Airflow.
+The steps below describe how this can be done.
+The DAG will consist of modularized Python files and will be provisioned using the git-sync facility.

@@ -41,4 +45,6 @@
-Now for the DAG itself. The job to be started is a modularized DAG that uses starts a one-off Spark job that calculates the value of pi. The file structure fetched to the root git-sync folder looks like this:
+Now for the DAG itself.
+The job to be started is a modularized DAG that starts a one-off Spark job that calculates the value of pi.
+The file structure fetched to the root git-sync folder looks like this:
 
 ----
 dags

@@ -57,12 +63,15 @@ The Spark job will calculate the value of pi using one of the example scripts th
 include::example$example-pyspark-pi.yaml[]
 ----
 
-This will be called from within a DAG by using the connection that was defined earlier. It will be wrapped by the `KubernetesHook` that the Airflow Kubernetes provider makes available https://github.com/apache/airflow/blob/main/airflow/providers/cncf/kubernetes/operators/spark_kubernetes.py[here].There are two classes that are used to:
+This will be called from within a DAG by using the connection that was defined earlier.
+It will be wrapped by the `KubernetesHook` that the Airflow Kubernetes provider makes available https://github.com/apache/airflow/blob/main/airflow/providers/cncf/kubernetes/operators/spark_kubernetes.py[here].
+There are two classes that are used to:
 
-- start the job
-- monitor the status of the job
+* start the job
+* monitor the status of the job
 
-The classes `SparkKubernetesOperator` and `SparkKubernetesSensor` are located in two different Python modules as they will typically be used for all custom resources and thus are best decoupled from the DAG that calls them. This also demonstrates that modularized DAGs can be used for Airflow jobs as long as all dependencies exist in or below the root folder pulled by git-sync.
+The classes `SparkKubernetesOperator` and `SparkKubernetesSensor` are located in two different Python modules as they will typically be used for all custom resources and thus are best decoupled from the DAG that calls them.
+This also demonstrates that modularized DAGs can be used for Airflow jobs as long as all dependencies exist in or below the root folder pulled by git-sync.
 
 [source,python]
 ----

@@ -100,6 +109,7 @@ TIP: A full example of the above is used as an integration test https://github.c
 
 == Logging
 
-As mentioned above, the logs are available from the webserver UI if the jobs run with the `celeryExecutor`. If the SDP logging mechanism has been deployed, log information can also be retrieved from the vector backend (e.g. Opensearch):
+As mentioned above, the logs are available from the webserver UI if the jobs run with the `celeryExecutor`.
+If the SDP logging mechanism has been deployed, log information can also be retrieved from the vector backend (e.g. Opensearch):
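
To make the start-and-monitor pattern described above concrete, here is a minimal, self-contained sketch of such a DAG. It is not the module layout used in the Stackable example; it uses the upstream `cncf.kubernetes` provider classes of the same names, and the connection id, namespace and manifest file name are assumptions.

[source,python]
----
# Minimal sketch of the start-then-monitor pattern using the upstream
# cncf.kubernetes provider classes (not the Stackable example's own modules).
# Connection id, namespace and manifest file name are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.spark_kubernetes import SparkKubernetesOperator
from airflow.providers.cncf.kubernetes.sensors.spark_kubernetes import SparkKubernetesSensor

with DAG(
    dag_id="sparkapp_sketch",
    start_date=datetime(2024, 1, 1),
    schedule=None,  # trigger on demand only
    catchup=False,
) as dag:
    # Apply the SparkApplication custom resource, i.e. start the job.
    submit = SparkKubernetesOperator(
        task_id="spark_pi_submit",
        namespace="default",
        application_file="example-pyspark-pi.yaml",
        kubernetes_conn_id="kubernetes_in_cluster",
        do_xcom_push=True,
    )

    # Poll the custom resource until it reaches a terminal state, i.e. monitor the job.
    monitor = SparkKubernetesSensor(
        task_id="spark_pi_monitor",
        namespace="default",
        application_name="{{ task_instance.xcom_pull(task_ids='spark_pi_submit')['metadata']['name'] }}",
        kubernetes_conn_id="kubernetes_in_cluster",
    )

    submit >> monitor
----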

docs/modules/airflow/pages/usage-guide/listenerclass.adoc (5 additions & 2 deletions)

@@ -1,8 +1,11 @@
 = Service exposition with ListenerClasses
+:description: Configure Airflow service exposure with ListenerClasses: cluster-internal, external-unstable, or external-stable.
 
-Airflow offers a web UI and an API, both are exposed by the webserver process under the `webserver` role. The Operator deploys a service called `<name>-webserver` (where `<name>` is the name of the AirflowCluster) through which Airflow can be reached.
+Airflow offers a web UI and an API, both of which are exposed by the webserver process under the `webserver` role.
+The Operator deploys a service called `<name>-webserver` (where `<name>` is the name of the AirflowCluster) through which Airflow can be reached.
 
-This service can have three different types: `cluster-internal`, `external-unstable` and `external-stable`. Read more about the types in the xref:concepts:service-exposition.adoc[service exposition] documentation at platform level.
+This service can have three different types: `cluster-internal`, `external-unstable` and `external-stable`.
+Read more about the types in the xref:concepts:service-exposition.adoc[service exposition] documentation at platform level.