
Commit aaf61ca

fhennig and razvan authored
Add descriptions (#503)
Co-authored-by: Razvan-Daniel Mihai <84674+razvan@users.noreply.github.com>
1 parent 2f1e91a commit aaf61ca

15 files changed: 64 additions, 25 deletions

docs/modules/airflow/pages/getting_started/first_steps.adoc

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
= First steps
+:description: Set up an Apache Airflow cluster using Stackable Operator, PostgreSQL, and Redis. Run and monitor example workflows (DAGs) via the web UI or command line.

Once you have followed the steps in the xref:getting_started/installation.adoc[] section to install the Operator and its dependencies, you will now deploy a Airflow cluster and its dependencies. Afterwards you can <<_verify_that_it_works, verify that it works>> by running and tracking an example DAG.

docs/modules/airflow/pages/getting_started/index.adoc

Lines changed: 3 additions & 1 deletion
@@ -1,6 +1,8 @@
= Getting started
+:description: Get started with the Stackable Operator for Apache Airflow by installing the operator, SQL database, and Redis, then setting up and running your first DAG.

-This guide will get you started with Airflow using the Stackable Operator. It will guide you through the installation of the Operator as well as an SQL database and Redis instance for trial usage, setting up your first Airflow cluster and connecting to it, and viewing and running one of the example workflows (called DAGs = Direct Acyclic Graphs).
+This guide will get you started with Airflow using the Stackable Operator.
+It will guide you through the installation of the Operator as well as an SQL database and Redis instance for trial usage, setting up your first Airflow cluster and connecting to it, and viewing and running one of the example workflows (called DAGs = Direct Acyclic Graphs).

== Prerequisites for this guide

docs/modules/airflow/pages/getting_started/installation.adoc

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
= Installation
+:description: Install the Stackable operator for Apache Airflow with PostgreSQL, Redis, and required components using Helm or stackablectl.

On this page you will install the Stackable Airflow Operator, the software that Airflow depends on - Postgresql and Redis - as well as the commons, secret and listener operator which are required by all Stackable Operators.

docs/modules/airflow/pages/index.adoc

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
= Stackable Operator for Apache Airflow
-:description: The Stackable Operator for Apache Airflow is a Kubernetes operator that can manage Apache Airflow clusters. Learn about its features, resources, dependencies and demos, and see the list of supported Airflow versions.
-:keywords: Stackable Operator, Apache Airflow, Kubernetes, k8s, operator, engineer, big data, metadata, job pipeline, scheduler, workflow, ETL
+:description: The Stackable Operator for Apache Airflow manages Airflow clusters on Kubernetes, supporting custom workflows, executors, and external databases for efficient orchestration.
+:keywords: Stackable Operator, Apache Airflow, Kubernetes, k8s, operator, job pipeline, scheduler, ETL
:airflow: https://airflow.apache.org/
:dags: https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html
:k8s-crs: https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/

docs/modules/airflow/pages/required-external-components.adoc

Lines changed: 3 additions & 1 deletion
@@ -1,6 +1,8 @@
= Required external components
+:description: Airflow requires PostgreSQL, MySQL, or SQLite for database support, and Redis for Celery executors. MSSQL has experimental support.

-Airflow requires an SQL database to operate. The https://airflow.apache.org/docs/apache-airflow/stable/installation/prerequisites.html[Airflow documentation] specifies:
+Airflow requires an SQL database to operate.
+The https://airflow.apache.org/docs/apache-airflow/stable/installation/prerequisites.html[Airflow documentation] specifies:

Fully supported for production usage:

docs/modules/airflow/pages/usage-guide/applying-custom-resources.adoc

Lines changed: 17 additions & 7 deletions
@@ -1,6 +1,10 @@
= Applying Custom Resources
+:description: Learn to apply custom resources in Airflow, such as Spark jobs, using Kubernetes connections, roles, and modular DAGs with git-sync integration.

-Airflow can be used to apply custom resources from within a cluster. An example of this could be a SparkApplication job that is to be triggered by Airflow. The steps below describe how this can be done. The DAG will consist of modularized python files and will be provisioned using the git-sync facility.
+Airflow can be used to apply custom resources from within a cluster.
+An example of this could be a SparkApplication job that is to be triggered by Airflow.
+The steps below describe how this can be done.
+The DAG will consist of modularized Python files and will be provisioned using the git-sync facility.

== Define an in-cluster Kubernetes connection

@@ -38,7 +42,9 @@ include::example$example-airflow-spark-clusterrolebinding.yaml[]

== DAG code

-Now for the DAG itself. The job to be started is a modularized DAG that uses starts a one-off Spark job that calculates the value of pi. The file structure fetched to the root git-sync folder looks like this:
+Now for the DAG itself.
+The job to be started is a modularized DAG that uses starts a one-off Spark job that calculates the value of pi.
+The file structure fetched to the root git-sync folder looks like this:

----
dags
@@ -57,12 +63,15 @@ The Spark job will calculate the value of pi using one of the example scripts th
include::example$example-pyspark-pi.yaml[]
----

-This will be called from within a DAG by using the connection that was defined earlier. It will be wrapped by the `KubernetesHook` that the Airflow Kubernetes provider makes available https://github.com/apache/airflow/blob/main/airflow/providers/cncf/kubernetes/operators/spark_kubernetes.py[here].There are two classes that are used to:
+This will be called from within a DAG by using the connection that was defined earlier.
+It will be wrapped by the `KubernetesHook` that the Airflow Kubernetes provider makes available https://github.com/apache/airflow/blob/main/airflow/providers/cncf/kubernetes/operators/spark_kubernetes.py[here].
+There are two classes that are used to:

-- start the job
-- monitor the status of the job
+* start the job
+* monitor the status of the job

-The classes `SparkKubernetesOperator` and `SparkKubernetesSensor` are located in two different Python modules as they will typically be used for all custom resources and thus are best decoupled from the DAG that calls them. This also demonstrates that modularized DAGs can be used for Airflow jobs as long as all dependencies exist in or below the root folder pulled by git-sync.
+The classes `SparkKubernetesOperator` and `SparkKubernetesSensor` are located in two different Python modules as they will typically be used for all custom resources and thus are best decoupled from the DAG that calls them.
+This also demonstrates that modularized DAGs can be used for Airflow jobs as long as all dependencies exist in or below the root folder pulled by git-sync.

[source,python]
----
@@ -100,6 +109,7 @@ TIP: A full example of the above is used as an integration test https://github.c

== Logging

-As mentioned above, the logs are available from the webserver UI if the jobs run with the `celeryExecutor`. If the SDP logging mechanism has been deployed, log information can also be retrieved from the vector backend (e.g. Opensearch):
+As mentioned above, the logs are available from the webserver UI if the jobs run with the `celeryExecutor`.
+If the SDP logging mechanism has been deployed, log information can also be retrieved from the vector backend (e.g. Opensearch):

image::airflow_dag_log_opensearch.png[Opensearch]
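
To make the wiring described in this page concrete, here is a minimal sketch of what such a calling DAG can look like. It is not the code from this commit or from the Stackable integration test: the module names, constructor arguments and the connection id `kubernetes_in_cluster` are illustrative assumptions, and the two wrapper classes are assumed to live in sibling modules below the root folder pulled by git-sync (Airflow adds the DAG folder to the Python path, so plain sibling imports work).

[source,python]
----
# Illustrative sketch only -- the module names, keyword arguments and the
# connection id are assumptions, not the code shipped with the Stackable docs.
from datetime import datetime

from airflow import DAG

# Hypothetical sibling modules below the git-sync root, each holding one of
# the two wrapper classes mentioned above.
from spark_kubernetes_operator import SparkKubernetesOperator
from spark_kubernetes_sensor import SparkKubernetesSensor

with DAG(
    dag_id="sparkapp_dag",
    schedule=None,
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    # Start the job: apply the SparkApplication manifest through the
    # in-cluster Kubernetes connection defined earlier.
    submit = SparkKubernetesOperator(
        task_id="spark_pi_submit",
        namespace="default",
        application_file="pyspark_pi.yaml",
        kubernetes_conn_id="kubernetes_in_cluster",
    )

    # Monitor the status of the job until the custom resource reports
    # success or failure.
    monitor = SparkKubernetesSensor(
        task_id="spark_pi_monitor",
        namespace="default",
        application_name="pyspark-pi",
        kubernetes_conn_id="kubernetes_in_cluster",
    )

    submit >> monitor
----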
Lines changed: 3 additions & 0 deletions
@@ -1 +1,4 @@
= Usage guide
+:description: Practical instructions to make the most out of the Stackable operator for Apache Airflow.
+
+Practical instructions to make the most out of the Stackable operator for Apache Airflow.

docs/modules/airflow/pages/usage-guide/listenerclass.adoc

Lines changed: 5 additions & 2 deletions
@@ -1,8 +1,11 @@
= Service exposition with ListenerClasses
+:description: Configure Airflow service exposure with ListenerClasses: cluster-internal, external-unstable, or external-stable.

-Airflow offers a web UI and an API, both are exposed by the webserver process under the `webserver` role. The Operator deploys a service called `<name>-webserver` (where `<name>` is the name of the AirflowCluster) through which Airflow can be reached.
+Airflow offers a web UI and an API, both are exposed by the webserver process under the `webserver` role.
+The Operator deploys a service called `<name>-webserver` (where `<name>` is the name of the AirflowCluster) through which Airflow can be reached.

-This service can have three different types: `cluster-internal`, `external-unstable` and `external-stable`. Read more about the types in the xref:concepts:service-exposition.adoc[service exposition] documentation at platform level.
+This service can have three different types: `cluster-internal`, `external-unstable` and `external-stable`.
+Read more about the types in the xref:concepts:service-exposition.adoc[service exposition] documentation at platform level.

This is how the listener class is configured:

docs/modules/airflow/pages/usage-guide/logging.adoc

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
= Log aggregation
+:description: Forward Airflow logs to a Vector aggregator by configuring the ConfigMap and enabling the log agent.

The logs can be forwarded to a Vector log aggregator by providing a discovery
ConfigMap for the aggregator and by enabling the log agent:
Lines changed: 3 additions & 2 deletions
@@ -1,4 +1,5 @@
= Monitoring
+:description: Airflow instances export Prometheus metrics for monitoring.

-The managed Airflow instances are automatically configured to export Prometheus metrics. See
-xref:operators:monitoring.adoc[] for more details.
+The managed Airflow instances are automatically configured to export Prometheus metrics.
+See xref:operators:monitoring.adoc[] for more details.
