
Commit 97b675d

fhennig and maltesander authored
docs: pod overrides and other improvements (#513)
* docs: pod overrides and other improvements
* recognise->recognize
* use tabs in the installation instructions
* more wording changes
* more improvements
* Apply suggestions from code review

Co-authored-by: Malte Sander <malte.sander.it@gmail.com>
1 parent 0667c73 commit 97b675d

17 files changed: +182 -149 lines changed

docs/modules/airflow/pages/getting_started/first_steps.adoc

Lines changed: 33 additions & 23 deletions
@@ -1,17 +1,19 @@
  = First steps
  :description: Set up an Apache Airflow cluster using Stackable Operator, PostgreSQL, and Redis. Run and monitor example workflows (DAGs) via the web UI or command line.

- Once you have followed the steps in the xref:getting_started/installation.adoc[] section to install the Operator and its dependencies, you will now deploy a Airflow cluster and its dependencies. Afterwards you can <<_verify_that_it_works, verify that it works>> by running and tracking an example DAG.
+ Once you have followed the steps in the xref:getting_started/installation.adoc[] section to install the Operator and its dependencies, deploy an Airflow cluster and its dependencies.
+ Afterward you can <<_verify_that_it_works, verify that it works>> by running and tracking an example DAG.

  == Setup

- As we have installed the external dependencies required by Airflow (Postgresql and Redis) we can now install the Airflow cluster itself.
+ With the external dependencies required by Airflow (PostgreSQL and Redis) installed, install the Airflow Stacklet itself.

  Supported versions for PostgreSQL and Redis can be found in the https://airflow.apache.org/docs/apache-airflow/stable/installation/prerequisites.html#prerequisites[Airflow documentation].

  == Secret with Airflow credentials

- A secret with the necessary credentials must be created, this entails database connection credentials as well as an admin account for Airflow itself. Create a file called `airflow-credentials.yaml`:
+ Create a Secret with the necessary credentials; this entails database connection credentials as well as an admin account for Airflow itself.
+ Create a file called `airflow-credentials.yaml`:

  [source,yaml]
  include::example$getting_started/code/airflow-credentials.yaml[]
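The credentials file itself is only pulled in by the include directive above and is not shown in this diff. A minimal, hedged sketch of what such a Secret could look like (the Secret name, hostnames and every value here are illustrative assumptions, not the contents of the real example file; the key names follow the `connections.*` and `adminUser.*` fields described below):

[source,yaml]
----
# Hypothetical airflow-credentials.yaml; all names and values are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: simple-airflow-credentials  # assumed; must match spec.clusterConfig.credentialsSecret
type: Opaque
stringData:
  adminUser.username: airflow           # assumed admin account fields
  adminUser.firstname: Airflow
  adminUser.lastname: Admin
  adminUser.email: airflow@example.com
  adminUser.password: airflow
  connections.secretKey: thisISaSECRET_1234  # use a long random string in practice
  connections.sqlalchemyDatabaseUri: postgresql+psycopg2://airflow:airflow@airflow-postgresql/airflow
  connections.celeryResultBackend: db+postgresql://airflow:airflow@airflow-postgresql/airflow
  connections.celeryBrokerUrl: redis://:redis@airflow-redis-master:6379/0
----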
@@ -21,24 +23,26 @@ And apply it:
  [source,bash]
  include::example$getting_started/code/getting_started.sh[tag=apply-airflow-credentials]

- The `connections.secretKey` will be used for securely signing the session cookies and can be used for any other security related needs by extensions. It should be a long random string of bytes.
+ The `connections.secretKey` is used for securely signing the session cookies and can be used for any other security related needs by extensions.
+ It should be a long random string of bytes.

  `connections.sqlalchemyDatabaseUri` must contain the connection string to the SQL database storing the Airflow metadata.

- `connections.celeryResultBackend` must contain the connection string to the SQL database storing the job metadata (in the example above we are using the same postgresql database for both).
+ `connections.celeryResultBackend` must contain the connection string to the SQL database storing the job metadata (the example above uses the same PostgreSQL database for both).

  `connections.celeryBrokerUrl` must contain the connection string to the Redis instance used for queuing the jobs submitted to the airflow executor(s).

  The `adminUser` fields are used to create an admin user.
- Please note that the admin user will be disabled if you use a non-default authentication mechanism like LDAP.
+
+ NOTE: The admin user is disabled if you use a non-default authentication mechanism like LDAP.

  == Airflow

  An Airflow cluster is made up of three components:

- - `webserver`: this provides the main UI for user-interaction
- - `executors`: the `CeleryExecutor` or `KubernetesExecutor` nodes over which the job workload will be distributed by the scheduler
- - `scheduler`: responsible for triggering jobs and persisting their metadata to the backend database
+ * `webserver`: this provides the main UI for user-interaction
+ * `executors`: the CeleryExecutor or KubernetesExecutor nodes over which the job workload is distributed by the scheduler
+ * `scheduler`: responsible for triggering jobs and persisting their metadata to the backend database

  Create a file named `airflow.yaml` with the following contents:
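The actual `airflow.yaml` contents are supplied via an include that is not part of this diff. A minimal sketch of what such an AirflowCluster resource could look like, built only from the fields explained below; the apiVersion, role/roleGroup layout, replica counts, version number and Secret name are assumptions rather than the real example file:

[source,yaml]
----
# Hypothetical airflow.yaml; structure and values are illustrative.
apiVersion: airflow.stackable.tech/v1alpha1
kind: AirflowCluster
metadata:
  name: airflow
spec:
  image:
    productVersion: "2.9.2"        # set to the desired Apache Airflow version
  clusterConfig:
    loadExamples: true             # load the example DAGs used for verification
    exposeConfig: true             # only as an aid in test/demo clusters
    credentialsSecret: simple-airflow-credentials  # the Secret created earlier
  webservers:
    roleGroups:
      default:
        replicas: 1
  celeryExecutors:
    roleGroups:
      default:
        replicas: 2
  schedulers:
    roleGroups:
      default:
        replicas: 1
----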

@@ -55,17 +59,22 @@ include::example$getting_started/code/getting_started.sh[tag=install-airflow]

  Where:

- - `metadata.name` contains the name of the Airflow cluster.
- - the product version of the Docker image provided by Stackable must be set in `spec.image.productVersion`.
- - `spec.celeryExecutors`: deploy executors managed by Airflow's Celery engine. Alternatively you can use `kuberenetesExectors` that will use Airflow's Kubernetes engine for executor management. For more information see https://airflow.apache.org/docs/apache-airflow/stable/executor/index.html#executor-types).
- - the `spec.clusterConfig.loadExamples` key is optional and defaults to `false`. It is set to `true` here as the example DAGs will be used when verifying the installation.
- - the `spec.clusterConfig.exposeConfig` key is optional and defaults to `false`. It is set to `true` only as an aid to verify the configuration and should never be used as such in anything other than test or demo clusters.
- - the previously created secret must be referenced in `spec.clusterConfig.credentialsSecret`.
-
- NOTE: Please note that the version you need to specify for `spec.image.productVersion` is the desired version of Apache Airflow. You can optionally specify the `spec.image.stackableVersion` to a certain release like `23.11.0` but it is recommended to leave it out and use the default provided by the operator. For a list of available versions please check our https://repo.stackable.tech/#browse/browse:docker:v2%2Fstackable%airflow%2Ftags[image registry].
+ * `metadata.name` contains the name of the Airflow cluster.
+ * the product version of the Docker image provided by Stackable must be set in `spec.image.productVersion`.
+ * `spec.celeryExecutors`: deploy executors managed by Airflow's Celery engine.
+ Alternatively you can use `kubernetesExecutors` that use Airflow's Kubernetes engine for executor management.
+ For more information see https://airflow.apache.org/docs/apache-airflow/stable/executor/index.html#executor-types.
+ * the `spec.clusterConfig.loadExamples` key is optional and defaults to `false`.
+ It is set to `true` here as the example DAGs are used when verifying the installation.
+ * the `spec.clusterConfig.exposeConfig` key is optional and defaults to `false`. It is set to `true` only as an aid to verify the configuration and should never be used as such in anything other than test or demo clusters.
+ * the previously created secret must be referenced in `spec.clusterConfig.credentialsSecret`.
+
+ NOTE: The version you need to specify for `spec.image.productVersion` is the desired version of Apache Airflow.
+ You can optionally specify the `spec.image.stackableVersion` to a certain release like `23.11.0` but it is recommended to leave it out and use the default provided by the operator.
+ Check our https://repo.stackable.tech/#browse/browse:docker:v2%2Fstackable%airflow%2Ftags[image registry] for a list of available versions.
  It should generally be safe to simply use the latest version that is available.

- This will create the actual Airflow cluster.
+ This creates the actual Airflow cluster.

  After a while, all the Pods in the StatefulSets should be ready:
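The readiness check itself lives in the included script; as a plain-kubectl sketch of the kind of check meant here (no Stackable-specific flags assumed):

[source,bash]
----
# StatefulSets are created for the webserver, scheduler and Celery executor roles
kubectl get statefulsets
# all Pods should eventually report a Ready status
kubectl get pods
----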

@@ -99,11 +108,11 @@ The Webserver UI can now be opened in the browser with `http://localhost:8080`.

  image::getting_started/airflow_login.png[Airflow login screen]

- Since the examples were loaded in the cluster definition, they will appear under the DAGs tabs:
+ Since the examples were loaded in the cluster definition, they appear under the DAGs tabs:

  image::getting_started/airflow_dags.png[Example Airflow DAGs]

- Select one of these DAGs by clicking on the name in the left-hand column e.g. `example_trigger_target_dag`. Click on the arrow in the top right of the screen, select "Trigger DAG" and the DAG nodes will be automatically highlighted as the job works through its phases.
+ Select one of these DAGs by clicking on the name in the left-hand column e.g. `example_trigger_target_dag`. Click on the arrow in the top right of the screen, select "Trigger DAG" and the DAG nodes are automatically highlighted as the job works through its phases.

  image::getting_started/airflow_running.png[Airflow DAG in action]
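The hunk context mentions opening the UI at `http://localhost:8080`; the port-forward that makes this possible is in the included script. A hedged sketch, assuming the webserver Service is called `airflow-webserver` (the actual Service name depends on the cluster name and operator version):

[source,bash]
----
# forward the assumed webserver Service to localhost:8080
kubectl port-forward svc/airflow-webserver 8080:8080
----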

@@ -117,15 +126,16 @@ If you prefer to interact directly with the API instead of using the web interfa
  [source,bash]
  include::example$getting_started/code/getting_started.sh[tag=enable-dag]

- A DAG can then be triggered by providing the DAG name (in this case, `example_trigger_target_dag`). The response identifies the DAG identifier, which we can parse out of the JSON like this:
+ A DAG can then be triggered by providing the DAG name (in this case, `example_trigger_target_dag`).
+ The response contains the DAG identifier, which can be parsed out of the JSON like this:
  [source,bash]
  include::example$getting_started/code/getting_started.sh[tag=run-dag]

- If we read this identifier into a variable such as `dag_id` (or replace it manually in the command) we can run this command to access the status of the DAG run:
+ If this identifier is stored in a variable such as `dag_id` (or replaced manually in the command) you can run this command to access the status of the DAG run:
  [source,bash]
  include::example$getting_started/code/getting_started.sh[tag=check-dag]
  ====
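The script tags referenced above (`enable-dag`, `run-dag`, `check-dag`) are not expanded in this diff. A hedged sketch of what such calls could look like against Airflow's stable REST API, assuming port-forwarded access on `localhost:8080`, `jq` for JSON parsing, and the admin credentials from the Secret (all values illustrative):

[source,bash]
----
# unpause the example DAG (enable-dag)
curl -s -X PATCH 'http://localhost:8080/api/v1/dags/example_trigger_target_dag' \
  --user 'airflow:airflow' -H 'Content-Type: application/json' -d '{"is_paused": false}'

# trigger a run and capture its identifier (run-dag)
dag_id=$(curl -s -X POST 'http://localhost:8080/api/v1/dags/example_trigger_target_dag/dagRuns' \
  --user 'airflow:airflow' -H 'Content-Type: application/json' -d '{}' | jq -r '.dag_run_id')

# check the status of that run (check-dag)
curl -s "http://localhost:8080/api/v1/dags/example_trigger_target_dag/dagRuns/$dag_id" \
  --user 'airflow:airflow' | jq -r '.state'
----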

  == What's next

- Look at the xref:usage-guide/index.adoc[] to find out more about configuring your Airflow cluster and loading your own DAG files.
+ Look at the xref:usage-guide/index.adoc[] to find out more about configuring your Airflow Stacklet and loading your own DAG files.

docs/modules/airflow/pages/getting_started/index.adoc

Lines changed: 4 additions & 4 deletions
@@ -1,18 +1,18 @@
  = Getting started
  :description: Get started with the Stackable Operator for Apache Airflow by installing the operator, SQL database, and Redis, then setting up and running your first DAG.

- This guide will get you started with Airflow using the Stackable Operator.
- It will guide you through the installation of the Operator as well as an SQL database and Redis instance for trial usage, setting up your first Airflow cluster and connecting to it, and viewing and running one of the example workflows (called DAGs = Direct Acyclic Graphs).
+ This guide gets you started with Airflow using the Stackable Operator.
+ It guides you through the installation of the Operator as well as an SQL database and Redis instance for trial usage, setting up your first Airflow cluster and connecting to it, and viewing and running one of the example workflows (called DAGs = Directed Acyclic Graphs).

  == Prerequisites for this guide

- You will need:
+ You need:

  * a Kubernetes cluster
  * kubectl
  * Helm

- Resource sizing depends on cluster type(s), usage and scope, but as a starting point we recommend a minimum of the following resources for this operator:
+ Resource sizing depends on cluster type(s), usage and scope, but as a minimum starting point the following resources are recommended for this operator:

  include::partial$hardware-requirements.adoc[]

docs/modules/airflow/pages/getting_started/installation.adoc

Lines changed: 35 additions & 26 deletions
@@ -1,11 +1,15 @@
  = Installation
  :description: Install the Stackable operator for Apache Airflow with PostgreSQL, Redis, and required components using Helm or stackablectl.
+ :kind: https://kind.sigs.k8s.io/

- On this page you will install the Stackable Airflow Operator, the software that Airflow depends on - Postgresql and Redis - as well as the commons, secret and listener operator which are required by all Stackable Operators.
+ Install the Stackable Airflow operator, the software that Airflow depends on -- PostgreSQL and Redis -- as well as the commons, secret and listener operator which are required by all Stackable operators.

- == Required external components: Postgresql and Redis
+ == Required external components: PostgreSQL and Redis

- Postgresql is required by Airflow to store metadata about DAG runs, and Redis is required by the Celery executor to schedule and/or queue DAG jobs. They are components that may well already be available for customers, in which case we treat them here as pre-requisites for an airflow cluster and hence as part of the installation process. These components will be installed using Helm. Note that specific versions are declared:
+ PostgreSQL is required by Airflow to store metadata about DAG runs, and Redis is required by the Celery executor to schedule and/or queue DAG jobs.
+ They are components that may well already be available for customers, in which case they are treated as prerequisites for an Airflow cluster and hence as part of the installation process.
+ Install these components using Helm.
+ Note that specific versions are declared:

  [source,bash]
  ----
@@ -20,55 +24,60 @@ include::example$getting_started/code/getting_started.sh[tag=helm-add-bitnami-pg
  include::example$getting_started/code/getting_started.sh[tag=helm-add-bitnami-redis]
  ----
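The actual Helm commands are pulled in from the script tags above (`helm-add-bitnami-pg`, `helm-add-bitnami-redis`). A hedged sketch of what they might look like, assuming the Bitnami charts; the release names, credentials and chart versions are placeholders, not the pinned versions used by the script:

[source,bash]
----
helm repo add bitnami https://charts.bitnami.com/bitnami

# pin chart versions explicitly; the values shown here are placeholders
helm install airflow-postgresql bitnami/postgresql --version <chart-version> \
  --set auth.username=airflow \
  --set auth.password=airflow \
  --set auth.database=airflow

helm install airflow-redis bitnami/redis --version <chart-version> \
  --set auth.password=redis \
  --set replica.replicaCount=1
----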

- WARNING: Do not use this setup in production! Supported databases and versions are listed on the xref:required-external-components.adoc[required external components] page for this Operator. Please follow the instructions of those components for a production setup.
+ WARNING: Do not use this setup in production!
+ Supported databases and versions are listed on the xref:required-external-components.adoc[required external components] page for this operator.
+ Follow the instructions of those components for a production setup.

- == Stackable Operators
+ == Stackable operators

- There are 2 ways to run Stackable Operators
+ There are multiple ways to install the Stackable operator for Apache Airflow.
+ xref:management:stackablectl:index.adoc[] is the preferred way, but Helm is also supported.
+ OpenShift users may prefer installing the operator from the RedHat Certified Operator catalog using the OpenShift web console.

- 1. Using xref:management:stackablectl:index.adoc[]
-
- 2. Using Helm
-
- === stackablectl
-
- stackablectl is the command line tool to interact with Stackable operators and our recommended way to install Operators.
+ [tabs]
+ ====
+ stackablectl::
+ +
+ --
+ stackablectl is the command line tool to interact with Stackable operators and our recommended way to install operators.
  Follow the xref:management:stackablectl:installation.adoc[installation steps] for your platform.

- After you have installed stackablectl run the following command to install all Operators necessary for Airflow:
+ After you have installed stackablectl run the following command to install all operators necessary for Airflow:

  [source,bash]
  ----
  include::example$getting_started/code/getting_started.sh[tag=stackablectl-install-operators]
  ----

- The tool will show
+ The tool shows

  [source]
  include::example$getting_started/code/install_output.txt[]

- TIP: Consult the xref:management:stackablectl:quickstart.adoc[] to learn more about how to use stackablectl. For
- example, you can use the `--cluster kind` flag to create a Kubernetes cluster with link:https://kind.sigs.k8s.io/[kind].
-
- === Helm
+ TIP: Consult the xref:management:stackablectl:quickstart.adoc[] to learn more about how to use stackablectl.
+ For example, you can use the `--cluster kind` flag to create a Kubernetes cluster with {kind}[kind].
+ --

- You can also use Helm to install the Operators. Add the Stackable Helm repository:
+ Helm::
+ +
+ --
+ You can also use Helm to install the operators.
+ Add the Stackable Helm repository:
  [source,bash]
  ----
  include::example$getting_started/code/getting_started.sh[tag=helm-add-repo]
  ----

- Then install the Stackable Operators:
+ Then install the Stackable operators:
  [source,bash]
  ----
  include::example$getting_started/code/getting_started.sh[tag=helm-install-operators]
  ----

- Helm will deploy the Operators in a Kubernetes Deployment and apply the CRDs for the Airflow cluster (as well as the
- CRDs for the required operators). You are now ready to deploy Apache Airflow in Kubernetes.
+ Helm deploys the operators in a Kubernetes Deployment and applies the CRDs for the Airflow cluster (as well as the CRDs for the required operators).
+ --
+ ====
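The operator installation commands themselves come from the script includes inside the tabs. As a hedged illustration only, with operator names taken from this page and the Helm repository URL and chart names assumed rather than quoted from the script:

[source,bash]
----
# stackablectl variant: install the Airflow operator plus the required commons, secret and listener operators
stackablectl operator install commons secret listener airflow

# Helm variant: add the Stackable repository, then install each operator chart
helm repo add stackable-stable https://repo.stackable.tech/repository/helm-stable/
helm install --wait commons-operator stackable-stable/commons-operator
helm install --wait secret-operator stackable-stable/secret-operator
helm install --wait listener-operator stackable-stable/listener-operator
helm install --wait airflow-operator stackable-stable/airflow-operator
----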

  == What's next

- xref:getting_started/first_steps.adoc[Set up an Airflow cluster] and its dependencies and
- xref:getting_started/first_steps.adoc#_verify_that_it_works[verify that it works] by inspecting and running an example
- DAG.
+ xref:getting_started/first_steps.adoc[Set up an Airflow cluster] and its dependencies and xref:getting_started/first_steps.adoc#_verify_that_it_works[verify that it works] by inspecting and running an example DAG.
