
Commit 3e996da

Merge pull request #12880 from kalexand-rh/4.0_arch
4.0 arch draft
2 parents 5117003 + 49d443f commit 3e996da

14 files changed, +606 -2 lines changed

_topic_map.yml

Lines changed: 7 additions & 0 deletions
@@ -57,6 +57,13 @@ Topics:
- Name: OpenShift CCS modular docs conventions
  File: mod-docs-conventions-ocp
---
Name: Architecture
Dir: architecture
Distros: openshift-*
Topics:
- Name: Architecture
  File: architecture
---
Name: Authentication
Dir: authentication
Distros: openshift-*

architecture/architecture.adoc

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
// This assembly is included in the following assemblies:
//
// * n/a

[id='architecture']
= {product-title} architecture
include::modules/common-attributes.adoc[]
:context: architecture
toc::[]

[IMPORTANT]
====
This assembly is a temporary placeholder to port the valid information from
the 3.11 collection and include specific changes for 4.0 as that information
becomes available.
====

include::modules/architecture-overview.adoc[leveloffset=+1]

include::modules/installation-options.adoc[leveloffset=+2]

include::modules/update-service-overview.adoc[leveloffset=+2]

include::modules/node-management.adoc[leveloffset=+2]

include::modules/node-types.adoc[leveloffset=+2]

include::modules/operators-overview.adoc[leveloffset=+2]

include::modules/abstraction-layers.adoc[leveloffset=+2]

include::modules/machine-api-overview.adoc[leveloffset=+1]

[[observability-architecture]]
== Observability

[IMPORTANT]
====
This section of the assembly is a placeholder for the Observability section,
which will explain how monitoring, alerting, Grafana, logging, and telemetry
fit together.
====

include::modules/telemetry-service-overview.adoc[leveloffset=+2]

modules/abstraction-layers.adoc

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
// Module included in the following assemblies:
//
// * architecture/architecture.adoc

[id='abstraction-layers-{context}']
= {product-title} abstraction layers

The container service provides the abstraction for packaging and creating
Linux-based, lightweight container images. Kubernetes provides the
cluster management and orchestrates containers on multiple hosts.

{product-title} adds:

- Source code management, builds, and deployments for developers
- Managing and promoting images at scale as they flow through your system
- Application management at scale
- Team and user tracking for organizing a large developer organization
- Networking infrastructure that supports the cluster

.{product-title} Architecture Overview
image::../images/architecture_overview.png[{product-title} Architecture Overview]

The cluster uses a combination of master and worker nodes.

modules/architecture-overview.adoc

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
// Module included in the following assemblies:
//
// * architecture/architecture.adoc

[id='architecture-overview-{context}']
= Architecture overview

With {product-title} v4, the core story remains unchanged: {product-title}
offers your developers a set of tools to evolve their applications under
operational oversight, using Kubernetes to provide the application
infrastructure. The key change in v4 is that the infrastructure and its
management are flexible, automated, and self-managing.

A major difference between {product-title} v3 and v4 is that v4 uses Operators
as both the fundamental unit of the product and an option for easily deploying
and managing utilities that your apps use.

{product-title} v4 runs on top of a Kubernetes cluster, with data about the
objects stored in etcd, a reliable clustered key-value store. The cluster is
enhanced with the standard components that you need to run your cluster,
including networking, ingress, logging, and monitoring. These components run as
Operators to increase the ease and automation of installation, scaling, and
maintenance.

////
The core services include:

* Operators, which run the core {product-title} services.
* REST APIs, which expose each of the core objects:
** Containers and images, which are the building blocks for deploying your
applications.
** Pods and services, which containers use to communicate with each other and
proxy connections.
** Projects and users, which provide the space and means for communities to
organize and manage their content together.
** Builds and image streams, which allow you to build working images and react
to new images.
** Deployments, which expand support for the software development and deployment
lifecycle.
** Ingress and routes, which announce your service to the world.
* Controllers, which read those REST APIs, apply changes to other objects, and
report status or write back to the object.
////

{product-title} offers a catalog of supporting application infrastructure that
includes:

* Operators, which expose APIs that automate the complete component lifecycle
and include components like databases
* Service bindings, which consume services that run outside the cluster
* Templates, which are simple instant examples

Users make calls to the REST API to change the state of the system. Controllers
use the REST API to read the user's desired state and then try to bring the
other parts of the system into sync. For example, when you request a build, the
REST API creates a `build` object. The build controller sees that a new build
has been created and runs a process on the cluster to perform that build. When
the build completes, the controller updates the build object through the REST
API, and the user sees that their build is complete.

The controller pattern means that much of the functionality in {product-title}
is extensible. The way that builds are run and launched can be customized
independently of how images are managed or how deployments happen. The
controllers perform the "business logic" of the system, taking user actions and
transforming them into reality. By customizing those controllers or replacing
them with your own logic, you can implement different behaviors. From a system
administration perspective, this also means that you can use the API to script
common administrative actions on a repeating schedule. Those scripts are also
controllers that watch for changes and take action. {product-title} makes the
ability to customize the cluster in this way a first-class behavior.

To make this possible, controllers use a reliable stream of changes to the
system to sync their view of the system with what users are doing. This event
stream pushes changes from etcd to the REST API and then to the controllers as
soon as changes occur, so changes ripple through the system efficiently.
However, because failures can occur at any time, the controllers must also be
able to retrieve the latest state of the system at startup and confirm that
everything is in the correct state. This resynchronization is important because
it means that even if something goes wrong, you can restart the affected
components, and the system confirms its status before it continues. Because the
controllers can always bring the system into sync, the system eventually
converges to your intent.
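The controller pattern that this module describes can be sketched in a few lines. This is an illustrative toy, not {product-title} code: the object names and the dict-based `reconcile` function are hypothetical stand-ins for REST API objects and a real controller loop.

```python
# Toy sketch of the controller pattern: a controller reads the user's desired
# state (as recorded via the REST API) and drives the actual state toward it.
# All object names here are hypothetical.

def reconcile(desired: dict, actual: dict) -> dict:
    """Converge actual state toward desired state in one pass."""
    for name, want in desired.items():
        if actual.get(name) != want:
            actual[name] = want      # "business logic": apply the change
    for name in list(actual):
        if name not in desired:
            del actual[name]         # remove objects the user deleted
    return actual

# A controller can rerun reconcile() at startup or after a restart; because it
# compares whole states, the system eventually converges to the user's intent.
desired = {"build-1": {"phase": "Running"}, "web": {"replicas": 3}}
actual = {"web": {"replicas": 1}, "stale": {"replicas": 2}}
print(reconcile(desired, actual))
```

Rerunning `reconcile` is idempotent, which is why restarting a controller after a failure is safe.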

modules/cloud_installations.adoc

Lines changed: 40 additions & 2 deletions
@@ -4,12 +4,12 @@
 // * installation/installing-customizations-cloud.adoc

 [id='cloud-installations-{context}']
-= {product-title} clusters on clouds
+= {product-title} clusters on Installer Provisioned Infrastructure

 [IMPORTANT]
 ====
 In version {product-version}, you can install {product-title} on only Amazon
-Web Services (AWS) or your own hosts.
+Web Services (AWS).
 ====

 You can install either a standard cluster or a customized cluster. With a
@@ -19,8 +19,46 @@ cluster. These details include ...
 With a customized cluster, you can specify more details of the installation,
 such as ...

+When you install a {product-title} cluster with Installer Provisioned
+Infrastructure (IPI), you download the installer from
+link:https://try.openshift.com[try.openshift.com]. This site manages:
+
+* A REST API for accounts
+* Registry tokens, which are the pull secrets that you use to obtain the
+required components
+* Cluster registration, which associates the cluster identity with your Red
+Hat account to facilitate the gathering of usage metrics
+
+In {product-title} v4, the installer is a Go binary that performs a series of
+file transformations on a set of assets. When you use Installer Provisioned
+Infrastructure, you delegate the infrastructure bootstrapping and provisioning
+to the installer instead of doing it yourself. Because you do not use the
+installer to upgrade or update your cluster, if you do not highly customize
+your cluster, you run the installer only once.
+
+You use three sets of files during installation: an installation configuration
+file, Kubernetes manifests, and Ignition configurations for your machine types.
+
+The installation configuration file is transformed into Kubernetes manifests,
+and then the manifests are wrapped into Ignition configurations. The installer
+uses these Ignition configurations to create the cluster.
+
+The installation configuration files are all pruned when you run the installer,
+so be sure to back up all configuration files that you want to use again.
+
 [IMPORTANT]
 ====
 You cannot modify the parameters that you set during installation. You have to
 ...
 ====
+
+////
+There are individual commands to perform the different actions in cluster
+creation if you want to make customizations, but you can run
+`openshift-install create cluster` to get the default cluster quickly:
+
+$ openshift-install --help
+$ openshift-install create install-config
+$ openshift-install create manifests
+$ openshift-install create ignition-configs
+$ openshift-install create cluster
+////

modules/installation-options.adoc

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
// Module included in the following assemblies:
//
// * architecture/architecture.adoc

[id='installation-options-{context}']
= Installation options

In {product-title} version 4.0, you can install only clusters that use
Installer Provisioned Infrastructure in Amazon Web Services (AWS). These
clusters use Red Hat CoreOS nodes as the operating system. Future versions of
{product-title} will support clusters that use both Installer Provisioned
Infrastructure and User Provisioned Infrastructure on more cloud providers and
on bare metal. With all cluster types, you must use Red Hat CoreOS as the
operating system for control plane nodes.
////
If you want to use any other cloud or install your cluster on-premise, use the
bring your own infrastructure option to install your cluster on existing Red
Hat Enterprise Linux (RHEL) hosts.
////

Using Installer Provisioned Infrastructure offers full-stack automation to:

* Manage compute
* Manage the operating system (RHCOS)
* Manage the control plane
* Manage nodes

////
With the bring your own infrastructure option, you have more responsibilities.
You must provide the hosts and update RHEL on them. {product-title} provides:

* A managed control plane
* Ansible to manage the kubelet and container runtime
////

Installation and upgrade both use an Operator that constantly reconciles
component versions as if it were any other Kubernetes controller.

modules/machine-api-overview.adoc

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
// Module included in the following assemblies:
//
// * architecture/architecture.adoc

[id='machine-api-overview-{context}']
= Machine API overview

For {product-title} v4 clusters, the Machine API performs all node
management actions after the cluster installation finishes. Because of this
system, {product-title} v4 offers an elastic, dynamic provisioning
method on top of public or private cloud infrastructure.

The Machine API is a combination of primary resources that are based on the
upstream link:https://github.com/kubernetes-sigs/cluster-api[Cluster API]
project and custom {product-title} resources.

The three primary resources are:

`Machines`:: A fundamental unit that describes a `Node`. A `machine` has a
class, which describes the types of compute nodes that are offered for
different cloud platforms. For example, a `machine` type for a worker node on
Amazon Web Services (AWS) might define a specific machine type and required
metadata.
`MachineClasses`:: A unit that defines a class of `machines` and facilitates
configuration reuse across `machines` of the same class. This unit functions
like a `StorageClass` for `PersistentVolumeClaims`.
`MachineSets`:: Groups of `machines`. `MachineSets` are to `machines` as
`ReplicaSets` are to `Pods`. If you need more `machines` or need to scale them
down, you change the *replicas* field on the `MachineSet` to meet your compute
need.
30+
31+
The following custom resources add more capabilities to your cluster:
32+
33+
`MachineAutoscaler`:: This resource automatically scales `machines` in
34+
a cloud. You can set the minimum and maximum scaling boundaries for nodes in a
35+
specified `MachineSet`, and the `MachineAutoscaler` maintains that range of nodes.
36+
The `MachineAutoscaler` object takes effect after a `ClusterAutoscaler` object
37+
exists. Both `ClusterAutoscaler` and `MachineAutoscaler` resources are made
38+
available by the `ClusterAutoscalerOperator`.
39+
`MachineHealthChecker`:: This resource detects when a machine is unhealthy,
40+
deletes it, and, on supported platforms, makes a new machine.
41+
`ClusterAutoscaler`:: This resource is based on the upstream
42+
link:https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler[Cluster Autoscaler]
43+
project. In the {product-title} implementation, it is integrated with the
44+
Cluster API by extending the `MachineSet` API.
45+
`ClusterAutoscalerOperator`:: Instead of interacting with the `ClusterAutoscaler`
46+
itself, you use its Operator. The `ClusterAutoscalerOperator` manages
47+
the `ClusterAutoscaler` deployment. With this Operator, you can set cluster-wide
48+
scaling limits for resources such as cores, nodes, memory, and GPU.
49+
and so on. You can set the priority so that the cluster prioritizes pods so that
50+
new nodes are not brought online for less important pods. You can also set the
51+
ScalingPolicy so you can scale up nodes but not scale them down.
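A `MachineAutoscaler` that bounds a worker `MachineSet` might look like the following sketch. The API version and all names are illustrative assumptions, not values from a real cluster:

```yaml
# Hypothetical MachineAutoscaler; keeps the targeted MachineSet between
# 1 and 6 machines.
apiVersion: autoscaling.openshift.io/v1beta1   # assumed API group/version
kind: MachineAutoscaler
metadata:
  name: worker-us-east-1a                      # illustrative name
  namespace: openshift-machine-api
spec:
  minReplicas: 1
  maxReplicas: 6
  scaleTargetRef:                              # the MachineSet to scale
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: mycluster-worker-us-east-1a
```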

In {product-title} version 3.11, you could not roll out a multi-zone
architecture easily because the cluster did not manage machine provisioning.
Version 4.0 makes this easier: each `MachineSet` is scoped to a single zone, so
the installer distributes `MachineSets` across availability zones on your
behalf. Because your compute is dynamic, you always have a zone to rebalance
your machines into if a zone fails. The autoscaler provides best-effort
balancing over the life of a cluster.

modules/node-management.adoc

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
// Module included in the following assemblies:
//
// * architecture/architecture.adoc

[id='node-management-{context}']
= Node management in {product-title}

{product-title} version 4.0 integrates management of the container operating
system and cluster management. Because the cluster manages its updates,
including updates to Red Hat CoreOS on cluster nodes, {product-title} provides
an opinionated lifecycle management experience that simplifies the
orchestration of upgrades.

{product-title} employs three DaemonSets and controllers to simplify node
management:

* The `machine-config-controller` coordinates machine upgrades.
* The `machine-config-daemon` DaemonSet applies the specified machine
configuration, which is a subset of the Ignition configuration, and controls
the kubelet configuration.
* The `machine-config-server` DaemonSet provides the Ignition configuration to
new hosts.

These tools orchestrate operating system updates and configuration changes to
the hosts by using standard Kubernetes-style constructs. A
`machine-config-daemon` runs on each machine in the cluster and watches for
changes in the machine configuration to apply. The machine configuration is a
subset of the Ignition configuration. The `machine-config-daemon` reads the
machine configuration to determine whether it must perform an OSTree update or
apply a series of systemd kubelet file changes, configuration changes, or other
changes to the operating system or {product-title} configuration.

The masters also run the `machine-config-controller` process, which monitors
all of the cluster nodes and orchestrates their configuration updates. When you
apply an update or configuration change to a node on the cluster, the
`machine-config-controller` directs the node to update. The node sees that it
needs to change, drains its pods, applies the update, and reboots. This process
is key to the success of managing {product-title} and RHCOS updates together.

The `machine-config-server` provides configurations to nodes as they join the
cluster. It orchestrates configuration of nodes and changes to the operating
system and is used in both cluster installation and node maintenance. The
`machine-config-server` component upgrades the operating system and controls
the Ignition configuration for nodes.

////
The `bootkube` process calls the `machine-config-server` component when the
{product-title} installer bootstraps the initial master node. After
installation, the `machine-config-server` runs in the cluster. It reads the
`machine-config` custom resource definitions (CRDs) and serves the required
Ignition configurations to new nodes when they join the cluster.
////
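The machine configuration that these components apply can be sketched as a custom resource. The API group, the Ignition wrapping, and the file contents below are illustrative assumptions, not taken from a real cluster:

```yaml
# Hypothetical MachineConfig; the machine-config-daemon watches resources like
# this and applies the embedded subset of Ignition configuration to each node
# in the targeted pool.
apiVersion: machineconfiguration.openshift.io/v1   # assumed API group
kind: MachineConfig
metadata:
  name: 50-worker-chrony                           # illustrative name
  labels:
    machineconfiguration.openshift.io/role: worker # target the worker pool
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - path: /etc/chrony.conf                     # example file to lay down
        mode: 0644
        contents:
          source: data:,server%20pool.ntp.org%20iburst
```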

modules/node-types.adoc

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
// Module included in the following assemblies:
//
// * architecture/architecture.adoc

[id='node-roles-{context}']
= Node roles in {product-title}

{product-title} assigns hosts different roles. These roles define the function
of the node within the cluster. The cluster contains definitions for the
standard role types, such as bootstrap, master, and worker.

A node with the bootstrap role provides the initial configuration to the
cluster and is used only during initial configuration.

Nodes with the master role run the cluster infrastructure and required
components. Instead of being grouped into a `MachineSet`, they are a series of
standalone Machine API resources. Extra controls apply to master nodes to
prevent you from deleting all master nodes and breaking your cluster.

Nodes with the worker role drive compute workloads. Each type of worker node is
governed by a specific machine pool that autoscales it.
