Commit 4dca81b

Merge pull request #94127 from jeana-redhat/OSDOCS-14523-CAPI-AWS-parity
OSDOCS-14523: Porting AWS MAPI features to CAPI docs
2 parents ea2e9f3 + 70efb3e commit 4dca81b

12 files changed

+437
-9
lines changed

machine_management/applying-autoscaling.adoc

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -32,7 +32,7 @@ include::modules/cluster-autoscaler-cr.adoc[leveloffset=+3]
3232
include::modules/cluster-autoscaler-config-priority-expander.adoc[leveloffset=+3]
3333

3434
//Labeling GPU machine sets for the cluster autoscaler
35-
include::modules/machine-feature-agnostic-options-label-gpu-autoscaler.adoc[leveloffset=+3]
35+
include::modules/machineset-label-gpu-autoscaler.adoc[leveloffset=+3]
3636

3737
:FeatureName: cluster autoscaler
3838
:FeatureResourceName: ClusterAutoscaler

machine_management/cluster_api_machine_management/cluster_api_provider_configurations/cluster-api-config-options-aws.adoc

Lines changed: 40 additions & 4 deletions
@@ -22,9 +22,45 @@ include::modules/capi-yaml-machine-template-aws.adoc[leveloffset=+2]
2222
//Sample YAML for a CAPI AWS compute machine set resource
2323
include::modules/capi-yaml-machine-set-aws.adoc[leveloffset=+2]
2424

25-
// [id="cluster-api-supported-features-aws_{context}"]
26-
// == Enabling {aws-full} features with the Cluster API
25+
[id="cluster-api-supported-features-aws_{context}"]
26+
== Enabling {aws-full} features with the Cluster API
2727

28-
// You can enable the following features by updating values in the Cluster API custom resource manifests.
28+
You can enable the following features by updating values in the Cluster API custom resource manifests.
2929

30-
//Not sure what, if anything, we can add here at this time.
30+
////
31+
//Not yet supported, relies on Cluster API CAS support
32+
// Cluster autoscaler GPU labels
33+
include::modules/machine-feature-agnostic-options-label-gpu-autoscaler.adoc[leveloffset=+2]
34+
35+
[role="_additional-resources"]
36+
.Additional resources
37+
* xref:../../../machine_management/applying-autoscaling.adoc#cluster-autoscaler-cr_applying-autoscaling[Cluster autoscaler resource definition]
38+
////
39+
40+
// Elastic Fabric Adapter instances and placement group options
41+
include::modules/machine-feature-aws-existing-placement-group.adoc[leveloffset=+2]
42+
43+
// Amazon EC2 Instance Metadata Service configuration options
44+
include::modules/machine-feature-aws-imds-options.adoc[leveloffset=+2]
45+
46+
////
47+
//This link is for a note that does not apply to TP clusters, reassess for Cluster API GA
48+
[role="_additional-resources"]
49+
.Additional resources
50+
* xref:../../../machine_configuration/mco-update-boot-images.adoc#mco-update-boot-images[Updated boot images]
51+
////
52+
53+
// Dedicated Instances configuration options
54+
include::modules/machine-feature-aws-dedicated-instances.adoc[leveloffset=+2]
55+
56+
// Non-guaranteed Spot Instances and hourly cost limits
57+
include::modules/machine-feature-agnostic-nonguaranteed-instances.adoc[leveloffset=+2]
58+
59+
// Capacity Reservation configuration options
60+
include::modules/machine-feature-agnostic-capacity-reservation.adoc[leveloffset=+2]
61+
62+
//Adding a GPU node to a machine set (stesmith)
63+
include::modules/machine-feature-aws-add-nvidia-gpu-node.adoc[leveloffset=+2]
64+
65+
// //Deploying the Node Feature Discovery Operator (stesmith)
66+
// include::modules/nvidia-gpu-aws-deploying-the-node-feature-discovery-operator.adoc[leveloffset=+1]

machine_management/cluster_api_machine_management/cluster_api_provider_configurations/cluster-api-config-options-bare-metal.adoc

Lines changed: 3 additions & 0 deletions
@@ -22,6 +22,8 @@ include::modules/capi-yaml-machine-template-bare-metal.adoc[leveloffset=+2]
2222
//Sample YAML for a CAPI bare metal compute machine set resource
2323
include::modules/capi-yaml-machine-set-bare-metal.adoc[leveloffset=+2]
2424

25+
////
26+
//Section depends on migration support
2527
[id="cluster-api-supported-features-bare-metal_{context}"]
2628
== Enabling bare metal features with the Cluster API
2729
@@ -33,3 +35,4 @@ include::modules/machine-feature-agnostic-options-label-gpu-autoscaler.adoc[leve
3335
[role="_additional-resources"]
3436
.Additional resources
3537
* xref:../../../machine_management/applying-autoscaling.adoc#cluster-autoscaler-cr_applying-autoscaling[Cluster autoscaler resource definition]
38+
////

modules/capi-yaml-machine-template-aws.adoc

Lines changed: 1 addition & 2 deletions
@@ -19,12 +19,11 @@ metadata:
1919
spec:
2020
  template:
2121
    spec: # <3>
22-
      uncompressedUserData: true
2322
      iamInstanceProfile: # ...
2423
      instanceType: m5.large
2524
      ignition:
2625
        storageType: UnencryptedUserData
27-
        version: "3.2"
26+
        version: "3.4"
2827
      ami:
2928
        id: # ...
3029
      subnet:
Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * machine_management/cluster_api_machine_management/cluster_api_provider_configurations/cluster-api-config-options-aws.adoc
4+
// There are parallel features in Azure so this module is set up for reuse.
5+
6+
ifeval::["{context}" == "cluster-api-config-options-aws"]
7+
:aws:
8+
endif::[]
9+
10+
:_mod-docs-content-type: CONCEPT
11+
[id="machine-feature-agnostic-capacity-reservation_{context}"]
12+
= Capacity Reservation configuration options
13+
14+
{product-title} version {product-version} and later supports
15+
ifdef::azure[on-demand Capacity Reservation with Capacity Reservation groups on {azure-full} clusters.]
16+
ifdef::aws[Capacity Reservations on {aws-full} clusters, including On-Demand Capacity Reservations and Capacity Blocks for ML.]
17+
18+
You can deploy machines on any available resources that match the parameters of a capacity request that you define.
19+
These parameters specify the
20+
ifdef::azure[VM size,]
21+
ifdef::aws[instance type,]
22+
region, and number of instances that you want to reserve.
23+
If your
24+
ifdef::azure[{azure-short} subscription quota]
25+
ifdef::aws[Capacity Reservation]
26+
can accommodate the capacity request, the deployment succeeds.
27+
28+
include::snippets/apply-machine-configuration-method.adoc[tag=method-machine-template]
29+
30+
ifdef::azure[]
31+
[NOTE]
32+
====
33+
You cannot change an existing Capacity Reservation configuration for a machine set.
34+
To use a different Capacity Reservation group, you must replace the machine set and the machines that the previous machine set deployed.
35+
====
36+
endif::azure[]
37+
38+
.Sample Capacity Reservation configuration
39+
[source,yaml]
40+
----
41+
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
42+
kind: AWSMachineTemplate
43+
# ...
44+
spec:
45+
  template:
46+
    spec:
47+
      capacityReservationId: <capacity_reservation> # <1>
48+
      marketType: <market_type> # <2>
49+
# ...
50+
----
51+
<1> Specify the ID of the
52+
ifdef::azure[Capacity Reservation group]
53+
ifdef::aws[Capacity Block for ML or On-Demand Capacity Reservation]
54+
that you want to deploy machines on.
55+
ifdef::aws[]
56+
<2> Specify the market type to use.
57+
The following values are valid:
58+
`CapacityBlock`:: Use this market type with Capacity Blocks for ML.
59+
`OnDemand`:: Use this market type with On-Demand Capacity Reservations.
60+
`Spot`:: Use this market type with Spot Instances.
61+
This option is not compatible with Capacity Reservations.
62+
endif::aws[]
63+
64+
For more information, including limitations and suggested use cases for this offering, see
65+
ifdef::azure[link:https://learn.microsoft.com/en-us/azure/virtual-machines/capacity-reservation-overview[On-demand Capacity Reservation] in the {azure-full} documentation.]
66+
ifdef::aws[link:https://docs.aws.amazon.com/en_us/AWSEC2/latest/UserGuide/capacity-reservation-overview.html[On-Demand Capacity Reservations and Capacity Blocks for ML] in the {aws-short} documentation.]
67+
68+
ifeval::["{context}" == "cluster-api-config-options-aws"]
69+
:!aws:
70+
endif::[]
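
As an illustration outside the committed module, the compatibility rule in callout <2> can be sketched as a small Python check. The function name `check_capacity_reservation` and the plain-dict input are assumptions made for this sketch; they are not part of {product-title} or the Cluster API provider, and the only rules encoded are the ones stated in the module text.

```python
# Market types listed in callout <2> of the module.
VALID_MARKET_TYPES = {"CapacityBlock", "OnDemand", "Spot"}


def check_capacity_reservation(spec: dict) -> list:
    """Return problems found in a dict shaped like AWSMachineTemplate spec.template.spec.

    Hypothetical helper: it only encodes the rules stated in the module text.
    """
    problems = []
    reservation_id = spec.get("capacityReservationId")
    market_type = spec.get("marketType")

    # Only the three documented values are treated as valid here.
    if market_type is not None and market_type not in VALID_MARKET_TYPES:
        problems.append(f"unknown marketType: {market_type!r}")

    # The module states that Spot is not compatible with Capacity Reservations.
    if reservation_id and market_type == "Spot":
        problems.append("marketType Spot is not compatible with Capacity Reservations")

    return problems
```

A spec that pairs a reservation ID with `CapacityBlock` or `OnDemand` returns an empty list, while pairing it with `Spot` returns one problem string.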
Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * machine_management/cluster_api_machine_management/cluster_api_provider_configurations/cluster-api-config-options-aws.adoc
4+
// There are parallel features in Azure and GCP so this module is set up for reuse.
5+
6+
ifeval::["{context}" == "cluster-api-config-options-aws"]
7+
:aws:
8+
endif::[]
9+
10+
:_mod-docs-content-type: CONCEPT
11+
[id="machine-feature-agnostic-nonguaranteed-instances_{context}"]
12+
ifdef::aws[= Non-guaranteed Spot Instances and hourly cost limits]
13+
14+
ifdef::aws[]
15+
You can deploy machines as non-guaranteed Spot Instances on {aws-first}.
16+
Spot Instances use spare AWS EC2 capacity and are less expensive than On-Demand Instances.
17+
You can use Spot Instances for workloads that can tolerate interruptions, such as batch or stateless, horizontally scalable workloads.
18+
endif::aws[]
19+
20+
include::snippets/apply-machine-configuration-method.adoc[tag=method-machine-template]
21+
22+
ifdef::aws[]
23+
[IMPORTANT]
24+
====
25+
AWS EC2 can reclaim the capacity for a Spot Instance at any time.
26+
====
27+
28+
.Sample Spot Instance configuration
29+
[source,yaml]
30+
----
31+
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
32+
kind: AWSMachineTemplate
33+
# ...
34+
spec:
35+
  template:
36+
    spec:
37+
      spotMarketOptions: <1>
38+
        maxPrice: <price_per_hour> <2>
39+
# ...
40+
----
41+
<1> Specifies the use of Spot Instances.
42+
<2> Optional: Specifies an hourly cost limit in US dollars for the Spot Instance.
43+
For example, setting the `<price_per_hour>` value to `2.50` limits the cost of the Spot Instance to USD 2.50 per hour.
44+
When this value is not set, you are charged up to the On-Demand Instance price.
45+
+
46+
[WARNING]
47+
====
48+
Setting a specific `maxPrice: <price_per_hour>` value might increase the frequency of interruptions compared to using the default On-Demand Instance price.
49+
It is strongly recommended to use the default On-Demand Instance price and to not set the maximum price for Spot Instances.
50+
====
51+
52+
Interruptions can occur when using Spot Instances for the following reasons:
53+
54+
* The instance price exceeds your maximum price
55+
* The demand for Spot Instances increases
56+
* The supply of Spot Instances decreases
57+
58+
AWS issues a two-minute warning before it interrupts a Spot Instance.
59+
{product-title} begins to remove the workloads from the affected instances when AWS issues the termination warning.
60+
61+
When AWS terminates an instance, a termination handler running on the Spot Instance node deletes the machine resource.
62+
To satisfy the compute machine set `replicas` quantity, the compute machine set creates a machine that requests a Spot Instance.
63+
endif::aws[]
64+
65+
ifeval::["{context}" == "cluster-api-config-options-aws"]
66+
:!aws:
67+
endif::[]
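
As an illustration outside the committed module, the `maxPrice` behavior described in callout <2> can be sketched in Python. The function name `effective_max_price` and the `on_demand_price` argument are assumptions for this sketch; the real price ceiling is enforced by AWS, not by any code like this.

```python
from decimal import Decimal
from typing import Optional


def effective_max_price(spot_market_options: Optional[dict],
                        on_demand_price: Decimal) -> Optional[Decimal]:
    """Return the hourly price ceiling in USD, or None if Spot is not requested.

    Hypothetical helper: `spot_market_options` mirrors the `spotMarketOptions`
    stanza of the sample template, parsed into a dict.
    """
    if spot_market_options is None:
        # No spotMarketOptions stanza: the machine is not a Spot Instance.
        return None
    max_price = spot_market_options.get("maxPrice")
    if max_price is None:
        # maxPrice unset: the ceiling is the On-Demand Instance price.
        return on_demand_price
    return Decimal(str(max_price))
```

An empty `spotMarketOptions` stanza therefore resolves to the On-Demand price, which is the configuration that the warning in the module recommends.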

modules/machine-feature-agnostic-options-label-gpu-autoscaler.adoc

Lines changed: 0 additions & 2 deletions
@@ -1,7 +1,5 @@
11
// Module included in the following assemblies:
22
//
3-
// * machine_management/applying-autoscaling.adoc
4-
// * machine_management/cluster_api_machine_management/cluster_api_provider_configurations/cluster-api-config-options-aws.adoc
53

64
:_mod-docs-content-type: CONCEPT
75
[id="machine-feature-agnostic-options-label-gpu-autoscaler_{context}"]
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * machine_management/cluster_api_machine_management/cluster_api_provider_configurations/cluster-api-config-options-aws.adoc
4+
5+
:_mod-docs-content-type: CONCEPT
6+
[id="machine-feature-aws-add-nvidia-gpu-node_{context}"]
7+
= GPU-enabled machine options
8+
9+
You can deploy GPU-enabled compute machines on {aws-first}.
10+
The following sample configuration uses an link:https://aws.amazon.com/ec2/instance-types/#Accelerated_Computing[{aws-short} G4dn instance type], which includes an NVIDIA Tesla T4 Tensor Core GPU, as an example.
11+
12+
For more information about supported instance types, see the following pages in the NVIDIA documentation:
13+
14+
* link:https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/platform-support.html[NVIDIA GPU Operator Community support matrix]
15+
16+
* link:https://docs.nvidia.com/ai-enterprise/latest/product-support-matrix/index.html[NVIDIA AI Enterprise support matrix]
17+
18+
include::snippets/apply-machine-configuration-method.adoc[tag=method-machine-template-and-machine-set]
19+
20+
// Cluster API machine template spec
21+
.Sample GPU-enabled machine template configuration
22+
[source,yaml]
23+
----
24+
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
25+
kind: AWSMachineTemplate
26+
# ...
27+
spec:
28+
  template:
29+
    spec:
30+
      instanceType: g4dn.xlarge <1>
31+
# ...
32+
----
33+
<1> Specifies a G4dn instance type.
34+
35+
// Cluster API machine set spec
36+
.Sample GPU-enabled machine set configuration
37+
[source,yaml]
38+
----
39+
apiVersion: cluster.x-k8s.io/v1beta1
40+
kind: MachineSet
41+
metadata:
42+
  name: <cluster_name>-gpu-<region> <1>
43+
  namespace: openshift-cluster-api
44+
  labels:
45+
    cluster.x-k8s.io/cluster-name: <cluster_name>
46+
spec:
47+
  clusterName: <cluster_name>
48+
  replicas: 1
49+
  selector:
50+
    matchLabels:
51+
      test: example
52+
      cluster.x-k8s.io/cluster-name: <cluster_name>
53+
      cluster.x-k8s.io/set-name: <cluster_name>-gpu-<region> <2>
54+
  template:
55+
    metadata:
56+
      labels:
57+
        test: example
58+
        cluster.x-k8s.io/cluster-name: <cluster_name>
59+
        cluster.x-k8s.io/set-name: <cluster_name>-gpu-<region> <3>
60+
        node-role.kubernetes.io/<role>: ""
61+
# ...
62+
----
63+
<1> Specifies a name that includes the `gpu` role. The name includes the cluster ID as a prefix and the region as a suffix.
64+
<2> Specifies a selector label that matches the machine set name.
65+
<3> Specifies a template label that matches the machine set name.
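
As an illustration outside the committed module, the label relationships in callouts <2> and <3> can be sketched as a Python check over a parsed machine set. The function name `check_machine_set_labels` and the dict layout are assumptions for this sketch; it mirrors the sample manifest above and is not an {product-title} tool.

```python
SET_NAME_LABEL = "cluster.x-k8s.io/set-name"


def check_machine_set_labels(machine_set: dict) -> list:
    """Return label problems in a dict shaped like the sample MachineSet manifest.

    Hypothetical helper for illustration only.
    """
    problems = []
    name = machine_set["metadata"]["name"]
    selector = machine_set["spec"]["selector"]["matchLabels"]
    template_labels = machine_set["spec"]["template"]["metadata"]["labels"]

    # Callout <2>: the selector label matches the machine set name.
    if selector.get(SET_NAME_LABEL) != name:
        problems.append("selector set-name label does not match the machine set name")

    # Callout <3>: the template label matches the machine set name.
    if template_labels.get(SET_NAME_LABEL) != name:
        problems.append("template set-name label does not match the machine set name")

    # A matchLabels selector only selects machines that carry every selector pair.
    for key, value in selector.items():
        if template_labels.get(key) != value:
            problems.append(f"selector label {key!r} is missing or different in the template")

    return problems
```

The template may carry extra labels, such as the node role label in the sample, without causing a problem; only the selector pairs must all be present.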
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * machine_management/cluster_api_machine_management/cluster_api_provider_configurations/cluster-api-config-options-aws.adoc
4+
5+
:_mod-docs-content-type: CONCEPT
6+
[id="machine-feature-aws-dedicated-instances_{context}"]
7+
= Dedicated Instance configuration options
8+
9+
You can deploy machines that are backed by Dedicated Instances on {aws-first} clusters.
10+
11+
Dedicated Instances run in a virtual private cloud (VPC) on hardware that is dedicated to a single customer.
12+
These Amazon EC2 instances are physically isolated at the host hardware level.
13+
The isolation of Dedicated Instances occurs even if the instances belong to different AWS accounts that are linked to a single payer account.
14+
However, other instances that are not dedicated can share hardware with Dedicated Instances if they belong to the same AWS account.
15+
16+
{product-title} supports instances with public or dedicated tenancy.
17+
18+
include::snippets/apply-machine-configuration-method.adoc[tag=method-machine-template]
19+
20+
.Sample Dedicated Instances configuration
21+
[source,yaml]
22+
----
23+
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
24+
kind: AWSMachineTemplate
25+
# ...
26+
spec:
27+
  template:
28+
    spec:
29+
      tenancy: dedicated <1>
30+
# ...
31+
----
32+
<1> Specifies the use of instances with dedicated tenancy, which run on single-tenant hardware.
33+
If you do not specify this value, instances with public tenancy that run on shared hardware are used by default.
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * machine_management/cluster_api_machine_management/cluster_api_provider_configurations/cluster-api-config-options-aws.adoc
4+
5+
:_mod-docs-content-type: CONCEPT
6+
[id="machine-feature-aws-existing-placement-group_{context}"]
7+
= Elastic Fabric Adapter instances and placement group options
8+
9+
You can deploy compute machines on link:https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html[Elastic Fabric Adapter] (EFA) instances within an existing AWS placement group.
10+
11+
EFA instances do not require placement groups, and you can use placement groups for purposes other than configuring an EFA.
12+
The following example uses an EFA and placement group together to demonstrate a configuration that can improve network performance for machines within the specified placement group.
13+
14+
include::snippets/apply-machine-configuration-method.adoc[tag=method-machine-template]
15+
16+
.Sample EFA instance and placement group configuration
17+
[source,yaml]
18+
----
19+
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
20+
kind: AWSMachineTemplate
21+
# ...
22+
spec:
23+
  template:
24+
    spec:
25+
      instanceType: <supported_instance_type> # <1>
26+
      networkInterfaceType: efa # <2>
27+
      placementGroupName: <placement_group> # <3>
28+
      placementGroupPartition: <placement_group_partition_number> # <4>
29+
# ...
30+
----
31+
<1> Specifies an instance type that link:https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html#efa-instance-types[supports EFAs].
32+
<2> Specifies the `efa` network interface type.
33+
<3> Specifies the name of the existing AWS placement group to deploy machines in.
34+
<4> Optional: Specifies the partition number of the existing AWS placement group where you want your machines deployed.
35+
36+
[NOTE]
37+
====
38+
Ensure that the link:https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html#limitations-placement-groups[rules and limitations] for the type of placement group that you create are compatible with your intended use case.
39+
====
40+
41+
////
42+
The MAPI version of this has additional parameters in the providerSpec:
43+
44+
----
45+
placement:
46+
availabilityZone: <zone> # <3>
47+
region: <region> # <4>
48+
----
49+
<3> Specifies the zone, for example, `us-east-1a`.
50+
<4> Specifies the region, for example, `us-east-1`.
51+
52+
Do we need to say anything specific about this, or is this just redundant with the failure domain?
53+
54+
Note:
55+
CAPI has networkInterfaceType: efa
56+
MAPI has networkInterfaceType: EFA
57+
Capitalization matters!
58+
////
