Commit 07edfba

Merge pull request #93291 from AedinC/OSDOCS-14654
OSDOCS-14654:Added cluster installation errors to ROSA Classic Troubleshooting guide
2 parents 40c9da1 + badf12f commit 07edfba

18 files changed

+570
-3
lines changed
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * support/rosa-troubleshooting-deployments.adoc
4+
:_mod-docs-content-type: PROCEDURE
5+
[id="rosa-troubleshooting-awsapiratelimitexceeded-failure-deployment_{context}"]
6+
= Troubleshooting cluster creation with an AWSAPIRateLimitExceeded error
7+
8+
If a cluster creation action fails, you might receive the following error messages.
9+
10+
.Example install logs output
11+
[source,terminal]
12+
----
13+
level=error\nlevel=error msg=Error: error waiting for Route53 Hosted Zone .* creation: timeout while waiting for state to become 'INSYNC' (last state: 'PENDING', timeout: 15m0s)
14+
----
15+
16+
.Example {cluster-manager} output
17+
[source,terminal]
18+
----
19+
Provisioning Error Code: OCM3008
20+
Provisioning Error Message: AWS API rate limit exceeded. Please try again.
21+
----
22+
23+
This error indicates that the AWS API rate limit was exceeded while the installer was waiting for the Route 53 hosted zone to be created.
24+
25+
.Procedure
26+
27+
* Reattempt the installation.
28+
29+
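Because the failure is caused by a rate limit, spacing out the reattempts can help. The following is a minimal Python sketch of retrying with exponential backoff and jitter. It is not part of the ROSA tooling: the `retry_with_backoff` helper, its default delays, and the `attempt` callable (which would wrap whatever starts the installation and reports success) are all assumptions made for illustration.

```python
import random
import time
from typing import Callable


def retry_with_backoff(
    attempt: Callable[[], bool],
    max_attempts: int = 5,
    base_delay: float = 30.0,
    max_delay: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call attempt() until it returns True or max_attempts is reached.

    `attempt` is a hypothetical callable that starts the installation and
    returns True on success. The delay values are illustrative defaults.
    """
    for n in range(max_attempts):
        if attempt():
            return True
        if n < max_attempts - 1:
            # Double the delay on every failure, capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** n))
            # Jitter: wait between 50% and 100% of the computed delay.
            sleep(delay * random.uniform(0.5, 1.0))
    return False
```

The `sleep` parameter is injectable so that the waiting behavior can be exercised without actually pausing.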
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * support/rosa-troubleshooting-deployments.adoc
4+
:_mod-docs-content-type: PROCEDURE
5+
[id="rosa-troubleshooting-awsec2quotaexceeded-failure-deployment_{context}"]
6+
= Troubleshooting cluster creation with an AWSEC2QuotaExceeded error
7+
8+
If a cluster creation action fails, you might receive the following error message.
9+
10+
.Example output
11+
[source,terminal]
12+
----
13+
Provisioning Error Code: OCM3042
14+
Provisioning Error Message: AWS E2C quota limit exceeded. Clean unused load balancers or increase quota and try again.
15+
----
16+
17+
This error indicates that you have reached the EC2 quota limit for the region mentioned in the error log.
18+
19+
.Procedure
20+
21+
Request a quota increase from AWS or delete unused EC2 instances.
22+
23+
* Request a quota increase from AWS.
24+
.. Sign in to the link:https://aws.amazon.com/console/[AWS Management Console].
25+
.. Click your user name and select **Service Quotas**.
26+
.. Under **Manage quotas**, select an AWS service to view available quotas.
27+
.. If the quota is adjustable, select the quota by its option button or its name, and then choose **Request quota increase**.
28+
29+
* Delete unused EC2 instances using the console.
30+
.. Before you delete an EC2 instance, verify that your Amazon EBS volumes will persist after you delete the instance.
31+
.. Ensure you have copied any data that you need from your instance store volumes to persistent storage, such as Amazon EBS or Amazon S3.
32+
.. If you have a CNAME record for your domain that points to your load balancer, point it to a new location and wait for the DNS change to take effect before deleting your load balancer.
33+
.. Open the link:https://console.aws.amazon.com/ec2/[Amazon EC2 console].
34+
.. On the navigation pane, choose **Instances**.
35+
.. Select the instance, and choose **Terminate instance**.
36+
37+
38+
39+
40+
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * support/rosa-troubleshooting-deployments.adoc
4+
:_mod-docs-content-type: PROCEDURE
5+
[id="rosa-troubleshooting-awsinsufficientcapacity-failure-deployment_{context}"]
6+
= Troubleshooting cluster creation with an AWSInsufficientCapacity error
7+
8+
If a cluster creation action fails, you might receive the following error message.
9+
10+
.Example output
11+
[source,terminal]
12+
----
13+
Provisioning Error Code: OCM3052
14+
Provisioning Error Message: AWSInsufficientCapacity.
15+
----
16+
17+
This error indicates that AWS does not currently have enough capacity in an availability zone that you requested.
18+
19+
.Procedure
20+
21+
* Retry the installation, or select a different AWS region or different availability zones.
22+
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * support/rosa-troubleshooting-deployments.adoc
4+
:_mod-docs-content-type: PROCEDURE
5+
[id="rosa-troubleshooting-awsinsufficientpermission-failure-deployment_{context}"]
6+
= Troubleshooting cluster creation with an AWSInsufficientPermissions error
7+
8+
If a cluster creation action fails, you might receive the following error message.
9+
10+
.Example {cluster-manager} output
11+
[source,terminal]
12+
----
13+
Provisioning Error Code: OCM3033
14+
Provisioning Error Message: Current credentials insufficient for performing cluster installation.
15+
----
16+
17+
This error indicates that the cluster installation is blocked due to missing or insufficient privileges on the AWS account used to provision the cluster.
18+
19+
.Procedure
20+
21+
Ensure that the prerequisites are met. Depending on the credential mode that you chose for installing clusters, review _Detailed requirements for deploying ROSA (classic architecture) using STS_ or _Deploying ROSA without AWS STS_ in _Additional resources_.
22+
23+
include::snippets/rosa-sts.adoc[]
24+
. If needed, you can re-create the permissions and policies by using the `-f` flag:
25+
+
26+
.Example commands
27+
[source,terminal]
28+
----
29+
$ rosa create ocm-role -f
30+
$ rosa create user-role -f
31+
$ rosa create account-roles -f
32+
$ rosa create operator-roles -c ${CLUSTER} -f
33+
----
34+
. Validate all the prerequisites and attempt cluster reinstallation.
35+
36+
37+
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * support/rosa-troubleshooting-deployments.adoc
4+
:_mod-docs-content-type: PROCEDURE
5+
[id="rosa-troubleshooting-awsnatgatewaylimitexceeded-failure-deployment_{context}"]
6+
= Troubleshooting cluster creation with an AWSNATGatewayLimitExceeded error
7+
8+
If a cluster creation action fails, you might receive the following error messages.
9+
10+
.Example install logs output
11+
[source,terminal]
12+
----
13+
Failed to create cluster: Error creating NAT Gateway: NatGatewayLimitExceeded: Performing this operation would exceed the limit of 5 NAT gateways.
14+
----
15+
16+
.Example {cluster-manager} output
17+
[source,terminal]
18+
----
19+
Provisioning Error Code: OCM3019
20+
Provisioning Error Message: NAT gateway limit exceeded. Clean unused NAT gateways or increase quota and try again.
21+
----
22+
23+
This error indicates that you have reached the quota for the number of NAT gateways for that availability zone.
24+
25+
.Procedure
26+
27+
. To fix this issue, try one of the following methods:
28+
29+
* Request an increase of the **NAT gateways per Availability Zone** quota by using the AWS **Service Quotas** console.
30+
31+
* Check the status of your NAT gateway. A status of `Pending`, `Available`, or `Deleting` counts against your quota. If you have recently deleted a NAT gateway, wait a few minutes for the status to go from `Deleting` to `Deleted`. Then try creating a new NAT gateway.
32+
33+
* If you do not need your NAT gateway in a specific availability zone, try creating a NAT gateway in an availability zone where you have not reached your quota.
34+
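The counting rule in the procedure can be sketched in Python as follows. This is a minimal illustration, not an AWS API call: the record shape (dictionaries with hypothetical `az` and `state` keys) is an assumption, and the default quota of 5 is taken from the example install log rather than from your account settings.

```python
from collections import Counter
from typing import Iterable, List, Mapping

# States that count against the quota, per the procedure above.
COUNTED_STATES = {"pending", "available", "deleting"}


def nat_gateways_counted(gateways: Iterable[Mapping[str, str]]) -> Counter:
    """Count NAT gateways per availability zone that count against the quota.

    Each record is a hypothetical dict with "az" and "state" keys.
    """
    counts: Counter = Counter()
    for gw in gateways:
        # Compare case-insensitively so "Deleting" and "deleting" both match.
        if gw["state"].lower() in COUNTED_STATES:
            counts[gw["az"]] += 1
    return counts


def zones_at_quota(gateways: Iterable[Mapping[str, str]], quota: int = 5) -> List[str]:
    """Return the availability zones that have reached the quota."""
    counts = nat_gateways_counted(gateways)
    return sorted(az for az, n in counts.items() if n >= quota)
```

A zone returned by `zones_at_quota` cannot take another NAT gateway until a gateway finishes deleting or the quota is raised.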
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * support/rosa-troubleshooting-deployments.adoc
4+
:_mod-docs-content-type: PROCEDURE
5+
[id="rosa-troubleshooting-awssubnetnotexist-failure-deployment_{context}"]
6+
= Troubleshooting cluster creation with an AWSSubnetDoesNotExist error
7+
8+
If a cluster creation action fails, you might receive the following error messages.
9+
10+
.Example install logs output
11+
[source,terminal]
12+
----
13+
The subnet ID 'subnet-<somesubnetID>' does not exist.
14+
----
15+
16+
.Example {cluster-manager} output
17+
[source,terminal]
18+
----
19+
Provisioning Error Code: OCM3032
20+
Provisioning Error Message: You have specified an invalid subnet. Verify your subnet configuration is correct and try again.
21+
----
22+
23+
This error indicates that the cluster installation is blocked because a subnet that you specified is invalid or does not exist.
24+
25+
.Procedure
26+
27+
* Check the subnets that you provided in the `platform.aws.subnets` parameter during installation. The subnets must be part of the machine network CIDR ranges that you specify.
28+
** For a standard cluster, specify a public and a private subnet for each availability zone.
29+
** For a private cluster, specify a private subnet for each availability zone.
30+
31+
For more information about AWS VPC and subnet requirements and optional parameters, see the _VPC_ section in the _AWS prerequisites for ROSA_ guide.
32+
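The per-zone subnet rule can be checked with a short Python sketch. It assumes that you already know the availability zone of each subnet and whether it is public. The `missing_subnets` helper and its input shape are hypothetical and do not query AWS.

```python
from typing import Dict, Iterable, List, Set, Tuple


def missing_subnets(
    subnets: Iterable[Tuple[str, bool]],
    zones: Iterable[str],
    private_cluster: bool = False,
) -> Dict[str, List[str]]:
    """Report which subnet types are missing in each availability zone.

    `subnets` holds (availability_zone, is_public) pairs. A standard cluster
    needs a public and a private subnet per zone. A private cluster needs
    only a private subnet per zone.
    """
    present: Dict[str, Set[str]] = {}
    for az, is_public in subnets:
        present.setdefault(az, set()).add("public" if is_public else "private")

    required = {"private"} if private_cluster else {"public", "private"}
    missing: Dict[str, List[str]] = {}
    for az in zones:
        gap = sorted(required - present.get(az, set()))
        if gap:
            missing[az] = gap
    return missing
```

An empty result means that every listed zone has the subnet types that the cluster type requires.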
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * support/rosa-troubleshooting-deployments.adoc
4+
:_mod-docs-content-type: PROCEDURE
5+
[id="rosa-troubleshooting-awsvpclimit-failure-deployment_{context}"]
6+
= Troubleshooting cluster creation with an AWSVPCLimitExceeded error
7+
8+
If a cluster creation action fails, you might receive the following error message.
9+
10+
.Example {cluster-manager} output
11+
[source,terminal]
12+
----
13+
Provisioning Error Code: OCM3013
14+
Provisioning Error Message: VPC limit exceeded. Clean unused VPCs or increase quota and try again.
15+
----
16+
17+
This error indicates that you have reached the quota for the number of VPCs.
18+
19+
.Procedure
20+
21+
Request a quota increase from AWS or delete unused VPCs.
22+
23+
* Request a quota increase from AWS.
24+
.. Sign in to the link:https://aws.amazon.com/console/[AWS Management Console].
25+
.. Click your user name and select **Service Quotas**.
26+
.. Under **Manage quotas**, select a service to view available quotas.
27+
.. If the quota is adjustable, select the quota by its option button or its name, and then choose **Request increase**.
28+
.. For **Increase quota value**, enter the new value. The new value must be greater than the current value.
29+
.. Choose **Request**.
30+
31+
* Delete unused VPCs. Before you can delete a VPC, you must first terminate or delete any resources that created a requester-managed network interface in the VPC. For example, you must terminate your EC2 instances and delete your load balancers, NAT gateways, transit gateways, and interface VPC endpoints before deleting a VPC.
32+
.. Sign in to the link:https://console.aws.amazon.com/ec2/[AWS EC2 console].
33+
.. Terminate all instances in the VPC. For more information, see link:https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/terminating-instances.html[Terminate Amazon EC2 instances].
34+
.. Open the link:https://console.aws.amazon.com/vpc[Amazon VPC console].
35+
.. In the navigation pane, choose **Your VPCs**.
36+
.. Select the VPC to delete and choose **Actions, Delete VPC**.
37+
.. If you have a Site-to-Site VPN connection, select the option to delete it; otherwise, leave it unselected. Choose **Delete VPC**.
38+
39+
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * support/rosa-troubleshooting-deployments.adoc
4+
:_mod-docs-content-type: PROCEDURE
5+
[id="rosa-troubleshooting-deleteiamrole-deployment_{context}"]
6+
= Troubleshooting cluster creation with a DeletingIAMRole error
7+
8+
If a cluster creation action fails, you might receive the following error message.
9+
10+
.Example output
11+
[source,terminal]
12+
----
13+
OCM3031: Error deleting IAM Role (role-name): DeleteConflict: Cannot delete entity, must detach all policies first.\nlevel=error msg=\tstatus code: 409
14+
----
15+
This error indicates that the cluster installation is blocked because the installer could not delete the roles that it used during the installation.
16+
17+
.Procedure
18+
To unblock the cluster installation, ensure that no policies are added to new roles by default.
19+
20+
* Run the following command to list all managed policies that are attached to the specified role:
21+
22+
+
23+
[source,terminal]
24+
----
25+
$ aws iam list-attached-role-policies --role-name <role-name>
26+
----
27+
+
28+
.Example output
29+
[source,terminal]
30+
----
31+
{
32+
"AttachedPolicies": [
33+
{
34+
"PolicyName": "SecurityAudit",
35+
"PolicyArn": "arn:aws:iam::aws:policy/SecurityAudit"
36+
}
37+
],
38+
"IsTruncated": false
39+
}
40+
----
41+
42+
+
43+
If there are no policies attached to the specified role (or none that match the specified path prefix), the command returns an empty list.
44+
45+
For more information about the `list-attached-role-policies` command, see link:https://docs.aws.amazon.com/cli/latest/reference/iam/list-attached-role-policies.html[list-attached-role-policies] in the official AWS documentation.
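The error message states that all policies must be detached before the role can be deleted. As a minimal sketch, the following Python helper reads the JSON printed by `aws iam list-attached-role-policies` and builds one `aws iam detach-role-policy` command per attached policy. The `detach_commands` helper is hypothetical. It only prints commands for review and does not call AWS.

```python
import json
import shlex
from typing import List


def detach_commands(role_name: str, list_output: str) -> List[str]:
    """Build detach commands from list-attached-role-policies JSON output.

    `list_output` is the JSON text printed by the AWS CLI. If "IsTruncated"
    is true, the listing is incomplete and further pages must be fetched.
    """
    data = json.loads(list_output)
    return [
        "aws iam detach-role-policy --role-name {} --policy-arn {}".format(
            shlex.quote(role_name), shlex.quote(policy["PolicyArn"])
        )
        for policy in data.get("AttachedPolicies", [])
    ]


if __name__ == "__main__":
    sample = """
    {
      "AttachedPolicies": [
        {"PolicyName": "SecurityAudit",
         "PolicyArn": "arn:aws:iam::aws:policy/SecurityAudit"}
      ],
      "IsTruncated": false
    }
    """
    # "my-role" is a placeholder role name used only for this example.
    for command in detach_commands("my-role", sample):
        print(command)
```

Review each generated command before running it, because detaching a policy from a role that is still in use removes its permissions.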

modules/rosa-troubleshooting-general-deployment.adoc

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
33
// * support/rosa-troubleshooting-deployments.adoc
44
:_mod-docs-content-type: PROCEDURE
55
[id="rosa-troubleshooting-general-deployment-failure_{context}"]
6-
= Obtaining information on a failed cluster
6+
= Obtaining information about a failed cluster
77

88
If a cluster deployment fails, the cluster is put into an "error" state.
99

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * support/rosa-troubleshooting-deployments.adoc
4+
:_mod-docs-content-type: PROCEDURE
5+
[id="rosa-troubleshooting-invalidinstallconfigsubnet-failure-deployment_{context}"]
6+
= Troubleshooting cluster creation with an InvalidInstallConfigSubnet error
7+
8+
If a cluster creation action fails, you might receive the following error messages.
9+
10+
.Example install logs output
11+
[source,terminal]
12+
----
13+
platform.aws.subnets[1]: Invalid value: "subnet-0babad72exxxxxxxx": subnet's CIDR range start 10.69.1x.3x is outside of the specified machine networks
14+
----
15+
16+
.Example {cluster-manager} output
17+
[source,terminal]
18+
----
19+
Provisioning Error Code: OCM3020
20+
Provisioning Error Message: Subnet CIDR ranges are outside of specified machine CIDR.
21+
----
22+
23+
These errors indicate that a subnet's CIDR range start is outside of the specified machine networks.
24+
25+
.Procedure
26+
27+
. Check your subnet configuration.
28+
. Edit your machine CIDR range to include all subnet CIDR ranges.
29+
Generally, your machine CIDR should match your VPC CIDR.
30+
31+
For more information about CIDR ranges, see _CIDR range definitions_ in the _Additional resources_ section.
32+
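The containment check behind this error can be reproduced locally with the Python standard library. This is a minimal sketch under stated assumptions: the `subnets_outside_machine_cidr` helper and the example CIDR values are hypothetical, and the installer's own validation might differ in detail.

```python
import ipaddress
from typing import Iterable, List


def subnets_outside_machine_cidr(
    machine_cidr: str, subnet_cidrs: Iterable[str]
) -> List[str]:
    """Return the subnet CIDRs that are not contained in the machine CIDR.

    Both arguments must use the same IP version, because subnet_of() raises
    TypeError when an IPv4 network is compared with an IPv6 network.
    """
    machine = ipaddress.ip_network(machine_cidr)
    outside = []
    for cidr in subnet_cidrs:
        subnet = ipaddress.ip_network(cidr)
        # subnet_of() is True only if the whole subnet range fits inside machine.
        if not subnet.subnet_of(machine):
            outside.append(cidr)
    return outside
```

If the function returns any subnets, widen the machine CIDR to cover them, which usually means matching it to the VPC CIDR.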
