Merge pull request #94607 from jeana-redhat/4.19-bug-text-cloud-compute

jeana-redhat · web-flow · commit 5d16bc59c305 · 2025-06-11T16:38:00.000-04:00
Batches 4.19 Cloud Compute bugs
diff --git a/release_notes/ocp-4-19-release-notes.adoc b/release_notes/ocp-4-19-release-notes.adoc
@@ -22,7 +22,7 @@ You must use {op-system} machines for the control plane and for the compute mach
 //Removed the note per https://issues.redhat.com/browse/GRPA-3517
 //Removed paragraph about the RHEL package because mode workers are removed from 4.19, per Scott Dodson
 //Even-numbered release lifecycle verbiage (Comment in for even-numbered releases)
-//// 
+////
 Starting from {product-title} 4.14, the Extended Update Support (EUS) phase for even-numbered releases increases the total available lifecycle to 24 months on all supported architectures, including `x86_64`, 64-bit ARM (`aarch64`), {ibm-power-name} (`ppc64le`), and {ibm-z-name} (`s390x`) architectures. Beyond this, Red{nbsp}Hat also offers a 12-month additional EUS add-on, denoted as _Additional EUS Term 2_, that extends the total available lifecycle from 24 months to 36 months. The Additional EUS Term 2 is available on all architecture variants of {product-title}. For more information about support for all versions, see the link:https://access.redhat.com/support/policy/updates/openshift[Red Hat {product-title} Life Cycle Policy].
 ////
 
@@ -1219,6 +1219,132 @@ For more information about the unsupported, community-maintained, version of the
 [id="ocp-release-note-cloud-compute-bug-fixes_{context}"]
 ==== Cloud Compute
 
+* When upgrading {gcp-short} clusters that use a boot disk that is not compatible with UEFI, you cannot enable Shielded VM support.
+Previously, this prevented the creation of new compute machines.
+With this release, disks with known UEFI incompatibility have Shielded VM support disabled.
+This primarily affects customers upgrading from {product-title} version 4.12 to 4.13 using the {gcp-short} marketplace images.
+(link:https://issues.redhat.com/browse/OCPBUGS-17079[OCPBUGS-17079])
+
+* Previously, VMs in a cluster that ran on {azure-short} failed because the attached network interface controller (NIC) was in a `ProvisioningFailed` state.
+With this release, the Machine API controller checks the provisioning status of a NIC and refreshes the VMs on a regular basis to prevent this issue.
+(link:https://issues.redhat.com/browse/OCPBUGS-31515[OCPBUGS-31515])
+
+* Previously, in larger clusters that had other subsystems using certificate signing requests (CSRs), the CSR approver counted unrelated, unapproved CSRs towards its total and prevented further approvals.
+With this release, the CSR approver uses a `signerName` property as a filter and only includes CSRs that it can approve.
+As a result, the CSR approver only prevents new approvals when there are a large number of unapproved CSRs for the relevant `signerName` values.
+(link:https://issues.redhat.com/browse/OCPBUGS-36404[OCPBUGS-36404])
+
+* Previously, the Machine API controller read only the zone number to populate machine zone information.
+For machines in {azure-short} regions that only support availability sets, the set number represents the zone, so the Machine API controller did not populate their zone information.
+With this release, the Machine API controller references the {azure-short} fault domain property.
+This property works for availability sets and availability zones, so the controller correctly reads the fault domain in each case and machines always report a zone.
+(link:https://issues.redhat.com/browse/OCPBUGS-38570[OCPBUGS-38570])
+
+* Previously, increased granularity in {gcp-short} zone API error messages caused the machine controller to mistakenly mark some machines with invalid configurations as valid with a temporary cloud error.
+This behavior prevented invalid machines from transitioning to a failed state.
+With this release, the machine controller handles the more granular error messages correctly so that machines with an invalid zone or project ID correctly move to a failed state.
+(link:https://issues.redhat.com/browse/OCPBUGS-43531[OCPBUGS-43531])
+
+* Previously, some permissions required for linked actions were missing.
+Linked actions create the subresources necessary for other {azure-short} resources that the cloud controller manager and {product-title} require.
+With this release, the cloud controller manager for {azure-short} has the following permissions for linked actions:
++
+--
+** `Microsoft.Network/applicationGateways/backendAddressPools/join/action`
+** `Microsoft.Network/applicationSecurityGroups/joinIpConfiguration/action`
+** `Microsoft.Network/applicationSecurityGroups/joinNetworkSecurityRule/action`
+** `Microsoft.Network/ddosProtectionPlans/join/action`
+** `Microsoft.Network/gatewayLoadBalancerAliases/join/action`
+** `Microsoft.Network/loadBalancers/backendAddressPools/join/action`
+** `Microsoft.Network/loadBalancers/frontendIPConfigurations/join/action`
+** `Microsoft.Network/loadBalancers/inboundNatRules/join/action`
+** `Microsoft.Network/networkInterfaces/join/action`
+** `Microsoft.Network/networkSecurityGroups/join/action`
+** `Microsoft.Network/publicIPAddresses/join/action`
+** `Microsoft.Network/publicIPPrefixes/join/action`
+** `Microsoft.Network/virtualNetworks/subnets/join/action`
+--
++
+(link:https://issues.redhat.com/browse/OCPBUGS-44126[OCPBUGS-44126])
+
+* Previously, some permissions required for linked actions were missing.
+Linked actions create the subresources necessary for other {azure-short} resources that the Machine API and {product-title} require.
+With this release, the Machine API provider for {azure-short} has the following permissions for linked actions:
++
+--
+** `Microsoft.Compute/disks/beginGetAccess/action`
+** `Microsoft.KeyVault/vaults/deploy/action`
+** `Microsoft.ManagedIdentity/userAssignedIdentities/assign/action`
+** `Microsoft.Network/applicationGateways/backendAddressPools/join/action`
+** `Microsoft.Network/applicationSecurityGroups/joinIpConfiguration/action`
+** `Microsoft.Network/applicationSecurityGroups/joinNetworkSecurityRule/action`
+** `Microsoft.Network/ddosProtectionPlans/join/action`
+** `Microsoft.Network/gatewayLoadBalancerAliases/join/action`
+** `Microsoft.Network/loadBalancers/backendAddressPools/join/action`
+** `Microsoft.Network/loadBalancers/frontendIPConfigurations/join/action`
+** `Microsoft.Network/loadBalancers/inboundNatPools/join/action`
+** `Microsoft.Network/loadBalancers/inboundNatRules/join/action`
+** `Microsoft.Network/networkInterfaces/join/action`
+** `Microsoft.Network/networkSecurityGroups/join/action`
+** `Microsoft.Network/publicIPAddresses/join/action`
+** `Microsoft.Network/publicIPPrefixes/join/action`
+** `Microsoft.Network/virtualNetworks/subnets/join/action`
+--
++
+(link:https://issues.redhat.com/browse/OCPBUGS-44130[OCPBUGS-44130])
+
+* Previously, installing an {aws-short} cluster failed in certain environments on existing subnets when the `publicIp` parameter in the compute machine set CR was set to `false`.
+With this release, a fix ensures that a configuration value set for `publicIp` no longer causes issues when the installation program provisions machines for your {aws-short} cluster in certain environments.
+(link:https://issues.redhat.com/browse/OCPBUGS-44373[OCPBUGS-44373])
+
+* Previously, {gcp-short} clusters that used non-UEFI disks failed to load.
+This release adds a check to ensure that disks are UEFI-compatible before enabling features that require UEFI, such as secure boot.
+This change adds `compute.images.get` and `compute.images.getFromFamily` permission requirements.
+As a result, you can use non-UEFI disks if you do not need these features.
+(link:https://issues.redhat.com/browse/OCPBUGS-44671[OCPBUGS-44671])
+
+* Previously, when the {aws-short} `DHCPOptionSet` parameter was configured to use a custom domain name that contains a trailing period (`.`), {product-title} installation failed.
+With this release, the logic that extracts the hostnames of EC2 instances and turns them into kubelet node names trims trailing periods so that the resulting Kubernetes object name is valid.
+Trailing periods in this parameter no longer cause installation to fail. (link:https://issues.redhat.com/browse/OCPBUGS-45306[OCPBUGS-45306])
+
+* Previously, the number of {azure-short} availability set fault domains used a fixed value of `2`.
+This setting works in most {azure-short} regions because fault domain counts are typically at least 2.
+However, this setting failed in the `centraluseuap` and `eastusstg` regions.
+With this release, the number of availability set fault domains in a region is set dynamically.
+(link:https://issues.redhat.com/browse/OCPBUGS-45663[OCPBUGS-45663])
+
+* Previously, the {azure-short} cloud controller manager panicked when there was a temporary API server disconnection.
+With this release, the {azure-short} cloud controller manager correctly recovers from temporary disconnection.
+(link:https://issues.redhat.com/browse/OCPBUGS-45859[OCPBUGS-45859])
+
+* Previously, some services became stuck in a pending state due to incorrect or missing annotations.
+With this release, validation added to the {azure-short} `service.beta.kubernetes.io/azure-load-balancer-tcp-idle-timeout` and {gcp-short} `cloud.google.com/network-tier` annotations resolves the issue.
+(link:https://issues.redhat.com/browse/OCPBUGS-48481[OCPBUGS-48481])
+
+* Previously, the method used to fetch the provider ID from {aws-short} could fail to provide this value to the kubelet when needed.
+As a result, machines could sometimes get stuck in different states and fail to complete initialization.
+With this release, the provider ID is consistently set when the kubelet starts up.
+(link:https://issues.redhat.com/browse/OCPBUGS-50905[OCPBUGS-50905])
+
+* Previously, an incorrect endpoint in the {azure-short} cloud controller manager caused installations on {azure-full} Government Cloud to fail.
+The issue is resolved in this release.
+(link:https://issues.redhat.com/browse/OCPBUGS-50969[OCPBUGS-50969])
+
+* Previously, the Machine API sometimes detected an unhealthy control plane node during cluster creation on {ibm-cloud-title} and attempted to replace the node.
+This effectively destroyed the cluster.
+With this release, the Machine API only attempts to replace unhealthy compute nodes during cluster creation and does not attempt to replace unhealthy control plane nodes.
+(link:https://issues.redhat.com/browse/OCPBUGS-51864[OCPBUGS-51864])
+
+* Previously, {azure-short} spot machines that were evicted before their node became ready could get stuck in the `provisioned` state.
+With this release, {azure-short} spot instances use a delete-eviction policy.
+This policy ensures that the machines correctly move to the `failed` state upon preemption.
+(link:https://issues.redhat.com/browse/OCPBUGS-54617[OCPBUGS-54617])
+
+* Previously, a bug fix altered the availability set configuration by changing the fault domain count to use the maximum available value instead of a fixed value of `2`.
+This inadvertently caused scaling issues for compute machine sets created before the bug fix, as the controller attempted to change immutable availability sets.
+With this release, availability sets are no longer modified after creation, allowing affected compute machine sets to scale properly.
+(link:https://issues.redhat.com/browse/OCPBUGS-56653[OCPBUGS-56653])
+
 [discrete]
 [id="ocp-release-note-cloud-cred-operator-bug-fixes_{context}"]
 ==== Cloud Credential Operator
@@ -1247,13 +1373,13 @@ For more information about the unsupported, community-maintained, version of the
 [id="ocp-release-note-image-registry-bug-fixes_{context}"]
 ==== Registry
 
-* Previously, image importing from blocked registries would fail if those registries were configured with `NeverContactSource`, even when mirror registries were set up. With this update, image importing is no longer blocked when a registry has mirrors configured. This ensures that image imports succeed even if the original source was set to `NeverContactSource` in the `ImageDigestMirrorSet` or `ImageTagMirrorSet` resources. (link:https://issues.redhat.com/browse/OCPBUGS-44432[*OCPBUGS-44432*])
+* Previously, image importing from blocked registries would fail if those registries were configured with `NeverContactSource`, even when mirror registries were set up. With this update, image importing is no longer blocked when a registry has mirrors configured. This ensures that image imports succeed even if the original source was set to `NeverContactSource` in the `ImageDigestMirrorSet` or `ImageTagMirrorSet` resources. (link:https://issues.redhat.com/browse/OCPBUGS-44432[OCPBUGS-44432])
 
 [discrete]
 [id="ocp-release-note-installer-bug-fixes_{context}"]
 ==== Installer
 
-* Previously, if you attempted to install an {aws-first} cluster with minimum privileges and you did not specify an instance type in the `install-config.yaml` file, installation of the cluster failed. This issue happened because the installation program could not find supported instance types that the cluster uses in availability zones. For example, the `m6i.xlarge` default instance type was unavailable in `ap-southeast-4` and `eu-south-2` availability zones. With this release, the `openshift-install` program now requires the `ec2:DescribeInstanceTypeOfferings` {aws-short} permission to prevent the installation of the cluster from failing in situations where `m6i.xlarge` or another supported instance type is unavailable in a supported availability zone. (link:https://issues.redhat.com/browse/OCPBUGS-46596[*OCPBUGS-46596*])
+* Previously, if you attempted to install an {aws-first} cluster with minimum privileges and you did not specify an instance type in the `install-config.yaml` file, installation of the cluster failed. This issue happened because the installation program could not find supported instance types that the cluster could use in supported availability zones. For example, the `m6i.xlarge` default instance type was unavailable in `ap-southeast-4` and `eu-south-2` availability zones. With this release, the `openshift-install` program now requires the `ec2:DescribeInstanceTypeOfferings` {aws-short} permission to prevent the installation of the cluster from failing in situations where `m6i.xlarge` or another supported instance type is unavailable in a supported availability zone. (link:https://issues.redhat.com/browse/OCPBUGS-46596[OCPBUGS-46596])
 
 [discrete]
 [id="ocp-release-note-insights-operator-bug-fixes_{context}"]