LoadBalancer Service Provisions in Untagged VPC Subnets #1248

@mtulio

What happened:

PR kubernetes/kubernetes#97431 introduced a breaking change in subnet auto-discovery for Service load balancers.

Prior to this change, the AWS Cloud Controller Manager (CCM) respected user intention by only auto-discovering subnets explicitly tagged for the cluster (kubernetes.io/cluster/<id>:owned|shared) when provisioning load balancers.

Following that change, the controller also auto-discovers subnets in the VPC that carry no cluster tag, excluding only subnets tagged explicitly for another cluster. This is a problem for users who install multiple clusters in the same VPC with many subnets. The change breaks the established behavior and can lead to unintended subnet usage.

As per the PR description, this was an intentional change:

Prior to this change, the AWS cloudprovider would auto discover subnets only with the cluster tags when provisioning NLB/CLB for service resources. We want to modify this behavior to include the subnets without any cluster tags, in addition to the ones previously matched by auto-discovery. After the changes, the auto-discovery will consider all subnets except the ones tagged explicitly for other cluster. If there are multiple subnets per AZ, the ties are broken in the following order
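To make the difference concrete, here is a minimal Python sketch of the two filtering rules as the PR description states them. It is not the controller's code: the `Subnet` class, the function names, and the sample IDs are hypothetical, the tag value (`owned` or `shared`) is ignored for simplicity, and the per-AZ tie-breaking mentioned above is left out because the quoted description does not include it here.

```python
from dataclasses import dataclass, field

CLUSTER_TAG_PREFIX = "kubernetes.io/cluster/"


@dataclass
class Subnet:
    """Hypothetical, simplified stand-in for an EC2 subnet record."""
    subnet_id: str
    zone: str
    tags: dict = field(default_factory=dict)


def cluster_ids(subnet):
    """Return the cluster IDs named by kubernetes.io/cluster/<id> tag keys."""
    return {
        key[len(CLUSTER_TAG_PREFIX):]
        for key in subnet.tags
        if key.startswith(CLUSTER_TAG_PREFIX)
    }


def discover_tagged_only(subnets, cluster_id):
    """Earlier behavior: keep only subnets tagged for this cluster."""
    return [s for s in subnets if cluster_id in cluster_ids(s)]


def discover_including_untagged(subnets, cluster_id):
    """Behavior after the PR: also keep subnets with no cluster tag,
    dropping only subnets tagged solely for other clusters."""
    selected = []
    for s in subnets:
        owners = cluster_ids(s)
        if cluster_id in owners or not owners:
            selected.append(s)
    return selected


# Hypothetical VPC: one subnet per zone, tagged differently.
vpc = [
    Subnet("subnet-a", "us-east-1a", {"kubernetes.io/cluster/mycluster": "owned"}),
    Subnet("subnet-b", "us-east-1b"),  # no cluster tag at all
    Subnet("subnet-c", "us-east-1c", {"kubernetes.io/cluster/other": "owned"}),
]

before = [s.subnet_id for s in discover_tagged_only(vpc, "mycluster")]
after = [s.subnet_id for s in discover_including_untagged(vpc, "mycluster")]
print(before)  # ['subnet-a']
print(after)   # ['subnet-a', 'subnet-b']
```

In this sample, `subnet-b` is the subnet the user never tagged, yet it becomes a candidate under the new rule.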

This breaking change has impacted specific scenarios, such as:

  • Untagged subnets are now being used for LoadBalancer services, violating user intent and potentially causing network issues.
  • Kubernetes distributions, such as OpenShift, that install a cluster in an existing VPC with a subset of subnets are impacted, particularly after a cluster upgrade.

What you expected to happen:

The AWS Cloud Controller Manager should auto-discover and use only subnets explicitly tagged with kubernetes.io/cluster/<id>:owned|shared when provisioning Service load balancers. This ensures the controller respects the user's defined cluster boundaries and does not use unintended subnets.

How to reproduce it (as minimally and precisely as possible):

  1. Start with an existing AWS VPC that has at least two public subnets in distinct zones, or create a new subnet (without cluster tags) in a zone different from the existing subnet's.
  2. Tag only one of these subnets with kubernetes.io/cluster/<cluster_name>:owned.
  3. Deploy an OpenShift cluster into the existing VPC, ensuring it only uses the single, tagged subnet for its installation.
  4. Create a new service of type LoadBalancer using a simple application deployment. For example:
apiVersion: v1
kind: Service
metadata:
  name: my-lb-service
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: my-app
  5. Observe the load balancer's creation. The CCM will provision the load balancer in one of the untagged subnets, instead of the intended, tagged subnet.
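The check in the last step can be sketched as a small Python helper that flags load balancer subnets lacking the cluster tag. This is only an illustration: the function name, subnet IDs, and cluster name are hypothetical, and the inputs are assumed to have been collected separately (for example from `aws ec2 describe-subnets` and the load balancer's description).

```python
CLUSTER_TAG_PREFIX = "kubernetes.io/cluster/"


def untagged_lb_subnets(lb_subnet_ids, subnet_tags, cluster_id):
    """Return the load balancer subnets that lack this cluster's tag key.

    lb_subnet_ids: subnet IDs the load balancer was provisioned in.
    subnet_tags:   mapping of subnet ID -> {tag key: tag value}.
    """
    key = CLUSTER_TAG_PREFIX + cluster_id
    return sorted(
        subnet_id
        for subnet_id in lb_subnet_ids
        if key not in subnet_tags.get(subnet_id, {})
    )


# Hypothetical data mirroring the reproduction steps: one tagged subnet,
# one untagged subnet, and a load balancer attached to both.
subnet_tags = {
    "subnet-tagged": {"kubernetes.io/cluster/mycluster": "owned"},
    "subnet-untagged": {},
}
lb_subnet_ids = ["subnet-tagged", "subnet-untagged"]

offenders = untagged_lb_subnets(lb_subnet_ids, subnet_tags, "mycluster")
print(offenders)  # ['subnet-untagged']
```

An empty result would indicate the load balancer stayed within the tagged subnets; any entry is a subnet the cluster was never meant to use.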

Anything else we need to know?:

This issue is particularly impactful in multi-cluster environments and when deploying a single-subnet cluster within a larger, existing VPC. Ignoring the kubernetes.io/cluster/<id>:owned tag breaks a core assumption about how clusters are deployed in shared VPCs.

Please see the related OpenShift bug for additional context and impact reports:

Other relevant discussions:

Environment:

  • Kubernetes version (use kubectl version): 1.20+ (versions to which kubernetes/kubernetes#97431 was backported)
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools: OpenShift
  • Others:

/kind bug

Metadata

Assignees

No one assigned

    Labels

    kind/bug: Categorizes issue or PR as related to a bug.
    needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
