Description
What happened?
Hi,
I've been facing issues with the way variables like SE_DRAIN_AFTER_SESSION_COUNT are handled in the selenium-grid source code, versus the way we manage version upgrades in our Kubernetes clusters.
The problem affects our automation: each time we push a new version of "selenium-grid" to our Kubernetes cluster, the corresponding Helm Releases fail on the "patch" operation because they cannot match the environment variable "SE_DRAIN_AFTER_SESSION_COUNT" (in our case).
It looks like this parameter is set to "0" by default, and we want to override it with "30". This is how we do it in the code:

The problem is with this setting:
- name: SE_DRAIN_AFTER_SESSION_COUNT
  value: "30"
Although this should work, it doesn't. When the Helm Release reconciles a newly detected version, it tries to patch the existing resources, but it cannot match that env variable because the Deployment ends up with two definitions of it (with the values "0" and "30" in our case), so the patch fails (see the log output below).
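A minimal sketch of why the patch is ambiguous, under a simplified model: Kubernetes declares `name` as the patch merge key for a container's `env` list, so a key-based merge has to identify each entry by its name. The env list below is an assumed, shortened version of what the chart renders (chart-managed entries first, `extraEnvironmentVariables` appended). `duplicate_env_names` and `index_by_merge_key` are hypothetical helpers for illustration, not Kubernetes' actual strategic-merge-patch code.

```python
from collections import Counter

# Simplified model (assumption) of the Chrome node's rendered container env:
# chart-managed entries first, then the user's extraEnvironmentVariables appended.
chart_env = [
    {"name": "SE_NODE_STEREOTYPE_EXTRA", "value": ""},
    {"name": "SE_DRAIN_AFTER_SESSION_COUNT", "value": "0"},  # default from the chart helper
    {"name": "SE_NODE_BROWSER_VERSION", "value": ""},
]
extra_env = [
    {"name": "SCREEN_WIDTH", "value": "1920"},
    {"name": "SE_DRAIN_AFTER_SESSION_COUNT", "value": "30"},  # our override
]
rendered_env = chart_env + extra_env


def duplicate_env_names(env):
    """Return env var names that occur more than once, in first-seen order."""
    counts = Counter(item["name"] for item in env)
    return [name for name, count in counts.items() if count > 1]


def index_by_merge_key(env, key="name"):
    """Index list items by their merge key, as a key-based list merge has to.

    When two items share a key, the later one replaces the earlier one, so the
    keyed view no longer maps one-to-one onto the list.
    """
    return {item[key]: item for item in env}


print(duplicate_env_names(rendered_env))  # ['SE_DRAIN_AFTER_SESSION_COUNT']
print(len(rendered_env), len(index_by_merge_key(rendered_env)))  # 5 4
```

Running the same duplicate check over the `env` entries of a `helm template` render before upgrading would flag the conflict before Flux tries to patch the live Deployment.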

Therefore, each time we upgrade selenium-grid, we have to manually delete all the underlying Selenium Deployments from all our clusters and then resume the Helm Release manually, so that it recreates all the resources from scratch instead of patching the existing Deployments.
This is cumbersome and causes downtime.
We have been investigating the logic of:
https://github.com/SeleniumHQ/docker-selenium/blob/selenium-grid-0.45.1/charts/selenium-grid/templates/_helpers.tpl#L381
and it turns out that there's no way to disable it (so that we could safely use our own definition with the value "30"); the value is tied to another setting, `nodeMaxSessions`:
- name: SE_DRAIN_AFTER_SESSION_COUNT
  value: {{ and (eq (include "seleniumGrid.useKEDA" $) "true") (eq .Values.autoscaling.scalingType "job") | ternary $nodeMaxSessions 0 | quote }}
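The helper expression above can be read as the following Python function. This is a sketch for illustration only: `drain_after_session_count` and its parameters are hypothetical names, and `use_keda` simply stands for whatever `seleniumGrid.useKEDA` evaluates to. It shows that the rendered value is either `nodeMaxSessions` or `0`, and that there is no input for a user-chosen value such as "30".

```python
def drain_after_session_count(use_keda: bool, scaling_type: str, node_max_sessions: int) -> str:
    """Python mirror of the chart expression
    and (eq useKEDA "true") (eq scalingType "job") | ternary $nodeMaxSessions 0 | quote

    Note that there is no parameter for a user-supplied value.
    """
    # Sprig's ternary receives the piped boolean as its last argument and
    # returns the first value when it is true, the second when it is false.
    value = node_max_sessions if (use_keda and scaling_type == "job") else 0
    # `quote` makes the rendered YAML value a string.
    return str(value)


# Our release uses scalingType: deployment, so the helper yields "0"
# even if nodeMaxSessions were 30.
print(drain_after_session_count(use_keda=True, scaling_type="deployment", node_max_sessions=30))
# Only the KEDA + job combination passes nodeMaxSessions through.
print(drain_after_session_count(use_keda=True, scaling_type="job", node_max_sessions=30))
```

Getting "30" out of this expression would require both switching to job-based scaling and setting `nodeMaxSessions` to 30, which is the coupling described here.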
Since we don't want to change `nodeMaxSessions` just to get the value "30", I'm wondering whether this behavior could be fixed by exposing an option that defines only the value of "SE_DRAIN_AFTER_SESSION_COUNT", so we don't need to redefine it the way we do now.
Another option would be a setting that disables this variable so it is not set by default at all; we could then define it on our side without causing conflicts with the Helm "patch" operation.
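The two options could behave roughly as sketched below. This is a hypothetical illustration written in Python for readability, not a Helm template and not an existing chart feature: `render_node_env`, `override` and `emit_default` are made-up names standing in for whatever values key the chart might expose.

```python
DRAIN_VAR = "SE_DRAIN_AFTER_SESSION_COUNT"


def render_node_env(extra_env, computed_default="0", override=None, emit_default=True):
    """Hypothetical rendering of a node container's env list.

    override:     option 1, a dedicated value that replaces the computed default.
    emit_default: option 2, when False the chart does not emit the variable,
                  leaving the user's own extraEnvironmentVariables entry as the only one.
    """
    env = []
    if emit_default:
        value = override if override is not None else computed_default
        env.append({"name": DRAIN_VAR, "value": value})
    # User-defined variables are appended after the chart-managed ones.
    return env + list(extra_env)


ours = [{"name": DRAIN_VAR, "value": "30"}]

# Current behavior: chart default plus our entry, i.e. a duplicate name.
print(render_node_env(ours))
# Option 1: set the value through the dedicated option, no extra entry needed.
print(render_node_env([], override="30"))
# Option 2: disable the chart default and keep our own entry.
print(render_node_env(ours, emit_default=False))
```

Either way the rendered list keeps a single `SE_DRAIN_AFTER_SESSION_COUNT` entry, which is what the merge-by-name patch needs.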
Command used to start Selenium Grid with Docker (or Kubernetes)
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: comp-tests-selenium
spec:
  releaseName: comp-tests-selenium
  chart:
    spec:
      chart: selenium-grid
      sourceRef:
        kind: HelmRepository
        name: selenium-grid
      version: "0.45.1"
  interval: 10m
  timeout: 9m30s
  install:
    remediation:
      retries: 3
  # https://github.com/SeleniumHQ/docker-selenium/blob/trunk/charts/selenium-grid/values.yaml
  values:
    global:
      seleniumGrid:
        imagePullSecret: artifactory
        kubectlImage: docker.company.com/bitnami/kubectl:1.31
        imageRegistry: docker.company.com/selenium
    isolateComponents: false
    chromeNode:
      scaledObjectOptions:
        scaleTargetRef:
          name: selenium-chrome-node
      securityContext:
        allowPrivilegeEscalation: false
        runAsNonRoot: true
        capabilities:
          drop: [ "ALL" ]
        seccompProfile:
          type: RuntimeDefault
      imageName: node-chrome
      dshmVolumeSizeLimit: 1.5Gi
      replicas: 2
      resources:
        limits:
          cpu: 2 #by default from helm charts defined to 1
          memory: 1.5Gi
        requests:
          memory: 1Gi
          cpu: 1
      startupProbe:
        httpGet:
          path: /status
          port: 5555
        failureThreshold: 120
        periodSeconds: 5
      terminationGracePeriodSeconds: 90
      # Allow pod correctly shutdown
      deregisterLifecycle:
        preStop:
          exec:
            command: [ "bash", "-c", "/opt/bin/nodePreStop.sh" ]
      extraEnvironmentVariables: # Custom environment variables for chromeNode
        - name: SCREEN_WIDTH
          value: "1920"
        - name: SCREEN_HEIGHT
          value: "1080"
        - name: SCREEN_DEPTH
          value: "24"
        - name: SCREEN_DPI
          value: "74"
        - name: SE_DRAIN_AFTER_SESSION_COUNT
          value: "30"
        - name: SE_NODE_SESSION_TIMEOUT # The Node will automatically kill a session that has not had any activity in the last X seconds. This will release the slot for other tests
          value: "60"
        - name: SE_NODE_GRID_URL
          value: "http://comp-tests-selenium-selenium-hub.comp-tests-selenium${namespace_suffix}.svc:4444" #hrName-selenium-hub.namespace
        - name: SE_EVENT_BUS_HOST
          value: "comp-tests-selenium-selenium-hub.comp-tests-selenium${namespace_suffix}" #hrName-selenium-hub.namespace
      nodeSelector:
        qa: "true"
      tolerations:
        - key: qa
          value: "true"
          effect: NoSchedule
    firefoxNode:
      enabled: false
    edgeNode:
      enabled: false
    hub:
      securityContext:
        allowPrivilegeEscalation: false
        runAsNonRoot: true
        capabilities:
          drop: [ "ALL" ]
        seccompProfile:
          type: RuntimeDefault
      # affinity: consider podAntiAffinity with hub and nodes, from newer versions chart provides this possibility
      imageName: hub
      serviceType: ClusterIP
      resources:
        limits:
          memory: 2Gi
        requests:
          memory: 1Gi
          cpu: 0.2
      annotations:
        karpenter.sh/do-not-disrupt: "true"
      extraEnvironmentVariables: # Custom environment variables for hub
        - name: SCREEN_WIDTH
          value: "1920"
        - name: SCREEN_HEIGHT
          value: "1080"
        - name: SCREEN_DEPTH
          value: "24"
        - name: SCREEN_DPI
          value: "74"
        - name: SE_SESSION_REQUEST_TIMEOUT # A new incoming session request is added to the queue. Requests sitting in the queue for longer than the configured time will timeout.
          value: "180"
      nodeSelector:
        qa: "true"
      tolerations:
        - key: qa
          value: "true"
          effect: NoSchedule
    ingress:
      className: private-nginx
      annotations:
        nginx.ingress.kubernetes.io/service-upstream: "true"
        nginx.ingress.kubernetes.io/backend-protocol: HTTP
        external-dns.alpha.kubernetes.io/private: "true"
        cert-manager.io/cluster-issuer: letsencrypt
      hostname: "comp-tests-selenium${namespace_suffix}.tools.${cluster_region}.${cluster_domain}"
      tls:
        - secretName: comp-tests-selenium-private-ingress-tls-selenium
          hosts:
            - "comp-tests-selenium${namespace_suffix}.tools.${cluster_region}.${cluster_domain}"
    autoscaling:
      patchObjectFinalizers:
        enabled: true #https://github.com/SeleniumHQ/docker-selenium/issues/2196
      enabled: false
      enableWithExistingKEDA: true
      scalingType: deployment
      scaledOptions:
        minReplicaCount: 0
        maxReplicaCount: 5
        pollingInterval: 10
      scaledObjectOptions:
        # triggers: #consider this section when connection to hub is not properly set
        advanced:
          horizontalPodAutoscalerConfig:
            behavior:
              scaleUp:
                stabilizationWindowSeconds: 30
                policies:
                  - type: Pods
                    value: 4
                    periodSeconds: 10
              scaleDown:
                stabilizationWindowSeconds: 360
                policies:
                  - type: Pods
                    value: 1
                    periodSeconds: 150
Relevant log output
Status:
  Conditions:
    Last Transition Time: 2025-07-17T08:39:26Z
    Message: Failed to upgrade after 1 attempt(s)
    Observed Generation: 54
    Reason: RetriesExceeded
    Status: True
    Type: Stalled
    Last Transition Time: 2025-07-17T08:07:07Z
    Message: Helm upgrade failed for release comp-tests-selenium/comp-tests-selenium with chart selenium-grid@0.45.1: cannot patch "comp-tests-selenium-selenium-node-chrome" with kind Deployment: The order in patch list:
[map[name:SE_NODE_STEREOTYPE_EXTRA value:] map[name:SE_DRAIN_AFTER_SESSION_COUNT value:0] map[name:SE_DRAIN_AFTER_SESSION_COUNT value:30] map[name:SE_NODE_BROWSER_VERSION value:] map[name:SE_NODE_PLATFORM_NAME value:] map[name:SE_OTEL_RESOURCE_ATTRIBUTES value:app.kubernetes.io/component=selenium-grid-4.34.0-20250707,app.kubernetes.io/instance=comp-tests-selenium,app.kubernetes.io/managed-by=helm,app.kubernetes.io/version=4.34.0-20250707,helm.sh/chart=selenium-grid-0.45.1]]
doesn't match $setElementOrder list:
[map[name:KUBERNETES_NODE_HOST_IP] map[name:SE_NODE_MAX_SESSIONS] map[name:SE_NODE_ENABLE_MANAGED_DOWNLOADS] map[name:SE_NODE_STEREOTYPE_EXTRA] map[name:SE_DRAIN_AFTER_SESSION_COUNT] map[name:SE_NODE_BROWSER_NAME] map[name:SE_NODE_BROWSER_VERSION] map[name:SE_NODE_PLATFORM_NAME] map[name:SE_NODE_CONTAINER_NAME] map[name:SE_OTEL_SERVICE_NAME] map[name:SE_OTEL_RESOURCE_ATTRIBUTES] map[name:SE_NODE_HOST] map[name:SE_NODE_PORT] map[name:SE_NODE_REGISTER_PERIOD] map[name:SE_NODE_REGISTER_CYCLE] map[name:SCREEN_WIDTH] map[name:SCREEN_HEIGHT] map[name:SCREEN_DEPTH] map[name:SCREEN_DPI] map[name:SE_DRAIN_AFTER_SESSION_COUNT] map[name:SE_NODE_SESSION_TIMEOUT] map[name:SE_NODE_GRID_URL] map[name:SE_EVENT_BUS_HOST]]
    Observed Generation: 54
    Reason: UpgradeFailed
    Status: False
    Type: Ready
Operating System
Kubernetes EKS
Docker Selenium version (image tag)
4.34.0-20250707
Selenium Grid chart version (chart version)
0.45.1