failed to get pod ips while all valkey pods are healthy #286

@davidshen84

Hi,

I installed the operator using Helm chart v0.0.59-chart, and the Valkey image version is 8.0.2.

I then deployed this Valkey resource:

apiVersion: hyperspike.io/v1
kind: Valkey
metadata:
  name: valkey
  namespace: valkey-operator-system
spec:
  image: ghcr.io/hyperspike/valkey:8.0.2
  anonymousAuth: true
  prometheus: false
  volumePermissions: true
  nodes: 1
  replicas: 1
  resources:
    limits:
      cpu: 1000m
      memory: 4Gi
    requests:
      cpu: 200m
      memory: 128Mi

I constantly get the following error in the operator pod:

2025-08-08T00:51:56Z    ERROR    failed to get pod ips    {"controller": "valkey", "controllerGroup": "hyperspike.io", "controllerKind": "Valkey", "Valkey": {"name":"valkey","namespace":"valkey-operator-system"}, "namespace": "valkey-operator-system", "name": "valkey", "reconcileID": "ef86a3ac-bdf4-4363-9769-cc10585215a2", "error": "timeout waiting for pods"}
2025-08-08T00:52:28Z    ERROR    failed to get pod ips    {"controller": "valkey", "controllerGroup": "hyperspike.io", "controllerKind": "Valkey", "Valkey": {"name":"valkey","namespace":"valkey-operator-system"}, "namespace": "valkey-operator-system", "name": "valkey", "reconcileID": "ed4bd6d0-ed8a-47c3-9d5b-06a434137b62", "error": "timeout waiting for pods"}
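
To compare these entries more easily, here is a minimal Python sketch that splits each line into its timestamp, message, and JSON fields. It is not part of the operator; the `parse_operator_log` helper is a hypothetical name, and the JSON payloads below are shortened copies of the two lines above.

```python
import json
from datetime import datetime


def parse_operator_log(line):
    """Split one operator log line into timestamp, level, message, and JSON fields."""
    brace = line.index("{")        # the structured fields start at the first "{"
    head = line[:brace].split()    # timestamp, level, then the message words
    fields = json.loads(line[brace:])
    return {
        "time": datetime.strptime(head[0], "%Y-%m-%dT%H:%M:%SZ"),
        "level": head[1],
        "message": " ".join(head[2:]),
        "error": fields.get("error"),
        "reconcile_id": fields.get("reconcileID"),
    }


# Shortened copies of the two log lines above (some JSON fields omitted).
lines = [
    '2025-08-08T00:51:56Z    ERROR    failed to get pod ips    '
    '{"controller": "valkey", "reconcileID": "ef86a3ac-bdf4-4363-9769-cc10585215a2", '
    '"error": "timeout waiting for pods"}',
    '2025-08-08T00:52:28Z    ERROR    failed to get pod ips    '
    '{"controller": "valkey", "reconcileID": "ed4bd6d0-ed8a-47c3-9d5b-06a434137b62", '
    '"error": "timeout waiting for pods"}',
]

entries = [parse_operator_log(line) for line in lines]
gap = (entries[1]["time"] - entries[0]["time"]).total_seconds()
print(entries[0]["error"], "| seconds between reconciles:", gap)
```

Both entries carry the same "timeout waiting for pods" error under different reconcile IDs, 32 seconds apart.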

Inside the Valkey pod, nslookup resolves the headless service to the pod IPs:

nslookup valkey-headless
Server:         10.43.0.10
Address:        10.43.0.10:53

** server can't find valkey-headless.cluster.local: NXDOMAIN

** server can't find valkey-headless.svc.cluster.local: NXDOMAIN


** server can't find valkey-headless.cluster.local: NXDOMAIN

** server can't find valkey-headless.svc.cluster.local: NXDOMAIN

Name:   valkey-headless.valkey-operator-system.svc.cluster.local
Address: 10.42.0.61
Name:   valkey-headless.valkey-operator-system.svc.cluster.local
Address: 10.42.0.60
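
I think the NXDOMAIN lines come from the resolver trying each DNS search domain in turn for the short name, so they may be harmless. A minimal Python sketch of that expansion, assuming the default Kubernetes search list for a pod (`<namespace>.svc.cluster.local`, `svc.cluster.local`, `cluster.local`); `search_candidates` is a hypothetical helper, and the real list in the pod's /etc/resolv.conf may contain extra entries from the node:

```python
def search_candidates(name, namespace, cluster_domain="cluster.local"):
    """Return the fully qualified names a pod's resolver would try for a short name.

    Assumes the default Kubernetes search list; check /etc/resolv.conf
    inside the pod for the actual one.
    """
    search = [
        f"{namespace}.svc.{cluster_domain}",
        f"svc.{cluster_domain}",
        cluster_domain,
    ]
    return [f"{name}.{domain}" for domain in search]


for candidate in search_candidates("valkey-headless", "valkey-operator-system"):
    print(candidate)
```

Only the first candidate, the one that includes the namespace, matches a name that returned addresses in the output above; the other two match the NXDOMAIN lines.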

On my k8s cluster, which runs on a k3s node, I saw the following errors in the k3s logs, but I am not sure whether they are related:

Aug 08 10:58:54 xps9560 k3s[292083]: E0808 10:58:54.717287  292083 repairip.go:523] "Unhandled Error" err="the IPAddress: 10.43.172.139 for Service valkey/valkey-operator-system has a wrong reference &v1.ParentReference{Group:\"\", Resource:\"services\", Namespace:\"valkey-operator-system\", Name:\"valkey\"}; cleaning up"
Aug 08 10:58:54 xps9560 k3s[292083]: I0808 10:58:54.728595  292083 cidrallocator.go:277] updated ClusterIP allocator for Service CIDR 10.43.0.0/16
Aug 08 10:58:54 xps9560 k3s[292083]: I0808 10:58:54.728701  292083 cidrallocator.go:277] updated ClusterIP allocator for Service CIDR 2001:cafe:43::/112
Aug 08 10:58:54 xps9560 k3s[292083]: I0808 10:58:54.738101  292083 ipallocator.go:374] error releasing ip 10.43.172.139 : ipaddresses.networking.k8s.io "10.43.172.139" not found
