Describe the Bug
Attaching a V2 volume (v2 data engine in interrupt mode; pool mode works) never finishes (disks are native NVMe).
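While stuck, the volume stays in the attaching state. One way to confirm this (a sketch, assuming the standard Longhorn CRDs in the longhorn-system namespace):

# check the stuck volume's state; it stays at "attaching" instead of reaching "attached"
kubectl -n longhorn-system get volumes.longhorn.io v2test36 -o jsonpath='{.status.state}'

The instance-manager log for the failing attach: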
[longhorn-instance-manager] time="2025-09-18T13:57:29.642404886Z" level=info msg="Creating instance" func="instance.(*Server).InstanceCreate" file="instance.go:116" dataEngine=DATA_ENGINE_V2 name=v2test36-r-a9f41bfc type=replica upgradeRequired=false
[2025-09-18 13:57:29.662629] bdev.c:8723:bdev_open_ext: *NOTICE*: Currently unable to find bdev with name: d83833ab97eacecdeb0a188f234799cfn1/v2test36-r-a9f41bfc
[longhorn-instance-manager] time="2025-09-18T13:57:29.674625164Z" level=info msg="Replica created a new head lvol" func="log.(*SafeLogger).Info" file="log.go:66" lvsName=d83833ab97eacecdeb0a188f234799cfn1 lvsUUID=fb68f02b-b1f2-4ac9-8b35-18621a8e7f93 replicaName=v2test36-r-a9f41bfc
[2025-09-18 13:57:29.718621] tcp.c: 759:nvmf_tcp_create: *NOTICE*: *** TCP Transport Init ***
[2025-09-18 13:57:29.742770] tcp.c:1103:nvmf_tcp_listen: *NOTICE*: *** NVMe/TCP Target Listening on 10.33.200.8 port 20001 ***
[longhorn-instance-manager] time="2025-09-18T13:57:29.751752709Z" level=info msg="Created replica" func="log.(*SafeLogger).Info" file="log.go:66" lvsName=d83833ab97eacecdeb0a188f234799cfn1 lvsUUID=fb68f02b-b1f2-4ac9-8b35-18621a8e7f93 replicaName=v2test36-r-a9f41bfc
[longhorn-instance-manager] time="2025-09-18T13:57:30.83629071Z" level=info msg="Creating instance" func="instance.(*Server).InstanceCreate" file="instance.go:116" dataEngine=DATA_ENGINE_V2 name=v2test36-e-0 type=engine upgradeRequired=false
[longhorn-instance-manager] time="2025-09-18T13:57:30.837017197Z" level=info msg="Creating engine" func="spdk.(*Engine).Create" file="engine.go:203" engineName=v2test36-e-0 frontend=spdk-tcp-blockdev initiatorAddress=10.33.200.8 portCount=1 replicaAddressMap="map[v2test36-r-23f3f887:10.33.200.4:20001 v2test36-r-a28dabf0:10.33.200.3:20001 v2test36-r-a9f41bfc:10.33.200.8:20001]" salvageRequested=false targetAddress=10.33.200.8 volumeName=v2test36
[longhorn-instance-manager] time="2025-09-18T13:57:30.840589422Z" level=info msg="Creating both initiator and target instances" func="log.(*SafeLogger).Info" file="log.go:66" engineName=v2test36-e-0 frontend=spdk-tcp-blockdev volumeName=v2test36
[2025-09-18 13:57:30.842613] bdev.c:8723:bdev_open_ext: *NOTICE*: Currently unable to find bdev with name: v2test36-e-0
[2025-09-18 13:57:30.850597] bdev_nvme.c:7088:spdk_bdev_nvme_delete: *ERROR*: Failed to find NVMe bdev controller
[2025-09-18 13:57:30.858604] bdev_nvme.c:6762:spdk_bdev_nvme_create: *NOTICE*: Updating global NVMe transport type (g_nvme_trtype) from PCIe to TCP (base-name: v2test36-r-a9f41bfc)
[2025-09-18 13:57:30.917166] nvme_transport.c: 580:nvme_qpair_connect_completion_cb: *NOTICE*: NVMe qpair 0x3522e00 connected successfully.
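The lines above show the engine connecting the local replica over NVMe/TCP but never progressing to the RAID step. For context, a standalone SPDK sketch of that replica connect (the controller name and NQN are illustrative; the address and port come from the log above):

# illustrative standalone equivalent of the engine's replica connect, via SPDK's rpc.py
rpc.py bdev_nvme_attach_controller -b v2test36-r-a9f41bfc -t tcp \
    -a 10.33.200.8 -s 20001 -f ipv4 \
    -n nqn.2023-01.io.longhorn.spdk:v2test36-r-a9f41bfc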
Expected next log line - building the RAID1 (taken from a V2 pool-mode volume):
[longhorn-instance-manager] time="2025-09-18T12:44:08.448979351Z" level=info msg="Connecting all available replicas map[v2test33-r-007d8fdc:0xc001183a10 v2test33-r-67ddf076:0xc001183410 v2test33-r-b48a5efb:0xc001302bd0], then launching raid during engine creation" func="log.(*SafeLogger).Infof" file="log.go:73" engineName=v2test33-e-0 frontend=spdk-tcp-blockdev initiatorIP=10.33.200.4 replicaStatusMap="map[v2test33-r-007d8fdc:0xc001183a10 v2test33-r-67ddf076:0xc001183410 v2test33-r-b48a5efb:0xc001302bd0]" targetIP=10.33.200.4 volumeName=v2test33
To Reproduce
- Create a V2 volume with the v2 data engine in interrupt mode (a manifest sketch follows this list)
- Try to attach the volume; the attach never finishes
- The second-to-last message in the instance-manager log was introduced by a new PR (bdev/nvme: allow global NVMe transport type overwrite spdk#64), which was prompted by a previous issue ([BUG] V2 stop working - connectNVMfBdev() -> "code": -95,"message": "Operation not supported" (1.10.0-rc2) #11761)
- Deleting the volume is not possible (the volume attach never finishes); restarting all V2 instance-managers and then removing the orphaned volumes does work
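A minimal reproduction sketch for the first step (volume name and size are illustrative; this assumes the v2 data engine is enabled and interrupt mode is switched on in the Longhorn settings, whose exact setting name may vary by version):

# create a 3-replica v2 volume, then attach it via the UI or CRD
kubectl apply -f - <<'EOF'
apiVersion: longhorn.io/v1beta2
kind: Volume
metadata:
  name: v2test36
  namespace: longhorn-system
spec:
  dataEngine: v2
  numberOfReplicas: 3
  size: "10737418240"
EOF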
Expected Behavior
Successful attachment.
Support Bundle for Troubleshooting
The support bundle covers many attempts; the last one is volume "v2test36": created at 13:56:*, attach attempted at 13:57:*, delete attempted at 14:09:*, V2 instance-managers restarted at 14:11:*, orphaned V2 volumes deleted at 14:13:*.
supportbundle_09afb238-c96d-4010-9f95-f6de59c721df_2025-09-18T14-20-06Z.zip
Environment
- Longhorn version: v1.10.0-rc3
- Impacted volume (PV): v2test36
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl): helm
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: harvester v1.6.0 -> rke2
- Number of control plane nodes in the cluster: 3
- Number of worker nodes in the cluster: 3
- Node config
- OS type and version: SLE Micro 5.5 / Harvester v1.6.0
- Kernel version: 5.14.21-150500.55.116-default
- CPU per node: 8C/16T
- Memory per node: >=64GB
- Disk type (e.g. SSD/NVMe/HDD): 2xNVMe
- Network bandwidth between the nodes (Gbps): LACP-2x2.5Gb/s
- Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): Baremetal NVMe
- Number of Longhorn volumes in the cluster: many V1, none V2
Additional context
No response
Workaround and Mitigation
No response