Commit e3debdd
Chengchang Tang authored and rleon committed
RDMA/hns: Fix missing flush CQE for DWQE

Flush CQE handler has not been called if QP state gets into errored
mode in DWQE path. So, the new added outstanding WQEs will never be
flushed.

It leads to a hung task timeout when using NFS over RDMA:
    __switch_to+0x7c/0xd0
    __schedule+0x350/0x750
    schedule+0x50/0xf0
    schedule_timeout+0x2c8/0x340
    wait_for_common+0xf4/0x2b0
    wait_for_completion+0x20/0x40
    __ib_drain_sq+0x140/0x1d0 [ib_core]
    ib_drain_sq+0x98/0xb0 [ib_core]
    rpcrdma_xprt_disconnect+0x68/0x270 [rpcrdma]
    xprt_rdma_close+0x20/0x60 [rpcrdma]
    xprt_autoclose+0x64/0x1cc [sunrpc]
    process_one_work+0x1d8/0x4e0
    worker_thread+0x154/0x420
    kthread+0x108/0x150
    ret_from_fork+0x10/0x18

Fixes: 01584a5 ("RDMA/hns: Add support of direct wqe")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://patch.msgid.link/20241220055249.146943-5-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>

1 parent fa5c4ba commit e3debdd


drivers/infiniband/hw/hns/hns_roce_hw_v2.c

Lines changed: 4 additions & 0 deletions
@@ -670,6 +670,10 @@ static void write_dwqe(struct hns_roce_dev *hr_dev, struct hns_roce_qp *qp,
 #define HNS_ROCE_SL_SHIFT 2
 	struct hns_roce_v2_rc_send_wqe *rc_sq_wqe = wqe;
 
+	if (unlikely(qp->state == IB_QPS_ERR)) {
+		flush_cqe(hr_dev, qp);
+		return;
+	}
 	/* All kinds of DirectWQE have the same header field layout */
 	hr_reg_enable(rc_sq_wqe, RC_SEND_WQE_FLAG);
 	hr_reg_write(rc_sq_wqe, RC_SEND_WQE_DB_SL_L, qp->sl);
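
The effect of the added check can be illustrated with a small user-space model. This is only a sketch, not driver code: `mock_qp`, `mock_flush_cqe`, and `mock_write_dwqe` are hypothetical stand-ins, the flush is modelled as an immediate completion (a simplification of whatever the real handler does), and the model assumes that a WQE written to an errored QP is never completed by hardware.

```c
#include <stdbool.h>

/* Hypothetical model; these are not the real hns driver structures. */
enum mock_qp_state { MOCK_QPS_RTS, MOCK_QPS_ERR };

struct mock_qp {
	enum mock_qp_state state;
	int outstanding_wqes;	/* posted but not yet completed */
	int flushed_cqes;	/* completions produced by the flush handler */
	int hw_writes;		/* WQEs actually written to the mock hardware */
};

/* Stand-in for the flush handler: completes every outstanding WQE in error. */
static void mock_flush_cqe(struct mock_qp *qp)
{
	qp->flushed_cqes += qp->outstanding_wqes;
	qp->outstanding_wqes = 0;
}

/*
 * Stand-in for the direct-WQE write. With check_err_state set, it mirrors
 * the patched flow: an errored QP triggers the flush and skips the write.
 * Returns true if the WQE was written to the mock hardware.
 */
static bool mock_write_dwqe(struct mock_qp *qp, bool check_err_state)
{
	qp->outstanding_wqes++;

	if (check_err_state && qp->state == MOCK_QPS_ERR) {
		mock_flush_cqe(qp);
		return false;
	}

	/* Assumption: an errored QP never completes this WQE on its own. */
	qp->hw_writes++;
	return true;
}
```

In this model, calling `mock_write_dwqe(&qp, false)` on an errored QP leaves `outstanding_wqes` at 1 with no completion, which corresponds to the drain in the trace above waiting indefinitely. Calling it with `true` produces a flushed completion instead.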
