Commit ef30228

IB/mlx5: Use __iowrite64_copy() for write combining stores
mlx5 has a built-in self-test at driver startup to evaluate whether the platform supports write combining to generate a 64 byte PCIe TLP. This has proven necessary because a lot of common scenarios end up with broken write combining (especially inside virtual machines) and there is no other way to learn this information.

This self-test has been consistently failing on new ARM64 CPU designs (specifically with NVIDIA Grace's implementation of Neoverse V2). The C loop around writeq() generates some pretty terrible ARM64 assembly, but historically this has worked on a lot of existing ARM64 CPUs. We see it succeed about 1 time in 10,000 on the worst affected systems.

The CPU architects speculate that the load instructions interspersed with the stores make the WC buffers statistically flush too often, so the generation of large TLPs becomes infrequent. This makes the boot-up test unreliable: it reports no write combining, even though userspace would be fine since it uses an ST4 instruction.

Further, S390 has similar issues where only the special zpci_memcpy_toio() will actually generate large TLPs, and the open-coded loop does not trigger it at all.

Fix both ARM64 and S390 by switching to __iowrite64_copy(), which now provides architecture-specific variants that have a high chance of generating a large TLP with write combining. x86 continues to use a similar writeq loop in the generic __iowrite64_copy().

Fixes: 11f552e ("IB/mlx5: Test write combining support")
Link: https://lore.kernel.org/r/6-v3-1893cd8b9369+1925-mlx5_arm_wc_jgg@nvidia.com
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
1 parent 2b7a5e1 commit ef30228

1 file changed, 3 additions and 5 deletions

drivers/infiniband/hw/mlx5/mem.c
@@ -30,6 +30,7 @@
  * SOFTWARE.
  */
 
+#include <linux/io.h>
 #include <rdma/ib_umem_odp.h>
 #include "mlx5_ib.h"
 #include <linux/jiffies.h>
@@ -108,7 +109,6 @@ static int post_send_nop(struct mlx5_ib_dev *dev, struct ib_qp *ibqp, u64 wr_id,
 	__be32 mmio_wqe[16] = {};
 	unsigned long flags;
 	unsigned int idx;
-	int i;
 
 	if (unlikely(dev->mdev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR))
 		return -EIO;
@@ -148,10 +148,8 @@ static int post_send_nop(struct mlx5_ib_dev *dev, struct ib_qp *ibqp, u64 wr_id,
 	 * we hit doorbell
 	 */
 	wmb();
-	for (i = 0; i < 8; i++)
-		mlx5_write64(&mmio_wqe[i * 2],
-			     bf->bfreg->map + bf->offset + i * 8);
-	io_stop_wc();
+	__iowrite64_copy(bf->bfreg->map + bf->offset, mmio_wqe,
+			 sizeof(mmio_wqe) / 8);
 
 	bf->offset ^= bf->buf_size;
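
As a rough illustration of the size arithmetic in the change above, here is a userspace sketch, not kernel code. It assumes ordinary memory instead of a write-combining MMIO mapping, so it says nothing about TLP sizes. The helper names copy_open_coded(), copy_qwords() and wc_copy_demo() are hypothetical stand-ins invented for this example. The point is only that the count passed to __iowrite64_copy() is in 64-bit units, so sizeof(mmio_wqe) / 8 is 8 for a 16-entry array of 32-bit words, and both forms move the same 64 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the removed loop: eight separate 64-bit
 * stores, each loading two adjacent 32-bit words from the source. */
static void copy_open_coded(volatile uint64_t *dst, const uint32_t *src)
{
	for (int i = 0; i < 8; i++) {
		uint64_t v;

		memcpy(&v, &src[i * 2], sizeof(v));
		dst[i] = v;
	}
}

/* Hypothetical stand-in with the same argument order as
 * __iowrite64_copy(to, from, count); count is in 64-bit units. */
static void copy_qwords(volatile uint64_t *dst, const void *src, size_t count)
{
	const unsigned char *s = src;

	for (size_t i = 0; i < count; i++) {
		uint64_t v;

		memcpy(&v, s + i * sizeof(v), sizeof(v));
		dst[i] = v;
	}
}

/* Returns 1 when both forms wrote identical bytes, 0 otherwise. */
static int wc_copy_demo(void)
{
	uint32_t wqe[16];
	uint64_t a[8] = { 0 }, b[8] = { 0 };

	for (uint32_t i = 0; i < 16; i++)
		wqe[i] = 0x01010101u * (i + 1);

	/* 16 words * 4 bytes = 64 bytes = 8 quadwords */
	assert(sizeof(wqe) / 8 == 8);

	copy_open_coded(a, wqe);
	copy_qwords(b, wqe, sizeof(wqe) / 8);

	return memcmp(a, b, sizeof(a)) == 0;
}
```

Calling wc_copy_demo() from a test harness should return 1. Whether a real device sees one 64 byte TLP depends on the architecture-specific implementation behind __iowrite64_copy(), which this sketch does not model.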
