
Commit 092e335

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:

 - Usual minor updates and fixes for bnxt_re, hfi1, rxe, mana, iser,
   mlx5, vmw_pvrdma, hns
 - Make rxe work on tun devices
 - mana gains more standard verbs as it moves toward supporting
   in-kernel verbs
 - DMABUF support for mana
 - Fix page size calculations when memory registration exceeds 4G
 - On Demand Paging support for rxe
 - mlx5 support for RDMA TRANSPORT flow tables and a new ucap mechanism
   to access control use of them
 - Optional RDMA_TX/RX counters per QP in mlx5

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (73 commits)
  IB/mad: Check available slots before posting receive WRs
  RDMA/mana_ib: Fix integer overflow during queue creation
  RDMA/mlx5: Fix calculation of total invalidated pages
  RDMA/mlx5: Fix mlx5_poll_one() cur_qp update flow
  RDMA/mlx5: Fix page_size variable overflow
  RDMA/mlx5: Drop access_flags from _mlx5_mr_cache_alloc()
  RDMA/mlx5: Fix cache entry update on dereg error
  RDMA/mlx5: Fix MR cache initialization error flow
  RDMA/mlx5: Support optional-counters binding for QPs
  RDMA/mlx5: Compile fs.c regardless of INFINIBAND_USER_ACCESS config
  RDMA/core: Pass port to counter bind/unbind operations
  RDMA/core: Add support to optional-counters binding configuration
  RDMA/core: Create and destroy rdma_counter using rdma_zalloc_drv_obj()
  RDMA/mlx5: Add optional counters for RDMA_TX/RX_packets/bytes
  RDMA/core: Fix use-after-free when rename device name
  RDMA/bnxt_re: Support perf management counters
  RDMA/rxe: Fix incorrect return value of rxe_odp_atomic_op()
  RDMA/uverbs: Propagate errors from rdma_lookup_get_uobject()
  RDMA/mana_ib: Handle net event for pointing to the current netdev
  net: mana: Change the function signature of mana_get_primary_netdev_rcu
  ...
2 parents 0ccff07 + 37826f0 commit 092e335


100 files changed (+4052 / -656 lines)

Documentation/infiniband/index.rst

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@ InfiniBand
    opa_vnic
    sysfs
    tag_matching
+   ucaps
    user_mad
    user_verbs

Documentation/infiniband/ucaps.rst

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
+=================================
+Infiniband Userspace Capabilities
+=================================
+
+User CAPabilities (UCAPs) provide fine-grained control over specific
+firmware features in Infiniband (IB) devices. This approach offers
+more granular capabilities than the existing Linux capabilities,
+which may be too generic for certain FW features.
+
+Each user capability is represented as a character device with root
+read-write access. Root processes can grant users special privileges
+by allowing access to these character devices (e.g., using chown).
+
+Usage
+=====
+
+UCAPs allow control over specific features of an IB device using file
+descriptors of UCAP character devices. Here is how a user enables
+specific features of an IB device:
+
+* A root process grants the user access to the UCAP files that
+  represents the capabilities (e.g., using chown).
+* The user opens the UCAP files, obtaining file descriptors.
+* When opening an IB device, include an array of the UCAP file
+  descriptors as an attribute.
+* The ib_uverbs driver recognizes the UCAP file descriptors and enables
+  the corresponding capabilities for the IB device.
+
+Creating UCAPs
+==============
+
+To create a new UCAP, drivers must first define a type in the
+rdma_user_cap enum in rdma/ib_ucaps.h. The name of the UCAP character
+device should be added to the ucap_names array in
+drivers/infiniband/core/ucaps.c. Then, the driver can create the UCAP
+character device by calling the ib_create_ucap API with the UCAP
+type.
+
+A reference count is stored for each UCAP to track creations and
+removals of the UCAP device. If multiple creation calls are made with
+the same type (e.g., for two IB devices), the UCAP character device
+is created during the first call and subsequent calls increment the
+reference count.
+
+The UCAP character device is created under /dev/infiniband, and its
+permissions are set to allow root read and write access only.
+
+Removing UCAPs
+==============
+
+Each removal decrements the reference count of the UCAP. The UCAP
+character device is removed from the filesystem only when the
+reference count is decreased to 0.
+
+/dev and /sys/class files
+=========================
+
+The class::
+
+	/sys/class/infiniband_ucaps
+
+is created when the first UCAP character device is created.
+
+The UCAP character device is created under /dev/infiniband.
+
+For example, if mlx5_ib adds the rdma_user_cap
+RDMA_UCAP_MLX5_CTRL_LOCAL with name "mlx5_perm_ctrl_local", this will
+create the device node::
+
+	/dev/infiniband/mlx5_perm_ctrl_local
+
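The create/remove rules in ucaps.rst amount to a per-type reference count. What follows is a minimal userspace model of that bookkeeping, not the kernel's code in drivers/infiniband/core/ucaps.c. `enum model_ucap_type`, `struct model_ucap`, and the `model_ucap_*` functions are hypothetical names. A boolean flag stands in for the character device that ib_create_ucap would actually add under /dev/infiniband.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the rdma_user_cap enum in rdma/ib_ucaps.h. */
enum model_ucap_type {
	MODEL_UCAP_CTRL_LOCAL,	/* e.g. "mlx5_perm_ctrl_local" */
	MODEL_UCAP_MAX
};

struct model_ucap {
	unsigned int refcount;	/* outstanding create calls for this type */
	bool node_present;	/* stands in for the /dev/infiniband node */
};

static struct model_ucap model_ucaps[MODEL_UCAP_MAX];

/* The first create "adds" the device node; later calls only take a reference. */
static void model_ucap_create(enum model_ucap_type type)
{
	struct model_ucap *ucap = &model_ucaps[type];

	if (ucap->refcount++ == 0)
		ucap->node_present = true;
}

/* Each removal drops one reference; the node disappears only at zero. */
static void model_ucap_remove(enum model_ucap_type type)
{
	struct model_ucap *ucap = &model_ucaps[type];

	assert(ucap->refcount > 0);	/* every remove must match a create */
	if (--ucap->refcount == 0)
		ucap->node_present = false;
}

static bool model_ucap_present(enum model_ucap_type type)
{
	return model_ucaps[type].node_present;
}
```

Calling `model_ucap_create()` twice for the same type, as two IB devices would, leaves the node present after the first `model_ucap_remove()`. The node is cleared only after the second removal.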

drivers/infiniband/core/Makefile

Lines changed: 2 additions & 1 deletion
@@ -39,6 +39,7 @@ ib_uverbs-y := uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
 				uverbs_std_types_async_fd.o \
 				uverbs_std_types_srq.o \
 				uverbs_std_types_wq.o \
-				uverbs_std_types_qp.o
+				uverbs_std_types_qp.o \
+				ucaps.o
 ib_uverbs-$(CONFIG_INFINIBAND_USER_MEM) += umem.o umem_dmabuf.o
 ib_uverbs-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o

drivers/infiniband/core/cache.c

Lines changed: 6 additions & 0 deletions
@@ -1501,6 +1501,12 @@ ib_cache_update(struct ib_device *device, u32 port, bool update_gids,
 		device->port_data[port].cache.pkey = pkey_cache;
 	}
 	device->port_data[port].cache.lmc = tprops->lmc;
+
+	if (device->port_data[port].cache.port_state != IB_PORT_NOP &&
+	    device->port_data[port].cache.port_state != tprops->state)
+		ibdev_info(device, "Port: %d Link %s\n", port,
+			   ib_port_state_to_str(tprops->state));
+
 	device->port_data[port].cache.port_state = tprops->state;
 
 	device->port_data[port].cache.subnet_prefix = tprops->subnet_prefix;
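The new check in ib_cache_update() logs a link change only when a port state has already been cached and the freshly queried state differs from it. The cached value is then overwritten unconditionally. The sketch below models that ordering in plain C. `enum model_port_state` is a local, hypothetical mirror of the kernel's `enum ib_port_state`. Treating `MODEL_PORT_NOP` as the "nothing cached yet" value is an assumption based on the `!= IB_PORT_NOP` test in the hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical local mirror of the port states; only the names matter here. */
enum model_port_state {
	MODEL_PORT_NOP,		/* assumed: nothing cached yet */
	MODEL_PORT_DOWN,
	MODEL_PORT_INIT,
	MODEL_PORT_ARMED,
	MODEL_PORT_ACTIVE,
};

/* Mirrors the added condition: log only a real change from a known state. */
static bool model_should_log_link_change(enum model_port_state cached,
					 enum model_port_state queried)
{
	return cached != MODEL_PORT_NOP && cached != queried;
}

/* Mirrors the update order: decide on logging first, then overwrite the cache. */
static bool model_cache_update(enum model_port_state *cached,
			       enum model_port_state queried)
{
	bool logged = model_should_log_link_change(*cached, queried);

	*cached = queried;
	return logged;
}
```

With this model, the first update after initialization and any repeated report of the same state are silent. Only a transition such as ACTIVE to DOWN returns true, which corresponds to the `ibdev_info()` call in the patch.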

drivers/infiniband/core/cma.c

Lines changed: 19 additions & 5 deletions
@@ -739,12 +739,26 @@ cma_validate_port(struct ib_device *device, u32 port,
 		goto out;
 	}
 
-	if (dev_type == ARPHRD_ETHER && rdma_protocol_roce(device, port)) {
-		ndev = dev_get_by_index(dev_addr->net, bound_if_index);
-		if (!ndev)
-			goto out;
+	/*
+	 * For a RXE device, it should work with TUN device and normal ethernet
+	 * devices. Use driver_id to check if a device is a RXE device or not.
+	 * ARPHDR_NONE means a TUN device.
+	 */
+	if (device->ops.driver_id == RDMA_DRIVER_RXE) {
+		if ((dev_type == ARPHRD_NONE || dev_type == ARPHRD_ETHER)
+			&& rdma_protocol_roce(device, port)) {
+			ndev = dev_get_by_index(dev_addr->net, bound_if_index);
+			if (!ndev)
+				goto out;
+		}
 	} else {
-		gid_type = IB_GID_TYPE_IB;
+		if (dev_type == ARPHRD_ETHER && rdma_protocol_roce(device, port)) {
+			ndev = dev_get_by_index(dev_addr->net, bound_if_index);
+			if (!ndev)
+				goto out;
+		} else {
+			gid_type = IB_GID_TYPE_IB;
+		}
 	}
 
 	sgid_attr = rdma_find_gid_by_port(device, gid, gid_type, port, ndev);
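The rewritten branch in cma_validate_port() now depends on the driver as well as the netdev type. The function below is a hypothetical, self-contained restatement of that decision table, not kernel code. `is_rxe` stands for `device->ops.driver_id == RDMA_DRIVER_RXE`, and `roce` stands for `rdma_protocol_roce(device, port)`. The two `MODEL_ARPHRD_*` constants copy the values of `ARPHRD_ETHER` and `ARPHRD_NONE` from the Linux if_arp headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Values of ARPHRD_ETHER and ARPHRD_NONE (TUN devices) from <linux/if_arp.h>. */
#define MODEL_ARPHRD_ETHER	1
#define MODEL_ARPHRD_NONE	0xFFFE

enum model_gid_action {
	MODEL_KEEP_GID_TYPE,	/* no branch fires: gid_type untouched, ndev stays NULL */
	MODEL_LOOKUP_NETDEV,	/* RoCE path: dev_get_by_index(bound_if_index) */
	MODEL_USE_IB_GID,	/* gid_type = IB_GID_TYPE_IB */
};

/* Hypothetical restatement of the patched branch in cma_validate_port(). */
static enum model_gid_action model_validate_port(bool is_rxe,
						 unsigned short dev_type,
						 bool roce)
{
	if (is_rxe) {
		/* RXE accepts TUN (ARPHRD_NONE) as well as Ethernet netdevs. */
		if ((dev_type == MODEL_ARPHRD_NONE ||
		     dev_type == MODEL_ARPHRD_ETHER) && roce)
			return MODEL_LOOKUP_NETDEV;
		return MODEL_KEEP_GID_TYPE;	/* RXE branch has no else arm */
	}

	/* Other drivers keep the pre-patch behaviour. */
	if (dev_type == MODEL_ARPHRD_ETHER && roce)
		return MODEL_LOOKUP_NETDEV;
	return MODEL_USE_IB_GID;
}
```

One consequence is visible in the model: for RXE there is no `else` arm, so a non-matching device type leaves `gid_type` at whatever value it held before this block. Every other driver falls back to `IB_GID_TYPE_IB`.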

drivers/infiniband/core/counters.c

Lines changed: 32 additions & 20 deletions
@@ -12,7 +12,8 @@
 
 static int __counter_set_mode(struct rdma_port_counter *port_counter,
 			      enum rdma_nl_counter_mode new_mode,
-			      enum rdma_nl_counter_mask new_mask)
+			      enum rdma_nl_counter_mask new_mask,
+			      bool bind_opcnt)
 {
 	if (new_mode == RDMA_COUNTER_MODE_AUTO) {
 		if (new_mask & (~ALL_AUTO_MODE_MASKS))
@@ -23,6 +24,7 @@ static int __counter_set_mode(struct rdma_port_counter *port_counter,
 
 	port_counter->mode.mode = new_mode;
 	port_counter->mode.mask = new_mask;
+	port_counter->mode.bind_opcnt = bind_opcnt;
 	return 0;
 }
 
@@ -41,6 +43,7 @@ static int __counter_set_mode(struct rdma_port_counter *port_counter,
  */
 int rdma_counter_set_auto_mode(struct ib_device *dev, u32 port,
 			       enum rdma_nl_counter_mask mask,
+			       bool bind_opcnt,
 			       struct netlink_ext_ack *extack)
 {
 	struct rdma_port_counter *port_counter;
@@ -59,12 +62,13 @@ int rdma_counter_set_auto_mode(struct ib_device *dev, u32 port,
 		RDMA_COUNTER_MODE_NONE;
 
 	if (port_counter->mode.mode == mode &&
-	    port_counter->mode.mask == mask) {
+	    port_counter->mode.mask == mask &&
+	    port_counter->mode.bind_opcnt == bind_opcnt) {
 		ret = 0;
 		goto out;
 	}
 
-	ret = __counter_set_mode(port_counter, mode, mask);
+	ret = __counter_set_mode(port_counter, mode, mask, bind_opcnt);
 
 out:
 	mutex_unlock(&port_counter->lock);
@@ -89,7 +93,7 @@ static void auto_mode_init_counter(struct rdma_counter *counter,
 }
 
 static int __rdma_counter_bind_qp(struct rdma_counter *counter,
-				  struct ib_qp *qp)
+				  struct ib_qp *qp, u32 port)
 {
 	int ret;
 
@@ -100,7 +104,7 @@ static int __rdma_counter_bind_qp(struct rdma_counter *counter,
 		return -EOPNOTSUPP;
 
 	mutex_lock(&counter->lock);
-	ret = qp->device->ops.counter_bind_qp(counter, qp);
+	ret = qp->device->ops.counter_bind_qp(counter, qp, port);
 	mutex_unlock(&counter->lock);
 
 	return ret;
@@ -140,7 +144,8 @@ int rdma_counter_modify(struct ib_device *dev, u32 port,
 
 static struct rdma_counter *alloc_and_bind(struct ib_device *dev, u32 port,
 					   struct ib_qp *qp,
-					   enum rdma_nl_counter_mode mode)
+					   enum rdma_nl_counter_mode mode,
+					   bool bind_opcnt)
 {
 	struct rdma_port_counter *port_counter;
 	struct rdma_counter *counter;
@@ -149,13 +154,15 @@ static struct rdma_counter *alloc_and_bind(struct ib_device *dev, u32 port,
 	if (!dev->ops.counter_dealloc || !dev->ops.counter_alloc_stats)
 		return NULL;
 
-	counter = kzalloc(sizeof(*counter), GFP_KERNEL);
+	counter = rdma_zalloc_drv_obj(dev, rdma_counter);
 	if (!counter)
 		return NULL;
 
 	counter->device = dev;
 	counter->port = port;
 
+	dev->ops.counter_init(counter);
+
 	rdma_restrack_new(&counter->res, RDMA_RESTRACK_COUNTER);
 	counter->stats = dev->ops.counter_alloc_stats(counter);
 	if (!counter->stats)
@@ -166,7 +173,7 @@ static struct rdma_counter *alloc_and_bind(struct ib_device *dev, u32 port,
 	switch (mode) {
 	case RDMA_COUNTER_MODE_MANUAL:
 		ret = __counter_set_mode(port_counter, RDMA_COUNTER_MODE_MANUAL,
-					 0);
+					 0, bind_opcnt);
 		if (ret) {
 			mutex_unlock(&port_counter->lock);
 			goto err_mode;
@@ -185,10 +192,11 @@ static struct rdma_counter *alloc_and_bind(struct ib_device *dev, u32 port,
 	mutex_unlock(&port_counter->lock);
 
 	counter->mode.mode = mode;
+	counter->mode.bind_opcnt = bind_opcnt;
 	kref_init(&counter->kref);
 	mutex_init(&counter->lock);
 
-	ret = __rdma_counter_bind_qp(counter, qp);
+	ret = __rdma_counter_bind_qp(counter, qp, port);
 	if (ret)
 		goto err_mode;
 
@@ -213,7 +221,8 @@ static void rdma_counter_free(struct rdma_counter *counter)
 	port_counter->num_counters--;
 	if (!port_counter->num_counters &&
 	    (port_counter->mode.mode == RDMA_COUNTER_MODE_MANUAL))
-		__counter_set_mode(port_counter, RDMA_COUNTER_MODE_NONE, 0);
+		__counter_set_mode(port_counter, RDMA_COUNTER_MODE_NONE, 0,
+				   false);
 
 	mutex_unlock(&port_counter->lock);
 
@@ -238,7 +247,7 @@ static bool auto_mode_match(struct ib_qp *qp, struct rdma_counter *counter,
 	return match;
 }
 
-static int __rdma_counter_unbind_qp(struct ib_qp *qp)
+static int __rdma_counter_unbind_qp(struct ib_qp *qp, u32 port)
 {
 	struct rdma_counter *counter = qp->counter;
 	int ret;
@@ -247,7 +256,7 @@ static int __rdma_counter_unbind_qp(struct ib_qp *qp)
 		return -EOPNOTSUPP;
 
 	mutex_lock(&counter->lock);
-	ret = qp->device->ops.counter_unbind_qp(qp);
+	ret = qp->device->ops.counter_unbind_qp(qp, port);
 	mutex_unlock(&counter->lock);
 
 	return ret;
@@ -339,13 +348,14 @@ int rdma_counter_bind_qp_auto(struct ib_qp *qp, u32 port)
 
 	counter = rdma_get_counter_auto_mode(qp, port);
 	if (counter) {
-		ret = __rdma_counter_bind_qp(counter, qp);
+		ret = __rdma_counter_bind_qp(counter, qp, port);
 		if (ret) {
 			kref_put(&counter->kref, counter_release);
 			return ret;
 		}
 	} else {
-		counter = alloc_and_bind(dev, port, qp, RDMA_COUNTER_MODE_AUTO);
+		counter = alloc_and_bind(dev, port, qp, RDMA_COUNTER_MODE_AUTO,
+					 port_counter->mode.bind_opcnt);
 		if (!counter)
 			return -ENOMEM;
 	}
@@ -358,15 +368,15 @@ int rdma_counter_bind_qp_auto(struct ib_qp *qp, u32 port)
  * @force:
  *   true - Decrease the counter ref-count anyway (e.g., qp destroy)
  */
-int rdma_counter_unbind_qp(struct ib_qp *qp, bool force)
+int rdma_counter_unbind_qp(struct ib_qp *qp, u32 port, bool force)
 {
 	struct rdma_counter *counter = qp->counter;
 	int ret;
 
 	if (!counter)
 		return -EINVAL;
 
-	ret = __rdma_counter_unbind_qp(qp);
+	ret = __rdma_counter_unbind_qp(qp, port);
 	if (ret && !force)
 		return ret;
 
@@ -513,7 +523,7 @@ int rdma_counter_bind_qpn(struct ib_device *dev, u32 port,
 		goto err_task;
 	}
 
-	ret = __rdma_counter_bind_qp(counter, qp);
+	ret = __rdma_counter_bind_qp(counter, qp, port);
 	if (ret)
 		goto err_task;
 
@@ -558,7 +568,7 @@ int rdma_counter_bind_qpn_alloc(struct ib_device *dev, u32 port,
 		goto err;
 	}
 
-	counter = alloc_and_bind(dev, port, qp, RDMA_COUNTER_MODE_MANUAL);
+	counter = alloc_and_bind(dev, port, qp, RDMA_COUNTER_MODE_MANUAL, true);
 	if (!counter) {
 		ret = -ENOMEM;
 		goto err;
@@ -604,7 +614,7 @@ int rdma_counter_unbind_qpn(struct ib_device *dev, u32 port,
 		goto out;
 	}
 
-	ret = rdma_counter_unbind_qp(qp, false);
+	ret = rdma_counter_unbind_qp(qp, port, false);
 
 out:
 	rdma_restrack_put(&qp->res);
@@ -613,13 +623,15 @@ int rdma_counter_unbind_qpn(struct ib_device *dev, u32 port,
 
 int rdma_counter_get_mode(struct ib_device *dev, u32 port,
 			  enum rdma_nl_counter_mode *mode,
-			  enum rdma_nl_counter_mask *mask)
+			  enum rdma_nl_counter_mask *mask,
+			  bool *opcnt)
 {
 	struct rdma_port_counter *port_counter;
 
 	port_counter = &dev->port_data[port].port_counter;
 	*mode = port_counter->mode.mode;
 	*mask = port_counter->mode.mask;
+	*opcnt = port_counter->mode.bind_opcnt;
 
 	return 0;
 }
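Taken together, the counters.c changes thread a `bind_opcnt` flag through the per-port mode. `rdma_counter_set_auto_mode()` now treats a request as a no-op only if mode, mask, and `bind_opcnt` all match, and auto-mode counters are allocated with the port's current `bind_opcnt`. The sketch below models just those two rules. `struct model_counter_mode` and the `model_*` functions are hypothetical simplifications. The auto-mode mask validation and the locking done by the real code are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for enum rdma_nl_counter_mode values. */
enum model_counter_mode_kind {
	MODEL_MODE_NONE,
	MODEL_MODE_AUTO,
};

/* Simplified per-port mode; mirrors the mode/mask/bind_opcnt fields. */
struct model_counter_mode {
	enum model_counter_mode_kind mode;
	unsigned int mask;
	bool bind_opcnt;
};

/*
 * Returns true if the port mode changed, false if the request was a no-op.
 * As in the patched rdma_counter_set_auto_mode(), bind_opcnt is now part
 * of the "already configured" comparison.
 */
static bool model_set_auto_mode(struct model_counter_mode *port_mode,
				enum model_counter_mode_kind mode,
				unsigned int mask, bool bind_opcnt)
{
	if (port_mode->mode == mode && port_mode->mask == mask &&
	    port_mode->bind_opcnt == bind_opcnt)
		return false;

	port_mode->mode = mode;
	port_mode->mask = mask;
	port_mode->bind_opcnt = bind_opcnt;
	return true;
}

/* A freshly allocated auto-mode counter inherits the port's bind_opcnt. */
static bool model_new_auto_counter_opcnt(const struct model_counter_mode *port_mode)
{
	return port_mode->bind_opcnt;
}
```

Manual binding through `rdma_counter_bind_qpn_alloc()` behaves differently. It passes `true` unconditionally, so this model does not cover it.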
