Commit bff687b
Merge tag 'block-6.2-2022-12-29' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
 "Mostly just NVMe, but also a single fixup for BFQ for a regression
  that happened during the merge window. In detail:

   - NVMe pull requests via Christoph:
       - Fix doorbell buffer value endianness (Klaus Jensen)
       - Fix Linux vs NVMe page size mismatch (Keith Busch)
       - Fix a potential memory access beyond the allocation limit
         (Keith Busch)
       - Fix a multipath vs blktrace NULL pointer dereference
         (Yanjun Zhang)
       - Fix various problems in handling the Command Supported and
         Effects log (Christoph Hellwig)
       - Don't allow unprivileged passthrough of commands that don't
         transfer data but modify logical block content (Christoph
         Hellwig)
       - Add a features and quirks policy document (Christoph Hellwig)
       - Fix some really nasty code that was correct but made smatch
         complain (Sagi Grimberg)

   - Use-after-free regression in BFQ from this merge window (Yu)"

* tag 'block-6.2-2022-12-29' of git://git.kernel.dk/linux:
  nvme-auth: fix smatch warning complaints
  nvme: consult the CSE log page for unprivileged passthrough
  nvme: also return I/O command effects from nvme_command_effects
  nvmet: don't defer passthrough commands with trivial effects to the workqueue
  nvmet: set the LBCC bit for commands that modify data
  nvmet: use NVME_CMD_EFFECTS_CSUPP instead of open coding it
  nvme: fix the NVME_CMD_EFFECTS_CSE_MASK definition
  docs, nvme: add a feature and quirk policy document
  nvme-pci: update sqsize when adjusting the queue depth
  nvme: fix setting the queue depth in nvme_alloc_io_tag_set
  block, bfq: fix uaf for bfqq in bfq_exit_icq_bfqq
  nvme: fix multipath crash caused by flush request when blktrace is enabled
  nvme-pci: fix page size checks
  nvme-pci: fix mempool alloc size
  nvme-pci: fix doorbell buffer value endianness
2 parents ac787ff + 1551ed5 commit bff687b

12 files changed, 186 insertions(+), 59 deletions(-)
Documentation/maintainer/maintainer-entry-profile.rst
1 addition, 0 deletions

@@ -104,3 +104,4 @@ to do something different in the near future.
    ../riscv/patch-acceptance
    ../driver-api/media/maintainer-entry-profile
    ../driver-api/vfio-pci-device-specific-driver-acceptance
+   ../nvme/feature-and-quirk-policy
Documentation/nvme/feature-and-quirk-policy.rst (new file)
77 additions, 0 deletions

@@ -0,0 +1,77 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================
+Linux NVMe feature and quirk policy
+===================================
+
+This file explains the policy used to decide what is supported by the
+Linux NVMe driver and what is not.
+
+
+Introduction
+============
+
+NVM Express is an open collection of standards and information.
+
+The Linux NVMe host driver in drivers/nvme/host/ supports devices
+implementing the NVM Express (NVMe) family of specifications, which
+currently consists of a number of documents:
+
+- the NVMe Base specification
+- various Command Set specifications (e.g. NVM Command Set)
+- various Transport specifications (e.g. PCIe, Fibre Channel, RDMA, TCP)
+- the NVMe Management Interface specification
+
+See https://nvmexpress.org/developers/ for the NVMe specifications.
+
+
+Supported features
+==================
+
+NVMe is a large suite of specifications, and contains features that are only
+useful or suitable for specific use-cases. It is important to note that Linux
+does not aim to implement every feature in the specification. Every additional
+feature implemented introduces more code, more maintenance and potentially more
+bugs. Hence there is an inherent tradeoff between functionality and
+maintainability of the NVMe host driver.
+
+Any feature implemented in the Linux NVMe host driver must support the
+following requirements:
+
+1. The feature is specified in a release version of an official NVMe
+   specification, or in a ratified Technical Proposal (TP) that is
+   available on the NVMe website. Or if it is not directly related to the
+   on-wire protocol, does not contradict any of the NVMe specifications.
+2. Does not conflict with the Linux architecture, nor the design of the
+   NVMe host driver.
+3. Has a clear, indisputable value-proposition and a wide consensus across
+   the community.
+
+Vendor specific extensions are generally not supported in the NVMe host
+driver.
+
+It is strongly recommended to work with the Linux NVMe and block layer
+maintainers and get feedback on specification changes that are intended
+to be used by the Linux NVMe host driver in order to avoid conflict at a
+later stage.
+
+
+Quirks
+======
+
+Sometimes implementations of open standards fail to correctly implement parts
+of the standards. Linux uses identifier-based quirks to work around such
+implementation bugs. The intent of quirks is to deal with widely available
+hardware, usually consumer, which Linux users can't use without these quirks.
+Typically these implementations are not or only superficially tested with Linux
+by the hardware manufacturer.
+
+The Linux NVMe maintainers decide ad hoc whether to quirk implementations
+based on the impact of the problem to Linux users and how it impacts
+maintainability of the driver. In general quirks are a last resort, if no
+firmware updates or other workarounds are available from the vendor.
+
+Quirks will not be added to the Linux kernel for hardware that isn't available
+on the mass market. Hardware that fails qualification for enterprise Linux
+distributions, ChromeOS, Android or other consumers of the Linux kernel
+should be fixed before it is shipped instead of relying on Linux quirks.
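To make "identifier-based quirks" concrete, here is a minimal, self-contained sketch of the general idea: devices are matched by their vendor/device identifiers and the match carries a set of workaround flags. The IDs, flag names and lookup helper below are illustrative stand-ins, not the actual driver table or real device IDs.

#include <stdint.h>
#include <stdio.h>

/* Illustrative flags; the real driver defines its own quirk bits. */
#define QUIRK_NO_DEEPEST_PS        (1u << 0)  /* avoid the deepest power state */
#define QUIRK_DELAY_BEFORE_CHK_RDY (1u << 1)  /* wait longer before polling ready */

struct quirk_entry {
        uint16_t vendor;   /* PCI vendor ID */
        uint16_t device;   /* PCI device ID */
        uint32_t quirks;   /* workarounds to apply for this identifier */
};

/* Hypothetical table keyed by identifiers, analogous in spirit to a PCI ID table. */
static const struct quirk_entry quirk_table[] = {
        { 0x1234, 0xabcd, QUIRK_NO_DEEPEST_PS },
        { 0x5678, 0x0001, QUIRK_NO_DEEPEST_PS | QUIRK_DELAY_BEFORE_CHK_RDY },
};

static uint32_t lookup_quirks(uint16_t vendor, uint16_t device)
{
        for (size_t i = 0; i < sizeof(quirk_table) / sizeof(quirk_table[0]); i++)
                if (quirk_table[i].vendor == vendor && quirk_table[i].device == device)
                        return quirk_table[i].quirks;
        return 0;       /* no quirks: the device is expected to follow the spec */
}

int main(void)
{
        uint32_t q = lookup_quirks(0x1234, 0xabcd);

        if (q & QUIRK_NO_DEEPEST_PS)
                printf("skipping deepest power state for this device\n");
        return 0;
}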

MAINTAINERS
1 addition, 0 deletions

@@ -14916,6 +14916,7 @@ L: linux-nvme@lists.infradead.org
 S: Supported
 W: http://git.infradead.org/nvme.git
 T: git://git.infradead.org/nvme.git
+F: Documentation/nvme/
 F: drivers/nvme/host/
 F: drivers/nvme/common/
 F: include/linux/nvme*

block/bfq-iosched.c
1 addition, 1 deletion

@@ -5317,8 +5317,8 @@ static void bfq_exit_icq_bfqq(struct bfq_io_cq *bic, bool is_sync)
                unsigned long flags;

                spin_lock_irqsave(&bfqd->lock, flags);
-               bfq_exit_bfqq(bfqd, bfqq);
                bic_set_bfqq(bic, NULL, is_sync);
+               bfq_exit_bfqq(bfqd, bfqq);
                spin_unlock_irqrestore(&bfqd->lock, flags);
        }
 }
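The BFQ change is purely an ordering fix: the cached queue pointer is detached from the bic before bfq_exit_bfqq() may drop the last reference and free the queue, since the detach helper still looks at that queue. A minimal userspace sketch of the general pattern (hypothetical names, not the BFQ code):

#include <stdio.h>
#include <stdlib.h>

struct obj {
        int refcount;
};

struct cache {
        struct obj *cached;     /* may point at obj between operations */
};

static void detach(struct cache *c)
{
        /* the detach helper itself inspects the cached object */
        if (c->cached)
                printf("detaching object with refcount %d\n", c->cached->refcount);
        c->cached = NULL;
}

static void teardown(struct obj **o)
{
        /* dropping the last reference frees the object */
        free(*o);
        *o = NULL;
}

int main(void)
{
        struct obj *o = calloc(1, sizeof(*o));
        struct cache c = { .cached = o };

        o->refcount = 1;

        /*
         * Correct order: detach the cached pointer first, then tear the
         * object down. The reverse order would make detach() read freed
         * memory, which is the shape of the bug fixed here.
         */
        detach(&c);
        teardown(&o);
        return 0;
}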

drivers/nvme/host/auth.c
1 addition, 1 deletion

@@ -953,7 +953,7 @@ int nvme_auth_init_ctrl(struct nvme_ctrl *ctrl)
                goto err_free_dhchap_secret;

        if (!ctrl->opts->dhchap_secret && !ctrl->opts->dhchap_ctrl_secret)
-               return ret;
+               return 0;

        ctrl->dhchap_ctxs = kvcalloc(ctrl_max_dhchaps(ctrl),
                                     sizeof(*chap), GFP_KERNEL);
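For context on the smatch fix: returning the literal 0 on an early success exit, rather than a variable that merely happens to be 0 at that point, avoids the "missing error code" pattern that static checkers flag and makes the intent obvious to readers. A tiny hypothetical illustration of the pattern (not the driver code):

#include <stdio.h>

/* Hypothetical setup step that can fail. */
static int setup_resource(int want)
{
        return want ? 0 : -1;
}

static int init_feature(int enabled)
{
        int ret;

        ret = setup_resource(1);
        if (ret)
                return ret;     /* real error propagated */

        /*
         * "return ret;" would also be 0 here, but it reads like a forgotten
         * error code to both humans and static checkers; the literal is clear.
         */
        if (!enabled)
                return 0;       /* explicit: success, nothing more to do */

        /* ... further optional initialization would go here ... */
        return 0;
}

int main(void)
{
        printf("init_feature(0) = %d\n", init_feature(0));
        return 0;
}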

drivers/nvme/host/core.c
27 additions, 7 deletions

@@ -1074,23 +1074,43 @@ static u32 nvme_known_admin_effects(u8 opcode)
        return 0;
 }

+static u32 nvme_known_nvm_effects(u8 opcode)
+{
+       switch (opcode) {
+       case nvme_cmd_write:
+       case nvme_cmd_write_zeroes:
+       case nvme_cmd_write_uncor:
+               return NVME_CMD_EFFECTS_LBCC;
+       default:
+               return 0;
+       }
+}
+
 u32 nvme_command_effects(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u8 opcode)
 {
        u32 effects = 0;

        if (ns) {
                if (ns->head->effects)
                        effects = le32_to_cpu(ns->head->effects->iocs[opcode]);
+               if (ns->head->ids.csi == NVME_CAP_CSS_NVM)
+                       effects |= nvme_known_nvm_effects(opcode);
                if (effects & ~(NVME_CMD_EFFECTS_CSUPP | NVME_CMD_EFFECTS_LBCC))
                        dev_warn_once(ctrl->device,
-                               "IO command:%02x has unhandled effects:%08x\n",
+                               "IO command:%02x has unusual effects:%08x\n",
                                opcode, effects);
-               return 0;
-       }

-       if (ctrl->effects)
-               effects = le32_to_cpu(ctrl->effects->acs[opcode]);
-       effects |= nvme_known_admin_effects(opcode);
+               /*
+                * NVME_CMD_EFFECTS_CSE_MASK causes a freeze all I/O queues,
+                * which would deadlock when done on an I/O command. Note that
+                * We already warn about an unusual effect above.
+                */
+               effects &= ~NVME_CMD_EFFECTS_CSE_MASK;
+       } else {
+               if (ctrl->effects)
+                       effects = le32_to_cpu(ctrl->effects->acs[opcode]);
+               effects |= nvme_known_admin_effects(opcode);
+       }

        return effects;
 }
@@ -4926,7 +4946,7 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,

        memset(set, 0, sizeof(*set));
        set->ops = ops;
-       set->queue_depth = ctrl->sqsize + 1;
+       set->queue_depth = min_t(unsigned, ctrl->sqsize, BLK_MQ_MAX_DEPTH - 1);
        /*
         * Some Apple controllers requires tags to be unique across admin and
         * the (only) I/O queue, so reserve the first 32 tags of the I/O queue.
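On the tag-set depth change: sqsize is a 0's based value (1023 means 1024 queue entries), and an NVMe submission queue can never be filled completely, so the number of commands that can actually be outstanding equals the 0's based value itself; the new code additionally clamps it to the block layer's maximum tag depth. A standalone sketch of that arithmetic, with the block-layer limit replaced by a local example constant rather than the kernel's definition:

#include <stdio.h>

/* Illustration only: the kernel's block layer defines its own limit. */
#define BLK_MQ_MAX_DEPTH_EXAMPLE 10240

static unsigned int io_tag_set_depth(unsigned int sqsize)
{
        /*
         * sqsize is 0's based: a value of 1023 means 1024 queue entries.
         * The submission queue can never be completely full, so the usable
         * depth is entries - 1, i.e. the 0's based value itself, further
         * clamped to the block layer's maximum tag depth.
         */
        unsigned int max = BLK_MQ_MAX_DEPTH_EXAMPLE - 1;

        return sqsize < max ? sqsize : max;
}

int main(void)
{
        printf("sqsize 1023  -> tag set depth %u\n", io_tag_set_depth(1023));
        printf("sqsize 65535 -> tag set depth %u\n", io_tag_set_depth(65535));
        return 0;
}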

drivers/nvme/host/ioctl.c
24 additions, 4 deletions

@@ -11,6 +11,8 @@
 static bool nvme_cmd_allowed(struct nvme_ns *ns, struct nvme_command *c,
                fmode_t mode)
 {
+       u32 effects;
+
        if (capable(CAP_SYS_ADMIN))
                return true;

@@ -43,11 +45,29 @@ static bool nvme_cmd_allowed(struct nvme_ns *ns, struct nvme_command *c,
        }

        /*
-        * Only allow I/O commands that transfer data to the controller if the
-        * special file is open for writing, but always allow I/O commands that
-        * transfer data from the controller.
+        * Check if the controller provides a Commands Supported and Effects log
+        * and marks this command as supported. If not reject unprivileged
+        * passthrough.
+        */
+       effects = nvme_command_effects(ns->ctrl, ns, c->common.opcode);
+       if (!(effects & NVME_CMD_EFFECTS_CSUPP))
+               return false;
+
+       /*
+        * Don't allow passthrough for command that have intrusive (or unknown)
+        * effects.
+        */
+       if (effects & ~(NVME_CMD_EFFECTS_CSUPP | NVME_CMD_EFFECTS_LBCC |
+                       NVME_CMD_EFFECTS_UUID_SEL |
+                       NVME_CMD_EFFECTS_SCOPE_MASK))
+               return false;
+
+       /*
+        * Only allow I/O commands that transfer data to the controller or that
+        * change the logical block contents if the file descriptor is open for
+        * writing.
         */
-       if (nvme_is_write(c))
+       if (nvme_is_write(c) || (effects & NVME_CMD_EFFECTS_LBCC))
                return mode & FMODE_WRITE;
        return true;
 }
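The new unprivileged-passthrough gate is essentially bit arithmetic on a command's Commands Supported and Effects (CSE) entry. A standalone sketch of that logic follows; the bit positions mirror the NVMe Commands Supported and Effects data structure (CSUPP in bit 0, LBCC in bit 1), while the set of tolerated bits is simplified here for illustration (the real check also accepts UUID-selection and command-scope bits):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Bit positions from the NVMe Commands Supported and Effects log entry. */
#define EFFECT_CSUPP (1u << 0)  /* command supported */
#define EFFECT_LBCC  (1u << 1)  /* command may change logical block content */

static bool passthrough_allowed(uint32_t effects, bool is_write, bool opened_for_write)
{
        /* The controller must explicitly report the command as supported. */
        if (!(effects & EFFECT_CSUPP))
                return false;

        /* Any effect beyond the benign set is treated as intrusive or unknown. */
        if (effects & ~(EFFECT_CSUPP | EFFECT_LBCC))
                return false;

        /*
         * Commands that send data to the device, or that may change logical
         * block content, require the file to be open for writing.
         */
        if (is_write || (effects & EFFECT_LBCC))
                return opened_for_write;

        return true;
}

int main(void)
{
        /* Supported read-only command, no LBCC: allowed on a read-only fd. */
        printf("%d\n", passthrough_allowed(EFFECT_CSUPP, false, false));
        /* A command with LBCC set on a read-only fd: rejected. */
        printf("%d\n", passthrough_allowed(EFFECT_CSUPP | EFFECT_LBCC, false, false));
        return 0;
}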

drivers/nvme/host/nvme.h
1 addition, 1 deletion

@@ -893,7 +893,7 @@ static inline void nvme_trace_bio_complete(struct request *req)
 {
        struct nvme_ns *ns = req->q->queuedata;

-       if (req->cmd_flags & REQ_NVME_MPATH)
+       if ((req->cmd_flags & REQ_NVME_MPATH) && req->bio)
                trace_block_bio_complete(ns->head->disk->queue, req->bio);
 }

drivers/nvme/host/pci.c
24 additions, 22 deletions

@@ -36,7 +36,7 @@
 #define SQ_SIZE(q)     ((q)->q_depth << (q)->sqes)
 #define CQ_SIZE(q)     ((q)->q_depth * sizeof(struct nvme_completion))

-#define SGES_PER_PAGE  (PAGE_SIZE / sizeof(struct nvme_sgl_desc))
+#define SGES_PER_PAGE  (NVME_CTRL_PAGE_SIZE / sizeof(struct nvme_sgl_desc))

 /*
  * These can be higher, but we need to ensure that any command doesn't
@@ -144,9 +144,9 @@ struct nvme_dev {
        mempool_t *iod_mempool;

        /* shadow doorbell buffer support: */
-       u32 *dbbuf_dbs;
+       __le32 *dbbuf_dbs;
        dma_addr_t dbbuf_dbs_dma_addr;
-       u32 *dbbuf_eis;
+       __le32 *dbbuf_eis;
        dma_addr_t dbbuf_eis_dma_addr;

        /* host memory buffer support: */
@@ -208,10 +208,10 @@ struct nvme_queue {
 #define NVMEQ_SQ_CMB           1
 #define NVMEQ_DELETE_ERROR     2
 #define NVMEQ_POLLED           3
-       u32 *dbbuf_sq_db;
-       u32 *dbbuf_cq_db;
-       u32 *dbbuf_sq_ei;
-       u32 *dbbuf_cq_ei;
+       __le32 *dbbuf_sq_db;
+       __le32 *dbbuf_cq_db;
+       __le32 *dbbuf_sq_ei;
+       __le32 *dbbuf_cq_ei;
        struct completion delete_done;
 };

@@ -343,20 +343,20 @@ static inline int nvme_dbbuf_need_event(u16 event_idx, u16 new_idx, u16 old)
 }

 /* Update dbbuf and return true if an MMIO is required */
-static bool nvme_dbbuf_update_and_check_event(u16 value, u32 *dbbuf_db,
-                                             volatile u32 *dbbuf_ei)
+static bool nvme_dbbuf_update_and_check_event(u16 value, __le32 *dbbuf_db,
+                                             volatile __le32 *dbbuf_ei)
 {
        if (dbbuf_db) {
-               u16 old_value;
+               u16 old_value, event_idx;

                /*
                 * Ensure that the queue is written before updating
                 * the doorbell in memory
                 */
                wmb();

-               old_value = *dbbuf_db;
-               *dbbuf_db = value;
+               old_value = le32_to_cpu(*dbbuf_db);
+               *dbbuf_db = cpu_to_le32(value);

                /*
                 * Ensure that the doorbell is updated before reading the event
@@ -366,7 +366,8 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, u32 *dbbuf_db,
                 */
                mb();

-               if (!nvme_dbbuf_need_event(*dbbuf_ei, value, old_value))
+               event_idx = le32_to_cpu(*dbbuf_ei);
+               if (!nvme_dbbuf_need_event(event_idx, value, old_value))
                        return false;
        }

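The __le32 conversions matter because the shadow doorbell and event-index buffers are shared with the controller, which always reads them as little-endian; on a big-endian host a plain u32 store would hand the device a byte-swapped value. A small standalone illustration of the host-side conversion, using userspace <endian.h> helpers (htole32()/le32toh()) as stand-ins for the kernel's cpu_to_le32()/le32_to_cpu():

#include <endian.h>
#include <stdint.h>
#include <stdio.h>

/* A little-endian value as it would sit in a buffer shared with the device. */
static uint32_t shadow_doorbell_le;

static void write_doorbell(uint16_t tail)
{
        /* Convert to the device's (little-endian) representation on store... */
        shadow_doorbell_le = htole32(tail);
}

static uint16_t read_doorbell(void)
{
        /* ...and back to host byte order on load. */
        return (uint16_t)le32toh(shadow_doorbell_le);
}

int main(void)
{
        write_doorbell(0x1234);
        /* The in-memory bytes are 34 12 00 00 on every host; the value round-trips. */
        printf("doorbell reads back as 0x%04x\n", read_doorbell());
        return 0;
}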
@@ -380,9 +381,9 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, u32 *dbbuf_db,
  */
 static int nvme_pci_npages_prp(void)
 {
-       unsigned nprps = DIV_ROUND_UP(NVME_MAX_KB_SZ + NVME_CTRL_PAGE_SIZE,
-                                     NVME_CTRL_PAGE_SIZE);
-       return DIV_ROUND_UP(8 * nprps, PAGE_SIZE - 8);
+       unsigned max_bytes = (NVME_MAX_KB_SZ * 1024) + NVME_CTRL_PAGE_SIZE;
+       unsigned nprps = DIV_ROUND_UP(max_bytes, NVME_CTRL_PAGE_SIZE);
+       return DIV_ROUND_UP(8 * nprps, NVME_CTRL_PAGE_SIZE - 8);
 }

 /*
@@ -392,7 +393,7 @@ static int nvme_pci_npages_prp(void)
 static int nvme_pci_npages_sgl(void)
 {
        return DIV_ROUND_UP(NVME_MAX_SEGS * sizeof(struct nvme_sgl_desc),
-                       PAGE_SIZE);
+                       NVME_CTRL_PAGE_SIZE);
 }

 static int nvme_admin_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
@@ -708,7 +709,7 @@ static void nvme_pci_sgl_set_seg(struct nvme_sgl_desc *sge,
                sge->length = cpu_to_le32(entries * sizeof(*sge));
                sge->type = NVME_SGL_FMT_LAST_SEG_DESC << 4;
        } else {
-               sge->length = cpu_to_le32(PAGE_SIZE);
+               sge->length = cpu_to_le32(NVME_CTRL_PAGE_SIZE);
                sge->type = NVME_SGL_FMT_SEG_DESC << 4;
        }
 }
@@ -2332,10 +2333,12 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
        if (dev->cmb_use_sqes) {
                result = nvme_cmb_qdepth(dev, nr_io_queues,
                                sizeof(struct nvme_command));
-               if (result > 0)
+               if (result > 0) {
                        dev->q_depth = result;
-               else
+                       dev->ctrl.sqsize = result - 1;
+               } else {
                        dev->cmb_use_sqes = false;
+               }
        }

        do {
@@ -2536,7 +2539,6 @@ static int nvme_pci_enable(struct nvme_dev *dev)

        dev->q_depth = min_t(u32, NVME_CAP_MQES(dev->ctrl.cap) + 1,
                                io_queue_depth);
-       dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
        dev->db_stride = 1 << NVME_CAP_STRIDE(dev->ctrl.cap);
        dev->dbs = dev->bar + 4096;

@@ -2577,7 +2579,7 @@ static int nvme_pci_enable(struct nvme_dev *dev)
                dev_warn(dev->ctrl.device, "IO queue depth clamped to %d\n",
                         dev->q_depth);
        }
-
+       dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */

        nvme_map_cmb(dev);
