
Commit 09db7a4

Merge branch 'hv_netvsc-fix-error-nvsp_rndis_pkt_complete-error-status-2'
Michael Kelley says:

====================
hv_netvsc: Fix error "nvsp_rndis_pkt_complete error status: 2"

Starting with commit dca5161 in the 6.3 kernel, the Linux driver for Hyper-V synthetic networking (netvsc) occasionally reports "nvsp_rndis_pkt_complete error status: 2".[1] This error indicates that Hyper-V has rejected a network packet transmit request from the guest, and the outgoing network packet is dropped. Higher-level network protocols presumably recover and resend the packet, so there is no functional error, but performance is slightly impacted.

Commit dca5161 is not the cause of the error -- it only added reporting of an error that was already happening without any notice. The error has presumably been present since the netvsc driver was originally introduced into Linux.

This patch set fixes the root cause of the problem, which is that the netvsc driver in Linux may send an incorrectly formatted VMBus message to Hyper-V when transmitting the network packet. The incorrect formatting occurs when the rndis header of the VMBus message crosses a page boundary due to how the Linux skb head memory is aligned. In such a case, two PFNs are required to describe the location of the rndis header, even though they are contiguous in guest physical address (GPA) space. Hyper-V requires that the two PFNs be in a single "GPA range" data structure, but current netvsc code puts each PFN in its own GPA range, which Hyper-V rejects as an error in the case of the rndis header.

The incorrect formatting occurs only for larger packets that netvsc must transmit via a VMBus "GPA Direct" message. There's no problem when netvsc transmits a smaller packet by copying it into a pre-allocated send buffer slot, because the pre-allocated slots don't have page crossing issues.

After commit 14ad6ed in the 6.14 kernel, the error occurs much more frequently in VMs with 16 or more vCPUs. It may occur every few seconds, or even more frequently, in an ssh session that outputs a lot of text.
Commit 14ad6ed subtly changes how skb head memory is allocated, making it much more likely that the rndis header will cross a page boundary when the vCPU count is 16 or more. The changes in commit 14ad6ed are perfectly valid -- they just had the side effect of making the netvsc bug more prominent.

One fix is to check for adjacent PFNs in vmbus_sendpacket_pagebuffer() and just combine them into a single GPA range. Such a fix is very contained. But conceptually it is fixing the problem at the wrong level. So this patch set takes the broader approach of maintaining the already known grouping of contiguous PFNs at a higher level in the netvsc driver code, and propagating that grouping down to the creation of the VMBus message to send to Hyper-V. Maintaining the grouping fixes this problem, and has the added benefit of allowing netvsc_dma_map() to make fewer calls to dma_map_single() to do bounce buffering in CoCo VMs.

Patch 1 is a preparatory change to allow vmbus_sendpacket_mpb_desc() to specify multiple GPA ranges. In current code vmbus_sendpacket_mpb_desc() is used only by the storvsc synthetic SCSI driver, and it always creates a single GPA range.

Patch 2 updates the netvsc driver to use vmbus_sendpacket_mpb_desc() instead of vmbus_sendpacket_pagebuffer(). Because the higher levels of netvsc still don't group contiguous PFNs, this patch is functionally neutral. The VMBus message to Hyper-V still has many GPA ranges, each with a single PFN. But it lays the groundwork for the next patch.

Patch 3 changes the higher levels of netvsc to preserve the already known grouping of contiguous PFNs. When the contiguous groupings are passed to vmbus_sendpacket_mpb_desc(), GPA ranges containing multiple PFNs are produced, as expected by Hyper-V. This is the point at which the core problem is fixed.

Patches 4 and 5 remove code that is no longer necessary after the previous patches. These changes provide a net reduction of about 65 lines of code, which is an added benefit.
These changes have been tested in normal VMs, in SEV-SNP and TDX CoCo VMs, and in Dv6-series VMs where the netvsp implementation is in the OpenHCL paravisor instead of the Hyper-V host.

These changes are built against kernel version 6.15-rc6.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=217503
====================

Link: https://patch.msgid.link/20250513000604.1396-1-mhklinux@outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 parents bf449f3 + 45a442f commit 09db7a4

File tree

7 files changed

+83
-146
lines changed


drivers/hv/channel.c

Lines changed: 3 additions & 62 deletions

@@ -1077,68 +1077,10 @@ int vmbus_sendpacket(struct vmbus_channel *channel, void *buffer,
 EXPORT_SYMBOL(vmbus_sendpacket);
 
 /*
- * vmbus_sendpacket_pagebuffer - Send a range of single-page buffer
- * packets using a GPADL Direct packet type. This interface allows you
- * to control notifying the host. This will be useful for sending
- * batched data. Also the sender can control the send flags
- * explicitly.
- */
-int vmbus_sendpacket_pagebuffer(struct vmbus_channel *channel,
-				struct hv_page_buffer pagebuffers[],
-				u32 pagecount, void *buffer, u32 bufferlen,
-				u64 requestid)
-{
-	int i;
-	struct vmbus_channel_packet_page_buffer desc;
-	u32 descsize;
-	u32 packetlen;
-	u32 packetlen_aligned;
-	struct kvec bufferlist[3];
-	u64 aligned_data = 0;
-
-	if (pagecount > MAX_PAGE_BUFFER_COUNT)
-		return -EINVAL;
-
-	/*
-	 * Adjust the size down since vmbus_channel_packet_page_buffer is the
-	 * largest size we support
-	 */
-	descsize = sizeof(struct vmbus_channel_packet_page_buffer) -
-		((MAX_PAGE_BUFFER_COUNT - pagecount) *
-		sizeof(struct hv_page_buffer));
-	packetlen = descsize + bufferlen;
-	packetlen_aligned = ALIGN(packetlen, sizeof(u64));
-
-	/* Setup the descriptor */
-	desc.type = VM_PKT_DATA_USING_GPA_DIRECT;
-	desc.flags = VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED;
-	desc.dataoffset8 = descsize >> 3; /* in 8-bytes granularity */
-	desc.length8 = (u16)(packetlen_aligned >> 3);
-	desc.transactionid = VMBUS_RQST_ERROR; /* will be updated in hv_ringbuffer_write() */
-	desc.reserved = 0;
-	desc.rangecount = pagecount;
-
-	for (i = 0; i < pagecount; i++) {
-		desc.range[i].len = pagebuffers[i].len;
-		desc.range[i].offset = pagebuffers[i].offset;
-		desc.range[i].pfn = pagebuffers[i].pfn;
-	}
-
-	bufferlist[0].iov_base = &desc;
-	bufferlist[0].iov_len = descsize;
-	bufferlist[1].iov_base = buffer;
-	bufferlist[1].iov_len = bufferlen;
-	bufferlist[2].iov_base = &aligned_data;
-	bufferlist[2].iov_len = (packetlen_aligned - packetlen);
-
-	return hv_ringbuffer_write(channel, bufferlist, 3, requestid, NULL);
-}
-EXPORT_SYMBOL_GPL(vmbus_sendpacket_pagebuffer);
-
 /*
- * vmbus_sendpacket_multipagebuffer - Send a multi-page buffer packet
+ * vmbus_sendpacket_mpb_desc - Send one or more multi-page buffer packets
  * using a GPADL Direct packet type.
- * The buffer includes the vmbus descriptor.
+ * The desc argument must include space for the VMBus descriptor. The
+ * rangecount field must already be set.
  */
 int vmbus_sendpacket_mpb_desc(struct vmbus_channel *channel,
 			      struct vmbus_packet_mpb_array *desc,
@@ -1160,7 +1102,6 @@ int vmbus_sendpacket_mpb_desc(struct vmbus_channel *channel,
 	desc->length8 = (u16)(packetlen_aligned >> 3);
 	desc->transactionid = VMBUS_RQST_ERROR; /* will be updated in hv_ringbuffer_write() */
 	desc->reserved = 0;
-	desc->rangecount = 1;
 
 	bufferlist[0].iov_base = desc;
 	bufferlist[0].iov_len = desc_size;

drivers/net/hyperv/hyperv_net.h

Lines changed: 12 additions & 1 deletion

@@ -158,7 +158,6 @@ struct hv_netvsc_packet {
 	u8 cp_partial; /* partial copy into send buffer */
 
 	u8 rmsg_size; /* RNDIS header and PPI size */
-	u8 rmsg_pgcnt; /* page count of RNDIS header and PPI */
 	u8 page_buf_cnt;
 
 	u16 q_idx;
@@ -893,6 +892,18 @@ struct nvsp_message {
 		sizeof(struct nvsp_message))
 #define NETVSC_MIN_IN_MSG_SIZE sizeof(struct vmpacket_descriptor)
 
+/* Maximum # of contiguous data ranges that can make up a transmitted packet.
+ * Typically it's the max SKB fragments plus 2 for the rndis packet and the
+ * linear portion of the SKB. But if MAX_SKB_FRAGS is large, the value may
+ * need to be limited to MAX_PAGE_BUFFER_COUNT, which is the max # of entries
+ * in a GPA direct packet sent to netvsp over VMBus.
+ */
+#if MAX_SKB_FRAGS + 2 < MAX_PAGE_BUFFER_COUNT
+#define MAX_DATA_RANGES (MAX_SKB_FRAGS + 2)
+#else
+#define MAX_DATA_RANGES MAX_PAGE_BUFFER_COUNT
+#endif
+
 /* Estimated requestor size:
  * out_ring_size/min_out_msg_size + in_ring_size/min_in_msg_size
  */

drivers/net/hyperv/netvsc.c

Lines changed: 48 additions & 9 deletions

@@ -953,8 +953,7 @@ static void netvsc_copy_to_send_buf(struct netvsc_device *net_device,
 		+ pend_size;
 	int i;
 	u32 padding = 0;
-	u32 page_count = packet->cp_partial ? packet->rmsg_pgcnt :
-		packet->page_buf_cnt;
+	u32 page_count = packet->cp_partial ? 1 : packet->page_buf_cnt;
 	u32 remain;
 
 	/* Add padding */
@@ -1055,6 +1054,42 @@ static int netvsc_dma_map(struct hv_device *hv_dev,
 	return 0;
 }
 
+/* Build an "array" of mpb entries describing the data to be transferred
+ * over VMBus. After the desc header fields, each "array" entry is variable
+ * size, and each entry starts after the end of the previous entry. The
+ * "offset" and "len" fields for each entry imply the size of the entry.
+ *
+ * The pfns are in HV_HYP_PAGE_SIZE, because all communication with Hyper-V
+ * uses that granularity, even if the system page size of the guest is larger.
+ * Each entry in the input "pb" array must describe a contiguous range of
+ * guest physical memory so that the pfns are sequential if the range crosses
+ * a page boundary. The offset field must be < HV_HYP_PAGE_SIZE.
+ */
+static inline void netvsc_build_mpb_array(struct hv_page_buffer *pb,
+					  u32 page_buffer_count,
+					  struct vmbus_packet_mpb_array *desc,
+					  u32 *desc_size)
+{
+	struct hv_mpb_array *mpb_entry = &desc->range;
+	int i, j;
+
+	for (i = 0; i < page_buffer_count; i++) {
+		u32 offset = pb[i].offset;
+		u32 len = pb[i].len;
+
+		mpb_entry->offset = offset;
+		mpb_entry->len = len;
+
+		for (j = 0; j < HVPFN_UP(offset + len); j++)
+			mpb_entry->pfn_array[j] = pb[i].pfn + j;
+
+		mpb_entry = (struct hv_mpb_array *)&mpb_entry->pfn_array[j];
+	}
+
+	desc->rangecount = page_buffer_count;
+	*desc_size = (char *)mpb_entry - (char *)desc;
+}
+
 static inline int netvsc_send_pkt(
 	struct hv_device *device,
 	struct hv_netvsc_packet *packet,
@@ -1097,20 +1132,24 @@ static inline int netvsc_send_pkt(
 
 	packet->dma_range = NULL;
 	if (packet->page_buf_cnt) {
+		struct vmbus_channel_packet_page_buffer desc;
+		u32 desc_size;
+
 		if (packet->cp_partial)
-			pb += packet->rmsg_pgcnt;
+			pb++;
 
 		ret = netvsc_dma_map(ndev_ctx->device_ctx, packet, pb);
 		if (ret) {
 			ret = -EAGAIN;
 			goto exit;
 		}
 
-		ret = vmbus_sendpacket_pagebuffer(out_channel,
-						  pb, packet->page_buf_cnt,
-						  &nvmsg, sizeof(nvmsg),
-						  req_id);
+		netvsc_build_mpb_array(pb, packet->page_buf_cnt,
+				       (struct vmbus_packet_mpb_array *)&desc,
+				       &desc_size);
+		ret = vmbus_sendpacket_mpb_desc(out_channel,
+						(struct vmbus_packet_mpb_array *)&desc,
+						desc_size, &nvmsg, sizeof(nvmsg), req_id);
 		if (ret)
 			netvsc_dma_unmap(ndev_ctx->device_ctx, packet);
 	} else {
@@ -1259,7 +1298,7 @@ int netvsc_send(struct net_device *ndev,
 	packet->send_buf_index = section_index;
 
 	if (packet->cp_partial) {
-		packet->page_buf_cnt -= packet->rmsg_pgcnt;
+		packet->page_buf_cnt--;
 		packet->total_data_buflen = msd_len + packet->rmsg_size;
 	} else {
 		packet->page_buf_cnt = 0;

drivers/net/hyperv/netvsc_drv.c

Lines changed: 14 additions & 48 deletions

@@ -326,43 +326,10 @@ static u16 netvsc_select_queue(struct net_device *ndev, struct sk_buff *skb,
 	return txq;
 }
 
-static u32 fill_pg_buf(unsigned long hvpfn, u32 offset, u32 len,
-		       struct hv_page_buffer *pb)
-{
-	int j = 0;
-
-	hvpfn += offset >> HV_HYP_PAGE_SHIFT;
-	offset = offset & ~HV_HYP_PAGE_MASK;
-
-	while (len > 0) {
-		unsigned long bytes;
-
-		bytes = HV_HYP_PAGE_SIZE - offset;
-		if (bytes > len)
-			bytes = len;
-		pb[j].pfn = hvpfn;
-		pb[j].offset = offset;
-		pb[j].len = bytes;
-
-		offset += bytes;
-		len -= bytes;
-
-		if (offset == HV_HYP_PAGE_SIZE && len) {
-			hvpfn++;
-			offset = 0;
-			j++;
-		}
-	}
-
-	return j + 1;
-}
-
 static u32 init_page_array(void *hdr, u32 len, struct sk_buff *skb,
 			   struct hv_netvsc_packet *packet,
 			   struct hv_page_buffer *pb)
 {
-	u32 slots_used = 0;
-	char *data = skb->data;
 	int frags = skb_shinfo(skb)->nr_frags;
 	int i;
 
@@ -371,28 +338,27 @@ static u32 init_page_array(void *hdr, u32 len, struct sk_buff *skb,
 	 * 2. skb linear data
 	 * 3. skb fragment data
 	 */
-	slots_used += fill_pg_buf(virt_to_hvpfn(hdr),
-				  offset_in_hvpage(hdr),
-				  len,
-				  &pb[slots_used]);
 
+	pb[0].offset = offset_in_hvpage(hdr);
+	pb[0].len = len;
+	pb[0].pfn = virt_to_hvpfn(hdr);
 	packet->rmsg_size = len;
-	packet->rmsg_pgcnt = slots_used;
 
-	slots_used += fill_pg_buf(virt_to_hvpfn(data),
-				  offset_in_hvpage(data),
-				  skb_headlen(skb),
-				  &pb[slots_used]);
+	pb[1].offset = offset_in_hvpage(skb->data);
+	pb[1].len = skb_headlen(skb);
+	pb[1].pfn = virt_to_hvpfn(skb->data);
 
 	for (i = 0; i < frags; i++) {
 		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
+		struct hv_page_buffer *cur_pb = &pb[i + 2];
+		u64 pfn = page_to_hvpfn(skb_frag_page(frag));
+		u32 offset = skb_frag_off(frag);
 
-		slots_used += fill_pg_buf(page_to_hvpfn(skb_frag_page(frag)),
-					  skb_frag_off(frag),
-					  skb_frag_size(frag),
-					  &pb[slots_used]);
+		cur_pb->offset = offset_in_hvpage(offset);
+		cur_pb->len = skb_frag_size(frag);
+		cur_pb->pfn = pfn + (offset >> HV_HYP_PAGE_SHIFT);
 	}
-	return slots_used;
+	return frags + 2;
 }
 
 static int count_skb_frag_slots(struct sk_buff *skb)
@@ -483,7 +449,7 @@ static int netvsc_xmit(struct sk_buff *skb, struct net_device *net, bool xdp_tx)
 	struct net_device *vf_netdev;
 	u32 rndis_msg_size;
 	u32 hash;
-	struct hv_page_buffer pb[MAX_PAGE_BUFFER_COUNT];
+	struct hv_page_buffer pb[MAX_DATA_RANGES];
 
 	/* If VF is present and up then redirect packets to it.
 	 * Skip the VF if it is marked down or has no carrier.

drivers/net/hyperv/rndis_filter.c

Lines changed: 5 additions & 19 deletions

@@ -225,8 +225,7 @@ static int rndis_filter_send_request(struct rndis_device *dev,
 				     struct rndis_request *req)
 {
 	struct hv_netvsc_packet *packet;
-	struct hv_page_buffer page_buf[2];
-	struct hv_page_buffer *pb = page_buf;
+	struct hv_page_buffer pb;
 	int ret;
 
 	/* Setup the packet to send it */
@@ -235,27 +234,14 @@ static int rndis_filter_send_request(struct rndis_device *dev,
 	packet->total_data_buflen = req->request_msg.msg_len;
 	packet->page_buf_cnt = 1;
 
-	pb[0].pfn = virt_to_phys(&req->request_msg) >>
-		HV_HYP_PAGE_SHIFT;
-	pb[0].len = req->request_msg.msg_len;
-	pb[0].offset = offset_in_hvpage(&req->request_msg);
-
-	/* Add one page_buf when request_msg crossing page boundary */
-	if (pb[0].offset + pb[0].len > HV_HYP_PAGE_SIZE) {
-		packet->page_buf_cnt++;
-		pb[0].len = HV_HYP_PAGE_SIZE -
-			pb[0].offset;
-		pb[1].pfn = virt_to_phys((void *)&req->request_msg
-			+ pb[0].len) >> HV_HYP_PAGE_SHIFT;
-		pb[1].offset = 0;
-		pb[1].len = req->request_msg.msg_len -
-			pb[0].len;
-	}
+	pb.pfn = virt_to_phys(&req->request_msg) >> HV_HYP_PAGE_SHIFT;
+	pb.len = req->request_msg.msg_len;
+	pb.offset = offset_in_hvpage(&req->request_msg);
 
 	trace_rndis_send(dev->ndev, 0, &req->request_msg);
 
 	rcu_read_lock_bh();
-	ret = netvsc_send(dev->ndev, packet, NULL, pb, NULL, false);
+	ret = netvsc_send(dev->ndev, packet, NULL, &pb, NULL, false);
 	rcu_read_unlock_bh();
 
 	return ret;

drivers/scsi/storvsc_drv.c

Lines changed: 1 addition & 0 deletions

@@ -1819,6 +1819,7 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
 		return SCSI_MLQUEUE_DEVICE_BUSY;
 	}
 
+	payload->rangecount = 1;
 	payload->range.len = length;
 	payload->range.offset = offset_in_hvpg;
18241825

include/linux/hyperv.h

Lines changed: 0 additions & 7 deletions

@@ -1161,13 +1161,6 @@ extern int vmbus_sendpacket(struct vmbus_channel *channel,
 			    enum vmbus_packet_type type,
 			    u32 flags);
 
-extern int vmbus_sendpacket_pagebuffer(struct vmbus_channel *channel,
-				       struct hv_page_buffer pagebuffers[],
-				       u32 pagecount,
-				       void *buffer,
-				       u32 bufferlen,
-				       u64 requestid);
-
 extern int vmbus_sendpacket_mpb_desc(struct vmbus_channel *channel,
 				     struct vmbus_packet_mpb_array *mpb,
 				     u32 desc_size,
