
Commit 6098481

jpemartins authored and jgunthorpe committed
iommufd: Add a flag to skip clearing of IOPTE dirty
VFIO has an operation where it unmaps an IOVA while returning a bitmap with the dirty data. In reality the operation doesn't actually query the IO page tables to learn whether a PTE was dirty; instead it marks as dirty everything that was mapped, and does so in a single syscall.

In IOMMUFD the equivalent is done in two operations: a query with GET_DIRTY_IOVA followed by UNMAP_IOVA. However, this would incur two TLB flushes, because IOMMU implementations require invalidating their IOTLB after clearing dirty bits, and the UNMAP needs another invalidation. To allow dirty bits to be queried faster, add a flag (IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR) that requests the dirty bits only be read from the PTEs, not cleared, under the expectation that the next operation is the unmap.

An alternative is to unmap and just perpetually mark everything as dirty, which is the same behaviour as today. So equivalent functionality can be provided with unmap alone, and when real dirty information is required, skipping the clear amortizes the cost of querying.

There's still a race against DMA: in theory the unmap of the IOVA (when the guest invalidates the IOTLB via the emulated IOMMU) could race against the VF performing DMA on the same IOVA. As discussed in [0], we accept resolving this race by throwing away the DMA. It doesn't matter whether the DMA hit physical DRAM or not; the VM can't tell whether we threw it away because the DMA was blocked or because we failed to copy the DRAM.

[0] https://lore.kernel.org/linux-iommu/20220502185239.GR8364@nvidia.com/

Link: https://lore.kernel.org/r/20231024135109.73787-10-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
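The two-flush argument above can be illustrated with a small, self-contained C model. Everything in it is hypothetical: `struct toy_iopt`, `toy_get_dirty()`, `toy_unmap()`, `toy_query_then_unmap()` and `TOY_NO_CLEAR` are illustrative stand-ins (the last one for IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR), and a counter replaces real IOTLB invalidations. It is a sketch of the cost argument, not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NPAGES 8
#define TOY_NO_CLEAR 1u /* hypothetical stand-in for IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR */

/* Hypothetical toy IO page table: one mapped/dirty pair per page. */
struct toy_iopt {
	bool mapped[TOY_NPAGES];
	bool dirty[TOY_NPAGES];
	unsigned int iotlb_flushes; /* counts modelled IOTLB invalidations */
};

/* Report dirty pages into @bitmap; clear them and flush unless NO_CLEAR is set. */
static void toy_get_dirty(struct toy_iopt *t, bool *bitmap, unsigned int flags)
{
	for (int i = 0; i < TOY_NPAGES; i++) {
		bitmap[i] = t->mapped[i] && t->dirty[i];
		if (!(flags & TOY_NO_CLEAR))
			t->dirty[i] = false;
	}
	/* Clearing dirty bits requires an IOTLB invalidation afterwards. */
	if (!(flags & TOY_NO_CLEAR))
		t->iotlb_flushes++;
}

/* Unmapping always needs its own IOTLB invalidation. */
static void toy_unmap(struct toy_iopt *t)
{
	memset(t->mapped, 0, sizeof(t->mapped));
	memset(t->dirty, 0, sizeof(t->dirty));
	t->iotlb_flushes++;
}

/* Query dirty state, then unmap everything; return the number of flushes. */
static unsigned int toy_query_then_unmap(unsigned int flags)
{
	struct toy_iopt t = { .iotlb_flushes = 0 };
	bool bitmap[TOY_NPAGES];

	for (int i = 0; i < TOY_NPAGES; i++)
		t.mapped[i] = true;
	t.dirty[2] = t.dirty[5] = true; /* pretend the device wrote pages 2 and 5 */

	toy_get_dirty(&t, bitmap, flags);
	/* Both modes report the same dirty pages; only the clearing differs. */
	assert(bitmap[2] && bitmap[5] && !bitmap[0]);
	toy_unmap(&t);
	return t.iotlb_flushes;
}
```

In this model `toy_query_then_unmap(0)` returns 2 (one invalidation for clearing, one for the unmap), while `toy_query_then_unmap(TOY_NO_CLEAR)` returns 1, with an identical dirty report in both cases.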
1 parent 7623683 commit 6098481


3 files changed: +23 additions, -4 deletions


drivers/iommu/iommufd/hw_pagetable.c

Lines changed: 2 additions & 1 deletion
@@ -228,7 +228,8 @@ int iommufd_hwpt_get_dirty_bitmap(struct iommufd_ucmd *ucmd)
 	struct iommufd_ioas *ioas;
 	int rc = -EOPNOTSUPP;
 
-	if ((cmd->flags || cmd->__reserved))
+	if ((cmd->flags & ~(IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR)) ||
+	    cmd->__reserved)
 		return -EOPNOTSUPP;
 
 	hwpt = iommufd_get_hwpt(ucmd, cmd->hwpt_id);
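The new check accepts IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR and still rejects every other flag bit and a non-zero reserved field. A minimal sketch of the same masking pattern, assuming a hypothetical `struct toy_get_dirty_cmd` and `toy_check_get_dirty_cmd()` that only mimic the two relevant fields; only the flag value (1) comes from the uapi header in this commit.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Value taken from the uapi enum added by this commit. */
#define IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR 1u

/* Hypothetical stand-in for the flags/reserved fields of the ioctl argument. */
struct toy_get_dirty_cmd {
	uint32_t flags;
	uint32_t reserved;
};

/*
 * Accept only the flag bits that are understood; any other bit, or a
 * non-zero reserved field, is rejected with -EOPNOTSUPP.
 */
static int toy_check_get_dirty_cmd(const struct toy_get_dirty_cmd *cmd)
{
	assert(cmd);
	if ((cmd->flags & ~(IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR)) ||
	    cmd->reserved)
		return -EOPNOTSUPP;
	return 0;
}
```

Rejecting unknown bits instead of silently ignoring them presumably keeps those bits available for future flags without old kernels misinterpreting them.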

drivers/iommu/iommufd/io_pagetable.c

Lines changed: 7 additions & 2 deletions
@@ -414,6 +414,7 @@ int iopt_map_user_pages(struct iommufd_ctx *ictx, struct io_pagetable *iopt,
 }
 
 struct iova_bitmap_fn_arg {
+	unsigned long flags;
 	struct io_pagetable *iopt;
 	struct iommu_domain *domain;
 	struct iommu_dirty_bitmap *dirty;
@@ -430,13 +431,14 @@ static int __iommu_read_and_clear_dirty(struct iova_bitmap *bitmap,
 	struct iommu_dirty_bitmap *dirty = arg->dirty;
 	const struct iommu_dirty_ops *ops = domain->dirty_ops;
 	unsigned long last_iova = iova + length - 1;
+	unsigned long flags = arg->flags;
 	int ret;
 
 	iopt_for_each_contig_area(&iter, area, arg->iopt, iova, last_iova) {
 		unsigned long last = min(last_iova, iopt_area_last_iova(area));
 
 		ret = ops->read_and_clear_dirty(domain, iter.cur_iova,
-						last - iter.cur_iova + 1, 0,
+						last - iter.cur_iova + 1, flags,
 						dirty);
 		if (ret)
 			return ret;
@@ -470,12 +472,15 @@ iommu_read_and_clear_dirty(struct iommu_domain *domain,
 
 	iommu_dirty_bitmap_init(&dirty, iter, &gather);
 
+	arg.flags = flags;
 	arg.iopt = iopt;
 	arg.domain = domain;
 	arg.dirty = &dirty;
 	iova_bitmap_for_each(iter, &arg, __iommu_read_and_clear_dirty);
 
-	iommu_iotlb_sync(domain, &gather);
+	if (!(flags & IOMMU_DIRTY_NO_CLEAR))
+		iommu_iotlb_sync(domain, &gather);
+
 	iova_bitmap_free(iter);
 
 	return ret;
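The io_pagetable.c change adds `flags` to struct iova_bitmap_fn_arg because the per-range callback only receives an opaque pointer from iova_bitmap_for_each(), so any extra per-walk state has to travel inside that argument. The sketch below models this plumbing in simplified form; `toy_for_each_chunk()`, `struct toy_fn_arg`, `toy_read_dirty_chunk()` and `TOY_DIRTY_NO_CLEAR` are hypothetical, and the real callback also receives a struct iova_bitmap pointer that is omitted here.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_DIRTY_NO_CLEAR 1ul /* hypothetical stand-in for IOMMU_DIRTY_NO_CLEAR */

/* Fixed callback type: the walker passes only a range and an opaque pointer. */
typedef int (*toy_range_fn)(unsigned long iova, size_t length, void *opaque);

/* Hypothetical walker standing in for iova_bitmap_for_each(). */
static int toy_for_each_chunk(unsigned long iova, size_t length, size_t chunk,
			      void *opaque, toy_range_fn fn)
{
	assert(chunk > 0); /* a zero chunk size would never make progress */
	while (length) {
		size_t n = length < chunk ? length : chunk;
		int ret = fn(iova, n, opaque);

		if (ret)
			return ret;
		iova += n;
		length -= n;
	}
	return 0;
}

/* Per-walk state; the flags ride along here to reach the callback. */
struct toy_fn_arg {
	unsigned long flags;
	unsigned int chunks_seen;
	unsigned int chunks_no_clear;
};

static int toy_read_dirty_chunk(unsigned long iova, size_t length, void *opaque)
{
	struct toy_fn_arg *arg = opaque;

	(void)iova;
	(void)length;
	arg->chunks_seen++;
	/* Here a driver would read PTE dirty bits, skipping the clear if asked. */
	if (arg->flags & TOY_DIRTY_NO_CLEAR)
		arg->chunks_no_clear++;
	return 0;
}
```

Walking 10 pages in chunks of 4 pages invokes the callback three times, and every invocation sees the caller's flags without changing the callback signature.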

include/uapi/linux/iommufd.h

Lines changed: 14 additions & 1 deletion
@@ -500,11 +500,24 @@ struct iommu_hwpt_set_dirty_tracking {
 #define IOMMU_HWPT_SET_DIRTY_TRACKING _IO(IOMMUFD_TYPE, \
 					  IOMMUFD_CMD_HWPT_SET_DIRTY_TRACKING)
 
+/**
+ * enum iommufd_hwpt_get_dirty_bitmap_flags - Flags for getting dirty bits
+ * @IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR: Just read the PTEs without clearing
+ *                                        any dirty bits metadata. This flag
+ *                                        can be passed in the expectation
+ *                                        where the next operation is an unmap
+ *                                        of the same IOVA range.
+ *
+ */
+enum iommufd_hwpt_get_dirty_bitmap_flags {
+	IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR = 1,
+};
+
 /**
  * struct iommu_hwpt_get_dirty_bitmap - ioctl(IOMMU_HWPT_GET_DIRTY_BITMAP)
  * @size: sizeof(struct iommu_hwpt_get_dirty_bitmap)
  * @hwpt_id: HW pagetable ID that represents the IOMMU domain
- * @flags: Must be zero
+ * @flags: Combination of enum iommufd_hwpt_get_dirty_bitmap_flags
  * @__reserved: Must be 0
  * @iova: base IOVA of the bitmap first bit
  * @length: IOVA range size
