.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)

===============
GPU SVM Section
===============

Agreed upon design principles
=============================

* migrate_to_ram path
  * Rely only on core MM concepts (migration PTEs, page references, and
    page locking).
  * No driver-specific locks other than locks for hardware interaction in
    this path. Such locks are not required, and it is generally a bad idea
    to invent driver-defined locks to seal core MM races.
  * An example of a driver-specific lock causing issues occurred before
    fixing do_swap_page to lock the faulting page. A driver-exclusive lock
    in migrate_to_ram produced a stable livelock if enough threads read
    the faulting page.
  * Partial migration is supported (i.e., a subset of pages attempting to
    migrate can actually migrate, with only the faulting page guaranteed
    to migrate).
  * Driver handles mixed migrations via retry loops rather than locking
    (see the migrate_to_ram sketch following this list).
* Eviction
  * Eviction is defined as migrating data from the GPU back to the
    CPU without a virtual address, in order to free up GPU memory.
  * Eviction looks only at physical memory data structures and locks, as
    opposed to virtual memory data structures and locks.
  * It does not look at mm/vma structs or rely on those being locked
    (see the physical-only eviction sketch following this list).
  * The rationale for the above two points is that CPU virtual addresses
    can change at any moment, while the physical pages remain stable.
  * GPU page table invalidation, which requires a GPU virtual address, is
    handled via the notifier that has access to the GPU virtual address.
* GPU fault side
  * The mmap_read lock is only used around core MM functions which require
    it, and the design should strive to take the mmap_read lock only in the
    GPU SVM layer.
  * A big retry loop handles all races with the MMU notifier under the GPU
    pagetable locks / MMU notifier range lock / whatever we end up calling
    those (see the retry-loop sketch following this list).
  * Races (especially against concurrent eviction or migrate_to_ram)
    should not be handled on the fault side by trying to hold locks;
    rather, they should be handled using retry loops. One possible
    exception is holding a BO's dma-resv lock during the initial migration
    to VRAM, as this is a well-defined lock that can be taken underneath
    the mmap_read lock.
  * One possible issue with the above approach is if a driver has a strict
    migration policy requiring GPU access to occur in GPU memory.
    Concurrent CPU access could cause a livelock due to endless retries.
    While no current user (Xe) of GPU SVM has such a policy, it is likely
    to be added in the future. Ideally, this should be resolved on the
    core-MM side rather than through a driver-side lock.
* Physical memory to virtual backpointer
  * This does not work, as no pointers from physical memory to virtual
    memory should exist. mremap() is an example of the core MM updating
    the virtual address without notifying the driver of the address
    change; the driver only receives the invalidation notifier.
  * The physical memory backpointer (page->zone_device_data) should remain
    stable from allocation to page free. Safely updating this against a
    concurrent user would be very difficult unless the page is free.
* GPU pagetable locking
  * The notifier lock only protects the range tree, the pages valid state
    for a range (rather than a seqno, due to wider notifiers), pagetable
    entries, and MMU notifier seqno tracking; it is not a global lock to
    protect against races.
  * All races are handled with the big retry loop mentioned above.
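
Below is a minimal sketch of a migrate_to_ram handler written purely in
terms of core MM interfaces, illustrating the principles above. It is not
drm_gpusvm code: migrate_vma_setup(), migrate_vma_pages(),
migrate_vma_finalize(), migrate_pfn() and struct vm_fault are real core MM
interfaces, while my_owner and my_copy_dev_to_sys() are hypothetical driver
placeholders. Only the faulting page is migrated here; a handler covering a
larger range would look the same, with partial migration tolerated and mixed
results resolved by the retry loops described above.

.. code-block:: c

   #include <linux/migrate.h>
   #include <linux/mm.h>

   static vm_fault_t my_migrate_to_ram(struct vm_fault *vmf)
   {
           unsigned long src = 0, dst = 0;
           struct migrate_vma args = {
                   .vma         = vmf->vma,
                   .start       = vmf->address,
                   .end         = vmf->address + PAGE_SIZE,
                   .src         = &src,
                   .dst         = &dst,
                   .pgmap_owner = &my_owner,    /* hypothetical owner */
                   .flags       = MIGRATE_VMA_SELECT_DEVICE_PRIVATE,
                   .fault_page  = vmf->page,
           };
           struct page *dpage, *spage;
           vm_fault_t ret = 0;

           /* Core MM collects the page behind a migration PTE. */
           if (migrate_vma_setup(&args) < 0)
                   return VM_FAULT_SIGBUS;

           /* Raced with another migration; core MM retries the fault. */
           if (!args.cpages)
                   return 0;

           dpage = alloc_page_vma(GFP_HIGHUSER, vmf->vma, vmf->address);
           if (!dpage) {
                   ret = VM_FAULT_OOM;
                   goto done;
           }
           lock_page(dpage);    /* page lock, a core MM concept */

           spage = migrate_pfn_to_page(src);
           my_copy_dev_to_sys(spage, dpage);    /* hypothetical DMA copy */
           dst = migrate_pfn(page_to_pfn(dpage));

           migrate_vma_pages(&args);
   done:
           /* Removes migration PTEs and drops the page references. */
           migrate_vma_finalize(&args);
           return ret;
   }

Note that no driver-defined lock is taken anywhere in this path; ordering
falls out of the migration PTEs, the page lock, and the page references.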
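
A similarly hedged sketch of eviction under the physical-only rule.
migrate_device_range(), migrate_device_pages() and migrate_device_finalize()
are real core MM interfaces that operate purely on device PFNs;
my_copy_dev_to_sys() is again a hypothetical copy helper. No mm or vma
structs, and no CPU virtual addresses, appear anywhere in the path; GPU
pagetable invalidation is driven by the MMU notifier, as noted above.

.. code-block:: c

   #include <linux/migrate.h>
   #include <linux/slab.h>

   static void my_evict_chunk(unsigned long first_dev_pfn,
                              unsigned long npages)
   {
           unsigned long *src, *dst;
           unsigned long i;

           src = kvcalloc(npages, sizeof(*src), GFP_KERNEL);
           dst = kvcalloc(npages, sizeof(*dst), GFP_KERNEL);
           if (!src || !dst)
                   goto out;

           /* Lock and isolate device pages by physical PFN only. */
           if (migrate_device_range(src, first_dev_pfn, npages))
                   goto out;

           for (i = 0; i < npages; i++) {
                   struct page *spage = migrate_pfn_to_page(src[i]);
                   struct page *dpage;

                   if (!spage || !(src[i] & MIGRATE_PFN_MIGRATE))
                           continue;

                   dpage = alloc_page(GFP_HIGHUSER);
                   if (!dpage)
                           continue;    /* partial eviction; retry later */

                   lock_page(dpage);
                   my_copy_dev_to_sys(spage, dpage);    /* hypothetical */
                   dst[i] = migrate_pfn(page_to_pfn(dpage));
           }

           /* GPU pagetable invalidation happens via the MMU notifier. */
           migrate_device_pages(src, dst, npages);
           migrate_device_finalize(src, dst, npages);
   out:
           kvfree(src);
           kvfree(dst);
   }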
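
Finally, a minimal sketch of the fault-side big retry loop, assuming
hypothetical my_get_pages() and my_bind_pages() driver hooks and a
gpu_pt_lock that doubles as the notifier lock. mmu_interval_read_begin(),
mmu_interval_read_retry() and the mmap_read lock are real core MM
interfaces. drm_gpusvm tracks a pages-valid state per range rather than a
raw seqno, but the shape of the loop is the same.

.. code-block:: c

   #include <linux/mmu_notifier.h>
   #include <linux/mutex.h>

   struct my_vm {
           struct mmu_interval_notifier notifier;
           struct mutex gpu_pt_lock;    /* also the notifier lock */
   };

   static int my_handle_gpu_fault(struct my_vm *vm, struct mm_struct *mm,
                                  unsigned long start, unsigned long end)
   {
           unsigned long seq;
           int err;

   again:
           seq = mmu_interval_read_begin(&vm->notifier);

           /* mmap_read lock only around core MM calls that need it. */
           mmap_read_lock(mm);
           err = my_get_pages(vm, start, end);    /* hypothetical */
           mmap_read_unlock(mm);
           if (err == -EBUSY)
                   goto again;
           if (err)
                   return err;

           mutex_lock(&vm->gpu_pt_lock);
           if (mmu_interval_read_retry(&vm->notifier, seq)) {
                   /* Raced with eviction or migrate_to_ram: retry. */
                   mutex_unlock(&vm->gpu_pt_lock);
                   goto again;
           }
           err = my_bind_pages(vm, start, end);    /* program GPU PTEs */
           mutex_unlock(&vm->gpu_pt_lock);

           return err;
   }

No lock is held across the whole fault; any invalidation that slips in
between getting the pages and binding them simply bumps the sequence count
and sends the loop around again.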

Overview of baseline design
===========================

The baseline design is as simple as possible, providing a working baseline
that can be built upon.

.. kernel-doc:: drivers/gpu/drm/xe/drm_gpusvm.c
   :doc: Overview
   :doc: Locking
   :doc: Migration
   :doc: Partial Unmapping of Ranges
   :doc: Examples

Possible future design features
===============================

* Concurrent GPU faults
  * CPU faults are concurrent, so it makes sense to support concurrent GPU
    faults.
  * Should be possible with fine-grained locking in the driver GPU
    fault handler.
  * No GPU SVM changes expected to be required.
* Ranges with mixed system and device pages
  * Can be added to drm_gpusvm_get_pages fairly easily, if required.
* Multi-GPU support
  * Work in progress; patches are expected after GPU SVM initially lands.
  * Ideally this can be done with little to no changes to GPU SVM.
* Drop ranges in favor of radix tree
  * May be desirable for faster notifiers.
* Compound device pages
  * Nvidia, AMD, and Intel have all agreed that expensive core MM functions
    in the migrate device layer are a performance bottleneck; compound
    device pages should help increase performance by reducing the number
    of these expensive calls.
* Higher order dma mapping for migration
  * 4k dma mapping adversely affects migration performance on Intel
    hardware; higher order (2M) dma mapping should help here.
* Build common userptr implementation on top of GPU SVM
* Driver side madvise implementation and migration policies
* Pull in pending dma-mapping API changes from Leon / Nvidia when these land