Commit a764cd3

Documentation: teaching: lectures: update virtualization lecture
Signed-off-by: Octavian Purdila <tavi@cs.pub.ro>
1 parent b6a6d1a commit a764cd3

2 files changed

+396
-46
lines changed

Documentation/teaching/lectures/virt.rst

Lines changed: 204 additions & 46 deletions
@@ -11,7 +11,7 @@ Virtualization
1111
Lecture objectives:
1212
===================
1313

14-
.. slide:: Network Management
14+
.. slide:: Virtualization
1515
:inline-contents: True
1616
:level: 2
1717

@@ -128,6 +128,8 @@ MMU virtualization
128128
* "Fake" VM physical addresses are translated by the host to actual
129129
physical addresses
130130

131+
* Guest virtual address -> Guest physical address -> Host Physical Address
132+
131133
* The guest page tables are not directly used by the host hardware
132134

133135
* VM page tables are verified then translated into a new set of page
@@ -192,13 +194,17 @@ Lazy shadow sync
192194
* To avoid repeated traps, checks and transformations map guest
193195
page table entries with write access
194196

195-
* Update the shadow page table when the TLB is flushed
197+
* Update the shadow page table when
196198

199+
* The TLB is flushed
197200

198-
I/O virtualization
199-
==================
201+
* In the host page fault handler
200202

201-
.. slide:: I/O virtualization
203+
204+
I/O emulation
205+
=============
206+
207+
.. slide:: I/O emulation
202208
:inline-contents: True
203209
:level: 2
204210

@@ -232,6 +238,14 @@ I/O virtualization
232238
+-----------------+
233239

234240

241+
.. slide:: Example: qemu SiFive UART emulation
242+
:inline-contents: True
243+
:level: 2
244+
245+
.. literalinclude:: ../res/sifive_uart.c
246+
:language: c
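
The trap-and-emulate pattern behind an emulated UART can be sketched in a few lines. This is a toy model, not the contents of ``sifive_uart.c``: the register offsets, the ``UART_RX_EMPTY`` flag and all ``toy_uart_*`` names are assumptions made for illustration. The VMM would call these handlers whenever a guest load or store to the device's MMIO range traps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register layout, chosen only for this sketch. */
enum { UART_TXDATA = 0x00, UART_RXDATA = 0x04 };
#define UART_RX_EMPTY UINT32_C(0x80000000)

struct toy_uart {
    char   tx_log[64];  /* bytes the guest has "transmitted" */
    size_t tx_len;
    int    rx_byte;     /* pending input byte, or -1 when empty */
};

/* Called by the VMM when a guest store to the UART range traps. */
void toy_uart_write(struct toy_uart *u, uint64_t offset, uint32_t value)
{
    assert(u != NULL);
    if (offset == UART_TXDATA && u->tx_len < sizeof(u->tx_log))
        u->tx_log[u->tx_len++] = (char)(value & 0xFF);
    /* Stores to any other offset are ignored in this sketch. */
}

/* Called by the VMM when a guest load from the UART range traps. */
uint32_t toy_uart_read(struct toy_uart *u, uint64_t offset)
{
    assert(u != NULL);
    if (offset == UART_RXDATA) {
        if (u->rx_byte < 0)
            return UART_RX_EMPTY;       /* nothing to read */
        uint32_t v = (uint32_t)u->rx_byte;
        u->rx_byte = -1;                /* reading consumes the byte */
        return v;
    }
    return 0;                           /* unimplemented registers read as 0 */
}
```

The guest driver never notices the difference: it performs ordinary loads and stores, and the device state lives entirely in the VMM.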
247+
248+
235249
Paravirtualization
236250
==================
237251

@@ -299,6 +313,24 @@ Virtual Machine Control Structure
299313
* VMCS can not be accessed directly but certain information can be
300314
accessed with special instructions
301315

316+
VM entry & exit
317+
---------------
318+
319+
.. slide:: VM entry & exit
320+
:inline-contents: True
321+
:level: 2
322+
323+
* VM entry - new instructions that switch the CPU to non-root
324+
mode and load the VM state from a VMCS; host state is saved in
325+
VMCS
326+
327+
* Allows injecting interrupts and exceptions in the guest
328+
329+
* VM exit will be automatically triggered based on the VMCS
330+
configuration
331+
332+
* When a VM exit occurs, host state is loaded from the VMCS and guest state
333+
is saved in VMCS
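
The state swap described in the bullets above can be modelled as follows. This is a simplified sketch that mirrors the slide's description, not the actual VMX hardware behaviour: ``toy_vmcs``, ``toy_cpu`` and the three-register ``cpu_state`` are hypothetical types, and a real VMCS holds far more state plus the execution controls that decide when an exit happens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A tiny stand-in for the architectural state kept in a VMCS. */
struct cpu_state { uint64_t rip, rsp, cr3; };

struct toy_vmcs { struct cpu_state guest, host; };

struct toy_cpu {
    struct cpu_state regs;
    bool non_root;       /* true while the guest is running */
};

/* VM entry: save host state in the VMCS, load guest state from it. */
void vm_entry(struct toy_cpu *cpu, struct toy_vmcs *vmcs)
{
    assert(cpu != NULL && vmcs != NULL);
    vmcs->host = cpu->regs;
    cpu->regs = vmcs->guest;
    cpu->non_root = true;
}

/* VM exit: save guest state in the VMCS, reload host state from it. */
void vm_exit(struct toy_cpu *cpu, struct toy_vmcs *vmcs)
{
    assert(cpu != NULL && vmcs != NULL);
    vmcs->guest = cpu->regs;
    cpu->regs = vmcs->host;
    cpu->non_root = false;
}
```

Because the guest state is written back on every exit, the next entry resumes the guest exactly where it stopped.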
302334

303335
VM execution control fields
304336
---------------------------
@@ -325,25 +357,6 @@ VM execution control fields
325357
generate a VM exit
326358

327359

328-
VM entry & exit
329-
---------------
330-
331-
.. slide:: VM entry & exit
332-
:inline-contents: True
333-
:level: 2
334-
335-
* VM entry - new instructions that switches the CPU in non-root
336-
mode and loads the VM state from a VMCS; host state is saved in
337-
VMCS
338-
339-
* Allows injecting interrupts and exceptions in the guest
340-
341-
* VM exit will be automatically triggered based on the VMCS
342-
configuration
343-
344-
* When VM exit occurs host state is loaded from VMCS, guest state
345-
is saved in VMCS
346-
347360
Extended Page Tables
348361
====================
349362

@@ -394,33 +407,158 @@ VPID
394407
* When searching the TLB just the current VPID is used
395408

396409

397-
Intel VT-d
398-
==========
410+
I/O virtualization
411+
==================
412+
413+
* Direct access to hardware from a VM - in a controlled fashion
414+
415+
* Map the host MMIO directly into the guest
399416

400-
.. slide:: Intel VT-d
417+
* Forward interrupts
418+
419+
.. slide:: I/O virtualization
420+
:inline-contents: True
421+
:level: 2
422+
423+
.. ditaa::
424+
425+
+---------------------+ +---------------------+
426+
| Guest OS | | Guest OS |
427+
| +---------------+ | | +---------------+ |
428+
| | Guest Driver | | | | Guest Driver | |
429+
| +---------------+ | | +---------------+ |
430+
| | ^ | | | ^ |
431+
| | | | | | | |
432+
+----+-----------+----+ +----+-----------+----+
433+
| trapped | | mapped |
434+
| access | | access |
435+
+---+-----------+----+ +---+-----------+-----+ But how do we deal with DMA?
436+
| | VMM | | | | VMM | |
437+
| v | | | | | |
438+
| +----------------+ | | | +---------+ |
439+
| | Virtual Device | | | | | IRQ | |
440+
| +----------------+ | | | | Mapping | |
441+
| | ^ | | | +---------+ |
442+
| | | | | | | |
443+
+--+------------+----+ +---+-----------+-----+
444+
| | | |
445+
v | v |
446+
+-----------------+ +-----------------+
447+
| Physical Device | | Physical Device |
448+
+-----------------+ +-----------------+
449+
450+
Instead of trapping MMIO as with emulated devices we can allow the
451+
guest to access the MMIO directly by mapping through its page tables.
452+
453+
Interrupts from the device are handled by the host kernel and a signal
454+
is sent to the VMM, which injects the interrupt into the guest just as
455+
for the emulated devices.
456+
457+
458+
.. slide:: I/O MMU
459+
:inline-contents: True
460+
:level: 2
461+
462+
VT-d protects and translates VM physical addresses using an I/O
463+
MMU (DMA remapping)
464+
465+
.. ditaa::
466+
467+
+------+ +------+
468+
| | | |
469+
| CPU | | DMA |
470+
| | | |
471+
+------+ +------+
472+
|
473+
|
474+
v
475+
+-----+ +-----+
476+
| CR3 | | EPT |
477+
+-----+ +-----+
478+
| +------------------+ | +----------------+
479+
| | | | | |
480+
+--------> | Guest Page Table | +-------> | EPT Page Table | --------------->
481+
| | | |
482+
------------> +------------------+ ------------> +----------------+
483+
484+
Guest Virtual Guest Physical Host Physical
485+
Address Address Address
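
The two paths in the diagram can be sketched with flat, single-level tables. This is a deliberately simplified model: real page tables and EPT are multi-level, the hardware also translates the guest page table accesses themselves, and the second-stage table is called "EPT" here only to follow the diagram's labels. ``flat_table``, ``translate_gva`` and ``translate_dma`` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_MASK  ((UINT64_C(1) << PAGE_SHIFT) - 1)
#define NPAGES     16
#define NOT_MAPPED UINT64_MAX

/* One frame number per page; NOT_MAPPED marks a hole. */
struct flat_table { uint64_t frame[NPAGES]; };

static int lookup(const struct flat_table *t, uint64_t addr, uint64_t *out)
{
    uint64_t page = addr >> PAGE_SHIFT;

    if (page >= NPAGES || t->frame[page] == NOT_MAPPED)
        return -1;
    *out = (t->frame[page] << PAGE_SHIFT) | (addr & PAGE_MASK);
    return 0;
}

/* CPU path: guest virtual -> guest physical -> host physical.
 * Returns 0 on success, -1 for a guest page fault, -2 for an EPT violation. */
int translate_gva(const struct flat_table *guest_pt,
                  const struct flat_table *ept,
                  uint64_t gva, uint64_t *hpa)
{
    uint64_t gpa;

    assert(guest_pt != NULL && ept != NULL && hpa != NULL);
    if (lookup(guest_pt, gva, &gpa) < 0)
        return -1;
    if (lookup(ept, gpa, hpa) < 0)
        return -2;
    return 0;
}

/* DMA path: the device issues guest physical addresses, so only the
 * second stage applies. Returns 0 on success, -2 if the page is unmapped. */
int translate_dma(const struct flat_table *ept, uint64_t gpa, uint64_t *hpa)
{
    assert(ept != NULL && hpa != NULL);
    return lookup(ept, gpa, hpa) < 0 ? -2 : 0;
}
```

A DMA request to a guest physical page that the second-stage table does not map fails instead of reaching host memory, which is the protection half of "protects and translates".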
486+
487+
488+
.. slide:: Interrupt posting
401489
:inline-contents: True
402490
:level: 2
403491

404-
* Direct access to hardware from a VM - in a controlled was
492+
* Message Signaled Interrupts (MSI) = DMA writes to the host
493+
address range of the IRQ controller (e.g. 0xFEExxxxx)
405494

406-
* The physical device must support multiplexing (e.g. SR-IOV)
495+
* Low bits of the address and the data indicate which interrupt
496+
vector to deliver to which CPU
407497

408-
* I/O assignments
498+
* Interrupt remapping table points to the virtual CPU (VMCS) that
499+
should receive the interrupt
409500

410-
* IRQ routing
501+
* I/O MMU will trap the IRQ controller write and look it up in the
502+
interrupt remapping table
411503

412-
* VT-d protects and translates VM physical addresses using an I/O
413-
MMU (DMA remaping)
504+
* if that virtual CPU is currently running it will take the
505+
interrupt directly
414506

507+
* otherwise a bit is set in a table (Posted Interrupt Descriptor
508+
table) and the interrupt will be injected the next time that vCPU is
509+
run
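
The flow above can be sketched in two parts: decoding an MSI write, then either delivering or posting the interrupt. The decoding assumes the x86 compatibility (non-remapped) MSI format, where address bits 19:12 carry the destination ID and the low byte of the data carries the vector. The ``toy_vcpu`` structure and its functions are hypothetical and follow the slide's simplified description rather than the exact hardware protocol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct msi_target { uint8_t dest_id; uint8_t vector; };

/* Decode an MSI write. Returns false if the address is outside the
 * 0xFEExxxxx interrupt range. */
bool msi_decode(uint32_t addr, uint32_t data, struct msi_target *out)
{
    assert(out != NULL);
    if ((addr >> 20) != 0xFEE)
        return false;
    out->dest_id = (addr >> 12) & 0xFF;  /* which CPU */
    out->vector  = data & 0xFF;          /* which interrupt vector */
    return true;
}

struct toy_vcpu {
    bool     running;
    uint64_t pir[4];          /* one pending bit per vector (256 total) */
    int      last_delivered;  /* -1 until something is delivered */
};

/* Deliver directly if the vCPU is running, otherwise record it as pending. */
void post_interrupt(struct toy_vcpu *v, uint8_t vector)
{
    assert(v != NULL);
    if (v->running) {
        v->last_delivered = vector;
        return;
    }
    v->pir[vector / 64] |= UINT64_C(1) << (vector % 64);
}

/* On the next run, inject everything that was posted meanwhile.
 * Returns the number of interrupts injected. */
int vcpu_enter(struct toy_vcpu *v)
{
    int injected = 0;

    assert(v != NULL);
    v->running = true;
    for (int vec = 0; vec < 256; vec++) {
        uint64_t bit = UINT64_C(1) << (vec % 64);
        if (v->pir[vec / 64] & bit) {
            v->pir[vec / 64] &= ~bit;
            v->last_delivered = vec;
            injected++;
        }
    }
    return injected;
}
```

The pending bitmap is what lets the device interrupt a descheduled vCPU without involving the VMM at the time of the write.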
415510

416-
DMA remapping
417-
-------------
418511

419-
.. slide:: DMA remapping
512+
.. slide:: I/O virtualization
420513
:inline-contents: True
421514
:level: 2
422515

423-
.. image:: ../res/dma-remapping.png
516+
.. ditaa::
517+
518+
+---------------------+ +---------------------+ +---------------------+
519+
| Guest OS | | Guest OS | | Guest OS |
520+
| +---------------+ | | +---------------+ | | +---------------+ |
521+
| | Guest Driver | | | | Guest Driver | | | | Guest Driver | |
522+
| +---------------+ | | +---------------+ | | +---------------+ |
523+
| | ^ | | | ^ | | | ^ |
524+
| | | | | | | | | | | |
525+
+----+-----------+----+ +----+-----------+----+ +----+-----------+----+
526+
| trapped | | mapped | | mapped | interrupt
527+
| access | | access | | access | posting
528+
+---+-----------+----+ +---+-----------+-----+ +---+-----------+-----+
529+
| | VMM | | | | VMM | | | | VMM | |
530+
| v | | | | | | | | | |
531+
| +----------------+ | | | +---------+ | | | | |
532+
| | Virtual Device | | | | | IRQ | | | | | |
533+
| +----------------+ | | | | Mapping | | | | | |
534+
| | ^ | | | +---------+ | | | | |
535+
| | | | | | | | | | | |
536+
+--+------------+----+ +---+-----------+-----+ +---+-----------+-----+
537+
| | | | | |
538+
v | v | v |
539+
+-----------------+ +-----------------+ +-----------------+
540+
| Physical Device | | Physical Device | | Physical Device |
541+
+-----------------+ +-----------------+ +-----------------+
542+
543+
544+
545+
.. slide:: SR-IOV
546+
:inline-contents: True
547+
:level: 2
548+
549+
* Single Root I/O Virtualization
550+
551+
* Physical device with multiple Ethernet ports will be shown as
552+
multiple devices on the PCI bus
553+
554+
* The Physical Function is used for control and can be configured
555+
556+
* to present itself as a new PCI device
557+
558+
* which VLAN to use
559+
560+
* The new virtual function is enumerated on the bus and can be
561+
assigned to a particular guest
424562

425563

426564
qemu
@@ -451,15 +589,6 @@ KVM
451589
:inline-contents: True
452590
:level: 2
453591

454-
* VMM implemented inside the Linux kernel
455-
456-
* Requires hardware virtualization (e.g. Intel VT-x)
457-
458-
* Shadow page tables or EPT if present
459-
460-
* Uses qemu or virtio for I/O virtualization
461-
462-
463592
.. ditaa::
464593

465594
VM1 (qemu) VM2 (qemu)
@@ -483,6 +612,35 @@ KVM
483612
+----------------------------------------------------+
484613

485614

615+
.. slide:: KVM
616+
:inline-contents: True
617+
:level: 2
618+
619+
* Linux device driver for hardware virtualization (e.g. Intel VT-x, SVM)
620+
621+
* IOCTL based interface for managing and running virtual CPUs
622+
623+
* VMM components implemented inside the Linux kernel
624+
(e.g. interrupt controller, timers)
625+
626+
* Shadow page tables or EPT if present
627+
628+
* Uses qemu or virtio for I/O virtualization
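
As a rough illustration of the IOCTL based interface, the sketch below opens ``/dev/kvm``, queries the API version, and creates a VM with a single vCPU. It uses only the documented ``KVM_GET_API_VERSION``, ``KVM_CREATE_VM`` and ``KVM_CREATE_VCPU`` ioctls; the function names are made up for this example, and a real VMM would go on to set up guest memory and registers and call ``KVM_RUN``. Both functions return -1 when KVM is not available, for example inside a container without ``/dev/kvm``.

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Returns the KVM API version, or -1 if /dev/kvm cannot be opened. */
int kvm_api_version(void)
{
    int kvm = open("/dev/kvm", O_RDWR);
    int version;

    if (kvm < 0)
        return -1;
    version = ioctl(kvm, KVM_GET_API_VERSION, 0UL);
    close(kvm);
    return version;
}

/* Creates a VM with one vCPU and tears it down again.
 * Returns 0 on success, -1 if any step fails. */
int kvm_create_one_vcpu(void)
{
    int ret = -1;
    int kvm = open("/dev/kvm", O_RDWR);

    if (kvm < 0)
        return -1;

    /* Each VM and each vCPU is represented by its own file descriptor. */
    int vm = ioctl(kvm, KVM_CREATE_VM, 0UL);
    if (vm >= 0) {
        int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0UL);  /* vCPU id 0 */
        if (vcpu >= 0) {
            ret = 0;
            close(vcpu);
        }
        close(vm);
    }
    close(kvm);
    return ret;
}
```

The file-descriptor-per-object design means VMs and vCPUs are cleaned up automatically when the owning process exits.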
629+
630+
631+
632+
Type 1 vs Type 2 Hypervisors
633+
============================
634+
635+
.. slide:: Xen
636+
:inline-contents: True
637+
:level: 2
638+
639+
* Type 1 = Bare Metal Hypervisor
640+
641+
* Type 2 = Hypervisor embedded in an existing kernel / OS
642+
643+
486644
Xen
487645
===
488646
