@@ -11,7 +11,7 @@ Virtualization
Lecture objectives:
===================

- .. slide :: Network Management
+ .. slide :: Virtualization
:inline-contents: True
:level: 2

@@ -128,6 +128,8 @@ MMU virtualization
* "Fake" VM physical addresses are translated by the host to actual
physical addresses

+ * Guest virtual address -> guest physical address -> host physical address
+
* The guest page tables are not directly used by the host hardware

* VM page tables are verified then translated into a new set of page
@@ -192,13 +194,17 @@ Lazy shadow sync
* To avoid repeated traps, checks and transformations, map guest
page table entries with write access

- * Update the shadow page table when the TLB is flushed
+ * Update the shadow page table

+ * When the TLB is flushed

- I/O virtualization
- ==================
+ * In the host page fault handler
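The two sync points can be illustrated with a toy model. This is a minimal sketch, assuming a single-level guest page table of 16 entries and a hypothetical fixed guest-to-host frame mapping (`host_frame_of()`); the structure and function names are illustrative and do not come from any real hypervisor. Guest writes to its own page table do not trap, so the shadow entry lags behind until one of the two sync points rebuilds it.

```c
#include <assert.h>
#include <stdint.h>

#define NPTES 16
#define PTE_PRESENT 0x1u

/* Hypothetical host mapping: guest frame N lives in host frame N + 0x100. */
static uint32_t host_frame_of(uint32_t gframe) { return gframe + 0x100; }

struct vm_mmu {
    uint32_t guest_pt[NPTES];  /* guest-owned PTEs: (frame << 1) | present */
    uint32_t shadow_pt[NPTES]; /* PTEs the hardware really walks */
    unsigned updates;          /* shadow entries rewritten so far */
};

/* Translate one guest PTE into the shadow PTE the host would install. */
static uint32_t shadow_of(uint32_t gpte) {
    if (!(gpte & PTE_PRESENT))
        return 0;
    return (host_frame_of(gpte >> 1) << 1) | PTE_PRESENT;
}

static void sync_one(struct vm_mmu *m, unsigned i) {
    uint32_t s = shadow_of(m->guest_pt[i]);
    if (m->shadow_pt[i] != s) {
        m->shadow_pt[i] = s;
        m->updates++;
    }
}

/* The guest page table is mapped writable, so this store does not trap:
 * the shadow entry silently goes stale. */
void guest_write_pte(struct vm_mmu *m, unsigned i, uint32_t gframe) {
    assert(i < NPTES);
    m->guest_pt[i] = (gframe << 1) | PTE_PRESENT;
}

/* Sync point 1: the guest flushes the TLB (this traps), so resync all. */
void on_guest_tlb_flush(struct vm_mmu *m) {
    for (unsigned i = 0; i < NPTES; i++)
        sync_one(m, i);
}

/* Sync point 2: the hardware faulted on entry i. If the guest already has a
 * valid mapping, fix the shadow entry and return 1 (retry the access);
 * otherwise return 0 so the fault is injected into the guest. */
int on_host_page_fault(struct vm_mmu *m, unsigned i) {
    assert(i < NPTES);
    if (!(m->guest_pt[i] & PTE_PRESENT))
        return 0;
    sync_one(m, i);
    return 1;
}
```

Lagging is safe because the guest must itself flush the TLB before relying on a modified translation. Changes that need no flush (for example a not-present entry becoming present) show up as a fault on the shadow table instead, and the page-fault path handles those.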

- .. slide :: I/O virtualization
+
+ I/O emulation
+ =============
+
+ .. slide :: I/O emulation
:inline-contents: True
:level: 2

@@ -232,6 +238,14 @@ I/O virtualization
+-----------------+


+ .. slide :: Example: qemu SiFive UART emulation
+ :inline-contents: True
+ :level: 2
+
+ .. literalinclude :: ../res/sifive_uart.c
+ :language: c
+
+
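A trap-and-emulate device model comes down to a dispatcher that the VMM runs on every MMIO exit. The sketch below assumes a single toy UART at a hypothetical guest-physical base `UART_BASE`, with a transmit register at offset 0 and a status register at offset 4. The exit structure, offsets and names are illustrative and are not taken from qemu or from the SiFive UART register map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UART_BASE  0x10013000u /* hypothetical guest-physical base */
#define UART_SIZE  0x1000u
#define REG_TXDATA 0x0u        /* write: send one byte */
#define REG_STATUS 0x4u        /* read: bit 0 = ready to transmit */

/* What the VMM learns from a (hypothetical) MMIO exit. */
struct mmio_exit {
    uint64_t gpa;      /* guest physical address that was accessed */
    int is_write;
    uint32_t data;     /* value written, or filled in for reads */
};

/* Virtual device state kept by the VMM. */
struct vuart {
    char out[64];
    size_t len;
};

static void vuart_write(struct vuart *u, uint32_t off, uint32_t val) {
    if (off == REG_TXDATA && u->len < sizeof(u->out) - 1) {
        u->out[u->len++] = (char)(val & 0xff);
        u->out[u->len] = '\0';
    }
}

static uint32_t vuart_read(const struct vuart *u, uint32_t off) {
    (void)u;
    return off == REG_STATUS ? 1u : 0u; /* always ready in this toy */
}

/* Returns 0 if the access was emulated, -1 if no device claims it. */
int handle_mmio_exit(struct vuart *u, struct mmio_exit *e) {
    assert(u != NULL && e != NULL);
    if (e->gpa < UART_BASE || e->gpa >= UART_BASE + UART_SIZE)
        return -1;
    uint32_t off = (uint32_t)(e->gpa - UART_BASE);
    if (e->is_write)
        vuart_write(u, off, e->data);
    else
        e->data = vuart_read(u, off);
    return 0; /* the VMM would now resume the guest */
}
```

Every byte the guest writes costs a full VM exit plus a round trip through the VMM. This per-access cost is what paravirtualized and directly assigned devices avoid.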
Paravirtualization
==================

@@ -299,6 +313,24 @@ Virtual Machine Control Structure
* VMCS cannot be accessed directly, but certain information can be
accessed with special instructions (VMREAD/VMWRITE)

+ VM entry & exit
+ ---------------
+
+ .. slide :: VM entry & exit
+ :inline-contents: True
+ :level: 2
+
+ * VM entry - new instructions (VMLAUNCH/VMRESUME) that switch the CPU
into non-root mode and load the VM state from a VMCS; host state is
saved in the VMCS
+
+ * Allows injecting interrupts and exceptions into the guest
+
+ * VM exit will be automatically triggered based on the VMCS
configuration
+
+ * When a VM exit occurs, the host state is loaded from the VMCS and the guest state
+ is saved in VMCS
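The entry/exit protocol can be modeled in a few lines. This is a toy that assumes a hypothetical `struct vmcs` with a guest-state area, a host-state area, one execution control (`exit_on_hlt`) and one event-injection field. A real VMCS has far more fields and is only reached through VMREAD/VMWRITE. The sketch shows the direction of each state swap, and that an exit happens only when the controls the host programmed say so.

```c
#include <assert.h>
#include <stdint.h>

enum exit_reason { EXIT_NONE, EXIT_HLT };

struct cpu_state { uint64_t rip; uint64_t rax; };

/* Hypothetical, heavily simplified VMCS. */
struct vmcs {
    struct cpu_state guest;       /* guest-state area */
    struct cpu_state host;        /* host-state area */
    int exit_on_hlt;              /* a VM-execution control */
    int inject_vector;            /* event to inject on entry, -1 = none */
    enum exit_reason exit_reason; /* filled in on VM exit */
};

struct cpu {
    struct cpu_state regs;        /* what the physical CPU runs right now */
    int non_root;                 /* 1 while executing guest code */
    int delivered_vector;         /* last event delivered to the guest */
};

/* VM exit: save guest state in the VMCS, reload host state from it. */
static void vm_exit(struct cpu *c, struct vmcs *v, enum exit_reason why) {
    v->guest = c->regs;
    c->regs = v->host;
    c->non_root = 0;
    v->exit_reason = why;
}

/* VM entry: save host state, load guest state, optionally inject an event. */
void vm_entry(struct cpu *c, struct vmcs *v) {
    v->host = c->regs;
    c->regs = v->guest;
    c->non_root = 1;
    if (v->inject_vector >= 0) {
        c->delivered_vector = v->inject_vector;
        v->inject_vector = -1;
    }
}

/* "Execute" one guest instruction; HLT exits only if the control is set. */
void guest_step(struct cpu *c, struct vmcs *v, int is_hlt) {
    assert(c->non_root);
    c->regs.rip += 1;
    if (is_hlt && v->exit_on_hlt)
        vm_exit(c, v, EXIT_HLT);
}
```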

VM execution control fields
---------------------------
@@ -325,25 +357,6 @@ VM execution control fields
generate a VM exit


- VM entry & exit
- ---------------
-
- .. slide :: VM entry & exit
- :inline-contents: True
- :level: 2
-
- * VM entry - new instructions that switches the CPU in non-root
mode and loads the VM state from a VMCS; host state is saved in
VMCS
-
- * Allows injecting interrupts and exceptions in the guest
-
- * VM exit will be automatically triggered based on the VMCS
configuration
-
- * When VM exit occurs host state is loaded from VMCS, guest state
is saved in VMCS
-
Extended Page Tables
====================

@@ -394,33 +407,158 @@ VPID
* When searching the TLB just the current VPID is used
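The effect of VPID tagging can be sketched with a tiny fully associative TLB. The sketch assumes 8 entries, round-robin replacement and hypothetical field names. It only shows that a lookup matches entries of the current VPID, so switching between vCPUs (or between a vCPU and the host) needs no full flush, and that invalidation can be limited to a single VPID, in the spirit of INVVPID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 8

struct tlb_entry {
    int valid;
    uint16_t vpid;   /* VPID 0 is used for the host (VMX root) */
    uint64_t vpn;    /* virtual page number */
    uint64_t pfn;    /* physical frame number */
};

struct tlb {
    struct tlb_entry e[TLB_ENTRIES];
    unsigned next;   /* round-robin replacement index */
};

/* Returns 1 and sets *pfn on a hit; only entries tagged with cur_vpid match. */
int tlb_lookup(const struct tlb *t, uint16_t cur_vpid, uint64_t vpn, uint64_t *pfn) {
    assert(pfn != NULL);
    for (unsigned i = 0; i < TLB_ENTRIES; i++) {
        const struct tlb_entry *x = &t->e[i];
        if (x->valid && x->vpid == cur_vpid && x->vpn == vpn) {
            *pfn = x->pfn;
            return 1;
        }
    }
    return 0;
}

void tlb_fill(struct tlb *t, uint16_t vpid, uint64_t vpn, uint64_t pfn) {
    struct tlb_entry *x = &t->e[t->next];
    t->next = (t->next + 1) % TLB_ENTRIES;
    *x = (struct tlb_entry){ 1, vpid, vpn, pfn };
}

/* Invalidate only the entries belonging to one VPID. */
void tlb_flush_vpid(struct tlb *t, uint16_t vpid) {
    for (unsigned i = 0; i < TLB_ENTRIES; i++)
        if (t->e[i].vpid == vpid)
            t->e[i].valid = 0;
}
```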


- Intel VT-d
- ==========
+ I/O virtualization
+ ==================
+
+ * Direct access to hardware from a VM - in a controlled fashion
+
+ * Map the host MMIO directly into the guest

- .. slide :: Intel VT-d
+ * Forward interrupts
+
+ .. slide :: I/O virtualization
+ :inline-contents: True
+ :level: 2
+
+ .. ditaa ::
+
+ +---------------------+ +---------------------+
+ | Guest OS | | Guest OS |
+ | +---------------+ | | +---------------+ |
+ | | Guest Driver | | | | Guest Driver | |
+ | +---------------+ | | +---------------+ |
+ | | ^ | | | ^ |
+ | | | | | | | |
+ +----+-----------+----+ +----+-----------+----+
+ | trapped | | mapped |
+ | access | | access |
+ +---+-----------+----+ +---+-----------+-----+ But how do we deal with DMA?
+ | | VMM | | | | VMM | |
+ | v | | | | | |
+ | +----------------+ | | | +---------+ |
+ | | Virtual Device | | | | | IRQ | |
+ | +----------------+ | | | | Mapping | |
+ | | ^ | | | +---------+ |
+ | | | | | | | |
+ +--+------------+----+ +---+-----------+-----+
+ | | | |
+ v | v |
+ +-----------------+ +-----------------+
+ | Physical Device | | Physical Device |
+ +-----------------+ +-----------------+
+
+ Instead of trapping MMIO accesses, as with emulated devices, we can let
+ the guest access the device MMIO directly by mapping it into the guest
+ physical address space through the second-level page tables (EPT or
+ shadow page tables).
+
+ Interrupts from the device are handled by the host kernel and a signal
+ is sent to the VMM, which injects the interrupt into the guest just as
+ for the emulated devices.
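On Linux, the step where the host kernel signals the VMM is commonly built on eventfds: VFIO can signal a device interrupt through an eventfd (`VFIO_DEVICE_SET_IRQS`), and KVM can consume one directly (`KVM_IRQFD`). The sketch below only simulates that path inside one process and is Linux-specific. A stand-in host handler writes to an eventfd, and a VMM routine drains the counter and calls a hypothetical `inject_fn` hook once per signal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical hook through which the VMM raises an interrupt in the guest. */
typedef void (*inject_fn)(int guest_irq, void *ctx);

/* Trivial injector for the demo: it only counts injections in *ctx. */
void count_injections(int guest_irq, void *ctx) {
    (void)guest_irq;
    ++*(int *)ctx;
}

/* Stand-in for the host interrupt handler of the assigned device:
 * acknowledging the device is not modeled, it only signals the VMM. */
int host_irq_handler(int efd) {
    uint64_t one = 1;
    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* VMM side: read (and thereby reset) the eventfd counter, then inject one
 * guest interrupt per signal. Blocks if nothing is pending. Returns the
 * number of injections, or -1 on error. */
int vmm_forward_irqs(int efd, int guest_irq, inject_fn inject, void *ctx) {
    uint64_t count = 0;
    assert(inject != NULL);
    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    for (uint64_t i = 0; i < count; i++)
        inject(guest_irq, ctx);
    return (int)count;
}
```

The eventfd counter merges signals that arrive before the VMM reads it. Whether to inject once per wakeup or once per signal is a choice the device model has to make.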
+
+
+ .. slide :: I/O MMU
+ :inline-contents: True
+ :level: 2
+
+ VT-d protects and translates the VM physical addresses used for DMA
+ with an I/O MMU (DMA remapping)
+
+ .. ditaa ::
+
+ +------+ +------+
+ | | | |
+ | CPU | | DMA |
+ | | | |
+ +------+ +------+
+ |
+ |
+ v
+ +-----+ +-----+
+ | CR3 | | EPT |
+ +-----+ +-----+
+ | +------------------+ | +----------------+
+ | | | | | |
+ +--------> | Guest Page Table | +-------> | EPT Page Table | --------------->
+ | | | |
+ ------------> +------------------+ ------------> +----------------+
+
+ Guest Virtual Guest Physical Host Physical
+ Address Address Address
+
+
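The two translation stages in the diagram, and the way an I/O MMU closes the DMA hole, can be sketched with flat single-level tables. The following are assumptions of the sketch: real guest page tables and EPT are multi-level radix trees with permission bits, pages are 4 KiB here, and for simplicity the I/O MMU reuses the VM's guest-physical-to-host-physical table. In practice the I/O MMU has its own tables that the host keeps consistent with the VM's memory map. All names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_MASK ((1u << PAGE_SHIFT) - 1)
#define NPAGES 16
#define INVALID UINT64_MAX

/* Flat "page tables": index = page number, value = frame number. */
struct vm_tables {
    uint64_t guest_pt[NPAGES]; /* GVA page -> GPA frame (owned by the guest) */
    uint64_t ept[NPAGES];      /* GPA frame -> HPA frame (owned by the host) */
};

static uint64_t lookup(const uint64_t *table, uint64_t page) {
    return page < NPAGES ? table[page] : INVALID;
}

/* Second stage only; used by the CPU walk and by the I/O MMU for DMA. */
uint64_t gpa_to_hpa(const struct vm_tables *t, uint64_t gpa) {
    uint64_t hframe = lookup(t->ept, gpa >> PAGE_SHIFT);
    if (hframe == INVALID)
        return INVALID; /* EPT violation, or DMA remapping fault */
    return (hframe << PAGE_SHIFT) | (gpa & PAGE_MASK);
}

/* CPU access: guest page table (GVA -> GPA), then EPT (GPA -> HPA). */
uint64_t gva_to_hpa(const struct vm_tables *t, uint64_t gva) {
    uint64_t gframe = lookup(t->guest_pt, gva >> PAGE_SHIFT);
    if (gframe == INVALID)
        return INVALID; /* guest page fault, handled by the guest */
    uint64_t gpa = (gframe << PAGE_SHIFT) | (gva & PAGE_MASK);
    return gpa_to_hpa(t, gpa);
}

/* Device DMA: the guest driver programmed the device with a GPA, and the
 * I/O MMU translates it with the VM's second-stage table. */
uint64_t iommu_translate_dma(const struct vm_tables *t, uint64_t dma_addr) {
    return gpa_to_hpa(t, dma_addr);
}

void tables_init(struct vm_tables *t) {
    assert(t != NULL);
    for (unsigned i = 0; i < NPAGES; i++)
        t->guest_pt[i] = t->ept[i] = INVALID;
}
```

A guest driver programs its device with guest physical addresses. Without the `iommu_translate_dma()` step, the device would DMA into whatever host memory happens to sit at those addresses. With it, the addresses land in the VM's own pages, and anything outside the VM's memory faults instead.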
+ .. slide :: Interrupt posting
:inline-contents: True
:level: 2

- * Direct access to hardware from a VM - in a controlled was
+ * Message Signaled Interrupts (MSI) = DMA writes to the host
+ address range of the IRQ controller (e.g. 0xFEExxxxx)

- * The physical device must support multiplexing (e.g. SR-IOV)
+ * The lower address bits (destination ID) and the low data bits (vector)
+ indicate which interrupt vector to deliver to which CPU

- * I/O assignments
+ * Interrupt remapping table points to the virtual CPU (VMCS) that
+ should receive the interrupt

- * IRQ routing
+ * I/O MMU will trap the IRQ controller write and look it up in the
+ interrupt remapping table

- * VT-d protects and translates VM physical addresses using an I/O
- MMU (DMA remaping)
+ * If that virtual CPU is currently running, it will take the
+ interrupt directly

+ * Otherwise, a bit is set in a table (Posted Interrupt Descriptor
+ table) and the interrupt will be injected the next time that vCPU is
+ run
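The posting decision can be sketched as follows, under several simplifications that are assumptions of this sketch. The MSI is decoded in the x86 compatibility format (address prefix `0xFEE00000`, destination ID in address bits 19:12, vector in data bits 7:0). The toy remapping table is indexed by destination ID. Each vCPU has a 256-bit pending-request bitmap standing in for the Posted Interrupt Descriptor. With real VT-d interrupt remapping, the device instead sends a remappable-format MSI that carries an index into the remapping table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSI_ADDR_BASE 0xFEE00000u
#define MAX_DEST 4

struct vcpu {
    bool running;
    uint64_t pir[4];        /* 256 pending bits, one per vector */
    int last_direct_vector; /* vector taken while running, -1 = none */
};

/* Toy remapping table, indexed by destination ID (an assumption). */
struct irte_table { struct vcpu *dest[MAX_DEST]; };

/* Called when the I/O MMU sees a DMA write that may be an MSI.
 * Returns false if it is not an MSI or has no remapping entry. */
bool post_interrupt(struct irte_table *t, uint32_t addr, uint32_t data) {
    assert(t != NULL);
    if ((addr & 0xFFF00000u) != MSI_ADDR_BASE)
        return false;                       /* ordinary DMA, not an MSI */
    unsigned dest = (addr >> 12) & 0xFFu;   /* destination ID, bits 19:12 */
    unsigned vector = data & 0xFFu;         /* vector, data bits 7:0 */
    if (dest >= MAX_DEST || t->dest[dest] == NULL)
        return false;                       /* would be a remapping fault */
    struct vcpu *v = t->dest[dest];
    if (v->running)
        v->last_direct_vector = (int)vector;          /* taken directly */
    else
        v->pir[vector / 64] |= 1ull << (vector % 64); /* posted for later */
    return true;
}

/* Next time the vCPU runs: pick the highest pending vector (on x86 a higher
 * vector means higher priority), clear it and return it; -1 if none. */
int vcpu_take_posted(struct vcpu *v) {
    for (int w = 3; w >= 0; w--)
        for (int b = 63; b >= 0; b--)
            if (v->pir[w] & (1ull << b)) {
                v->pir[w] &= ~(1ull << b);
                return w * 64 + b;
            }
    return -1;
}
```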

- DMA remapping
- -------------

- .. slide :: DMA remapping
+ .. slide :: I/O virtualization
:inline-contents: True
:level: 2

- .. image :: ../res/dma-remapping.png
+ .. ditaa ::
+
+ +---------------------+ +---------------------+ +---------------------+
+ | Guest OS | | Guest OS | | Guest OS |
+ | +---------------+ | | +---------------+ | | +---------------+ |
+ | | Guest Driver | | | | Guest Driver | | | | Guest Driver | |
+ | +---------------+ | | +---------------+ | | +---------------+ |
+ | | ^ | | | ^ | | | ^ |
+ | | | | | | | | | | | |
+ +----+-----------+----+ +----+-----------+----+ +----+-----------+----+
+ | trapped | | mapped | | mapped | interrupt
+ | access | | access | | access | posting
+ +---+-----------+----+ +---+-----------+-----+ +---+-----------+-----+
+ | | VMM | | | | VMM | | | | VMM | |
+ | v | | | | | | | | | |
+ | +----------------+ | | | +---------+ | | | | |
+ | | Virtual Device | | | | | IRQ | | | | | |
+ | +----------------+ | | | | Mapping | | | | | |
+ | | ^ | | | +---------+ | | | | |
+ | | | | | | | | | | | |
+ +--+------------+----+ +---+-----------+-----+ +---+-----------+-----+
+ | | | | | |
+ v | v | v |
+ +-----------------+ +-----------------+ +-----------------+
+ | Physical Device | | Physical Device | | Physical Device |
+ +-----------------+ +-----------------+ +-----------------+
+
+
+
+ .. slide :: SR-IOV
+ :inline-contents: True
+ :level: 2
+
+ * Single Root I/O Virtualization
+
+ * A single physical device (e.g. an Ethernet adapter) will be shown as
+ multiple devices on the PCI bus
+
+ * The Physical Function (PF) is used for control and can be configured
+
+ * to present itself as a new PCI device
+
+ * which VLAN to use
+
+ * The new Virtual Function (VF) is enumerated on the bus and can be
+ assigned to a particular guest
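On Linux, a PF driver that supports SR-IOV typically exposes `sriov_totalvfs` and `sriov_numvfs` attributes in the device's sysfs directory (e.g. under `/sys/bus/pci/devices/<address>/`). Writing a count to `sriov_numvfs` asks the PF driver to create that many VFs, which then appear as new PCI functions that can be assigned to guests. The helper below is a sketch that takes the device directory as a parameter, so it can also be exercised against an ordinary directory. Against real hardware it needs root and an SR-IOV capable device.

```c
#include <assert.h>
#include <stdio.h>

/* Read a single integer attribute such as sriov_totalvfs. */
int sysfs_read_int(const char *dev_dir, const char *attr, int *out) {
    char path[512];
    snprintf(path, sizeof(path), "%s/%s", dev_dir, attr);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = fscanf(f, "%d", out) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Ask the PF driver to create num_vfs virtual functions. */
int sriov_set_numvfs(const char *dev_dir, int num_vfs) {
    int total;
    assert(num_vfs >= 0);
    if (sysfs_read_int(dev_dir, "sriov_totalvfs", &total) != 0 || num_vfs > total)
        return -1;
    char path[512];
    snprintf(path, sizeof(path), "%s/sriov_numvfs", dev_dir);
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%d\n", num_vfs) > 0;
    /* stdio buffers the write, so a kernel error only surfaces when
     * fclose() flushes it; check both. */
    ok = (fclose(f) == 0) && ok;
    return ok ? 0 : -1;
}
```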


qemu
...
:inline-contents: True
:level: 2

- * VMM implemented inside the Linux kernel
-
- * Requires hardware virtualization (e.g. Intel VT-x)
-
- * Shadow page tables or EPT if present
-
- * Uses qemu or virtio for I/O virtualization
-
-
.. ditaa ::

VM1 (qemu) VM2 (qemu)
...
+----------------------------------------------------+


+ .. slide :: KVM
+ :inline-contents: True
+ :level: 2
+
+ * Linux device driver for hardware virtualization (e.g. Intel VT-x, AMD SVM)
+
+ * ioctl-based interface for managing and running virtual CPUs
+
+ * VMM components implemented inside the Linux kernel
+ (e.g. interrupt controller, timers)
+
+ * Shadow page tables or EPT if present
+
+ * Uses qemu or virtio for I/O virtualization
+
+
+
+ Type 1 vs Type 2 Hypervisors
+ ============================
+
+ .. slide :: Type 1 vs Type 2 Hypervisors
+ :inline-contents: True
+ :level: 2
+
+ * Type 1 = bare-metal hypervisor
+
+ * Type 2 = hypervisor embedded in an existing kernel / OS
+
+
Xen
===
