
Cloud hypervisor

Dom edited this page Feb 14, 2023 · 77 revisions

Cloud Hypervisor on Altra

Build cloud-hypervisor

Follow doc: https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/docs/building.md

Build a dynamically linked binary

cargo build --release

file target/release/cloud-hypervisor
target/release/cloud-hypervisor: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1, BuildID[sha1]=6188673dff9f9a438cc8bbef34eb769fb3d74575, for GNU/Linux 3.7.0, stripped

Build a statically linked binary

A static build needs musl as the C library. The steps in the upstream doc do not work as written on Ubuntu 20.04:

rustup target add aarch64-unknown-linux-musl
cargo build --release --target=aarch64-unknown-linux-musl --all
error occurred: Failed to find tool. Is `aarch64-linux-musl-gcc` installed?

Install musl-tools and build with CC=musl-gcc instead:

$ sudo apt install musl-tools
$ CC=musl-gcc cargo build --release --target=aarch64-unknown-linux-musl --all

$ file target/aarch64-unknown-linux-musl/release/cloud-hypervisor
target/aarch64-unknown-linux-musl/release/cloud-hypervisor: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, BuildID[sha1]=36e2a5f9869eb13fe004323c26794f1cbe2f6842, stripped
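
The manual `file` check above can be scripted, for example in a build job that must not ship a dynamically linked binary by accident. This is a minimal sketch; `is_statically_linked` is a hypothetical helper name, and it only inspects the text that `file` prints.

```shell
# Hypothetical helper: decide from the text printed by `file`
# whether a binary is statically linked.
is_statically_linked() {
    case "$1" in
        *"statically linked"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a real build (path taken from the text above):
# is_statically_linked "$(file target/aarch64-unknown-linux-musl/release/cloud-hypervisor)" \
#     && echo "static build OK"
```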

Cloud Hypervisor can also be built inside Docker with dev_cli.sh, e.g.:

$ ./scripts/dev_cli.sh build --release --libc musl
[Cloud Hypervisor] Binaries placed under /home/adam/cloud_hyp/cloud-hypervisor/cloud-hypervisor/build/cargo_target/aarch64-unknown-linux-musl/release

$ size build/cargo_target/aarch64-unknown-linux-musl/release/cloud-hypervisor
   text    data     bss     dec     hex filename
2856088  382716    7168 3245972  318794 build/cargo_target/aarch64-unknown-linux-musl/release/cloud-hypervisor

Run cloud-hypervisor

./target/release/cloud-hypervisor \
          --kernel ../linux-cloud-hypervisor/arch/arm64/boot/Image  \
          --disk path=$ROOTFS path=/tmp/ubuntu-cloudinit.img  \
          --cmdline "console=hvc0 root=/dev/vda1 rw" \
          --cpus boot=4   \
          --memory size=0,shared=on  \
          --memory-zone id=mem0,size=1G,shared=on,host_numa_node=0 \
          --net "tap=,mac=,ip=,mask="

Control VM with ch-remote

The ch-remote tool controls a running virtual machine. See doc: https://www.cloudhypervisor.org/docs/prologue/commands/#ch-remote-binary

Start a VM with --api-socket option

sudo ./cloud-hypervisor ...  --api-socket /tmp/cloud-hypervisor.sock

sudo ./ch-remote --api-socket /tmp/cloud-hypervisor.sock reboot
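
ch-remote talks to the REST API that cloud-hypervisor serves on the API socket, so the same request can be sent with curl when ch-remote is not at hand. The sketch below assumes the `/api/v1/vm.reboot` endpoint from the upstream API documentation; `ch_api_put` and the `CURL` override variable are hypothetical names introduced here.

```shell
# Hypothetical wrapper around the REST API on the --api-socket path.
# CURL can be overridden (e.g. CURL=echo) to print the arguments
# instead of sending the request.
ch_api_put() {
    socket=$1
    action=$2   # e.g. vm.reboot, vm.pause, vm.resume
    ${CURL:-curl} --unix-socket "$socket" -i -X PUT "http://localhost/api/v1/$action"
}

# Usage (same socket as above; add sudo if the socket is owned by root):
# ch_api_put /tmp/cloud-hypervisor.sock vm.reboot
```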

Firmware boot

Rust Hypervisor Firmware

TBD - https://github.com/cloud-hypervisor/rust-hypervisor-firmware/issues/198, pretty new.

Booting the VM with CLOUDHV_EFI.fd hangs. With --serial tty --console off GRUB starts, but booting Ubuntu then fails with:
Synchronous Exception at 0x000000006B8D5624

Following cloud-hypervisor/scripts/common-aarch64.sh, reset the three repositories to the pinned commits:
edk2 to 46b4606ba23498d3d0e66b53e498eb3d5d592586,
edk2-platforms to 8227e9e9f6a8aefbd772b40138f835121ccb2307,
acpica to b9c69f81a05c45611c91ea9cbce8756078d76233.
After rebuilding CLOUDHV_EFI.fd, the VM boots.

EDK2 Firmware

TBD - https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/docs/uefi.md#building-uefi-firmware-for-aarch64

With hypervisor-fw_arm64 and focal-server-cloudimg-arm64.raw, the VM reports an error right after startup:
Found EFI partition
Filesystem ready
Error loading default entry: File(NotFound)
Using EFI boot.
Found bootloader: /EFI/BOOT/BOOTAA64.EFI

With hypervisor-fw_arm64 and bionic-server-cloudimg-arm64.raw, the VM boots successfully:
Found EFI partition
Filesystem ready
Error loading default entry: File(NotFound)
Using EFI boot.
Found bootloader: /EFI/BOOT/BOOTAA64.EFI
Executable loaded
error: no suitable video mode found.
error: no such device: root.

Press any key to continue...

uname -a
Linux cloud 4.15.0-106-generic

Performance test

$ ./scripts/dev_cli.sh tests --metrics
Compiling hypervisor v0.1.0 (/cloud-hypervisor/hypervisor)
error[E0433]: failed to resolve: use of undeclared crate or module `mshv`
  --> hypervisor/src/lib.rs:77:8
   |
77 |     if mshv::MshvHypervisor::is_available()? {
   |        ^^^^ use of undeclared crate or module `mshv`

error[E0433]: failed to resolve: use of undeclared crate or module `mshv`
  --> hypervisor/src/lib.rs:78:16
   |
78 |         return mshv::MshvHypervisor::new();
   |                ^^^^ use of undeclared crate or module `mshv`

For more information about this error, try `rustc --explain E0433`.
error: could not compile `hypervisor` due to 2 previous errors
warning: build failed, waiting for other jobs to finish...

It looks like Cloud Hypervisor does not support MSHV on arm64. The release notes (https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/release-notes.md#release-binary-supports-both-mshv-and-kvm) only mention x86-64:

Release Binary Supports Both MSHV and KVM
On x86-64 the binary included in releases supports both the KVM and MSHV hypervisor with runtime detection to identify the correct hypervisor to use.

So change scripts/run_metrics.sh to build Cloud Hypervisor without the mshv feature:

diff --git a/scripts/run_metrics.sh b/scripts/run_metrics.sh
index ea4aa990..ed7b4b39 100755
--- a/scripts/run_metrics.sh
+++ b/scripts/run_metrics.sh
@@ -92,7 +92,8 @@ if [[ "${BUILD_TARGET}" == "${TEST_ARCH}-unknown-linux-musl" ]]; then
     CFLAGS="-I /usr/include/${TEST_ARCH}-linux-musl/ -idirafter /usr/include/"
 fi

-cargo build --no-default-features --features "kvm,mshv" --all --release --target $BUILD_TARGET
+#cargo build --no-default-features --features "kvm,mshv" --all --release --target $BUILD_TARGET
+cargo build --no-default-features --features "kvm" --all --release --target $BUILD_TARGET

 # setup hugepages
 echo 6144 | sudo tee /proc/sys/vm/nr_hugepages
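
Rather than commenting out the original line, the feature list could be chosen per architecture so the same script still works on x86-64. This is only a sketch: `ch_features` is a hypothetical helper, and it assumes MSHV is buildable on x86_64 only, which is what the release note quoted above implies.

```shell
# Hypothetical helper: pick the cargo feature list for an architecture.
# Assumption: mshv only builds on x86_64; everything else gets kvm alone.
ch_features() (
    arch="${1:-$(uname -m)}"
    case "$arch" in
        x86_64) echo "kvm,mshv" ;;
        *)      echo "kvm" ;;
    esac
)

# In run_metrics.sh this might replace the hard-coded list:
# cargo build --no-default-features --features "$(ch_features "$TEST_ARCH")" \
#     --all --release --target "$BUILD_TARGET"
```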

With that change the metrics run produces results:

{
  "git_human_readable": "v28.0-231-g3df82337-dirty",
  "git_revision": "3df82337f1f3bc81f04d787fdd98c8c612b1099f",
  "git_commit_date": "Fri Jan 13 01:34:59 2023 +0000",
  "date": "Wed Jan 18 13:33:04 UTC 2023",
  "results": [
    {
      "name": "boot_time_ms",
      "mean": 295.95369999999997,
      "std_dev": 10.243182533275483,
      "max": 316.84599999999995,
      "min": 285.472
    },

... ...

Interestingly, the Linux kernel supports running as a guest on Microsoft Hyper-V on ARM64 hardware: https://lore.kernel.org/all/1628092359-61351-1-git-send-email-mikelley@microsoft.com/ . How to test MSHV on arm64 is still an open question.

VMM features

VM snapshot and restore

TBD: https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/docs/snapshot_restore.md

Kata container with cloud hypervisor

TBD: https://github.com/kata-containers/documentation/blob/master/design/virtualization.md#cloud-hypervisorkvm

vCPU hotplug

Not ready for ARM64. Refer to: https://github.com/cloud-hypervisor/linux/pull/13

Works: ./ch-remote --api-socket

  --cpus boot=4,topology=1:1:1:1
Error parsing config: Error validating configuration: Product of CPU topology parts does not match maximum vCPUs
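
The topology error above is a product check. Assuming the four colon-separated fields are threads per core, cores per die, dies per package, and packages (as described in the upstream CPU documentation), their product has to match the maximum vCPU count, which here is the boot value of 4. The sketch below uses a hypothetical helper, `topology_product`, to show why 1:1:1:1 is rejected.

```shell
# Hypothetical helper: multiply the colon-separated parts of a
# --cpus topology= value. Runs in a subshell so IFS is not leaked.
topology_product() (
    product=1
    IFS=:
    for part in $1; do
        product=$((product * part))
    done
    echo "$product"
)

# topology_product 1:1:1:1   prints 1, which does not match boot=4
# topology_product 1:2:1:2   prints 4, which would match boot=4
```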

  --cpus features=amx
Error parsing config: Invalid feature in --cpus features list: amx

All pass.

sudo ./ch-remote --api-socket /tmp/ch-socket3 resize --cpus 8
Error running command: Server responded with an error: InternalServerError: ApiError(VmResize(CpuManager(VcpuCreate(Failed to create Vcpu: Device or resource busy (os error 16)
sudo ./ch-remote --api-socket /tmp/ch-socket3 add-fs tag=myfs,socket=/tmp/virtiofs.sock
Error running command: Server responded with an error: InternalServerError: ApiError(VmAddFs(ConfigValidation(VhostUserRequiresSharedMemory)))

sudo ./ch-remote --api-socket /tmp/ch-socket3 add-net tap=chtap0
{"id":"_net2","bdf":"0000:00:05.0"}
However, no new virtual NIC appears in the VM.

sudo ./ch-remote --api-socket /tmp/ch-socket3 add-pmem file=/tmp/ubuntu-cloudinit_2.img
{"id":"_pmem4","bdf":"0000:00:06.0"}
sudo ./ch-remote --api-socket /tmp/ch-socket3 add-vsock cid=3,socket=/tmp/vsock.sock
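The command succeeds, but it is unclear how to verify the result inside the VM.
 --memory size=8G,hugepages=on: hugepages=on is not accepted, although the help text lists it as supported: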
Error booting VM: VmBoot(MemoryManager(SharedFileSetLen(Os { code: 22, kind: InvalidInput, message: "Invalid argument" })))

No test case.

 --net ip=192.168.101.1
[  OK  ] Started Network Time Synchronization.
[  OK  ] Reached target System Time Set.
[  OK  ] Reached target System Time Synchronized.
Boot blocks for a few minutes here while waiting for the network.

The doc requires running cloud-hypervisor inside the VM, but that fails with:
./cloud-hypervisor: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by ./cloud-hypervisor)
./cloud-hypervisor: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by ./cloud-hypervisor)
./cloud-hypervisor: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by ./cloud-hypervisor)
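
These errors mean the binary was linked against a newer glibc than the one installed in the guest. A quick way to see the gap is to extract the highest GLIBC version the binary asks for and compare it with the guest's `ldd --version`. The sketch below is one way to do that; `max_glibc_required` is a hypothetical helper, and it relies on GNU `grep -o` and `sort -V`. The statically linked musl build described earlier may sidestep the problem, though that was not tested here.

```shell
# Hypothetical helper: read text on stdin (loader errors or
# `objdump -T` output) and print the highest GLIBC_x.y version seen.
max_glibc_required() {
    grep -o 'GLIBC_[0-9][0-9.]*' | sed 's/^GLIBC_//' | sort -V | tail -n 1
}

# Usage on the host that built the binary:
# objdump -T ./cloud-hypervisor | max_glibc_required
# Then compare with the first line of `ldd --version` inside the guest.
```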

TBD

Setting up macvtap by following the macvtap setup doc makes CLH report an error. Details in: https://github.com/cloud-hypervisor/cloud-hypervisor/discussions/5084 https://github.com/cloud-hypervisor/cloud-hypervisor/issues/4917

Workaround is to use bridge/tap instead:

On host:
sudo ip tuntap add mytap0 mode tap              #Create tap
sudo brctl addbr br0                            #Create bridge
sudo brctl addif br0 mytap0 enP9p3s0            #Connect tap and netcard with bridge
sudo ifconfig mytap0 0 up                       #Clear IP of tap
sudo ifconfig enP9p3s0 0 up                     #Clear IP of netcard
sudo dhclient br0                               #Get IP for bridge
sudo dhclient enP9p3s0                          #Get IP for netcard

Run cloud-hypervisor with --net "tap=mytap0"

On VM:
sudo dhclient enp0s5                            #You can find enp0s5 after run "ip addr"
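
brctl and ifconfig are not installed on every distribution, so the host-side steps can also be expressed with iproute2 alone. The sketch below only prints the commands so they can be reviewed before running them as root; `bridge_tap_cmds` is a hypothetical helper, and the interface names are the ones used above. It was not run on the Altra host.

```shell
# Hypothetical helper: print iproute2 equivalents of the brctl/ifconfig
# steps above. Arguments: tap name, physical NIC, bridge (default br0).
bridge_tap_cmds() (
    tap=$1
    nic=$2
    br=${3:-br0}
    cat <<EOF
ip tuntap add dev $tap mode tap
ip link add name $br type bridge
ip link set $tap master $br
ip link set $nic master $br
ip addr flush dev $tap
ip addr flush dev $nic
ip link set $tap up
ip link set $nic up
ip link set $br up
dhclient $br
EOF
)

# Review first, then run as root:
# bridge_tap_cmds mytap0 enP9p3s0
# bridge_tap_cmds mytap0 enP9p3s0 | sudo sh -e
```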
With the following two configurations the VM sees 1.5G of total memory; with all the others it sees 1G:
--memory size=1G,hotplug_method=virtio-mem,hotplug_size=1G,hotplugged_size=512M
--memory size=0,hotplug_method=virtio-mem --memory-zone id=mem0,size=1G,hotplug_size=1G,hotplugged_size=512M
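
The 1.5G seen by the guest for these two configurations is consistent with the boot size plus hotplugged_size, i.e. 1G + 512M. The arithmetic, as a small sketch with hypothetical helpers `to_mib` and `guest_total_mib` that only understand the G and M suffixes used here:

```shell
# Hypothetical helper: convert a size such as 1G or 512M to MiB.
to_mib() {
    case "$1" in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *)  echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: boot size plus the amount hotplugged at boot.
guest_total_mib() (
    base=$(to_mib "$1") || exit 1
    extra=$(to_mib "$2") || exit 1
    echo $(( base + extra ))
)

# guest_total_mib 1G 512M   prints 1536 (MiB), i.e. 1.5G
```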

 --memory size=0 --memory-zone id=mem0,size=1G,hotplug_size=1G
cloud-hypervisor: 4.763259ms: <vmm> ERROR:vmm/src/memory_manager.rs:741 -- Invalid to set ACPI hotplug method for memory zones
Error booting VM: VmBoot(MemoryManager(InvalidHotplugMethodWithMemoryZones))
Correct configuration:
--memory size=0,hotplug_method=virtio-mem --memory-zone id=mem0,size=1G,hotplug_size=1G

--numa guest_numa_id=0
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', vmm/src/config.rs:1897:67

 --numa guest_numa_id=0,cpus=[1-3,7] guest_numa_id=1,cpus=[0,4-6]
Unrecognized argument: guest_numa_id=1,cpus=[0,4-6]

 --memory size=0 --memory-zone id=mem0,size=1G --memory-zone id=mem1,size=1G --memory-zone id=mem2,size=1G --numa guest_numa_id=0,memory_zones=[mem0,mem2] --numa guest_numa_id=1,memory_zones=mem1
thread 'vmm' panicked at 'called `Option::unwrap()` on a `None` value', arch/src/aarch64/fdt.rs:729:22

 --sgx-epc id=epc0,size=32M id=epc1,size=64M id=epc2,size=32M
 --numa guest_numa_id=0,sgx_epc_sections=epc1 --numa guest_numa_id=1,sgx_epc_sections=[epc0,epc2]
SGX is not supported on Arm.
sudo ./cloud-hypervisor --api-socket /tmp/ch0 --restore source_url=file:///home/dom/cloud-hypervisor/bin/dir
The command blocks and prints nothing.

# First terminal
sudo ./cloud-hypervisor --api-socket /tmp/ch0

# Second terminal
sudo ./ch-remote --api-socket /tmp/ch0 restore source_url=file:///home/dom/cloud-hypervisor/bin/dir/
The command returns immediately and prints nothing.
Booting the VM with CLOUDHV_EFI.fd hangs. With --serial tty --console off GRUB starts, but booting Ubuntu then fails with:
Synchronous Exception at 0x000000006B8D5624
Confirmed that /sys/bus/pci/devices/0003:04:00.0/ exists and contains several files.
Starting the VM with --device path=/sys/bus/pci/devices/0003:04:00.0/ fails:
Error booting VM: VmBoot(DeviceManager(VfioCreate(OpenGroup(Os { code: 2, kind: NotFound, message: "No such file or directory" }, "44"))))

--device 'path=/sys/bus/pci/devices/0003\:04\:00.0/
Error booting VM: VmBoot(DeviceManager(VfioCreate(InvalidPath)))
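
A likely reading of the OpenGroup error is that "44" is the device's IOMMU group and that /dev/vfio/44 does not exist, which usually means the device has not been bound to vfio-pci yet. That reading is an assumption and was not confirmed on this host. The sketch below shows how the group number could be looked up from sysfs so the two can be compared; `iommu_group_of` is a hypothetical helper.

```shell
# Hypothetical helper: print the IOMMU group number of a PCI device,
# given its sysfs directory. The iommu_group entry is a symlink whose
# last path component is the group number.
iommu_group_of() (
    link=$(readlink "$1/iommu_group") || exit 1
    basename "$link"
)

# Usage on the host:
# group=$(iommu_group_of /sys/bus/pci/devices/0003:04:00.0)
# [ -e "/dev/vfio/$group" ] || echo "/dev/vfio/$group missing: is the device bound to vfio-pci?"
```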
 --user-device socket=/tmp/vfio-user.sock
cloud-hypervisor: 97.668894ms: <vmm> WARN:vfio_user/src/lib.rs:627 -- Ignoring unsupported vfio region capability (id = '2')
thread 'vmm' panicked at 'not implemented', pci/src/vfio.rs:641:21

./scripts/rpc.py nvmf_subsystem_add_listener nqn.2019-07.io.spdk:cnode -t VFIOUSER -a /tmp/nvme-vfio-user/ -s 0
[2023-02-08 02:33:57.142873] vfio_user.c:4409:nvmf_vfio_user_listen: *ERROR*: /tmp/nvme-vfio-user/: error to mmap file /tmp/nvme-vfio-user//bar0: Invalid argument.
[2023-02-08 02:33:57.142990] nvmf.c: 676:spdk_nvmf_tgt_listen_ext: *ERROR*: Unable to listen on address '/tmp/nvme-vfio-user/'

 --user-device socket=/tmp/nvme-vfio-user/cntrl
Error booting VM: VmBoot(DeviceManager(VfioUserCreateClient(Connect(Os { code: 2, kind: NotFound, message: "No such file or directory" }))))
cntrl does not exist, and it is unclear how to obtain it.
The VM starts, but Linux does not boot and hangs indefinitely.
https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/docs/virtiofs-root.md#to-create-the-vm-rootfs
None of the files mentioned in the for loop can be found on the host.

TBD

Errors and fix

  1. If VirtualBox is already running, starting a VM fails with a device-busy error:
thread 'vmm' panicked at 'called `Result::unwrap()` on an `Err` value: VmCreate(Device or resource busy (os error 16))', vmm/src/vm.rs:863:41
  1. If the file given to --disk does not exist, boot fails with:
Error booting VM: VmBoot(DeviceManager(Disk(Os { code: 2, kind: NotFound, message: "No such file or directory" })))
  1. VM networking is not working yet. Reference:
https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/docs/macvtap-bridge.md

Starting the VM with --net "tap=virbr,mac=52:54:00:12:34:56" gives:

Error booting VM: VmBoot(DeviceManager(CreateVirtioNet(OpenTap(TapOpen(ConfigureTap(Os { code: 16, kind: ResourceBusy, message: "Device or resource busy" }))))))

Using --net fd=3,mac=$mac 3<>$"$tapdevice" gives:

Error booting VM: VmBoot(DeviceManager(CreateVirtioNet(TapError(IoctlError(2147767506, Os { code: 25, kind: Uncategorized, message: "Inappropriate ioctl for device" })))))

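The following procedure has only been verified on x86. On an Arm VM, ./ch-remote --api-socket=/tmp/ch-socket3 add-net tap=chtap1 does not create ens5. Also, an earlier version of these commands included brctl stp br0 on, which made the switch shut the port down so that no device plugged into that port had network connectivity. Use it with caution.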

sudo ./cloud-hypervisor \
        --kernel ./hypervisor-fw \
        --console off \
        --serial tty \
        --disk path=./focal-server-cloudimg-amd64.raw \
        --cmdline "root=/dev/vda1 console=hvc0" \
        --cpus boot=4 \
        --memory size=1024M \
        --api-socket=/tmp/ch-socket3

#Create chtap1 on host and create ens5 on VM
sudo ./ch-remote --api-socket=/tmp/ch-socket3 add-net tap=chtap1         

## in the host machine
sudo brctl addbr br0             #add bridge
sudo brctl addif br0 eno2        #add an interface to bridge
sudo ifconfig eno2 0 up          #The Ethernet physical interface becomes a logical port on the network bridge 
                                 #and a part of the logical bridge device, so the IP address is no longer required. Release these IP addresses
sudo dhclient br0                #get IP
route -n
sudo ip link set dev chtap1 up   #enable chtap1
sudo brctl addif br0 chtap1      #add another interface to bridge
sudo iptables -P FORWARD ACCEPT  #it is needed to connect to internet

## in the guest
sudo ip link set dev ens3 up
sudo dhclient ens3 # (or set static ip using this command:  sudo ip addr add ***.***.***.***/24 dev ens3)

4. Proxy configuration

[7] Couldn't connect to server (Failed to connect to 127.0.0.1 port 4781 after 0 ms: Connection refused); class=Net (12)

The proxy in config.json must use the host's IP address, not 127.0.0.1.

/root/.docker/config.json

 {
  "proxies":
  {
    "default":
    {
      "httpProxy": "socks5://10.0.0.41:4781",
      "httpsProxy": "socks5://10.0.0.41:4781",
      "noProxy": "*.test.example.com,.example2.com,127.0.0.0/8"
    }
  }
 }

5. Restarting the NIC

ifconfig eno2 0

This shuts eno2 down. Bringing it back with ifconfig eno2 up does not obtain an IP address; run dhclient eno2 afterwards.

6. wget does not support SOCKS5 proxies

wget --quiet

Besides hiding download progress, --quiet also suppresses error messages. The error only showed up after removing it:

Error parsing proxy URL socks5://10.0.0.41:4781: Unsupported scheme 'socks5'.

7. Configuring an HTTP proxy for wget

Error parsing proxy URL socks5://10.0.0.41:4781: Unsupported scheme 'socks5'.

Replacing socks5://10.0.0.41:4781 with http://10.0.0.41:4780 in /root/.docker/config.json fixes it.

cargo build --no-default-features --features kvm,mshv --all --release --target aarch64-unknown-linux-gnu
use of undeclared crate or module `mshv`

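mshv is Linux kernel code contributed by Microsoft; it can run multiple Windows instances. Related kernel changes:

mshv: add support for detecting a nested hypervisor

hv, mshv: change the interrupt vector for the nested root partition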

Test 'boot_time_ms' running .. (control: test_timeout = 2s, test_iterations = 10, overrides: )

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }', test_infra/src/lib.rs:1232:18

Changed the script to cargo build --all --release --target $BUILD_TARGET

A few tests passed, but the run still failed:

 Test 'boot_time_ms' running .. (control: test_timeout = 2s, test_iterations = 10, overrides: )                                                                                                                     
 Test 'boot_time_ms' .. ok: mean = 413.76223999999996, std_dev = 28.015316271611148                                                                                                                                 
 Test 'boot_time_pmem_ms' running .. (control: test_timeout = 2s, test_iterations = 10, overrides: )                                                                                                                
 Test 'boot_time_pmem_ms' .. ok: mean = 18.412709999999997, std_dev = 0.21074413135364106                                                                                                                           
 Test 'boot_time_16_vcpus_ms' running .. (control: test_timeout = 2s, test_iterations = 10, overrides: )                                                                                                            
 Test 'boot_time_16_vcpus_ms' .. ok: mean = 749.25443, std_dev = 144.4245624768658                        
 Test 'boot_time_16_vcpus_pmem_ms' running .. (control: test_timeout = 2s, test_iterations = 10, overrides: )
 thread '<unnamed>' panicked at 'assertion failed: `(left == right)`                                                                                                                                                
   left: `1`,                                                                                                                                                                                                        
 right: `2`: Expecting two matching lines for 'Debug I/O port: Kernel code'', performance-metrics/src/performance_tests.rs:176:9
 stack backtrace:                                                                                         
 note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
  1. UART error. Command used to start the VM:
sudo ./target/release/cloud-hypervisor \
          --kernel ../linux-cloud-hypervisor/arch/arm64/boot/Image  \
          --disk path=$ROOTFS --disk path=/tmp/ubuntu-cloudinit.img  \
          --cmdline "console=hvc0 root=/dev/vda1 rw" \
          --cpus boot=4   \
          --memory size=0,shared=on  \
          --memory-zone id=mem0,size=1G,shared=on,host_numa_node=0 \
          --net "tap=,mac=,ip=,mask=" \
          --serial tty \
          --console off

The VM hangs a few seconds after starting:

[    1.200267] EXT4-fs (vda1): re-mounted. Opts: (null). Quota mode: disabled.
[    1.888490] squashfs: SQUASHFS error: Xattrs in filesystem, these will be ignored
[    1.890036] unable to read xattr id index table
or
[    0.001761] Console: colour dummy device 80x25
[    0.002461] printk: console [tty0] enabled
[    0.003070] printk: bootconsole [pl11] disabled
cloud-hypervisor: 698.505152ms: <vcpu2> WARN:devices/src/legacy/uart_pl011.rs:358 -- [Debug I/O port: Kernel code: 0x41] 0.695055 seconds

Stopping the VM by killing the process prints the following error:

VMM thread exited with error: Error shutting down VMM: SetTerminalCanon(Error(5))
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error(5)', src/main.rs:594:50

The relevant code:

 589     // SAFETY: trivially safe
 590     let on_tty = unsafe { libc::isatty(libc::STDIN_FILENO) } != 0;
 591     if on_tty {
 592         // Don't forget to set the terminal in canonical mode
 593         // before to exit.
 594         std::io::stdin().lock().set_canon_mode().unwrap();
 595     }