
Kvmserver - Fast per-request isolation for Linux executables with TinyKVM

KVM Server applies TinyKVM's fast sandboxing technology to existing, unmodified Linux server executables in order to provide per-request isolation with extremely low overhead.

KVM Server intercepts the program's epoll event loop and routes newly accepted connections onto tiny forked instances of the sandboxed process. After each request concludes, TinyKVM resets the guest to a pristine state far more quickly than Linux can fork a process. TinyKVM achieves such fast reset times by running guest processes under an emulated Linux userspace in KVM.

This approach is uniquely elegant for JIT'ed runtimes, where existing options force a choice between fast execution under virtualization (but no per-request isolation), slow process forking, slow V8 isolates, or very slow interpreters such as QuickJS embedded in WebAssembly.

Previous experiments with a real-world React rendering benchmark have shown run times in the tens of milliseconds with WebAssembly (which does not support JIT), or reset times of several milliseconds to either fork a process or start a new V8 isolate. KVM Server runs this benchmark 1.5-2 orders of magnitude faster than existing solutions, while running unmodified dynamic executables taken directly from the latest runtime releases, creating a new frontier for per-request isolation.

Benchmarks

Detailed benchmark results

Rust minimal HTTP server

name                                    average  p50    p90    p99
native                                  13 µs    11 µs  17 µs  20 µs
kvmserver threads=1                     23 µs    20 µs  29 µs  33 µs
kvmserver ephemeral threads=1           28 µs    28 µs  30 µs  37 µs
kvmserver ephemeral threads=2           34 µs    33 µs  37 µs  51 µs
kvmserver ephemeral threads=4           36 µs    35 µs  39 µs  57 µs
kvmserver ephemeral threads=2 no-tail   28 µs    28 µs  32 µs  36 µs

Deno helloworld

name                                    average  p50    p90    p99
native (reusing connection)             11 µs    10 µs  14 µs  16 µs
native                                  17 µs    15 µs  22 µs  29 µs
kvmserver threads=1                     33 µs    32 µs  33 µs  45 µs
kvmserver ephemeral threads=1           50 µs    49 µs  53 µs  75 µs
kvmserver ephemeral threads=2           58 µs    57 µs  62 µs  90 µs
kvmserver ephemeral threads=4           60 µs    59 µs  65 µs  90 µs
kvmserver ephemeral threads=2 no-tail   41 µs    38 µs  46 µs  59 µs

Deno React page rendering

name                                    average  p50     p90     p99
native (reusing connection)             642 µs   606 µs  673 µs  805 µs
native                                  646 µs   619 µs  670 µs  820 µs
kvmserver threads=1                     649 µs   619 µs  674 µs  798 µs
kvmserver ephemeral threads=1           695 µs   689 µs  712 µs  790 µs
kvmserver ephemeral threads=2           705 µs   704 µs  722 µs  755 µs
kvmserver ephemeral threads=4           711 µs   710 µs  728 µs  758 µs
kvmserver ephemeral threads=2 no-tail   639 µs   634 µs  662 µs  721 µs

Benchmark details

  • Non-ephemeral benchmark shows the overhead of sandboxing without any reset between requests.
  • No-tail benchmark runs with only a single load generator connection to measure latency excluding time spent after the response is sent to the client.
  • Deno is run with --v8-flags=--predictable, which causes all work to happen on a single thread. (At the median this makes a 1.5% difference for the React benchmark and none for helloworld.)
  • 1000 warmup requests were used to warm the JIT before benchmarking.
  • deno compile was used to avoid starting background disk cache threads.
  • The Rust minimal HTTP server always closes connections.
  • Benchmarks were run on an AMD Ryzen 9 7950X (32) @ 5.881 GHz with deno 2.3.6.

Performance characterization

The React benchmark runs with 10 µs of connection creation overhead, 15 µs of sandbox execution overhead, and 55 µs of sandbox reset overhead, for a total of 80 µs out of 690 µs. Performance is also more consistent, since resetting to the same warmed-up state avoids JIT spikes.

  • Execution of processes inside KVM generally runs at full speed.
  • Any syscalls requiring communication with the host incur overhead of around a microsecond in VM context switching and permission checking.
  • VM reset accounts for most of the overhead. It is tail latency incurred after connection close and consists of:
    • Event loop / file descriptor reset, proportional to the number of open file descriptors.
    • Memory reset time, proportional to the number of dirty memory pages which must be reset.

For simple endpoints, the network stack overhead of establishing a new TCP connection can be significant, so best performance is achieved by listening on a Unix socket and serving incoming TCP connections through a reverse proxy, which enables client connection reuse.
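
For example, with the guest listening on a Unix socket, a reverse proxy such as Caddy can accept and reuse client TCP connections (a sketch; the port and socket path are placeholders, and Caddy's unix// upstream address syntax is assumed):

caddy reverse-proxy --from :8080 --to unix//tmp/app.sock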

Nested virtualization incurs additional overhead that varies with the CPU security mitigations applied. On an AMD Ryzen 7 7840HS running Linux 6.11, we see around 200 µs of additional overhead when running nested under QEMU.

Memory usage

KVM Server forks are very memory efficient since they only need to allocate the pages written during a request (which are reset afterwards). This is great for largely single-threaded JITs like V8, since a large RSS can be amortized over many forked VMs.

A simple benchmark rendering the same page over and over is the best-case scenario. Expect real-world usage to touch more pages, but still see substantial savings.

Program                   RSS     Reset
Rust minimal HTTP server  9 MB    68 KB
Deno hello world          102 MB  452 KB
Deno react renderer       162 MB  2324 KB
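
To illustrate the amortization: assuming each fork's working set stays near the measured reset size, 16 ephemeral VMs of the Deno react renderer need roughly 162 MB + 16 × 2.3 MB ≈ 199 MB, versus roughly 16 × 162 MB ≈ 2.6 GB for 16 independent processes.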

Runtime requirements

  • Access to /dev/kvm is required. This normally requires adding your user to the kvm group. Changes to group membership usually take effect at next login.
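
For example, on most distributions:

sudo usermod -aG kvm "$USER"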

Command line arguments

kvmserver [OPTIONS] program [args...]


POSITIONALS:
  program TEXT REQUIRED       Program
  args TEXT ...               Program arguments

OPTIONS:
  -h,     --help              Print this help message and exit
  -c,     --config [kvmserver.toml]
                              Read a toml file
          --cwd TEXT [/home/lrowe/devel/kvmserver/.build]
                              Set the guests working directory
          --env TEXT ...      add an environment variable
  -t,     --threads UINT [1]  Number of request VMs (0 to use cpu count)
  -e,     --ephemeral         Use ephemeral VMs
  -w,     --warmup UINT [0]   Number of warmup requests
  -v,     --verbose           Enable verbose output
          --print-config      Print config and exit without running program

Permissions:
          --allow-all Excludes: --allow-read --allow-write --allow-env --allow-net --allow-connect --allow-listen --volume
                              Allow all access
          --allow-read{/} Excludes: --allow-all
                              Allow filesystem read access
          --allow-write{/} Excludes: --allow-all
                              Allow filesystem write access
          --allow-env{*} Excludes: --allow-all
                              Allow access to environment variables. Optionally specify
                              accessible environment variables (e.g.
                              --allow-env=USER,PATH,API_*).
          --allow-net Excludes: --allow-all
                              Allow network access
          --allow-connect Excludes: --allow-all
                              Allow outgoing network access
          --allow-listen Excludes: --allow-all
                              Allow incoming network access
          --volume Excludes: --allow-all
                              <host-path>:<guest-path>[:r?w?=r]

Advanced:
          --max-boot-time FLOAT [20]
          --max-request-time FLOAT [8]
          --max-main-memory UINT [8192]
          --max-address-space UINT [131072]
          --max-request-memory UINT [128]
          --limit-request-memory UINT [128]
          --shared-memory UINT [0]
          --dylink-address-hint UINT [2]
          --heap-address-hint UINT [256]
          --hugepage-arena-size UINT [0]
          --hugepage-requests-arena UINT [0]
          --no-executable-heap{false}
          --hugepages
          --no-split-hugepages{false}
          --transparent-hugepages
          --no-relocate-fixed-mmap{false}
          --no-ephemeral-keep-working-memory{false}
          --remapping ...     virt:size(mb)[:phys=0][:r?w?x?=rw]
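
For example, an ephemeral sandbox around a hypothetical server binary could be started like this (./my-server and its arguments are placeholders):

kvmserver --ephemeral --threads 4 --allow-all ./my-server --port 8080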

Configuration file

By default, kvmserver looks for a file named kvmserver.toml in the current directory and, if it exists, reads configuration from there.

Command line arguments and configuration are handled by CLI11, which supports a subset of TOML. Notably, array values must be kept on a single line.
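
A minimal kvmserver.toml might look like this (a sketch, assuming option names map directly to keys, as is usual for CLI11; the values are placeholders):

threads = 4
ephemeral = true
warmup = 100
allow-all = true
env = ["FOO=1", "BAR=2"]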

Binary release

Binary releases may be downloaded from the GitHub releases page. The binary requires glibc 2.34 or higher and should work on Debian 12+, Ubuntu 21.10+, Fedora 35+, and CentOS/RHEL 9+.

Build from source

On Debian-based distributions, the packages cmake, libc6-dev, g++, and make are required.

Run make to build .build/kvmserver.
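
For example:

sudo apt install cmake libc6-dev g++ make
make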

Running in Docker or Podman

The binary release is also available from GitHub Container Registry and may be copied into your glibc based Dockerfile:

FROM debian:bookworm-slim
COPY --from=ghcr.io/libriscv/kvmserver:bin /kvmserver /usr/local/bin/

Give the container permission to access /dev/kvm when running it:

KVM_GID=$(getent group kvm | cut -d: -f3)
docker run --rm --device /dev/kvm --group-add "$KVM_GID" IMAGE
podman run --rm --device /dev/kvm --group-add keep-groups IMAGE

On Ubuntu 24.04 you likely want to install a recent podman from podman-static and podman-compose from PyPI.

Docker compose / podman-compose

See: docker-compose.yml.

Run with docker compose:

docker compose build
KVM_GID=$(getent group kvm | cut -d: -f3) docker compose up
docker compose down --volumes

Run with podman-compose:

podman-compose build
podman-compose up
podman-compose down --volumes

Guest program considerations

KVM Server is intended for running primarily single-threaded guest programs, with parallel execution handled by VM forking. There is basic threading support so that guests can make progress when threads are used during startup, but threads should be avoided where possible.
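
For V8-based runtimes such as Deno, the benchmarks above keep all work on a single thread by passing --v8-flags=--predictable, for example (server.ts is a placeholder):

kvmserver --ephemeral --allow-all deno run --v8-flags=--predictable server.ts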

For details, see the integration-tested example guest programs in the repository.
