Add cloud-hypervisor VMM backend for Linux hosts#782
Open
crosbymichael wants to merge 1 commit into
apple/containerization currently runs containers in per-container VMs on macOS hosts via Virtualization.framework. This adds a second VMM backend so the same Swift orchestration layer (LinuxContainer / LinuxPod / Vminitd gRPC contract) runs on Linux hosts via cloud-hypervisor + KVM.

**CloudHypervisor Swift package** (`Sources/CloudHypervisor/`): a thin client for cloud-hypervisor's REST-over-UDS API, layered on AsyncHTTPClient. Endpoints cover VMM / VM lifecycle / hotplug (disk, fs, net, vsock, remove-device). Cross-platform (compiles on macOS for unit tests; consumed at runtime only by the Linux side of Containerization).

**CH backend in Containerization**: one cloud-hypervisor subprocess per VM, gated behind `#if os(Linux)`. CHVirtualMachineManager / CHVirtualMachineInstance mirror the VZ shape behind the existing VirtualMachineManager / VirtualMachineInstance protocol. CHProcess and VirtiofsdProcess manage the binaries; CHHotplugProvider handles virtio-blk and virtio-fs runtime hotplug (with one virtiofsd per unique source-hash tag, refcounted across containers).

**Linux host networking**: BridgeManager brings up a Linux bridge with an IPv4 subnet and (opt-in via `--enable-nat`) iptables MASQUERADE + scoped FORWARD rules. LinuxBridgedNetwork enslaves a fresh TAP per container to the bridge. State is recorded under `/run/containerization` so `cctl bridge delete` reverses exactly what create did. Bridge teardown verifies the link kind via sysfs to refuse deleting non-bridge interfaces.

**cctl run / bridge**: end-to-end Linux container run path (image pull, ext4 rootfs assembly, VM boot, container exec) plus `cctl bridge create|delete` for the host network plumbing.

**Build & dist**: `make linux-build` / `make linux-integration` build and exercise the host side inside an apple/container `--virtualization` dev container. `make dist-x86_64` produces a deployment tarball (cctl + cloud-hypervisor + virtiofsd + initfs + kernel) cross-compiled from the aarch64 dev container; pipeline documented in `docs/x86_64-build.md`. Static-musl C deps and the Zig cross compiler are pinned by SHA256.

The host orchestrator runs as root. Per-VM runtime state lives under `/run/containerization/ch/<UUID>` with mode 0700; UDS sockets inside are bound with mode 0600. Vminitd's gRPC channel inherits that trust boundary; socket-file perms are the auth.

Sandbox flags are upstream-secure by default. Two per-component opt-outs exist for the apple/container dev-container case (where the host seccomp profile SIGSYS-kills CH and virtiofsd):

- `CONTAINERIZATION_NO_CH_SECCOMP=1`: `cloud-hypervisor --seccomp false`.
- `CONTAINERIZATION_NO_VIRTIOFSD_SANDBOX=1`: `virtiofsd --sandbox none`.

Each emits a one-shot `logger.warning` at process start. Legacy alias `CONTAINERIZATION_RELAXED_SANDBOX=1` flips both. cctl spawns both binaries with `setsid` and a minimal env allowlist (PATH / HOME / RUST_LOG / RUST_BACKTRACE) so the parent's secrets don't leak to children.

`make linux-integration` runs the cross-platform integration suite against a real cloud-hypervisor VM inside the dev container. Linux runs the cross-platform subset (`process true`/`false`/`echo hi`, virtiofs round-trip, hotplug); the macOS suite is unchanged.

Signed-off-by: michael_crosby <michael_crosby@apple.com>
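For reviewers, the sandbox opt-outs and the child environment allowlist described above can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: the type `SandboxOptOuts` and the function `childEnvironment(from:)` are hypothetical names, and treating only the exact value `1` as "set" is an assumption. The environment variable names, the `--seccomp false` and `--sandbox none` flags, and the four allowlisted keys come from the description.

```swift
import Foundation

/// Hypothetical helper (not from the PR): resolves which sandbox opt-outs are active.
struct SandboxOptOuts: Equatable {
    var disableCHSeccomp: Bool
    var disableVirtiofsdSandbox: Bool

    init(environment: [String: String]) {
        // Assumption: only the literal value "1" enables an opt-out.
        let relaxed = environment["CONTAINERIZATION_RELAXED_SANDBOX"] == "1"
        disableCHSeccomp = relaxed || environment["CONTAINERIZATION_NO_CH_SECCOMP"] == "1"
        disableVirtiofsdSandbox = relaxed || environment["CONTAINERIZATION_NO_VIRTIOFSD_SANDBOX"] == "1"
    }

    /// Extra arguments for cloud-hypervisor; empty keeps the upstream default.
    var cloudHypervisorArguments: [String] {
        disableCHSeccomp ? ["--seccomp", "false"] : []
    }

    /// Extra arguments for virtiofsd; empty keeps the upstream default.
    var virtiofsdArguments: [String] {
        disableVirtiofsdSandbox ? ["--sandbox", "none"] : []
    }
}

/// Keys the description says are passed through to child processes.
let allowedChildEnvironmentKeys: Set<String> = ["PATH", "HOME", "RUST_LOG", "RUST_BACKTRACE"]

/// Drops every variable that is not on the allowlist.
func childEnvironment(from parent: [String: String]) -> [String: String] {
    parent.filter { allowedChildEnvironmentKeys.contains($0.key) }
}
```

Building the child environment from an allowlist rather than a denylist means a newly introduced secret in the parent's environment is excluded by default.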
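The "one virtiofsd per unique source-hash tag, refcounted across containers" behaviour mentioned for CHHotplugProvider follows a common pattern that could be sketched like this. `SharedDaemonRegistry` and its methods are hypothetical and only illustrate the refcounting idea; the actual CHHotplugProvider may be structured differently (for example as an actor).

```swift
import Foundation

/// Hypothetical registry (not from the PR): shares one daemon per tag and
/// stops it when the last user releases it.
final class SharedDaemonRegistry<Daemon> {
    private struct Entry {
        var daemon: Daemon
        var refs: Int
    }

    private var entries: [String: Entry] = [:]
    private let lock = NSLock()

    /// Returns the daemon for `tag`, calling `start` only for the first user.
    func acquire(tag: String, start: () throws -> Daemon) rethrows -> Daemon {
        lock.lock()
        defer { lock.unlock() }
        if var entry = entries[tag] {
            entry.refs += 1
            entries[tag] = entry
            return entry.daemon
        }
        let daemon = try start()
        entries[tag] = Entry(daemon: daemon, refs: 1)
        return daemon
    }

    /// Drops one reference; calls `stop` and returns true when it was the last one.
    @discardableResult
    func release(tag: String, stop: (Daemon) -> Void) -> Bool {
        lock.lock()
        defer { lock.unlock() }
        guard var entry = entries[tag] else { return false }
        entry.refs -= 1
        if entry.refs == 0 {
            entries[tag] = nil
            stop(entry.daemon)
            return true
        }
        entries[tag] = entry
        return false
    }
}
```

Two containers mounting the same source would call `acquire` with the same tag and share one daemon; it is stopped only once both have called `release`.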
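The sysfs check that guards bridge teardown could look something like the sketch below. The description does not say which sysfs attribute is inspected, so this assumes the usual Linux convention that a bridge device exposes a `bridge` subdirectory under `/sys/class/net/<name>/`. The function name `isBridgeInterface` and the `sysfsRoot` parameter (there to make the check testable) are hypothetical.

```swift
import Foundation

/// Hypothetical check (not from the PR): true only if `name` looks like a
/// Linux bridge according to sysfs.
func isBridgeInterface(_ name: String, sysfsRoot: String = "/sys/class/net") -> Bool {
    // Refuse anything that could escape the sysfs directory.
    guard !name.isEmpty, !name.contains("/"), name != ".", name != ".." else {
        return false
    }
    // Assumption: bridge devices expose a `bridge` directory; other link kinds do not.
    var isDirectory: ObjCBool = false
    let path = "\(sysfsRoot)/\(name)/bridge"
    let exists = FileManager.default.fileExists(atPath: path, isDirectory: &isDirectory)
    return exists && isDirectory.boolValue
}
```

A teardown path would call this before removing the link and bail out when it returns false, so a mistyped or stale name cannot delete a physical NIC or an unrelated interface.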