diff --git a/README.md b/README.md
index 6224ab9..90c3e02 100644
--- a/README.md
+++ b/README.md
@@ -110,10 +110,12 @@ Or pull a rootfs directly from any OCI registry (Docker Hub, GHCR, …):
 ```

 `image pull` downloads the image, flattens its layers into an ext4
-rootfs, and registers it as a managed banger image. Experimental — see
-[`docs/oci-import.md`](docs/oci-import.md) for current limitations
-(notably: file-ownership caveat means pulled images are a base for
-`image build`, not yet directly bootable).
+rootfs, applies tar-header ownership via debugfs, and pre-injects
+banger's guest agents (vsock agent + network bootstrap + a first-boot
+unit that installs `openssh-server` via the guest's native package
+manager). The result boots directly as a banger VM; no `image build`
+step is required. See [`docs/oci-import.md`](docs/oci-import.md) for
+supported distros and current limitations.

 Build a managed image from an existing registered image:

diff --git a/docs/oci-import.md b/docs/oci-import.md
index 24720b4..43aeb7d 100644
--- a/docs/oci-import.md
+++ b/docs/oci-import.md
@@ -15,15 +15,7 @@ banger image pull docker.io/library/debian:bookworm --kernel-ref void-6.12
 banger image list # debian-bookworm appears, Managed=true
 ```

-## Status: Phase A (acquisition only)
-
-This is the first of a two-phase initiative. **Phase A (this feature)**
-produces a working ext4 file from an OCI reference. **Phase B (not yet
-implemented)** will add the steps needed to make the pulled image
-directly bootable — init system hook-up, sshd install, vsock agent
-drop-in, network bootstrap, and **file-ownership fixup**.
-
-What works today:
+## What works

 - Pulling any public OCI image that exposes a `linux/amd64` manifest.
 - Correct layer replay with whiteout semantics (`.wh.*` deletes,
@@ -31,20 +23,35 @@ What works today:
 - Path-traversal and relative-symlink-escape protection.
 - Content-aware default sizing (`content × 1.25`, floor 1 GiB).
 - Layer caching on disk, keyed by blob SHA256.
+- **File ownership preservation.** Tar-header uid/gid/mode is captured
+  during flatten and applied to the resulting ext4 via a `debugfs`
+  pass, so setuid binaries (`sudo`, `passwd`) and root-owned config
+  files (`/etc/shadow`, `/etc/sudoers`) end up correctly owned.
+- **Banger guest agents pre-injected.** The pulled ext4 ships with
+  `/usr/local/bin/banger-vsock-agent`, `banger-network.service`, and
+  `banger-vsock-agent.service` already in place and enabled.
+- **First-boot sshd install.** A one-shot systemd service installs
+  `openssh-server` via the guest's package manager on first boot —
+  apt-get / apk / dnf / pacman / zypper dispatch based on
+  `/etc/os-release`. Subsequent boots skip the install.
 - Piping pulled images into the existing `banger image build
   --from-image` flow.

-What does not yet work:
+## What doesn't yet work

-- **Booting a pulled image directly.** The produced ext4 has file
-  ownership set to the *runner's* uid/gid, not the tar headers'.
-  Setuid binaries (`sudo`, `ping`, …) run as the wrong user in the
-  VM. This is deferred to Phase B.
 - **Private registries**. Auth is not implemented; anonymous pulls
   only. Docker Hub, GHCR (public), quay.io (public), etc. all work.
-- **Non-`linux/amd64` platforms**. The catalog is x86_64-only, so
-  pulled rootfses match. `arm64` is additive in the schema; wire-up
-  lands when a user needs it.
+- **Non-`linux/amd64` platforms**. The kernel catalog is x86_64-only,
+  so pulled rootfses match. `arm64` is additive in the schema; wire-up
+  lands when a user needs it.
+- **Non-systemd distros.** The injected units assume systemd as PID 1.
+  Alpine ≥3.20 ships systemd; older Alpine, Void, and busybox-init
+  images won't honour the `banger-network` / `banger-first-boot` units.
+- **First boot needs network access.** The provisioning step reaches
+  out to the distro's package repo to install `openssh-server`. VMs
+  without NAT or without the bridge reaching the internet will time
+  out on first boot.
+  The marker file stays in place so a later boot retries.

 ## Architecture

@@ -53,15 +60,32 @@ What does not yet work:
 - **`Pull`** (`imagepull.go`) wraps `go-containerregistry`'s
   `remote.Image` with the `linux/amd64` platform pinned. Layer blobs
   are cached on disk via `cache.NewFilesystemCache` under
-  `/blobs/sha256/` — OCI-standard layout so
-  `skopeo` or `crane` could co-exist.
+  `/blobs/` — Pull itself does not drain the layer
+  streams; that happens lazily during `Flatten`, and the cache
+  populates on read.
 - **`Flatten`** (`flatten.go`) replays layers oldest-first into a
   staging directory, applying whiteouts and rejecting unsafe paths.
+  Returns a `Metadata` map capturing per-file uid/gid/mode from
+  each tar header.
 - **`BuildExt4`** (`ext4.go`) runs `mkfs.ext4 -F -d
   -E root_owner=0:0` to populate the image file at create time —
   no mount, no sudo, no loopback. Requires `e2fsprogs ≥ 1.43`
-  (`mkfs.ext4 -d` is the Populate-at-Create flag; nearly all
+  (`mkfs.ext4 -d` is the populate-at-create flag; nearly all
   modern distros ship it).
+- **`ApplyOwnership`** (`ownership.go`) streams a batched
+  `set_inode_field` script to `debugfs -w -f -` to rewrite per-file
+  uid/gid/mode to the captured tar-header values. Without this pass
+  the ext4 would carry the runner's on-disk uids.
+- **`InjectGuestAgents`** (`inject.go`) uses the same `debugfs`
+  scripting to drop banger's guest-side assets into the pulled ext4
+  with root ownership:
+  - `/usr/local/bin/banger-vsock-agent`
+  - `/usr/local/libexec/banger-network-bootstrap`
+  - `/usr/local/libexec/banger-first-boot`
+  - `/etc/systemd/system/banger-{network,vsock-agent,first-boot}.service`
+  - enable-at-boot symlinks under `multi-user.target.wants/`
+  - `/etc/modules-load.d/banger-vsock.conf`
+  - `/var/lib/banger/first-boot-pending` (marker file)

 `internal/daemon/images_pull.go` orchestrates:

@@ -74,13 +98,44 @@ What does not yet work:
    tree under `os.TempDir` (bulk transient data stays off the
    persistent state filesystem).
 5. `imagepull.BuildExt4` produces `/rootfs.ext4`.
-6. `imagemgr.StageBootArtifacts` stages the kernel triple alongside.
-7. Atomic `os.Rename(, )` publishes the artifact dir.
-8. Persist a `model.Image{Managed: true, …}` record.
+6. `ApplyOwnership` + `InjectGuestAgents` run in one finalize step.
+7. `imagemgr.StageBootArtifacts` stages the kernel triple alongside.
+8. Atomic `os.Rename(, )` publishes the artifact dir.
+9. Persist a `model.Image{Managed: true, …}` record.

 Any failure removes the staging dir. Post-rename failures remove the
 final dir and roll back the store write.

+## Guest-side boot sequence
+
+On the first boot of a pulled image, systemd starts three banger
+units in order:
+
+1. **`banger-network.service`** — runs the bootstrap script that
+   parses `/etc/banger-network.conf` (written by banger's VM-create
+   lifecycle) and brings the guest interface up with the assigned IP.
+2. **`banger-first-boot.service`** (only on first boot; removes its
+   own trigger file on success) — reads `/etc/os-release`, dispatches
+   to the native package manager, installs `openssh-server`, enables
+   `ssh.service` / `sshd.service`.
+3. **`banger-vsock-agent.service`** — runs the health-check daemon
+   banger uses to confirm the VM is alive.
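The ID-based dispatch in step 2 can be pictured with a short Go sketch. The real logic is the POSIX-sh `case` block in `first-boot.sh`; the functions below (`parseOSRelease`, `packageManagerFor`, `dispatch`) are hypothetical, and the grouping of `ID` values into package managers is an assumption based on the supported-ID list, not a copy of the script. The sketch checks `ID` first, then each `ID_LIKE` entry in order, and errors if nothing matches.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseOSRelease reads KEY=value lines in /etc/os-release format.
// Simplified: surrounding quotes are stripped, backslash escapes are not handled.
func parseOSRelease(content string) map[string]string {
	fields := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		fields[key] = strings.Trim(value, `"'`)
	}
	return fields
}

// packageManagerFor maps one os-release ID token to a package manager.
// The grouping is an assumption, not taken from first-boot.sh.
func packageManagerFor(id string) (string, bool) {
	if strings.HasPrefix(id, "opensuse") { // the doc's `opensuse*`
		return "zypper", true
	}
	switch id {
	case "debian", "ubuntu", "kali", "raspbian", "linuxmint", "pop":
		return "apt-get", true
	case "alpine":
		return "apk", true
	case "fedora", "rhel", "centos", "rocky", "almalinux":
		return "dnf", true
	case "arch", "archlinux", "manjaro":
		return "pacman", true
	case "suse":
		return "zypper", true
	}
	return "", false
}

// dispatch tries ID first, then each space-separated ID_LIKE entry,
// and returns an error pointing at the script when nothing matches.
func dispatch(osRelease string) (string, error) {
	fields := parseOSRelease(osRelease)
	candidates := append([]string{fields["ID"]}, strings.Fields(fields["ID_LIKE"])...)
	for _, id := range candidates {
		if pm, ok := packageManagerFor(id); ok {
			return pm, nil
		}
	}
	return "", fmt.Errorf("unsupported distro (ID=%q, ID_LIKE=%q): add a branch to first-boot.sh",
		fields["ID"], fields["ID_LIKE"])
}

func main() {
	// A hypothetical Ubuntu derivative that only matches via ID_LIKE.
	sample := "NAME=\"Example Linux\"\nID=examplelinux\nID_LIKE=\"ubuntu debian\"\n"
	pm, err := dispatch(sample)
	fmt.Println(pm, err) // apt-get <nil>
}
```

Trying `ID` before `ID_LIKE` lets an explicitly listed derivative such as `linuxmint` match directly, while an unlisted one still resolves through its parent.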
+
+After first boot completes, subsequent boots skip the install step
+entirely. Banger's host-side SSH polling (`guest.WaitForSSH`)
+naturally retries until sshd is listening.
+
+## Adding distro support
+
+`internal/imagepull/assets/first-boot.sh` is the POSIX-sh dispatch.
+Add a new `ID=` branch and its install command to the `case` block,
+then rebuild banger — the asset is `go:embed`-ed into the binary.
+Supported `ID` values today: `debian`, `ubuntu`, `kali`, `raspbian`,
+`linuxmint`, `pop`, `alpine`, `fedora`, `rhel`, `centos`, `rocky`,
+`almalinux`, `arch`, `archlinux`, `manjaro`, `opensuse*`, `suse`.
+Unknown distros fall back to `ID_LIKE`, then error clearly with a
+pointer to edit the script.
+
 ## Paths

 | What | Where | Purpose |
@@ -92,10 +147,9 @@ final dir and roll back the store write.

 ## Composition with `image build`

-A pulled image is "unconfigured" — it has no sshd, no vsock agent, no
-banger-specific network unit, and file ownership is wrong for boot.
-The natural next step is to feed it through the existing customization
-pipeline:
+A pulled image boots as-is — ownership is correct, sshd installs on
+first boot, banger's agents are in place. That means the existing
+`image build --from-image` pipeline composes on top:

 ```bash
 banger image build --from-image debian-bookworm --name debian-dev --docker
@@ -103,32 +157,11 @@ scripts.

 `image build` spins up a transient VM using the base image, runs
 `scripts/customize.sh` over it, and saves the result as a new managed
-image. This is already how the opinionated `void` / `alpine` images
-are produced today.
-
-The bootability gap means this composition only works once Phase B
-lands an ownership-fixup pass. Until then, `image pull` gives you a
-recorded primitive; the boot story requires the legacy manual rootfs
-scripts.
+image with the opinionated tooling (mise, opencode, claude, pi, tmux
+plugins, optionally docker) layered on top.

 ## Tech debt

-- **File-ownership preservation**. The ext4 is populated from a tree
-  extracted as the current user — `mkfs.ext4 -d` then copies those
-  on-disk uids/gids verbatim. Setuid bits survive but with the wrong
-  owner, so privilege escalation is broken inside the VM. Planned
-  fixes:
-  - **debugfs ownership-fixup pass**: after `mkfs.ext4 -d`, replay
-    tar headers through `debugfs -w` with `set_inode_field` to
-    rewrite per-file uid/gid/mode. No new runtime deps (debugfs
-    ships with e2fsprogs). Moderate implementation; keeps us on
-    `mkfs.ext4 -d`.
-  - **`tar2ext4`**: Microsoft's hcsshim ships a Go package that
-    streams tar entries directly into an ext4 image, preserving
-    ownership. Heavier dependency graph but purpose-built.
-
-  Either approach lives in Phase B.
-
 - **Auth**. When we add private-registry support, the natural path is
   `authn.DefaultKeychain` from `go-containerregistry`, which already
   honours `~/.docker/config.json` and the standard credential
@@ -138,13 +171,17 @@ scripts.
   forever. A `banger image cache prune` command is a cheap follow-up
   when disk usage becomes a complaint.

-- **Ownership fixup via user namespaces**. An alternative to
-  debugfs / tar2ext4 is running the entire extraction inside a user
-  namespace (`unshare -Ufr`), which lets us set uid=0 on files from
-  a non-privileged process. Cleaner in theory but requires
-  user-namespace support on the host and doesn't help when the
-  resulting tree is then passed to `mkfs.ext4 -d` (which copies
-  on-disk uids).
+- **First-boot timeout UX**. If you run `banger vm ssh` immediately
+  after `banger vm create`, the package install for `openssh-server`
+  may still be running and SSH will fail. Current mitigation: retry.
+  Better: a per-image `FirstBootPending` flag that tells the daemon
+  to extend its SSH wait timeout for the first boot, cleared on
+  success. Tracked but not implemented.
+
+- **Non-systemd distros**. The guest agents assume systemd.
+  Adding openrc / s6 / busybox-init variants means keeping parallel
+  unit trees in `inject.go` keyed on `/etc/os-release`. Only pick up
+  when a user actually wants it.

 ## Trust model