Phase B-4: docs for Phase B completion

docs/oci-import.md: removed the "Phase A acquisition-only" framing and the bootability-gap warnings. Expanded the architecture section with ApplyOwnership + InjectGuestAgents. Added a "guest-side boot sequence" diagram-in-prose showing network → first-boot → vsock-agent unit ordering. Added a "how to add distro support" section pointing at the ID-case dispatch in first-boot.sh.

README.md: replaced the experimental-caveat block with an honest "boots as a banger VM directly, no image build step required" description, plus a pointer to the docs for distro support details.

Tech-debt list trimmed: ownership fixup and first-boot install are no longer planned work; they shipped. What remains: private-registry auth (authn.DefaultKeychain), cache eviction, first-boot timeout UX (retry still works but could be smoother with a FirstBootPending flag), non-systemd distros.

All 20 packages green. make lint clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 19:06:37 -03:00 · 2026-04-16 19:06:37 -03:00 · 2478fe3cc3
commit 2478fe3cc3
parent bddfa75feb
2 changed files with 100 additions and 61 deletions
--- a/README.md
+++ b/README.md
@@ -110,10 +110,12 @@ Or pull a rootfs directly from any OCI registry (Docker Hub, GHCR, …):
 ```

 `image pull` downloads the image, flattens its layers into an ext4
-rootfs, and registers it as a managed banger image. Experimental — see
-[`docs/oci-import.md`](docs/oci-import.md) for current limitations
-(notably: file-ownership caveat means pulled images are a base for
-`image build`, not yet directly bootable).
+rootfs, applies tar-header ownership via debugfs, and pre-injects
+banger's guest agents (vsock agent + network bootstrap + a first-boot
+unit that installs `openssh-server` via the guest's native package
+manager). Boots as a banger VM directly, no `image build` step
+required. See [`docs/oci-import.md`](docs/oci-import.md) for
+supported distros and current limitations.

 Build a managed image from an existing registered image:

--- a/docs/oci-import.md
+++ b/docs/oci-import.md
@@ -15,15 +15,7 @@ banger image pull docker.io/library/debian:bookworm --kernel-ref void-6.12
 banger image list     # debian-bookworm appears, Managed=true
 ```

-## Status: Phase A (acquisition only)
-
-This is the first of a two-phase initiative. **Phase A (this feature)**
-produces a working ext4 file from an OCI reference. **Phase B (not yet
-implemented)** will add the steps needed to make the pulled image
-directly bootable — init system hook-up, sshd install, vsock agent
-drop-in, network bootstrap, and **file-ownership fixup**.
-
-What works today:
+## What works

 - Pulling any public OCI image that exposes a `linux/amd64` manifest.
 - Correct layer replay with whiteout semantics (`.wh.*` deletes,
@@ -31,20 +23,35 @@ What works today:
 - Path-traversal and relative-symlink-escape protection.
 - Content-aware default sizing (`content × 1.25`, floor 1 GiB).
 - Layer caching on disk, keyed by blob SHA256.
+- **File ownership preservation.** Tar-header uid/gid/mode is captured
+  during flatten and applied to the resulting ext4 via a `debugfs`
+  pass, so setuid binaries (`sudo`, `passwd`) and root-owned config
+  files (`/etc/shadow`, `/etc/sudoers`) end up correctly owned.
+- **Banger guest agents pre-injected.** The pulled ext4 ships with
+  `/usr/local/bin/banger-vsock-agent`, `banger-network.service`, and
+  `banger-vsock-agent.service` already in place and enabled.
+- **First-boot sshd install.** A one-shot systemd service installs
+  `openssh-server` via the guest's package manager on first boot —
+  apt-get / apk / dnf / pacman / zypper dispatch based on
+  `/etc/os-release`. Subsequent boots skip the install.
 - Piping pulled images into the existing `banger image build
  --from-image` flow.
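Two rules from the list above, the content-aware default size and the path-traversal guard, can be sketched in Go. This is an illustrative sketch, not banger's `flatten.go`: `defaultSizeBytes` and `safeJoin` are hypothetical names, the integer rounding is an assumption, and `safeJoin` only covers lexical `..` escapes, not the separate relative-symlink check.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

const gib int64 = 1 << 30

// defaultSizeBytes is a hypothetical helper applying "content × 1.25,
// floor 1 GiB". Integer division truncates; banger's rounding may differ.
func defaultSizeBytes(contentBytes int64) int64 {
	size := contentBytes * 5 / 4
	if size < gib {
		return gib
	}
	return size
}

// safeJoin is a hypothetical helper that resolves a tar entry name under
// root and rejects names that would land outside it (e.g. "../etc/passwd").
// It is purely lexical: symlinks already in the tree are not followed.
func safeJoin(root, name string) (string, error) {
	p := filepath.Join(root, name) // Join also cleans "a/../b" segments
	rel, err := filepath.Rel(root, p)
	if err != nil {
		return "", err
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("entry %q escapes %q", name, root)
	}
	return p, nil
}

func main() {
	fmt.Println(defaultSizeBytes(4 << 30)) // 4 GiB of content -> 5 GiB image
	if _, err := safeJoin("/stage", "../etc/passwd"); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Joining first and then checking the relative path means absolute tar names such as `/etc/shadow` are re-rooted under the staging dir instead of rejected.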

-What does not yet work:
+## What doesn't yet work

-- **Booting a pulled image directly.** The produced ext4 has file
-  ownership set to the *runner's* uid/gid, not the tar headers'.
-  Setuid binaries (`sudo`, `ping`, …) run as the wrong user in the
-  VM. This is deferred to Phase B.
 - **Private registries**. Auth is not implemented; anonymous pulls
  only. Docker Hub, GHCR (public), quay.io (public), etc. all work.
-- **Non-`linux/amd64` platforms**. The catalog is x86_64-only, so
-  pulled rootfses match. `arm64` is additive in the schema; wire-up
-  lands when a user needs it.
+- **Non-`linux/amd64` platforms**. The kernel catalog is x86_64-only,
+  so pulled rootfses match. `arm64` is additive in the schema;
+  wire-up lands when a user needs it.
+- **Non-systemd distros.** The injected units assume systemd as PID 1.
+  Alpine (OpenRC), Void (runit), and busybox-init images won't run
+  the banger-network / banger-first-boot / banger-vsock-agent units.
+- **First boot needs network access.** The provisioning step reaches
+  out to the distro's package repo to install openssh-server. If the
+  VM's bridge has no NAT or no route to the internet, the first boot
+  times out. The marker file stays in place, so a later boot
+  retries.

 ## Architecture

@@ -53,15 +60,32 @@ What does not yet work:
 - **`Pull`** (`imagepull.go`) wraps `go-containerregistry`'s
  `remote.Image` with the `linux/amd64` platform pinned. Layer
  blobs are cached on disk via `cache.NewFilesystemCache` under
-  `<OCICacheDir>/blobs/sha256/<hex>` — OCI-standard layout so
-  `skopeo` or `crane` could co-exist.
+  `<OCICacheDir>/blobs/` — Pull itself does not drain the layer
+  streams; that happens lazily during `Flatten`, and the cache
+  populates on read.
 - **`Flatten`** (`flatten.go`) replays layers oldest-first into a
  staging directory, applying whiteouts and rejecting unsafe paths.
+  Returns a `Metadata` map capturing per-file uid/gid/mode from
+  each tar header.
 - **`BuildExt4`** (`ext4.go`) runs `mkfs.ext4 -F -d <staging>
  -E root_owner=0:0` to populate the image file at create time —
  no mount, no sudo, no loopback. Requires `e2fsprogs ≥ 1.43`
-  (`mkfs.ext4 -d` is the Populate-at-Create flag; nearly all
+  (`mkfs.ext4 -d` is the populate-at-create flag; nearly all
  modern distros ship it).
+- **`ApplyOwnership`** (`ownership.go`) streams a batched
+  `set_inode_field` script to `debugfs -w -f -` to rewrite per-file
+  uid/gid/mode to the captured tar-header values. Without this pass
+  the ext4 would carry the runner's on-disk uids.
+- **`InjectGuestAgents`** (`inject.go`) uses the same `debugfs`
+  scripting to drop banger's guest-side assets into the pulled ext4
+  with root ownership:
+  - `/usr/local/bin/banger-vsock-agent`
+  - `/usr/local/libexec/banger-network-bootstrap`
+  - `/usr/local/libexec/banger-first-boot`
+  - `/etc/systemd/system/banger-{network,vsock-agent,first-boot}.service`
+  - enable-at-boot symlinks under `multi-user.target.wants/`
+  - `/etc/modules-load.d/banger-vsock.conf`
+  - `/var/lib/banger/first-boot-pending` (marker file)

 `internal/daemon/images_pull.go` orchestrates:

@@ -74,13 +98,44 @@ What does not yet work:
   tree under `os.TempDir` (bulk transient data stays off the
   persistent state filesystem).
 5. `imagepull.BuildExt4` produces `<staging>/rootfs.ext4`.
-6. `imagemgr.StageBootArtifacts` stages the kernel triple alongside.
-7. Atomic `os.Rename(<staging>, <final>)` publishes the artifact dir.
-8. Persist a `model.Image{Managed: true, …}` record.
+6. `ApplyOwnership` + `InjectGuestAgents` run in one finalize step.
+7. `imagemgr.StageBootArtifacts` stages the kernel triple alongside.
+8. Atomic `os.Rename(<staging>, <final>)` publishes the artifact dir.
+9. Persist a `model.Image{Managed: true, …}` record.

 Any failure removes the staging dir. Post-rename failures remove the
 final dir and roll back the store write.

+## Guest-side boot sequence
+
+On the first boot of a pulled image, systemd starts three banger
+units in order:
+
+1. **`banger-network.service`** — runs the bootstrap script that
+   parses `/etc/banger-network.conf` (written by banger's VM-create
+   lifecycle) and brings the guest interface up with the assigned IP.
+2. **`banger-first-boot.service`** (only on first boot; removes its
+   own trigger file on success) — reads `/etc/os-release`, dispatches
+   to the native package manager, installs `openssh-server`, enables
+   `ssh.service` / `sshd.service`.
+3. **`banger-vsock-agent.service`** — runs the health-check daemon
+   banger uses to confirm the VM is alive.
+
+After first boot completes, subsequent boots skip the install step
+entirely. Banger's host-side SSH polling (`guest.WaitForSSH`)
+naturally retries until sshd is listening.
+
+## Adding distro support
+
+`internal/imagepull/assets/first-boot.sh` is the POSIX-sh dispatch.
+Add a new `ID=` branch and its install command to the `case` block,
+then rebuild banger — the asset is `go:embed`-ed into the binary.
+Supported `ID` values today: `debian`, `ubuntu`, `kali`, `raspbian`,
+`linuxmint`, `pop`, `alpine`, `fedora`, `rhel`, `centos`, `rocky`,
+`almalinux`, `arch`, `archlinux`, `manjaro`, `opensuse*`, `suse`.
+Unknown distros fall back to `ID_LIKE`, then error clearly with a
+pointer to edit the script.
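The `ID`-then-`ID_LIKE` fallback described above lives in a POSIX-sh `case` block; the same decision logic is rendered in Go here only to make the lookup order explicit. `packageManager` and its helpers are hypothetical and return just the package-manager name, since the exact install commands in `first-boot.sh` are not reproduced here. The ID table mirrors the supported list above.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseOSRelease reads KEY=value lines from /etc/os-release content,
// skipping comments and stripping optional surrounding quotes.
func parseOSRelease(s string) map[string]string {
	kv := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(s))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if k, v, ok := strings.Cut(line, "="); ok {
			kv[k] = strings.Trim(v, `"'`)
		}
	}
	return kv
}

// managerFor maps a single distro ID to its package manager.
func managerFor(id string) (string, bool) {
	switch id {
	case "debian", "ubuntu", "kali", "raspbian", "linuxmint", "pop":
		return "apt-get", true
	case "alpine":
		return "apk", true
	case "fedora", "rhel", "centos", "rocky", "almalinux":
		return "dnf", true
	case "arch", "archlinux", "manjaro":
		return "pacman", true
	case "suse":
		return "zypper", true
	}
	if strings.HasPrefix(id, "opensuse") { // opensuse-leap, opensuse-tumbleweed, ...
		return "zypper", true
	}
	return "", false
}

// packageManager tries ID first, then each space-separated ID_LIKE entry
// in order, and fails with a pointer to the script otherwise.
func packageManager(osRelease string) (string, error) {
	kv := parseOSRelease(osRelease)
	if pm, ok := managerFor(kv["ID"]); ok {
		return pm, nil
	}
	for _, like := range strings.Fields(kv["ID_LIKE"]) {
		if pm, ok := managerFor(like); ok {
			return pm, nil
		}
	}
	return "", fmt.Errorf("unsupported distro ID=%q ID_LIKE=%q: add a branch to first-boot.sh",
		kv["ID"], kv["ID_LIKE"])
}

func main() {
	fmt.Println(packageManager("ID=\"ol\"\nID_LIKE=\"fedora\"\n"))
}
```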
+
 ## Paths

 | What | Where | Purpose |
@@ -92,10 +147,9 @@ final dir and roll back the store write.

 ## Composition with `image build`

-A pulled image is "unconfigured" — it has no sshd, no vsock agent, no
-banger-specific network unit, and file ownership is wrong for boot.
-The natural next step is to feed it through the existing customization
-pipeline:
+A pulled image boots as-is — ownership is correct, sshd installs on
+first boot, banger's agents are in place. That means the existing
+`image build --from-image` pipeline composes on top:

 ```bash
 banger image build --from-image debian-bookworm --name debian-dev --docker
@@ -103,32 +157,11 @@ banger image build --from-image debian-bookworm --name debian-dev --docker

 `image build` spins up a transient VM using the base image, runs
 `scripts/customize.sh` over it, and saves the result as a new managed
-image. This is already how the opinionated `void` / `alpine` images
-are produced today.
-
-The bootability gap means this composition only works once Phase B
-lands an ownership-fixup pass. Until then, `image pull` gives you a
-recorded primitive; the boot story requires the legacy manual rootfs
-scripts.
+image with the opinionated tooling (mise, opencode, claude, pi, tmux
+plugins, optionally docker) layered on top.

 ## Tech debt

-- **File-ownership preservation**. The ext4 is populated from a tree
-  extracted as the current user — `mkfs.ext4 -d` then copies those
-  on-disk uids/gids verbatim. Setuid bits survive but with the wrong
-  owner, so privilege escalation is broken inside the VM. Planned
-  fixes:
-  - **debugfs ownership-fixup pass**: after `mkfs.ext4 -d`, replay
-    tar headers through `debugfs -w` with `set_inode_field` to
-    rewrite per-file uid/gid/mode. No new runtime deps (debugfs
-    ships with e2fsprogs). Moderate implementation; keeps us on
-    `mkfs.ext4 -d`.
-  - **`tar2ext4`**: Microsoft's hcsshim ships a Go package that
-    streams tar entries directly into an ext4 image, preserving
-    ownership. Heavier dependency graph but purpose-built.
-
-  Either approach lives in Phase B.
-
 - **Auth**. When we add private-registry support, the natural path is
  `authn.DefaultKeychain` from `go-containerregistry`, which already
  honours `~/.docker/config.json` and the standard credential
@@ -138,13 +171,17 @@ scripts.
  forever. A `banger image cache prune` command is a cheap follow-up
  when disk usage becomes a complaint.

-- **Ownership fixup via user namespaces**. An alternative to
-  debugfs / tar2ext4 is running the entire extraction inside a user
-  namespace (`unshare -Ufr`), which lets us set uid=0 on files from
-  a non-privileged process. Cleaner in theory but requires
-  user-namespace support on the host and doesn't help when the
-  resulting tree is then passed to `mkfs.ext4 -d` (which copies
-  on-disk uids).
+- **First-boot timeout UX**. If you run `banger vm ssh` immediately
+  after `banger vm create`, the package install for `openssh-server`
+  may still be running and SSH will fail. Current mitigation: retry.
+  Better: a per-image `FirstBootPending` flag that tells the daemon
+  to extend its SSH wait timeout for the first boot, cleared on
+  success. Tracked but not implemented.
+
+- **Non-systemd distros**. The guest agents assume systemd. Adding
+  openrc / s6 / busybox-init variants means keeping parallel unit
+  trees in `inject.go` keyed on `/etc/os-release`. Only pick up
+  when a user actually wants it.
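The proposed `FirstBootPending` flag is not implemented yet. A minimal sketch of the intended behaviour might look like the following. The `Image` struct, both durations, and the helper names are placeholders invented for illustration, not banger's model or its real timeouts.

```go
package main

import (
	"fmt"
	"time"
)

// Image is a hypothetical slice of an image record carrying the proposed
// flag; banger's model.Image has no such field today.
type Image struct {
	Name             string
	FirstBootPending bool
}

// Placeholder durations; the doc does not specify real values.
const (
	defaultSSHWait   = 30 * time.Second
	firstBootSSHWait = 5 * time.Minute
)

// sshWaitTimeout picks how long the daemon would poll for sshd: longer
// while the first-boot package install may still be running.
func sshWaitTimeout(img Image) time.Duration {
	if img.FirstBootPending {
		return firstBootSSHWait
	}
	return defaultSSHWait
}

// markFirstBootDone clears the flag after a successful SSH connection,
// so later boots fall back to the normal timeout.
func markFirstBootDone(img *Image) { img.FirstBootPending = false }

func main() {
	img := Image{Name: "debian-bookworm", FirstBootPending: true}
	fmt.Println(sshWaitTimeout(img))
	markFirstBootDone(&img)
	fmt.Println(sshWaitTimeout(img))
}
```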

 ## Trust model