Phase B-4: docs for Phase B completion

docs/oci-import.md: removed the "Phase A acquisition-only" framing
and the bootability-gap warnings. Expanded architecture section
with ApplyOwnership + InjectGuestAgents. Added a "guest-side boot
sequence" diagram-in-prose showing network → first-boot → vsock-
agent unit ordering. Added a "how to add distro support" section
pointing at the ID-case dispatch in first-boot.sh.

README.md: replaced the experimental-caveat block with an honest
"boots as a banger VM directly, no image build step required"
description. Pointer to the docs for distro support details.

Tech-debt list trimmed — ownership fixup and first-boot install
are no longer planned work, they shipped. What remains: private-
registry auth (authn.DefaultKeychain), cache eviction, first-boot
timeout UX (retry still works but could be smoother with a
FirstBootPending flag), non-systemd distros.

All 20 packages green. make lint clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Thales Maciel 2026-04-16 19:06:37 -03:00
parent bddfa75feb
commit 2478fe3cc3
GPG key ID: 33112E6833C34679
2 changed files with 100 additions and 61 deletions

README.md

@@ -110,10 +110,12 @@ Or pull a rootfs directly from any OCI registry (Docker Hub, GHCR, …):
 ```

 `image pull` downloads the image, flattens its layers into an ext4
-rootfs, and registers it as a managed banger image. Experimental — see
-[`docs/oci-import.md`](docs/oci-import.md) for current limitations
-(notably: file-ownership caveat means pulled images are a base for
-`image build`, not yet directly bootable).
+rootfs, applies tar-header ownership via debugfs, and pre-injects
+banger's guest agents (vsock agent + network bootstrap + a first-boot
+unit that installs `openssh-server` via the guest's native package
+manager). Boots as a banger VM directly, no `image build` step
+required. See [`docs/oci-import.md`](docs/oci-import.md) for
+supported distros and current limitations.

 Build a managed image from an existing registered image:

docs/oci-import.md

@@ -15,15 +15,7 @@ banger image pull docker.io/library/debian:bookworm --kernel-ref void-6.12
 banger image list # debian-bookworm appears, Managed=true
 ```

-## Status: Phase A (acquisition only)
+## What works

-This is the first of a two-phase initiative. **Phase A (this feature)**
-produces a working ext4 file from an OCI reference. **Phase B (not yet
-implemented)** will add the steps needed to make the pulled image
-directly bootable — init system hook-up, sshd install, vsock agent
-drop-in, network bootstrap, and **file-ownership fixup**.
-
-What works today:
-
 - Pulling any public OCI image that exposes a `linux/amd64` manifest.
 - Correct layer replay with whiteout semantics (`.wh.*` deletes,
@@ -31,20 +23,35 @@ What works today:
 - Path-traversal and relative-symlink-escape protection.
 - Content-aware default sizing (`content × 1.25`, floor 1 GiB).
 - Layer caching on disk, keyed by blob SHA256.
+- **File ownership preservation.** Tar-header uid/gid/mode is captured
+  during flatten and applied to the resulting ext4 via a `debugfs`
+  pass, so setuid binaries (`sudo`, `passwd`) and root-owned config
+  files (`/etc/shadow`, `/etc/sudoers`) end up correctly owned.
+- **Banger guest agents pre-injected.** The pulled ext4 ships with
+  `/usr/local/bin/banger-vsock-agent`, `banger-network.service`, and
+  `banger-vsock-agent.service` already in place and enabled.
+- **First-boot sshd install.** A one-shot systemd service installs
+  `openssh-server` via the guest's package manager on first boot —
+  apt-get / apk / dnf / pacman / zypper dispatch based on
+  `/etc/os-release`. Subsequent boots skip the install.
 - Piping pulled images into the existing `banger image build
   --from-image` flow.

-What does not yet work:
+## What doesn't yet work

-- **Booting a pulled image directly.** The produced ext4 has file
-  ownership set to the *runner's* uid/gid, not the tar headers'.
-  Setuid binaries (`sudo`, `ping`, …) run as the wrong user in the
-  VM. This is deferred to Phase B.
 - **Private registries**. Auth is not implemented; anonymous pulls
   only. Docker Hub, GHCR (public), quay.io (public), etc. all work.
-- **Non-`linux/amd64` platforms**. The catalog is x86_64-only, so
-  pulled rootfses match. `arm64` is additive in the schema; wire-up
-  lands when a user needs it.
+- **Non-`linux/amd64` platforms**. The kernel catalog is x86_64-only,
+  so pulled rootfses match. `arm64` is additive in the schema; wire-
+  up lands when a user needs it.
+- **Non-systemd distros.** The injected units assume systemd as PID 1.
+  Alpine ≥3.20 ships systemd; older alpine + void + busybox-init
+  images won't honour the banger-network / banger-first-boot units.
+- **First boot needs network access.** The provisioning step reaches
+  out to the distro's package repo to install openssh-server. VMs
+  without NAT or without the bridge reaching the internet will time
+  out on first boot. The marker file stays in place so a later boot
+  retries.

 ## Architecture
@@ -53,15 +60,32 @@ What does not yet work:
 - **`Pull`** (`imagepull.go`) wraps `go-containerregistry`'s
   `remote.Image` with the `linux/amd64` platform pinned. Layer
   blobs are cached on disk via `cache.NewFilesystemCache` under
-  `<OCICacheDir>/blobs/sha256/<hex>` — OCI-standard layout so
-  `skopeo` or `crane` could co-exist.
+  `<OCICacheDir>/blobs/` — Pull itself does not drain the layer
+  streams; that happens lazily during `Flatten`, and the cache
+  populates on read.
 - **`Flatten`** (`flatten.go`) replays layers oldest-first into a
   staging directory, applying whiteouts and rejecting unsafe paths.
+  Returns a `Metadata` map capturing per-file uid/gid/mode from
+  each tar header.
 - **`BuildExt4`** (`ext4.go`) runs `mkfs.ext4 -F -d <staging>
   -E root_owner=0:0` to populate the image file at create time —
   no mount, no sudo, no loopback. Requires `e2fsprogs ≥ 1.43`
-  (`mkfs.ext4 -d` is the Populate-at-Create flag; nearly all
+  (`mkfs.ext4 -d` is the populate-at-create flag; nearly all
   modern distros ship it).
+- **`ApplyOwnership`** (`ownership.go`) streams a batched
+  `set_inode_field` script to `debugfs -w -f -` to rewrite per-file
+  uid/gid/mode to the captured tar-header values. Without this pass
+  the ext4 would carry the runner's on-disk uids.
+- **`InjectGuestAgents`** (`inject.go`) uses the same `debugfs`
+  scripting to drop banger's guest-side assets into the pulled ext4
+  with root ownership:
+  - `/usr/local/bin/banger-vsock-agent`
+  - `/usr/local/libexec/banger-network-bootstrap`
+  - `/usr/local/libexec/banger-first-boot`
+  - `/etc/systemd/system/banger-{network,vsock-agent,first-boot}.service`
+  - enable-at-boot symlinks under `multi-user.target.wants/`
+  - `/etc/modules-load.d/banger-vsock.conf`
+  - `/var/lib/banger/first-boot-pending` (marker file)

 `internal/daemon/images_pull.go` orchestrates:
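
Editorial sketch for the `ApplyOwnership` bullet in the hunk above: the pass is described as streaming a batched `set_inode_field` script to `debugfs -w -f -`. The shell functions below show roughly what such a script could look like. The real code lives in `ownership.go` and is not part of this diff, so the helper names (`emit_ownership`, `build_script`), the sample paths and ids, and the assumption that the `mode` field takes the full inode mode including file-type bits are all illustrative, not banger's actual implementation.

```bash
#!/bin/sh
# Hypothetical helper: print the debugfs commands that set uid, gid, and mode
# for one path inside the image. Paths containing whitespace would need extra
# quoting, which this sketch does not handle.
emit_ownership() {
    path=$1 uid=$2 gid=$3 mode=$4
    printf 'set_inode_field %s uid %s\n' "$path" "$uid"
    printf 'set_inode_field %s gid %s\n' "$path" "$gid"
    # Assumption: mode is the raw inode mode, e.g. 0104755 for a setuid
    # regular file (file-type bits 0100000 + setuid 04000 + perms 0755).
    printf 'set_inode_field %s mode %s\n' "$path" "$mode"
}

# Hypothetical batch: one script covering several files (sample values only).
build_script() {
    emit_ownership /usr/bin/sudo 0 0 0104755
    emit_ownership /etc/shadow 0 42 0100640
}
```

The generated text would be piped in a single invocation, for example `build_script | debugfs -w -f - rootfs.ext4`, so the image is opened once rather than once per file.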
@@ -74,13 +98,44 @@ What does not yet work:
    tree under `os.TempDir` (bulk transient data stays off the
    persistent state filesystem).
 5. `imagepull.BuildExt4` produces `<staging>/rootfs.ext4`.
-6. `imagemgr.StageBootArtifacts` stages the kernel triple alongside.
-7. Atomic `os.Rename(<staging>, <final>)` publishes the artifact dir.
-8. Persist a `model.Image{Managed: true, …}` record.
+6. `ApplyOwnership` + `InjectGuestAgents` run in one finalize step.
+7. `imagemgr.StageBootArtifacts` stages the kernel triple alongside.
+8. Atomic `os.Rename(<staging>, <final>)` publishes the artifact dir.
+9. Persist a `model.Image{Managed: true, …}` record.

 Any failure removes the staging dir. Post-rename failures remove the
 final dir and roll back the store write.

+## Guest-side boot sequence
+
+On the first boot of a pulled image, systemd starts three banger
+units in order:
+
+1. **`banger-network.service`** — runs the bootstrap script that
+   parses `/etc/banger-network.conf` (written by banger's VM-create
+   lifecycle) and brings the guest interface up with the assigned IP.
+2. **`banger-first-boot.service`** (only on first boot; removes its
+   own trigger file on success) — reads `/etc/os-release`, dispatches
+   to the native package manager, installs `openssh-server`, enables
+   `ssh.service` / `sshd.service`.
+3. **`banger-vsock-agent.service`** — runs the health-check daemon
+   banger uses to confirm the VM is alive.
+
+After first boot completes, subsequent boots skip the install step
+entirely. Banger's host-side SSH polling (`guest.WaitForSSH`)
+naturally retries until sshd is listening.
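
Editorial sketch of the marker-file behaviour described for `banger-first-boot.service`: run the install only while the trigger file exists, remove it on success, and leave it in place on failure so a later boot retries. The function name `first_boot` and its argument convention are hypothetical; the shipped `first-boot.sh` is not shown in this diff and may be structured differently.

```bash
#!/bin/sh
# first_boot MARKER CMD...
# Hypothetical wrapper: runs CMD only while MARKER exists and clears the
# marker after CMD succeeds.
first_boot() {
    marker=$1
    shift
    [ -e "$marker" ] || return 0   # marker already cleared: later boots skip the install
    "$@" || return 1               # install failed: keep the marker so the next boot retries
    rm -f "$marker"                # success: disarm the first-boot step
}
```

A call could look like `first_boot /var/lib/banger/first-boot-pending install_sshd`, where `install_sshd` stands in for whatever the distro dispatch selects.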
+## Adding distro support
+
+`internal/imagepull/assets/first-boot.sh` is the POSIX-sh dispatch.
+Add a new `ID=` branch and its install command to the `case` block,
+then rebuild banger — the asset is `go:embed`-ed into the binary.
+
+Supported `ID` values today: `debian`, `ubuntu`, `kali`, `raspbian`,
+`linuxmint`, `pop`, `alpine`, `fedora`, `rhel`, `centos`, `rocky`,
+`almalinux`, `arch`, `archlinux`, `manjaro`, `opensuse*`, `suse`.
+Unknown distros fall back to `ID_LIKE`, then error clearly with a
+pointer to edit the script.
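
Editorial sketch of the `ID` / `ID_LIKE` dispatch described above. The supported `ID` values and the five package managers come from the text; how the IDs are grouped per package manager, the function names (`pkg_manager_for`, `resolve_pkg_manager`), and the error wording are assumptions, since `first-boot.sh` itself is not included in this diff.

```bash
#!/bin/sh
# Hypothetical mapping from a single os-release ID to a package manager.
# Adding a distro would mean adding one more branch here.
pkg_manager_for() {
    case "$1" in
        debian|ubuntu|kali|raspbian|linuxmint|pop) echo apt-get ;;
        alpine)                                    echo apk ;;
        fedora|rhel|centos|rocky|almalinux)        echo dnf ;;
        arch|archlinux|manjaro)                    echo pacman ;;
        opensuse*|suse)                            echo zypper ;;
        *) return 1 ;;
    esac
}

# resolve_pkg_manager ID ID_LIKE
# Try ID first, then each word of ID_LIKE, then fail with a clear message.
resolve_pkg_manager() {
    id=$1
    id_like=$2
    pkg_manager_for "$id" && return 0
    for like in $id_like; do
        pkg_manager_for "$like" && return 0
    done
    echo "unsupported distro: ID=$id ID_LIKE=$id_like (edit first-boot.sh)" >&2
    return 1
}
```

In a guest this would typically be fed from the os-release file, for example `. /etc/os-release; resolve_pkg_manager "$ID" "$ID_LIKE"`.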
 ## Paths

 | What | Where | Purpose |
@@ -92,10 +147,9 @@ final dir and roll back the store write.
 ## Composition with `image build`

-A pulled image is "unconfigured" — it has no sshd, no vsock agent, no
-banger-specific network unit, and file ownership is wrong for boot.
-The natural next step is to feed it through the existing customization
-pipeline:
+A pulled image boots as-is — ownership is correct, sshd installs on
+first boot, banger's agents are in place. That means the existing
+`image build --from-image` pipeline composes on top:

 ```bash
 banger image build --from-image debian-bookworm --name debian-dev --docker
@@ -103,32 +157,11 @@ banger image build --from-image debian-bookworm --name debian-dev --docker
 `image build` spins up a transient VM using the base image, runs
 `scripts/customize.sh` over it, and saves the result as a new managed
-image. This is already how the opinionated `void` / `alpine` images
-are produced today.
-
-The bootability gap means this composition only works once Phase B
-lands an ownership-fixup pass. Until then, `image pull` gives you a
-recorded primitive; the boot story requires the legacy manual rootfs
-scripts.
+image with the opinionated tooling (mise, opencode, claude, pi, tmux
+plugins, optionally docker) layered on top.

 ## Tech debt

-- **File-ownership preservation**. The ext4 is populated from a tree
-  extracted as the current user — `mkfs.ext4 -d` then copies those
-  on-disk uids/gids verbatim. Setuid bits survive but with the wrong
-  owner, so privilege escalation is broken inside the VM. Planned
-  fixes:
-  - **debugfs ownership-fixup pass**: after `mkfs.ext4 -d`, replay
-    tar headers through `debugfs -w` with `set_inode_field` to
-    rewrite per-file uid/gid/mode. No new runtime deps (debugfs
-    ships with e2fsprogs). Moderate implementation; keeps us on
-    `mkfs.ext4 -d`.
-  - **`tar2ext4`**: Microsoft's hcsshim ships a Go package that
-    streams tar entries directly into an ext4 image, preserving
-    ownership. Heavier dependency graph but purpose-built.
-
-  Either approach lives in Phase B.
-
 - **Auth**. When we add private-registry support, the natural path is
   `authn.DefaultKeychain` from `go-containerregistry`, which already
   honours `~/.docker/config.json` and the standard credential
@@ -138,13 +171,17 @@ scripts.
   forever. A `banger image cache prune` command is a cheap follow-up
   when disk usage becomes a complaint.

-- **Ownership fixup via user namespaces**. An alternative to
-  debugfs / tar2ext4 is running the entire extraction inside a user
-  namespace (`unshare -Ufr`), which lets us set uid=0 on files from
-  a non-privileged process. Cleaner in theory but requires
-  user-namespace support on the host and doesn't help when the
-  resulting tree is then passed to `mkfs.ext4 -d` (which copies
-  on-disk uids).
+- **First-boot timeout UX**. If you run `banger vm ssh` immediately
+  after `banger vm create`, the package install for `openssh-server`
+  may still be running and SSH will fail. Current mitigation: retry.
+  Better: a per-image `FirstBootPending` flag that tells the daemon
+  to extend its SSH wait timeout for the first boot, cleared on
+  success. Tracked but not implemented.
+
+- **Non-systemd distros**. The guest agents assume systemd. Adding
+  openrc / s6 / busybox-init variants means keeping parallel unit
+  trees in `inject.go` keyed on `/etc/os-release`. Only pick up
+  when a user actually wants it.

 ## Trust model