From 2e4d4b14da177ad8ad4df59a70e1381dc1e6acdf Mon Sep 17 00:00:00 2001
From: Thales Maciel
Date: Thu, 16 Apr 2026 17:37:07 -0300
Subject: [PATCH] Phase 4: OCI import docs

New docs/oci-import.md covers the full Phase A story:
- end-user flow (kernel pull + image pull + image list)
- what works now (layer replay + whiteouts, path-traversal hardening,
  content-aware sizing, layer caching, composition with image build)
- what does not work yet (direct boot due to ownership caveat,
  private registries, non-amd64 platforms)
- architecture of internal/imagepull + the daemon orchestrator
- path layout (OCI cache, staging, published)
- tech debt: the three plausible ownership-fixup approaches (debugfs,
  hcsshim/tar2ext4, user namespaces) with honest trade-offs for Phase B
  to choose from later
- trust model (digest chain covers transport; signature verification
  out of scope)

README.md gains an image pull example alongside image register +
--kernel-ref, with a pointer to the docs and an honest "pulled images
are a base for image build, not yet directly bootable" warning.

AGENTS.md gets the one-line note pointing at the new doc.

Co-Authored-By: Claude Sonnet 4.6
---
 AGENTS.md          |   1 +
 README.md          |  13 ++++
 docs/oci-import.md | 156 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 170 insertions(+)
 create mode 100644 docs/oci-import.md

diff --git a/AGENTS.md b/AGENTS.md
index c935bf2..5b5204b 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -22,6 +22,7 @@ Always run `make build` before commit.
 - `./build/bin/banger image promote ` copies an unmanaged image into daemon-owned managed artifacts.
 - `make void-kernel`, `make rootfs-void`, and `make void-register` drive the experimental Void flow under `./build/manual`.
 - `scripts/publish-kernel.sh ` packages a locally-imported kernel and uploads it to the catalog; see `docs/kernel-catalog.md`.
+- `banger image pull --kernel-ref ` pulls a rootfs from any OCI registry; see `docs/oci-import.md` (experimental — file-ownership caveat).
 
 ## Image Model
 
diff --git a/README.md b/README.md
index 206b300..6224ab9 100644
--- a/README.md
+++ b/README.md
@@ -102,6 +102,19 @@ Or pull a pre-built kernel from the catalog and reference it by name:
 
 See [`docs/kernel-catalog.md`](docs/kernel-catalog.md) for catalog maintenance.
 
+Or pull a rootfs directly from any OCI registry (Docker Hub, GHCR, …):
+
+```bash
+./build/bin/banger image pull docker.io/library/debian:bookworm \
+    --kernel-ref void-6.12
+```
+
+`image pull` downloads the image, flattens its layers into an ext4
+rootfs, and registers it as a managed banger image. Experimental — see
+[`docs/oci-import.md`](docs/oci-import.md) for current limitations
+(notably: file-ownership caveat means pulled images are a base for
+`image build`, not yet directly bootable).
+
 Build a managed image from an existing registered image:
 
 ```bash
diff --git a/docs/oci-import.md b/docs/oci-import.md
new file mode 100644
index 0000000..24720b4
--- /dev/null
+++ b/docs/oci-import.md
@@ -0,0 +1,156 @@
+# OCI import (`banger image pull`)
+
+`banger image pull ` downloads a container image from any
+OCI-compatible registry (Docker Hub, GHCR, quay.io, self-hosted, …),
+flattens its layers into an ext4 rootfs, and registers the result as
+a managed banger image.
+
+Paired with the kernel catalog, this dissolves the "where do I get a
+rootfs" bottleneck for most users — any distro that ships an official
+container image can now boot (eventually) as a banger VM.
+
+```bash
+banger kernel pull void-6.12
+banger image pull docker.io/library/debian:bookworm --kernel-ref void-6.12
+banger image list # debian-bookworm appears, Managed=true
+```
+
+## Status: Phase A (acquisition only)
+
+This is the first of a two-phase initiative. **Phase A (this feature)**
+produces a working ext4 file from an OCI reference. **Phase B (not yet
+implemented)** will add the steps needed to make the pulled image
+directly bootable — init system hook-up, sshd install, vsock agent
+drop-in, network bootstrap, and **file-ownership fixup**.
+
+What works today:
+
+- Pulling any public OCI image that exposes a `linux/amd64` manifest.
+- Correct layer replay with whiteout semantics (`.wh.*` deletes,
+  `.wh..wh..opq` opaque-dir markers).
+- Path-traversal and relative-symlink-escape protection.
+- Content-aware default sizing (`content × 1.25`, floor 1 GiB).
+- Layer caching on disk, keyed by blob SHA256.
+- Piping pulled images into the existing `banger image build
+  --from-image` flow.
+
+What does not yet work:
+
+- **Booting a pulled image directly.** The produced ext4 has file
+  ownership set to the *runner's* uid/gid, not the tar headers'.
+  Setuid binaries (`sudo`, `ping`, …) run as the wrong user in the
+  VM. This is deferred to Phase B.
+- **Private registries**. Auth is not implemented; anonymous pulls
+  only. Docker Hub, GHCR (public), quay.io (public), etc. all work.
+- **Non-`linux/amd64` platforms**. The catalog is x86_64-only, so
+  pulled rootfses match. `arm64` is additive in the schema; wire-up
+  lands when a user needs it.
+
+## Architecture
+
+`internal/imagepull/` owns the pure mechanics:
+
+- **`Pull`** (`imagepull.go`) wraps `go-containerregistry`'s
+  `remote.Image` with the `linux/amd64` platform pinned. Layer
+  blobs are cached on disk via `cache.NewFilesystemCache` under
+  `/blobs/sha256/` — OCI-standard layout so
+  `skopeo` or `crane` could co-exist.
+- **`Flatten`** (`flatten.go`) replays layers oldest-first into a
+  staging directory, applying whiteouts and rejecting unsafe paths.
+- **`BuildExt4`** (`ext4.go`) runs `mkfs.ext4 -F -d
+  -E root_owner=0:0` to populate the image file at create time —
+  no mount, no sudo, no loopback. Requires `e2fsprogs ≥ 1.43`
+  (`mkfs.ext4 -d` is the Populate-at-Create flag; nearly all
+  modern distros ship it).
+
+`internal/daemon/images_pull.go` orchestrates:
+
+1. Parse + validate the OCI ref.
+2. Derive a friendly default name (`debian-bookworm` for
+   `docker.io/library/debian:bookworm`) when `--name` is omitted.
+3. Resolve kernel info via the shared `resolveKernelInputs` helper
+   (the same code path as `image register --kernel-ref`).
+4. Stage at `/.staging`; extract layers to a temp
+   tree under `os.TempDir` (bulk transient data stays off the
+   persistent state filesystem).
+5. `imagepull.BuildExt4` produces `/rootfs.ext4`.
+6. `imagemgr.StageBootArtifacts` stages the kernel triple alongside.
+7. Atomic `os.Rename(, )` publishes the artifact dir.
+8. Persist a `model.Image{Managed: true, …}` record.
+
+Any failure removes the staging dir. Post-rename failures remove the
+final dir and roll back the store write.
+
+## Paths
+
+| What | Where | Purpose |
+|------|-------|---------|
+| Layer blob cache | `~/.cache/banger/oci/blobs/sha256/` | Re-pulls of the same image digest are local-only |
+| Staging dir | `~/.local/state/banger/images/.staging/` | Short-lived; atomic-renamed to `/` on success |
+| Staging rootfs tree | `$TMPDIR/banger-pull-/` | Extraction scratch space; removed after ext4 build |
+| Published image | `~/.local/state/banger/images//rootfs.ext4` | Managed artifact stored alongside the kernel triple |
+
+## Composition with `image build`
+
+A pulled image is "unconfigured" — it has no sshd, no vsock agent, no
+banger-specific network unit, and file ownership is wrong for boot.
+The natural next step is to feed it through the existing customization
+pipeline:
+
+```bash
+banger image build --from-image debian-bookworm --name debian-dev --docker
+```
+
+`image build` spins up a transient VM using the base image, runs
+`scripts/customize.sh` over it, and saves the result as a new managed
+image. This is already how the opinionated `void` / `alpine` images
+are produced today.
+
+The bootability gap means this composition only works once Phase B
+lands an ownership-fixup pass. Until then, `image pull` gives you a
+recorded primitive; the boot story requires the legacy manual rootfs
+scripts.
+
+## Tech debt
+
+- **File-ownership preservation**. The ext4 is populated from a tree
+  extracted as the current user — `mkfs.ext4 -d` then copies those
+  on-disk uids/gids verbatim. Setuid bits survive but with the wrong
+  owner, so privilege escalation is broken inside the VM. Planned
+  fixes:
+  - **debugfs ownership-fixup pass**: after `mkfs.ext4 -d`, replay
+    tar headers through `debugfs -w` with `set_inode_field` to
+    rewrite per-file uid/gid/mode. No new runtime deps (debugfs
+    ships with e2fsprogs). Moderate implementation; keeps us on
+    `mkfs.ext4 -d`.
+  - **`tar2ext4`**: Microsoft's hcsshim ships a Go package that
+    streams tar entries directly into an ext4 image, preserving
+    ownership. Heavier dependency graph but purpose-built.
+
+  Either approach lives in Phase B.
+
+- **Auth**. When we add private-registry support, the natural path is
+  `authn.DefaultKeychain` from `go-containerregistry`, which already
+  honours `~/.docker/config.json` and the standard credential
+  helpers. No banger-specific config needed.
+
+- **Cache eviction**. Layer blobs under `OCICacheDir` accumulate
+  forever. A `banger image cache prune` command is a cheap follow-up
+  when disk usage becomes a complaint.
+
+- **Ownership fixup via user namespaces**. An alternative to
+  debugfs / tar2ext4 is running the entire extraction inside a user
+  namespace (`unshare -Ufr`), which lets us set uid=0 on files from
+  a non-privileged process. Cleaner in theory but requires
+  user-namespace support on the host and doesn't help when the
+  resulting tree is then passed to `mkfs.ext4 -d` (which copies
+  on-disk uids).
+
+## Trust model
+
+`image pull` delegates trust to the OCI registry the user selected.
+`go-containerregistry` verifies layer digests against the manifest
+during download, so a tampered mirror can't ship modified layers
+without breaking the sha256 chain. Beyond that, banger does not
+verify OCI image signatures (cosign/sigstore) — users who care should
+verify their references out-of-band.