Phase 4: OCI import docs

New docs/oci-import.md covers the full Phase A story:

- end-user flow (kernel pull + image pull + image list)
- what works now (layer replay + whiteouts, path-traversal hardening, content-aware sizing, layer caching, composition with image build)
- what does not work yet (direct boot due to ownership caveat, private registries, non-amd64 platforms)
- architecture of internal/imagepull + the daemon orchestrator
- path layout (OCI cache, staging, published)
- tech debt: the three plausible ownership-fixup approaches (debugfs, hcsshim/tar2ext4, user namespaces) with honest trade-offs for Phase B to choose from later
- trust model (digest chain covers transport; signature verification out of scope)

README.md gains an image pull example alongside image register + --kernel-ref, with a pointer to the docs and an honest "pulled images are a base for image build, not yet directly bootable" warning.

AGENTS.md gets the one-line note pointing at the new doc.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 17:37:07 -03:00
commit 2e4d4b14da
parent fdaf7cce0f
3 changed files with 170 additions and 0 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -22,6 +22,7 @@ Always run `make build` before commit.
 - `./build/bin/banger image promote <image>` copies an unmanaged image into daemon-owned managed artifacts.
 - `make void-kernel`, `make rootfs-void`, and `make void-register` drive the experimental Void flow under `./build/manual`.
 - `scripts/publish-kernel.sh <name>` packages a locally-imported kernel and uploads it to the catalog; see `docs/kernel-catalog.md`.
+- `banger image pull <oci-ref> --kernel-ref <name>` pulls a rootfs from any OCI registry; see `docs/oci-import.md` (experimental — file-ownership caveat).

 ## Image Model

--- a/README.md
+++ b/README.md
@@ -102,6 +102,19 @@ Or pull a pre-built kernel from the catalog and reference it by name:
 See [`docs/kernel-catalog.md`](docs/kernel-catalog.md) for catalog
 maintenance.

+Or pull a rootfs directly from any OCI registry (Docker Hub, GHCR, …):
+
+```bash
+./build/bin/banger image pull docker.io/library/debian:bookworm \
+  --kernel-ref void-6.12
+```
+
+`image pull` downloads the image, flattens its layers into an ext4
+rootfs, and registers it as a managed banger image. It is experimental:
+see [`docs/oci-import.md`](docs/oci-import.md) for current limitations.
+Notably, a file-ownership caveat means pulled images are a base for
+`image build` and are not yet directly bootable.
+
 Build a managed image from an existing registered image:

 ```bash
--- a/docs/oci-import.md
+++ b/docs/oci-import.md
@@ -0,0 +1,156 @@
+# OCI import (`banger image pull`)
+
+`banger image pull <oci-ref>` downloads a container image from any
+OCI-compatible registry (Docker Hub, GHCR, quay.io, self-hosted, …),
+flattens its layers into an ext4 rootfs, and registers the result as
+a managed banger image.
+
+Paired with the kernel catalog, this removes the "where do I get a
+rootfs" bottleneck for most users: any distro that ships an official
+container image can become a banger VM base (not yet directly bootable).
+
+```bash
+banger kernel pull void-6.12
+banger image pull docker.io/library/debian:bookworm --kernel-ref void-6.12
+banger image list     # debian-bookworm appears, Managed=true
+```
+
+## Status: Phase A (acquisition only)
+
+This is the first of a two-phase initiative. **Phase A (this feature)**
+produces a working ext4 file from an OCI reference. **Phase B (not yet
+implemented)** will add the steps needed to make the pulled image
+directly bootable — init system hook-up, sshd install, vsock agent
+drop-in, network bootstrap, and **file-ownership fixup**.
+
+What works today:
+
+- Pulling any public OCI image that exposes a `linux/amd64` manifest.
+- Correct layer replay with whiteout semantics (`.wh.*` deletes,
+  `.wh..wh..opq` opaque-dir markers).
+- Path-traversal and relative-symlink-escape protection.
+- Content-aware default sizing (`content × 1.25`, floor 1 GiB).
+- Layer caching on disk, keyed by blob SHA256.
+- Piping pulled images into the existing `banger image build
+  --from-image` flow.
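
The sizing rule in the list above (`content × 1.25`, floor 1 GiB) can be sketched as follows. This is a minimal illustration, not the code in `internal/imagepull`: the function name `defaultImageSize` and the use of integer arithmetic are assumptions made for the example.

```go
package main

import "fmt"

const (
	gib          = int64(1) << 30
	minImageSize = gib // the 1 GiB floor stated in the docs
)

// defaultImageSize (hypothetical name) returns contentBytes × 1.25,
// but never less than 1 GiB. Integer math (n + n/4) avoids float
// rounding on large trees.
func defaultImageSize(contentBytes int64) int64 {
	size := contentBytes + contentBytes/4
	if size < minImageSize {
		return minImageSize
	}
	return size
}

func main() {
	for _, n := range []int64{200 << 20, 4 * gib} {
		fmt.Printf("content=%d bytes -> image=%d bytes\n", n, defaultImageSize(n))
	}
}
```

Under this sketch, small images are padded up to the floor, while larger trees get 25% headroom on top of their content size.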
+
+What does not yet work:
+
+- **Booting a pulled image directly.** Files in the produced ext4 are
+  owned by the uid/gid of the user who ran the pull, not the tar
+  headers' uid/gid, so setuid binaries (`sudo`, `ping`, …) run as the
+  wrong user in the VM. This is deferred to Phase B.
+- **Private registries**. Auth is not implemented; anonymous pulls
+  only. Docker Hub, GHCR (public), quay.io (public), etc. all work.
+- **Non-`linux/amd64` platforms**. The catalog is x86_64-only, so
+  pulled rootfses match. `arm64` is additive in the schema; wire-up
+  lands when a user needs it.
+
+## Architecture
+
+`internal/imagepull/` owns the pure mechanics:
+
+- **`Pull`** (`imagepull.go`) wraps `go-containerregistry`'s
+  `remote.Image` with the `linux/amd64` platform pinned. Layer
+  blobs are cached on disk via `cache.NewFilesystemCache` under
+  `<OCICacheDir>/blobs/sha256/<hex>` — OCI-standard layout so
+  `skopeo` or `crane` could co-exist.
+- **`Flatten`** (`flatten.go`) replays layers oldest-first into a
+  staging directory, applying whiteouts and rejecting unsafe paths.
+- **`BuildExt4`** (`ext4.go`) runs `mkfs.ext4 -F -d <staging>
+  -E root_owner=0:0` to populate the image file at create time —
+  no mount, no sudo, no loopback. Requires `e2fsprogs ≥ 1.43`
+  (`mkfs.ext4 -d` is the Populate-at-Create flag; nearly all
+  modern distros ship it).
+
+`internal/daemon/images_pull.go` orchestrates:
+
+1. Parse + validate the OCI ref.
+2. Derive a friendly default name (`debian-bookworm` for
+   `docker.io/library/debian:bookworm`) when `--name` is omitted.
+3. Resolve kernel info via the shared `resolveKernelInputs` helper
+   (the same code path as `image register --kernel-ref`).
+4. Stage at `<ImagesDir>/<id>.staging`; extract layers to a temp
+   tree under `os.TempDir` (bulk transient data stays off the
+   persistent state filesystem).
+5. `imagepull.BuildExt4` produces `<staging>/rootfs.ext4`.
+6. `imagemgr.StageBootArtifacts` stages the kernel triple alongside.
+7. Atomic `os.Rename(<staging>, <final>)` publishes the artifact dir.
+8. Persist a `model.Image{Managed: true, …}` record.
+
+Any failure before the rename removes the staging dir. Failures after
+the rename remove the final dir and roll back the store write.
+
+## Paths
+
+| What | Where | Purpose |
+|------|-------|---------|
+| Layer blob cache | `~/.cache/banger/oci/blobs/sha256/<hex>` | Re-pulls of the same image digest are local-only |
+| Staging dir | `~/.local/state/banger/images/<id>.staging/` | Short-lived; atomic-renamed to `<id>/` on success |
+| Staging rootfs tree | `$TMPDIR/banger-pull-<rand>/` | Extraction scratch space; removed after ext4 build |
+| Published image | `~/.local/state/banger/images/<id>/rootfs.ext4` | Managed artifact stored alongside the kernel triple |
+
+## Composition with `image build`
+
+A pulled image is "unconfigured" — it has no sshd, no vsock agent, no
+banger-specific network unit, and file ownership is wrong for boot.
+The natural next step is to feed it through the existing customization
+pipeline:
+
+```bash
+banger image build --from-image debian-bookworm --name debian-dev --docker
+```
+
+`image build` spins up a transient VM using the base image, runs
+`scripts/customize.sh` over it, and saves the result as a new managed
+image. This is already how the opinionated `void` / `alpine` images
+are produced today.
+
+The bootability gap means this composition only works once Phase B
+lands an ownership-fixup pass. Until then, `image pull` gives you a
+registered, managed rootfs artifact but not a bootable one; booting
+still requires the legacy manual rootfs scripts.
+
+## Tech debt
+
+- **File-ownership preservation**. The ext4 is populated from a tree
+  extracted as the current user — `mkfs.ext4 -d` then copies those
+  on-disk uids/gids verbatim. Setuid bits survive but with the wrong
+  owner, so privilege escalation is broken inside the VM. Planned
+  fixes:
+  - **debugfs ownership-fixup pass**: after `mkfs.ext4 -d`, replay
+    tar headers through `debugfs -w` with `set_inode_field` to
+    rewrite per-file uid/gid/mode. No new runtime deps (debugfs
+    ships with e2fsprogs). Moderate implementation; keeps us on
+    `mkfs.ext4 -d`.
+  - **`tar2ext4`**: Microsoft's hcsshim ships a Go package that
+    streams tar entries directly into an ext4 image, preserving
+    ownership. Heavier dependency graph but purpose-built.
+
+  Either approach lives in Phase B.
+
+- **Auth**. When we add private-registry support, the natural path is
+  `authn.DefaultKeychain` from `go-containerregistry`, which already
+  honours `~/.docker/config.json` and the standard credential
+  helpers. No banger-specific config needed.
+
+- **Cache eviction**. Layer blobs under `OCICacheDir` accumulate
+  forever. A `banger image cache prune` command is a cheap follow-up
+  when disk usage becomes a complaint.
+
+- **Ownership fixup via user namespaces**. An alternative to
+  debugfs / tar2ext4 is running the entire extraction inside a user
+  namespace (`unshare -Ufr`), which lets us set uid=0 on files from
+  a non-privileged process. Cleaner in theory but requires
+  user-namespace support on the host and doesn't help when the
+  resulting tree is then passed to `mkfs.ext4 -d` (which copies
+  on-disk uids).
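
As an illustration of the debugfs option listed above, the sketch below renders a command script that rewrites uid and gid per file with `set_inode_field`. It is only a starting point under stated assumptions: `entry` and `ownershipScript` are hypothetical names, the ownership values would really come from the layer tar headers, paths containing whitespace and symlink entries are not handled, and `mode` is left out because that inode field also carries the file-type bits.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// entry (hypothetical) carries the ownership recorded in a tar header.
type entry struct {
	Path string // path inside the image, relative to its root
	UID  int
	GID  int
}

// ownershipScript renders one uid and one gid command per entry, in the
// form accepted by a debugfs command file.
func ownershipScript(entries []entry) string {
	var b strings.Builder
	for _, e := range entries {
		p := path.Clean("/" + e.Path) // debugfs paths are rooted at the image root
		fmt.Fprintf(&b, "set_inode_field %s uid %d\n", p, e.UID)
		fmt.Fprintf(&b, "set_inode_field %s gid %d\n", p, e.GID)
	}
	return b.String()
}

func main() {
	script := ownershipScript([]entry{
		{Path: "usr/bin/sudo", UID: 0, GID: 0},
		{Path: "etc/shadow", UID: 0, GID: 42},
	})
	fmt.Print(script)
}
```

The resulting text could be written to a file and applied with `debugfs -w -f <script> <image>`, which keeps the pipeline free of mounts and sudo in the same way `mkfs.ext4 -d` does.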
+
+## Trust model
+
+`image pull` delegates trust to the OCI registry the user selected.
+`go-containerregistry` verifies layer digests against the manifest
+during download, so a tampered mirror can't ship modified layers
+without breaking the sha256 chain. Beyond that, banger does not
+verify OCI image signatures (cosign/sigstore) — users who care should
+verify their references out-of-band.