banger/docs/oci-import.md
Thales Maciel 2e4d4b14da
Phase 4: OCI import docs
New docs/oci-import.md covers the full Phase A story:
 - end-user flow (kernel pull + image pull + image list)
 - what works now (layer replay + whiteouts, path-traversal
   hardening, content-aware sizing, layer caching, composition
   with image build)
 - what does not work yet (direct boot due to ownership
   caveat, private registries, non-amd64 platforms)
 - architecture of internal/imagepull + the daemon orchestrator
 - path layout (OCI cache, staging, published)
 - tech debt: the three plausible ownership-fixup approaches
   (debugfs, hcsshim/tar2ext4, user namespaces) with honest
   trade-offs for Phase B to choose from later
 - trust model (digest chain covers transport; signature
   verification out of scope)

README.md gains an image pull example alongside image register
+ --kernel-ref, with a pointer to the docs and an honest "pulled
images are a base for image build, not yet directly bootable"
warning.

AGENTS.md gets the one-line note pointing at the new doc.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 17:37:07 -03:00


OCI import (banger image pull)

banger image pull <oci-ref> downloads a container image from any OCI-compatible registry (Docker Hub, GHCR, quay.io, self-hosted, …), flattens its layers into an ext4 rootfs, and registers the result as a managed banger image.

Paired with the kernel catalog, this removes the "where do I get a rootfs" bottleneck for most users: any distro that ships an official container image can become a banger VM once Phase B makes pulled images bootable.

banger kernel pull void-6.12
banger image pull docker.io/library/debian:bookworm --kernel-ref void-6.12
banger image list     # debian-bookworm appears, Managed=true

Status: Phase A (acquisition only)

This is the first of a two-phase initiative. Phase A (this feature) turns an OCI reference into a valid ext4 rootfs and registers it. Phase B (not yet implemented) will add the steps needed to make the pulled image directly bootable: init system hook-up, sshd install, vsock agent drop-in, network bootstrap, and file-ownership fixup.

What works today:

  • Pulling any public OCI image that exposes a linux/amd64 manifest.
  • Correct layer replay with whiteout semantics (.wh.* deletes, .wh..wh..opq opaque-dir markers).
  • Path-traversal and relative-symlink-escape protection.
  • Content-aware default sizing (content × 1.25, floor 1 GiB).
  • Layer caching on disk, keyed by blob SHA256.
  • Passing pulled images to the existing banger image build --from-image flow (subject to the bootability caveat under Composition with image build).
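The whiteout and path-safety rules in the list above can be sketched with the standard library alone. This is an illustrative sketch, not banger's flatten.go: safeRel, symlinkEscapes, parseWhiteout, and applyWhiteout are hypothetical names, and the code assumes the caller applies all of a layer's whiteouts before extracting that layer's other entries, because OCI whiteouts only hide content from lower layers.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path"
	"path/filepath"
	"strings"
)

const (
	whiteoutPrefix = ".wh."         // ".wh.<name>" deletes <name> from lower layers
	opaqueMarker   = ".wh..wh..opq" // hides every lower-layer entry in its directory
)

// whiteout describes what a whiteout entry removes from the lower layers.
type whiteout struct {
	Target string // root-relative path to delete, or the directory to empty
	Opaque bool
}

// safeRel normalises a tar entry name to a root-relative path and rejects
// names whose ".." components would climb above the extraction root.
func safeRel(name string) (string, error) {
	clean := path.Clean(strings.TrimLeft(name, "/"))
	if clean == ".." || strings.HasPrefix(clean, "../") {
		return "", fmt.Errorf("unsafe path %q", name)
	}
	return clean, nil
}

// symlinkEscapes reports whether a relative symlink stored at entry would
// resolve above the root. Absolute targets are interpreted inside the guest,
// so extraction only has to avoid following them on the host.
func symlinkEscapes(entry, target string) bool {
	if path.IsAbs(target) {
		return false
	}
	resolved := path.Join(path.Dir(entry), target) // Join also cleans
	return resolved == ".." || strings.HasPrefix(resolved, "../")
}

// parseWhiteout returns nil, nil for ordinary (non-whiteout) entries.
func parseWhiteout(rel string) (*whiteout, error) {
	dir, base := path.Split(rel)
	if base == opaqueMarker {
		return &whiteout{Target: path.Clean(dir), Opaque: true}, nil
	}
	if !strings.HasPrefix(base, whiteoutPrefix) {
		return nil, nil
	}
	victim := strings.TrimPrefix(base, whiteoutPrefix)
	if victim == "" || victim == "." || victim == ".." {
		return nil, fmt.Errorf("malformed whiteout %q", rel)
	}
	return &whiteout{Target: path.Join(dir, victim)}, nil
}

// applyWhiteout removes lower-layer content under root. NOTE: this sketch
// does not guard against a symlinked parent directory redirecting RemoveAll
// onto the host; real code must resolve paths without following symlinks.
func applyWhiteout(root string, wh *whiteout) error {
	target := filepath.Join(root, filepath.FromSlash(wh.Target))
	if !wh.Opaque {
		return os.RemoveAll(target)
	}
	entries, err := os.ReadDir(target)
	if errors.Is(err, os.ErrNotExist) {
		return nil // nothing below to hide
	}
	if err != nil {
		return err
	}
	for _, e := range entries {
		if err := os.RemoveAll(filepath.Join(target, e.Name())); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	for _, name := range []string{"etc/.wh.motd", "var/cache/.wh..wh..opq", "a/../../etc/passwd"} {
		rel, err := safeRel(name)
		if err != nil {
			fmt.Println("reject:", err)
			continue
		}
		wh, err := parseWhiteout(rel)
		fmt.Printf("%s -> %+v %v\n", rel, wh, err)
	}
	fmt.Println(symlinkEscapes("usr/bin/x", "../../../etc")) // true
}
```

Rejecting a climbing name such as a/../../etc/passwd, rather than silently re-rooting it, keeps a hostile layer from writing outside the staging tree; the symlink check covers the other half of that threat, a link pointing out of the tree that a later entry could write through.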

What does not yet work:

  • Booting a pulled image directly. The produced ext4 has file ownership set to the runner's uid/gid, not the tar headers'. Setuid binaries (sudo, ping, …) run as the wrong user in the VM. This is deferred to Phase B.
  • Private registries. Auth is not implemented, so only anonymous pulls are supported. Public images on Docker Hub, GHCR, quay.io, etc. all work.
  • Non-linux/amd64 platforms. The kernel catalog is x86_64-only, so pulls are pinned to linux/amd64 to match. arm64 is an additive schema change; the wiring lands when a user needs it.

Architecture

internal/imagepull/ owns the pure mechanics:

  • Pull (imagepull.go) wraps go-containerregistry's remote.Image with the platform pinned to linux/amd64. Layer blobs are cached on disk via cache.NewFilesystemCache under <OCICacheDir>/blobs/sha256/<hex>, an OCI-standard layout, so tools like skopeo or crane can coexist with it.
  • Flatten (flatten.go) replays layers oldest-first into a staging directory, applying whiteouts and rejecting unsafe paths.
  • BuildExt4 (ext4.go) runs mkfs.ext4 -F -d <staging> -E root_owner=0:0, which populates the filesystem from the staging tree at creation time: no mount, no sudo, no loop device. Requires e2fsprogs ≥ 1.43, the first release with mkfs.ext4 -d; nearly all modern distros ship a newer version.
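A rough sketch of how the content-aware default size and the mkfs.ext4 call could fit together. Assumptions: how banger measures "content" is not documented here, so the hypothetical treeSize simply sums regular-file sizes; buildExt4 (also hypothetical) pre-sizes a sparse file with Truncate and relies on mkfs.ext4 taking the filesystem size from the existing file.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"os/exec"
	"path/filepath"
)

const minImageSize int64 = 1 << 30 // 1 GiB floor

// defaultImageSize applies the documented rule: content × 1.25, never below 1 GiB.
func defaultImageSize(contentBytes int64) int64 {
	size := contentBytes + contentBytes/4 // × 1.25 in integer arithmetic
	if size < minImageSize {
		return minImageSize
	}
	return size
}

// treeSize (hypothetical) sums regular-file sizes under root. It ignores
// inode and directory overhead, which the 25% headroom presumably absorbs.
func treeSize(root string) (int64, error) {
	var total int64
	err := filepath.WalkDir(root, func(_ string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.Type().IsRegular() {
			info, err := d.Info()
			if err != nil {
				return err
			}
			total += info.Size()
		}
		return nil
	})
	return total, err
}

// mkfsArgs mirrors the documented invocation:
// mkfs.ext4 -F -d <staging> -E root_owner=0:0 <image>.
func mkfsArgs(stagingTree, imagePath string) []string {
	return []string{"-F", "-d", stagingTree, "-E", "root_owner=0:0", imagePath}
}

// buildExt4 creates a sparse file of the default size, then lets mkfs.ext4
// populate it from the tree at creation time: no mount, loop device, or root.
func buildExt4(stagingTree, imagePath string) error {
	content, err := treeSize(stagingTree)
	if err != nil {
		return err
	}
	f, err := os.Create(imagePath)
	if err != nil {
		return err
	}
	if err := f.Truncate(defaultImageSize(content)); err != nil {
		f.Close()
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}
	out, err := exec.Command("mkfs.ext4", mkfsArgs(stagingTree, imagePath)...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("mkfs.ext4: %w: %s", err, out)
	}
	return nil
}

func main() {
	fmt.Println(defaultImageSize(100 << 20)) // 1073741824: the 1 GiB floor wins
	fmt.Println(defaultImageSize(4 << 30))   // 5368709120
	fmt.Println(mkfsArgs("/tmp/banger-pull-x", "rootfs.ext4"))
	if len(os.Args) == 3 { // optional real run: <staging-tree> <image-path>
		if err := buildExt4(os.Args[1], os.Args[2]); err != nil {
			fmt.Println(err)
		}
	}
}
```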

internal/daemon/images_pull.go orchestrates:

  1. Parse + validate the OCI ref.
  2. Derive a friendly default name (debian-bookworm for docker.io/library/debian:bookworm) when --name is omitted.
  3. Resolve kernel info via the shared resolveKernelInputs helper (the same code path as image register --kernel-ref).
  4. Stage at <ImagesDir>/<id>.staging; extract layers to a temp tree under os.TempDir (bulk transient data stays off the persistent state filesystem).
  5. imagepull.BuildExt4 produces <staging>/rootfs.ext4.
  6. imagemgr.StageBootArtifacts stages the kernel triple alongside.
  7. Atomic os.Rename(<staging>, <final>) publishes the artifact dir.
  8. Persist a model.Image{Managed: true, …} record.
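Step 2's name derivation could look like the following. defaultImageName is hypothetical: the doc only gives the debian-bookworm example, so the fallback for a missing tag (latest) and for digest-only references (the bare repository name) are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultImageName derives a friendly name from an OCI reference:
// the last repository path component joined with the tag.
func defaultImageName(ref string) string {
	// Split off any digest: "name@sha256:..." pins content, not a tag.
	name, digest, _ := strings.Cut(ref, "@")

	// A tag is a ":" after the last "/"; an earlier ":" is a registry
	// port, as in "localhost:5000/app".
	repo, tag := name, "latest"
	if colon := strings.LastIndex(name, ":"); colon > strings.LastIndex(name, "/") {
		repo, tag = name[:colon], name[colon+1:]
	} else if digest != "" {
		tag = "" // digest-only ref: no tag to show (assumed behaviour)
	}

	base := repo[strings.LastIndex(repo, "/")+1:]
	if tag == "" {
		return base
	}
	return base + "-" + tag
}

func main() {
	for _, ref := range []string{
		"docker.io/library/debian:bookworm",
		"localhost:5000/team/app:v1",
		"ghcr.io/owner/tool",
		"alpine@sha256:0123abcd",
	} {
		fmt.Printf("%-36s -> %s\n", ref, defaultImageName(ref))
	}
}
```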

Any failure before the rename removes the staging dir. A failure after the rename removes the final dir and rolls back the store write.

Paths

| What | Where | Purpose |
|---|---|---|
| Layer blob cache | ~/.cache/banger/oci/blobs/sha256/<hex> | Re-pulls of the same image digest are local-only |
| Staging dir | ~/.local/state/banger/images/<id>.staging/ | Short-lived; atomic-renamed to <id>/ on success |
| Staging rootfs tree | $TMPDIR/banger-pull-<rand>/ | Extraction scratch space; removed after ext4 build |
| Published image | ~/.local/state/banger/images/<id>/rootfs.ext4 | Managed artifact stored alongside the kernel triple |

Composition with image build

A pulled image is "unconfigured" — it has no sshd, no vsock agent, no banger-specific network unit, and file ownership is wrong for boot. The natural next step is to feed it through the existing customization pipeline:

banger image build --from-image debian-bookworm --name debian-dev --docker

image build spins up a transient VM using the base image, runs scripts/customize.sh over it, and saves the result as a new managed image. This is already how the opinionated void / alpine images are produced today.

The bootability gap means this composition only works end to end once Phase B lands an ownership-fixup pass. Until then, image pull gives you a registered base image to build on; booting still requires the legacy manual rootfs scripts.

Tech debt

  • File-ownership preservation. The ext4 is populated from a tree extracted as the current user, and mkfs.ext4 -d copies those on-disk uids/gids verbatim. Setuid bits survive but with the wrong owner, so sudo and other setuid-root binaries cannot escalate privileges inside the VM. Planned fixes:

    • debugfs ownership-fixup pass: after mkfs.ext4 -d, replay the tar headers through debugfs -w, using set_inode_field to rewrite each file's uid/gid/mode. No new runtime deps (debugfs ships with e2fsprogs). Moderate implementation effort; keeps us on mkfs.ext4 -d.
    • tar2ext4: Microsoft's hcsshim ships a Go package that streams tar entries directly into an ext4 image, preserving ownership. Heavier dependency graph but purpose-built.

    Either approach lives in Phase B.

  • Auth. When we add private-registry support, the natural path is authn.DefaultKeychain from go-containerregistry, which already honours ~/.docker/config.json and the standard credential helpers. No banger-specific config needed.

  • Cache eviction. Layer blobs under OCICacheDir accumulate forever. A banger image cache prune command is a cheap follow-up when disk usage becomes a complaint.

  • Ownership fixup via user namespaces. An alternative to debugfs / tar2ext4 is running the entire extraction inside a user namespace (unshare -Ufr), which lets a non-privileged process set uid 0 on files as seen inside the namespace. Cleaner in theory, but it requires user-namespace support on the host, and on its own it doesn't fix the image: on disk the files still belong to the runner, and mkfs.ext4 -d copies on-disk uids.

Trust model

image pull delegates trust to the OCI registry the user selected. go-containerregistry verifies each layer against the digest listed in the manifest during download, so a mirror or proxy can't swap in modified layers without breaking the sha256 chain. The chain starts at the manifest, though: a tag reference trusts whichever manifest the registry serves. banger does not verify OCI image signatures (cosign/sigstore); users who care should verify their references out of band.