# OCI import (`banger image pull`)

`banger image pull ` downloads a container image from any OCI-compatible registry (Docker Hub, GHCR, quay.io, self-hosted, …), flattens its layers into an ext4 rootfs, and registers the result as a managed banger image.

Paired with the kernel catalog, this dissolves the "where do I get a rootfs" bottleneck for most users: any distro that ships an official container image can now boot (eventually) as a banger VM.

```bash
banger kernel pull void-6.12
banger image pull docker.io/library/debian:bookworm --kernel-ref void-6.12
banger image list    # debian-bookworm appears, Managed=true
```

## Status: Phase A (acquisition only)

This is the first of a two-phase initiative. **Phase A (this feature)** produces a working ext4 file from an OCI reference. **Phase B (not yet implemented)** will add the steps needed to make the pulled image directly bootable: init system hook-up, sshd install, vsock agent drop-in, network bootstrap, and **file-ownership fixup**.

What works today:

- Pulling any public OCI image that exposes a `linux/amd64` manifest.
- Correct layer replay with whiteout semantics (`.wh.*` deletes, `.wh..wh..opq` opaque-dir markers).
- Path-traversal and relative-symlink-escape protection.
- Content-aware default sizing (`content × 1.25`, floor 1 GiB).
- Layer caching on disk, keyed by blob SHA256.
- Piping pulled images into the existing `banger image build --from-image` flow.

What does not yet work:

- **Booting a pulled image directly.** The produced ext4 has file ownership set to the *runner's* uid/gid, not the tar headers'. Setuid binaries (`sudo`, `ping`, …) run as the wrong user in the VM. This is deferred to Phase B.
- **Private registries**. Auth is not implemented; anonymous pulls only. Docker Hub, GHCR (public), quay.io (public), etc. all work.
- **Non-`linux/amd64` platforms**. The catalog is x86_64-only, so pulled rootfses match. `arm64` is additive in the schema; wire-up lands when a user needs it.
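Two of the mechanics listed under "What works today" are small enough to sketch: the default sizing rule and the classification of whiteout entries during layer replay. The following is a minimal illustration using only the Go standard library. The function and type names (`defaultSizeBytes`, `classify`, `whiteoutKind`) are hypothetical and are not taken from `internal/imagepull/`; the real code may round sizes or represent whiteouts differently.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

const (
	gib            = int64(1) << 30
	whiteoutPrefix = ".wh."         // OCI whiteout file prefix
	opaqueMarker   = ".wh..wh..opq" // OCI opaque-directory marker
)

// defaultSizeBytes applies the "content × 1.25, floor 1 GiB" rule
// using integer arithmetic (assumption: no further rounding).
func defaultSizeBytes(contentBytes int64) int64 {
	size := contentBytes + contentBytes/4
	if size < gib {
		return gib
	}
	return size
}

type whiteoutKind int

const (
	notWhiteout whiteoutKind = iota // regular entry: extract as usual
	deleteEntry                     // ".wh.<name>": remove <name> from lower layers
	opaqueDir                       // ".wh..wh..opq": hide lower-layer contents of the directory
)

// classify inspects a tar entry name and returns the kind of entry
// together with the path the action applies to.
func classify(name string) (whiteoutKind, string) {
	dir, base := path.Split(path.Clean(name))
	switch {
	case base == opaqueMarker:
		return opaqueDir, path.Clean(dir)
	case strings.HasPrefix(base, whiteoutPrefix):
		return deleteEntry, path.Join(dir, strings.TrimPrefix(base, whiteoutPrefix))
	default:
		return notWhiteout, path.Clean(name)
	}
}

func main() {
	fmt.Println("300 MiB hits the floor:", defaultSizeBytes(300<<20) == gib)
	for _, n := range []string{"etc/.wh.passwd", "var/cache/.wh..wh..opq", "usr/bin/env"} {
		kind, target := classify(n)
		fmt.Printf("%-26s kind=%d target=%s\n", n, kind, target)
	}
}
```

The opaque marker is checked before the generic `.wh.` prefix because it also starts with that prefix; testing in the other order would misread it as a delete of a file named `.wh..opq`.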
## Architecture

`internal/imagepull/` owns the pure mechanics:

- **`Pull`** (`imagepull.go`) wraps `go-containerregistry`'s `remote.Image` with the `linux/amd64` platform pinned. Layer blobs are cached on disk via `cache.NewFilesystemCache` under `/blobs/sha256/`, an OCI-standard layout, so `skopeo` or `crane` could co-exist.
- **`Flatten`** (`flatten.go`) replays layers oldest-first into a staging directory, applying whiteouts and rejecting unsafe paths.
- **`BuildExt4`** (`ext4.go`) runs `mkfs.ext4 -F -d -E root_owner=0:0` to populate the image file at create time: no mount, no sudo, no loopback. Requires `e2fsprogs ≥ 1.43` (`mkfs.ext4 -d` is the Populate-at-Create flag; nearly all modern distros ship it).

`internal/daemon/images_pull.go` orchestrates:

1. Parse + validate the OCI ref.
2. Derive a friendly default name (`debian-bookworm` for `docker.io/library/debian:bookworm`) when `--name` is omitted.
3. Resolve kernel info via the shared `resolveKernelInputs` helper (the same code path as `image register --kernel-ref`).
4. Stage at `/.staging`; extract layers to a temp tree under `os.TempDir` (bulk transient data stays off the persistent state filesystem).
5. `imagepull.BuildExt4` produces `/rootfs.ext4`.
6. `imagemgr.StageBootArtifacts` stages the kernel triple alongside.
7. Atomic `os.Rename(, )` publishes the artifact dir.
8. Persist a `model.Image{Managed: true, …}` record.

Any failure removes the staging dir. Post-rename failures remove the final dir and roll back the store write.
## Paths

| What | Where | Purpose |
|------|-------|---------|
| Layer blob cache | `~/.cache/banger/oci/blobs/sha256/` | Re-pulls of the same image digest are local-only |
| Staging dir | `~/.local/state/banger/images/.staging/` | Short-lived; atomic-renamed to `/` on success |
| Staging rootfs tree | `$TMPDIR/banger-pull-/` | Extraction scratch space; removed after ext4 build |
| Published image | `~/.local/state/banger/images//rootfs.ext4` | Managed artifact stored alongside the kernel triple |

## Composition with `image build`

A pulled image is "unconfigured": it has no sshd, no vsock agent, no banger-specific network unit, and file ownership is wrong for boot. The natural next step is to feed it through the existing customization pipeline:

```bash
banger image build --from-image debian-bookworm --name debian-dev --docker
```

`image build` spins up a transient VM using the base image, runs `scripts/customize.sh` over it, and saves the result as a new managed image. This is already how the opinionated `void` / `alpine` images are produced today.

The bootability gap means this composition only works once Phase B lands an ownership-fixup pass. Until then, `image pull` gives you a recorded primitive; the boot story requires the legacy manual rootfs scripts.

## Tech debt

- **File-ownership preservation**. The ext4 is populated from a tree extracted as the current user; `mkfs.ext4 -d` then copies those on-disk uids/gids verbatim. Setuid bits survive but with the wrong owner, so privilege escalation is broken inside the VM. Planned fixes:
  - **debugfs ownership-fixup pass**: after `mkfs.ext4 -d`, replay tar headers through `debugfs -w` with `set_inode_field` to rewrite per-file uid/gid/mode. No new runtime deps (debugfs ships with e2fsprogs). Moderate implementation; keeps us on `mkfs.ext4 -d`.
  - **`tar2ext4`**: Microsoft's hcsshim ships a Go package that streams tar entries directly into an ext4 image, preserving ownership. Heavier dependency graph but purpose-built.

  Either approach lives in Phase B.
- **Auth**. When we add private-registry support, the natural path is `authn.DefaultKeychain` from `go-containerregistry`, which already honours `~/.docker/config.json` and the standard credential helpers. No banger-specific config needed.
- **Cache eviction**. Layer blobs under `OCICacheDir` accumulate forever. A `banger image cache prune` command is a cheap follow-up when disk usage becomes a complaint.
- **Ownership fixup via user namespaces**. An alternative to debugfs / tar2ext4 is running the entire extraction inside a user namespace (`unshare -Ufr`), which lets us set uid=0 on files from a non-privileged process. Cleaner in theory but requires user-namespace support on the host and doesn't help when the resulting tree is then passed to `mkfs.ext4 -d` (which copies on-disk uids).

## Trust model

`image pull` delegates trust to the OCI registry the user selected. `go-containerregistry` verifies layer digests against the manifest during download, so a tampered mirror can't ship modified layers without breaking the sha256 chain. Beyond that, banger does not verify OCI image signatures (cosign/sigstore); users who care should verify their references out-of-band.