New docs/oci-import.md covers the full Phase A story:

- end-user flow (kernel pull + image pull + image list)
- what works now (layer replay + whiteouts, path-traversal hardening,
  content-aware sizing, layer caching, composition with image build)
- what does not work yet (direct boot due to ownership caveat, private
  registries, non-amd64 platforms)
- architecture of internal/imagepull + the daemon orchestrator
- path layout (OCI cache, staging, published)
- tech debt: the three plausible ownership-fixup approaches (debugfs,
  hcsshim/tar2ext4, user namespaces) with honest trade-offs for Phase B
  to choose from later
- trust model (digest chain covers transport; signature verification
  out of scope)

README.md gains an image pull example alongside image register +
--kernel-ref, with a pointer to the docs and an honest "pulled images
are a base for image build, not yet directly bootable" warning.

AGENTS.md gets the one-line note pointing at the new doc.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

156 lines
6.8 KiB
Markdown
# OCI import (`banger image pull`)

`banger image pull <oci-ref>` downloads a container image from any
OCI-compatible registry (Docker Hub, GHCR, quay.io, self-hosted, …),
flattens its layers into an ext4 rootfs, and registers the result as
a managed banger image.

Paired with the kernel catalog, this removes the "where do I get a
rootfs" bottleneck for most users: any distro that ships an official
container image can become the starting point for a banger VM
(direct boot of pulled images arrives with Phase B; see Status below).

```bash
banger kernel pull void-6.12
banger image pull docker.io/library/debian:bookworm --kernel-ref void-6.12
banger image list    # debian-bookworm appears, Managed=true
```

## Status: Phase A (acquisition only)

This is the first of a two-phase initiative. **Phase A (this feature)**
produces a working ext4 file from an OCI reference. **Phase B (not yet
implemented)** will add the steps needed to make the pulled image
directly bootable: init system hook-up, sshd install, vsock agent
drop-in, network bootstrap, and **file-ownership fixup**.

What works today:

- Pulling any public OCI image that exposes a `linux/amd64` manifest.
- Correct layer replay with whiteout semantics (`.wh.*` deletes,
  `.wh..wh..opq` opaque-dir markers).
- Path-traversal and relative-symlink-escape protection.
- Content-aware default sizing (`content × 1.25`, floor 1 GiB).
- Layer caching on disk, keyed by blob SHA256.
- Piping pulled images into the existing `banger image build
  --from-image` flow (subject to the ownership caveat described under
  "Composition with `image build`").
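The sizing rule in the list above is small enough to sketch directly. This is a minimal illustration and not banger's actual code: the name `defaultImageSize` is hypothetical, and the real implementation may additionally round to a block boundary or budget for filesystem metadata.

```go
package main

import "fmt"

const gib = int64(1) << 30

// defaultImageSize applies the documented rule: content × 1.25 with a
// 1 GiB floor. Integer arithmetic (content + content/4) avoids float
// rounding. The function name is hypothetical.
func defaultImageSize(contentBytes int64) int64 {
	size := contentBytes + contentBytes/4
	if size < gib {
		size = gib
	}
	return size
}

func main() {
	for _, content := range []int64{200 << 20, 4 * gib} {
		fmt.Printf("content=%d bytes -> image=%d bytes\n", content, defaultImageSize(content))
	}
}
```

Small images land on the 1 GiB floor; larger ones get 25% headroom on top of the flattened content.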

What does not yet work:

- **Booting a pulled image directly.** The produced ext4 has file
  ownership set to the *runner's* uid/gid, not the uid/gid recorded in
  the tar headers. Setuid binaries (`sudo`, `ping`, …) therefore run as
  the wrong user in the VM. This is deferred to Phase B.
- **Private registries.** Auth is not implemented; anonymous pulls
  only. Public images on Docker Hub, GHCR, quay.io, etc. all work.
- **Non-`linux/amd64` platforms.** The kernel catalog is x86_64-only,
  so pulled rootfses match. `arm64` is additive in the schema; wire-up
  lands when a user needs it.

## Architecture

`internal/imagepull/` owns the pure mechanics:

- **`Pull`** (`imagepull.go`) wraps `go-containerregistry`'s
  `remote.Image` with the `linux/amd64` platform pinned. Layer
  blobs are cached on disk via `cache.NewFilesystemCache` under
  `<OCICacheDir>/blobs/sha256/<hex>`, the OCI-standard layout, so
  `skopeo` or `crane` could co-exist.
- **`Flatten`** (`flatten.go`) replays layers oldest-first into a
  staging directory, applying whiteouts and rejecting unsafe paths.
- **`BuildExt4`** (`ext4.go`) runs `mkfs.ext4 -F -d <staging>
  -E root_owner=0:0` to populate the image file at create time:
  no mount, no sudo, no loopback. Requires `e2fsprogs ≥ 1.43`
  (`mkfs.ext4 -d` is the populate-at-create flag; nearly all
  modern distros ship it).
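To make the `Flatten` step more concrete, here is a minimal sketch of the per-entry decision it has to make for each tar header name. It is not the code in `flatten.go`: `classify`, `safeRel`, and the `action` constants are hypothetical names, rejecting absolute entry names is an assumed policy, and the symlink-escape check is left out for brevity.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

const (
	whiteoutPrefix = ".wh."
	opaqueMarker   = ".wh..wh..opq"
)

type action int

const (
	actionExtract action = iota // regular entry: write it into the staging tree
	actionDelete                // ".wh.<name>": remove <name> left by lower layers
	actionOpaque                // ".wh..wh..opq": hide lower-layer contents of the dir
)

// safeRel rejects empty or absolute names and names that climb out of
// the staging root via "..". Rejecting absolute names is an assumption.
func safeRel(name string) (string, error) {
	if name == "" || strings.HasPrefix(name, "/") {
		return "", fmt.Errorf("unsafe path %q", name)
	}
	clean := path.Clean(name)
	if clean == ".." || strings.HasPrefix(clean, "../") {
		return "", fmt.Errorf("unsafe path %q", name)
	}
	return clean, nil
}

// classify maps a tar entry name to an action and the path it targets.
func classify(name string) (action, string, error) {
	clean, err := safeRel(name)
	if err != nil {
		return 0, "", err
	}
	dir, base := path.Split(clean)
	switch {
	case base == opaqueMarker: // must be checked before the generic prefix
		return actionOpaque, path.Clean(dir), nil
	case strings.HasPrefix(base, whiteoutPrefix):
		return actionDelete, path.Join(dir, strings.TrimPrefix(base, whiteoutPrefix)), nil
	default:
		return actionExtract, clean, nil
	}
}

func main() {
	names := []string{"usr/bin/env", "etc/.wh.hosts", "var/lib/.wh..wh..opq", "../../etc/shadow"}
	for _, name := range names {
		act, target, err := classify(name)
		if err != nil {
			fmt.Println("reject:", err)
			continue
		}
		fmt.Printf("%-24s action=%d target=%s\n", name, act, target)
	}
}
```

The opaque marker is tested first because it also starts with `.wh.`; checking the generic prefix first would misread it as a delete of a file named `.wh..opq`.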

`internal/daemon/images_pull.go` orchestrates:

1. Parse + validate the OCI ref.
2. Derive a friendly default name (`debian-bookworm` for
   `docker.io/library/debian:bookworm`) when `--name` is omitted.
3. Resolve kernel info via the shared `resolveKernelInputs` helper
   (the same code path as `image register --kernel-ref`).
4. Stage at `<ImagesDir>/<id>.staging`; extract layers to a temp
   tree under `os.TempDir` (bulk transient data stays off the
   persistent state filesystem).
5. `imagepull.BuildExt4` produces `<staging>/rootfs.ext4`.
6. `imagemgr.StageBootArtifacts` stages the kernel triple alongside.
7. Atomic `os.Rename(<staging>, <final>)` publishes the artifact dir.
8. Persist a `model.Image{Managed: true, …}` record.

Any failure before the rename removes the staging dir. A failure after
the rename removes the final dir and rolls back the store write.
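The stage, rename, persist sequence and its cleanup rules can be sketched as follows. This is an illustration of the pattern only, not the code in `images_pull.go`: `publish`, `exists`, and `demo` are hypothetical names, and the `build` and `persist` callbacks stand in for the ext4 build plus kernel staging and for the store write.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// publish stages an artifact dir and makes it visible with one rename.
// build and persist are hypothetical callbacks (see lead-in).
func publish(imagesDir, id string, build, persist func(dir string) error) error {
	staging := filepath.Join(imagesDir, id+".staging")
	final := filepath.Join(imagesDir, id)

	if err := os.MkdirAll(staging, 0o755); err != nil {
		return err
	}
	if err := build(staging); err != nil {
		os.RemoveAll(staging) // pre-rename failure: drop the staging dir
		return fmt.Errorf("build: %w", err)
	}
	// Staging and final share imagesDir, so the rename stays on one
	// filesystem and readers see either no image or a complete one.
	if err := os.Rename(staging, final); err != nil {
		os.RemoveAll(staging)
		return fmt.Errorf("publish: %w", err)
	}
	if err := persist(final); err != nil {
		os.RemoveAll(final) // post-rename failure: drop the published dir
		return fmt.Errorf("persist: %w", err)
	}
	return nil
}

func exists(p string) bool {
	_, err := os.Stat(p)
	return err == nil
}

// demo runs one successful publish and one whose store write fails.
func demo() (published, rolledBack bool) {
	imagesDir, err := os.MkdirTemp("", "banger-sketch-")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(imagesDir)

	build := func(dir string) error {
		return os.WriteFile(filepath.Join(dir, "rootfs.ext4"), []byte("placeholder"), 0o644)
	}
	okErr := publish(imagesDir, "good", build, func(string) error { return nil })
	published = okErr == nil &&
		exists(filepath.Join(imagesDir, "good", "rootfs.ext4")) &&
		!exists(filepath.Join(imagesDir, "good.staging"))

	badErr := publish(imagesDir, "bad", build, func(string) error {
		return errors.New("store write failed")
	})
	rolledBack = badErr != nil && !exists(filepath.Join(imagesDir, "bad"))
	return published, rolledBack
}

func main() {
	published, rolledBack := demo()
	fmt.Println("published:", published, "rolled back:", rolledBack)
}
```

Keeping the staging dir next to the final dir is what makes the rename a same-filesystem operation, which is presumably why only the bulk layer extraction goes to `os.TempDir`.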

## Paths

| What | Where | Purpose |
|------|-------|---------|
| Layer blob cache | `~/.cache/banger/oci/blobs/sha256/<hex>` | Re-pulls of the same image digest are local-only |
| Staging dir | `~/.local/state/banger/images/<id>.staging/` | Short-lived; atomic-renamed to `<id>/` on success |
| Staging rootfs tree | `$TMPDIR/banger-pull-<rand>/` | Extraction scratch space; removed after ext4 build |
| Published image | `~/.local/state/banger/images/<id>/rootfs.ext4` | Managed artifact stored alongside the kernel triple |
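As a small illustration of the blob-cache row, the mapping from a layer digest to its on-disk path might look like this. It is a sketch that follows the layout in the table; `blobPath` is a hypothetical helper, the cache root used in `main` is an example value, and the strict validation is an assumption added so that a malformed digest can never escape the cache directory.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// blobPath maps "sha256:<64 hex chars>" to <cacheDir>/blobs/sha256/<hex>.
// Hypothetical helper; only sha256 digests are accepted in this sketch.
func blobPath(cacheDir, digest string) (string, error) {
	algo, hex, ok := strings.Cut(digest, ":")
	if !ok || algo != "sha256" || len(hex) != 64 {
		return "", fmt.Errorf("unsupported digest %q", digest)
	}
	for _, c := range hex {
		isHex := (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')
		if !isHex {
			return "", fmt.Errorf("unsupported digest %q", digest)
		}
	}
	return filepath.Join(cacheDir, "blobs", algo, hex), nil
}

func main() {
	// Example cache root and the SHA-256 of empty input as a stand-in digest.
	p, err := blobPath("/home/alice/.cache/banger/oci",
		"sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855")
	fmt.Println(p, err)
}
```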

## Composition with `image build`

A pulled image is "unconfigured": it has no sshd, no vsock agent, no
banger-specific network unit, and file ownership is wrong for boot.
The natural next step is to feed it through the existing customization
pipeline:

```bash
banger image build --from-image debian-bookworm --name debian-dev --docker
```

`image build` spins up a transient VM using the base image, runs
`scripts/customize.sh` over it, and saves the result as a new managed
image. This is already how the opinionated `void` / `alpine` images
are produced today.

Because `image build` has to boot the base image in that transient VM,
the bootability gap means this composition only works once Phase B
lands an ownership-fixup pass. Until then, `image pull` gives you a
registered building block; getting a bootable rootfs still requires
the legacy manual rootfs scripts.

## Tech debt

- **File-ownership preservation**. The ext4 is populated from a tree
  extracted as the current user, and `mkfs.ext4 -d` then copies those
  on-disk uids/gids verbatim. Setuid bits survive but with the wrong
  owner, so privilege escalation is broken inside the VM. Planned
  fixes:
  - **debugfs ownership-fixup pass**: after `mkfs.ext4 -d`, replay
    tar headers through `debugfs -w` with `set_inode_field` to
    rewrite per-file uid/gid/mode. No new runtime deps (debugfs
    ships with e2fsprogs). Moderate implementation effort; keeps us
    on `mkfs.ext4 -d`.
  - **`tar2ext4`**: Microsoft's hcsshim ships a Go package that
    streams tar entries directly into an ext4 image, preserving
    ownership. Heavier dependency graph but purpose-built.

  Either approach lives in Phase B.

- **Auth**. When we add private-registry support, the natural path is
  `authn.DefaultKeychain` from `go-containerregistry`, which already
  honours `~/.docker/config.json` and the standard credential
  helpers. No banger-specific config needed.

- **Cache eviction**. Layer blobs under `OCICacheDir` accumulate
  forever. A `banger image cache prune` command is a cheap follow-up
  when disk usage becomes a complaint.

- **Ownership fixup via user namespaces**. An alternative to
  debugfs / tar2ext4 is running the entire extraction inside a user
  namespace (`unshare -Ufr`), which lets us set uid=0 on files from
  a non-privileged process. Cleaner in theory but requires
  user-namespace support on the host and doesn't help when the
  resulting tree is then passed to `mkfs.ext4 -d` (which copies
  on-disk uids).
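For the debugfs option in the first item, the fixup pass could be driven by a generated command file. The sketch below only renders that file; it is an assumption about how Phase B might do it, not existing code. The `entry` type and `debugfsScript` are hypothetical, `Mode` is assumed to carry the full `st_mode` (file-type bits included, since `set_inode_field ... mode` overwrites the whole field), and paths containing whitespace would need extra handling that is omitted here.

```go
package main

import (
	"fmt"
	"strings"
)

// entry is a hypothetical record captured from a tar header during flatten.
type entry struct {
	Path string // absolute path inside the image, e.g. "/usr/bin/sudo"
	UID  int
	GID  int
	Mode uint32 // full st_mode, including file-type bits
}

// debugfsScript renders one set_inode_field command per attribute.
// The leading "0" makes debugfs read the mode value as octal.
func debugfsScript(entries []entry) string {
	var b strings.Builder
	for _, e := range entries {
		fmt.Fprintf(&b, "set_inode_field %s uid %d\n", e.Path, e.UID)
		fmt.Fprintf(&b, "set_inode_field %s gid %d\n", e.Path, e.GID)
		fmt.Fprintf(&b, "set_inode_field %s mode 0%o\n", e.Path, e.Mode)
	}
	return b.String()
}

func main() {
	script := debugfsScript([]entry{
		// Regular file (0100000) + setuid (04000) + rwxr-xr-x (0755).
		{Path: "/usr/bin/sudo", UID: 0, GID: 0, Mode: 0o104755},
	})
	fmt.Print(script)
}
```

The resulting file could then be applied with something like `debugfs -w -f fixup.cmds rootfs.ext4`, where `fixup.cmds` is a placeholder name.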

## Trust model

`image pull` delegates trust to the OCI registry the user selected.
`go-containerregistry` verifies layer digests against the manifest
during download, so a tampered mirror can't ship modified layers
without breaking the sha256 chain. Beyond that, banger does not
verify OCI image signatures (cosign/sigstore); users who care should
verify their references out-of-band.
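The digest check that the transport layer performs amounts to hashing the bytes received and comparing against the digest named in the manifest. The following is a minimal sketch of that idea using only the standard library; it is not `go-containerregistry`'s internal code, and `verifyDigest` and `verifyBytes` are hypothetical names.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
)

// verifyDigest streams r through SHA-256 and compares the result with
// want, which uses the "sha256:<hex>" form found in OCI manifests.
func verifyDigest(r io.Reader, want string) error {
	h := sha256.New()
	if _, err := io.Copy(h, r); err != nil {
		return err
	}
	got := "sha256:" + hex.EncodeToString(h.Sum(nil))
	if got != want {
		return fmt.Errorf("digest mismatch: got %s, want %s", got, want)
	}
	return nil
}

// verifyBytes is a convenience wrapper for in-memory data.
func verifyBytes(data []byte, want string) error {
	return verifyDigest(bytes.NewReader(data), want)
}

func main() {
	// SHA-256 of empty input, used here as a stand-in layer digest.
	const emptyDigest = "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
	fmt.Println("intact layer:  ", verifyBytes(nil, emptyDigest))
	fmt.Println("tampered layer:", verifyBytes([]byte("x"), emptyDigest))
}
```

This covers integrity of what was transported, not authenticity of who published it, which is the gap that signature verification would close.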