Phase 4: OCI import docs

New docs/oci-import.md covers the full Phase A story:
 - end-user flow (kernel pull + image pull + image list)
 - what works now (layer replay + whiteouts, path-traversal
   hardening, content-aware sizing, layer caching, composition
   with image build)
 - what does not work yet (direct boot due to ownership
   caveat, private registries, non-amd64 platforms)
 - architecture of internal/imagepull + the daemon orchestrator
 - path layout (OCI cache, staging, published)
 - tech debt: the three plausible ownership-fixup approaches
   (debugfs, hcsshim/tar2ext4, user namespaces) with honest
   trade-offs for Phase B to choose from later
 - trust model (digest chain covers transport; signature
   verification out of scope)

README.md gains an image pull example alongside image register
+ --kernel-ref, with a pointer to the docs and an honest "pulled
images are a base for image build, not yet directly bootable"
warning.

AGENTS.md gets a one-line note pointing at the new doc.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Thales Maciel 2026-04-16 17:37:07 -03:00
parent fdaf7cce0f
commit 2e4d4b14da
GPG key ID: 33112E6833C34679
3 changed files with 170 additions and 0 deletions

AGENTS.md

@@ -22,6 +22,7 @@ Always run `make build` before commit.
- `./build/bin/banger image promote <image>` copies an unmanaged image into daemon-owned managed artifacts.
- `make void-kernel`, `make rootfs-void`, and `make void-register` drive the experimental Void flow under `./build/manual`.
- `scripts/publish-kernel.sh <name>` packages a locally-imported kernel and uploads it to the catalog; see `docs/kernel-catalog.md`.
- `banger image pull <oci-ref> --kernel-ref <name>` pulls a rootfs from any OCI registry; see `docs/oci-import.md` (experimental — file-ownership caveat).
## Image Model

README.md

@@ -102,6 +102,19 @@ Or pull a pre-built kernel from the catalog and reference it by name:
See [`docs/kernel-catalog.md`](docs/kernel-catalog.md) for catalog
maintenance.
Or pull a rootfs directly from any OCI registry (Docker Hub, GHCR, …):
```bash
./build/bin/banger image pull docker.io/library/debian:bookworm \
--kernel-ref void-6.12
```
`image pull` downloads the image, flattens its layers into an ext4
rootfs, and registers it as a managed banger image. Experimental — see
[`docs/oci-import.md`](docs/oci-import.md) for current limitations
(notably: file-ownership caveat means pulled images are a base for
`image build`, not yet directly bootable).
Build a managed image from an existing registered image:
```bash

docs/oci-import.md (new file, 156 lines)

@@ -0,0 +1,156 @@
# OCI import (`banger image pull`)
`banger image pull <oci-ref>` downloads a container image from any
OCI-compatible registry (Docker Hub, GHCR, quay.io, self-hosted, …),
flattens its layers into an ext4 rootfs, and registers the result as
a managed banger image.
Paired with the kernel catalog, this removes the "where do I get a
rootfs" bottleneck for most users: any distro that ships an official
container image can become the rootfs of a banger VM, although direct
boot is not available until Phase B (see Status below).
```bash
banger kernel pull void-6.12
banger image pull docker.io/library/debian:bookworm --kernel-ref void-6.12
banger image list # debian-bookworm appears, Managed=true
```
## Status: Phase A (acquisition only)
This is the first of a two-phase initiative. **Phase A (this feature)**
produces a working ext4 file from an OCI reference. **Phase B (not yet
implemented)** will add the steps needed to make the pulled image
directly bootable — init system hook-up, sshd install, vsock agent
drop-in, network bootstrap, and **file-ownership fixup**.
What works today:
- Pulling any public OCI image that exposes a `linux/amd64` manifest.
- Correct layer replay with whiteout semantics (`.wh.*` deletes,
`.wh..wh..opq` opaque-dir markers).
- Path-traversal and relative-symlink-escape protection.
- Content-aware default sizing (`content × 1.25`, floor 1 GiB).
- Layer caching on disk, keyed by blob SHA256.
- Piping pulled images into the existing `banger image build
--from-image` flow.
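The content-aware sizing rule above (`content × 1.25`, floor 1 GiB) can be sketched in a few lines of Go. This is an illustration only: the function and constant names are hypothetical, and the real implementation may round or align the result differently.

```go
package main

const (
	gib           = int64(1) << 30
	minImageBytes = gib // floor from the list above: 1 GiB
)

// defaultImageSize returns contentBytes * 1.25, never less than 1 GiB.
// Integer arithmetic (content + content/4) avoids float rounding.
func defaultImageSize(contentBytes int64) int64 {
	size := contentBytes + contentBytes/4
	if size < minImageBytes {
		return minImageBytes
	}
	return size
}
```

Under this sketch a 100 MiB image is rounded up to the 1 GiB floor, while 4 GiB of content yields a 5 GiB filesystem.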
What does not yet work:
- **Booting a pulled image directly.** The produced ext4 has file
ownership set to the *runner's* uid/gid, not the tar headers'.
Setuid binaries (`sudo`, `ping`, …) run as the wrong user in the
VM. This is deferred to Phase B.
- **Private registries**. Auth is not implemented; anonymous pulls
only. Docker Hub, GHCR (public), quay.io (public), etc. all work.
- **Non-`linux/amd64` platforms**. The catalog is x86_64-only, so
pulled rootfses match. `arm64` is additive in the schema; wire-up
lands when a user needs it.
## Architecture
`internal/imagepull/` owns the pure mechanics:
- **`Pull`** (`imagepull.go`) wraps `go-containerregistry`'s
`remote.Image` with the `linux/amd64` platform pinned. Layer
blobs are cached on disk via `cache.NewFilesystemCache` under
`<OCICacheDir>/blobs/sha256/<hex>` — OCI-standard layout so
`skopeo` or `crane` could co-exist.
- **`Flatten`** (`flatten.go`) replays layers oldest-first into a
staging directory, applying whiteouts and rejecting unsafe paths.
- **`BuildExt4`** (`ext4.go`) runs `mkfs.ext4 -F -d <staging>
-E root_owner=0:0` to populate the image file at create time —
no mount, no sudo, no loopback. Requires `e2fsprogs ≥ 1.43`, the
release that added `mkfs.ext4 -d` (populate from a directory at
create time); nearly all modern distros ship it.
`internal/daemon/images_pull.go` orchestrates:
1. Parse + validate the OCI ref.
2. Derive a friendly default name (`debian-bookworm` for
`docker.io/library/debian:bookworm`) when `--name` is omitted.
3. Resolve kernel info via the shared `resolveKernelInputs` helper
(the same code path as `image register --kernel-ref`).
4. Stage at `<ImagesDir>/<id>.staging`; extract layers to a temp
tree under `os.TempDir` (bulk transient data stays off the
persistent state filesystem).
5. `imagepull.BuildExt4` produces `<staging>/rootfs.ext4`.
6. `imagemgr.StageBootArtifacts` stages the kernel triple alongside.
7. Atomic `os.Rename(<staging>, <final>)` publishes the artifact dir.
8. Persist a `model.Image{Managed: true, …}` record.
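Step 2 can be illustrated with plain string handling. This is a sketch under assumptions: `defaultName` is a hypothetical helper, the real code may parse the reference with `go-containerregistry` instead, and the behaviour for untagged or digest-only references shown here is a guess.

```go
package main

import "strings"

// defaultName derives a friendly image name from an OCI reference:
// the last repository path component joined to the tag with "-".
// A digest suffix is dropped; an untagged reference yields the bare
// repository name (an assumption, not confirmed by the source).
func defaultName(ref string) string {
	if i := strings.Index(ref, "@"); i >= 0 {
		ref = ref[:i] // drop "@sha256:..." if present
	}
	// Taking the text after the last "/" also skips a registry port
	// such as "localhost:5000/".
	last := ref[strings.LastIndex(ref, "/")+1:]
	repo, tag, ok := strings.Cut(last, ":")
	if !ok || tag == "" {
		return repo
	}
	return repo + "-" + tag
}
```

With this rule `docker.io/library/debian:bookworm` becomes `debian-bookworm`, matching the example in step 2.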
Any failure before the rename removes the staging dir. Post-rename
failures remove the final dir and roll back the store write.
## Paths
| What | Where | Purpose |
|------|-------|---------|
| Layer blob cache | `~/.cache/banger/oci/blobs/sha256/<hex>` | Layers already cached are not downloaded again on re-pull |
| Staging dir | `~/.local/state/banger/images/<id>.staging/` | Short-lived; atomic-renamed to `<id>/` on success |
| Staging rootfs tree | `$TMPDIR/banger-pull-<rand>/` | Extraction scratch space; removed after ext4 build |
| Published image | `~/.local/state/banger/images/<id>/rootfs.ext4` | Managed artifact stored alongside the kernel triple |
## Composition with `image build`
A pulled image is "unconfigured" — it has no sshd, no vsock agent, no
banger-specific network unit, and file ownership is wrong for boot.
The natural next step is to feed it through the existing customization
pipeline:
```bash
banger image build --from-image debian-bookworm --name debian-dev --docker
```
`image build` spins up a transient VM using the base image, runs
`scripts/customize.sh` over it, and saves the result as a new managed
image. This is already how the opinionated `void` / `alpine` images
are produced today.
Because `image build` boots the base image in a transient VM, this
composition depends on the ownership-fixup pass planned for Phase B.
Until that lands, `image pull` produces a registered but not yet
bootable image, and booting still requires the legacy manual rootfs
scripts.
## Tech debt
- **File-ownership preservation**. The ext4 is populated from a tree
extracted as the current user — `mkfs.ext4 -d` then copies those
on-disk uids/gids verbatim. Setuid bits survive but with the wrong
owner, so privilege escalation is broken inside the VM. Planned
fixes:
- **debugfs ownership-fixup pass**: after `mkfs.ext4 -d`, replay
tar headers through `debugfs -w` with `set_inode_field` to
rewrite per-file uid/gid/mode. No new runtime deps (debugfs
ships with e2fsprogs). Moderate implementation; keeps us on
`mkfs.ext4 -d`.
- **`tar2ext4`**: Microsoft's hcsshim ships a Go package that
streams tar entries directly into an ext4 image, preserving
ownership. Heavier dependency graph but purpose-built.
Either approach lives in Phase B.
- **Auth**. When we add private-registry support, the natural path is
`authn.DefaultKeychain` from `go-containerregistry`, which already
honours `~/.docker/config.json` and the standard credential
helpers. No banger-specific config needed.
- **Cache eviction**. Layer blobs under `OCICacheDir` accumulate
forever. A `banger image cache prune` command is a cheap follow-up
when disk usage becomes a complaint.
- **Ownership fixup via user namespaces**. An alternative to
debugfs / tar2ext4 is running the entire extraction inside a user
namespace (`unshare -Ufr`), which lets us set uid=0 on files from
a non-privileged process. Cleaner in theory but requires
user-namespace support on the host and doesn't help when the
resulting tree is then passed to `mkfs.ext4 -d` (which copies
on-disk uids).
## Trust model
`image pull` delegates trust to the OCI registry the user selected.
`go-containerregistry` verifies layer digests against the manifest
during download, so a tampered mirror can't ship modified layers
without breaking the sha256 chain. Beyond that, banger does not
verify OCI image signatures (cosign/sigstore) — users who care should
verify their references out-of-band.