banger/docs/oci-import.md
Thales Maciel 2e4d4b14da
Phase 4: OCI import docs
New docs/oci-import.md covers the full Phase A story:
 - end-user flow (kernel pull + image pull + image list)
 - what works now (layer replay + whiteouts, path-traversal
   hardening, content-aware sizing, layer caching, composition
   with image build)
 - what does not work yet (direct boot due to ownership
   caveat, private registries, non-amd64 platforms)
 - architecture of internal/imagepull + the daemon orchestrator
 - path layout (OCI cache, staging, published)
 - tech debt: the three plausible ownership-fixup approaches
   (debugfs, hcsshim/tar2ext4, user namespaces) with honest
   trade-offs for Phase B to choose from later
 - trust model (digest chain covers transport; signature
   verification out of scope)

README.md gains an image pull example alongside image register
+ --kernel-ref, with a pointer to the docs and an honest "pulled
images are a base for image build, not yet directly bootable"
warning.

AGENTS.md gets the one-line note pointing at the new doc.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 17:37:07 -03:00

# OCI import (`banger image pull`)
`banger image pull <oci-ref>` downloads a container image from any
OCI-compatible registry (Docker Hub, GHCR, quay.io, self-hosted, …),
flattens its layers into an ext4 rootfs, and registers the result as
a managed banger image.

Paired with the kernel catalog, this removes the "where do I get a
rootfs" bottleneck for most users: any distro that ships an official
container image can supply the rootfs for a banger VM, although booting
one directly still depends on Phase B (see Status below).

```bash
banger kernel pull void-6.12
banger image pull docker.io/library/debian:bookworm --kernel-ref void-6.12
banger image list # debian-bookworm appears, Managed=true
```
## Status: Phase A (acquisition only)
This is the first of a two-phase initiative. **Phase A (this feature)**
produces a working ext4 file from an OCI reference. **Phase B (not yet
implemented)** will add the steps needed to make the pulled image
directly bootable — init system hook-up, sshd install, vsock agent
drop-in, network bootstrap, and **file-ownership fixup**.

What works today:

- Pulling any public OCI image that exposes a `linux/amd64` manifest.
- Correct layer replay with whiteout semantics (`.wh.*` deletes,
`.wh..wh..opq` opaque-dir markers).
- Path-traversal and relative-symlink-escape protection.
- Content-aware default sizing (`content × 1.25`, floor 1 GiB).
- Layer caching on disk, keyed by blob SHA256.
- Piping pulled images into the existing `banger image build
--from-image` flow.
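
The sizing rule in the list above (`content × 1.25`, floor 1 GiB) can be sketched in a few lines. This is a minimal illustration, not the code in `internal/imagepull`; the function name `DefaultImageSize` and the integer rounding are assumptions.

```go
package main

import "fmt"

const gib = int64(1) << 30

// DefaultImageSize is a hypothetical helper: it returns the content size
// plus 25% headroom, but never less than 1 GiB.
func DefaultImageSize(contentBytes int64) int64 {
	size := contentBytes + contentBytes/4 // integer form of content × 1.25
	if size < gib {
		return gib
	}
	return size
}

func main() {
	for _, content := range []int64{200 << 20, 4 * gib} {
		fmt.Printf("content=%d bytes -> image=%d bytes\n", content, DefaultImageSize(content))
	}
}
```

The floor matters mostly for tiny base images, where 25% headroom alone would leave almost no room to install packages later.
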

What does not yet work:

- **Booting a pulled image directly.** The produced ext4 has file
ownership set to the *runner's* uid/gid, not the tar headers'.
Setuid binaries (`sudo`, `ping`, …) run as the wrong user in the
VM. This is deferred to Phase B.
- **Private registries**. Auth is not implemented; anonymous pulls
only. Docker Hub, GHCR (public), quay.io (public), etc. all work.
- **Non-`linux/amd64` platforms**. The catalog is x86_64-only, so
pulled rootfses match. `arm64` is additive in the schema; wire-up
lands when a user needs it.
## Architecture
`internal/imagepull/` owns the pure mechanics:
- **`Pull`** (`imagepull.go`) wraps `go-containerregistry`'s
`remote.Image` with the `linux/amd64` platform pinned. Layer
blobs are cached on disk via `cache.NewFilesystemCache` under
`<OCICacheDir>/blobs/sha256/<hex>` — OCI-standard layout so
`skopeo` or `crane` could co-exist.
- **`Flatten`** (`flatten.go`) replays layers oldest-first into a
staging directory, applying whiteouts and rejecting unsafe paths.
- **`BuildExt4`** (`ext4.go`) runs
`mkfs.ext4 -F -d <staging> -E root_owner=0:0` to populate the image
file at creation time, with no mount, no sudo, and no loopback
device. Requires `e2fsprogs ≥ 1.43` for `mkfs.ext4 -d`, the flag
that populates the filesystem from a directory at creation time;
nearly all modern distros ship it.

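
To make the whiteout and unsafe-path rules that `Flatten` applies more concrete, here is a minimal sketch of how a tar entry name could be classified during replay. It is not the code in `flatten.go`: the names `Classify`, `SymlinkEscapes`, `Action`, and `ErrUnsafePath` are hypothetical, and rejecting absolute entry names is an assumed policy.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"strings"
)

const (
	whiteoutPrefix = ".wh."
	opaqueMarker   = ".wh..wh..opq"
)

// Action says what replaying a tar entry should do to the staging tree.
type Action int

const (
	Extract  Action = iota // ordinary entry: write it into staging
	Delete                 // whiteout: remove the path left by lower layers
	ClearDir               // opaque marker: drop lower-layer contents of the directory
)

var actionNames = [...]string{"extract", "delete", "clear-dir"}

var ErrUnsafePath = errors.New("unsafe path in layer")

// escapes reports whether a cleaned relative path climbs above the root.
func escapes(clean string) bool {
	return clean == ".." || strings.HasPrefix(clean, "../")
}

// Classify maps a tar entry name to an action and the staging-relative
// path that action targets.
func Classify(name string) (Action, string, error) {
	clean := path.Clean(name)
	if path.IsAbs(name) || escapes(clean) {
		return Extract, "", ErrUnsafePath
	}
	dir, base := path.Dir(clean), path.Base(clean)
	switch {
	case base == opaqueMarker: // must be checked before the generic prefix
		return ClearDir, dir, nil
	case strings.HasPrefix(base, whiteoutPrefix):
		return Delete, path.Join(dir, strings.TrimPrefix(base, whiteoutPrefix)), nil
	default:
		return Extract, clean, nil
	}
}

// SymlinkEscapes reports whether a relative symlink target, resolved from
// the link's own directory, would leave the staging root.
func SymlinkEscapes(link, target string) bool {
	if path.IsAbs(target) {
		return false // absolute targets are a separate policy question, not covered here
	}
	return escapes(path.Join(path.Dir(link), target))
}

func main() {
	for _, name := range []string{"etc/motd", "etc/.wh.motd", "var/cache/.wh..wh..opq", "../../etc/shadow"} {
		action, target, err := Classify(name)
		if err != nil {
			fmt.Printf("%-26s rejected: %v\n", name, err)
			continue
		}
		fmt.Printf("%-26s %s %s\n", name, actionNames[action], target)
	}
}
```
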

`internal/daemon/images_pull.go` orchestrates:

1. Parse + validate the OCI ref.
2. Derive a friendly default name (`debian-bookworm` for
`docker.io/library/debian:bookworm`) when `--name` is omitted.
3. Resolve kernel info via the shared `resolveKernelInputs` helper
(the same code path as `image register --kernel-ref`).
4. Stage at `<ImagesDir>/<id>.staging`; extract layers to a temp
tree under `os.TempDir` (bulk transient data stays off the
persistent state filesystem).
5. `imagepull.BuildExt4` produces `<staging>/rootfs.ext4`.
6. `imagemgr.StageBootArtifacts` stages the kernel triple alongside.
7. Atomic `os.Rename(<staging>, <final>)` publishes the artifact dir.
8. Persist a `model.Image{Managed: true, …}` record.
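
As an illustration of step 2, the default-name derivation might look like the sketch below. Only the `docker.io/library/debian:bookworm` to `debian-bookworm` mapping comes from this document; the function name `DefaultName`, the `latest` fallback for untagged refs, and the handling of digest suffixes are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// DefaultName is a hypothetical helper that turns an OCI reference into
// "<repo>-<tag>", using only the last path component of the repository.
func DefaultName(ref string) string {
	// Assumption: a digest suffix ("@sha256:...") does not take part in the name.
	if i := strings.Index(ref, "@"); i >= 0 {
		ref = ref[:i]
	}
	// Everything after the last slash is "<repo>[:<tag>]"; a registry port
	// such as "localhost:5000" sits before that slash and is ignored.
	last := ref[strings.LastIndex(ref, "/")+1:]
	repo, tag, found := strings.Cut(last, ":")
	if !found || tag == "" {
		tag = "latest" // assumption: untagged refs are named after the implied tag
	}
	return repo + "-" + tag
}

func main() {
	for _, ref := range []string{
		"docker.io/library/debian:bookworm",
		"localhost:5000/alpine",
	} {
		fmt.Printf("%s -> %s\n", ref, DefaultName(ref))
	}
}
```
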

Any failure removes the staging dir. Post-rename failures remove the
final dir and roll back the store write.

## Paths
| What | Where | Purpose |
|------|-------|---------|
| Layer blob cache | `~/.cache/banger/oci/blobs/sha256/<hex>` | Re-pulls of the same image digest are local-only |
| Staging dir | `~/.local/state/banger/images/<id>.staging/` | Short-lived; atomic-renamed to `<id>/` on success |
| Staging rootfs tree | `$TMPDIR/banger-pull-<rand>/` | Extraction scratch space; removed after ext4 build |
| Published image | `~/.local/state/banger/images/<id>/rootfs.ext4` | Managed artifact stored alongside the kernel triple |
## Composition with `image build`
A pulled image is "unconfigured" — it has no sshd, no vsock agent, no
banger-specific network unit, and file ownership is wrong for boot.
The natural next step is to feed it through the existing customization
pipeline:
```bash
banger image build --from-image debian-bookworm --name debian-dev --docker
```
`image build` spins up a transient VM using the base image, runs
`scripts/customize.sh` over it, and saves the result as a new managed
image. This is already how the opinionated `void` / `alpine` images
are produced today.

Because `image build` boots the base image in a transient VM, the
bootability gap means this composition only works end to end once
Phase B lands an ownership-fixup pass. Until then, `image pull` gives
you a registered, managed rootfs artifact; actually booting one still
requires the legacy manual rootfs scripts.

## Tech debt
- **File-ownership preservation**. The ext4 is populated from a tree
  extracted as the current user, and `mkfs.ext4 -d` then copies those
  on-disk uids/gids verbatim. Setuid bits survive but with the wrong
  owner, so privilege escalation is broken inside the VM. Planned
  fixes:
  - **debugfs ownership-fixup pass**: after `mkfs.ext4 -d`, replay
    tar headers through `debugfs -w` with `set_inode_field` to
    rewrite per-file uid/gid/mode. No new runtime deps (debugfs
    ships with e2fsprogs). Moderate implementation; keeps us on
    `mkfs.ext4 -d`.
  - **`tar2ext4`**: Microsoft's hcsshim ships a Go package that
    streams tar entries directly into an ext4 image, preserving
    ownership. Heavier dependency graph but purpose-built.

  Either approach lives in Phase B.
- **Auth**. When we add private-registry support, the natural path is
`authn.DefaultKeychain` from `go-containerregistry`, which already
honours `~/.docker/config.json` and the standard credential
helpers. No banger-specific config needed.
- **Cache eviction**. Layer blobs under `OCICacheDir` accumulate
forever. A `banger image cache prune` command is a cheap follow-up
when disk usage becomes a complaint.
- **Ownership fixup via user namespaces**. An alternative to
debugfs / tar2ext4 is running the entire extraction inside a user
namespace (`unshare -Ufr`), which lets us set uid=0 on files from
a non-privileged process. Cleaner in theory but requires
user-namespace support on the host and doesn't help when the
resulting tree is then passed to `mkfs.ext4 -d` (which copies
on-disk uids).
## Trust model
`image pull` delegates trust to the OCI registry the user selected.
`go-containerregistry` verifies layer digests against the manifest
during download, so a tampered mirror can't ship modified layers
without breaking the sha256 chain. Beyond that, banger does not
verify OCI image signatures (cosign/sigstore) — users who care should
verify their references out-of-band.