Phase 4: OCI import docs
New docs/oci-import.md covers the full Phase A story:

- end-user flow (kernel pull + image pull + image list)
- what works now (layer replay + whiteouts, path-traversal hardening,
  content-aware sizing, layer caching, composition with image build)
- what does not work yet (direct boot due to ownership caveat, private
  registries, non-amd64 platforms)
- architecture of internal/imagepull + the daemon orchestrator
- path layout (OCI cache, staging, published)
- tech debt: the three plausible ownership-fixup approaches (debugfs,
  hcsshim/tar2ext4, user namespaces) with honest trade-offs for
  Phase B to choose from later
- trust model (digest chain covers transport; signature verification
  out of scope)

README.md gains an image pull example alongside image register +
--kernel-ref, with a pointer to the docs and an honest "pulled images
are a base for image build, not yet directly bootable" warning.

AGENTS.md gets the one-line note pointing at the new doc.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
parent fdaf7cce0f
commit 2e4d4b14da
3 changed files with 170 additions and 0 deletions
156
docs/oci-import.md
Normal file
@@ -0,0 +1,156 @@
# OCI import (`banger image pull`)

`banger image pull <oci-ref>` downloads a container image from any
OCI-compatible registry (Docker Hub, GHCR, quay.io, self-hosted, …),
flattens its layers into an ext4 rootfs, and registers the result as
a managed banger image.

Paired with the kernel catalog, this dissolves the "where do I get a
rootfs" bottleneck for most users: any distro that ships an official
container image can eventually boot as a banger VM. Direct boot is not
there yet; see the status section below.

```bash
banger kernel pull void-6.12
banger image pull docker.io/library/debian:bookworm --kernel-ref void-6.12
banger image list     # debian-bookworm appears, Managed=true
```

## Status: Phase A (acquisition only)

This is the first of a two-phase initiative. **Phase A (this feature)**
produces a working ext4 file from an OCI reference. **Phase B (not yet
implemented)** will add the steps needed to make the pulled image
directly bootable: init system hook-up, sshd install, vsock agent
drop-in, network bootstrap, and **file-ownership fixup**.

What works today:

- Pulling any public OCI image that exposes a `linux/amd64` manifest.
- Correct layer replay with whiteout semantics (`.wh.*` deletes,
  `.wh..wh..opq` opaque-dir markers).
- Path-traversal and relative-symlink-escape protection.
- Content-aware default sizing (`content × 1.25`, floor 1 GiB).
- Layer caching on disk, keyed by blob SHA256.
- Piping pulled images into the existing `banger image build
  --from-image` flow.

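The sizing rule in the list above is simple enough to sketch. This is a minimal illustration, not banger's actual code: the function name `defaultImageSize` is hypothetical, and it assumes the multiplier applies to the total byte count of the flattened tree with no extra alignment or rounding.

```go
package main

import "fmt"

const (
	gib          = int64(1) << 30
	minImageSize = gib // floor from the text: 1 GiB
)

// defaultImageSize returns contentBytes × 1.25, but never less than 1 GiB.
// Integer arithmetic (n + n/4) avoids float rounding on large trees.
func defaultImageSize(contentBytes int64) int64 {
	size := contentBytes + contentBytes/4
	if size < minImageSize {
		return minImageSize
	}
	return size
}

func main() {
	// A 200 MiB tree hits the floor; a 4 GiB tree gets 25% headroom.
	for _, content := range []int64{200 << 20, 4 * gib} {
		fmt.Printf("content=%d bytes -> image=%d bytes\n", content, defaultImageSize(content))
	}
}
```

With these assumptions a 200 MiB tree yields a 1 GiB image and a 4 GiB tree yields a 5 GiB image.
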
What does not yet work:

- **Booting a pulled image directly.** The produced ext4 has file
  ownership set to the *runner's* uid/gid, not the tar headers'.
  Setuid binaries (`sudo`, `ping`, …) run as the wrong user in the
  VM. This is deferred to Phase B.
- **Private registries**. Auth is not implemented; anonymous pulls
  only. Public images on Docker Hub, GHCR, quay.io, etc. all work.
- **Non-`linux/amd64` platforms**. The kernel catalog is x86_64-only,
  so pulled rootfses are pinned to match. `arm64` is additive in the
  schema; wire-up lands when a user needs it.

## Architecture

`internal/imagepull/` owns the pure mechanics:

- **`Pull`** (`imagepull.go`) wraps `go-containerregistry`'s
  `remote.Image` with the `linux/amd64` platform pinned. Layer
  blobs are cached on disk via `cache.NewFilesystemCache` under
  `<OCICacheDir>/blobs/sha256/<hex>`, an OCI-standard layout, so
  `skopeo` or `crane` could co-exist.
- **`Flatten`** (`flatten.go`) replays layers oldest-first into a
  staging directory, applying whiteouts and rejecting unsafe paths.
- **`BuildExt4`** (`ext4.go`) runs `mkfs.ext4 -F -d <staging>
  -E root_owner=0:0` to populate the image file at create time:
  no mount, no sudo, no loopback. Requires `e2fsprogs ≥ 1.43`
  (`mkfs.ext4 -d` is the Populate-at-Create flag; nearly all
  modern distros ship it).

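To make the `Flatten` rules concrete, here is a hedged sketch of how a tar entry name could be classified and checked before extraction. The names `classify` and `safeJoin` are illustrative and are not taken from `flatten.go`. The containment check is purely lexical, so it does not cover symlink escapes, which also require resolving link targets.

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
	"strings"
)

const (
	whiteoutPrefix = ".wh."
	opaqueMarker   = ".wh..wh..opq"
)

type entryKind int

const (
	kindRegular  entryKind = iota // ordinary entry: extract it
	kindWhiteout                  // delete one path inherited from lower layers
	kindOpaque                    // hide everything lower layers put in the directory
)

// classify inspects a tar entry name and returns its kind plus the path the
// entry acts on: the deleted path, the opaque directory, or the entry itself.
func classify(name string) (entryKind, string) {
	dir, base := path.Split(path.Clean(name))
	switch {
	case base == opaqueMarker: // must be tested first: it also starts with ".wh."
		return kindOpaque, path.Clean(dir)
	case strings.HasPrefix(base, whiteoutPrefix):
		return kindWhiteout, path.Join(dir, strings.TrimPrefix(base, whiteoutPrefix))
	default:
		return kindRegular, path.Join(dir, base)
	}
}

// safeJoin joins name onto root and rejects results that land outside root.
// Lexical check only; symlinks inside the tree are not resolved here.
func safeJoin(root, name string) (string, error) {
	target := filepath.Join(root, name)
	rel, err := filepath.Rel(root, target)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("unsafe path %q escapes staging root", name)
	}
	return target, nil
}

func main() {
	for _, name := range []string{"usr/bin/env", "etc/.wh.motd", "var/cache/.wh..wh..opq"} {
		kind, target := classify(name)
		fmt.Printf("%-24s kind=%d target=%s\n", name, kind, target)
	}
	_, err := safeJoin("/tmp/staging", "../../etc/passwd")
	fmt.Println("traversal attempt:", err)
}
```

Checking the opaque marker before the generic `.wh.` prefix matters, because the marker itself begins with that prefix.
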
`internal/daemon/images_pull.go` orchestrates:

1. Parse + validate the OCI ref.
2. Derive a friendly default name (`debian-bookworm` for
   `docker.io/library/debian:bookworm`) when `--name` is omitted.
3. Resolve kernel info via the shared `resolveKernelInputs` helper
   (the same code path as `image register --kernel-ref`).
4. Stage at `<ImagesDir>/<id>.staging`; extract layers to a temp
   tree under `os.TempDir` (bulk transient data stays off the
   persistent state filesystem).
5. `imagepull.BuildExt4` produces `<staging>/rootfs.ext4`.
6. `imagemgr.StageBootArtifacts` stages the kernel triple alongside.
7. Atomic `os.Rename(<staging>, <final>)` publishes the artifact dir.
8. Persist a `model.Image{Managed: true, …}` record.

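Step 2 can be illustrated with a small string transformation. This is a sketch under assumptions: `defaultName` is a hypothetical helper, and only the `debian-bookworm` mapping is confirmed by the text. How the real code treats untagged or digest-pinned references is not stated, so returning the bare repository name in those cases is a guess.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultName derives a friendly name such as "debian-bookworm" from an OCI
// reference. ASSUMPTION: untagged and digest-only refs fall back to the bare
// repository name; the real rules may differ.
func defaultName(ref string) string {
	// Drop a digest suffix ("@sha256:...") if present.
	if i := strings.Index(ref, "@"); i >= 0 {
		ref = ref[:i]
	}
	// Keep only the last path component, which also skips any registry port.
	last := ref[strings.LastIndex(ref, "/")+1:]
	repo, tag, hasTag := strings.Cut(last, ":")
	if !hasTag || tag == "" {
		return repo
	}
	return repo + "-" + tag
}

func main() {
	for _, ref := range []string{
		"docker.io/library/debian:bookworm",
		"localhost:5000/alpine:3.20",
		"ghcr.io/example/tool@sha256:abc",
	} {
		fmt.Printf("%s -> %s\n", ref, defaultName(ref))
	}
}
```
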
Any failure before the rename removes the staging dir. A failure after
the rename removes the final dir and rolls back the store write.

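The stage-then-rename pattern with its two cleanup paths can be sketched as follows. This is an illustration of the general technique rather than the daemon's code: `publish`, its `build` and `persist` callbacks, and `demo` are hypothetical names, and the sketch assumes the staging and final directories sit on the same filesystem so that `os.Rename` is atomic.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// publish builds into <imagesDir>/<id>.staging, renames it to <imagesDir>/<id>
// once build succeeds, then calls persist (standing in for the store write).
func publish(imagesDir, id string, build, persist func(dir string) error) (finalDir string, err error) {
	staging := filepath.Join(imagesDir, id+".staging")
	finalDir = filepath.Join(imagesDir, id)
	if err = os.MkdirAll(staging, 0o755); err != nil {
		return "", err
	}
	// Any failure before the rename removes the staging dir. After a
	// successful rename the path no longer exists and this is a no-op.
	defer func() {
		if err != nil {
			os.RemoveAll(staging)
		}
	}()
	if err = build(staging); err != nil {
		return "", fmt.Errorf("build: %w", err)
	}
	if err = os.Rename(staging, finalDir); err != nil {
		return "", fmt.Errorf("publish: %w", err)
	}
	if err = persist(finalDir); err != nil {
		os.RemoveAll(finalDir) // post-rename failure: remove the published dir
		return "", fmt.Errorf("persist: %w", err)
	}
	return finalDir, nil
}

// demo exercises one successful publish and one failed build in a temp dir.
func demo() error {
	imagesDir, err := os.MkdirTemp("", "banger-images-")
	if err != nil {
		return err
	}
	defer os.RemoveAll(imagesDir)

	writeRootfs := func(dir string) error {
		return os.WriteFile(filepath.Join(dir, "rootfs.ext4"), []byte("stub"), 0o644)
	}
	noop := func(string) error { return nil }

	final, err := publish(imagesDir, "img-ok", writeRootfs, noop)
	if err != nil {
		return err
	}
	if _, err := os.Stat(filepath.Join(final, "rootfs.ext4")); err != nil {
		return err
	}

	failing := func(string) error { return errors.New("mkfs failed") }
	if _, err := publish(imagesDir, "img-bad", failing, noop); err == nil {
		return errors.New("expected build failure")
	}
	for _, name := range []string{"img-bad.staging", "img-bad"} {
		if _, statErr := os.Stat(filepath.Join(imagesDir, name)); !os.IsNotExist(statErr) {
			return fmt.Errorf("%s should not exist after a failed build", name)
		}
	}
	return nil
}

func main() {
	if err := demo(); err != nil {
		fmt.Println("demo failed:", err)
		os.Exit(1)
	}
	fmt.Println("staging published and failed build cleaned up")
}
```

Readers of the images directory never observe a half-built artifact under the final name, because the directory only appears there through the rename.
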
## Paths

| What | Where | Purpose |
|------|-------|---------|
| Layer blob cache | `~/.cache/banger/oci/blobs/sha256/<hex>` | Re-pulls of the same image digest are local-only |
| Staging dir | `~/.local/state/banger/images/<id>.staging/` | Short-lived; atomic-renamed to `<id>/` on success |
| Staging rootfs tree | `$TMPDIR/banger-pull-<rand>/` | Extraction scratch space; removed after ext4 build |
| Published image | `~/.local/state/banger/images/<id>/rootfs.ext4` | Managed artifact stored alongside the kernel triple |

## Composition with `image build`

A pulled image is "unconfigured": it has no sshd, no vsock agent, no
banger-specific network unit, and file ownership is wrong for boot.
The natural next step is to feed it through the existing customization
pipeline:

```bash
banger image build --from-image debian-bookworm --name debian-dev --docker
```

`image build` spins up a transient VM using the base image, runs
`scripts/customize.sh` over it, and saves the result as a new managed
image. This is already how the opinionated `void` / `alpine` images
are produced today.

Because `image build` has to boot the base image, the bootability gap
means this composition only works end to end once Phase B lands an
ownership-fixup pass. Until then, `image pull` gives you a registered,
managed rootfs to build on; actually booting one still requires the
legacy manual rootfs scripts.

## Tech debt

- **File-ownership preservation**. The ext4 is populated from a tree
  extracted as the current user; `mkfs.ext4 -d` then copies those
  on-disk uids/gids verbatim. Setuid bits survive but with the wrong
  owner, so privilege escalation is broken inside the VM. Planned
  fixes:
  - **debugfs ownership-fixup pass**: after `mkfs.ext4 -d`, replay
    tar headers through `debugfs -w` with `set_inode_field` to
    rewrite per-file uid/gid/mode. No new runtime deps (debugfs
    ships with e2fsprogs). Moderate implementation effort; keeps us
    on `mkfs.ext4 -d`.
  - **`tar2ext4`**: Microsoft's hcsshim ships a Go package that
    streams tar entries directly into an ext4 image, preserving
    ownership. Heavier dependency graph but purpose-built.

  Either approach would land in Phase B.

- **Auth**. When we add private-registry support, the natural path is
  `authn.DefaultKeychain` from `go-containerregistry`, which already
  honours `~/.docker/config.json` and the standard credential
  helpers. No banger-specific config needed.

- **Cache eviction**. Layer blobs under `OCICacheDir` accumulate
  forever. A `banger image cache prune` command is a cheap follow-up
  when disk usage becomes a complaint.

- **Ownership fixup via user namespaces**. An alternative to
  debugfs / tar2ext4 is running the entire extraction inside a user
  namespace (`unshare -Ufr`), which lets us set uid=0 on files from
  a non-privileged process. Cleaner in theory, but it requires
  user-namespace support on the host and doesn't help when the
  resulting tree is then passed to `mkfs.ext4 -d` outside the
  namespace (which copies the real on-disk uids).

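As a rough idea of what the debugfs option might involve, the sketch below turns tar headers into a `debugfs` command script. It is an assumption-heavy illustration and not a Phase B design: `ownershipScript`, `sampleLayer`, and `sampleScript` are hypothetical names, only uid and gid are handled (mode is left out), paths containing whitespace or quotes are rejected rather than escaped, and a real pass would need to work from the final flattened view rather than a single layer.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"path"
	"strings"
)

// ownershipScript reads tar headers from r and emits one set_inode_field
// command per entry for uid and for gid, in archive order.
func ownershipScript(r io.Reader) (string, error) {
	var sb strings.Builder
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
		p := path.Clean("/" + hdr.Name)
		// ASSUMPTION: the sketch refuses names that would need quoting.
		if strings.ContainsAny(p, " \t\n\"") {
			return "", fmt.Errorf("path %q needs quoting, not handled in this sketch", p)
		}
		fmt.Fprintf(&sb, "set_inode_field %s uid %d\n", p, hdr.Uid)
		fmt.Fprintf(&sb, "set_inode_field %s gid %d\n", p, hdr.Gid)
	}
	return sb.String(), nil
}

// sampleLayer builds a tiny in-memory tar so the sketch runs without a registry.
func sampleLayer() (*bytes.Buffer, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	entries := []tar.Header{
		{Name: "usr/bin/sudo", Mode: 0o4755, Uid: 0, Gid: 0, Typeflag: tar.TypeReg},
		{Name: "home/app/notes", Mode: 0o644, Uid: 1000, Gid: 1000, Typeflag: tar.TypeReg},
	}
	for i := range entries {
		if err := tw.WriteHeader(&entries[i]); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return &buf, nil
}

// sampleScript generates the command script for the sample layer.
func sampleScript() (string, error) {
	layer, err := sampleLayer()
	if err != nil {
		return "", err
	}
	return ownershipScript(layer)
}

func main() {
	script, err := sampleScript()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(script)
}
```

The resulting script could then be fed to something like `debugfs -w -f fixup.cmds rootfs.ext4`, where both file names are placeholders.
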
## Trust model

`image pull` delegates trust to the OCI registry the user selected.
`go-containerregistry` verifies layer digests against the manifest
during download, so a tampered mirror can't ship modified layers
without breaking the sha256 chain. Beyond that, banger does not
verify OCI image signatures (cosign/sigstore); users who care should
verify their references out-of-band.
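
The digest check described above amounts to hashing the bytes received and comparing the result with the digest the manifest promised. The sketch below shows that idea with the standard library only. It is not `go-containerregistry`'s implementation, and `verifyDigest` and `verifyBlob` are hypothetical names.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// verifyDigest streams r through SHA-256 and compares the result with a
// digest string of the form "sha256:<hex>".
func verifyDigest(r io.Reader, want string) error {
	algo, wantHex, ok := strings.Cut(want, ":")
	if !ok || algo != "sha256" {
		return fmt.Errorf("unsupported digest %q", want)
	}
	h := sha256.New()
	if _, err := io.Copy(h, r); err != nil {
		return err
	}
	if got := hex.EncodeToString(h.Sum(nil)); got != wantHex {
		return fmt.Errorf("digest mismatch: got sha256:%s, want %s", got, want)
	}
	return nil
}

// verifyBlob is a convenience wrapper for in-memory blobs.
func verifyBlob(blob []byte, want string) error {
	return verifyDigest(bytes.NewReader(blob), want)
}

func main() {
	// SHA-256 of the empty input, used here as a stand-in manifest entry.
	const emptyDigest = "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
	fmt.Println("intact blob:  ", verifyBlob(nil, emptyDigest))
	fmt.Println("tampered blob:", verifyBlob([]byte("extra"), emptyDigest))
}
```

A check like this only proves that the bytes match the manifest; it says nothing about who published the manifest, which is the gap signature verification would close.
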