# OCI import (`banger image pull`)

`banger image pull` has two paths. The primary one, catalog bundles,
is documented in [`docs/image-catalog.md`](image-catalog.md). This
doc covers the fallback: pulling arbitrary container images from an
OCI registry.

## When to use it

Use the OCI path when you need a distro or image that isn't in the
catalog. The catalog covers the common happy path
(`debian-bookworm`); anything else (`alpine`, `fedora`, `ubuntu`,
custom corporate images) goes through OCI pull.

```bash
banger image pull docker.io/library/alpine:3.20 --kernel-ref generic-6.12
banger image pull ghcr.io/myorg/devimg:v2 --kernel-ref generic-6.12
```

`banger image pull` dispatches based on the reference:

- `banger image pull debian-bookworm` → catalog (fast path).
- `banger image pull docker.io/library/foo:bar` → OCI (anything not
  in the catalog).

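The dispatch rule fits in a few lines of code. The following is a minimal Go sketch that models the catalog as a set of bundle names. `dispatch`, `pullPath`, and the in-memory catalog are hypothetical stand-ins, not banger's actual API:

```go
package main

import "fmt"

// pullPath names the two routes `banger image pull` can take.
type pullPath string

const (
	pathCatalog pullPath = "catalog" // the argument is a catalog bundle name
	pathOCI     pullPath = "oci"     // anything else is treated as an OCI reference
)

// dispatch applies the documented rule: a name found in the catalog takes
// the fast path, and everything else falls through to an OCI pull.
// The catalog map is a hypothetical stand-in for banger's real lookup.
func dispatch(ref string, catalog map[string]bool) pullPath {
	if catalog[ref] {
		return pathCatalog
	}
	return pathOCI
}

func main() {
	catalog := map[string]bool{"debian-bookworm": true}
	for _, ref := range []string{"debian-bookworm", "docker.io/library/foo:bar"} {
		fmt.Printf("%s -> %s\n", ref, dispatch(ref, catalog))
	}
}
```

The catalog is checked first, so a catalog name always wins even if the same string would also parse as an OCI reference.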
## What works

- Any public OCI image that exposes a `linux/amd64` manifest.
- Correct layer replay with whiteout semantics (`.wh.*` deletes,
  `.wh..wh..opq` opaque-dir markers).
- Protection against path traversal, debugfs-hostile filenames, and
  relative symlinks that escape the rootfs.
- Content-aware default sizing (`content × 1.5`, floor 1 GiB).
- Layer caching on disk, keyed by blob sha256.
- **Ownership preservation**: tar-header uid/gid/mode are captured
  during flatten and applied to the ext4 in a `debugfs` pass, so
  setuid binaries (`sudo`, `passwd`) and root-owned config
  (`/etc/shadow`, `/etc/sudoers`) end up correctly owned.
- **Pre-injected banger agents**: the pulled ext4 ships with
  `banger-vsock-agent`, `banger-network.service`, and the
  `banger-first-boot` unit already enabled.
- **First-boot sshd install**: a one-shot systemd service installs
  `openssh-server` via the guest's package manager on first boot,
  dispatching on `/etc/os-release` → `apt-get` / `apk` / `dnf` /
  `pacman` / `zypper`. Subsequent boots skip the install.

## What doesn't yet work

- **Private registries**. Anonymous pulls only. Docker Hub, public
  GHCR, and public quay.io repositories all work. Adding auth via
  `authn.DefaultKeychain` (from `go-containerregistry`) is a cheap
  follow-up when someone needs it.
- **Non-`linux/amd64`**. The kernel catalog is x86_64-only, so pulled
  rootfses are pinned to match. Adding `arm64` is an additive schema
  change.
- **Non-systemd rootfses**. The injected units assume systemd as
  PID 1. Alpine ≥3.20 ships systemd; older Alpine, Void, and
  busybox-init images won't honour the `banger-*` units.
- **First boot needs network access**. The first-boot sshd install
  reaches out to the distro's package repository. On VMs without NAT,
  or whose bridge can't reach the internet, the install times out.
  The marker file stays in place, so a later restart retries.

## Architecture

`internal/imagepull/` owns the mechanics:

- **`Pull`** wraps `go-containerregistry`'s `remote.Image` with the
  `linux/amd64` platform pinned. Layer blobs cache under
  `~/.cache/banger/oci/blobs/` and populate lazily during flatten.
- **`Flatten`** replays layers oldest-first into a staging directory,
  applies whiteouts, and rejects unsafe paths as well as filenames
  that banger's debugfs ownership fixup cannot encode safely. It
  returns a `Metadata` map of per-file uid/gid/mode from tar headers.
- **`BuildExt4`** runs `mkfs.ext4 -F -d <staging> -E root_owner=0:0`
  at the size of the pre-truncated file, so it needs no mount, no
  sudo, and no loopback device. Requires `e2fsprogs ≥ 1.43`.
- **`ApplyOwnership`** streams a batched `set_inode_field` script to
  `debugfs -w` to rewrite per-file uid/gid/mode to the captured
  tar-header values. It also rejects debugfs-hostile names itself,
  rather than relying on `Flatten` having filtered them.
- **`InjectGuestAgents`** uses the same `debugfs` scripting to drop
  banger's guest assets into the ext4 with root ownership: the vsock
  agent binary, the network bootstrap and its unit, the first-boot
  script and its unit, `multi-user.target.wants` symlinks, the vsock
  modules-load config, and the `/var/lib/banger/first-boot-pending`
  marker.

`internal/daemon/images_pull.go` orchestrates `pullFromOCI`:

1. Parse and validate the OCI reference, and derive a default name
   when `--name` is omitted (`debian-bookworm` from
   `docker.io/library/debian:bookworm`).
2. Resolve kernel info via `resolveKernelInputs`, which auto-pulls
   from `kernelcat` if `--kernel-ref` names a catalog entry that
   isn't yet local.
3. Stage at `<ImagesDir>/<id>.staging`; extract layers to a temp
   tree under `$TMPDIR`.
4. `BuildExt4` → `ApplyOwnership` → `InjectGuestAgents`.
5. `imagemgr.StageBootArtifacts` stages the kernel triple alongside.
6. An atomic `os.Rename` publishes the artifact dir.
7. Persist a `model.Image{Managed: true, …}` record.

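Step 1's default-name rule can be sketched in Go. This doc only pins down one behaviour: `docker.io/library/debian:bookworm` becomes `debian-bookworm`. How banger treats digests, untagged references, and registry ports is assumed here, and `defaultName` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"strings"
)

// defaultName derives an image name from an OCI reference, e.g.
// docker.io/library/debian:bookworm -> debian-bookworm.
// Hypothetical helper: digest and untagged handling are assumptions.
func defaultName(ref string) string {
	if i := strings.Index(ref, "@"); i >= 0 {
		ref = ref[:i] // drop a "@sha256:..." digest suffix
	}
	repo, tag := ref, ""
	// A ':' only separates a tag if it comes after the last '/';
	// otherwise it belongs to a registry port (localhost:5000/img).
	if i := strings.LastIndex(ref, ":"); i > strings.LastIndex(ref, "/") {
		repo, tag = ref[:i], ref[i+1:]
	}
	name := repo[strings.LastIndex(repo, "/")+1:] // last path component
	if tag == "" {
		return name
	}
	return name + "-" + tag
}

func main() {
	for _, ref := range []string{
		"docker.io/library/debian:bookworm",
		"ghcr.io/myorg/devimg:v2",
		"localhost:5000/team/tool",
	} {
		fmt.Println(ref, "->", defaultName(ref))
	}
}
```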
## Guest-side boot sequence

On first boot of a pulled image:

1. **`banger-network.service`**: brings the guest interface up with
   the IP assigned by banger's VM-create lifecycle.
2. **`banger-first-boot.service`** (first boot only): reads
   `/etc/os-release`, dispatches to the native package manager,
   installs `openssh-server`, and enables `ssh.service`.
3. **`banger-vsock-agent.service`**: the health-check daemon banger
   uses to confirm the VM is alive.

Subsequent boots skip step 2.

## Adding distro support to first-boot

`internal/imagepull/assets/first-boot.sh` is the POSIX-sh dispatch.
Add a new `ID=` branch and its install command, then rebuild banger
(the asset is embedded with `go:embed`).

Supported `ID` values today: `debian`, `ubuntu`, `kali`, `raspbian`,
`linuxmint`, `pop`, `alpine`, `fedora`, `rhel`, `centos`, `rocky`,
`almalinux`, `arch`, `archlinux`, `manjaro`, `opensuse*`, `suse`.
Unknown distros fall back to `ID_LIKE`; if that doesn't match
either, the script fails cleanly with an error.

## Paths

| What | Where |
|------|-------|
| Layer blob cache | `~/.cache/banger/oci/blobs/sha256/<hex>` |
| Staging dir | `~/.local/state/banger/images/<id>.staging/` |
| Extraction scratch | `$TMPDIR/banger-pull-<rand>/` |
| Published image | `~/.local/state/banger/images/<id>/rootfs.ext4` |

## Tech debt

- **Auth**. When we add private-registry support, the natural path
  is `authn.DefaultKeychain`, which honours `~/.docker/config.json`
  and the standard credential helpers.
- **Cache eviction**. OCI layer blobs accumulate forever. A
  `banger image cache prune` command is a cheap follow-up when disk
  usage becomes a complaint.
- **Non-systemd rootfses**. The guest agents assume systemd. Adding
  openrc / s6 / busybox-init variants means keeping parallel unit
  trees keyed on `/etc/os-release`.

## Trust model

`image pull` (OCI path) delegates trust to the registry the user
selected. `go-containerregistry` verifies layer digests against the
manifest during download, so a tampered mirror can't ship modified
layers without breaking the sha256 chain. Banger does not verify OCI
image signatures (cosign/sigstore); users who care should verify
references out-of-band.
