banger

Author	SHA1	Message	Date
Thales Maciel	3edd7c6de7	daemon: build a work-seed during image pull, refresh doctor check Before this change `banger image pull` (both OCI-direct and bundle paths) shipped images with an empty WorkSeedPath — the BuildWorkSeedImage helper existed only behind the hidden `banger internal work-seed` CLI. Every pulled image hit ensureWorkDisk's no-seed branch, and the guest booted with a bare /root (no .bashrc, no .profile, none of the distro defaults). Pull now calls BuildWorkSeedImage after the rootfs is finalised (OCI) or fetched (bundle). The builder is behind a new `workSeedBuilder` test seam so existing pull tests don't accidentally demand sudo mount. The build failure is non-fatal: any error logs a warning and leaves WorkSeedPath empty — images stay publishable even if the pulled rootfs has no /root to extract. Verified end-to-end by wiping the cached smoke image and re-pulling: work-seed.ext4 lands in the artifact dir next to rootfs.ext4, and all 21 smoke scenarios pass. Also refreshes the "feature /root work disk" fallback tooling check — the no-seed path no longer touches mount/umount/cp after commit `0e28504`, so the doctor check now only requires truncate + mkfs.ext4. The warn copy updates from "new VM creates will be slower" to "guest /root will be empty", which matches the actual tradeoff post-refactor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 20:24:10 -03:00
Thales Maciel	d7614a3b2b	daemon split (2/5): extract ImageService service Second phase of splitting the daemon god-struct. ImageService now owns all image + kernel registry operations: register/promote/delete/pull for images (bundle + OCI paths), the six kernel commands, and the shared SSH-key/work-seed injection helpers. imageOpsMu (the publication-window lock) lives on the service; so do the three OCI pull test seams pullAndFlatten / finalizePulledRootfs / bundleFetch. The four files images.go, images_pull.go, image_seed.go, kernels.go flipped their receivers from Daemon to ImageService. FindImage moved with the service. Daemon keeps a thin FindImage forwarder so callers reading the dispatch code see the obvious facade and tests that pre-date the split still compile. flattenNestedWorkHome — called from image_seed.go, vm_authsync.go, and vm_disk.go across future service boundaries — became a package-level helper taking a CommandRunner explicitly. Daemon keeps a deprecated forwarder for now; the other services will use the package form. Lazy-init helper imageSvc() on Daemon mirrors hostNet() from Phase 1, so test literals like &Daemon{store: db, runner: r, ...} that don't spell out an ImageService still get a working one. Tests that override the image test seams (autopull_test, concurrency_test, images_pull_test, images_pull_bundle_test) now assign d.img = &ImageService{...seams...}; the two-statement pattern matches what Phase 1 established for HostNetwork. Dispatch in daemon.go is cleaner now: every image/kernel RPC handler is a single-liner forwarding to d.imageSvc().. Phase 5 will do the same for VM lifecycle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 20:30:32 -03:00
Thales Maciel	99d0811097	daemon: shrink createVMMu + imageOpsMu to reservation/publication windows Before: createVMMu was held across the whole of CreateVM — including image resolution (which could fire a full auto-pull) and startVMLocked (boot of multiple seconds). imageOpsMu was held across the whole of PullImage/RegisterImage/PromoteImage/DeleteImage, so any slow OCI pull, bundle download, or file copy blocked every other image mutation and every other VM create that needed to auto-pull. The async create API bought nothing if all creates serialised on the same mutex. CreateVM is now three phases: 1. Validate + resolve image (possibly auto-pulling). No global lock. 2. reserveVM: take createVMMu only long enough to re-check the name is free, allocate the next guest IP, and UpsertVM the "created" row. Milliseconds. 3. startVMLocked: run the full boot flow under the per-VM lock only. Parallel creates of different VMs now overlap on image resolution + boot; they contend only across the reservation claim. For the image surface a new publishImage helper isolates the commit atom (recheck name free, atomic rename stagingDir→finalDir, UpsertImage) under imageOpsMu. pullFromBundle + pullFromOCI do their network fetch + ext4 build + ownership fixup + agent injection outside the lock; Register moves validation + kernel resolution outside; Promote moves file copy + SSH-key seeding outside; Delete keeps a brief lock over the lookup + reference check + store delete and does file cleanup unlocked. Two concurrency tests assert the new behaviour: - TestPullImageDoesNotSerialiseOnDifferentNames fails the old code (second pull blocks on imageOpsMu and never reaches the body). - TestPullImageRejectsNameClashAtPublish confirms the publish-window recheck is what enforces name uniqueness now that the body runs unlocked — exactly one winner. ARCHITECTURE.md updated to describe the new scope explicitly instead of calling the locks "narrow". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 13:44:22 -03:00
Thales Maciel	e0894376ea	vm create: auto-pull image and kernel from catalogs if missing One-command sandbox: `banger vm run` on a fresh host now Just Works. No prior `banger image pull` or `banger kernel pull` needed. Changes: - Default `default_image_name` flips from "default" to "debian-bookworm" so the golden image is the implicit target when `--image` is omitted. - `CreateVM` resolves the image via a new `findOrAutoPullImage`: try the local store first, and on miss fall back to the embedded imagecat catalog + auto-pull. Emits a vm-create progress stage so the user sees "pulling from image catalog" in the create output. - `resolveKernelInputs` gains context + the same pattern via `readOrAutoPullKernel`: try the local kernelcat, and on miss look up the embedded kernelcat and auto-pull. Fires whenever a bundle's manifest references a kernel the user hasn't pulled yet, not just during image pull — any CreateVM with an image that needs a kernel not yet local will resolve it. - `--image` help text updated on both `vm run` and `vm create`. Six tests cover local-hit-no-pull, auto-pull-on-miss, not-in-catalog error propagation, and a non-ENOENT kernel read error does NOT trigger a misleading "not in catalog" claim. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 15:10:26 -03:00
Thales Maciel	5bdc9985c2	image pull: dispatch to imagecat bundle path before OCI PullImage now checks the embedded imagecat catalog first. If the ref matches a catalog entry, it takes the bundle path: 1. Fetch the .tar.zst bundle into a staging dir (rootfs.ext4 + manifest.json). 2. Strip manifest.json (staging-only metadata). 3. Stage kernel/initrd/modules alongside rootfs.ext4. 4. Publish the staging dir and upsert the image row. Bundle rootfs is already flattened + ownership-fixed + agent-injected at build time, so the daemon-side work is strictly I/O — no flatten, no mkfs, no debugfs. Kernel resolution in the bundle path: --kernel-ref > entry.kernel_ref > --kernel/--initrd/--modules. If the ref doesn't match a catalog entry, PullImage falls through to the existing OCI path unchanged (extracted into pullFromOCI). New test seam: d.bundleFetch. Six unit tests cover happy path, --kernel-ref override, existing-name rejection, kernel-required error, fetch-failure cleanup, and the catalog → OCI fallthrough. CLI help updated: image pull now documents both forms and takes <name-or-oci-ref> instead of requiring an OCI ref. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 15:43:33 -03:00
Thales Maciel	491c8e1ebb	Phase B-2: pre-inject banger guest agents into pulled rootfs New imagepull.InjectGuestAgents writes banger's guest-side assets straight into the pulled ext4 so systemd will start them at first boot: /usr/local/bin/banger-vsock-agent (binary, 0755) /usr/local/libexec/banger-network-bootstrap (script, 0755) /etc/systemd/system/banger-network.service (unit, 0644) /etc/systemd/system/banger-vsock-agent.service (unit, 0644) /etc/modules-load.d/banger-vsock.conf (modules, 0644) plus enable-at-boot symlinks under /etc/systemd/system/multi-user.target.wants/ All writes + ownership + symlinks go through one `debugfs -w -f -` invocation. No sudo required because the caller owns the ext4 file. Script is deterministic: shallow-first mkdir, then write, then sif, then symlink. "File exists" errors from mkdir on already-present dirs are tolerated (debugfs keeps going past them with -f, and we filter them out of the output scan). Asset content reuses the existing guestnet.BootstrapScript / SystemdServiceUnit / ConfigPath and vsockagent.ServiceUnit / ModulesLoadConfig / GuestInstallPath — one source of truth, no duplicated systemd unit strings. Daemon wiring: new d.finalizePulledRootfs seam runs both ApplyOwnership (B-1) and InjectGuestAgents as one phase between BuildExt4 and StageBootArtifacts. The companion vsock-agent binary is resolved via paths.CompanionBinaryPath. Existing daemon tests stub the seam with a no-op to avoid needing a real companion binary + debugfs in the test harness. Tests: real-ext4 round-trip that builds a minimal ext4, runs InjectGuestAgents, then verifies every expected path is present via `debugfs stat`, plus uid=0 and mode 0755 on the vsock-agent binary. Also: missing-binary rejection, ancestor-collection order test. debugfs/mkfs.ext4 tests skip on hosts without the binaries. After B-1+B-2, any OCI image that already ships sshd boots with banger-network and banger-vsock-agent running; image pull is one step from "useful rootfs primitive". B-3 (first-boot sshd install) unlocks images that don't ship sshd. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 18:08:56 -03:00
Thales Maciel	43982a4ae3	Phase B-1: ownership fixup via debugfs pass imagepull.Flatten now captures per-file uid/gid/mode/type from the tar headers as it walks layers, returning a Metadata map alongside the extracted tree. Whiteouts correctly drop the victim's metadata. The returned Metadata feeds the new imagepull.ApplyOwnership, which pipes a batched `set_inode_field` script to `debugfs -w -f -`. Why: mkfs.ext4 -d copies the runner's on-disk uids verbatim, so without this pass setuid binaries become setuid-nonroot and sshd refuses to start on the resulting image. With the pass, a pulled debian:bookworm has /usr/bin/sudo with uid=0 + setuid bit surviving intact. imagepull.BuildExt4 signature unchanged; ownership is applied as a separate step by the daemon orchestrator between BuildExt4 and StageBootArtifacts, keeping each helper focused. The seam (d.pullAndFlatten) now returns (Metadata, error) for test stubs to feed synthetic metadata. StdinRunner is a new duck-typed extension next to CommandRunner; the real system.Runner implements RunStdin, test mocks don't need to unless they exercise stdin. Prevents every existing mock from growing a new method. Tests: - TestFlattenCapturesHeaderMetadata: setuid bit + mode survive the tar-header walk - TestApplyOwnershipRewritesUidGidMode: real debugfs round-trip — create ext4 with runner's uid, apply synthetic metadata setting uid=0 + setuid mode, verify via `debugfs -R stat` that the inode now has uid=0 and mode 04755 - TestBuildOwnershipScriptDeterministic: sorted, well-formed sif script output Debugfs and mkfs.ext4 tests skip if the binaries aren't on PATH. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 18:04:22 -03:00
Thales Maciel	a8c9983542	Phase 2: daemon PullImage orchestration (d *Daemon).PullImage downloads an OCI image, flattens it into an ext4 rootfs, and registers the result as a managed banger image. Flow (internal/daemon/images_pull.go): 1. Parse + validate the OCI ref via go-containerregistry/name. 2. Derive a friendly default name from the ref ("debian-bookworm") when --name is omitted. 3. Reject if an image with that name already exists. 4. Resolve kernel info via the new shared resolveKernelInputs helper (refactored out of RegisterImage); ValidateKernelPaths checks the kernel triple alone. 5. Acquire imageOpsMu, generate a fresh image id, and stage at <ImagesDir>/<id>.staging. 6. imagepull.Pull → cache layers under OCICacheDir; imagepull.Flatten → temp rootfs tree under os.TempDir (so the state filesystem doesn't temporarily double in size). 7. Default size: max(treeSize × 1.25, 1 GiB); --size override accepted. 8. imagepull.BuildExt4 produces the rootfs.ext4 in the staging dir. 9. imagemgr.StageBootArtifacts stages the kernel/initrd/modules into the same dir (reused unchanged). 10. Atomic os.Rename(staging, finalDir) publishes the artifact dir. 11. Persist model.Image with Managed=true. Failure at any step removes the staging dir; failure post-rename removes finalDir. The pullAndFlatten field on Daemon is the test seam: tests stub it to write a fixture tree into destDir and skip the real registry. Refactor: extracted the "kernel-ref vs direct paths" resolution out of RegisterImage into d.resolveKernelInputs so PullImage and RegisterImage share one source of truth for that policy. Split ValidateRegisterPaths into a kernel-only ValidateKernelPaths so PullImage (which produces the rootfs itself) can validate just the kernel triple without the rootfs check. API: ImagePullParams { Ref, Name, KernelPath, InitrdPath, ModulesDir, KernelRef, SizeBytes }. RPC dispatch case image.pull mirrors image.register. Tests cover: happy-path producing a managed image with all four artifacts present + staging cleaned up, name-collision rejection, missing-kernel rejection, and staging cleanup on a failed pull. defaultImageNameFromRef handles tag/digest/no-suffix cases. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 17:27:32 -03:00
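
Step 7 of the Phase 2 commit (a8c9983542) states the default rootfs size as max(treeSize × 1.25, 1 GiB), with --size taking precedence. A minimal sketch of that rule follows; the names defaultRootfsSize and minRootfsBytes are hypothetical and not taken from the banger source, and the integer rounding is an assumption.

```go
package main

import "fmt"

// minRootfsBytes is the 1 GiB floor named in the commit message.
// The constant name is hypothetical.
const minRootfsBytes int64 = 1 << 30

// defaultRootfsSize is a hypothetical helper illustrating the rule
// "max(treeSize × 1.25, 1 GiB); --size override accepted".
// An override of 0 means the caller passed no --size.
func defaultRootfsSize(treeBytes, override int64) int64 {
	if override > 0 {
		return override // an explicit size wins
	}
	// treeSize × 1.25 in integer arithmetic (assumed rounding: truncate).
	size := treeBytes + treeBytes/4
	if size < minRootfsBytes {
		return minRootfsBytes
	}
	return size
}

func main() {
	fmt.Println(defaultRootfsSize(200<<20, 0))   // small tree: the 1 GiB floor applies
	fmt.Println(defaultRootfsSize(4<<30, 0))     // 4 GiB tree: 25% headroom is added
	fmt.Println(defaultRootfsSize(4<<30, 8<<30)) // explicit override is returned as-is
}
```

Whether the real helper rounds up to a block boundary or validates that an override is large enough for the tree is not stated in the log, so the sketch leaves both out.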

8 commits
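
Commit 99d0811097 narrows imageOpsMu to a publication window: the slow pull body runs unlocked, and only the recheck-name, atomic-rename and upsert step is serialised. The sketch below illustrates that pattern under stated assumptions. The registry type, its map-backed store, and the pull, publish and racePulls functions are hypothetical stand-ins, not banger's actual ImageService or publishImage.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// registry is a hypothetical stand-in for the image store plus its lock.
type registry struct {
	mu     sync.Mutex        // plays the role of imageOpsMu
	images map[string]string // image name -> final artifact dir
}

var errNameTaken = errors.New("image name already exists")

// publish is the only locked step: recheck that the name is free,
// rename the staging dir into place, then record the image.
func (r *registry) publish(name, stagingDir, finalDir string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.images[name]; ok {
		return errNameTaken
	}
	if err := os.Rename(stagingDir, finalDir); err != nil {
		return err
	}
	r.images[name] = finalDir
	return nil
}

// pull does its (stubbed) slow work without the lock, then publishes.
// Any failure removes the staging dir.
func (r *registry) pull(root, name, id string) error {
	staging := filepath.Join(root, id+".staging")
	if err := os.MkdirAll(staging, 0o755); err != nil {
		return err
	}
	// Stand-in for the network fetch and ext4 build.
	rootfs := filepath.Join(staging, "rootfs.ext4")
	if err := os.WriteFile(rootfs, []byte("stub"), 0o644); err != nil {
		os.RemoveAll(staging)
		return err
	}
	if err := r.publish(name, staging, filepath.Join(root, id)); err != nil {
		os.RemoveAll(staging)
		return err
	}
	return nil
}

// racePulls starts n concurrent pulls of the same name and reports
// how many of them published successfully.
func racePulls(n int) (int, error) {
	root, err := os.MkdirTemp("", "publish-sketch")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(root)

	r := &registry{images: map[string]string{}}
	errs := make([]error, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			errs[i] = r.pull(root, "debian-bookworm", fmt.Sprintf("id%d", i))
		}(i)
	}
	wg.Wait()

	wins := 0
	for _, e := range errs {
		if e == nil {
			wins++
		}
	}
	return wins, nil
}

func main() {
	wins, err := racePulls(4)
	if err != nil {
		panic(err)
	}
	fmt.Println("winners:", wins)
}
```

Because the name check happens inside the locked publish step rather than at the start of the pull, concurrent pulls of the same name all do their work but only one of them is published. That is the property the commit attributes to TestPullImageRejectsNameClashAtPublish.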