Move the supported systemd path to two services: an owner-user bangerd for
orchestration and a narrow root helper for bridge/tap, NAT/resolver, dm/loop,
and Firecracker ownership. This removes repeated sudo from daily vm and image
flows without leaving the general daemon running as root.
Add install metadata, system install/status/restart/uninstall commands, and a
system-owned runtime layout. Keep user SSH/config material in the owner home,
lock file_sync to the owner home, and move daemon known_hosts handling out of
the old root-owned control path.
Route privileged lifecycle steps through typed privilegedOps calls, harden the
two systemd units, and rewrite smoke plus docs around the supported service
model.
Verified with make build, make test, make lint, and make smoke on the
supported systemd host path.
ensureGitIdentityOnWorkDisk, writeGitIdentity, runFileSync, and
copyHostDir all dropped their mount + sudo install/mkdir/chmod/chown
scaffolding. Every write now goes through MkdirExt4,
WriteExt4FileOwned, ReadExt4File, and the new MkdirAllExt4 helper —
all sudoless against user-owned ext4 images.
Net effect with the prior two commits: ensureWorkDisk, authsync, image
seeding, git identity sync, and file_sync no longer mount the work
disk or spawn sudo mkdir/chmod/chown/cat/install. Only the
image-build path (which legitimately produces root-owned artifacts)
still touches MountTempDir.
The filesystemRunner test harness grew a small debugfs/e2cp/e2rm
emulator so the WorkspaceService tests keep exercising their real
code paths without a live ext4 image. The mock is deliberately
dumb — it only implements the subset runFileSync and writeGitIdentity
drive.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The no-seed branch used to mount the base rootfs read-only, mount
the freshly mkfs'd work disk read-write, sudo-cp /root from one to
the other, then flatten any accidental /root/root/ nesting. Five
sudo call sites packed into a fallback that the common image path
doesn't even exercise.
Replace with: `mkfs.ext4 -F -E root_owner=0:0` and nothing else.
mkfs already stamps inode 2 as root:root:0755 — sshd's StrictModes
walks that dir's ownership when the work disk mounts at /root in
the guest, so getting it right from mkfs means authsync can just
write authorized_keys without any repair pass.
Tradeoff: no-seed VMs lose the base rootfs's default /root dotfiles
(.bashrc, .profile). The no-seed path is explicitly the degraded
fallback — `banger doctor` already warns about it — and users who
want those back have two documented knobs: rebuild the image with
a work-seed, or land them via [[file_sync]].
Sudo call sites removed: 5 (MountTempDir × 2, sudo cp -a,
flattenNestedWorkHome's chmod/cp/rm). flattenNestedWorkHome itself
stays alive for now — authsync + image_seed still call it — and
gets deleted in commit 5 once its last caller goes away.
While here: fix the freshly-added EnsureExt4RootPerms helper.
`set_inode_field <2> mode N` overwrites the full i_mode word
instead of preserving the type nibble, so the initial
implementation that passed just the permission bits (0755) would
reset the fs root to regular-file shape and break the next kernel
mount with "Structure needs cleaning." The corrected call OR's in
S_IFDIR (0o040000) explicitly. Test updated to match.
Smoke: 21/21 scenarios green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The daemon mounts every VM's work disk on the host via sudo, copies
files in as root, chmods+chowns them, and unmounts. That's ~18 of
banger's runtime RunSudo calls. The ext4 image is a regular file the
daemon user owns; e2cp / debugfs can write to it directly and bake
uid/gid/mode into the filesystem metadata without the caller being
root. `imagepull.ApplyOwnership` already proves this works in
production (OCI layer flattening writes 0/0/root-owned inodes from
an unprivileged daemon).
This commit adds the toolkit layer. Callers land in the next four
commits:
- MkdirExt4 — idempotent directory create + metadata reset, single
debugfs batch
- WriteExt4FileOwned — e2cp + debugfs-driven uid/gid/mode, auto-
cleans the host tempfile
- SetExt4Ownership — sif + set_inode_field batch for existing
inodes (no mkdir implied)
- EnsureExt4RootPerms — fixes inode <2> (the fs root, which is
`/root` once the work disk is mounted inside the guest), the
thing sshd's StrictModes walks
- Ext4PathExists — yes/no probe via `debugfs -R "stat ..."` with
"File not found" detection
- ReadExt4File — bytes-returning wrapper around the existing
ReadDebugFSText with the same path rejection
Design notes:
- extfsRun auto-switches Run ↔ RunSudo on imagePath's type: regular
files get the unprivileged path, block devices (dm-snapshot,
loops) get sudo. The same helper works for both patchRootOverlay
(dm device) and work-disk writes (user-owned file). No caller
flag needed — os.Stat tells us.
- debugfsScript batches set_inode_field + sif + mkdir lines into
one `debugfs -w -f -` stdin invocation on any Runner that
implements StdinRunner (production's system.Runner does). Matches
imagepull.ApplyOwnership's existing pattern; dramatically cheaper
than per-call subprocesses.
- Paths are escaped for debugfs on the way in: spaces get double-
quoted, double-quote/backslash/newline are rejected outright
(debugfs's hand-rolled parser doesn't reliably escape those and
we'd rather fail fast than silently scribble over the wrong
inode).
Tests: seven behaviour assertions via scripted + stdin-scripted
runners — existence probe (found + missing + rejection), read
passthrough, mkdir batch contents (new vs. pre-existing path), write
tempfile cleanup + mode line shape, root-inode addressing, and the
full rejectDebugfsUnsafePath matrix.
No production wiring change in this commit — the helpers land
unused. `make smoke` stays green (21/21) because nothing else
shifted.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two defects compounded to make `vm create X` → `vm stop X` → `vm start X`
→ `vm ssh X` fail with `not_running: vm X is not running` even though
`vm show` reports `state=running`.
1. firecracker-go-sdk's startVMM spawns a goroutine that SIGTERMs
firecracker when the ctx passed to Machine.Start cancels — and
retains that ctx for the lifetime of the VMM, not just the boot
phase. Our Machine.Start wrapper was plumbing the caller's ctx
through, which on `vm.start` is the RPC request ctx. daemon.go's
handleConn cancels reqCtx via `defer cancel()` right after
writing the response. Net effect: firecracker is killed ~150ms
after the `vm start` RPC "completes", invisibly, and the next
`vm ssh` sees a dead PID. `vm.create` side-stepped the bug
because BeginVMCreate detaches to context.Background() before
calling startVMLocked; `vm.start` used the RPC ctx directly.
Fix: Machine.Start now passes context.Background() to the SDK.
We own firecracker lifecycle explicitly (StopVM / KillVM /
cleanupRuntime), so ctx-driven cancellation here was never
actually wired into anything useful.
2. With (1) fixed, the same scenario exposed a second defect:
patchRootOverlay's e2cp/e2rm refuses to touch the dm-snapshot
with "Inode bitmap checksum does not match bitmap" on a restart,
because the COW holds stale free-block/free-inode counters from
the previous guest boot. Kernel ext4 is fine with this; e2fsprogs
is not. Fix: run `e2fsck -fy` on the snapshot between the
dm_snapshot and patch_root_overlay stages. Idempotent on a fresh
snapshot, reconciles the bitmaps on a reused COW.
Regression coverage:
- scripts/repro-restart-bug.sh — minimal create→stop→start→ssh
reproducer with rich on-failure diagnostics (daemon log trace,
firecracker.log tail, handles.json, pgrep-by-apiSock, apiSock
stat). Exits non-zero if the bug returns.
- scripts/smoke.sh — lifecycle scenario (create/ssh/stop/start/
ssh/delete) and vm-set scenario (--vcpu 2 → stop → set --vcpu 4
→ start → assert nproc=4). Both were pulled when the bug was
first found; now restored.
Supporting:
- internal/system/system.ExitCode — extracts exec.ExitError's
code without forcing callers to import os/exec. Needed by the
e2fsck caller (policy test pins os/exec to the shell-out
packages).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
imagepull.Flatten now captures per-file uid/gid/mode/type from the
tar headers as it walks layers, returning a Metadata map alongside
the extracted tree. Whiteouts correctly drop the victim's metadata.
The returned Metadata feeds the new imagepull.ApplyOwnership, which
pipes a batched `set_inode_field` script to `debugfs -w -f -`.
Why: mkfs.ext4 -d copies the runner's on-disk uids verbatim, so
without this pass setuid binaries become setuid-nonroot and sshd
refuses to start on the resulting image. With the pass, a pulled
debian:bookworm has /usr/bin/sudo with uid=0 + setuid bit surviving
intact.
imagepull.BuildExt4 signature unchanged; ownership is applied as a
separate step by the daemon orchestrator between BuildExt4 and
StageBootArtifacts, keeping each helper focused. The seam
(d.pullAndFlatten) now returns (Metadata, error) for test stubs to
feed synthetic metadata.
StdinRunner is a new duck-typed extension next to CommandRunner;
the real system.Runner implements RunStdin, test mocks don't need
to unless they exercise stdin. Prevents every existing mock from
growing a new method.
Tests:
- TestFlattenCapturesHeaderMetadata: setuid bit + mode survive the
tar-header walk
- TestApplyOwnershipRewritesUidGidMode: real debugfs round-trip —
create ext4 with runner's uid, apply synthetic metadata setting
uid=0 + setuid mode, verify via `debugfs -R stat` that the
inode now has uid=0 and mode 04755
- TestBuildOwnershipScriptDeterministic: sorted, well-formed
sif script output
Debugfs and mkfs.ext4 tests skip if the binaries aren't on PATH.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`banger kernel import <name> --from <dir>` copies a staged kernel
bundle into the local catalog. <dir> is the output of
`make void-kernel` or `make alpine-kernel` (build/manual/void-kernel/
or build/manual/alpine-kernel/).
kernelcat.DiscoverPaths locates artifacts under <dir>:
1. Prefers metadata.json (written by make-void-kernel.sh).
2. Falls back to globbing: boot/vmlinux-* or vmlinuz-* (Alpine
fallback), boot/initramfs-*, lib/modules/<latest>.
The daemon's KernelImport copies kernel + optional initrd via
system.CopyFilePreferClone and modules via system.CopyDirContents
(no-sudo mode — catalog lives under ~/.local/state), computes SHA256
over the kernel, and writes the manifest via kernelcat.WriteLocal.
While wiring this up, fixed a latent bug in system.CopyDirContents:
filepath.Join(sourceDir, ".") silently drops the trailing dot, so
`cp -a source source/contents target/` was copying the whole source
directory (including its basename) instead of just its contents.
Replaced the join with a manual "/." suffix. imagemgr.StageBootArtifacts
(the only existing caller) silently benefits.
scripts/register-void-image.sh and scripts/register-alpine-image.sh
are rewritten to use `banger kernel import … && banger image register
--kernel-ref …` instead of the find-and-pass-paths dance. Preserves
the same user-facing commands and env vars.
Tests cover: metadata.json preference, glob fallback, Alpine vmlinuz
fallback, kernel-missing error, round-trip copy into the catalog, and
the --from required flag.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Avoid the Alpine boot stall caused by kernel ip= autoconfig running before
virtio_net is available.
Split runtime and image-build boot args so managed VMs boot without kernel
network autoconfig, inject a static guest network config plus bootstrap
script into the runtime overlay, and keep image builds on the old path for
compatibility with existing base images.
Preserve executable bits when patching guest files into ext4 images and add
coverage for the new boot-arg split and guest network config generation.
Validated with go test ./..., a rebuilt Alpine image, and a fresh alp-fast
create/ssh check that brought vm.start down to about 2.7s.
Stage a complete Alpine x86_64 image stack so \ --image alpineworks like the existing manual Void path instead of relying on Debian-oriented image builds.\n\nAdd make targets plus kernel/rootfs/register helpers that download pinned Alpine artifacts, extract a Firecracker-compatible vmlinux, build a matching mkinitfs initramfs, seed OpenRC services, and register/promote a managed image named alpine.\n\nFold in the bring-up fixes discovered during boot validation: use rootfstype=ext4 in shared boot args, install libgcc/libstdc++ for the opencode binary, and give opencode more time to become ready on cold boots.\n\nValidate with go test ./..., the Alpine helper builds, image promotion, and banger vm create --image alpine --name alp --nat plus guest service and port checks.
Hard-cut banger away from source-checkout runtime bundles as an implicit source of\nimage and host defaults. Managed images now own their full boot set,\nimage build starts from an existing registered image, and daemon startup\nno longer synthesizes a default image from host paths.\n\nResolve Firecracker from PATH or firecracker_bin, make SSH keys config-owned\nwith an auto-managed XDG default, replace the external name generator and\npackage manifests with Go code, and keep the vsock helper as a companion\nbinary instead of a user-managed runtime asset.\n\nUpdate the manual scripts, web/CLI forms, config surface, and docs around\nthe new build/manual flow and explicit image registration semantics.\n\nValidation: GOCACHE=/tmp/banger-gocache go test ./..., bash -n scripts/*.sh,\nand make build.
Add a localhost-only web console so VM and image management no longer depends on the CLI for every inspection and lifecycle action.
Wire bangerd up to a configurable web listener, expose dashboard and async image-build state through the daemon, and serve CSRF-protected HTML pages with host-path picking, VM/image detail views, logs, ports, and progress polling for long-running operations.
Keep the browser path aligned with the existing sudo and host-owned artifact model: surface sudo readiness, print the web URL in daemon status, and document the new workflow. Polish the UI with resource usage cards, clearer clickable affordances, cancel paths, confirmation prompts, image-name links, and HTTP port links.
Validation: GOCACHE=/tmp/banger-gocache go test ./...
Stop relying on ad hoc rootfs handling by adding image promotion, managed work-seed fingerprint metadata, and lazy self-healing for older managed images after the first create.
Rebuild guest images with baked SSH access, a guest NIC bootstrap, and default opencode services, and add the staged Void kernel/initramfs/modules workflow so void-exp uses a matching Void boot stack.
Replace the opaque blocking vm.create RPC with a begin/status flow that prints live stages in the CLI while still waiting for vsock health and opencode on guest port 4096.
Validate with GOCACHE=/tmp/banger-gocache go test ./... and live void-exp create/delete smoke runs.
Make iterating on a Firecracker-friendly Void guest practical without replacing the Debian default image path.
Add local Void rootfs build/register/verify plumbing, a language-agnostic dev package baseline, and guest SSH/work-disk hardening so new images use the runtime bundle key, keep a normal root bash environment, and repair stale nested /root layouts on restart.
Replace the guest PING/PONG responder with an HTTP /healthz agent over vsock, rename the runtime bundle and config surface from ping helper to agent while still accepting the legacy keys, and route the post-SSH reminder through the new vm.health path.
Validated with GOCACHE=/tmp/banger-gocache go test ./..., make build, bash -n customize.sh make-rootfs-void.sh, and git diff --check.
Beat VM create wall time without changing VM semantics.
Generate a work-seed ext4 sidecar during image builds and rootfs rebuilds, then clone and resize that seed for each new VM instead of rebuilding /root from scratch. Plumb the new seed artifact through config, runtime metadata, store state, runtime-bundle defaults, doctor checks, and default-image reconciliation so older images still fall back cleanly.
Add a daemon TAP pool to keep idle bridge-attached devices warm, expose stage timing in lifecycle logs, add a create/SSH benchmark script plus Make target, and teach verify.sh that tap-pool-* devices are reusable capacity rather than cleanup leaks.
Validated with go test ./..., make build, ./verify.sh, and make bench-create ARGS="--runs 2".
Make host-integrated VM features fit a standard Go extension path instead of adding more one-off branches through vm.go. This is the enabling refactor for future work like shared mounts, not the /work feature itself.
Add a daemon capability pipeline plus a structured guest-config builder, then move the existing /root work-disk mount, built-in DNS, and NAT wiring onto those hooks. Generalize Firecracker drive config at the same time so later storage features can extend machine setup without another hardcoded path.
Add banger doctor on top of the shared readiness checks, update the docs to describe the new architecture, and cover the new seams with guest-config, capability, report, CLI, and full go test verification. Also verify make build and a real ./banger doctor run on the host.
The TUI should show VM capacity pressure at a glance instead of making users read raw numbers or drill into per-VM details.
Add a compact colored status row under the header that renders CPU, RAM, and disk usage as progress bars. CPU and RAM reflect reserved resources for running VMs, while disk reflects actual allocated overlay and work-disk bytes across all VMs against the filesystem backing banger state.
Add host resource and filesystem helpers in the system package and cover the new aggregation and rendering behavior with TUI and system tests. Verified with GOCACHE=/tmp/banger-gocache go test ./... and GOCACHE=/tmp/banger-gocache make build.
Reduce the control plane's dependency on helper scripts while keeping the hard Linux integration points in the approved shell-out layer.
Replace the bash-driven image build path with a native Go builder that clones and optionally resizes the rootfs, boots a temporary Firecracker VM, provisions the guest over SSH, installs packages and modules, and preserves the package-manifest sidecar.
Also replace a few small convenience shell-outs with Go helpers: read process stats from /proc, use os.Truncate for ext4 image growth, add file-clone and normalized-line helpers, drop the sh -c work-disk flattening path, and launch Firecracker via a direct sudo command.
Add tests for the new SSH/archive and system helpers, plus a policy test that keeps os/exec imports confined to cli/firecracker/system. Update the docs to describe customize.sh as a manual helper rather than the daemon's image-build backend.
Validated with go mod tidy, go test ./..., and make build.
Dangerous lifecycle, store, system, and RPC paths still had little or no automated confidence, and the live smoke harness failed opaquely when guest boot timing drifted. This adds targeted unit coverage for store allocation and decode failures, system helper failure ordering and cleanup, RPC error handling, and daemon lookup/reconcile/editing/stats/preflight edge cases.
It also makes verify.sh wait for daemon-observable VM readiness before SSH, reuse a bounded boot deadline for the SSH phase, and dump VM metadata, logs, tap state, socket state, and NAT rules on timeout so host-level failures are diagnosable instead of surfacing only connection refused.
Validation: go test ./..., go test ./... -cover, bash -n verify.sh. No live ./verify.sh boot was run in this environment.
Stop assuming one workstation layout for runtime artifacts, mapdns, and host tooling. The daemon and shell helpers now use portable mapdns configuration, and runtime bundles can carry bundle.json metadata for their default kernel, initrd, modules, rootfs, and helper paths.
Load bundle metadata through config with a legacy layout fallback, thread mapdns_bin/mapdns_data_file through the Go and shell paths, and add command-scoped preflight checks for VM start, NAT, image build, work-disk resize, and SSH so missing tools or artifacts fail with actionable errors.
Update the runtime-bundle manifest, docs, and tests to match the new model. Verified with go test ./..., make build, and bash -n customize.sh interactive.sh dns.sh make-rootfs.sh verify.sh.
Prevent partial VM startup failures from leaking loop devices and dm state on the host.
Move root snapshot setup into a rollback-safe helper that records loop and mapper handles incrementally, tears them down in reverse order on failure, and reuses the same dm/loop cleanup path during normal runtime teardown. Also switch the daemon runner field to a small command-runner interface so the snapshot path can be tested with injected failures.
Add failure-injection coverage for losetup, blockdev, dmsetup, partial teardown, and joined rollback errors. Validated with go test ./... and make build.
Replace the shell-only user workflow with `banger` and `bangerd`: Cobra commands, XDG/SQLite-backed state, managed VM and image lifecycle, and a Bubble Tea TUI for browsing and operating VMs.\n\nKeep Firecracker orchestration behind the daemon so VM specs become persistent objects, and add repo entrypoints for building, installing, and documenting the new flow while still delegating rootfs customization to the existing shell tooling.\n\nHarden the control plane around real usage by reclaiming Firecracker API sockets for the user, restarting stale daemons after rebuilds, and returning the correct `vm.create` payload so the CLI and TUI creation flow work reliably.\n\nValidation: `go test ./...`, `make build`, and a host-side smoke test with `./banger vm create --name codex-smoke`.