banger

Author	SHA1	Message	Date
Thales Maciel	59e48e830b	daemon: split owner daemon from root helper Move the supported systemd path to two services: an owner-user bangerd for orchestration and a narrow root helper for bridge/tap, NAT/resolver, dm/loop, and Firecracker ownership. This removes repeated sudo from daily vm and image flows without leaving the general daemon running as root. Add install metadata, system install/status/restart/uninstall commands, and a system-owned runtime layout. Keep user SSH/config material in the owner home, lock file_sync to the owner home, and move daemon known_hosts handling out of the old root-owned control path. Route privileged lifecycle steps through typed privilegedOps calls, harden the two systemd units, and rewrite smoke plus docs around the supported service model. Verified with make build, make test, make lint, and make smoke on the supported systemd host path.	2026-04-26 12:43:17 -03:00
Thales Maciel	6ab1a2b844	daemon: rewrite git identity sync + file_sync on ext4 toolkit ensureGitIdentityOnWorkDisk, writeGitIdentity, runFileSync, and copyHostDir all dropped their mount + sudo install/mkdir/chmod/chown scaffolding. Every write now goes through MkdirExt4, WriteExt4FileOwned, ReadExt4File, and the new MkdirAllExt4 helper — all sudoless against user-owned ext4 images. Net effect with the prior two commits: ensureWorkDisk, authsync, image seeding, git identity sync, and file_sync no longer mount the work disk or spawn sudo mkdir/chmod/chown/cat/install. Only the image-build path (which legitimately produces root-owned artifacts) still touches MountTempDir. The filesystemRunner test harness grew a small debugfs/e2cp/e2rm emulator so the WorkspaceService tests keep exercising their real code paths without a live ext4 image. The mock is deliberately dumb — it only implements the subset runFileSync and writeGitIdentity drive. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 18:29:30 -03:00
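The commit names a new MkdirAllExt4 helper but does not show it. Suppose debugfs's `mkdir` creates only a single level, like plain mkdir. Then an `mkdir -p` equivalent has to emit one line per missing component, parents first. The sketch below uses a hypothetical `mkdirAllScript` with an injected existence probe and leaves out the uid/gid/mode fixups.

```go
package main

import (
	"path"
	"strings"
)

// mkdirAllScript returns debugfs "mkdir" lines for every missing component
// of dir, parents first. exists is an assumed probe (for example one backed
// by `debugfs -R "stat ..."`).
func mkdirAllScript(dir string, exists func(string) bool) string {
	clean := path.Clean("/" + dir)
	var b strings.Builder
	cur := ""
	for _, part := range strings.Split(strings.TrimPrefix(clean, "/"), "/") {
		if part == "" {
			continue // dir was the filesystem root itself
		}
		cur += "/" + part
		if !exists(cur) {
			b.WriteString("mkdir " + cur + "\n")
		}
	}
	return b.String()
}
```

The resulting lines can then be batched into the same `debugfs -w -f -` invocation as the ownership updates.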
Thales Maciel	0e28504892	daemon: rewrite ensureWorkDisk no-seed path to skip the mount + cp The no-seed branch used to mount the base rootfs read-only, mount the freshly mkfs'd work disk read-write, sudo-cp /root from one to the other, then flatten any accidental /root/root/ nesting. Five sudo call sites packed into a fallback that the common image path doesn't even exercise. Replace with: `mkfs.ext4 -F -E root_owner=0:0` and nothing else. mkfs already stamps inode 2 as root:root:0755; sshd's StrictModes walks that dir's ownership when the work disk mounts at /root in the guest, so getting it right from mkfs means authsync can just write authorized_keys without any repair pass. Tradeoff: no-seed VMs lose the base rootfs's default /root dotfiles (.bashrc, .profile). The no-seed path is explicitly the degraded fallback (`banger doctor` already warns about it), and users who want those back have two documented knobs: rebuild the image with a work-seed, or land them via [[file_sync]]. Sudo call sites removed: 5 (MountTempDir × 2, sudo cp -a, flattenNestedWorkHome's chmod/cp/rm). flattenNestedWorkHome itself stays alive for now (authsync + image_seed still call it) and gets deleted in commit 5 once its last caller goes away. While here: fix the freshly added EnsureExt4RootPerms helper. `set_inode_field <2> mode N` overwrites the full i_mode word instead of preserving the type nibble, so the initial implementation that passed just the permission bits (0755) would clear the fs root's directory type bits and break the next kernel mount with "Structure needs cleaning." The corrected call ORs in S_IFDIR (0o040000) explicitly. Test updated to match. Smoke: 21/21 scenarios green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 18:09:32 -03:00
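The EnsureExt4RootPerms fix comes down to one bitwise OR. Below is a sketch of the corrected command line. It assumes a hypothetical `rootPermsLine` helper and that debugfs reads a value with a leading 0 as octal.

```go
package main

import "fmt"

const sIFDIR = 0o040000 // S_IFDIR: the directory type bits in i_mode

// rootPermsLine builds a debugfs command that sets the mode of inode <2>
// (the filesystem root). set_inode_field overwrites the whole i_mode word,
// so the directory type bits are OR'd in alongside the permission bits.
func rootPermsLine(perm uint32) string {
	return fmt.Sprintf("set_inode_field <2> mode 0%o\n", sIFDIR|(perm&0o7777))
}
```

With `perm = 0o755`, this yields `set_inode_field <2> mode 040755`. Passing `0755` alone would leave the type nibble at zero.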
Thales Maciel	77043966d4	system: add ext4 toolkit for non-sudo work-disk writes The daemon mounts every VM's work disk on the host via sudo, copies files in as root, chmods+chowns them, and unmounts. That's ~18 of banger's runtime RunSudo calls. The ext4 image is a regular file the daemon user owns; e2cp / debugfs can write to it directly and bake uid/gid/mode into the filesystem metadata without the caller being root. `imagepull.ApplyOwnership` already proves this works in production (OCI layer flattening writes 0/0/root-owned inodes from an unprivileged daemon). This commit adds the toolkit layer. Callers land in the next four commits: - MkdirExt4: idempotent directory create + metadata reset, single debugfs batch - WriteExt4FileOwned: e2cp + debugfs-driven uid/gid/mode, auto-cleans the host tempfile - SetExt4Ownership: sif + set_inode_field batch for existing inodes (no mkdir implied) - EnsureExt4RootPerms: fixes inode <2> (the fs root, which is `/root` once the work disk is mounted inside the guest), the thing sshd's StrictModes walks - Ext4PathExists: yes/no probe via `debugfs -R "stat ..."` with "File not found" detection - ReadExt4File: bytes-returning wrapper around the existing ReadDebugFSText with the same path rejection Design notes: - extfsRun auto-switches Run ↔ RunSudo on imagePath's type: regular files get the unprivileged path, block devices (dm-snapshot, loops) get sudo. The same helper works for both patchRootOverlay (dm device) and work-disk writes (user-owned file). No caller flag needed; os.Stat tells us. - debugfsScript batches set_inode_field + sif + mkdir lines into one `debugfs -w -f -` stdin invocation on any Runner that implements StdinRunner (production's system.Runner does). Matches imagepull.ApplyOwnership's existing pattern; dramatically cheaper than per-call subprocesses. - Paths are escaped for debugfs on the way in: spaces get double-quoted, double-quote/backslash/newline are rejected outright (debugfs's hand-rolled parser doesn't reliably escape those and we'd rather fail fast than silently scribble over the wrong inode). Tests: seven behaviour assertions via scripted + stdin-scripted runners (existence probe (found + missing + rejection), read passthrough, mkdir batch contents (new vs. pre-existing path), write tempfile cleanup + mode line shape, root-inode addressing, and the full rejectDebugfsUnsafePath matrix). No production wiring change in this commit; the helpers land unused. `make smoke` stays green (21/21) because nothing else shifted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 16:31:50 -03:00
Thales Maciel	b4afe13b2a	daemon: fix vm start (on a stopped VM) + regression coverage Two defects compounded to make `vm create X` → `vm stop X` → `vm start X` → `vm ssh X` fail with `not_running: vm X is not running` even though `vm show` reports `state=running`. 1. firecracker-go-sdk's startVMM spawns a goroutine that SIGTERMs firecracker when the ctx passed to Machine.Start cancels, and retains that ctx for the lifetime of the VMM, not just the boot phase. Our Machine.Start wrapper was plumbing the caller's ctx through, which on `vm.start` is the RPC request ctx. daemon.go's handleConn cancels reqCtx via `defer cancel()` right after writing the response. Net effect: firecracker is killed ~150ms after the `vm start` RPC "completes", invisibly, and the next `vm ssh` sees a dead PID. `vm.create` side-stepped the bug because BeginVMCreate detaches to context.Background() before calling startVMLocked; `vm.start` used the RPC ctx directly. Fix: Machine.Start now passes context.Background() to the SDK. We own firecracker lifecycle explicitly (StopVM / KillVM / cleanupRuntime), so ctx-driven cancellation here was never actually wired into anything useful. 2. With (1) fixed, the same scenario exposed a second defect: patchRootOverlay's e2cp/e2rm refuses to touch the dm-snapshot with "Inode bitmap checksum does not match bitmap" on a restart, because the COW holds stale free-block/free-inode counters from the previous guest boot. Kernel ext4 is fine with this; e2fsprogs is not. Fix: run `e2fsck -fy` on the snapshot between the dm_snapshot and patch_root_overlay stages. Idempotent on a fresh snapshot, reconciles the bitmaps on a reused COW. Regression coverage: - scripts/repro-restart-bug.sh: minimal create→stop→start→ssh reproducer with rich on-failure diagnostics (daemon log trace, firecracker.log tail, handles.json, pgrep-by-apiSock, apiSock stat). Exits non-zero if the bug returns. - scripts/smoke.sh: lifecycle scenario (create/ssh/stop/start/ssh/delete) and vm-set scenario (--vcpu 2 → stop → set --vcpu 4 → start → assert nproc=4). Both were pulled when the bug was first found; now restored. Supporting: - internal/system/system.ExitCode: extracts exec.ExitError's code without forcing callers to import os/exec. Needed by the e2fsck caller (policy test pins os/exec to the shell-out packages). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 12:01:46 -03:00
Thales Maciel	f8979de58a	coverage: easy-wins batch across cli, system, paths, vmdns, toolingplan Pure-Go tests for formatters, layout resolution, and validators, with no fixtures and no external processes. Targets previously-zero functions the triage scan flagged as low-hanging fruit. cli 55% -> 65%, paths 64% -> 91%, system 65% -> 75%, vmdns 72% -> 86%, toolingplan 73% -> 78%. Total 52.6% -> 54.0%.	2026-04-18 17:57:05 -03:00
Thales Maciel	43982a4ae3	Phase B-1: ownership fixup via debugfs pass imagepull.Flatten now captures per-file uid/gid/mode/type from the tar headers as it walks layers, returning a Metadata map alongside the extracted tree. Whiteouts correctly drop the victim's metadata. The returned Metadata feeds the new imagepull.ApplyOwnership, which pipes a batched `set_inode_field` script to `debugfs -w -f -`. Why: mkfs.ext4 -d copies the runner's on-disk uids verbatim, so without this pass setuid binaries become setuid-nonroot and sshd refuses to start on the resulting image. With the pass, a pulled debian:bookworm has /usr/bin/sudo with uid=0 + setuid bit surviving intact. imagepull.BuildExt4 signature unchanged; ownership is applied as a separate step by the daemon orchestrator between BuildExt4 and StageBootArtifacts, keeping each helper focused. The seam (d.pullAndFlatten) now returns (Metadata, error) for test stubs to feed synthetic metadata. StdinRunner is a new duck-typed extension next to CommandRunner; the real system.Runner implements RunStdin, test mocks don't need to unless they exercise stdin. Prevents every existing mock from growing a new method. Tests: - TestFlattenCapturesHeaderMetadata: setuid bit + mode survive the tar-header walk - TestApplyOwnershipRewritesUidGidMode: real debugfs round-trip — create ext4 with runner's uid, apply synthetic metadata setting uid=0 + setuid mode, verify via `debugfs -R stat` that the inode now has uid=0 and mode 04755 - TestBuildOwnershipScriptDeterministic: sorted, well-formed sif script output Debugfs and mkfs.ext4 tests skip if the binaries aren't on PATH. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 18:04:22 -03:00
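The batched ownership script is deterministic because paths are sorted before any lines are emitted. The sketch below assumes a hypothetical `Entry` type and `sif <path> <field> <value>` lines fed to `debugfs -w -f -`. Modes include the file-type bits because debugfs's mode field replaces the whole i_mode word, as the later EnsureExt4RootPerms fix in this log notes. Quoting for paths with spaces is omitted.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Entry is a hypothetical per-path record captured from tar headers.
// Mode is the full st_mode, including file-type bits.
type Entry struct {
	UID, GID int
	Mode     uint32
}

// buildOwnershipScript sorts paths so the same metadata always yields
// byte-identical debugfs input.
func buildOwnershipScript(meta map[string]Entry) string {
	paths := make([]string, 0, len(meta))
	for p := range meta {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	var b strings.Builder
	for _, p := range paths {
		e := meta[p]
		fmt.Fprintf(&b, "sif %s uid %d\n", p, e.UID)
		fmt.Fprintf(&b, "sif %s gid %d\n", p, e.GID)
		fmt.Fprintf(&b, "sif %s mode 0%o\n", p, e.Mode)
	}
	return b.String()
}
```

For a setuid `/usr/bin/sudo` owned by root, the script ends with `sif /usr/bin/sudo mode 0104755`: regular-file bits plus setuid plus 0755.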
Thales Maciel	7192ba24ae	Phase 3: banger kernel import bridges make-*-kernel.sh output `banger kernel import <name> --from <dir>` copies a staged kernel bundle into the local catalog. <dir> is the output of `make void-kernel` or `make alpine-kernel` (build/manual/void-kernel/ or build/manual/alpine-kernel/). kernelcat.DiscoverPaths locates artifacts under <dir>: 1. Prefers metadata.json (written by make-void-kernel.sh). 2. Falls back to globbing: boot/vmlinux-* or vmlinuz-* (Alpine fallback), boot/initramfs-*, lib/modules/<latest>. The daemon's KernelImport copies kernel + optional initrd via system.CopyFilePreferClone and modules via system.CopyDirContents (no-sudo mode; the catalog lives under ~/.local/state), computes SHA256 over the kernel, and writes the manifest via kernelcat.WriteLocal. While wiring this up, fixed a latent bug in system.CopyDirContents: filepath.Join(sourceDir, ".") silently drops the trailing dot, so `cp -a source source/contents target/` was copying the whole source directory (including its basename) instead of just its contents. Replaced the join with a manual "/." suffix. imagemgr.StageBootArtifacts (the only existing caller) silently benefits. scripts/register-void-image.sh and scripts/register-alpine-image.sh are rewritten to use `banger kernel import … && banger image register --kernel-ref …` instead of the find-and-pass-paths dance. Preserves the same user-facing commands and env vars. Tests cover: metadata.json preference, glob fallback, Alpine vmlinuz fallback, kernel-missing error, round-trip copy into the catalog, and the --from required flag. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 14:53:49 -03:00
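The CopyDirContents bug comes from `filepath.Join` calling `filepath.Clean`, which removes a trailing `/.`. With `cp -a`, a source of `src/.` copies the directory's contents, while `src` copies the directory itself. Here is a short Go illustration; the helper names are hypothetical.

```go
package main

import (
	"path/filepath"
	"strings"
)

// contentsArg returns a cp source argument meaning "the contents of dir".
func contentsArg(dir string) string {
	return strings.TrimRight(dir, "/") + "/."
}

// joinedArg shows the buggy form: filepath.Join(dir, ".") runs
// filepath.Clean, which drops the trailing "/." and yields dir itself.
func joinedArg(dir string) string {
	return filepath.Join(dir, ".")
}
```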
Thales Maciel	14d8563f3c	Stop using kernel IP autoconfig for runtime VMs Avoid the Alpine boot stall caused by kernel ip= autoconfig running before virtio_net is available. Split runtime and image-build boot args so managed VMs boot without kernel network autoconfig, inject a static guest network config plus bootstrap script into the runtime overlay, and keep image builds on the old path for compatibility with existing base images. Preserve executable bits when patching guest files into ext4 images and add coverage for the new boot-arg split and guest network config generation. Validated with go test ./..., a rebuilt Alpine image, and a fresh alp-fast create/ssh check that brought vm.start down to about 2.7s.	2026-03-21 21:54:18 -03:00
Thales Maciel	a166068fab	Add an experimental Alpine image flow Stage a complete Alpine x86_64 image stack so `--image alpine` works like the existing manual Void path instead of relying on Debian-oriented image builds. Add make targets plus kernel/rootfs/register helpers that download pinned Alpine artifacts, extract a Firecracker-compatible vmlinux, build a matching mkinitfs initramfs, seed OpenRC services, and register/promote a managed image named alpine. Fold in the bring-up fixes discovered during boot validation: use rootfstype=ext4 in shared boot args, install libgcc/libstdc++ for the opencode binary, and give opencode more time to become ready on cold boots. Validate with go test ./..., the Alpine helper builds, image promotion, and banger vm create --image alpine --name alp --nat plus guest service and port checks.	2026-03-21 20:25:55 -03:00
Thales Maciel	572bf32424	Remove runtime-bundle image dependencies Hard-cut banger away from source-checkout runtime bundles as an implicit source of image and host defaults. Managed images now own their full boot set, image build starts from an existing registered image, and daemon startup no longer synthesizes a default image from host paths. Resolve Firecracker from PATH or firecracker_bin, make SSH keys config-owned with an auto-managed XDG default, replace the external name generator and package manifests with Go code, and keep the vsock helper as a companion binary instead of a user-managed runtime asset. Update the manual scripts, web/CLI forms, config surface, and docs around the new build/manual flow and explicit image registration semantics. Validation: GOCACHE=/tmp/banger-gocache go test ./..., bash -n scripts/*.sh, and make build.	2026-03-21 18:34:53 -03:00
Thales Maciel	2362d0ae39	Serve a local web UI from bangerd Add a localhost-only web console so VM and image management no longer depends on the CLI for every inspection and lifecycle action. Wire bangerd up to a configurable web listener, expose dashboard and async image-build state through the daemon, and serve CSRF-protected HTML pages with host-path picking, VM/image detail views, logs, ports, and progress polling for long-running operations. Keep the browser path aligned with the existing sudo and host-owned artifact model: surface sudo readiness, print the web URL in daemon status, and document the new workflow. Polish the UI with resource usage cards, clearer clickable affordances, cancel paths, confirmation prompts, image-name links, and HTTP port links. Validation: GOCACHE=/tmp/banger-gocache go test ./...	2026-03-21 16:47:47 -03:00
Thales Maciel	30f0c0b54a	Manage image artifacts and show VM create progress Stop relying on ad hoc rootfs handling by adding image promotion, managed work-seed fingerprint metadata, and lazy self-healing for older managed images after the first create. Rebuild guest images with baked SSH access, a guest NIC bootstrap, and default opencode services, and add the staged Void kernel/initramfs/modules workflow so void-exp uses a matching Void boot stack. Replace the opaque blocking vm.create RPC with a begin/status flow that prints live stages in the CLI while still waiting for vsock health and opencode on guest port 4096. Validate with GOCACHE=/tmp/banger-gocache go test ./... and live void-exp create/delete smoke runs.	2026-03-21 14:48:01 -03:00
Thales Maciel	3ed78fdcfc	Add experimental Void guest workflow and vsock agent Make iterating on a Firecracker-friendly Void guest practical without replacing the Debian default image path. Add local Void rootfs build/register/verify plumbing, a language-agnostic dev package baseline, and guest SSH/work-disk hardening so new images use the runtime bundle key, keep a normal root bash environment, and repair stale nested /root layouts on restart. Replace the guest PING/PONG responder with an HTTP /healthz agent over vsock, rename the runtime bundle and config surface from ping helper to agent while still accepting the legacy keys, and route the post-SSH reminder through the new vm.health path. Validated with GOCACHE=/tmp/banger-gocache go test ./..., make build, bash -n customize.sh make-rootfs-void.sh, and git diff --check.	2026-03-19 14:51:25 -03:00
Thales Maciel	c8d9a122f9	Speed up VM create with work seeds Beat VM create wall time without changing VM semantics. Generate a work-seed ext4 sidecar during image builds and rootfs rebuilds, then clone and resize that seed for each new VM instead of rebuilding /root from scratch. Plumb the new seed artifact through config, runtime metadata, store state, runtime-bundle defaults, doctor checks, and default-image reconciliation so older images still fall back cleanly. Add a daemon TAP pool to keep idle bridge-attached devices warm, expose stage timing in lifecycle logs, add a create/SSH benchmark script plus Make target, and teach verify.sh that tap-pool-* devices are reusable capacity rather than cleanup leaks. Validated with go test ./..., make build, ./verify.sh, and make bench-create ARGS="--runs 2".	2026-03-18 21:22:12 -03:00
Thales Maciel	4930d82cb9	Refactor VM lifecycle around capabilities Make host-integrated VM features fit a standard Go extension path instead of adding more one-off branches through vm.go. This is the enabling refactor for future work like shared mounts, not the /work feature itself. Add a daemon capability pipeline plus a structured guest-config builder, then move the existing /root work-disk mount, built-in DNS, and NAT wiring onto those hooks. Generalize Firecracker drive config at the same time so later storage features can extend machine setup without another hardcoded path. Add banger doctor on top of the shared readiness checks, update the docs to describe the new architecture, and cover the new seams with guest-config, capability, report, CLI, and full go test verification. Also verify make build and a real ./banger doctor run on the host.	2026-03-18 19:28:26 -03:00
Thales Maciel	9e98445fa2	Add visual VM resource bars to the TUI The TUI should show VM capacity pressure at a glance instead of making users read raw numbers or drill into per-VM details. Add a compact colored status row under the header that renders CPU, RAM, and disk usage as progress bars. CPU and RAM reflect reserved resources for running VMs, while disk reflects actual allocated overlay and work-disk bytes across all VMs against the filesystem backing banger state. Add host resource and filesystem helpers in the system package and cover the new aggregation and rendering behavior with TUI and system tests. Verified with GOCACHE=/tmp/banger-gocache go test ./... and GOCACHE=/tmp/banger-gocache make build.	2026-03-18 18:05:09 -03:00
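A compact text progress bar is enough to render reserved CPU/RAM, or allocated disk, against capacity. The sketch below uses a hypothetical `renderBar` that clamps the ratio, so over-committed resources still draw a full bar. The TUI's actual colored styling is not shown.

```go
package main

import "strings"

// renderBar draws a fixed-width text bar for used/total, rounding to the
// nearest cell and clamping the ratio to [0, 1].
func renderBar(used, total float64, width int) string {
	if width <= 0 {
		return ""
	}
	ratio := 0.0
	if total > 0 {
		ratio = used / total
	}
	if ratio < 0 {
		ratio = 0
	} else if ratio > 1 {
		ratio = 1
	}
	filled := int(ratio*float64(width) + 0.5)
	return "[" + strings.Repeat("#", filled) + strings.Repeat("-", width-filled) + "]"
}
```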
Thales Maciel	942d242c03	Move avoidable daemon shell-outs into Go Reduce the control plane's dependency on helper scripts while keeping the hard Linux integration points in the approved shell-out layer. Replace the bash-driven image build path with a native Go builder that clones and optionally resizes the rootfs, boots a temporary Firecracker VM, provisions the guest over SSH, installs packages and modules, and preserves the package-manifest sidecar. Also replace a few small convenience shell-outs with Go helpers: read process stats from /proc, use os.Truncate for ext4 image growth, add file-clone and normalized-line helpers, drop the sh -c work-disk flattening path, and launch Firecracker via a direct sudo command. Add tests for the new SSH/archive and system helpers, plus a policy test that keeps os/exec imports confined to cli/firecracker/system. Update the docs to describe customize.sh as a manual helper rather than the daemon's image-build backend. Validated with go mod tidy, go test ./..., and make build.	2026-03-17 17:13:07 -03:00
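The policy test that keeps os/exec imports confined to cli/firecracker/system can be built on the standard library's go/parser. The sketch below shows the core check using a hypothetical `importsOsExec` helper. Walking the package tree and applying the allowlist are left out.

```go
package main

import (
	"go/parser"
	"go/token"
	"strconv"
)

// importsOsExec reports whether a Go source file imports os/exec. It parses
// only the import block, so the rest of the file is not type-checked.
func importsOsExec(filename, src string) (bool, error) {
	f, err := parser.ParseFile(token.NewFileSet(), filename, src, parser.ImportsOnly)
	if err != nil {
		return false, err
	}
	for _, imp := range f.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return false, err
		}
		if path == "os/exec" {
			return true, nil
		}
	}
	return false, nil
}
```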
Thales Maciel	5018bc6170	Add regression coverage for VM failure paths Dangerous lifecycle, store, system, and RPC paths still had little or no automated confidence, and the live smoke harness failed opaquely when guest boot timing drifted. This adds targeted unit coverage for store allocation and decode failures, system helper failure ordering and cleanup, RPC error handling, and daemon lookup/reconcile/editing/stats/preflight edge cases. It also makes verify.sh wait for daemon-observable VM readiness before SSH, reuse a bounded boot deadline for the SSH phase, and dump VM metadata, logs, tap state, socket state, and NAT rules on timeout so host-level failures are diagnosable instead of surfacing only connection refused. Validation: go test ./..., go test ./... -cover, bash -n verify.sh. No live ./verify.sh boot was run in this environment.	2026-03-16 15:46:54 -03:00
Thales Maciel	fcedacba5c	Make runtime defaults portable Stop assuming one workstation layout for runtime artifacts, mapdns, and host tooling. The daemon and shell helpers now use portable mapdns configuration, and runtime bundles can carry bundle.json metadata for their default kernel, initrd, modules, rootfs, and helper paths. Load bundle metadata through config with a legacy layout fallback, thread mapdns_bin/mapdns_data_file through the Go and shell paths, and add command-scoped preflight checks for VM start, NAT, image build, work-disk resize, and SSH so missing tools or artifacts fail with actionable errors. Update the runtime-bundle manifest, docs, and tests to match the new model. Verified with go test ./..., make build, and bash -n customize.sh interactive.sh dns.sh make-rootfs.sh verify.sh.	2026-03-16 15:30:08 -03:00
Thales Maciel	375900cf65	Rollback partial dm snapshot startup Prevent partial VM startup failures from leaking loop devices and dm state on the host. Move root snapshot setup into a rollback-safe helper that records loop and mapper handles incrementally, tears them down in reverse order on failure, and reuses the same dm/loop cleanup path during normal runtime teardown. Also switch the daemon runner field to a small command-runner interface so the snapshot path can be tested with injected failures. Add failure-injection coverage for losetup, blockdev, dmsetup, partial teardown, and joined rollback errors. Validated with go test ./... and make build.	2026-03-16 14:06:17 -03:00
Thales Maciel	ea72ea26fe	Add Go daemon-driven VM control plane Replace the shell-only user workflow with `banger` and `bangerd`: Cobra commands, XDG/SQLite-backed state, managed VM and image lifecycle, and a Bubble Tea TUI for browsing and operating VMs. Keep Firecracker orchestration behind the daemon so VM specs become persistent objects, and add repo entrypoints for building, installing, and documenting the new flow while still delegating rootfs customization to the existing shell tooling. Harden the control plane around real usage by reclaiming Firecracker API sockets for the user, restarting stale daemons after rebuilds, and returning the correct `vm.create` payload so the CLI and TUI creation flow work reliably. Validation: `go test ./...`, `make build`, and a host-side smoke test with `./banger vm create --name codex-smoke`.	2026-03-16 12:52:54 -03:00

22 commits