Defence-in-depth pass over every helper method that touches the host
as root. Each fix narrows what a compromised owner-uid daemon could
ask the helper to do; many close concrete file-ownership and DoS
primitives that the previous validators didn't reach.
Path / identifier validation:
* priv.fsck_snapshot now requires /dev/mapper/fc-rootfs-* (was
"is the string non-empty"). e2fsck -fy on /dev/sda1 was the
motivating exploit.
* priv.kill_process and priv.signal_process now read
/proc/<pid>/cmdline and require a "firecracker" substring before
sending the signal. Killing arbitrary host PIDs (sshd, init, …)
is no longer a one-RPC primitive.
* priv.read_ext4_file and priv.write_ext4_files now require the
image path to live under StateDir or be /dev/mapper/fc-rootfs-*.
* priv.cleanup_dm_snapshot validates every non-empty Handles field:
DM name fc-rootfs-*, DM device /dev/mapper/fc-rootfs-*, loops
/dev/loopN.
* priv.remove_dm_snapshot accepts only fc-rootfs-* names or
/dev/mapper/fc-rootfs-* paths.
* priv.ensure_nat now requires a parsable IPv4 address and a
banger-prefixed tap.
* priv.sync_resolver_routing and priv.clear_resolver_routing now
require a Linux iface-name-shaped bridge name (1–15 chars, no
whitespace/'/'/':') and, for sync, a parsable resolver address.
Symlink defence:
* priv.ensure_socket_access now validates the socket path is under
RuntimeDir and not a symlink. The fcproc layer's chown/chmod
moves to unix.Open(O_PATH|O_NOFOLLOW) + Fchownat(AT_EMPTY_PATH)
+ Fchmodat via /proc/self/fd, so even a swap of the leaf into a
symlink between validation and the syscall is refused. The
local-priv (non-root) fallback uses `chown -h`.
* priv.cleanup_jailer_chroot rejects symlinks at both the leaf
(os.Lstat) and intermediate path components (filepath.EvalSymlinks
+ clean-equality). The umount sweep was rewritten from shell
`umount --recursive --lazy` to direct unix.Unmount(MNT_DETACH |
UMOUNT_NOFOLLOW) per child mount, deepest-first; the findmnt
guard remains as the rm-rf safety net. Local-priv mode falls
back to `sudo umount --lazy`.
Binary validation:
* validateRootExecutable now opens with O_PATH|O_NOFOLLOW and
Fstats through the resulting fd. Rejects path-level symlinks and
narrows the TOCTOU window between validation and the SDK's exec
to fork+exec time on a healthy host.
Daemon socket:
* The owner daemon now reads SO_PEERCRED on every accepted
connection and refuses any UID that isn't 0 or the registered
owner. Filesystem perms (0600 + ownerUID) already enforced this;
the check is belt-and-braces in case the socket FD is ever
leaked to a non-owner process.
Docs:
* docs/privileges.md walked end-to-end. Each helper RPC's
Validation gate row reflects what the code actually enforces.
New section "Running outside the system install" calls out the
looser dev-mode trust model (NOPASSWD sudoers, helper hardening
bypassed) so users don't deploy that path on shared hosts.
Trust list updated to include every new validator.
Tests added: validators (DM-loop, DM-remove-target, DM-handles,
ext4-image-path, iface-name, IPv4, resolver-addr, not-symlink,
firecracker-PID, root-executable variants), the daemon's authorize
path (non-unix conn rejection + unix conn happy path), the umount2
ordering contract (deepest-first + --lazy on the sudo branch), and
positive/negative cases for the chown-no-follow fallback.
Verified end-to-end via `make smoke JOBS=4` on a KVM host.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move the supported systemd path to two services: an owner-user bangerd for
orchestration and a narrow root helper for bridge/tap, NAT/resolver, dm/loop,
and Firecracker ownership. This removes repeated sudo from daily vm and image
flows without leaving the general daemon running as root.
Add install metadata, system install/status/restart/uninstall commands, and a
system-owned runtime layout. Keep user SSH/config material in the owner home,
lock file_sync to the owner home, and move daemon known_hosts handling out of
the old root-owned control path.
Route privileged lifecycle steps through typed privilegedOps calls, harden the
two systemd units, and rewrite smoke plus docs around the supported service
model.
Verified with make build, make test, make lint, and make smoke on the
supported systemd host path.
Factor the service + capability wiring out of Daemon.Open() into
wireServices(d), an idempotent helper that constructs HostNetwork,
ImageService, WorkspaceService, and VMService from whatever
infrastructure (runner, store, config, layout, logger, closing) is
already set on d. Open() calls it once after filling the composition
root; tests that build &Daemon{...} literals call it to get a working
service graph, preinstalling stubs on the fields they want to fake.
Drops the four lazy-init getters on *Daemon — d.hostNet(),
d.imageSvc(), d.workspaceSvc(), d.vmSvc() — whose sole purpose was
keeping test literals working. Every production call site now reads
d.net / d.img / d.ws / d.vm directly; the services are guaranteed
non-nil once Open returns. No behavior change.
Mechanical: all existing `d.xxxSvc()` calls (production + tests)
rewritten to field access; each `d := &Daemon{...}` in tests gets a
trailing wireServices(d) so the literal + wiring are side-by-side.
Tests that override a pre-built service (e.g. d.img = &ImageService{
bundleFetch: stub}) now set the override before wireServices so the
replacement propagates into VMService's peer pointer.
Also nil-guards HostNetwork.stopVMDNS and d.store in Close() so
partially-initialised daemons (pre-reconcile open failure) still
tear down cleanly — same contract the old lazy getters provided.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Second phase of splitting the daemon god-struct. ImageService now owns
all image + kernel registry operations: register/promote/delete/pull
for images (bundle + OCI paths), the six kernel commands, and the
shared SSH-key/work-seed injection helpers. imageOpsMu (the
publication-window lock) lives on the service; so do the three OCI
pull test seams pullAndFlatten / finalizePulledRootfs / bundleFetch.
The four files images.go, images_pull.go, image_seed.go, kernels.go
flipped their receivers from *Daemon to *ImageService.
FindImage moved with the service. Daemon keeps a thin FindImage
forwarder so callers reading the dispatch code see the obvious
facade and tests that pre-date the split still compile.
flattenNestedWorkHome — called from image_seed.go, vm_authsync.go,
and vm_disk.go across future service boundaries — became a
package-level helper taking a CommandRunner explicitly. Daemon keeps
a deprecated forwarder for now; the other services will use the
package form.
Lazy-init helper imageSvc() on Daemon mirrors hostNet() from
Phase 1, so test literals like &Daemon{store: db, runner: r, ...}
that don't spell out an ImageService still get a working one.
Tests that override the image test seams (autopull_test,
concurrency_test, images_pull_test, images_pull_bundle_test) now
assign d.img = &ImageService{...seams...}; the two-statement pattern
matches what Phase 1 established for HostNetwork.
Dispatch in daemon.go is cleaner now: every image/kernel RPC handler
is a single-liner forwarding to d.imageSvc().*. Phase 5 will do the
same for VM lifecycle.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The web UI shipped as "experimental" and was never finished — no nav
off the dashboard, no live updates, no settled design, never a
supported surface. It was opt-in by default already; leaving the code
in the tree for v0.1.0 only invited "does this work?" questions and
kept HostSummary/BangerSummary/SudoStatus types on the public RPC
surface that nothing else uses.
Removed:
internal/webui/ (all Go + templates + assets)
internal/daemon/web.go (server start / Layout / Config / ListVMs / ListImages)
internal/daemon/dashboard.go (DashboardSummary aggregator)
Simplified:
internal/api/types.go drop WebURL on PingResult, drop
HostSummary / SudoStatus / BangerSummary /
DashboardSummary / DashboardSummaryResult
internal/model/types.go drop DaemonConfig.WebListenAddr
internal/config/config.go drop web_listen_addr from fileConfig + Load
internal/daemon/daemon.go drop webListener / webServer / webURL fields +
startWebServer() call + ping WebURL population
internal/cli/banger.go `daemon status` output no longer branches on web
internal/daemon/{doc.go,ARCHITECTURE.md} drop web UI sections
README.md drop web_listen_addr config bullet + security paragraph
Tests updated to reflect the new shape. Coverage 57.3 -> 58.9% (the
webui package was largely untested; its removal lifts the ratio
without moving the numerator). `banger daemon status` output and
--help are web-free. Lint + full suite green.
The `image build` flow spun up a transient Firecracker VM, SSHed in,
and ran a large bash provisioning script to derive a new managed
image from an existing one. It overlapped heavily with the golden-
image Dockerfile flow (same mise/docker/tmux/opencode install logic
duplicated in Go as `imagemgr.BuildProvisionScript`) and had far more
machinery: async op state, RPC begin/status/cancel, webui form +
operation page, preflight checks, API types, tests. For custom
images, writing a Dockerfile is simpler and more reproducible.
Removed end-to-end:
- CLI `image build` subcommand + `absolutizeImageBuildPaths`.
- Daemon: BuildImage method, imagebuild.go (transient-VM orchestration),
image_build_ops.go (async begin/status/cancel), imagemgr/build.go
(the 247-line provisioning script generator and all its append*
helpers), validateImageBuildPrereqs + addImageBuildPrereqs.
- RPC dispatches for image.build / .begin / .status / .cancel.
- opstate registry `imageBuildOps`, daemon seam `imageBuild`,
background pruner call.
- API types: ImageBuildParams, ImageBuildOperation, ImageBuildBeginResult,
ImageBuildStatusParams, ImageBuildStatusResult; model type
ImageBuildRequest.
- Web UI: Backend interface methods, handlers, form, routes, template
branches (images.html build form, operation.html build branch,
dashboard.html Build button).
- Tests that directly exercised BuildImage.
Doctor polish (task C):
- Drop the "image build" preflight section entirely (its raison d'être
is gone).
- Default-image check now accepts "not local but in imagecat" as OK:
vm create auto-pulls on first use. Only flag when the image is
neither locally registered nor in the catalog.
Net: 24 files touched, 1,373 lines deleted, 25 added.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the old `void-exp` repository defaults with `void` so the Make targets,
registration helper, example config, verification messaging, and sample test
fixtures all line up with the new managed image name.
Keep the scope to repo-facing naming only: config overrides, helper output, and
test fixtures now expect `void`, while runtime compatibility for existing local
`void-exp` VMs remains an operational concern outside this commit.
Validation: go test ./..., make build, and a local `banger vm create --image void`
smoke boot with ssh and opencode ports up.
Treat `banger`, `bangerd`, and `banger-vsock-agent` as one release by
stamping the same version, commit SHA, and build timestamp into every
binary through a shared ldflag-backed `internal/buildinfo` package.
Add `banger version`, extend daemon ping/status to report the running
daemon's build tuple, and keep the guest helper linked to the same build
metadata without adding a new public version surface for it.
Validate with `GOCACHE=/tmp/banger-gocache go test ./...`, `make build`,
`./build/bin/banger version`, and `./build/bin/banger daemon status`
after the daemon restarts onto the new binary.
Hard-cut banger away from source-checkout runtime bundles as an implicit source of\nimage and host defaults. Managed images now own their full boot set,\nimage build starts from an existing registered image, and daemon startup\nno longer synthesizes a default image from host paths.\n\nResolve Firecracker from PATH or firecracker_bin, make SSH keys config-owned\nwith an auto-managed XDG default, replace the external name generator and\npackage manifests with Go code, and keep the vsock helper as a companion\nbinary instead of a user-managed runtime asset.\n\nUpdate the manual scripts, web/CLI forms, config surface, and docs around\nthe new build/manual flow and explicit image registration semantics.\n\nValidation: GOCACHE=/tmp/banger-gocache go test ./..., bash -n scripts/*.sh,\nand make build.
Separate tracked source from generated artifacts so the repo root stops accumulating helper scripts, manifests, and local runtime outputs.
Move manual shell entrypoints under scripts/, manifests under config/, and the Firecracker API reference under docs/reference/. Make build and runtimebundle now target build/bin, build/runtime, and build/dist as the canonical source-checkout paths.
Update runtime discovery, helper scripts, tests, and docs to follow the new layout while keeping legacy source-checkout runtime fallbacks for existing local bundles during migration.
Validated with bash -n on the moved scripts, make build, and GOCACHE=/tmp/banger-gocache go test ./....
Stop relying on ad hoc rootfs handling by adding image promotion, managed work-seed fingerprint metadata, and lazy self-healing for older managed images after the first create.
Rebuild guest images with baked SSH access, a guest NIC bootstrap, and default opencode services, and add the staged Void kernel/initramfs/modules workflow so void-exp uses a matching Void boot stack.
Replace the opaque blocking vm.create RPC with a begin/status flow that prints live stages in the CLI while still waiting for vsock health and opencode on guest port 4096.
Validate with GOCACHE=/tmp/banger-gocache go test ./... and live void-exp create/delete smoke runs.
Make iterating on a Firecracker-friendly Void guest practical without replacing the Debian default image path.
Add local Void rootfs build/register/verify plumbing, a language-agnostic dev package baseline, and guest SSH/work-disk hardening so new images use the runtime bundle key, keep a normal root bash environment, and repair stale nested /root layouts on restart.
Replace the guest PING/PONG responder with an HTTP /healthz agent over vsock, rename the runtime bundle and config surface from ping helper to agent while still accepting the legacy keys, and route the post-SSH reminder through the new vm.health path.
Validated with GOCACHE=/tmp/banger-gocache go test ./..., make build, bash -n customize.sh make-rootfs-void.sh, and git diff --check.
Serve daemon-managed .vm names directly from bangerd on 127.0.0.1:42069 instead of shelling out to mapdns. This keeps DNS state tied to VM lifecycle and lets the daemon rebuild records from running VMs after startup or reconcile.
Add a small in-process authoritative DNS server, register and remove records from the VM start/stop/delete paths, and show the listener in daemon status. Remove the mapdns config and preflight surface, stop helper-flow DNS publishing in customize.sh and interactive.sh, drop dns.sh from the runtime bundle, and update docs/tests for the new local-resolver integration model.
Validated with GOCACHE=/tmp/banger-gocache go test ./..., GOCACHE=/tmp/banger-gocache make build, and bash -n customize.sh interactive.sh.
Stop long-running daemon operations from running under context.Background by\nthreading a request-scoped context from handleConn into dispatch. The daemon\nnow cancels in-flight handlers when the client socket goes away, and the RPC\nclient closes its Unix connection when the caller context is canceled so that\ninterrupts actually reach the daemon boundary.\n\nAdd regression coverage for both sides of the path: canceled dispatch calls,\nclient disconnects during handleConn, watcher EOF cancellation, and context\ncancellation without an RPC deadline.\n\nValidated with GOCACHE=/tmp/banger-gocache go test ./... and\nGOCACHE=/tmp/banger-gocache make build.
Upgrades from the pre-bundle layout can leave the unmanaged default image pointing at repo-root artifacts that no longer exist, causing vm start preflight to fail forever. Treat the default image as reconcilable state instead of bailing out when a record exists.
Compute the desired default image from current runtime/config defaults, update an existing unmanaged default in place when its paths diverge, keep managed defaults untouched, and preserve the original ID so existing VMs keep referencing it. Added regression tests covering creation, no-op, reconciliation, managed-image skip, and missing-artifact behavior.
Stop assuming one workstation layout for runtime artifacts, mapdns, and host tooling. The daemon and shell helpers now use portable mapdns configuration, and runtime bundles can carry bundle.json metadata for their default kernel, initrd, modules, rootfs, and helper paths.
Load bundle metadata through config with a legacy layout fallback, thread mapdns_bin/mapdns_data_file through the Go and shell paths, and add command-scoped preflight checks for VM start, NAT, image build, work-disk resize, and SSH so missing tools or artifacts fail with actionable errors.
Update the runtime-bundle manifest, docs, and tests to match the new model. Verified with go test ./..., make build, and bash -n customize.sh interactive.sh dns.sh make-rootfs.sh verify.sh.
Fix the misleading make install path where banger and bangerd still depended on a repo checkout for Firecracker, guest artifacts, image builds, and the SSH key.
Replace repo-root inference with an explicit runtime bundle model: resolve a runtime_dir from env/config/install layout, derive concrete artifact paths from it, and update the daemon, CLI, and image-build flow to use those paths. Keep repo_root only as an explicit compatibility alias instead of auto-detecting it.
Teach customize.sh to run from a read-only bundled runtime tree while writing transient state under XDG/BANGER_STATE_DIR, and make make install copy the runtime assets into PREFIX/lib/banger so installed binaries stay usable outside the repo.
Validate with go test ./..., make build, bash -n customize.sh, and make install DESTDIR=/tmp/banger-install PREFIX=/usr. An out-of-repo installed-binary smoke test was attempted, but this sandbox blocked bangerd from binding its Unix socket (setsockopt: operation not permitted).