Three load-bearing fixes that together let `banger update` (and its
auto-rollback path) restart the helper + daemon without killing
every running VM. New smoke scenarios prove the property end-to-end.
Bug fixes:
1. Disable the firecracker SDK's signal-forwarding goroutine. The
default ForwardSignals = [SIGINT, SIGQUIT, SIGTERM, SIGHUP,
SIGABRT] installs a handler in the helper that propagates the
helper's SIGTERM (sent by systemd on
`systemctl stop bangerd-root.service`) to every running
firecracker child. Set ForwardSignals to an empty (non-nil) slice
so setupSignals short-circuits at len()==0.
2. Add SendSIGKILL=no to bangerd-root.service. KillMode=process
limits the initial SIGTERM to the helper main, but systemd
still SIGKILLs leftover cgroup processes during the
FinalKillSignal stage unless SendSIGKILL=no.
3. Route restart-helper / restart-daemon / wait-daemon-ready
failures through rollbackAndRestart instead of rollbackAndWrap.
rollbackAndWrap restored .previous binaries but didn't re-
restart the failed unit, leaving the helper dead with the
rolled-back binary on disk after a failed update.
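The nil versus empty-slice distinction behind fix 1 can be sketched as follows. This is a minimal model, not the SDK's code: effectiveSignals and forwardingEnabled are hypothetical stand-ins, and the sketch assumes the SDK applies the default list only when ForwardSignals is nil.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// defaultForwardSignals copies the default list quoted in the commit message.
var defaultForwardSignals = []os.Signal{
	syscall.SIGINT, syscall.SIGQUIT, syscall.SIGTERM, syscall.SIGHUP, syscall.SIGABRT,
}

// noForwarding is an empty but non-nil slice, the value the fix configures.
var noForwarding = []os.Signal{}

// effectiveSignals is a hypothetical stand-in for the SDK's defaulting step.
// Assumption: only a nil slice is treated as "unset" and replaced by defaults.
func effectiveSignals(configured []os.Signal) []os.Signal {
	if configured == nil {
		return defaultForwardSignals
	}
	return configured
}

// forwardingEnabled models the len()==0 short-circuit: with nothing to
// forward, no handler would be installed.
func forwardingEnabled(configured []os.Signal) bool {
	return len(effectiveSignals(configured)) > 0
}

func main() {
	fmt.Println("nil slice forwards:", forwardingEnabled(nil))
	fmt.Println("empty slice forwards:", forwardingEnabled(noForwarding))
}
```

The point of the sketch is that leaving the field unset and setting it to an empty slice are not equivalent, which is why the fix spells out "non-nil".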
Testing infrastructure (production binaries unaffected):
- Hidden --manifest-url and --pubkey-file flags on `banger update`
let the smoke harness redirect the updater at locally-built
release artefacts. Marked Hidden in cobra; not advertised in
--help.
- FetchManifestFrom / VerifyBlobSignatureWithKey /
FetchAndVerifySignatureWithKey export the existing logic against
caller-supplied URL / pubkey. The default entry points still
call them with the embedded canonical values.
Smoke scenarios:
- update_check: --check against a fake manifest reports that an
update is available
- update_to_unknown: --to v9.9.9 fails before any host mutation
- update_no_root: refuses without sudo, install untouched
- update_dry_run: stages + verifies, no swap, version unchanged
- update_keeps_vm_alive: real swap to v0.smoke.0; same VM (same
boot_id) answers SSH after the daemon restart
- update_rollback_keeps_vm_alive: v0.smoke.broken-bangerd ships a
bangerd that passes --check-migrations but exits 1 as the
daemon. The post-swap `systemctl restart bangerd` fails,
rollbackAndRestart fires, the .previous binaries are restored
and re-restarted; the same VM still answers SSH afterwards
- daemon_admin (separate prep): covers `banger daemon socket`,
`bangerd --check-migrations --system`, `sudo banger daemon
stop`
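The ordering that update_rollback_keeps_vm_alive exercises (restore the .previous binaries, then restart the failed unit) might be sketched like this. rollbackThenRestart and its callback parameters are hypothetical names for illustration; the real rollbackAndRestart may be shaped differently.

```go
package main

import (
	"errors"
	"fmt"
)

// errDemoRestart is an illustrative failure, not a real log line.
var errDemoRestart = errors.New("systemctl restart bangerd: exit status 1")

// rollbackThenRestart restores the previous binaries and then restarts the
// unit that failed, so the rolled-back binary is actually running again.
// It always returns an error, because the update itself did not succeed.
func rollbackThenRestart(cause error, restore, restart func() error) error {
	if err := restore(); err != nil {
		return fmt.Errorf("update failed: %w; rollback failed: %v", cause, err)
	}
	if err := restart(); err != nil {
		return fmt.Errorf("update failed: %w; rolled back, but restart failed: %v", cause, err)
	}
	return fmt.Errorf("update failed and was rolled back: %w", cause)
}

func main() {
	err := rollbackThenRestart(errDemoRestart,
		func() error { fmt.Println("restoring .previous binaries"); return nil },
		func() error { fmt.Println("restarting units"); return nil },
	)
	fmt.Println(err)
}
```

Taking the two steps as function values keeps the ordering testable without touching systemd or the filesystem.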
The smoke release builder generates a fresh ECDSA P-256 keypair
with openssl, signs SHA256SUMS cosign-compatibly, and serves
artefacts from a backgrounded python http.server.
verify_smoke_check_test.go pins the openssl/cosign signature
equivalence so the smoke release builder can't silently drift.
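A minimal sketch of the property such a test could pin, assuming the signature format is a base64-encoded ASN.1 DER ECDSA P-256 signature over the SHA-256 digest of the blob (my understanding of what cosign emits for blobs; treat it as an assumption). newKey, signBlob and verifyBlob are hypothetical helpers, not the project's functions.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// newKey generates a throwaway P-256 keypair, as the smoke builder does with openssl.
func newKey() (*ecdsa.PrivateKey, error) {
	return ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
}

// signBlob signs SHA-256(blob) and returns the DER signature base64-encoded.
func signBlob(key *ecdsa.PrivateKey, blob []byte) (string, error) {
	digest := sha256.Sum256(blob)
	der, err := ecdsa.SignASN1(rand.Reader, key, digest[:])
	if err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(der), nil
}

// verifyBlob checks a base64 DER signature against SHA-256(blob).
func verifyBlob(pub *ecdsa.PublicKey, blob []byte, sigB64 string) bool {
	der, err := base64.StdEncoding.DecodeString(sigB64)
	if err != nil {
		return false
	}
	digest := sha256.Sum256(blob)
	return ecdsa.VerifyASN1(pub, digest[:], der)
}

func main() {
	key, err := newKey()
	if err != nil {
		panic(err)
	}
	blob := []byte("example SHA256SUMS content")
	sig, err := signBlob(key, blob)
	if err != nil {
		panic(err)
	}
	fmt.Println("verifies:", verifyBlob(&key.PublicKey, blob, sig))
}
```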
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.1.4 fixed the binary-level reconcile path for jailer'd VMs but
left a hole at the systemd layer: bangerd.service and bangerd-root.service
both defaulted to RuntimeDirectoryPreserve=no, so /run/banger was
wiped on every daemon stop. The api-sock symlinks the helper creates
for live VMs (`/run/banger/fc-<id>.sock` → `<chroot>/firecracker.socket`)
went with it, and findByJailerPidfile — which derives the chroot
from the symlink target — couldn't resolve them. Reconcile then fell
through to "stale_vm" and tore down the surviving FC's dm-snapshot.
Add RuntimeDirectoryPreserve=yes to both unit templates so the
symlinks survive the restart window. Live-verified end-to-end on
the dev host: started a VM under v0.1.5, restarted helper +
daemon, confirmed the FC PID was unchanged and `banger vm ssh`
returned the same boot_id pre and post.
Daemon-lifecycle tests updated to assert the new directive is
present in both rendered units so future regressions show up at
test time.
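An assertion of that kind could look roughly like the sketch below. hasDirective and sampleUnit are hypothetical: the sample is not the project's real unit template, and only the directives named in these commit messages are taken from the source.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// sampleUnit is an illustrative rendered unit, not the real template.
const sampleUnit = `[Unit]
Description=example helper

[Service]
KillMode=process
SendSIGKILL=no
RuntimeDirectoryPreserve=yes
`

// hasDirective reports whether the exact "Key=Value" line appears inside the
// named section of a rendered unit file. Comments and blank lines are skipped.
func hasDirective(unit, section, directive string) bool {
	current := ""
	sc := bufio.NewScanner(strings.NewReader(unit))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, ";"):
			continue
		case strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]"):
			current = line[1 : len(line)-1]
		case current == section && line == directive:
			return true
		}
	}
	return false
}

func main() {
	for _, d := range []string{"KillMode=process", "SendSIGKILL=no", "RuntimeDirectoryPreserve=yes"} {
		fmt.Printf("%s present: %v\n", d, hasDirective(sampleUnit, "Service", d))
	}
}
```

Matching on the section as well as the line avoids a false pass when a directive lands under the wrong header.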
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two coupled fixes that together make the daemon-restart path of
`banger update` non-destructive for running guests:
1. Unit templates set `KillMode=process` on bangerd.service and
bangerd-root.service. The default control-group behaviour signalled
every process in the cgroup on stop/restart (SIGTERM first, then
SIGKILL after the stop timeout), including jailer-spawned
firecracker children, since fork/exec doesn't escape a systemd
cgroup. With process mode only the unit's main PID is signalled; FC
children stay alive in the (unowned) cgroup until the new helper
instance starts up and re-claims them.
2. `fcproc.FindPID` falls back to the jailer-written pidfile at
`<chroot>/firecracker.pid` (sibling of the api-sock target) when
`pgrep -n -f <api-sock>` doesn't find a match. pgrep can't see
jailer'd FCs because their cmdline only carries the chroot-relative
`--api-sock /firecracker.socket`, not the host-side path. The
pidfile is jailer's actual record of the post-exec FC PID, so
reconcile can verify the surviving process is the right one
(comm == "firecracker") and re-seed handles.json without tearing
down the VM's dm-snapshot.
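The pidfile fallback described in item 2 can be sketched as below. This is an illustration under assumptions, not the real fcproc.FindPID: the function names are hypothetical, procRoot exists only so the check can be pointed at a fake /proc, and the example socket id is made up.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// pidfileFor returns the jailer pidfile that sits next to the api-sock target.
func pidfileFor(sockTarget string) string {
	return filepath.Join(filepath.Dir(sockTarget), "firecracker.pid")
}

// parsePid tolerates surrounding whitespace and rejects garbage or
// non-positive values.
func parsePid(raw string) (int, error) {
	pid, err := strconv.Atoi(strings.TrimSpace(raw))
	if err != nil {
		return 0, fmt.Errorf("pidfile content %q: %w", raw, err)
	}
	if pid <= 0 {
		return 0, fmt.Errorf("pidfile holds non-positive pid %d", pid)
	}
	return pid, nil
}

// findPIDViaPidfile resolves the api-sock symlink, reads the sibling pidfile,
// and confirms the process is still a firecracker by checking its comm.
func findPIDViaPidfile(apiSock, procRoot string) (int, error) {
	target, err := os.Readlink(apiSock) // errors if apiSock is not a symlink
	if err != nil {
		return 0, err
	}
	if !filepath.IsAbs(target) {
		target = filepath.Join(filepath.Dir(apiSock), target)
	}
	raw, err := os.ReadFile(pidfileFor(target))
	if err != nil {
		return 0, err
	}
	pid, err := parsePid(string(raw))
	if err != nil {
		return 0, err
	}
	comm, err := os.ReadFile(filepath.Join(procRoot, strconv.Itoa(pid), "comm"))
	if err != nil {
		return 0, fmt.Errorf("pid %d looks stale: %w", pid, err)
	}
	if got := strings.TrimSpace(string(comm)); got != "firecracker" {
		return 0, fmt.Errorf("pid %d is %q, not firecracker", pid, got)
	}
	return pid, nil
}

func main() {
	// Hypothetical socket id; on a host without it this prints the lookup error.
	pid, err := findPIDViaPidfile("/run/banger/fc-example.sock", "/proc")
	fmt.Println(pid, err)
}
```

Each early return lines up with one of the failure cases the tests list: non-symlink api-sock, missing pidfile, garbage content, stale pid, and wrong comm.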
Verified live on the dev host: started a VM, restarted the helper
unit, restarted the daemon unit, and confirmed the FC PID was
unchanged, vm list still showed the guest as running, and
`banger vm ssh` returned the same boot_id pre and post restart.
The systemd journal now reports "firecracker remains running after
unit stopped" and "Found left-over process X (firecracker) in
control group while starting unit. Ignoring." — exactly the shape
`KillMode=process` is supposed to produce.
Tests cover both the parser (parseVersionOutput from the v0.1.2
fix) and the new pidfile lookup: happy path, missing pidfile,
stale pid, wrong comm, garbage content, non-symlink api-sock,
whitespace tolerance.
CHANGELOG corrects v0.1.0's misleading "daemon restarts do not
interrupt running guests" line and documents the unit-refresh
caveat: existing v0.1.0–v0.1.3 installs need a one-time
`sudo banger system install` after updating to v0.1.4 to pick up
the new KillMode directive (`banger update` swaps binaries, not
unit files).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each VM's firecracker now runs inside a per-VM chroot dropped to the
registered owner UID via firecracker-jailer. Closes the broad ambient-
sudo escalation surface that survived Phase A: the helper still needs
caps for tap/bridge/dm/loop/iptables, but the VMM itself no longer
runs as root in the host root filesystem.
The host helper stages each chroot up front: hard-links the kernel
and (optional) initrd, mknods block-device drives + /dev/vhost-vsock,
copies in the firecracker binary (jailer opens it O_RDWR so a ro bind
fails with EROFS), and bind-mounts /usr/lib + /lib trees read-only so
the dynamic linker can resolve shared libraries. It self-binds the
chroot first so the findmnt-guarded cleanup can recurse safely.
AF_UNIX sun_path is 108 bytes; the chroot path easily blows past that.
Daemon-side launch pre-symlinks the short request socket path to the
long chroot socket before Machine.Start so the SDK's poll/connect
sees the short path while the kernel resolves to the chroot socket.
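The short-path aliasing can be sketched as follows. fitsSunPath and aliasSocket are hypothetical helpers and the example paths are invented; the sketch only assumes Linux's 108-byte sun_path, which has to hold the path plus a trailing NUL.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// sunPathMax is the size of sockaddr_un.sun_path on Linux, NUL included.
const sunPathMax = 108

// fitsSunPath reports whether path can be used directly as an AF_UNIX address.
func fitsSunPath(path string) bool {
	return len(path) < sunPathMax
}

// aliasSocket points shortPath at longPath with a symlink, replacing any
// stale alias. Callers then connect to shortPath and the kernel follows the
// link to the real socket inside the chroot.
func aliasSocket(shortPath, longPath string) error {
	if !fitsSunPath(shortPath) {
		return fmt.Errorf("alias %q is %d bytes, too long for sun_path", shortPath, len(shortPath))
	}
	if err := os.Remove(shortPath); err != nil && !errors.Is(err, fs.ErrNotExist) {
		return err
	}
	return os.Symlink(longPath, shortPath)
}

func main() {
	// Invented example paths, for length comparison only.
	long := "/var/lib/example-jail/firecracker/0123456789abcdef0123456789abcdef/root/run/nested/deeper/still/firecracker.socket"
	fmt.Println("long path fits:", fitsSunPath(long))
	fmt.Println("short path fits:", fitsSunPath("/run/banger/fc-example.sock"))
}
```

The symlink has to exist before anything polls or connects to the short path, which is why the commit creates it ahead of Machine.Start.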
--new-pid-ns is intentionally disabled — jailer's PID-namespace fork
makes the SDK see the parent exit and tear the API socket down too
early.
CapabilityBoundingSet for the helper expands to add CAP_FOWNER,
CAP_KILL, CAP_MKNOD, CAP_SETGID, CAP_SETUID, CAP_SYS_CHROOT alongside
the existing CAP_CHOWN/CAP_DAC_OVERRIDE/CAP_NET_ADMIN/CAP_NET_RAW/
CAP_SYS_ADMIN.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move the supported systemd path to two services: an owner-user bangerd for
orchestration and a narrow root helper for bridge/tap, NAT/resolver, dm/loop,
and Firecracker ownership. This removes repeated sudo from daily vm and image
flows without leaving the general daemon running as root.
Add install metadata, system install/status/restart/uninstall commands, and a
system-owned runtime layout. Keep user SSH/config material in the owner home,
lock file_sync to the owner home, and move daemon known_hosts handling out of
the old root-owned control path.
Route privileged lifecycle steps through typed privilegedOps calls, harden the
two systemd units, and rewrite smoke plus docs around the supported service
model.
Verified with make build, make test, make lint, and make smoke on the
supported systemd host path.