banger/internal
Thales Maciel cec7291184
Survive banger update with running VMs
Two coupled fixes that together make the daemon-restart path of
`banger update` non-destructive for running guests:

1. Unit templates set `KillMode=process` on bangerd.service and
   bangerd-root.service. The default control-group behaviour sent
   SIGKILL to every process in the cgroup on stop/restart — including
   jailer-spawned firecracker children, since fork/exec doesn't
   escape a systemd cgroup. With process mode only the unit's main
   PID is signalled; FC children stay alive in the (unowned)
   cgroup until the new helper instance starts up and re-claims them.

2. `fcproc.FindPID` falls back to the jailer-written pidfile at
   `<chroot>/firecracker.pid` (sibling of the api-sock target) when
   `pgrep -n -f <api-sock>` doesn't find a match. pgrep can't see
   jailer'd FCs because their cmdline only carries the chroot-relative
   `--api-sock /firecracker.socket`, not the host-side path. The
   pidfile is jailer's actual record of the post-exec FC PID, so
   reconcile can verify the surviving process is the right one
   (comm == "firecracker") and re-seed handles.json without tearing
   down the VM's dm-snapshot.

Verified live on the dev host: started a VM, restarted the helper
unit, restarted the daemon unit, and confirmed the FC PID was
unchanged, vm list still showed the guest as running, and
`banger vm ssh` returned the same boot_id pre and post restart.
The systemd journal now reports "firecracker remains running after
unit stopped" and "Found left-over process X (firecracker) in
control group while starting unit. Ignoring." — exactly the shape
`KillMode=process` is supposed to produce.

Tests cover both the parser (parseVersionOutput from the v0.1.2
fix) and the new pidfile lookup: happy path, missing pidfile,
stale pid, wrong comm, garbage content, non-symlink api-sock,
whitespace tolerance.

CHANGELOG corrects v0.1.0's misleading "daemon restarts do not
interrupt running guests" line and documents the unit-refresh
caveat: existing v0.1.0–v0.1.3 installs need a one-time
`sudo banger system install` after updating to v0.1.4 to pick up
the new KillMode directive (`banger update` swaps binaries, not
unit files).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:09:15 -03:00
..
api opstate,daemon: list in-flight operations via daemon.operations.list 2026-04-28 18:14:57 -03:00
buildinfo Stamp shared build metadata into banger binaries 2026-03-22 17:14:06 -03:00
cli Survive banger update with running VMs 2026-04-29 17:09:15 -03:00
config firecracker: adopt firecracker-jailer for VM launch (Phase B) 2026-04-28 14:38:07 -03:00
daemon Survive banger update with running VMs 2026-04-29 17:09:15 -03:00
download download: shared FetchVerified helper for capped + hashed downloads 2026-04-28 18:44:27 -03:00
firecracker doctor: pin firecracker version range, distro-aware install hint 2026-04-28 17:47:42 -03:00
guest ssh: trust-on-first-use host key pinning everywhere 2026-04-19 16:46:03 -03:00
guestconfig Refactor VM lifecycle around capabilities 2026-03-18 19:28:26 -03:00
guestnet Stop using kernel IP autoconfig for runtime VMs 2026-03-21 21:54:18 -03:00
hostnat coverage: medium batch — hostnat runner, store guest-sessions, daemon helpers 2026-04-18 18:03:37 -03:00
imagecat imagecat,kernelcat: bound staged download, hash before extract 2026-04-28 16:09:55 -03:00
imagepull imagepull: reject symlink ancestors during OCI flatten 2026-04-28 15:20:46 -03:00
installmeta updater: download/stage/swap/rollback flow steps 2026-04-29 12:30:22 -03:00
kernelcat imagecat,kernelcat: bound staged download, hash before extract 2026-04-28 16:09:55 -03:00
model model,cli,docs: medium-effort polish for v0.1.0 2026-04-28 17:36:03 -03:00
namegen coverage: make targets + close zero-cov gaps (namegen, sessionstream) 2026-04-18 17:44:37 -03:00
paths daemon: split owner daemon from root helper 2026-04-26 12:43:17 -03:00
policy Add vsock-backed VM port inspection 2026-03-19 15:52:11 -03:00
roothelper cli,docs: trivial polish for v0.1.0 2026-04-28 17:31:54 -03:00
rpc daemon: thread per-RPC op_id end-to-end 2026-04-26 22:13:44 -03:00
store store,bangerd: add --check-migrations flag for pre-swap schema check 2026-04-28 18:41:31 -03:00
system system: add AtomicReplace + Rollback for binary swap 2026-04-28 18:43:04 -03:00
toolingplan coverage: easy-wins batch across cli, system, paths, vmdns, toolingplan 2026-04-18 17:57:05 -03:00
updater updater: embed real cosign public key for v0.1.0 release signing 2026-04-29 12:50:52 -03:00
vmdns coverage: easy-wins batch across cli, system, paths, vmdns, toolingplan 2026-04-18 17:57:05 -03:00
vsockagent Add vsock-backed VM port inspection 2026-03-19 15:52:11 -03:00