Survive banger update with running VMs

Two coupled fixes that together make the daemon-restart path of
`banger update` non-destructive for running guests:

1. Unit templates set `KillMode=process` on bangerd.service and
   bangerd-root.service. The default control-group behaviour sent
   SIGKILL to every process in the cgroup on stop/restart — including
   jailer-spawned firecracker children, since fork/exec doesn't
   escape a systemd cgroup. With process mode only the unit's main
   PID is signalled; FC children stay alive in the (unowned)
   cgroup until the new helper instance starts up and re-claims them.

2. `fcproc.FindPID` falls back to the jailer-written pidfile at
   `<chroot>/firecracker.pid` (sibling of the api-sock target) when
   `pgrep -n -f <api-sock>` doesn't find a match. pgrep can't see
   jailer'd FCs because their cmdline only carries the chroot-relative
   `--api-sock /firecracker.socket`, not the host-side path. The
   pidfile is jailer's actual record of the post-exec FC PID, so
   reconcile can verify the surviving process is the right one
   (comm == "firecracker") and re-seed handles.json without tearing
   down the VM's dm-snapshot.

Verified live on the dev host: started a VM, restarted the helper
unit, restarted the daemon unit, and confirmed the FC PID was
unchanged, vm list still showed the guest as running, and
`banger vm ssh` returned the same boot_id pre and post restart.
The systemd journal now reports "firecracker remains running after
unit stopped" and "Found left-over process X (firecracker) in
control group while starting unit. Ignoring." — exactly the shape
`KillMode=process` is supposed to produce.

Tests cover both the parser (parseVersionOutput from the v0.1.2
fix) and the new pidfile lookup: happy path, missing pidfile,
stale pid, wrong comm, garbage content, non-symlink api-sock,
whitespace tolerance.

CHANGELOG corrects v0.1.0's misleading "daemon restarts do not
interrupt running guests" line and documents the unit-refresh
caveat: existing v0.1.0–v0.1.3 installs need a one-time
`sudo banger system install` after updating to v0.1.4 to pick up
the new KillMode directive (`banger update` swaps binaries, not
unit files).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Thales Maciel 2026-04-29 17:09:15 -03:00
parent 9c2e6a4647
commit cec7291184
No known key found for this signature in database
GPG key ID: 33112E6833C34679
5 changed files with 310 additions and 3 deletions

View file

@ -300,6 +300,13 @@ func renderSystemdUnit(meta installmeta.Metadata) string {
"ExecStart=" + systemBangerdBin + " --system",
"Restart=on-failure",
"RestartSec=1s",
// KillMode=process: only signal the main PID on stop/restart.
// The default (control-group) sends SIGKILL to every process in
// the unit's cgroup, including descendants — and during `banger
// update` we restart this unit, which would terminate any
// in-flight subprocesses spawned by the daemon. The daemon
// shuts its own children down explicitly when needed.
"KillMode=process",
"Environment=PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"Environment=TMPDIR=/run/banger",
"UMask=0077",
@ -350,6 +357,18 @@ func renderRootHelperSystemdUnit() string {
"ExecStart=" + systemBangerdBin + " --root-helper",
"Restart=on-failure",
"RestartSec=1s",
// KillMode=process is load-bearing: the helper unit's cgroup is
// where every banger-launched firecracker process lives (see
// validateFirecrackerPID). Without this, `systemctl restart
// bangerd-root.service` — which `banger update` runs — would
// SIGKILL every in-flight VM along with the helper because
// systemd's default KillMode=control-group nukes the whole cgroup.
// With process mode, only the helper PID is signaled; firecracker
// children survive, the new helper instance re-attaches via the
// helper RPC, daemon reconcile re-seeds in-memory state, VM keeps
// running. `banger system uninstall` and the daemon's vm-stop
// path explicitly stop firecracker processes when actually needed.
"KillMode=process",
"Environment=PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"Environment=TMPDIR=" + installmeta.DefaultRootHelperRuntimeDir,
"UMask=0077",

View file

@ -142,6 +142,7 @@ func TestRenderSystemdUnitIncludesHardeningDirectives(t *testing.T) {
"Wants=network-online.target bangerd-root.service",
"After=bangerd-root.service",
"Requires=bangerd-root.service",
"KillMode=process",
"UMask=0077",
"Environment=TMPDIR=/run/banger",
"NoNewPrivileges=yes",
@ -176,6 +177,7 @@ func TestRenderRootHelperSystemdUnitIncludesRequiredCapabilities(t *testing.T) {
for _, want := range []string{
"ExecStart=/usr/local/bin/bangerd --root-helper",
"KillMode=process",
"Environment=TMPDIR=/run/banger-root",
"NoNewPrivileges=yes",
"PrivateTmp=yes",