banger/internal/daemon
Thales Maciel cec7291184
Survive banger update with running VMs
Two coupled fixes that together make the daemon-restart path of
`banger update` non-destructive for running guests:

1. Unit templates set `KillMode=process` on bangerd.service and
   bangerd-root.service. The default `KillMode=control-group` killed
   every process in the unit's cgroup on stop/restart (SIGTERM, then
   SIGKILL for anything still alive after the stop timeout), including
   jailer-spawned firecracker children, since fork/exec doesn't escape
   a systemd cgroup. With process mode only the unit's main PID is
   signalled; FC children stay alive in the (unowned) cgroup until the
   new helper instance starts up and re-claims them.

2. `fcproc.FindPID` falls back to the jailer-written pidfile at
   `<chroot>/firecracker.pid` (sibling of the api-sock target) when
   `pgrep -n -f <api-sock>` doesn't find a match. pgrep can't see
   jailer'd FCs because their cmdline only carries the chroot-relative
   `--api-sock /firecracker.socket`, not the host-side path. The
   pidfile is jailer's actual record of the post-exec FC PID, so
   reconcile can verify the surviving process is the right one
   (comm == "firecracker") and re-seed handles.json without tearing
   down the VM's dm-snapshot.

Verified live on the dev host: started a VM, restarted the helper
unit, restarted the daemon unit, and confirmed that the FC PID was
unchanged, `vm list` still showed the guest as running, and
`banger vm ssh` returned the same boot_id before and after the
restarts. The systemd journal now reports "firecracker remains
running after unit stopped" and "Found left-over process X
(firecracker) in control group while starting unit. Ignoring.",
which is exactly the shape `KillMode=process` is supposed to produce.

Tests cover both the parser (parseVersionOutput from the v0.1.2
fix) and the new pidfile lookup: happy path, missing pidfile,
stale pid, wrong comm, garbage content, non-symlink api-sock,
whitespace tolerance.

CHANGELOG corrects v0.1.0's misleading "daemon restarts do not
interrupt running guests" line and documents the unit-refresh
caveat: existing v0.1.0–v0.1.3 installs need a one-time
`sudo banger system install` after updating to v0.1.4 to pick up
the new KillMode directive (`banger update` swaps binaries, not
unit files).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:09:15 -03:00
dmsnap test: cover imagemgr + dmsnap helpers 2026-04-28 15:13:49 -03:00
fcproc Survive banger update with running VMs 2026-04-29 17:09:15 -03:00
imagemgr test: cover imagemgr + dmsnap helpers 2026-04-28 15:13:49 -03:00
opstate opstate,daemon: list in-flight operations via daemon.operations.list 2026-04-28 18:14:57 -03:00
workspace seams: move the last four package globals onto instance fields 2026-04-22 12:07:14 -03:00
ARCHITECTURE.md daemon: split owner daemon from root helper 2026-04-26 12:43:17 -03:00
autopull_test.go daemon: build a work-seed during image pull, refresh doctor check 2026-04-23 20:24:10 -03:00
capabilities.go daemon: surface previously-swallowed errors at warn 2026-04-26 22:30:51 -03:00
capabilities_test.go daemon: doctor passes vm dns when banger itself owns the port 2026-04-26 18:57:27 -03:00
concurrency_test.go daemon: build a work-seed during image pull, refresh doctor check 2026-04-23 20:24:10 -03:00
daemon.go cli,docs: trivial polish for v0.1.0 2026-04-28 17:31:54 -03:00
daemon_test.go roothelper: tighten input validation across privileged RPCs 2026-04-28 14:39:41 -03:00
daemon_testing_test.go test: add newTestDaemon harness + options 2026-04-22 17:45:43 -03:00
dispatch.go update: docs + publish script for the self-update feature 2026-04-29 12:43:46 -03:00
dispatch_test.go opstate,daemon: list in-flight operations via daemon.operations.list 2026-04-28 18:14:57 -03:00
dns_routing.go daemon: split owner daemon from root helper 2026-04-26 12:43:17 -03:00
dns_routing_test.go seams: move the last four package globals onto instance fields 2026-04-22 12:07:14 -03:00
doc.go daemon: split owner daemon from root helper 2026-04-26 12:43:17 -03:00
doctor.go cli,doctor: --version flag + CLI/install drift check 2026-04-28 17:53:32 -03:00
doctor_test.go update: docs + publish script for the self-update feature 2026-04-29 12:43:46 -03:00
fake_firecracker_test.go remove vm session feature 2026-04-20 12:47:58 -03:00
fastpath_test.go daemon: build the work disk fresh instead of cloning the seed file 2026-04-26 20:42:10 -03:00
guest_ssh.go remove vm session feature 2026-04-20 12:47:58 -03:00
host_network.go daemon: split owner daemon from root helper 2026-04-26 12:43:17 -03:00
image_cache.go image: add banger image cache prune for OCI cache cleanup 2026-04-28 16:32:57 -03:00
image_cache_test.go image: add banger image cache prune for OCI cache cleanup 2026-04-28 16:32:57 -03:00
image_seed.go daemon: serialise concurrent image/kernel pulls + atomic-rename seed refresh 2026-04-27 17:24:11 -03:00
image_service.go daemon: tighten concurrency around pulls, cleanup, and handle persistence 2026-04-27 19:32:43 -03:00
images.go daemon: tighten concurrency around pulls, cleanup, and handle persistence 2026-04-27 19:32:43 -03:00
images_helpers_test.go coverage: medium batch — hostnat runner, store guest-sessions, daemon helpers 2026-04-18 18:03:37 -03:00
images_pull.go daemon: build a work-seed during image pull, refresh doctor check 2026-04-23 20:24:10 -03:00
images_pull_bundle_test.go daemon: build a work-seed during image pull, refresh doctor check 2026-04-23 20:24:10 -03:00
images_pull_test.go daemon: build a work-seed during image pull, refresh doctor check 2026-04-23 20:24:10 -03:00
kernels.go daemon: tighten concurrency around pulls, cleanup, and handle persistence 2026-04-27 19:32:43 -03:00
kernels_test.go daemon split (6/n): extract wireServices + drop lazy service getters 2026-04-21 15:55:28 -03:00
lifecycle_flow_test.go test: end-to-end VMService lifecycle flow harness 2026-04-22 17:55:04 -03:00
logger.go cli,docs: trivial polish for v0.1.0 2026-04-28 17:31:54 -03:00
logger_test.go seams: move the last four package globals onto instance fields 2026-04-22 12:07:14 -03:00
nat.go daemon: split owner daemon from root helper 2026-04-26 12:43:17 -03:00
nat_capability_test.go daemon: persist tap device on VM.Runtime so NAT teardown survives handle-cache loss 2026-04-23 14:21:13 -03:00
nat_test.go vm state: split transient kernel/process handles off the durable schema 2026-04-19 14:18:13 -03:00
open_close_test.go daemon: split owner daemon from root helper 2026-04-26 12:43:17 -03:00
operations.go opstate,daemon: list in-flight operations via daemon.operations.list 2026-04-28 18:14:57 -03:00
preflight.go daemon: split owner daemon from root helper 2026-04-26 12:43:17 -03:00
privileged_ops.go firecracker: adopt firecracker-jailer for VM launch (Phase B) 2026-04-28 14:38:07 -03:00
runtime_assets.go daemon split (4/5): extract *VMService service 2026-04-20 20:57:05 -03:00
snapshot.go daemon: split owner daemon from root helper 2026-04-26 12:43:17 -03:00
snapshot_test.go daemon split (6/n): extract wireServices + drop lazy service getters 2026-04-21 15:55:28 -03:00
ssh_client_config.go daemon: split owner daemon from root helper 2026-04-26 12:43:17 -03:00
ssh_client_config_test.go daemon: split owner daemon from root helper 2026-04-26 12:43:17 -03:00
sshd_config_test.go daemon: delete flattenNestedWorkHome and normaliseHomeDirPerms 2026-04-23 18:33:06 -03:00
stats_service.go daemon: thread per-RPC op_id end-to-end 2026-04-26 22:13:44 -03:00
stats_service_test.go daemon: extract StatsService sibling; shrink VMService's surface 2026-04-23 15:46:59 -03:00
tap_pool.go daemon: split owner daemon from root helper 2026-04-26 12:43:17 -03:00
vm.go firecracker: adopt firecracker-jailer for VM launch (Phase B) 2026-04-28 14:38:07 -03:00
vm_authsync.go daemon: split owner daemon from root helper 2026-04-26 12:43:17 -03:00
vm_create.go daemon: tighten concurrency around pulls, cleanup, and handle persistence 2026-04-27 19:32:43 -03:00
vm_create_ops.go daemon: thread per-RPC op_id end-to-end 2026-04-26 22:13:44 -03:00
vm_create_test.go model: validate VM names as DNS labels at CLI + daemon 2026-04-23 14:06:40 -03:00
vm_disk.go system: mkfs work disks with lazy_itable_init + lazy_journal_init 2026-04-26 21:32:57 -03:00
vm_handles.go daemon: tighten concurrency around pulls, cleanup, and handle persistence 2026-04-27 19:32:43 -03:00
vm_handles_test.go daemon: persist teardown fallbacks and reject unsafe import paths 2026-04-23 16:21:59 -03:00
vm_lifecycle.go daemon: sync guest over ssh before stop to preserve workspace writes 2026-04-27 15:41:32 -03:00
vm_lifecycle_steps.go firecracker: adopt firecracker-jailer for VM launch (Phase B) 2026-04-28 14:38:07 -03:00
vm_lifecycle_steps_test.go daemon: extract startVMLocked into step runner with per-step rollback 2026-04-23 15:34:34 -03:00
vm_locks.go Move subsystem state/locks off Daemon into owning types 2026-04-15 15:58:33 -03:00
vm_service.go daemon: thread per-RPC op_id end-to-end 2026-04-26 22:13:44 -03:00
vm_set.go daemon: thread per-RPC op_id end-to-end 2026-04-26 22:13:44 -03:00
vm_test.go roothelper: tighten input validation across privileged RPCs 2026-04-28 14:39:41 -03:00
workspace.go feat(vm): add vm exec command with workspace dirty detection 2026-04-26 23:53:45 -03:00
workspace_rejection_test.go tests: targeted coverage for doctor, workspace rejections, and nat capability 2026-04-22 12:58:12 -03:00
workspace_service.go daemon: thread per-RPC op_id end-to-end 2026-04-26 22:13:44 -03:00
workspace_test.go feat(vm): add vm exec command with workspace dirty detection 2026-04-26 23:53:45 -03:00