banger/internal
Thales Maciel 2606bfbabb
update: VMs survive banger update and rollback
Three load-bearing fixes that together let `banger update` (and its
auto-rollback path) restart the helper + daemon without killing
every running VM. New smoke scenarios prove the property end-to-end.

Bug fixes:

1. Disable the firecracker SDK's signal-forwarding goroutine. The
   default ForwardSignals = [SIGINT, SIGQUIT, SIGTERM, SIGHUP,
   SIGABRT] installs a handler in the helper that propagates the
   helper's SIGTERM (sent by systemd on `systemctl stop bangerd-
   root.service`) to every running firecracker child. Set
   ForwardSignals to an empty (non-nil) slice so setupSignals
   short-circuits at len()==0.

2. Add SendSIGKILL=no to bangerd-root.service. KillMode=process
   limits the initial SIGTERM to the helper main, but systemd
   still SIGKILLs leftover cgroup processes during the
   FinalKillSignal stage unless SendSIGKILL=no.

3. Route restart-helper / restart-daemon / wait-daemon-ready
   failures through rollbackAndRestart instead of rollbackAndWrap.
   rollbackAndWrap restored .previous binaries but didn't re-
   restart the failed unit, leaving the helper dead with the
   rolled-back binary on disk after a failed update.

Testing infrastructure (production binaries unaffected):

- Hidden --manifest-url and --pubkey-file flags on `banger update`
  let the smoke harness redirect the updater at locally-built
  release artefacts. Marked Hidden in cobra; not advertised in
  --help.
- FetchManifestFrom / VerifyBlobSignatureWithKey /
  FetchAndVerifySignatureWithKey export the existing logic against
  caller-supplied URL / pubkey. The default entry points still
  call them with the embedded canonical values.

Smoke scenarios:

- update_check: --check against fake manifest reports update
  available
- update_to_unknown: --to v9.9.9 fails before any host mutation
- update_no_root: refuses without sudo, install untouched
- update_dry_run: stages + verifies, no swap, version unchanged
- update_keeps_vm_alive: real swap to v0.smoke.0; same VM (same
  boot_id) answers SSH after the daemon restart
- update_rollback_keeps_vm_alive: v0.smoke.broken-bangerd ships a
  bangerd that passes --check-migrations but exits 1 as the
  daemon. The post-swap `systemctl restart bangerd` fails,
  rollbackAndRestart fires, the .previous binaries are restored
  and re-restarted; the same VM still answers SSH afterwards
- daemon_admin (separate prep): covers `banger daemon socket`,
  `bangerd --check-migrations --system`, `sudo banger daemon
  stop`

The smoke release builder generates a fresh ECDSA P-256 keypair
with openssl, signs SHA256SUMS cosign-compatibly, and serves
artefacts from a backgrounded python http.server.
verify_smoke_check_test.go pins the openssl/cosign signature
equivalence so the smoke release builder can't silently drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:08:08 -03:00
..
api opstate,daemon: list in-flight operations via daemon.operations.list 2026-04-28 18:14:57 -03:00
buildinfo Stamp shared build metadata into banger binaries 2026-03-22 17:14:06 -03:00
cli update: VMs survive banger update and rollback 2026-05-01 12:08:08 -03:00
config firecracker: adopt firecracker-jailer for VM launch (Phase B) 2026-04-28 14:38:07 -03:00
daemon Survive banger update with running VMs 2026-04-29 17:09:15 -03:00
download download: shared FetchVerified helper for capped + hashed downloads 2026-04-28 18:44:27 -03:00
firecracker update: VMs survive banger update and rollback 2026-05-01 12:08:08 -03:00
guest ssh: trust-on-first-use host key pinning everywhere 2026-04-19 16:46:03 -03:00
guestconfig Refactor VM lifecycle around capabilities 2026-03-18 19:28:26 -03:00
guestnet Stop using kernel IP autoconfig for runtime VMs 2026-03-21 21:54:18 -03:00
hostnat coverage: medium batch — hostnat runner, store guest-sessions, daemon helpers 2026-04-18 18:03:37 -03:00
imagecat imagecat,kernelcat: bound staged download, hash before extract 2026-04-28 16:09:55 -03:00
imagepull imagepull: reject symlink ancestors during OCI flatten 2026-04-28 15:20:46 -03:00
installmeta test: add installmeta tests 2026-04-30 10:49:22 -03:00
kernelcat imagecat,kernelcat: bound staged download, hash before extract 2026-04-28 16:09:55 -03:00
model model,cli,docs: medium-effort polish for v0.1.0 2026-04-28 17:36:03 -03:00
namegen coverage: make targets + close zero-cov gaps (namegen, sessionstream) 2026-04-18 17:44:37 -03:00
paths daemon: split owner daemon from root helper 2026-04-26 12:43:17 -03:00
policy Add vsock-backed VM port inspection 2026-03-19 15:52:11 -03:00
roothelper cli,docs: trivial polish for v0.1.0 2026-04-28 17:31:54 -03:00
rpc daemon: thread per-RPC op_id end-to-end 2026-04-26 22:13:44 -03:00
store store,bangerd: add --check-migrations flag for pre-swap schema check 2026-04-28 18:41:31 -03:00
system system: add AtomicReplace + Rollback for binary swap 2026-04-28 18:43:04 -03:00
toolingplan coverage: easy-wins batch across cli, system, paths, vmdns, toolingplan 2026-04-18 17:57:05 -03:00
updater update: VMs survive banger update and rollback 2026-05-01 12:08:08 -03:00
vmdns coverage: easy-wins batch across cli, system, paths, vmdns, toolingplan 2026-04-18 17:57:05 -03:00
vsockagent Add vsock-backed VM port inspection 2026-03-19 15:52:11 -03:00