Commit graph

4 commits

Author SHA1 Message Date
2606bfbabb
update: VMs survive banger update and rollback
Three load-bearing fixes that together let `banger update` (and its
auto-rollback path) restart the helper + daemon without killing
every running VM. New smoke scenarios prove the property end-to-end.

Bug fixes:

1. Disable the firecracker SDK's signal-forwarding goroutine. The
   default ForwardSignals = [SIGINT, SIGQUIT, SIGTERM, SIGHUP,
   SIGABRT] installs a handler in the helper that propagates the
   helper's SIGTERM (sent by systemd on `systemctl stop bangerd-
   root.service`) to every running firecracker child. Set
   ForwardSignals to an empty (non-nil) slice so setupSignals
   short-circuits at len()==0.

2. Add SendSIGKILL=no to bangerd-root.service. KillMode=process
   limits the initial SIGTERM to the helper main, but systemd
   still SIGKILLs leftover cgroup processes during the
   FinalKillSignal stage unless SendSIGKILL=no.

3. Route restart-helper / restart-daemon / wait-daemon-ready
   failures through rollbackAndRestart instead of rollbackAndWrap.
   rollbackAndWrap restored .previous binaries but didn't re-
   restart the failed unit, leaving the helper dead with the
   rolled-back binary on disk after a failed update.

Testing infrastructure (production binaries unaffected):

- Hidden --manifest-url and --pubkey-file flags on `banger update`
  let the smoke harness redirect the updater at locally-built
  release artefacts. Marked Hidden in cobra; not advertised in
  --help.
- FetchManifestFrom / VerifyBlobSignatureWithKey /
  FetchAndVerifySignatureWithKey export the existing logic against
  caller-supplied URL / pubkey. The default entry points still
  call them with the embedded canonical values.

Smoke scenarios:

- update_check: --check against fake manifest reports update
  available
- update_to_unknown: --to v9.9.9 fails before any host mutation
- update_no_root: refuses without sudo, install untouched
- update_dry_run: stages + verifies, no swap, version unchanged
- update_keeps_vm_alive: real swap to v0.smoke.0; same VM (same
  boot_id) answers SSH after the daemon restart
- update_rollback_keeps_vm_alive: v0.smoke.broken-bangerd ships a
  bangerd that passes --check-migrations but exits 1 as the
  daemon. The post-swap `systemctl restart bangerd` fails,
  rollbackAndRestart fires, the .previous binaries are restored
  and re-restarted; the same VM still answers SSH afterwards
- daemon_admin (separate prep): covers `banger daemon socket`,
  `bangerd --check-migrations --system`, `sudo banger daemon
  stop`

The smoke release builder generates a fresh ECDSA P-256 keypair
with openssl, signs SHA256SUMS cosign-compatibly, and serves
artefacts from a backgrounded python http.server.
verify_smoke_check_test.go pins the openssl/cosign signature
equivalence so the smoke release builder can't silently drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:08:08 -03:00
d867d61eb3
update: refresh install.toml commit + built_at from new binary
After `banger update` swaps binaries, install.toml needs to reflect
the just-installed identity. The previous code passed
buildinfo.Current().{Commit,BuiltAt} into installmeta.UpdateBuildInfo
— but buildinfo.Current() in the running CLI is the OLD pre-swap
binary's identity (we're it), not the staged one. install.toml's
version field got refreshed to target.Version while commit and
built_at stayed pinned at the previous release. `banger doctor`
compares the running CLI's three fields against install.toml's
three fields and so raised a false-positive drift warning on
every update.

Fix: after the swap, exec /usr/local/bin/banger version, parse the
three-line output, and write all three fields to install.toml. If
the exec fails for any reason we fall back to the old behaviour
(version + stale commit/built_at) with a warning, since install.toml
drift is a doctor warning not a broken host — same posture as
before for the failure path.

The parser is split out (parseVersionOutput) and table-tested:
happy path, whitespace-tolerance, missing-field rejection, empty
input rejection, ignoring unrelated lines.

Caught by running v0.1.0 → v0.1.1 live as the first end-to-end
smoke test of the self-update flow, which was the whole point of
that exercise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 14:38:59 -03:00
8ed351ea47
updater: cosign-blob signature verification on SHA256SUMS
Closes the v0.1.0 cosign requirement. Every banger update download
now goes through ECDSA-P256 verification before any binary is
trusted: SHA256SUMS.sig is fetched, base64-decoded, and verified
against the embedded BangerReleasePublicKey.

  * BangerReleasePublicKey: PEM-encoded ECDSA public key embedded
    at compile time. The current value is a sentinel PLACEHOLDER —
    the maintainer must replace it with the output of
    `cosign generate-key-pair`'s cosign.pub before cutting v0.1.0,
    and re-cut. Until they do, every `banger update` refuses with
    ErrSignatureRequired ("the maintainer must replace it and
    re-cut a release before update can proceed"). Loud refusal
    beats silent acceptance.
  * VerifyBlobSignature: parses the embedded public key, base64-
    decodes the signature, computes SHA256(body), runs ecdsa
    .VerifyASN1. cosign sign-blob produces the format
    VerifyASN1 verifies natively (ASN.1-DER encoded ECDSA over
    a SHA256 digest), so no third-party crypto deps needed.
  * FetchAndVerifySignature: pulls the signature URL from the
    release manifest entry, fetches it (1 KiB cap), and verifies
    against sumsBody. Refuses outright when sha256sums_sig_url is
    empty — v0.1.0 contract requires every release to be signed,
    and an unsigned release is a manifest publishing bug we'd
    rather catch loudly than silently accept.
  * Wired into banger update: sumsBody captured from
    DownloadRelease, immediately fed into FetchAndVerifySignature.
    A failed verification removes the staged tarball before
    returning so it can't be reused.
  * BangerReleasePublicKey is var (not const) only to support tests
    that swap in a generated keypair; production sets it at compile
    time and never mutates it.

Tests: placeholder-key path returns ErrSignatureRequired; happy
path with a fresh in-test ECDSA keypair verifies a real
sign-then-verify; tampered body, wrong key, and three malformed
signature shapes (not-base64, empty, garbage-DER) all reject.

Maintainer-cut workflow documented in BangerReleasePublicKey's
comment: cosign generate-key-pair → paste cosign.pub into the
constant → at release time, cosign sign-blob --key cosign.key
SHA256SUMS > SHA256SUMS.sig and publish.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 12:37:53 -03:00
92ca1aa96f
cli: add banger update command
Wires updater + the existing system-install helpers into a single
operator-facing flow:

  1. FetchManifest, resolve target release (default: latest_stable;
     override with --to vX.Y.Z).
  2. --check exits with a one-line "up to date" / "update available".
     Same as `banger update --check` style for tools polling on a
     timer.
  3. requireRoot beyond this point — we're about to write
     /usr/local/bin and talk to systemctl.
  4. daemon.operations.list → refuse if any operation isn't Done.
     --force overrides; per the v0.1.0 plan there's no drain wait.
  5. PrepareCleanStaging + DownloadRelease + StageTarball into
     /var/cache/banger/updates/.
  6. Sanity-run the staged binaries: `banger --version` must mention
     the expected version; `bangerd --check-migrations --system`
     must exit 0 (compatible) or 1 (will auto-migrate). Exit 2
     (incompatible) aborts before the swap.
  7. --dry-run stops here with a one-line plan, leaves staging.
  8. Swap (vsock → bangerd → banger) → restart bangerd-root then
     bangerd → waitForDaemonReady on the system socket.
  9. Run `banger doctor` against the JUST-INSTALLED CLI binary
     (not d.doctor in-process — we want to exercise the new binary
     end-to-end). FAIL triggers auto-rollback: restore .previous
     backups, restart services, surface the original failure with
     "(rolled back to previous install)".
  10. UpdateBuildInfo on /etc/banger/install.toml. CleanupBackups.
     Wipe staging dir.

rollbackAndWrap / rollbackAndRestart split: the former is for
failures BEFORE the systemctl restart (old binaries are still on
disk under .previous; the OLD daemon is still running because the
restart never happened). The latter is for failures AFTER, where
rollback ALSO needs another systemctl restart so the OLD versions
take over again. If even rollback's restart fails, we surface
everything we know — the install is broken and the operator gets
the breadcrumbs to fix it manually.

Existing TestNewBangerCommandHasExpectedSubcommands updated to
include "update" in the expected ordering.

Live exercise against the empty bucket today errors as expected:
$ banger update --check
banger: discover: fetch manifest: HTTP 404 Not Found  # exit 1
once the user publishes the first manifest the same command will
report "up to date" or "update available".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 12:35:04 -03:00