Three load-bearing fixes that together let `banger update` (and its
auto-rollback path) restart the helper + daemon without killing
every running VM. New smoke scenarios prove the property end-to-end.
Bug fixes:
1. Disable the firecracker SDK's signal-forwarding goroutine. The
default ForwardSignals = [SIGINT, SIGQUIT, SIGTERM, SIGHUP,
SIGABRT] installs a handler in the helper that propagates the
helper's SIGTERM (sent by systemd on
`systemctl stop bangerd-root.service`) to every running
firecracker child. Set ForwardSignals to an empty (non-nil) slice
so setupSignals short-circuits at len()==0.
2. Add SendSIGKILL=no to bangerd-root.service. KillMode=process
limits the initial SIGTERM to the helper main, but systemd
still SIGKILLs leftover cgroup processes during the
FinalKillSignal stage unless SendSIGKILL=no.
3. Route restart-helper / restart-daemon / wait-daemon-ready
failures through rollbackAndRestart instead of rollbackAndWrap.
rollbackAndWrap restored the .previous binaries but didn't
restart the failed unit again, leaving the helper dead with the
rolled-back binary on disk after a failed update.
Testing infrastructure (production binaries unaffected):
- Hidden --manifest-url and --pubkey-file flags on `banger update`
let the smoke harness redirect the updater at locally-built
release artefacts. Marked Hidden in cobra; not advertised in
--help.
- FetchManifestFrom / VerifyBlobSignatureWithKey /
FetchAndVerifySignatureWithKey export the existing logic against
caller-supplied URL / pubkey. The default entry points still
call them with the embedded canonical values.
Smoke scenarios:
- update_check: --check against fake manifest reports update
available
- update_to_unknown: --to v9.9.9 fails before any host mutation
- update_no_root: refuses without sudo, install untouched
- update_dry_run: stages + verifies, no swap, version unchanged
- update_keeps_vm_alive: real swap to v0.smoke.0; same VM (same
boot_id) answers SSH after the daemon restart
- update_rollback_keeps_vm_alive: v0.smoke.broken-bangerd ships a
bangerd that passes --check-migrations but exits 1 as the
daemon. The post-swap `systemctl restart bangerd` fails,
rollbackAndRestart fires, the .previous binaries are restored
and re-restarted; the same VM still answers SSH afterwards
- daemon_admin (separate prep): covers `banger daemon socket`,
`bangerd --check-migrations --system`, `sudo banger daemon
stop`
The smoke release builder generates a fresh ECDSA P-256 keypair
with openssl, signs SHA256SUMS cosign-compatibly, and serves
artefacts from a backgrounded python http.server.
verify_smoke_check_test.go pins the openssl/cosign signature
equivalence so the smoke release builder can't silently drift.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After `banger update` swaps binaries, install.toml needs to reflect
the just-installed identity. The previous code passed
buildinfo.Current().{Commit,BuiltAt} into installmeta.UpdateBuildInfo
— but buildinfo.Current() in the running CLI is the OLD pre-swap
binary's identity (we're it), not the staged one. install.toml's
version field got refreshed to target.Version while commit and
built_at stayed pinned at the previous release. `banger doctor`
compares the running CLI's three fields against install.toml's
three fields and so raised a false-positive drift warning on
every update.
Fix: after the swap, exec /usr/local/bin/banger version, parse the
three-line output, and write all three fields to install.toml. If
the exec fails for any reason we fall back to the old behaviour
(version + stale commit/built_at) with a warning, since install.toml
drift is a doctor warning, not a broken host. The failure path keeps
the same posture as before.
The parser is split out (parseVersionOutput) and table-tested:
happy path, whitespace-tolerance, missing-field rejection, empty
input rejection, ignoring unrelated lines.
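
A parser with those properties might look roughly like the sketch
below. The `key: value` line format, the field names, and the
parseVersionSketch / buildIdentity names are assumptions for
illustration; the real `banger version` output may differ.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// buildIdentity holds the three fields install.toml tracks.
type buildIdentity struct {
	Version, Commit, BuiltAt string
}

// parseVersionSketch reads "key: value" lines (an assumed format),
// tolerates surrounding whitespace, ignores unrelated lines, and
// rejects output that lacks any of the three fields.
func parseVersionSketch(out string) (buildIdentity, error) {
	var id buildIdentity
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		// Cut at the first colon only, so timestamps keep theirs.
		key, value, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		value = strings.TrimSpace(value)
		switch strings.TrimSpace(key) {
		case "version":
			id.Version = value
		case "commit":
			id.Commit = value
		case "built_at":
			id.BuiltAt = value
		}
	}
	if err := sc.Err(); err != nil {
		return buildIdentity{}, err
	}
	if id.Version == "" || id.Commit == "" || id.BuiltAt == "" {
		return buildIdentity{}, errors.New("version output is missing a required field")
	}
	return id, nil
}

func main() {
	// Made-up sample values, for illustration only.
	out := "version: v0.1.1\ncommit: abc1234\nbuilt_at: 2025-01-02T03:04:05Z\n"
	id, err := parseVersionSketch(out)
	fmt.Println(id, err)
}
```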
Caught by running v0.1.0 → v0.1.1 live as the first end-to-end
smoke test of the self-update flow, which was the whole point of
that exercise.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the v0.1.0 cosign requirement. Every banger update download
now goes through ECDSA-P256 verification before any binary is
trusted: SHA256SUMS.sig is fetched, base64-decoded, and verified
against the embedded BangerReleasePublicKey.
* BangerReleasePublicKey: PEM-encoded ECDSA public key embedded
at compile time. The current value is a sentinel PLACEHOLDER;
the maintainer must replace it with the cosign.pub written by
`cosign generate-key-pair` and re-cut the release before v0.1.0
ships. Until they do, every `banger update` refuses with
ErrSignatureRequired ("the maintainer must replace it and
re-cut a release before update can proceed"). Loud refusal
beats silent acceptance.
* VerifyBlobSignature: parses the embedded public key,
base64-decodes the signature, computes SHA256(body), and runs
ecdsa.VerifyASN1. cosign sign-blob produces the format that
VerifyASN1 verifies natively (an ASN.1 DER-encoded ECDSA
signature over a SHA256 digest), so no third-party crypto deps
are needed.
* FetchAndVerifySignature: pulls the signature URL from the
release manifest entry, fetches it (1 KiB cap), and verifies
against sumsBody. Refuses outright when sha256sums_sig_url is
empty — v0.1.0 contract requires every release to be signed,
and an unsigned release is a manifest publishing bug we'd
rather catch loudly than silently accept.
* Wired into banger update: sumsBody captured from
DownloadRelease, immediately fed into FetchAndVerifySignature.
A failed verification removes the staged tarball before
returning so it can't be reused.
* BangerReleasePublicKey is var (not const) only to support tests
that swap in a generated keypair; production sets it at compile
time and never mutates it.
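
The verification steps listed above can be sketched with the
standard library alone. This is an illustration under assumptions,
not the shipped code: verifyBlobSketch and demoKeyAndSignature are
hypothetical names, and the key is assumed to be a PKIX "PUBLIC KEY"
PEM block as cosign.pub is.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"crypto/x509"
	"encoding/base64"
	"encoding/pem"
	"errors"
	"fmt"
	"strings"
)

// verifyBlobSketch checks a base64 ASN.1-DER ECDSA signature over SHA256(body).
func verifyBlobSketch(pubPEM, body []byte, sigB64 string) error {
	block, _ := pem.Decode(pubPEM)
	if block == nil {
		return errors.New("public key is not PEM")
	}
	parsed, err := x509.ParsePKIXPublicKey(block.Bytes)
	if err != nil {
		return fmt.Errorf("parse public key: %w", err)
	}
	pub, ok := parsed.(*ecdsa.PublicKey)
	if !ok {
		return errors.New("public key is not ECDSA")
	}
	sig, err := base64.StdEncoding.DecodeString(strings.TrimSpace(sigB64))
	if err != nil {
		return fmt.Errorf("decode signature: %w", err)
	}
	if len(sig) == 0 {
		return errors.New("empty signature")
	}
	digest := sha256.Sum256(body)
	if !ecdsa.VerifyASN1(pub, digest[:], sig) {
		return errors.New("signature does not match body")
	}
	return nil
}

// demoKeyAndSignature generates a throwaway P-256 key and signs body,
// standing in for the maintainer's cosign key in this sketch.
func demoKeyAndSignature(body []byte) (pubPEM []byte, sigB64 string, err error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, "", err
	}
	der, err := x509.MarshalPKIXPublicKey(&key.PublicKey)
	if err != nil {
		return nil, "", err
	}
	digest := sha256.Sum256(body)
	sig, err := ecdsa.SignASN1(rand.Reader, key, digest[:])
	if err != nil {
		return nil, "", err
	}
	pubPEM = pem.EncodeToMemory(&pem.Block{Type: "PUBLIC KEY", Bytes: der})
	return pubPEM, base64.StdEncoding.EncodeToString(sig), nil
}

func main() {
	body := []byte("example SHA256SUMS contents\n")
	pubPEM, sig, err := demoKeyAndSignature(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(verifyBlobSketch(pubPEM, body, sig) == nil)               // true
	fmt.Println(verifyBlobSketch(pubPEM, []byte("tampered"), sig) == nil) // false
}
```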
Tests: placeholder-key path returns ErrSignatureRequired; happy
path with a fresh in-test ECDSA keypair verifies a real
sign-then-verify; tampered body, wrong key, and three malformed
signature shapes (not-base64, empty, garbage-DER) all reject.
Maintainer-cut workflow documented in BangerReleasePublicKey's
comment: cosign generate-key-pair → paste cosign.pub into the
variable → at release time, cosign sign-blob --key cosign.key
SHA256SUMS > SHA256SUMS.sig and publish.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires updater + the existing system-install helpers into a single
operator-facing flow:
1. FetchManifest, resolve target release (default: latest_stable;
override with --to vX.Y.Z).
2. --check exits with a one-line "up to date" / "update available",
so tools polling on a timer can call `banger update --check`.
3. requireRoot beyond this point — we're about to write
/usr/local/bin and talk to systemctl.
4. daemon.operations.list → refuse if any operation isn't Done.
--force overrides; per the v0.1.0 plan there's no drain wait.
5. PrepareCleanStaging + DownloadRelease + StageTarball into
/var/cache/banger/updates/.
6. Sanity-run the staged binaries: `banger --version` must mention
the expected version; `bangerd --check-migrations --system`
must exit 0 (compatible) or 1 (will auto-migrate). Exit 2
(incompatible) aborts before the swap.
7. --dry-run stops here with a one-line plan, leaves staging.
8. Swap (vsock → bangerd → banger) → restart bangerd-root then
bangerd → waitForDaemonReady on the system socket.
9. Run `banger doctor` against the JUST-INSTALLED CLI binary
(not d.doctor in-process — we want to exercise the new binary
end-to-end). FAIL triggers auto-rollback: restore .previous
backups, restart services, surface the original failure with
"(rolled back to previous install)".
10. UpdateBuildInfo on /etc/banger/install.toml. CleanupBackups.
Wipe staging dir.
rollbackAndWrap / rollbackAndRestart split: the former is for
failures BEFORE the systemctl restart (old binaries are still on
disk under .previous; the OLD daemon is still running because the
restart never happened). The latter is for failures AFTER, where
rollback ALSO needs another systemctl restart so the OLD versions
take over again. If even rollback's restart fails, we surface
everything we know — the install is broken and the operator gets
the breadcrumbs to fix it manually.
Existing TestNewBangerCommandHasExpectedSubcommands updated to
include "update" in the expected ordering.
Live exercise against the empty bucket today errors as expected:
$ banger update --check
banger: discover: fetch manifest: HTTP 404 Not Found # exit 1
Once the first manifest is published, the same command will
report "up to date" or "update available".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>