banger/internal/updater/manifest.go
Thales Maciel 2606bfbabb
update: VMs survive banger update and rollback
Three load-bearing fixes that together let `banger update` (and its
auto-rollback path) restart the helper + daemon without killing
every running VM. New smoke scenarios prove the property end-to-end.

Bug fixes:

1. Disable the firecracker SDK's signal-forwarding goroutine. The
   default ForwardSignals = [SIGINT, SIGQUIT, SIGTERM, SIGHUP,
   SIGABRT] installs a handler in the helper that propagates the
   helper's SIGTERM (sent by systemd on
   `systemctl stop bangerd-root.service`) to every running
   firecracker child. Set ForwardSignals to an empty (non-nil) slice
   so setupSignals short-circuits at len()==0; a nil slice would be
   replaced by the defaults.

2. Add SendSIGKILL=no to bangerd-root.service. KillMode=process
   limits the initial SIGTERM to the helper's main process, but
   systemd still SIGKILLs leftover cgroup processes during the
   FinalKillSignal stage unless SendSIGKILL=no.

3. Route restart-helper / restart-daemon / wait-daemon-ready
   failures through rollbackAndRestart instead of rollbackAndWrap.
   rollbackAndWrap restored the .previous binaries but never
   restarted the failed unit again, leaving the helper dead with the
   rolled-back binary on disk after a failed update.

Testing infrastructure (production binaries unaffected):

- Hidden --manifest-url and --pubkey-file flags on `banger update`
  let the smoke harness redirect the updater at locally-built
  release artefacts. Marked Hidden in cobra; not advertised in
  --help.
- FetchManifestFrom / VerifyBlobSignatureWithKey /
  FetchAndVerifySignatureWithKey export the existing logic against
  caller-supplied URL / pubkey. The default entry points still
  call them with the embedded canonical values.

Smoke scenarios:

- update_check: --check against fake manifest reports update
  available
- update_to_unknown: --to v9.9.9 fails before any host mutation
- update_no_root: refuses without sudo, install untouched
- update_dry_run: stages + verifies, no swap, version unchanged
- update_keeps_vm_alive: real swap to v0.smoke.0; same VM (same
  boot_id) answers SSH after the daemon restart
- update_rollback_keeps_vm_alive: v0.smoke.broken-bangerd ships a
  bangerd that passes --check-migrations but exits 1 as the
  daemon. The post-swap `systemctl restart bangerd` fails,
  rollbackAndRestart fires, the .previous binaries are restored
  and re-restarted; the same VM still answers SSH afterwards
- daemon_admin (separate prep): covers `banger daemon socket`,
  `bangerd --check-migrations --system`, `sudo banger daemon
  stop`

The smoke release builder generates a fresh ECDSA P-256 keypair
with openssl, signs SHA256SUMS cosign-compatibly, and serves
artefacts from a backgrounded python http.server.
verify_smoke_check_test.go pins the openssl/cosign signature
equivalence so the smoke release builder can't silently drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:08:08 -03:00

// Package updater drives `banger update`: discover a new release,
// download + verify it, swap binaries atomically, restart the systemd
// units, run doctor, and roll back on failure. The package is split
// across files by responsibility: manifest.go owns the release-discovery
// shape, and every other concern lives in its own file.
package updater

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// manifestURL is the canonical URL of banger's release manifest on
// the Cloudflare R2 bucket. Hardcoded (rather than pulling from
// config) so a compromised daemon config can't redirect the updater
// to a different bucket. Var (not const) only because tests need to
// point at an httptest.Server; production never mutates it.
//
// The bucket lives at releases.thaloco.com; the path /banger/ scopes
// it inside the bucket so the same host can serve other projects'
// release artifacts later.
var manifestURL = "https://releases.thaloco.com/banger/manifest.json"

// ManifestURL exposes the canonical manifest URL for callers that want
// to surface it in user-facing output (e.g. `banger update --check`).
func ManifestURL() string { return manifestURL }

// MaxManifestBytes caps the manifest download size. The manifest is
// JSON with a small bounded shape (10s of releases × ~200 bytes
// each); 1 MiB is generous and protects us from a server that
// accidentally serves an arbitrary file.
const MaxManifestBytes int64 = 1 << 20

// MaxSHA256SumsBytes caps the SHA256SUMS download. One line per
// release artifact (today: one line for the tarball); 16 KiB is
// orders of magnitude over what we'd ever publish.
const MaxSHA256SumsBytes int64 = 16 * 1024

// MaxTarballBytes caps the release-tarball download. Banger's three
// binaries plus a SHA256SUMS file fit comfortably under this; if a
// future release approaches the cap, bump intentionally and ship a
// note in CHANGELOG.
const MaxTarballBytes int64 = 256 * 1024 * 1024

// Manifest is the top-level shape of releases.thaloco.com/banger/manifest.json.
// SchemaVersion lets us evolve the structure without breaking older
// CLIs: a CLI that doesn't recognise the manifest's SchemaVersion
// refuses to update rather than guessing.
type Manifest struct {
	SchemaVersion int       `json:"schema_version"`
	LatestStable  string    `json:"latest_stable"`
	Releases      []Release `json:"releases"`
}

// Release describes one published banger build. The tarball + the
// SHA256SUMS file (and optionally its cosign signature) live at the
// URLs listed here; the actual binary hashes come from SHA256SUMS,
// not from the manifest, so manifest tampering can't substitute a
// hash for a known-good tarball.
type Release struct {
	Version          string    `json:"version"`
	TarballURL       string    `json:"tarball_url"`
	SHA256SumsURL    string    `json:"sha256sums_url"`
	SHA256SumsSigURL string    `json:"sha256sums_sig_url,omitempty"`
	ReleasedAt       time.Time `json:"released_at"`
}

// ManifestSchemaVersion is the SchemaVersion this CLI knows how to
// parse. Bumped together with any breaking change in Manifest /
// Release.
const ManifestSchemaVersion = 1

// FetchManifest downloads the release manifest from the embedded
// canonical URL and validates its shape. Returns an error if the
// server is unreachable, returns non-2xx, exceeds the size cap, or
// the schema_version is newer than this CLI knows.
func FetchManifest(ctx context.Context, client *http.Client) (Manifest, error) {
	return FetchManifestFrom(ctx, client, manifestURL)
}

// FetchManifestFrom is FetchManifest against an explicit URL. Used by
// the smoke suite (via `banger update --manifest-url …`) to drive the
// updater against a locally-served fake manifest. Production callers
// stick with FetchManifest.
func FetchManifestFrom(ctx context.Context, client *http.Client, url string) (Manifest, error) {
	if client == nil {
		client = http.DefaultClient
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return Manifest{}, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return Manifest{}, fmt.Errorf("fetch manifest: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		return Manifest{}, fmt.Errorf("fetch manifest: HTTP %s", resp.Status)
	}
	if resp.ContentLength > MaxManifestBytes {
		return Manifest{}, fmt.Errorf("manifest is %d bytes, exceeds %d-byte cap", resp.ContentLength, MaxManifestBytes)
	}
	body, err := io.ReadAll(io.LimitReader(resp.Body, MaxManifestBytes+1))
	if err != nil {
		return Manifest{}, fmt.Errorf("read manifest: %w", err)
	}
	if int64(len(body)) > MaxManifestBytes {
		return Manifest{}, fmt.Errorf("manifest body exceeded %d-byte cap", MaxManifestBytes)
	}
	return ParseManifest(body)
}

// ParseManifest unmarshals manifest bytes and validates the schema
// version and required fields. Exposed as a separate function so tests
// can drive it without an HTTP server.
func ParseManifest(body []byte) (Manifest, error) {
	var m Manifest
	if err := json.Unmarshal(body, &m); err != nil {
		return Manifest{}, fmt.Errorf("parse manifest: %w", err)
	}
	if m.SchemaVersion == 0 {
		return Manifest{}, fmt.Errorf("manifest missing schema_version")
	}
	if m.SchemaVersion > ManifestSchemaVersion {
		return Manifest{}, fmt.Errorf("manifest schema_version %d is newer than this CLI knows (%d); upgrade banger to read it", m.SchemaVersion, ManifestSchemaVersion)
	}
	if strings.TrimSpace(m.LatestStable) == "" && len(m.Releases) > 0 {
		return Manifest{}, fmt.Errorf("manifest missing latest_stable")
	}
	for i, r := range m.Releases {
		if strings.TrimSpace(r.Version) == "" {
			return Manifest{}, fmt.Errorf("release[%d]: empty version", i)
		}
		if strings.TrimSpace(r.TarballURL) == "" {
			return Manifest{}, fmt.Errorf("release[%d] (%s): empty tarball_url", i, r.Version)
		}
		if strings.TrimSpace(r.SHA256SumsURL) == "" {
			return Manifest{}, fmt.Errorf("release[%d] (%s): empty sha256sums_url", i, r.Version)
		}
	}
	return m, nil
}

// LookupRelease finds the release with the given version (e.g.
// "v0.1.0") in the manifest. Returns an error when no match exists,
// which helps when a user passes `--to v9.9.9` against a manifest that
// hasn't seen v9.9.9 yet.
func (m Manifest) LookupRelease(version string) (Release, error) {
	wanted := strings.TrimSpace(version)
	if wanted == "" {
		return Release{}, fmt.Errorf("version is required")
	}
	for _, r := range m.Releases {
		if r.Version == wanted {
			return r, nil
		}
	}
	available := make([]string, 0, len(m.Releases))
	for _, r := range m.Releases {
		available = append(available, r.Version)
	}
	return Release{}, fmt.Errorf("release %q not found in manifest (available: %s)", wanted, strings.Join(available, ", "))
}

// Latest returns the release matching the manifest's latest_stable
// pointer. Errors when the pointer doesn't reference any listed
// release: that's a manifest publishing bug worth surfacing rather
// than silently picking some other release.
func (m Manifest) Latest() (Release, error) {
	return m.LookupRelease(m.LatestStable)
}