Commit graph

326 commits

Author SHA1 Message Date
ae3466b944
release: prep v0.1.10 changelog
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 18:08:48 -03:00
3dceacd40a
readme: add demo gif
Recording script committed at assets/demo.tape — renders with
charmbracelet/vhs against a real Linux+Firecracker host.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 18:08:42 -03:00
f1b17f6f8e
install: surface ssh-config --install in next-steps blurb
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 17:57:26 -03:00
05439d2325
daemon: cut vm stop latency
Three changes to stopVMLocked, biggest win first:

- Skip waitForExit on the SSH-success path. sync inside the guest
  already flushed root.ext4, so cleanupRuntime's SIGKILL is safe
  immediately. Saves up to gracefulShutdownWait (10s) per stop.
- Drop the SendCtrlAltDel + 10s wait fallback when SSH is
  unreachable. On Debian, ctrl+alt+del routes to reboot.target so
  FC never exits on it — the wait was pure latency.
- Shrink the SSH dial timeout 5s → 2s. A reachable guest dials in
  single-digit milliseconds; if it doesn't, fail fast and SIGKILL.

Worst-case (broken SSH) goes ~15s → ~2s + cleanup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 17:51:22 -03:00
c352aba50a
daemon: parallelize tap-pool warmup
Pool warmup ran createTap calls sequentially (one per loop iteration),
so warming N taps cold took N times the per-tap cost. Each releaseTap
also fired its own ensureTapPool goroutine, racing on n.tapPool.next.

Reserve a batch of names under the lock, then run up to
maxConcurrentTapWarmup createTap RPCs in parallel — root helper already
handles each connection in its own goroutine, so multiple in-flight
priv.create_tap requests don't contend at the wire level. Add a
warming flag to dedupe concurrent ensureTapPool invocations triggered
by parallel releases.

Bail-on-first-error semantics preserved: if every goroutine in a
batch fails (e.g. host out of taps, kernel limit), the loop exits
rather than burning monotonic indices forever.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 15:54:07 -03:00
71e073ac49
fix: land .hushlogin on work disk so vm run is quiet
The work disk mounts at /root, so the .hushlogin written to the
rootfs overlay was shadowed and never reached the guest — pam_motd
kept printing the Debian banner on `banger vm run`. Move the write
to the work disk root inode (= /root in the guest) and run it from
PrepareHost so existing VMs pick it up on next start.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 14:39:46 -03:00
696593b365
release: prep v0.1.9 changelog
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 19:38:27 -03:00
9ed44bfd75
port smoke to go 2026-05-01 19:34:44 -03:00
b0a9d64f4a
fix: drop /root/repo fallback in vm exec for unbound VMs
vm exec defaulted execGuestPath to /root/repo whenever the VM had no
recorded workspace, so running it against a plain VM (one that never
had vm workspace prepare / vm run ./repo) blew up with
'cd: /root/repo: No such file or directory' — surfaced via the login
shell's mise activate hook because bash -lc sources profile.d before
the explicit cd. Now auto-cd only fires when --guest-path is passed
or the VM actually has a workspace recorded; otherwise the command
runs from root's home. Mise wrapping unchanged — without a .mise.toml
it's a no-op.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:06:46 -03:00
9400bab6fd
fix: accept host:port in validateResolverAddr; release v0.1.8
The root helper's resolver-address validator only accepted bare IPs,
so `resolvectl dns <bridge> 127.0.0.1:42069` — banger's own auto-wire
call to point systemd-resolved at the in-process DNS server — was
rejected before it ever reached resolvectl. The auto-wire is
best-effort and only logs a warning on failure, so .vm resolution
silently broke on the NSS path: dig @127.0.0.1 worked, curl <vm>.vm
didn't.

Validator now allows both bare IPs and IP:port (matching what
`resolvectl dns` itself accepts), with new test coverage for the
port'd form.

Existing installs need a one-time `sudo banger system restart` after
updating to v0.1.8 so the daemon re-runs the auto-wire with the
fixed validator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:42:11 -03:00
403f60dbbf
update README.md 2026-05-01 15:24:31 -03:00
b736702260
release: prep v0.1.7 changelog
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:18:21 -03:00
b9b3505e34
smoke: cover -d/--detach and bootstrap NAT precondition
Two new pure scenarios:

* detach_run: -d --rm and -d -- <cmd> combos rejected before VM
  creation; bare -d leaves the VM running and ssh-able afterward.

* bootstrap_precondition: workspace with a .mise.toml is refused
  without --nat; --no-bootstrap bypasses the precondition and the
  run completes normally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:05:27 -03:00
aaf49fc1b1
vm run: add -d/--detach + transparent tooling bootstrap
The mise tooling bootstrap was failing silently when --nat wasn't
set: the VM came up, the user landed in ssh, and tools were missing
with no obvious cause. Two coupled fixes:

* `-d`/`--detach`: create + prep + bootstrap, exit without attaching
  to ssh. Reconnect later with `banger vm ssh <name>`. Rejects the
  ambiguous combos `-d --rm` and `-d -- <cmd>`.

* NAT precondition: when the workspace has a .mise.toml or
  .tool-versions, vm run now refuses before VM creation if --nat
  isn't set. Error message points at --nat or --no-bootstrap.

* `--no-bootstrap`: explicit opt-out for users who want a vanilla
  VM with their workspace and no tooling install.

Detached bootstrap runs synchronously (foreground tee'd to the log
file) so the CLI only returns once installs finish. Interactive
mode keeps today's nohup'd background behaviour so the ssh session
starts promptly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:51:16 -03:00
9b5cbed32d
doctor: collapse healthy output to one line, add --verbose
A healthy host triggered ~20 PASS rows with details — too noisy for
the common case. Default now prints only fail/warn rows plus a
summary footer; an all-pass run collapses to a single line. Pass
--verbose / -v for the full per-check output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:18:09 -03:00
09a3ef812f
style: gofmt internal/firecracker/client.go
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:18:04 -03:00
59e58878ef
update README.md 2026-05-01 14:10:46 -03:00
759fa20602
docs: add release-process runbook
Captures the cut-and-publish workflow currently encoded only in
scripts/publish-banger-release.sh and the CHANGELOG patterns. Covers:

- Release artefacts + R2 paths + the install.sh-at-bucket-root
  contract.
- Trust model recap (cosign pubkey pinned in both verify_signature.go
  and scripts/install.sh; drift check enforced by the publish script).
- Pre-flight checklist: green smoke, CHANGELOG entry with the right
  Keep-a-Changelog headings, link-table bump, explicit callout when
  unit files changed (banger update swaps binaries, not units).
- Cut order: publish first, tag after, verify from a clean machine.
- Verification-release rule: any fix to runUpdate / unit templates /
  helper-daemon restart sequencing requires an immediate no-op +1
  release so a host on the buggy version can update to it and observe
  the fix live with the new binary in the driver seat. v0.1.3 and
  v0.1.5 are the existing examples.
- Patch vs minor: minor = exposed API/contract change (vsock guest-
  agent protocol, CLI flag removal, RPC shape, non-forward-compatible
  store schema); everything else is patch.
- Sibling catalogs: kernel + golden-image entries are go:embed-ed,
  so they piggyback on the next banger release.
- Mid-release recovery for signature drift, partial rclone, re-cut,
  and bad-tag cleanup (never reuse a version).

AGENTS.md gets a one-liner pointer so the maintainer guide surfaces
the runbook without duplicating it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:25:36 -03:00
02a1472dd4
test: cover absolutizePaths, lastID, runCheckMigrations
Adds focused unit tests for previously-uncovered cli helpers:

- TestAbsolutizePaths covers the path-vararg helper's empty,
  absolute, and relative branches; complements the existing
  TestAbsolutizeImageRegisterPaths.
- TestLastID is table-driven across nil/empty/sorted/unsorted/
  duplicates/negative inputs.
- TestRunCheckMigrations* exercises every Compatibility branch
  (compatible / migrations needed / incompatible / inspect
  error) by stubbing bangerdExit and pointing the layout at a
  temp-dir DB seeded directly with the schema_migrations table.
- TestNewBangerdCommandSubcommands pins the flag set against
  accidental drift.

Lifts internal/cli coverage from 71% to 76% combined.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:08:19 -03:00
2606bfbabb
update: VMs survive banger update and rollback
Three load-bearing fixes that together let `banger update` (and its
auto-rollback path) restart the helper + daemon without killing
every running VM. New smoke scenarios prove the property end-to-end.

Bug fixes:

1. Disable the firecracker SDK's signal-forwarding goroutine. The
   default ForwardSignals = [SIGINT, SIGQUIT, SIGTERM, SIGHUP,
   SIGABRT] installs a handler in the helper that propagates the
   helper's SIGTERM (sent by systemd on `systemctl stop bangerd-
   root.service`) to every running firecracker child. Set
   ForwardSignals to an empty (non-nil) slice so setupSignals
   short-circuits at len()==0.

2. Add SendSIGKILL=no to bangerd-root.service. KillMode=process
   limits the initial SIGTERM to the helper main, but systemd
   still SIGKILLs leftover cgroup processes during the
   FinalKillSignal stage unless SendSIGKILL=no.

3. Route restart-helper / restart-daemon / wait-daemon-ready
   failures through rollbackAndRestart instead of rollbackAndWrap.
   rollbackAndWrap restored .previous binaries but didn't re-
   restart the failed unit, leaving the helper dead with the
   rolled-back binary on disk after a failed update.

Testing infrastructure (production binaries unaffected):

- Hidden --manifest-url and --pubkey-file flags on `banger update`
  let the smoke harness redirect the updater at locally-built
  release artefacts. Marked Hidden in cobra; not advertised in
  --help.
- FetchManifestFrom / VerifyBlobSignatureWithKey /
  FetchAndVerifySignatureWithKey export the existing logic against
  caller-supplied URL / pubkey. The default entry points still
  call them with the embedded canonical values.

Smoke scenarios:

- update_check: --check against fake manifest reports update
  available
- update_to_unknown: --to v9.9.9 fails before any host mutation
- update_no_root: refuses without sudo, install untouched
- update_dry_run: stages + verifies, no swap, version unchanged
- update_keeps_vm_alive: real swap to v0.smoke.0; same VM (same
  boot_id) answers SSH after the daemon restart
- update_rollback_keeps_vm_alive: v0.smoke.broken-bangerd ships a
  bangerd that passes --check-migrations but exits 1 as the
  daemon. The post-swap `systemctl restart bangerd` fails,
  rollbackAndRestart fires, the .previous binaries are restored
  and re-restarted; the same VM still answers SSH afterwards
- daemon_admin (separate prep): covers `banger daemon socket`,
  `bangerd --check-migrations --system`, `sudo banger daemon
  stop`

The smoke release builder generates a fresh ECDSA P-256 keypair
with openssl, signs SHA256SUMS cosign-compatibly, and serves
artefacts from a backgrounded python http.server.
verify_smoke_check_test.go pins the openssl/cosign signature
equivalence so the smoke release builder can't silently drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 12:08:08 -03:00
7e528f30b3
test: add installmeta tests 2026-04-30 10:49:22 -03:00
dea655ce95
docs: fix paths from local to system install 2026-04-30 10:49:10 -03:00
93ba233a12
simplify post install instructions 2026-04-30 10:41:06 -03:00
ae13f288e0
remove explicit stale version from readme 2026-04-30 10:39:47 -03:00
596dc67556
install.sh: expand the pre-sudo summary beyond just networking
The previous one-liner ("banger needs permission to manage network
access for the VMs you launch") was honest but understated; banger
also needs sudo for storage (rootfs snapshots, loop devices, image
files), launching/stopping firecracker under jailer isolation, and
installing binaries + systemd units. Spell those out as a short
bulleted list at the moment of decision so the user is authorising
a known scope rather than a euphemism.

Wording stays plain-language — no capability names, no jargon —
since the target audience may not know networking or container
terminology.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:25:16 -03:00
1be90a7af5
Preserve runtime dir across restart so reconcile re-finds VMs
v0.1.4 fixed the binary-level reconcile path for jailer'd VMs but
left a hole at the systemd layer: bangerd.service and bangerd-root.service
both defaulted to RuntimeDirectoryPreserve=no, so /run/banger was
wiped on every daemon stop. The api-sock symlinks the helper creates
for live VMs (`/run/banger/fc-<id>.sock` → `<chroot>/firecracker.socket`)
went with it, and findByJailerPidfile — which derives the chroot
from the symlink target — couldn't resolve them. Reconcile then fell
through to "stale_vm" and tore down the surviving FC's dm-snapshot.

Add RuntimeDirectoryPreserve=yes to both unit templates so the
symlinks survive the restart window. Live-verified end-to-end on
the dev host: started a VM under v0.1.5, restarted helper +
daemon, confirmed the FC PID was unchanged and `banger vm ssh`
returned the same boot_id pre and post.

Daemon-lifecycle tests updated to assert the new directive is
present in both rendered units so future regressions show up at
test time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:17:25 -03:00
e1acb0384b
CHANGELOG: v0.1.5 verification release
No code changes. Cuts a fresh release purely so a host on v0.1.4
can run `banger update` and confirm v0.1.4's running-VMs-survive
fix actually works when v0.1.4 is the code driving the update —
during the v0.1.3→v0.1.4 update the buggy v0.1.3 reconcile was
still in the driver seat and tore down the running VM as documented.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:12:13 -03:00
cec7291184
Survive banger update with running VMs
Two coupled fixes that together make the daemon-restart path of
`banger update` non-destructive for running guests:

1. Unit templates set `KillMode=process` on bangerd.service and
   bangerd-root.service. The default control-group behaviour sent
   SIGKILL to every process in the cgroup on stop/restart — including
   jailer-spawned firecracker children, since fork/exec doesn't
   escape a systemd cgroup. With process mode only the unit's main
   PID is signalled; FC children stay alive in the (unowned)
   cgroup until the new helper instance starts up and re-claims them.

2. `fcproc.FindPID` falls back to the jailer-written pidfile at
   `<chroot>/firecracker.pid` (sibling of the api-sock target) when
   `pgrep -n -f <api-sock>` doesn't find a match. pgrep can't see
   jailer'd FCs because their cmdline only carries the chroot-relative
   `--api-sock /firecracker.socket`, not the host-side path. The
   pidfile is jailer's actual record of the post-exec FC PID, so
   reconcile can verify the surviving process is the right one
   (comm == "firecracker") and re-seed handles.json without tearing
   down the VM's dm-snapshot.

Verified live on the dev host: started a VM, restarted the helper
unit, restarted the daemon unit, and confirmed the FC PID was
unchanged, vm list still showed the guest as running, and
`banger vm ssh` returned the same boot_id pre and post restart.
The systemd journal now reports "firecracker remains running after
unit stopped" and "Found left-over process X (firecracker) in
control group while starting unit. Ignoring." — exactly the shape
`KillMode=process` is supposed to produce.

Tests cover both the parser (parseVersionOutput from the v0.1.2
fix) and the new pidfile lookup: happy path, missing pidfile,
stale pid, wrong comm, garbage content, non-symlink api-sock,
whitespace tolerance.

CHANGELOG corrects v0.1.0's misleading "daemon restarts do not
interrupt running guests" line and documents the unit-refresh
caveat: existing v0.1.0–v0.1.3 installs need a one-time
`sudo banger system install` after updating to v0.1.4 to pick up
the new KillMode directive (`banger update` swaps binaries, not
unit files).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:09:15 -03:00
9c2e6a4647
CHANGELOG: v0.1.3 verification release
No code changes. Cuts a fresh release purely so a host on v0.1.2
can run `banger update` and confirm v0.1.2's install.toml-refresh
fix actually works when v0.1.2 is the code driving the update —
during the v0.1.1→v0.1.2 update the buggy v0.1.1 code was still
in the driver seat, so live verification of the fix needs one more
cycle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 14:45:35 -03:00
d867d61eb3
update: refresh install.toml commit + built_at from new binary
After `banger update` swaps binaries, install.toml needs to reflect
the just-installed identity. The previous code passed
buildinfo.Current().{Commit,BuiltAt} into installmeta.UpdateBuildInfo
— but buildinfo.Current() in the running CLI is the OLD pre-swap
binary's identity (we're it), not the staged one. install.toml's
version field got refreshed to target.Version while commit and
built_at stayed pinned at the previous release. `banger doctor`
compares the running CLI's three fields against install.toml's
three fields and so raised a false-positive drift warning on
every update.

Fix: after the swap, exec /usr/local/bin/banger version, parse the
three-line output, and write all three fields to install.toml. If
the exec fails for any reason we fall back to the old behaviour
(version + stale commit/built_at) with a warning, since install.toml
drift is a doctor warning not a broken host — same posture as
before for the failure path.

The parser is split out (parseVersionOutput) and table-tested:
happy path, whitespace-tolerance, missing-field rejection, empty
input rejection, ignoring unrelated lines.

Caught by running v0.1.0 → v0.1.1 live as the first end-to-end
smoke test of the self-update flow, which was the whole point of
that exercise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 14:38:59 -03:00
a0b5c7fa3c
CHANGELOG: v0.1.1 release notes
Captures the install.sh + BANGER_INSTALL_NONINTERACTIVE changes
that landed in 1004331 and 3c29af5. v0.1.1 is being cut now to
exercise the self-update path against a real released second
version — `banger update` has never run live before, only against
unit-test fixtures, so this release doubles as the smoke test of
the update flow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 14:33:12 -03:00
1004331c14
install.sh: drop --user, add BANGER_INSTALL_NONINTERACTIVE env var
Surveyed the install scripts of comparable systemd-installing tools
(Docker, k3s, Tailscale, Ollama, Determinate Systems Nix, flyctl):
none of the daemon installers offer a --user staging mode, because
the resulting install isn't useful — banger inherits that. The
"--user just stages binaries you can't actually use yet" UX was a
trap; remove it before users hit it.

In its place, adopt the cross-tool convention for non-interactive
runs: the BANGER_INSTALL_NONINTERACTIVE=1 env var is friendlier
through a curl|bash pipe than `bash -s -- --yes` because the env
var can sit on the same line:

  curl -fsSL ...install.sh | env BANGER_INSTALL_NONINTERACTIVE=1 bash

The --yes flag still works for direct script invocation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 14:15:36 -03:00
3c29af55a2
Add curl|bash installer + wire upload into publish script
scripts/install.sh is the one-command installer end users run as

  curl -fsSL https://releases.thaloco.com/banger/install.sh | bash

Design choices:

* Runs as the invoking user. All network work + signature verification
  happens unprivileged; sudo is only re-execed for the actual install
  step that writes to /usr/local and creates systemd units.
* Right before the sudo prompt, the script prints a plain-language
  summary of exactly what's about to happen — the file paths it will
  create and a one-line "why sudo" — so the user authorises a known
  scope rather than the whole pipeline. Detail link in the docs.
* Uses openssl (universally available) for signature verification, not
  cosign. cosign is needed only by the *signer*, never the verifier.
* No jq dependency. The latest_stable field is extracted from the
  manifest with grep+sed, since the manifest shape is well-defined and
  we control it.
* /dev/tty fallback for the confirmation prompt so it works through
  the curl|bash pipe.
* --yes for non-interactive CI use, --user for installing into
  ~/.local/bin without touching system paths, --version vX.Y.Z to pin.

publish-banger-release.sh now uploads install.sh to the bucket root
on every publish, so the curl URL is stable but the script logic
matches the latest verified release. It also runs a key-drift check:
if scripts/install.sh's embedded cosign public key differs from the
one in internal/updater/verify_signature.go, publishing aborts. The
two copies must stay in sync or one of them ends up rejecting every
release.

README's Quick start now leads with the installer one-liner and
documents the audit-first variant alongside it; building from source
moves below.

Smoke-tested end to end against the live bucket with --user mode:
manifest fetch → tarball download → cosign signature verify → hash
verify → extract → install. The installed binary reports v0.1.0 at
commit 6fdebd9, matching the published artifact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 14:06:34 -03:00
d1c4619a01
Add CHANGELOG.md with v0.1.0 release notes
First-release changelog following the Keep a Changelog + SemVer
convention. The v0.1.0 section groups by capability area (sandbox
VMs, images, kernels, host networking, system install, self-update,
trust model, CLI surface) rather than by package, so it reads as
release notes for users deciding whether to install rather than as
a commit log. Includes a Compatibility section calling out the
informal vsock-protocol stability promise (stable across patches,
not minors) and the forward-only schema policy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 13:45:44 -03:00
6fdebd929e
publish-script: split RCLONE_BUCKET out of BUCKET_PATH
The previous form passed rclone paths like releases:banger/v0.1.0/,
which rclone parses as bucket=banger, key=v0.1.0/... — wrong, because
the actual R2 bucket is named "releases" (BUCKET_PATH was meant as
an in-bucket key prefix only). Uploads 403'd because the token has
no view of a bucket called "banger".

Introduce RCLONE_BUCKET as a separate env var (default: "releases")
and route every rclone copy through ${RCLONE_REMOTE}:${RCLONE_BUCKET}/${BUCKET_PATH}.
The public URLs in the manifest stay unchanged: BASE_URL is the
bucket's public custom domain, so the bucket name is implicit there.

The defaults now resolve to the live setup:
  rclone target:  releases:releases/banger/<version>/<file>
  public URL:     https://releases.thaloco.com/banger/<version>/<file>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 13:35:53 -03:00
12f7a92bb4
publish-script: don't clobber COSIGN_PASSWORD with empty default
The previous form did

  COSIGN_PASSWORD="${COSIGN_PASSWORD:-}" cosign sign-blob ...

which set COSIGN_PASSWORD to "" when the caller hadn't exported one.
cosign sees an explicit empty password and tries to decrypt with
it instead of prompting interactively, so any real password-protected
offline key fails with "decryption failed".

Drop the prefix entirely. If COSIGN_PASSWORD is already in env, it
gets inherited normally; if it isn't, cosign prompts on the terminal
— which is the right UX for a maintainer running the publish script
locally with the offline private key.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 13:27:23 -03:00
3d748b87c8
publish-script: fix pubkey extraction and cosign v3 compatibility
Two bugs found while dry-running the publish flow end-to-end:

1. The awk pipeline that pulled BangerReleasePublicKey out of
   verify_signature.go didn't strip Go's raw-string-literal wrapping
   (`var ... = ` + backtick on the BEGIN line, trailing backtick on
   the END line). The "verify against embedded pub key" step thus
   compared sigs against a malformed PEM. Replaced with a sed pair
   that yields a clean PEM block byte-identical to cosign.pub.

2. cosign v3.x defaults sign-blob to a new bundle format and
   pushes signatures to Rekor; both are incompatible with banger's
   "embedded pub key, raw ASN.1 DER signature" trust model.
   Add --use-signing-config=false / --tlog-upload=false /
   --new-bundle-format=false to opt out, and --insecure-ignore-tlog
   on verify-blob. These flags also work on cosign v2.x, so the
   script is forward- and backward-compatible across the v2→v3
   boundary.

Validated by an end-to-end dry-run on this machine: built binaries,
tarred, sha256summed, cosign-signed, verified against the embedded
pub key, then re-verified through internal/updater's
crypto/ecdsa.VerifyASN1 path — all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 13:23:09 -03:00
b7c9661c99
updater: embed real cosign public key for v0.1.0 release signing
The placeholder in BangerReleasePublicKey is replaced with the
production cosign public key (P-256 ECDSA). The matching private
key is stored offline by the maintainer; this is the public half
that every banger CLI baked from this commit forward will use to
verify SHA256SUMS signatures.

cosign.pub is also committed at the repo root so external auditors
can re-verify a release without parsing the Go source.

The placeholder-refuses test now swaps the embedded key for a
synthetic placeholder for the duration of the test, since the
default value is no longer a placeholder.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 12:50:52 -03:00
fae28e3d8b
update: docs + publish script for the self-update feature
README gets a top-level Updating section; docs/privileges.md gains
a step-by-step trust-model writeup of `banger update`. The new
scripts/publish-banger-release.sh drives the manual release cut:
build, tar, sha256sum, cosign sign-blob, verify against the embedded
public key, jq-merge into manifest.json, rclone upload to the R2
bucket. Refuses outright if the embedded key is still the placeholder
so we can't accidentally publish an unverifiable release. Also folds
in gofmt drift accumulated across the updater package and a few
sibling files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 12:43:46 -03:00
8ed351ea47
updater: cosign-blob signature verification on SHA256SUMS
Closes the v0.1.0 cosign requirement. Every banger update download
now goes through ECDSA-P256 verification before any binary is
trusted: SHA256SUMS.sig is fetched, base64-decoded, and verified
against the embedded BangerReleasePublicKey.

  * BangerReleasePublicKey: PEM-encoded ECDSA public key embedded
    at compile time. The current value is a sentinel PLACEHOLDER —
    the maintainer must replace it with the output of
    `cosign generate-key-pair`'s cosign.pub before cutting v0.1.0,
    and re-cut. Until they do, every `banger update` refuses with
    ErrSignatureRequired ("the maintainer must replace it and
    re-cut a release before update can proceed"). Loud refusal
    beats silent acceptance.
  * VerifyBlobSignature: parses the embedded public key, base64-
    decodes the signature, computes SHA256(body), runs ecdsa
    .VerifyASN1. cosign sign-blob produces the format
    VerifyASN1 verifies natively (ASN.1-DER encoded ECDSA over
    a SHA256 digest), so no third-party crypto deps needed.
  * FetchAndVerifySignature: pulls the signature URL from the
    release manifest entry, fetches it (1 KiB cap), and verifies
    against sumsBody. Refuses outright when sha256sums_sig_url is
    empty — v0.1.0 contract requires every release to be signed,
    and an unsigned release is a manifest publishing bug we'd
    rather catch loudly than silently accept.
  * Wired into banger update: sumsBody captured from
    DownloadRelease, immediately fed into FetchAndVerifySignature.
    A failed verification removes the staged tarball before
    returning so it can't be reused.
  * BangerReleasePublicKey is var (not const) only to support tests
    that swap in a generated keypair; production sets it at compile
    time and never mutates it.

Tests: placeholder-key path returns ErrSignatureRequired; happy
path with a fresh in-test ECDSA keypair verifies a real
sign-then-verify; tampered body, wrong key, and three malformed
signature shapes (not-base64, empty, garbage-DER) all reject.

Maintainer-cut workflow documented in BangerReleasePublicKey's
comment: cosign generate-key-pair → paste cosign.pub into the
constant → at release time, cosign sign-blob --key cosign.key
SHA256SUMS > SHA256SUMS.sig and publish.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 12:37:53 -03:00
92ca1aa96f
cli: add banger update command
Wires updater + the existing system-install helpers into a single
operator-facing flow:

  1. FetchManifest, resolve target release (default: latest_stable;
     override with --to vX.Y.Z).
  2. --check exits with a one-line "up to date" / "update available".
     Same as `banger update --check` style for tools polling on a
     timer.
  3. requireRoot beyond this point — we're about to write
     /usr/local/bin and talk to systemctl.
  4. daemon.operations.list → refuse if any operation isn't Done.
     --force overrides; per the v0.1.0 plan there's no drain wait.
  5. PrepareCleanStaging + DownloadRelease + StageTarball into
     /var/cache/banger/updates/.
  6. Sanity-run the staged binaries: `banger --version` must mention
     the expected version; `bangerd --check-migrations --system`
     must exit 0 (compatible) or 1 (will auto-migrate). Exit 2
     (incompatible) aborts before the swap.
  7. --dry-run stops here with a one-line plan, leaves staging.
  8. Swap (vsock → bangerd → banger) → restart bangerd-root then
     bangerd → waitForDaemonReady on the system socket.
  9. Run `banger doctor` against the JUST-INSTALLED CLI binary
     (not d.doctor in-process — we want to exercise the new binary
     end-to-end). FAIL triggers auto-rollback: restore .previous
     backups, restart services, surface the original failure with
     "(rolled back to previous install)".
  10. UpdateBuildInfo on /etc/banger/install.toml. CleanupBackups.
     Wipe staging dir.

rollbackAndWrap / rollbackAndRestart split: the former is for
failures BEFORE the systemctl restart (old binaries are still on
disk under .previous; the OLD daemon is still running because the
restart never happened). The latter is for failures AFTER, where
rollback ALSO needs another systemctl restart so the OLD versions
take over again. If even rollback's restart fails, we surface
everything we know — the install is broken and the operator gets
the breadcrumbs to fix it manually.

Existing TestNewBangerCommandHasExpectedSubcommands updated to
include "update" in the expected ordering.

Live exercise against the empty bucket today errors as expected:
$ banger update --check
banger: discover: fetch manifest: HTTP 404 Not Found  # exit 1
once the user publishes the first manifest the same command will
report "up to date" or "update available".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 12:35:04 -03:00
91af367208
updater: download/stage/swap/rollback flow steps
The pure-logic core of `banger update`. No CLI yet; this commit
ships the steps the next commit's command will orchestrate.

  * download.go — DownloadRelease fetches SHA256SUMS, parses it,
    looks up the tarball's basename, then streams the tarball
    through download.FetchVerified so the hash is checked on the
    fly. Returns the SHA256SUMS bytes alongside so a future
    cosign-verification step can validate them against an embedded
    public key before trusting the hashes inside.
    Also: fetchBounded for small bounded GETs (manifest, sums file,
    future signature), DefaultStagingDir, EnsureStagingDir,
    PrepareCleanStaging.
  * stage.go — StageTarball reads gzip+tar, validates the entry
    set is exactly {banger, bangerd, banger-vsock-agent} (no
    extras, no missing, no path traversal, no non-regular files),
    extracts at mode 0755 regardless of what the tarball claims.
    StagedRelease records the resulting paths.
  * swap.go — InstallTargets pins the canonical install paths
    (/usr/local/bin/banger, /usr/local/bin/bangerd,
    /usr/local/lib/banger/banger-vsock-agent). Swap orders the
    three replacements vsock → bangerd → banger so the most
    impactful binary (the CLI) goes last; each step uses
    system.AtomicReplace and accumulates a SwapResult so partial
    failures can be rolled back cleanly. Rollback unwinds in
    reverse, joining errors so a half-rolled-back state surfaces
    enough info for an operator to fix manually. CleanupBackups
    removes the .previous trail after `banger doctor` confirms
    the new install is healthy.
  * installmeta.UpdateBuildInfo — small helper that refreshes
    Version/Commit/BuiltAt on /etc/banger/install.toml without
    re-running the full system install. Preserves OwnerUser/UID/
    GID/Home and the original InstalledAt timestamp.

Tests: stage rejects extra entries / missing entries / path
traversal / non-regular files; happy-path stages all three at 0755
with correct contents. Swap+Rollback covers the all-three-succeed
path (then verifies .previous backups exist + rollback restores
old contents) AND the partial-failure path (third swap blocked by
a non-dir parent → SwappedTargets = 2 → rollback unwinds those
two cleanly). DownloadRelease covers happy path, tarball-not-in-
SHA256SUMS, and propagated sha256 mismatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 12:30:22 -03:00
fb6d2b1dae
updater: manifest + SHA256SUMS parsing scaffolding
First slice of the `banger update` package. No CLI yet — this just
defines the wire shape and parsers the rest of the flow will plug
into.

  * internal/updater/manifest.go — Manifest / Release types,
    ManifestSchemaVersion = 1, the hardcoded URL
    https://releases.thaloco.com/banger/manifest.json (var instead
    of const so tests can point at httptest), and FetchManifest /
    ParseManifest / Manifest.LookupRelease / Manifest.Latest.
    The manifest only references URLs (tarball, SHA256SUMS, optional
    signature); actual binary hashes come from SHA256SUMS itself,
    so manifest tampering can't substitute a hash for a known-good
    tarball.
    SchemaVersion gates forward-compat: a CLI that doesn't know its
    server's schema_version refuses to update rather than guessing.
  * internal/updater/sha256sums.go — ParseSHA256Sums tolerates both
    GNU `<digest>  <file>` (with optional `*` binary prefix) and
    BSD `SHA256 (file) = <digest>` formats. Comments and blank
    lines are skipped; malformed lines that LOOK like entries are
    rejected (silent skipping is the wrong failure mode for a
    security-relevant input). Digests are lowercased so the caller
    can `==`-compare without worrying about case.

Caps: 1 MiB on the manifest body, 16 KiB on SHA256SUMS, 256 MiB on
release tarballs. Generous-but-bounded; bumping requires a code
change so a server-side mistake can't fill the disk.

Tests: ParseManifest happy path, schema-version-too-new rejection,
five malformed-input cases. ParseSHA256Sums covers GNU + BSD +
star-prefix + comments-and-blanks, six malformed-input rejections,
case-insensitive digest normalisation. FetchManifest end-to-end via
httptest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 12:24:36 -03:00
abd5d6f5ab
download: shared FetchVerified helper for capped + hashed downloads
imagecat.Fetch and kernelcat.Fetch each implement the same pattern:
HTTP GET with a Content-Length pre-check, an io.LimitReader cap on
the body, on-the-fly sha256 hashing, and refusal on either the cap
trip or a hash mismatch. The about-to-arrive `banger update` flow
makes a third caller, which is the right number to factor.

  * internal/download.FetchVerified(ctx, client, url, expectedSHA256,
    maxBytes, dstPath): streams the body to dstPath through a
    sha256 hasher, capped at maxBytes+1 bytes so an oversize body
    is detected before the hash check fires. On any failure
    (HTTP error, ContentLength > cap, body exceeds cap, write
    error, hash mismatch) the partial file is removed before
    returning so callers don't have to disambiguate "did we leave
    bytes on disk?".

Imagecat and kernelcat are NOT migrated to this helper in this
commit — they each have their own destination-dir layout and
post-verify decompress/extract steps that don't fit a one-size
helper. Lift them later if it stays clean; for now the helper
is sized for the updater's "fetch tarball + sha256SUMS" need.

Tests cover happy path, hash mismatch, advertised Content-Length
over cap, lying server (chunked, no Content-Length, but oversize
body), HTTP non-2xx, and the two arg-validation rejections (empty
expected hash, non-positive maxBytes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 18:44:27 -03:00
fa3a7a3e31
system: add AtomicReplace + Rollback for binary swap
Prerequisite for `banger update`'s swap step. The updater renames a
staged binary into place and needs (a) atomicity per file (no
half-written bytes for a process that's about to systemctl restart
into the new binary) and (b) a backup it can restore from when
post-restart doctor reports FAIL.

  * AtomicReplace(newSrc, dst, suffixPrevious): if dst exists,
    move it to dst+suffixPrevious. Then os.Rename newSrc → dst.
    Atomic on a single fs (the only case relevant to the updater —
    everything is staged under /var/cache/banger and then renamed
    into /usr/local/bin, but those should be on the same fs in a
    typical install). On rename failure, restore the backup so we
    don't leave the caller without their binary.
  * AtomicReplaceRollback(dst, suffixPrevious): symmetric inverse.
    Removes dst, renames dst+suffixPrevious back to dst. Tolerant
    of a missing backup (fresh-install case) so the updater can
    call it unconditionally on failure paths without tracking
    backup state itself.
  * Refuses an empty suffix at compile-time-style guard: an empty
    suffix would silently no-op the backup AND break rollback.

Six tests cover: happy path, fresh install (no prior dst), stale
.previous from a half-finished prior run, empty-suffix rejection,
rollback restores, rollback tolerant of no-backup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 18:43:04 -03:00
ec6fc9d185
store,bangerd: add --check-migrations flag for pre-swap schema check
Prerequisite for `banger update`. Before swapping a staged binary
into place, the updater needs to confirm the new bangerd recognises
the running install's DB schema. Without this, an operator could end
up with a service that won't open its store after the binary swap +
restart.

  * store.InspectSchemaState(path): opens the DB read-only (reusing
    OpenReadOnly's mode=ro DSN), reads the schema_migrations table,
    and classifies the relationship between applied and known IDs:
    SchemaCompatible (lockstep), SchemaMigrationsNeeded (binary
    newer, will auto-migrate on first Open), or SchemaIncompatible
    (DB has applied IDs the binary doesn't know about).
    Missing schema_migrations table is treated as "all migrations
    pending" rather than an error — matches the fresh-install case.
  * bangerd --check-migrations: opens the configured DB read-only,
    prints a one-line classification, and exits 0/1/2. The exit
    code is the contract:
        0 — compatible
        1 — migrations needed (binary newer; safe to swap)
        2 — incompatible (binary older than DB; abort the swap)
    Honours --system to pick between system StateDir and user mode.
  * bangerdExit indirection so future tests can capture the exit
    code without terminating the test process. Production points
    at os.Exit.

Tests cover the four classifications: compatible (fully migrated
DB), migrations-needed (only baseline applied), incompatible
(synthetic id=99 inserted), and missing-table (fresh DB). Live
exercise on this dev host returned `migrations needed: pending [3]
(binary will apply on first Open)` and exit 1, matching the
contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 18:41:31 -03:00
3c0af3a2de
opstate,daemon: list in-flight operations via daemon.operations.list
Prerequisite for `banger update`'s preflight, which refuses to swap
binaries while anything is in flight. Today's opstate.Registry
exposes Insert/Get/Prune but no iteration; without a snapshot
accessor the update flow can't tell whether a vm.create is
mid-prepare-work-disk.

  * opstate.Registry.List(): returns a freshly-allocated snapshot
    of every entry. Mutating the slice doesn't poison the
    registry. Pinned by tests covering the snapshot semantics
    and the empty case.
  * api.OperationSummary / OperationsListResult: a public-shape
    record per op. Today the Kind is always "vm.create" — the
    field exists so future async kinds (image.pull, kernel.pull)
    plug in without an API change.
  * Daemon.ListOperations + daemon.operations.list RPC:
    walks vmService.createOps and emits OperationSummary entries.
    Done ops are included in the snapshot; the update preflight
    filters by Done itself.
  * dispatch_test's documented-methods list updated.

No behaviour change for existing flows; this is a read-only
addition.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 18:14:57 -03:00
775525b592
cli,doctor: --version flag + CLI/install drift check
Two pre-release polish items on the version-display surface.

  * --version on both binaries: cobra's Version field on the banger
    and bangerd roots renders a one-line summary (banger v0.1.0
    (commit abcd1234, built 2026-04-28T20:45:50Z)). The
    SetVersionTemplate override drops cobra's "{{.Name}} version"
    prefix — our string is already a complete sentence. The
    multi-line `banger version` subcommand is unchanged for callers
    that want the full SHA / built_at on separate lines.
  * Doctor "banger version" row: prints the running CLI's version +
    short commit + built-at, plus what /etc/banger/install.toml
    recorded at install time. Disagreement is the most common
    version-skew pitfall (stale CLI against fresh daemon, or vice
    versa) and a one-line warn is friendlier than tracking that down
    from a launch failure.
    Drift detection is suppressed when either side is dev/unknown
    (untagged build) — comparing a dev CLI against a tagged install
    is the developer-machine case, not a real problem.

formatVersionLine is in internal/cli (banger.go) and reused by
bangerd.go via a strings.Replace because bangerd's version line
should say "bangerd" not "banger". Slightly tilt-feeling but cheaper
than parameterising the helper for one caller.

Tests: TestVersionsDriftToleratesDevAndUnknown pins the four
branches (match, version diff, commit diff, dev-suppression). The
existing version-format test already runs through formatVersionLine
indirectly.

Live exercise:
  $ banger --version
  banger dev (commit 1c1ca7d6, built 2026-04-28T20:52:33Z)
  $ bangerd --version
  bangerd dev (commit 1c1ca7d6, built 2026-04-28T20:52:33Z)
  $ banger doctor | head
  ...
  PASS	banger version
    - CLI dev (commit 1c1ca7d6, built 2026-04-28T20:52:33Z)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 17:53:32 -03:00
1c1ca7d6a4
doctor: pin firecracker version range, distro-aware install hint
Pre-release polish: be explicit about which firecracker versions
banger has been validated against, and give users a one-line install
suggestion when the binary is missing rather than the previous
generic "install firecracker or set firecracker_bin".

internal/firecracker/version.go (new):
  * MinSupportedVersion = "1.5.0" — the floor banger refuses to
    launch below. Bumping this is a deliberate decision, paired
    with whatever helper feature started requiring the newer
    firecracker.
  * KnownTestedVersion = "1.14.1" — what banger's smoke suite
    actually runs against today.
  * SemVer + Compare + ParseVersionOutput, table-tested. The parser
    tolerates the trailing "exiting successfully" log line that
    firecracker tacks onto --version; only the canonical
    "Firecracker vX.Y.Z" line matters.
  * QueryVersion shells `<bin> --version` through a CommandRunner-
    shaped interface; doesn't import internal/system to keep the
    firecracker package leaf-clean.

internal/daemon/doctor.go:
  * New addFirecrackerVersionCheck replaces the previous bare
    RequireExecutable preflight for firecracker. Three outcomes:
    PASS within [Min, Tested], WARN above Tested (newer firecracker
    usually works but is outside the tested window), FAIL below Min
    or when the binary is missing.
  * On missing binary, surfaces a distro-aware install command via
    parseOSReleaseIDs(/etc/os-release) → guessFirecrackerInstall
    Command. Pinned suggestions for debian (apt), arch/manjaro
    (paru), and nixos (nix-env). Other distros get only the upstream
    Releases URL — guessing wrong sends users on a wild goose chase.
  * runtimeChecks no longer includes the firecracker preflight; the
    new check subsumes it.

README.md:
  * Requirements line now spells out the tested-against version
    (v1.14.1) and the supported floor (≥ v1.5.0), and points at
    `banger doctor` for the version check + install hint.

Tests: ParseVersionOutput across canonical/prerelease/garbage inputs,
SemVer.Compare across major/minor/patch boundaries, MustParseSemVer
panics on malformed inputs. Doctor-side: PASS on tested version,
FAIL below Min, WARN above Tested, FAIL with upstream URL when
missing, install-hint dispatch table covering debian/ubuntu (via
ID_LIKE)/arch/manjaro/nixos/fedora-fallback/missing-os-release.
The renamed TestDoctorReport_MissingFirecrackerFails... now asserts
against the new check name. Live `banger doctor` reports
"v1.14.1 at /usr/bin/firecracker (within tested range; min v1.5.0,
tested v1.14.1)" against the smoke host.

Smoke bare_run still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 17:47:42 -03:00
f7a6832ebf
Merge model,cli,docs polish for v0.1.0
# Conflicts:
#	internal/cli/commands_image.go
2026-04-28 17:36:47 -03:00