Two new pure scenarios:
* detach_run: -d --rm and -d -- <cmd> combos rejected before VM
creation; bare -d leaves the VM running and ssh-able afterward.
* bootstrap_precondition: workspace with a .mise.toml is refused
without --nat; --no-bootstrap bypasses the precondition and the
run completes normally.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three load-bearing fixes that together let `banger update` (and its
auto-rollback path) restart the helper + daemon without killing
every running VM. New smoke scenarios prove the property end-to-end.
Bug fixes:
1. Disable the firecracker SDK's signal-forwarding goroutine. The
default ForwardSignals = [SIGINT, SIGQUIT, SIGTERM, SIGHUP,
SIGABRT] installs a handler in the helper that propagates the
helper's SIGTERM (sent by systemd on
`systemctl stop bangerd-root.service`) to every running firecracker
child. Set
ForwardSignals to an empty (non-nil) slice so setupSignals
short-circuits at len()==0.
2. Add SendSIGKILL=no to bangerd-root.service. KillMode=process
limits the initial SIGTERM to the helper main, but systemd
still SIGKILLs leftover cgroup processes during the
FinalKillSignal stage unless SendSIGKILL=no.
3. Route restart-helper / restart-daemon / wait-daemon-ready
failures through rollbackAndRestart instead of rollbackAndWrap.
rollbackAndWrap restored the .previous binaries but did not
restart the failed unit, leaving the helper dead with the
rolled-back binary on disk after a failed update.
Testing infrastructure (production binaries unaffected):
- Hidden --manifest-url and --pubkey-file flags on `banger update`
let the smoke harness redirect the updater at locally-built
release artefacts. Marked Hidden in cobra; not advertised in
--help.
- FetchManifestFrom / VerifyBlobSignatureWithKey /
FetchAndVerifySignatureWithKey export the existing logic against
caller-supplied URL / pubkey. The default entry points still
call them with the embedded canonical values.
Smoke scenarios:
- update_check: --check against fake manifest reports update
available
- update_to_unknown: --to v9.9.9 fails before any host mutation
- update_no_root: refuses without sudo, install untouched
- update_dry_run: stages + verifies, no swap, version unchanged
- update_keeps_vm_alive: real swap to v0.smoke.0; same VM (same
boot_id) answers SSH after the daemon restart
- update_rollback_keeps_vm_alive: v0.smoke.broken-bangerd ships a
bangerd that passes --check-migrations but exits 1 as the
daemon. The post-swap `systemctl restart bangerd` fails,
rollbackAndRestart fires, the .previous binaries are restored
and re-restarted; the same VM still answers SSH afterwards
- daemon_admin (separate prep): covers `banger daemon socket`,
`bangerd --check-migrations --system`, `sudo banger daemon
stop`
The smoke release builder generates a fresh ECDSA P-256 keypair
with openssl, signs SHA256SUMS cosign-compatibly, and serves
artefacts from a backgrounded python http.server.
verify_smoke_check_test.go pins the openssl/cosign signature
equivalence so the smoke release builder can't silently drift.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous one-liner ("banger needs permission to manage network
access for the VMs you launch") was honest but understated; banger
also needs sudo for storage (rootfs snapshots, loop devices, image
files), launching/stopping firecracker under jailer isolation, and
installing binaries + systemd units. Spell those out as a short
bulleted list at the moment of decision so the user is authorising
a known scope rather than a euphemism.
Wording stays plain-language — no capability names, no jargon —
since the target audience may not know networking or container
terminology.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surveyed the install scripts of comparable systemd-installing tools
(Docker, k3s, Tailscale, Ollama, Determinate Systems Nix, flyctl):
none of the daemon installers offer a --user staging mode, because
the resulting install isn't usable on its own. The same holds for
banger: the "--user just stages binaries you can't actually use yet"
UX was a trap, so remove it before users hit it.
In its place, adopt the cross-tool convention for non-interactive
runs: the BANGER_INSTALL_NONINTERACTIVE=1 env var is friendlier
through a curl|bash pipe than `bash -s -- --yes` because the env
var can sit on the same line:
curl -fsSL ...install.sh | env BANGER_INSTALL_NONINTERACTIVE=1 bash
The --yes flag still works for direct script invocation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
scripts/install.sh is the one-command installer end users run as
curl -fsSL https://releases.thaloco.com/banger/install.sh | bash
Design choices:
* Runs as the invoking user. All network work + signature verification
happens unprivileged; sudo is only re-execed for the actual install
step that writes to /usr/local and creates systemd units.
* Right before the sudo prompt, the script prints a plain-language
summary of exactly what's about to happen — the file paths it will
create and a one-line "why sudo" — so the user authorises a known
scope rather than the whole pipeline. Detail link in the docs.
* Uses openssl (universally available) for signature verification, not
cosign. cosign is needed only by the *signer*, never the verifier.
* No jq dependency. The latest_stable field is extracted from the
manifest with grep+sed, since the manifest shape is well-defined and
we control it.
* /dev/tty fallback for the confirmation prompt so it works through
the curl|bash pipe.
* --yes for non-interactive CI use, --user for installing into
~/.local/bin without touching system paths, --version vX.Y.Z to pin.
publish-banger-release.sh now uploads install.sh to the bucket root
on every publish, so the curl URL is stable but the script logic
matches the latest verified release. It also runs a key-drift check:
if scripts/install.sh's embedded cosign public key differs from the
one in internal/updater/verify_signature.go, publishing aborts. The
two copies must stay in sync or one of them ends up rejecting every
release.
README's Quick start now leads with the installer one-liner and
documents the audit-first variant alongside it; building from source
moves below.
Smoke-tested end to end against the live bucket with --user mode:
manifest fetch → tarball download → cosign signature verify → hash
verify → extract → install. The installed binary reports v0.1.0 at
commit 6fdebd9, matching the published artifact.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous form passed rclone paths like releases:banger/v0.1.0/,
which rclone parses as bucket=banger, key=v0.1.0/... — wrong, because
the actual R2 bucket is named "releases" (BUCKET_PATH was meant as
an in-bucket key prefix only). Uploads 403'd because the token has
no view of a bucket called "banger".
Introduce RCLONE_BUCKET as a separate env var (default: "releases")
and route every rclone copy through ${RCLONE_REMOTE}:${RCLONE_BUCKET}/${BUCKET_PATH}.
The public URLs in the manifest stay unchanged: BASE_URL is the
bucket's public custom domain, so the bucket name is implicit there.
The defaults now resolve to the live setup:
rclone target: releases:releases/banger/<version>/<file>
public URL: https://releases.thaloco.com/banger/<version>/<file>
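As a rough illustration of how the two paths are composed, the sketch below rebuilds both strings in Go. The function names are hypothetical, the file name is only an example, and the BASE_URL value passed in `main` is an assumption inferred from the public URL shown above.

```go
package main

import (
	"fmt"
	"strings"
)

// rcloneTarget joins remote, bucket, and in-bucket key prefix in the
// <remote>:<bucket>/<prefix>/<version>/<file> shape described above.
func rcloneTarget(remote, bucket, prefix, version, file string) string {
	key := strings.Join([]string{prefix, version, file}, "/")
	return fmt.Sprintf("%s:%s/%s", remote, bucket, key)
}

// publicURL omits the bucket name: the base URL is assumed to be the
// bucket's public custom domain, so only the key appears in the path.
func publicURL(baseURL, prefix, version, file string) string {
	key := strings.Join([]string{prefix, version, file}, "/")
	return strings.TrimRight(baseURL, "/") + "/" + key
}

func main() {
	fmt.Println(rcloneTarget("releases", "releases", "banger", "v0.1.0", "SHA256SUMS"))
	// releases:releases/banger/v0.1.0/SHA256SUMS
	fmt.Println(publicURL("https://releases.thaloco.com", "banger", "v0.1.0", "SHA256SUMS"))
	// https://releases.thaloco.com/banger/v0.1.0/SHA256SUMS
}
```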
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous form did
COSIGN_PASSWORD="${COSIGN_PASSWORD:-}" cosign sign-blob ...
which set COSIGN_PASSWORD to "" when the caller hadn't exported one.
cosign sees an explicit empty password and tries to decrypt with
it instead of prompting interactively, so any real password-protected
offline key fails with "decryption failed".
Drop the prefix entirely. If COSIGN_PASSWORD is already in env, it
gets inherited normally; if it isn't, cosign prompts on the terminal
— which is the right UX for a maintainer running the publish script
locally with the offline private key.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs found while dry-running the publish flow end-to-end:
1. The awk pipeline that pulled BangerReleasePublicKey out of
verify_signature.go didn't strip Go's raw-string-literal wrapping
(`var ... = ` + backtick on the BEGIN line, trailing backtick on
the END line). The "verify against embedded pub key" step thus
compared sigs against a malformed PEM. Replaced with a sed pair
that yields a clean PEM block byte-identical to cosign.pub.
2. cosign v3.x defaults sign-blob to a new bundle format and
pushes signatures to Rekor; both are incompatible with banger's
"embedded pub key, raw ASN.1 DER signature" trust model.
Add --use-signing-config=false / --tlog-upload=false /
--new-bundle-format=false to opt out, and --insecure-ignore-tlog
on verify-blob. These flags also work on cosign v2.x, so the
script is forward- and backward-compatible across the v2→v3
boundary.
Validated by an end-to-end dry-run on this machine: built binaries,
tarred, sha256summed, cosign-signed, verified against the embedded
pub key, then re-verified through internal/updater's
crypto/ecdsa.VerifyASN1 path — all green.
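The "embedded pub key, raw ASN.1 DER signature" check can be sketched with the standard library alone. This is not the code in internal/updater; `verifyBlob` and `signForDemo` are hypothetical names, and the base64 wrapping of the signature is an assumption about how the signature file is stored.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"crypto/x509"
	"encoding/base64"
	"encoding/pem"
	"errors"
	"fmt"
)

// verifyBlob checks a base64-encoded ASN.1 DER ECDSA signature over the
// SHA-256 digest of blob, using a PEM-encoded PKIX public key.
func verifyBlob(pubPEM, blob []byte, sigB64 string) error {
	block, _ := pem.Decode(pubPEM)
	if block == nil {
		return errors.New("no PEM block in public key")
	}
	parsed, err := x509.ParsePKIXPublicKey(block.Bytes)
	if err != nil {
		return fmt.Errorf("parse public key: %w", err)
	}
	pub, ok := parsed.(*ecdsa.PublicKey)
	if !ok {
		return errors.New("public key is not ECDSA")
	}
	sig, err := base64.StdEncoding.DecodeString(sigB64)
	if err != nil {
		return fmt.Errorf("decode signature: %w", err)
	}
	digest := sha256.Sum256(blob)
	if !ecdsa.VerifyASN1(pub, digest[:], sig) {
		return errors.New("signature does not match")
	}
	return nil
}

// signForDemo creates a throwaway P-256 key and signs blob, so the
// sketch runs without any external key material.
func signForDemo(blob []byte) (pubPEM []byte, sigB64 string, err error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, "", err
	}
	der, err := x509.MarshalPKIXPublicKey(&key.PublicKey)
	if err != nil {
		return nil, "", err
	}
	pubPEM = pem.EncodeToMemory(&pem.Block{Type: "PUBLIC KEY", Bytes: der})
	digest := sha256.Sum256(blob)
	sig, err := ecdsa.SignASN1(rand.Reader, key, digest[:])
	if err != nil {
		return nil, "", err
	}
	return pubPEM, base64.StdEncoding.EncodeToString(sig), nil
}

func main() {
	sums := []byte("example SHA256SUMS content\n")
	pub, sig, err := signForDemo(sums)
	if err != nil {
		panic(err)
	}
	fmt.Println(verifyBlob(pub, sums, sig) == nil)                       // true
	fmt.Println(verifyBlob(pub, append(sums, 'x'), sig) == nil)          // false
}
```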
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
README gets a top-level Updating section; docs/privileges.md gains
a step-by-step trust-model writeup of `banger update`. The new
scripts/publish-banger-release.sh drives the manual release cut:
build, tar, sha256sum, cosign sign-blob, verify against the embedded
public key, jq-merge into manifest.json, rclone upload to the R2
bucket. Refuses outright if the embedded key is still the placeholder
so we can't accidentally publish an unverifiable release. Also folds
in gofmt drift accumulated across the updater package and a few
sibling files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three quality-of-life improvements now that the daemon-side races
that gated parallel mode are fixed:
1. **Smol VMs by default.** Smoke installs a tuned config.toml at
/etc/banger/config.toml between `system install` and `system
restart` so the respawned daemon picks up:
vcpu = 2
memory_mib = 1024
disk_size = "2G"
system_overlay_size = "2G"
Smoke scenarios assert behavior, not capacity — they don't need
4 vCPU / 8 GiB / 8 GiB / 8 GiB. Per-VM RAM cost drops from 8 GiB
to 1 GiB; nominal disk drops from 16 GiB to 4 GiB (sparse, so
actual use is small either way, but the new ceiling is gentler
on hosts that can't overcommit). Scenarios that test
reconfiguration (vm_set's --vcpu 2 → 4) still pass --vcpu
explicitly, so this default doesn't perturb their assertions.
2. **JOBS defaults to nproc.** The Makefile resolves JOBS to
`$(shell nproc)` if unset; the smoke script's existing cap of 8
keeps the parallel pool sane on bigger hosts. The script always
passes --jobs N now, so behavior is consistent. Override with
`make smoke JOBS=1` for a fully serial run.
3. **Help text catches up.** --help no longer flags parallelism as
experimental (the underlying daemon races are fixed) and now
describes the small-VM default. `make help` mentions the new
default and how to override.
Verified: `make smoke` (no JOBS) on a 32-core box auto-runs with
JOBS=8, smol VMs, 21/21 PASS in 172s.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three concurrency bugs surfaced by `make smoke JOBS=4` that all stem
from `vm.create` paths assuming single-caller semantics:
1. **Kernel auto-pull manifest race.** Parallel `vm.create` calls that
need the same kernel ref each run kernelcat.Fetch concurrently
against the same /var/lib/banger/kernels/<name>/ directory. Fetch
writes manifest.json non-atomically (truncate + write); the peer
reads it back mid-write and trips
"parse manifest for X: unexpected end of JSON input".
Fix: per-name `sync.Mutex` map on `ImageService` (kernelPullLock).
`KernelPull` and `readOrAutoPullKernel` both acquire it and re-check
`kernelcat.ReadLocal` after the lock so a peer who finished while we
waited is treated as success — `readOrAutoPullKernel` does NOT call
`s.KernelPull` because that path errors with "already pulled" on a
peer-success, which would be wrong for auto-pull. Different kernels
stay parallel.
2. **Image auto-pull race.** Same shape as the kernel race but on the
image side: parallel `vm.create` calls both run pullFromBundle /
pullFromOCI for the missing image (each ~minutes of OCI fetch +
ext4 build). The publishImage atom under imageOpsMu only protects
the rename + UpsertImage commit, so the loser does all the work
only to fail at the recheck with "image already exists".
Fix: per-name `sync.Mutex` map on `ImageService` (imagePullLock).
`findOrAutoPullImage` acquires it, re-checks FindImage, and only
then calls PullImage. Loser short-circuits with the
freshly-published image instead of redoing minutes of work.
PullImage's own publishImage recheck stays as defense-in-depth
for callers that bypass the auto-pull path.
3. **Work-seed refresh race.** When the host's SSH key has rotated
since an image was last refreshed, `ensureAuthorizedKeyOnWorkDisk`
triggers `refreshManagedWorkSeedFingerprint`, which rewrote the
shared work-seed.ext4 in place via e2rm + e2cp. Peer `vm.create`
calls doing parallel `MaterializeWorkDisk` rdumps observed a torn
ext4 image — "Superblock checksum does not match superblock".
Fix: stage the rewrite on a sibling tmpfile (`<seed>.refresh.<pid>-<ns>.tmp`)
and atomic-rename. Concurrent readers either have the file open
(kernel keeps the pre-rename inode alive) or open after the rename
(see the new inode) — never observe a partial state. Two parallel
refreshes are idempotent (same daemon, same SSH key) so unique tmp
names are enough; whichever rename lands last wins, with identical
content. UpsertImage runs after the rename so the recorded
fingerprint always matches what's on disk.
Plus one smoke harness fix: reclassify `vm_prune` from `pure` to
`global`. `vm prune -f` removes ALL stopped VMs system-wide, not just
the ones the scenario created — so a parallel peer scenario that
happens to have its VM in `created`/`stopped` momentarily gets wiped.
Moving prune to the post-pool serial phase keeps it from racing with
in-flight scenarios.
After all four fixes, `make smoke JOBS=4` passes 21/21 in 174s
(serial baseline 141s; the small overhead is the buffered-output and
`wait -n` semaphore cost — well worth the parallelism for fast-iter
work on a 32-core box).
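The per-name lock with a re-check after acquisition, as used in fixes 1 and 2, can be sketched like this. All names (`keyedMutex`, `store`, `findOrAutoPull`, `pullInParallel`) are hypothetical, and the "pull" is a counter increment standing in for minutes of real work.

```go
package main

import (
	"fmt"
	"sync"
)

// keyedMutex hands out one mutex per name, so work on different names
// stays parallel while work on the same name is serialised.
type keyedMutex struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func (k *keyedMutex) lock(name string) (unlock func()) {
	k.mu.Lock()
	if k.locks == nil {
		k.locks = make(map[string]*sync.Mutex)
	}
	l, ok := k.locks[name]
	if !ok {
		l = &sync.Mutex{}
		k.locks[name] = l
	}
	k.mu.Unlock()
	l.Lock()
	return l.Unlock
}

// store is a toy stand-in for the local image or kernel catalogue.
type store struct {
	pullLock keyedMutex
	mu       sync.Mutex
	have     map[string]bool
	pulls    int
}

func (s *store) exists(name string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.have[name]
}

// findOrAutoPull pulls name at most once, however many callers race.
func (s *store) findOrAutoPull(name string) {
	if s.exists(name) {
		return
	}
	unlock := s.pullLock.lock(name)
	defer unlock()
	if s.exists(name) {
		return // a peer finished while we waited: treat as success
	}
	s.mu.Lock()
	s.have[name] = true // stands in for the expensive pull
	s.pulls++
	s.mu.Unlock()
}

// pullInParallel fires n concurrent callers at the same name.
func pullInParallel(s *store, name string, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.findOrAutoPull(name)
		}()
	}
	wg.Wait()
}

func main() {
	s := &store{have: map[string]bool{}}
	pullInParallel(s, "demo-image", 8)
	fmt.Println(s.pulls) // 1
}
```

The second `exists` check is what lets the loser return the freshly published result instead of failing or redoing the work.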
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`scripts/smoke.sh` was a 600-line linear script: no way to see what it
covers without reading the whole thing, and no way to run a single
scenario when iterating. Every iteration paid the full ~5-10 min suite,
which made fast feedback loops painful enough to avoid the suite.
Refactor into a registry + per-scenario functions:
- Top-of-file SMOKE_SCENARIOS (ordered) + SMOKE_DESCS (one-line desc per
scenario) + SMOKE_CLASS (pure / repodir / global) drive both listing
and dispatch. The 21 existing scenario blocks become scenario_<name>
functions. Bodies are the inline blocks verbatim, modulo the workspace
fixture move described below.
- New CLI: --list (cheap discovery, no install / no env-vars),
--scenario NAME (or NAME,NAME,...), --jobs N (parallel dispatch),
-h / --help.
- New setup_fixtures runs once after the install/doctor/restart preamble
and produces the throwaway git repo at $repodir that 'repodir'-class
scenarios consume. Lifted out of scenario_workspace_run so single-
scenario invocations (e.g. --scenario workspace_dryrun) get the
fixture even when the scenario that historically built it isn't
selected.
- Wipe ~/.local/state/banger/ssh/known_hosts in the install preamble.
`system uninstall --purge` clears /var/lib/banger but the user-side
known_hosts persists by design — and smoke creates VMs that reuse
guest IPs (172.16.0.2 etc.) with fresh host keys every run, so a
leftover entry trips StrictHostKeyChecking and the daemon's wait-
for-ssh sees only timeouts. This was the real cause of the "guest
ssh did not come up" flakes that surface across smoke iterations.
Parallel dispatch:
- --jobs N opts into a slot-limited pool: 'pure' scenarios fan out as
individual jobs; 'repodir' scenarios fuse into a single serial chain
(since they mutate $repodir in registry order); 'global' scenarios
run serially after the pool, one at a time.
- Cap is min(N, 8) — each parallel slot runs an 8 GiB VM, so RAM is
the binding constraint.
- Parallel-mode stdout/stderr per scenario buffer to per-scenario
logs and emit one PASS/FAIL line on completion; on FAIL the buffer
is dumped. Serial mode (--jobs 1, the default) keeps stdout
unbuffered exactly as before.
- Parallelism is documented as experimental in --help: it surfaces
real daemon-side concurrency bugs (image auto-pull manifest race,
work-seed-refresh race on the shared work-seed.ext4) that don't
appear in serial mode and that need their own fix in the daemon.
Serial (--jobs 1) is the reliable path; --jobs N is for fast-
iteration dev work where occasional re-runs are acceptable.
Exit codes: 0 ok, 1 assertion failed, 2 usage error (unknown
scenario, missing SCENARIO=), 77 explicit selection skipped (NAT
when sudo iptables is unavailable AND nat is the only selected
scenario; soft-skip otherwise).
Makefile additions:
- `make smoke-list` — cheap discovery, no smoke-build dep, no env vars.
- `make smoke-one SCENARIO=name` — single-scenario run, full preamble.
MAKECMDGOALS guard catches missing SCENARIO= before any rebuild.
- `make smoke JOBS=N` — passes through to the script's --jobs N.
- Help text covers all three.
Verified: serial full suite passes 21/21 in ~140s on this host;
make smoke-one SCENARIO=workspace_restart runs the recently-added
regression test alone in ~50s.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VM stop has been quietly losing data freshly written via
`vm workspace prepare`: stop+start of a workspace-prepared VM would
come back with /root/repo wiped on the work disk.
Root cause is firecracker + Debian's systemd defaults. FC's
SendCtrlAltDel (the only "graceful shutdown" action FC exposes) just
delivers the keystroke; what the guest does with it is its choice.
Debian routes ctrl-alt-del.target -> reboot.target, so the guest
reboots, FC stays alive, the daemon's 10s wait_for_exit window
expires, and the SIGKILL fallback drops anything still in FC's
userspace I/O path. For an idle VM that's invisible. For one that
just took hundreds of small writes through a workspace prepare, it's
data loss.
Fix is to dial the guest over SSH inside StopVM and run
`sync; systemctl --no-block poweroff || /sbin/poweroff -f &` before
the existing SendCtrlAltDel path. The synchronous `sync` is the
load-bearing piece — it blocks until every dirty page hits virtio-blk
and lands in the on-host root.ext4. Whether poweroff completes
before SIGKILL fires is incidental; sync has already run. SSH
unreachable falls back to the old SendCtrlAltDel behaviour so a
broken-network guest can't make stop hang.
Bounded by a 5s SSH-dial timeout so a half-broken guest can't extend
the overall stop window past gracefulShutdownWait.
Also adds two smoke scenarios:
- `workspace + stop/start`: prepare -> stop -> start -> assert
marker survives. This is the regression that caught the bug.
- `vm exec`: end-to-end coverage for d59425a — auto-cd into the
prepared workspace, exit-code propagation, dirty-host warning,
--auto-prepare resync, refusal on stopped VM.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds three small but high-leverage presentation tweaks for v0.1:
1. internal/cli/style is a new ~70 LOC package with Pass/Fail/Warn/
Dim/Bold helpers. Each is TTY-gated and obeys NO_COLOR. No
external dep. Wired into the doctor PASS/FAIL/WARN status, the
"banger:" error prefix on stderr, and the dim 'ready in <elapsed>'
line.
2. internal/cli/errors translates rpc.ErrorResponse into user-facing
text. operation_failed becomes invisible (the message wins);
not_found, already_exists, bad_request, bad_version, unauthorized,
unknown_method get short labels; unknown codes pass through. The
daemon-attached op_id lands in dim parens — paste into
journalctl --grep to find the daemon log line that produced the
failure.
3. Tabwriter config converges on (0, 8, 2, ' ', 0) across every
list/table command. The vm prune confirmation table picked up the
right config; system install + system status switched from bare
"key: value\n" lines to tabular form. printVMSpecLine drops its
Unicode middle dot for an ASCII '|' so terminals without UTF-8
render cleanly.
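For reference, the (0, 8, 2, ' ', 0) tabwriter configuration from item 3 maps onto `text/tabwriter.NewWriter` as minwidth, tabwidth, padding, padchar, and flags. The sketch below uses made-up keys and values; `renderStatus` is a hypothetical helper, not a function from the codebase.

```go
package main

import (
	"bytes"
	"fmt"
	"text/tabwriter"
)

// renderStatus prints key/value rows as an aligned two-column table
// using the shared config: minwidth 0, tabwidth 8, padding 2, space
// as the pad character, no flags.
func renderStatus(rows [][2]string) string {
	var buf bytes.Buffer
	tw := tabwriter.NewWriter(&buf, 0, 8, 2, ' ', 0)
	for _, r := range rows {
		fmt.Fprintf(tw, "%s\t%s\n", r[0], r[1])
	}
	tw.Flush() // alignment is only computed on Flush
	return buf.String()
}

func main() {
	fmt.Print(renderStatus([][2]string{
		{"state", "running"},
		{"name", "box"},
	}))
	// state  running
	// name   box
}
```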
Tests cover translateRPCError for every code, style helpers no-op
on non-TTY and under NO_COLOR. Smoke status greps switch from
"key: value" to "key value" to match the new format.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move the supported systemd path to two services: an owner-user bangerd for
orchestration and a narrow root helper for bridge/tap, NAT/resolver, dm/loop,
and Firecracker ownership. This removes repeated sudo from daily vm and image
flows without leaving the general daemon running as root.
Add install metadata, system install/status/restart/uninstall commands, and a
system-owned runtime layout. Keep user SSH/config material in the owner home,
lock file_sync to the owner home, and move daemon known_hosts handling out of
the old root-owned control path.
Route privileged lifecycle steps through typed privilegedOps calls, harden the
two systemd units, and rewrite smoke plus docs around the supported service
model.
Verified with make build, make test, make lint, and make smoke on the
supported systemd host path.
A VM name flows into five places that all have narrower grammars
than "arbitrary string":
- the guest's /etc/hostname (vm_disk.patchRootOverlay)
- the guest's /etc/hosts (same)
- the <name>.vm DNS record (vmdns.RecordName)
- the kernel command line (system.BuildBootArgs*)
- VM-dir file-path fragments (layout.VMsDir/<id>, etc.)
Nothing in the chain was validating the input. A name with
whitespace, newline, dot, slash, colon, or = would produce broken
hostnames, weird DNS labels, smuggled kernel cmdline tokens, or
(in the worst case) surprising traversal through the on-disk
layout. Not host shell injection — we already avoid shelling out
with the raw name — but a real correctness and supportability bug.
New: model.ValidateVMName. Rules:
- 1..63 chars (DNS label max per RFC 1123; also a comfortable
/etc/hostname cap)
- lowercase ASCII letters, digits, '-' only
- no leading or trailing '-'
- no normalization — the name is the user-visible identifier
(store key, `ssh <name>.vm`, `vm show`); silently rewriting
"MyVM" → "myvm" would hand the user back something different
than they typed
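A minimal sketch of these rules, assuming nothing beyond what is listed above. `validateVMName` here is an illustration and not the actual model.ValidateVMName; in particular it rejects the empty string, whereas the CLI caller lets an empty name pass through so the daemon can generate one.

```go
package main

import (
	"errors"
	"fmt"
)

// validateVMName enforces: 1..63 chars, lowercase ASCII letters, digits
// and '-', no leading or trailing '-'. It never rewrites the input.
func validateVMName(name string) error {
	if len(name) < 1 || len(name) > 63 {
		return errors.New("name must be 1..63 characters")
	}
	// Byte-wise iteration is enough: any non-ASCII byte fails the check.
	for i := 0; i < len(name); i++ {
		c := name[i]
		ok := (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '-'
		if !ok {
			return fmt.Errorf("invalid character %q at offset %d", c, i)
		}
	}
	if name[0] == '-' || name[len(name)-1] == '-' {
		return errors.New("name must not start or end with '-'")
	}
	return nil
}

func main() {
	for _, n := range []string{"box-1", "MyVM", "box/../evil", "-box"} {
		fmt.Printf("%q valid=%v\n", n, validateVMName(n) == nil)
	}
}
```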
Called from two places:
- internal/cli/commands_vm.go vmCreateParamsFromFlags — rejects
bad `--name` values before any RPC. Empty name still passes
through so the daemon can generate one.
- internal/daemon/vm_create.go reserveVM — defense in depth for
any non-CLI RPC caller (SDK, direct JSON over the socket).
Tests:
- internal/model/vm_name_test.go — exhaustive character-class
matrix (space, newline, tab, dot, slash, colon, equals, quote,
control chars, unicode letters, uppercase, leading/trailing
hyphen, over-length, max-length-exact, digits-only).
- internal/cli TestVMCreateParamsFromFlagsRejectsInvalidName —
CLI wire-through + empty-name passthrough.
- internal/daemon TestReserveVMRejectsInvalidName — daemon
defense-in-depth (including `box/../evil` path-traversal).
- scripts/smoke.sh — end-to-end rejection + no-leaked-row
assertion.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
--readonly ran `chmod -R a-w` over the workspace after copying, but
every banger guest boots as root, and root bypasses DAC mode checks.
So a user running `vm workspace prepare ... --readonly` got the
mode bits set to 0444 but `echo x >> file` in the guest still
succeeded. The flag promised enforcement it couldn't deliver.
The feature also doesn't match the product model: workspaces are
prepared precisely so the guest CAN edit them, and `workspace
export` exists to pull those edits back as a patch. A
"read-only workspace" contradicts that loop.
Removed:
- CLI flag `--readonly` on `vm workspace prepare`
- api.VMWorkspacePrepareParams.ReadOnly field
- model.WorkspacePrepareResult.ReadOnly field
- daemon chmod dispatch in prepareVMWorkspaceGuestIO
- smoke scenario pinning the (advisory) mode-bit behavior
- misleading "exportbox-readonly" VM name in an unrelated export
test (the test is about not mutating the real git index;
renamed to exportbox-noindex-mutation)
If real enforcement becomes a user need later, the right primitive
is `chattr +i` (immutable bit — root CAN'T write) or a ro bind-mount.
Introducing a fresh flag at that point is cheaper than debugging what
the current one actually guarantees.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit of banger's advertised CLI surface vs. what smoke was exercising
turned up several gaps where a regression would have shipped silently.
New scenarios:
- NAT: asserts the per-VM POSTROUTING MASQUERADE rule is installed
with --nat (scoped to the guest /32), idempotent across stop/start,
and torn down on delete. End-to-end curl tests don't work here
because the bridge IP and uplink IP both belong to the host — a
guest reaching the uplink lands on host-local loopback whether
MASQUERADE is set up or not — so the test pins the iptables rule
itself. Skipped if passwordless `sudo iptables` isn't available.
- vm ports: sshd :22 must be visible with the <name>.vm endpoint
(not localhost, not the raw guest IP — the daemon prefers the
DNS record when one exists).
- vm restart: dedicated verb, not a stop+start alias. Asserts a
fresh boot_id to prove the kernel actually recycled.
- vm kill --signal KILL: forceful termination path (distinct from
`vm stop`'s graceful Ctrl-Alt-Del). Post-kill state must be
'stopped' (not 'error') and the dm-snapshot must be cleaned up.
- vm prune -f: batch delete of non-running VMs while preserving any
that are still running. Regression guard for the case where prune
could wipe a live session.
- workspace prepare --readonly: mode bits on /root/repo/<file>
must drop all write bits. Enforcement is advisory against a root
guest, so the test asserts the bits, not EACCES.
- workspace prepare --mode full_copy: alternate transfer path
(tarred into rootfs, no overlay) still lands the repo contents
at /root/repo.
- workspace export --base-commit: guest-side commits captured in
the patch when the pre-commit SHA is pinned. The feature's whole
reason for existing; it had zero coverage. Includes a control
assertion that the plain (no --base-commit) export does NOT see
the committed file.
- ssh-config --install / --uninstall: HOME-isolated to a smoke
tempdir so we don't touch the invoking user's ~/.ssh/config.
Seeds a pre-existing config to catch any regression where
install clobbers instead of appending. Asserts idempotency
(second install doesn't duplicate the Include line) and clean
round-trip (uninstall leaves the user's own content intact).
Coverage deltas from smoke (vs the last run):
internal/hostnat 14.1% → 64.1% (+50pp — NAT rule dance)
internal/daemon/opstate 56.2% → 87.5% (+31pp)
internal/daemon 43.4% → 49.4% (+6pp)
internal/cli 36.1% → 40.4% (+4pp)
internal/daemon/workspace 64.1% → 67.5% (+3pp)
Scenario count: 12 → 21.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two defects compounded to make `vm create X` → `vm stop X` → `vm start X`
→ `vm ssh X` fail with `not_running: vm X is not running` even though
`vm show` reports `state=running`.
1. firecracker-go-sdk's startVMM spawns a goroutine that SIGTERMs
firecracker when the ctx passed to Machine.Start cancels — and
retains that ctx for the lifetime of the VMM, not just the boot
phase. Our Machine.Start wrapper was plumbing the caller's ctx
through, which on `vm.start` is the RPC request ctx. daemon.go's
handleConn cancels reqCtx via `defer cancel()` right after
writing the response. Net effect: firecracker is killed ~150ms
after the `vm start` RPC "completes", invisibly, and the next
`vm ssh` sees a dead PID. `vm.create` side-stepped the bug
because BeginVMCreate detaches to context.Background() before
calling startVMLocked; `vm.start` used the RPC ctx directly.
Fix: Machine.Start now passes context.Background() to the SDK.
We own firecracker lifecycle explicitly (StopVM / KillVM /
cleanupRuntime), so ctx-driven cancellation here was never
actually wired into anything useful.
2. With (1) fixed, the same scenario exposed a second defect:
patchRootOverlay's e2cp/e2rm calls refuse to touch the dm-snapshot
with "Inode bitmap checksum does not match bitmap" on a restart,
because the COW holds stale free-block/free-inode counters from
the previous guest boot. Kernel ext4 is fine with this; e2fsprogs
is not. Fix: run `e2fsck -fy` on the snapshot between the
dm_snapshot and patch_root_overlay stages. Idempotent on a fresh
snapshot, reconciles the bitmaps on a reused COW.
Regression coverage:
- scripts/repro-restart-bug.sh — minimal create→stop→start→ssh
reproducer with rich on-failure diagnostics (daemon log trace,
firecracker.log tail, handles.json, pgrep-by-apiSock, apiSock
stat). Exits non-zero if the bug returns.
- scripts/smoke.sh — lifecycle scenario (create/ssh/stop/start/
ssh/delete) and vm-set scenario (--vcpu 2 → stop → set --vcpu 4
→ start → assert nproc=4). Both were pulled when the bug was
first found; now restored.
Supporting:
- internal/system/system.ExitCode — extracts exec.ExitError's
code without forcing callers to import os/exec. Needed by the
e2fsck caller (policy test pins os/exec to the shell-out
packages).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The export round-trip (`vm create` → `workspace prepare` → guest edit →
`workspace export`) exposed a reproducible failure on Debian bookworm
guests: `git read-tree HEAD --index-output=/tmp/...` returns exit 128
"unable to write new index file" when the target lives on tmpfs while
`.git` is on the workspace overlay. Move the temp index into
`$(git rev-parse --git-dir)` so it shares a filesystem with `.git/index`
and the lockfile + rename + hardlink dance git does internally works.
Alongside:
- new workspace-export smoke scenario that would have caught this at
the boundary between daemon and guest git
- `make smoke-fresh` = `smoke-clean && smoke` for release-time runs
that want first-install paths (migrations, image pull) stamped into
the coverage report
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five new smoke scenarios layered on top of the existing bare + workspace
vm-run tests:
- exit-code propagation: `sh -c 'exit 42'` must rc=42
- workspace dry-run: --dry-run lists tracked files without a VM
- workspace --include-untracked: opt-in ships files outside the git
index (regression guard on the security-default flip from review 4)
- concurrent vm runs: two --rm invocations in parallel both succeed
(stresses per-VM locks, createVMMu reservation window, tap pool)
- invalid spec rejection: --vcpu 0 must fail with no VM row left
behind (the "cleanup on partial failure" path the review flagged)
The exit-code scenario caught a real bug on first run:
`banger vm run --rm -- sh -c 'exit 42'` returned rc=0, not 42.
Root cause in internal/cli/ssh.go's sshCommandArgs: extra args were
appended to the ssh argv verbatim, relying on ssh(1)'s implicit
space-join to deliver the remote command. That works for single
tokens (echo hello) but re-tokenises multi-word commands on the
remote side: `ssh host sh -c 'exit 42'` becomes remote
`sh -c exit 42`, where `42` is $0 for the already-completed `exit`,
and the exit code the user asked for is lost.
Fix: shell-quote every element of extra (`'sh'` `'-c'` `'exit 42'`)
and join them into a single trailing argv entry. ssh's space-join
then produces exactly the command the user typed on the remote
shell. TestSSHCommandArgs was updated to pin the quoting; the
existing TestRunVMRunCommandModePropagatesExitCode test needed a
one-word quote tweak (`false` → `'false'`).
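The quoting step can be sketched roughly as follows. This is an illustration under assumptions, not the actual sshCommandArgs code: the helper names `shellQuote` and `remoteCommand` are hypothetical, and POSIX single-quote escaping is assumed.

```go
package main

import (
	"fmt"
	"strings"
)

// shellQuote wraps s in single quotes for a POSIX shell, escaping any
// embedded single quote as '\''.
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// remoteCommand quotes every element of extra and joins the result into
// one string, meant to be passed to ssh as a single trailing argv entry.
func remoteCommand(extra []string) string {
	quoted := make([]string, len(extra))
	for i, arg := range extra {
		quoted[i] = shellQuote(arg)
	}
	return strings.Join(quoted, " ")
}

func main() {
	fmt.Println(remoteCommand([]string{"sh", "-c", "exit 42"}))
}
```

Running it prints `'sh' '-c' 'exit 42'`, which the remote shell parses back into the original three words.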
Smoke run after fix passes all seven scenarios in ~2 min on warm
state. cmd/banger coverage jumped to 100% (the invalid-spec
scenario hits the error-reporting path that wasn't covered
before).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The unit + integration tests can't cross machine.Start — the SDK
boundary would need a fake firecracker that reimplements the
control-plane HTTP API, and the ongoing maintenance cost of keeping
that fake honest with upstream kills the value. Instead, add a
pre-release smoke target that drives REAL Firecracker + real KVM,
captures coverage from the -cover-instrumented binaries, and
surfaces per-package deltas so regressions in the boot path don't
ship silently.
scripts/smoke.sh:
- Isolated XDG_{CONFIG,STATE,CACHE,RUNTIME} so the smoke run can't
touch real user state (state/cache persist under build/smoke/xdg
for fast reruns; runtime is mktemp'd fresh per-run because
sockets can't be reused)
- Preflight: `banger doctor` must pass; UDP :42069 must be free
(otherwise the user's real daemon is up and the smoke daemon
can't bind its DNS listener — fail with an actionable message)
- Scenario 1 — bare: `banger vm run --rm -- echo smoke-bare-ok`
exercises create → start → socket ownership chown → machine.Start
→ SDK waitForSocket race → vsock agent readiness → guest SSH
wait → exec → cleanup → delete
- Scenario 2 — workspace: creates a throwaway git repo, runs
`banger vm run --rm <repo> -- cat /root/repo/smoke-file.txt`,
verifies the tracked file reached the guest (exercises
workDisk capability PrepareHost + workspace.prepare)
- `banger daemon stop` at the end so instrumented binaries flush
GOCOVERDIR pods before the script exits
Makefile additions:
- smoke-build: builds banger/bangerd under build/smoke/bin/ with
`go build -cover`
- smoke: runs the script with GOCOVERDIR set, reports per-package
coverage via `go tool covdata percent`
- smoke-coverage-html: textfmt + go tool cover for a browsable
report
- smoke-clean: nukes build/smoke/ including the persisted XDG
state
Bonus fix uncovered during the first smoke run: doctor treated a
missing state.db as a FAIL ("out of memory" from SQLite
SQLITE_CANTOPEN), which red-flagged every fresh install. Split
the store check: DB file absent → PASS with "will be created on
first daemon start" detail; DB present but unreadable → FAIL as
before. New TestDoctorReport_StoreMissingSurfacesAsPassForFreshInstall
pins the behaviour.
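The split store check might look roughly like this. It is a sketch only: the `checkResult` type and `checkStore` name are hypothetical, and a plain open stands in for the real SQLite open.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// checkResult is a hypothetical stand-in for a doctor report row.
type checkResult struct {
	Pass   bool
	Detail string
}

// checkStore treats a missing DB file as a fresh install (PASS) and any
// other stat or open failure as a real problem (FAIL).
func checkStore(path string) checkResult {
	_, err := os.Stat(path)
	if errors.Is(err, fs.ErrNotExist) {
		return checkResult{Pass: true, Detail: "will be created on first daemon start"}
	}
	if err != nil {
		return checkResult{Pass: false, Detail: err.Error()}
	}
	// Assumption: a plain open stands in for opening the SQLite store.
	f, err := os.Open(path)
	if err != nil {
		return checkResult{Pass: false, Detail: err.Error()}
	}
	f.Close()
	return checkResult{Pass: true, Detail: "present and readable"}
}

func main() {
	fmt.Printf("%+v\n", checkStore("/nonexistent-banger-dir/state.db"))
}
```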
Concrete coverage delta from the first successful smoke run
(compared to `make coverage-total`'s unit-test-only 37.8%):
  internal/firecracker       43.6% → 75.0%
  internal/daemon/workspace  33.8% → 60.8%
  internal/store             40.1% → 56.3%
  internal/guest             63.7% → 57.4%
    (different mix: smoke exercises real SSH; unit tests cover
    more error branches)
The packages the review flagged are the ones that moved most —
which is the point.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three independent hardenings, addressing a review finding that the
kernel and image build pipelines were relying on HTTPS alone for
artifact integrity.
scripts/make-generic-kernel.sh
- Fetch the detached PGP signature (linux-<ver>.tar.sign) alongside
the tarball and verify it with gpg before extraction. An isolated
$GNUPGHOME under the tempdir keeps the kernel signers out of the
invoking user's keyring.
- Import the three kernel.org release signing keys (Greg KH / Linus /
Sasha Levin) from keyserver.ubuntu.com, falling back to
keys.openpgp.org. Ubuntu comes first because keys.openpgp.org strips
unverified UIDs on upload, leaving gpg with UID-less keys it
refuses to trust.
- Require VALIDSIG (cryptographic proof) rather than GOODSIG
(printed even for expired keys) before proceeding. Verified
end-to-end against a clean tarball (accepts) and a byte-flipped
tampered copy (rejects with BADSIG).
- gpg + gpgv + xz added to the required-tools check.
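For illustration, the VALIDSIG requirement could be expressed in Go as a scan over gpg's `--status-fd` output. The script itself is shell; the function name and the trusted-fingerprint map here are hypothetical, and the fingerprint in `main` is a placeholder, not a real kernel.org key.

```go
package main

import (
	"fmt"
	"strings"
)

// hasValidSig reports whether gpg status output contains a VALIDSIG line
// whose fingerprint is in trusted. GOODSIG lines are deliberately ignored.
func hasValidSig(status string, trusted map[string]bool) bool {
	for _, line := range strings.Split(status, "\n") {
		f := strings.Fields(line)
		if len(f) < 3 || f[0] != "[GNUPG:]" || f[1] != "VALIDSIG" {
			continue
		}
		// f[2] is the signing key fingerprint; when gpg emits it, the
		// last field is the primary key fingerprint.
		if trusted[f[2]] || trusted[f[len(f)-1]] {
			return true
		}
	}
	return false
}

func main() {
	fp := "0123456789ABCDEF0123456789ABCDEF01234567" // placeholder fingerprint
	status := "[GNUPG:] VALIDSIG " + fp + " 2024-01-01 1704067200 0 4 0 1 8 00 " + fp
	fmt.Println(hasValidSig(status, map[string]bool{fp: true}))
}
```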
images/golden/Dockerfile
- Pin Docker's apt signing key by fingerprint. After downloading
/etc/apt/keyrings/docker.asc we gpg --show-keys --with-colons it,
extract the fpr, and compare against the expected
9DC858229FC7DD38854AE2D88D81803C0EBFCD88. A tampered or swapped key
aborts the build before any apt repo metadata is fetched.
- Replace `curl https://mise.run | sh` with a pinned GitHub release
binary (mise v2026.4.18, linux-x64) verified against its published
sha256. Refuses to build on unknown architectures rather than
silently installing a binary we have no hash for.
- Add gnupg to the ESSENTIAL apt-get install so the fingerprint check
has gpg available.
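The fingerprint pin amounts to reading the first `fpr` record from `gpg --show-keys --with-colons` and comparing it with the expected value. The Dockerfile does this in shell; the Go below is only a sketch of the same comparison, and the helper names and the abbreviated sample listing are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Expected Docker apt key fingerprint, as quoted in the commit message.
const expectedDockerFpr = "9DC858229FC7DD38854AE2D88D81803C0EBFCD88"

// primaryFingerprint returns field 10 of the first "fpr" record in gpg's
// colon-delimited listing, or "" if there is none.
func primaryFingerprint(colons string) string {
	for _, line := range strings.Split(colons, "\n") {
		f := strings.Split(line, ":")
		if len(f) > 9 && f[0] == "fpr" {
			return f[9]
		}
	}
	return ""
}

// keyMatches reports whether the listing's primary key is the pinned one.
func keyMatches(colons string) bool {
	return primaryFingerprint(colons) == expectedDockerFpr
}

// sampleListing fabricates an abbreviated listing for fp (illustration only).
func sampleListing(fp string) string {
	return "pub:-:4096:1:" + fp[len(fp)-16:] + ":\n" +
		"fpr" + strings.Repeat(":", 9) + fp + ":\n"
}

func main() {
	fmt.Println(keyMatches(sampleListing(expectedDockerFpr)))
}
```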
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The script carried a python3 dep for one json.dumps on a VM name
that's always alphanumeric-plus-dashes anyway, it was never wired
into CI or docs, and `time banger vm create` covers the same need
ad hoc when anyone wants to measure create latency.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The golden-image Dockerfile + catalog pipeline replaces the entire
manual rootfs-build stack. With that shipped, the per-distro shell
flows are dead code.
Removed:
- scripts/customize.sh, scripts/interactive.sh, scripts/verify.sh
- scripts/make-rootfs{,-void,-alpine}.sh
- scripts/register-{void,alpine}-image.sh
- scripts/make-{void,alpine}-kernel.sh
- internal/imagepreset/ (only consumer was `banger internal packages`,
which fed customize.sh)
- examples/{void,alpine}.config.toml
- Makefile targets: rootfs, rootfs-void, rootfs-alpine, void-kernel,
alpine-kernel, void-register, alpine-register, void-vm, alpine-vm,
verify-void, verify-alpine, plus the ALPINE_RELEASE / *_IMAGE_NAME
/ *_VM_NAME variables
The void-6.12 kernel catalog entry is also gone — golden image pairs
with generic-6.12 and nothing else in the catalog depended on it.
Consolidated: imagemgr now holds the small DebianBasePackages list +
package-hash helper inline, so the `image build --from-image` flow
(still supported) no longer pulls from a separate imagepreset package.
Net: 3,815 lines deleted, 59 added. No runtime functionality removed
beyond the `banger internal packages` subcommand (hidden, used only
by the deleted customize.sh).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Embed the sha256 prefix in the uploaded filename so every rebuild
lives at a unique URL. Cloudflare's edge cache (and any similar CDN
in front of R2) can never serve stale bytes for the URL the catalog
points at. The R2 console offers no per-URL purge for this bucket
layout, so making the URL itself content-addressed is the only
durable fix.
Also republishes the debian-bookworm catalog entry with the new
filename.
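A sketch of content-addressed naming, assuming a hypothetical `<base>-<sha256 prefix>.tar.zst` scheme. The real prefix length and filename layout are not stated in this message.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// bundleName embeds the first prefixLen hex characters of the bundle's
// sha256 in the filename so every rebuild gets a distinct URL.
// The naming layout is an assumption for illustration.
func bundleName(base string, data []byte, prefixLen int) string {
	sum := sha256.Sum256(data)
	return fmt.Sprintf("%s-%s.tar.zst", base, hex.EncodeToString(sum[:])[:prefixLen])
}

func main() {
	fmt.Println(bundleName("debian-bookworm-x86_64", []byte("rebuild 1"), 12))
	fmt.Println(bundleName("debian-bookworm-x86_64", []byte("rebuild 2"), 12))
}
```

Because the name changes whenever the bytes change, a CDN can cache each URL indefinitely without ever serving stale content for the URL the catalog references.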
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First entry in the image catalog. Verified end-to-end:
- https://images.thaloco.com/debian-bookworm-x86_64.tar.zst reachable
- sha256 071495e6... matches
- bundle unpacks to rootfs.ext4 (4 GiB) + manifest.json with the
expected name/distro/arch/kernel_ref.
publish-golden-image.sh tweaks:
- default RCLONE_REMOTE changed from 'r2' to 'banger-images' (matches the
rclone config actually in use here).
- rclone copyto now passes --s3-no-check-bucket and --no-check-dest
so scoped R2 tokens without HeadBucket/HeadObject permission
still upload cleanly.
To use: restart bangerd so it picks up the new embedded catalog,
then `banger image pull debian-bookworm`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the OCI-push flow with a bundle-based one that mirrors the
kernel catalog (publish-kernel.sh / kernelcat).
- scripts/make-golden-bundle.sh: docker build → docker create → docker
export | banger internal make-bundle → .tar.zst. Defaults target
debian-bookworm / generic-6.12 / x86_64; pinned --size 4G to leave
headroom for first-boot installs and in-VM apt use.
- scripts/publish-golden-image.sh: rewritten to call make-golden-bundle,
rclone upload to R2 (banger-images bucket, images.thaloco.com), and
jq-patch internal/imagecat/catalog.json with URL / sha256 / size.
--skip-upload stops after bundle build and copies to dist/.
make-bundle default ext4 sizing also bumped from +25% to +50% headroom
(mkfs.ext4 needs room for inode tables, block-group metadata, journal,
and the default 5% reserved-blocks margin). The old 25% was too tight
for the ~950 MB golden rootfs and aborted with "Could not allocate
block".
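The headroom arithmetic, as a sketch. The function name is hypothetical, and rounding up to a 4096-byte block is an assumption added for illustration, not something this message states.

```go
package main

import "fmt"

const blockSize = 4096 // assumed ext4 block size

// imageSize adds headroomPct percent to contentBytes and rounds the
// result up to a whole number of blocks.
func imageSize(contentBytes, headroomPct int64) int64 {
	size := contentBytes + contentBytes*headroomPct/100
	if rem := size % blockSize; rem != 0 {
		size += blockSize - rem
	}
	return size
}

func main() {
	content := int64(950_000_000) // roughly the golden rootfs size
	fmt.Println("25% headroom:", imageSize(content, 25))
	fmt.Println("50% headroom:", imageSize(content, 50))
}
```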
End-to-end smoke (local): golden Dockerfile → 286 MB tar.zst bundle
with correct manifest, valid ext4, and all banger units + vsock agent
present.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Debian bookworm with two clearly-labeled sections:
- ESSENTIAL: systemd, openssh-server, ca-certificates, curl, iproute2.
- OPINION: git, jq, ripgrep, fd, build-essential, shellcheck, mise,
Docker CE (+ Compose v2 + buildx), tmux, htop, and friends.
Per-VM identity stripped at build time: /etc/machine-id cleared,
SSH host keys removed with a ssh.service drop-in that runs
`ssh-keygen -A` on first start so each VM gets a unique set.
The script is a parameterized wrapper around `docker build`; it also
supports `--push` to an OCI registry, which will be removed once the
bundle pipeline is in place.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the full arc: banger kernel pull + image pull + vm create + vm ssh
now work end-to-end against docker.io/library/debian:bookworm with zero
manual image building.
Generic kernel:
- New scripts/make-generic-kernel.sh builds vmlinux from upstream
kernel.org sources using Firecracker's official minimal config
(configs/firecracker-x86_64-6.1.config). All critical drivers
(virtio_blk, virtio_net, ext4, vsock) compiled in — no modules,
no initramfs needed.
- Published as generic-6.12 in the catalog (kernels.thaloco.com).
- catalog.json updated with the new entry.
Direct-boot init= override (vm_lifecycle.go):
- For images without an initrd (direct-boot / OCI-pulled), banger now
passes init=/usr/local/libexec/banger-first-boot on the kernel
cmdline. The script runs as PID 1, mounts /proc /sys /dev /run,
checks for systemd — if present execs it immediately; if not
(container images), installs systemd-sysv + openssh-server via the
guest's package manager, then execs systemd.
- Also passes kernel-level ip= parameter via BuildBootArgsWithKernelIP
so the kernel configures the network interface before init runs
(container images don't ship iproute2, so the userspace bootstrap
script can't call ip(8)).
- Masks dev-ttyS0.device and dev-vdb.device systemd units that
otherwise wait 90s for udev events that never fire in Firecracker
guests started from container rootfses.
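For reference, the kernel's `ip=` parameter follows the documented `ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>` layout. The sketch below formats such a value. The helper name and the addresses are hypothetical, and the real signature of BuildBootArgsWithKernelIP is not shown in this message.

```go
package main

import "fmt"

// kernelIPArg formats a static ip= kernel parameter. The server-ip field
// is left empty and autoconf is "off", so the kernel neither expects an
// NFS server nor attempts DHCP.
func kernelIPArg(client, gateway, netmask, hostname, device string) string {
	return fmt.Sprintf("ip=%s::%s:%s:%s:%s:off", client, gateway, netmask, hostname, device)
}

func main() {
	// Placeholder addresses for illustration only.
	fmt.Println(kernelIPArg("172.16.0.2", "172.16.0.1", "255.255.255.0", "testbox", "eth0"))
}
```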
first-boot.sh rewritten as universal init wrapper:
- Works as PID 1 (mounts essential filesystems) OR as a systemd
oneshot (existing behavior).
- Installs both systemd-sysv AND openssh-server (container images
have neither).
- Dispatch updated: debian, alpine, fedora, arch, opensuse families
+ ID_LIKE fallback. All tests updated.
Opencode capability skip for direct-boot images:
- The opencode readiness check (WaitReady on vsock port 4096) now
returns nil for images without an initrd, since pulled container
images don't ship the opencode service. Without this, the VM
would be marked as error for lacking an opinionated add-on.
Docs: README and kernel-catalog.md updated to recommend generic-6.12
as the default kernel for OCI-pulled images. AGENTS.md notes the new
build script.
Verified live:
- banger kernel pull generic-6.12
- banger image pull docker.io/library/debian:bookworm --kernel-ref generic-6.12
- banger vm create --image debian-bookworm --name testbox --nat
- banger vm ssh testbox -- "id; uname -r; systemctl is-active banger-vsock-agent"
→ uid=0(root), kernel 6.12.8, Debian bookworm, vsock-agent active,
sshd running, SSH working.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Manual publish flow for the kernel catalog, designed for the current
no-CI, private-repo state of banger.
scripts/publish-kernel.sh <name>:
- Reads $BANGER_KERNELS_DIR/<name>/ (the canonical layout produced by
`banger kernel import`).
- Pulls distro / arch / kernel_version from the local manifest.
- Packages vmlinux + optional initrd.img + optional modules/ as
<name>-<arch>.tar.zst with zstd -19.
- Computes sha256 + size.
- rclone copyto -> r2:banger-kernels/<file>.
- HEAD-checks https://kernels.thaloco.com/<file> to catch
public-access misconfig before declaring success.
- jq-patches internal/kernelcat/catalog.json: replaces any prior
entry with the same name, then sorts entries by name.
- Prints next-step git+make commands; does not commit or rebuild
automatically.
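The catalog patch step (drop any prior entry with the same name, append the new one, sort by name) is done with jq in the script. The same logic in Go might look like this; the `entry` fields are an assumed subset of the catalog schema.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical subset of a catalog.json entry.
type entry struct {
	Name   string
	URL    string
	SHA256 string
}

// upsert replaces any entry sharing e.Name, then sorts entries by name.
func upsert(entries []entry, e entry) []entry {
	out := make([]entry, 0, len(entries)+1)
	for _, x := range entries {
		if x.Name != e.Name {
			out = append(out, x)
		}
	}
	out = append(out, e)
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	cat := []entry{{Name: "void-6.12", SHA256: "old"}, {Name: "alpine-6.6", SHA256: "aaa"}}
	cat = upsert(cat, entry{Name: "void-6.12", SHA256: "new"})
	fmt.Printf("%+v\n", cat)
}
```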
Environment overrides RCLONE_REMOTE / RCLONE_BUCKET / BASE_URL /
BANGER_KERNELS_DIR for non-default setups.
docs/kernel-catalog.md covers the architecture (embedded JSON +
external tarballs), end-user flow, the add/update/remove playbook,
naming and tarball-layout conventions, the trust model (sha256 in
embedded catalog catches transport/swap; no signing yet), and where
the bucket lives.
README.md gains a kernel-catalog example next to the existing image
register example. AGENTS.md points at publish-kernel.sh and the docs.
.gitignore now excludes .env so R2 credentials dropped there by
accident don't end up in commits.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`banger kernel import <name> --from <dir>` copies a staged kernel
bundle into the local catalog. <dir> is the output of
`make void-kernel` or `make alpine-kernel` (build/manual/void-kernel/
or build/manual/alpine-kernel/).
kernelcat.DiscoverPaths locates artifacts under <dir>:
1. Prefers metadata.json (written by make-void-kernel.sh).
2. Falls back to globbing: boot/vmlinux-* or vmlinuz-* (Alpine
fallback), boot/initramfs-*, lib/modules/<latest>.
The daemon's KernelImport copies kernel + optional initrd via
system.CopyFilePreferClone and modules via system.CopyDirContents
(no-sudo mode — catalog lives under ~/.local/state), computes SHA256
over the kernel, and writes the manifest via kernelcat.WriteLocal.
While wiring this up, fixed a latent bug in system.CopyDirContents:
filepath.Join(sourceDir, ".") silently drops the trailing dot, so
the `cp -a` call received the bare source path and copied the whole
source directory (including its basename) instead of just its contents.
Replaced the join with a manual "/." suffix. imagemgr.StageBootArtifacts
(the only existing caller) silently benefits.
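The pitfall is easy to reproduce: filepath.Join cleans its result, so a trailing "." element disappears. A small sketch of the before and after, with hypothetical helper names:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// joinedDot shows the buggy form: filepath.Join cleans the path, so the
// trailing "." is dropped.
func joinedDot(dir string) string {
	return filepath.Join(dir, ".")
}

// contentsArg appends "/." by hand so that `cp -a` copies the directory's
// contents rather than the directory itself.
func contentsArg(dir string) string {
	return strings.TrimRight(dir, "/") + "/."
}

func main() {
	fmt.Println(joinedDot("/src/boot"))   // the dot is lost
	fmt.Println(contentsArg("/src/boot")) // the dot is kept
}
```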
scripts/register-void-image.sh and scripts/register-alpine-image.sh
are rewritten to use `banger kernel import … && banger image register
--kernel-ref …` instead of the find-and-pass-paths dance. Preserves
the same user-facing commands and env vars.
Tests cover: metadata.json preference, glob fallback, Alpine vmlinuz
fallback, kernel-missing error, round-trip copy into the catalog, and
the --from required flag.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Provisioning was still installing `claude` and `pi` through a separate
npm-global prefix even after the guest images had switched to `mise` for
Node and opencode. That left two competing install paths and made the
runtime layout harder to reason about.
Switch the Debian and Void image setup flows to install `claude` and `pi`
as `mise` npm tools, assert their shims exist after `mise reshim`, and
symlink `node`, `npm`, `opencode`, `claude`, and `pi` directly from the
mise shim directory into `/usr/local/bin`.
Update the imagebuild test expectations and bump the Void rootfs default
size to 4G so the larger default toolset still fits reliably.
Add daemon-backed workspace and guest-session primitives so host
orchestrators can prepare /root/repo, launch long-lived guest commands,
and attach to pipe-mode sessions over the local stdio mux bridge.
Persist richer session metadata and launch diagnostics, preflight guest
cwd/command requirements, make pipe-mode attach rehydratable from guest
state after daemon restart, and allow submodules when workspace prepare
runs in full_copy mode.
At the same time, stop vm run from auto-attaching opencode, make it
print next-step commands instead, and make glibc guest images more
agent-ready by installing node, opencode, claude, and pi while syncing
opencode/claude/pi auth files into work disks on VM start.
Validation:
- GOCACHE=/tmp/banger-gocache go test ./...
- make build
- banger vm workspace prepare --help
- banger vm session --help
- banger vm session start --help
- banger vm session attach --help
Replace the old `void-exp` repository defaults with `void` so the Make targets,
registration helper, example config, verification messaging, and sample test
fixtures all line up with the new managed image name.
Keep the scope to repo-facing naming only: config overrides, helper output, and
test fixtures now expect `void`, while runtime compatibility for existing local
`void-exp` VMs remains an operational concern outside this commit.
Validation: go test ./..., make build, and a local `banger vm create --image void`
smoke boot with ssh and opencode ports up.
Replace the stale `RUNTIME_DIR` mkdir in the experimental Void kernel helper with
creation of the parent directory for `OUT_DIR`, which is the current
BANGER_MANUAL_DIR/custom --out-dir flow used by the Make target.
This restores `make void-kernel` without requiring an extra environment override.
Validation: make void-kernel ARGS='--out-dir /tmp/banger-void-kernel-verify-$$'.
Make vm create wait for the guest-side vsock /healthz endpoint instead of only waiting for the host socket path, so the wait_vsock_agent stage reflects actual guest readiness.
Start banger-vsock-agent earlier in the Alpine OpenRC graph and report later /ports failures as guest-service waits rather than vsock-agent waits, which makes the progress output match what the guest is really doing.
Validate with go test ./..., a rebuilt managed alpine image, and a fresh vm create --image alpine --name alp --nat that now progresses through wait_vsock_agent -> wait_guest_ready -> wait_opencode -> ready.
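The readiness wait can be pictured as a bounded polling loop. In this sketch the probe is an arbitrary callback standing in for the guest-side /healthz request over vsock. The names and the retry policy are assumptions, not the daemon's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errNotReady = errors.New("guest not ready")

// waitReady calls probe up to attempts times, sleeping delay between
// failures, and returns nil as soon as a probe succeeds.
func waitReady(attempts int, delay time.Duration, probe func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = probe(); err == nil {
			return nil
		}
		time.Sleep(delay)
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// Simulated guest that answers on the third probe.
	err := waitReady(10, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errNotReady
		}
		return nil
	})
	fmt.Println(calls, err)
}
```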
Stage a complete Alpine x86_64 image stack so `--image alpine` works like the existing manual Void path instead of relying on Debian-oriented image builds.
Add make targets plus kernel/rootfs/register helpers that download pinned Alpine artifacts, extract a Firecracker-compatible vmlinux, build a matching mkinitfs initramfs, seed OpenRC services, and register/promote a managed image named alpine.
Fold in the bring-up fixes discovered during boot validation: use rootfstype=ext4 in shared boot args, install libgcc/libstdc++ for the opencode binary, and give opencode more time to become ready on cold boots.
Validate with go test ./..., the Alpine helper builds, image promotion, and banger vm create --image alpine --name alp --nat plus guest service and port checks.
Hard-cut banger away from source-checkout runtime bundles as an implicit source of
image and host defaults. Managed images now own their full boot set,
image build starts from an existing registered image, and daemon startup
no longer synthesizes a default image from host paths.
Resolve Firecracker from PATH or firecracker_bin, make SSH keys config-owned
with an auto-managed XDG default, replace the external name generator and
package manifests with Go code, and keep the vsock helper as a companion
binary instead of a user-managed runtime asset.
Update the manual scripts, web/CLI forms, config surface, and docs around
the new build/manual flow and explicit image registration semantics.
Validation: GOCACHE=/tmp/banger-gocache go test ./..., bash -n scripts/*.sh,
and make build.
Separate tracked source from generated artifacts so the repo root stops accumulating helper scripts, manifests, and local runtime outputs.
Move manual shell entrypoints under scripts/, manifests under config/, and the Firecracker API reference under docs/reference/. Make build and runtimebundle now target build/bin, build/runtime, and build/dist as the canonical source-checkout paths.
Update runtime discovery, helper scripts, tests, and docs to follow the new layout while keeping legacy source-checkout runtime fallbacks for existing local bundles during migration.
Validated with bash -n on the moved scripts, make build, and GOCACHE=/tmp/banger-gocache go test ./....
Cut VM create wall time without changing VM semantics.
Generate a work-seed ext4 sidecar during image builds and rootfs rebuilds, then clone and resize that seed for each new VM instead of rebuilding /root from scratch. Plumb the new seed artifact through config, runtime metadata, store state, runtime-bundle defaults, doctor checks, and default-image reconciliation so older images still fall back cleanly.
Add a daemon TAP pool to keep idle bridge-attached devices warm, expose stage timing in lifecycle logs, add a create/SSH benchmark script plus Make target, and teach verify.sh that tap-pool-* devices are reusable capacity rather than cleanup leaks.
Validated with go test ./..., make build, ./verify.sh, and make bench-create ARGS="--runs 2".
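The warm TAP pool idea can be sketched with a buffered channel of idle device names. This is illustrative only: the type, its methods, and the string names standing in for real TAP devices are assumptions. Only the `tap-pool-*` naming comes from the message above.

```go
package main

import (
	"fmt"
	"sync"
)

// tapPool keeps up to cap(idle) pre-created device names ready for reuse.
type tapPool struct {
	mu   sync.Mutex
	next int
	idle chan string
}

// newTapPool pre-creates size devices so the first VMs skip setup.
func newTapPool(size int) *tapPool {
	p := &tapPool{idle: make(chan string, size)}
	for i := 0; i < size; i++ {
		p.idle <- p.create()
	}
	return p
}

// create stands in for real TAP creation plus bridge attachment.
func (p *tapPool) create() string {
	p.mu.Lock()
	defer p.mu.Unlock()
	name := fmt.Sprintf("tap-pool-%d", p.next)
	p.next++
	return name
}

// acquire returns a warm device if one is idle, else creates a new one.
func (p *tapPool) acquire() string {
	select {
	case name := <-p.idle:
		return name
	default:
		return p.create()
	}
}

// release returns a device to the pool; false means the pool is full and
// the caller should tear the device down.
func (p *tapPool) release(name string) bool {
	select {
	case p.idle <- name:
		return true
	default:
		return false
	}
}

func main() {
	p := newTapPool(2)
	fmt.Println(p.acquire(), p.acquire(), p.acquire())
}
```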