banger

Author	SHA1	Message	Date
Thales Maciel	f1b17f6f8e	install: surface ssh-config --install in next-steps blurb Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 17:57:26 -03:00
Thales Maciel	9ed44bfd75	port smoke to go	2026-05-01 19:34:44 -03:00
Thales Maciel	b9b3505e34	smoke: cover -d/--detach and bootstrap NAT precondition Two new pure scenarios: * detach_run: -d --rm and -d -- <cmd> combos rejected before VM creation; bare -d leaves the VM running and ssh-able afterward. * bootstrap_precondition: workspace with a .mise.toml is refused without --nat; --no-bootstrap bypasses the precondition and the run completes normally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 15:05:27 -03:00
Thales Maciel	2606bfbabb	update: VMs survive `banger update` and rollback Three load-bearing fixes that together let `banger update` (and its auto-rollback path) restart the helper + daemon without killing every running VM. New smoke scenarios prove the property end-to-end. Bug fixes: 1. Disable the firecracker SDK's signal-forwarding goroutine. The default ForwardSignals = [SIGINT, SIGQUIT, SIGTERM, SIGHUP, SIGABRT] installs a handler in the helper that propagates the helper's SIGTERM (sent by systemd on `systemctl stop bangerd-root.service`) to every running firecracker child. Set ForwardSignals to an empty (non-nil) slice so setupSignals short-circuits at len()==0. 2. Add SendSIGKILL=no to bangerd-root.service. KillMode=process limits the initial SIGTERM to the helper main, but systemd still SIGKILLs leftover cgroup processes during the FinalKillSignal stage unless SendSIGKILL=no. 3. Route restart-helper / restart-daemon / wait-daemon-ready failures through rollbackAndRestart instead of rollbackAndWrap. rollbackAndWrap restored .previous binaries but didn't re-restart the failed unit, leaving the helper dead with the rolled-back binary on disk after a failed update. Testing infrastructure (production binaries unaffected): - Hidden --manifest-url and --pubkey-file flags on `banger update` let the smoke harness redirect the updater at locally-built release artefacts. Marked Hidden in cobra; not advertised in --help. - FetchManifestFrom / VerifyBlobSignatureWithKey / FetchAndVerifySignatureWithKey export the existing logic against caller-supplied URL / pubkey. The default entry points still call them with the embedded canonical values. 
Smoke scenarios: - update_check: --check against fake manifest reports update available - update_to_unknown: --to v9.9.9 fails before any host mutation - update_no_root: refuses without sudo, install untouched - update_dry_run: stages + verifies, no swap, version unchanged - update_keeps_vm_alive: real swap to v0.smoke.0; same VM (same boot_id) answers SSH after the daemon restart - update_rollback_keeps_vm_alive: v0.smoke.broken-bangerd ships a bangerd that passes --check-migrations but exits 1 as the daemon. The post-swap `systemctl restart bangerd` fails, rollbackAndRestart fires, the .previous binaries are restored and re-restarted; the same VM still answers SSH afterwards - daemon_admin (separate prep): covers `banger daemon socket`, `bangerd --check-migrations --system`, `sudo banger daemon stop` The smoke release builder generates a fresh ECDSA P-256 keypair with openssl, signs SHA256SUMS cosign-compatibly, and serves artefacts from a backgrounded python http.server. verify_smoke_check_test.go pins the openssl/cosign signature equivalence so the smoke release builder can't silently drift. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 12:08:08 -03:00
Thales Maciel	93ba233a12	simplify post install instructions	2026-04-30 10:41:06 -03:00
Thales Maciel	596dc67556	install.sh: expand the pre-sudo summary beyond just networking The previous one-liner ("banger needs permission to manage network access for the VMs you launch") was honest but understated; banger also needs sudo for storage (rootfs snapshots, loop devices, image files), launching/stopping firecracker under jailer isolation, and installing binaries + systemd units. Spell those out as a short bulleted list at the moment of decision so the user is authorising a known scope rather than a euphemism. Wording stays plain-language — no capability names, no jargon — since the target audience may not know networking or container terminology. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 17:25:16 -03:00
Thales Maciel	1004331c14	install.sh: drop --user, add BANGER_INSTALL_NONINTERACTIVE env var Surveyed the install scripts of comparable systemd-installing tools (Docker, k3s, Tailscale, Ollama, Determinate Systems Nix, flyctl): none of the daemon installers offer a --user staging mode, because the resulting install isn't useful — banger inherits that. The "--user just stages binaries you can't actually use yet" UX was a trap; remove it before users hit it. In its place, adopt the cross-tool convention for non-interactive runs: the BANGER_INSTALL_NONINTERACTIVE=1 env var is friendlier through a curl|bash pipe than `bash -s -- --yes` because the env var can sit on the same line: curl -fsSL ...install.sh | env BANGER_INSTALL_NONINTERACTIVE=1 bash The --yes flag still works for direct script invocation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 14:15:36 -03:00
Thales Maciel	3c29af55a2	Add curl|bash installer + wire upload into publish script scripts/install.sh is the one-command installer end users run as curl -fsSL https://releases.thaloco.com/banger/install.sh | bash Design choices: * Runs as the invoking user. All network work + signature verification happens unprivileged; sudo is only re-execed for the actual install step that writes to /usr/local and creates systemd units. * Right before the sudo prompt, the script prints a plain-language summary of exactly what's about to happen — the file paths it will create and a one-line "why sudo" — so the user authorises a known scope rather than the whole pipeline. Detail link in the docs. * Uses openssl (universally available) for signature verification, not cosign. cosign is needed only by the signer, never the verifier. * No jq dependency. The latest_stable field is extracted from the manifest with grep+sed, since the manifest shape is well-defined and we control it. * /dev/tty fallback for the confirmation prompt so it works through the curl|bash pipe. * --yes for non-interactive CI use, --user for installing into ~/.local/bin without touching system paths, --version vX.Y.Z to pin. publish-banger-release.sh now uploads install.sh to the bucket root on every publish, so the curl URL is stable but the script logic matches the latest verified release. It also runs a key-drift check: if scripts/install.sh's embedded cosign public key differs from the one in internal/updater/verify_signature.go, publishing aborts. The two copies must stay in sync or one of them ends up rejecting every release. README's Quick start now leads with the installer one-liner and documents the audit-first variant alongside it; building from source moves below. Smoke-tested end to end against the live bucket with --user mode: manifest fetch → tarball download → cosign signature verify → hash verify → extract → install. 
The installed binary reports v0.1.0 at commit `6fdebd9`, matching the published artifact. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 14:06:34 -03:00
Thales Maciel	6fdebd929e	publish-script: split RCLONE_BUCKET out of BUCKET_PATH The previous form passed rclone paths like releases:banger/v0.1.0/, which rclone parses as bucket=banger, key=v0.1.0/... — wrong, because the actual R2 bucket is named "releases" (BUCKET_PATH was meant as an in-bucket key prefix only). Uploads 403'd because the token has no view of a bucket called "banger". Introduce RCLONE_BUCKET as a separate env var (default: "releases") and route every rclone copy through ${RCLONE_REMOTE}:${RCLONE_BUCKET}/${BUCKET_PATH}. The public URLs in the manifest stay unchanged: BASE_URL is the bucket's public custom domain, so the bucket name is implicit there. The defaults now resolve to the live setup: rclone target: releases:releases/banger/<version>/<file> public URL: https://releases.thaloco.com/banger/<version>/<file> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 13:35:53 -03:00
Thales Maciel	12f7a92bb4	publish-script: don't clobber COSIGN_PASSWORD with empty default The previous form did COSIGN_PASSWORD="${COSIGN_PASSWORD:-}" cosign sign-blob ... which set COSIGN_PASSWORD to "" when the caller hadn't exported one. cosign sees an explicit empty password and tries to decrypt with it instead of prompting interactively, so any real password-protected offline key fails with "decryption failed". Drop the prefix entirely. If COSIGN_PASSWORD is already in env, it gets inherited normally; if it isn't, cosign prompts on the terminal — which is the right UX for a maintainer running the publish script locally with the offline private key. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 13:27:23 -03:00
Thales Maciel	3d748b87c8	publish-script: fix pubkey extraction and cosign v3 compatibility Two bugs found while dry-running the publish flow end-to-end: 1. The awk pipeline that pulled BangerReleasePublicKey out of verify_signature.go didn't strip Go's raw-string-literal wrapping (`var ... = ` + backtick on the BEGIN line, trailing backtick on the END line). The "verify against embedded pub key" step thus compared sigs against a malformed PEM. Replaced with a sed pair that yields a clean PEM block byte-identical to cosign.pub. 2. cosign v3.x defaults sign-blob to a new bundle format and pushes signatures to Rekor; both are incompatible with banger's "embedded pub key, raw ASN.1 DER signature" trust model. Add --use-signing-config=false / --tlog-upload=false / --new-bundle-format=false to opt out, and --insecure-ignore-tlog on verify-blob. These flags also work on cosign v2.x, so the script is forward- and backward-compatible across the v2→v3 boundary. Validated by an end-to-end dry-run on this machine: built binaries, tarred, sha256summed, cosign-signed, verified against the embedded pub key, then re-verified through internal/updater's crypto/ecdsa.VerifyASN1 path — all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 13:23:09 -03:00
Thales Maciel	fae28e3d8b	update: docs + publish script for the self-update feature README gets a top-level Updating section; docs/privileges.md gains a step-by-step trust-model writeup of `banger update`. The new scripts/publish-banger-release.sh drives the manual release cut: build, tar, sha256sum, cosign sign-blob, verify against the embedded public key, jq-merge into manifest.json, rclone upload to the R2 bucket. Refuses outright if the embedded key is still the placeholder so we can't accidentally publish an unverifiable release. Also folds in gofmt drift accumulated across the updater package and a few sibling files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 12:43:46 -03:00
Thales Maciel	777b597a1e	smoke: smol VMs by default + JOBS auto-detects nproc Three quality-of-life improvements now that the daemon-side races that gated parallel mode are fixed: 1. Smol VMs by default. Smoke installs a tuned config.toml at /etc/banger/config.toml between `system install` and `system restart` so the respawned daemon picks up: vcpu = 2 memory_mib = 1024 disk_size = "2G" system_overlay_size = "2G" Smoke scenarios assert behavior, not capacity — they don't need 4 vCPU / 8 GiB / 8 GiB / 8 GiB. Per-VM RAM cost drops from 8 GiB to 1 GiB; nominal disk drops from 16 GiB to 4 GiB (sparse, so actual use is small either way, but the new ceiling is gentler on hosts that can't overcommit). Scenarios that test reconfiguration (vm_set's --vcpu 2 → 4) still pass --vcpu explicitly, so this default doesn't perturb their assertions. 2. JOBS defaults to nproc. The Makefile resolves JOBS to `$(shell nproc)` if unset; the smoke script's existing cap of 8 keeps the parallel pool sane on bigger hosts. The script always passes --jobs N now, so behavior is consistent. Override with `make smoke JOBS=1` for a fully serial run. 3. Help text catches up. --help no longer flags parallelism as experimental (the underlying daemon races are fixed) and now describes the small-VM default. `make help` mentions the new default and how to override. Verified: `make smoke` (no JOBS) on a 32-core box auto-runs with JOBS=8, smol VMs, 21/21 PASS in 172s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 17:36:17 -03:00
Thales Maciel	72882e45d7	daemon: serialise concurrent image/kernel pulls + atomic-rename seed refresh Three concurrency bugs surfaced by `make smoke JOBS=4` that all stem from `vm.create` paths assuming single-caller semantics: 1. Kernel auto-pull manifest race. Parallel `vm.create` calls that each need to auto-pull the same kernel ref both run kernelcat.Fetch in parallel against the same /var/lib/banger/kernels/<name>/. Fetch writes manifest.json non-atomically (truncate + write); the peer reads it back mid-write and trips "parse manifest for X: unexpected end of JSON input". Fix: per-name `sync.Mutex` map on `ImageService` (kernelPullLock). `KernelPull` and `readOrAutoPullKernel` both acquire it and re-check `kernelcat.ReadLocal` after the lock so a peer who finished while we waited is treated as success — `readOrAutoPullKernel` does NOT call `s.KernelPull` because that path errors with "already pulled" on a peer-success, which would be wrong for auto-pull. Different kernels stay parallel. 2. Image auto-pull race. Same shape as the kernel race but on the image side: parallel `vm.create` calls both run pullFromBundle / pullFromOCI for the missing image (each ~minutes of OCI fetch + ext4 build). The publishImage atom under imageOpsMu only protects the rename + UpsertImage commit, so the loser does all the work only to fail at the recheck with "image already exists". Fix: per-name `sync.Mutex` map on `ImageService` (imagePullLock). `findOrAutoPullImage` acquires it, re-checks FindImage, and only then calls PullImage. Loser short-circuits with the freshly-published image instead of redoing minutes of work. PullImage's own publishImage recheck stays as defense-in-depth for callers that bypass the auto-pull path. 3. Work-seed refresh race. When the host's SSH key has rotated since an image was last refreshed, `ensureAuthorizedKeyOnWorkDisk` triggers `refreshManagedWorkSeedFingerprint`, which rewrote the shared work-seed.ext4 in place via e2rm + e2cp. 
Peer `vm.create` calls doing parallel `MaterializeWorkDisk` rdumps observed a torn ext4 image — "Superblock checksum does not match superblock". Fix: stage the rewrite on a sibling tmpfile (`<seed>.refresh.<pid>-<ns>.tmp`) and atomic-rename. Concurrent readers either have the file open (kernel keeps the pre-rename inode alive) or open after the rename (see the new inode) — never observe a partial state. Two parallel refreshes are idempotent (same daemon, same SSH key) so unique tmp names are enough; whichever rename lands last wins, with identical content. UpsertImage runs after the rename so the recorded fingerprint always matches what's on disk. Plus one smoke harness fix: reclassify `vm_prune` from `pure` to `global`. `vm prune -f` removes ALL stopped VMs system-wide, not just the ones the scenario created — so a parallel peer scenario that happens to have its VM in `created`/`stopped` momentarily gets wiped. Moving prune to the post-pool serial phase keeps it from racing with in-flight scenarios. After all four fixes, `make smoke JOBS=4` passes 21/21 in 174s (serial baseline 141s; the small overhead is the buffered-output and `wait -n` semaphore cost — well worth the parallelism for fast-iter work on a 32-core box). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 17:24:11 -03:00
Thales Maciel	115eec8576	smoke: discoverable scenarios + selectable runs + parallel dispatch `scripts/smoke.sh` was a 600-line linear script: no way to see what it covers without reading the whole thing, and no way to run a single scenario when iterating. Every iteration paid the full ~5-10 min suite, which made fast feedback loops painful enough to avoid the suite. Refactor into a registry + per-scenario functions: - Top-of-file SMOKE_SCENARIOS (ordered) + SMOKE_DESCS (one-line desc per scenario) + SMOKE_CLASS (pure / repodir / global) drive both listing and dispatch. The 21 existing scenario blocks become scenario_<name> functions. Bodies are the inline blocks verbatim, modulo the workspace fixture move described below. - New CLI: --list (cheap discovery, no install / no env-vars), --scenario NAME (or NAME,NAME,...), --jobs N (parallel dispatch), -h / --help. - New setup_fixtures runs once after the install/doctor/restart preamble and produces the throwaway git repo at $repodir that 'repodir'-class scenarios consume. Lifted out of scenario_workspace_run so single-scenario invocations (e.g. --scenario workspace_dryrun) get the fixture even when the scenario that historically built it isn't selected. - Wipe ~/.local/state/banger/ssh/known_hosts in the install preamble. `system uninstall --purge` clears /var/lib/banger but the user-side known_hosts persists by design — and smoke creates VMs that reuse guest IPs (172.16.0.2 etc.) with fresh host keys every run, so a leftover entry trips StrictHostKeyChecking and the daemon's wait-for-ssh sees only timeouts. This was the real cause of the "guest ssh did not come up" flakes that surface across smoke iterations. Parallel dispatch: - --jobs N opts into a slot-limited pool: 'pure' scenarios fan out as individual jobs; 'repodir' scenarios fuse into a single serial chain (since they mutate $repodir in registry order); 'global' scenarios run serially after the pool, one at a time. 
- Cap is min(N, 8) — each parallel slot runs an 8 GiB VM, so RAM is the binding constraint. - In parallel mode, each scenario's stdout/stderr is buffered to a per-scenario log, and the scenario emits one PASS/FAIL line on completion; on FAIL the buffer is dumped. Serial mode (--jobs 1, the default) keeps stdout unbuffered exactly as before. - Parallelism is documented as experimental in --help: it surfaces real daemon-side concurrency bugs (image auto-pull manifest race, work-seed-refresh race on the shared work-seed.ext4) that don't appear in serial mode and that need their own fix in the daemon. Serial (--jobs 1) is the reliable path; --jobs N is for fast-iteration dev work where occasional re-runs are acceptable. Exit codes: 0 ok, 1 assertion failed, 2 usage error (unknown scenario, missing SCENARIO=), 77 explicit selection skipped (NAT when sudo iptables is unavailable AND nat is the only selected scenario; soft-skip otherwise). Makefile additions: - `make smoke-list` — cheap discovery, no smoke-build dep, no env vars. - `make smoke-one SCENARIO=name` — single-scenario run, full preamble. MAKECMDGOALS guard catches missing SCENARIO= before any rebuild. - `make smoke JOBS=N` — passes through to the script's --jobs N. - Help text covers all three. Verified: serial full suite passes 21/21 in ~140s on this host; make smoke-one SCENARIO=workspace_restart runs the recently-added regression test alone in ~50s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 16:56:57 -03:00
Thales Maciel	c9358ab390	daemon: sync guest over ssh before stop to preserve workspace writes VM stop has been quietly losing data freshly written via `vm workspace prepare`: stop+start of a workspace-prepared VM would come back with /root/repo wiped on the work disk. Root cause is firecracker + Debian's systemd defaults. FC's SendCtrlAltDel (the only "graceful shutdown" action FC exposes) just delivers the keystroke; what the guest does with it is its choice. Debian routes ctrl-alt-del.target -> reboot.target, so the guest reboots, FC stays alive, the daemon's 10s wait_for_exit window expires, and the SIGKILL fallback drops anything still in FC's userspace I/O path. For an idle VM that's invisible. For one that just took 100s of small writes through a workspace prepare, it's data loss. Fix is to dial the guest over SSH inside StopVM and run `sync; systemctl --no-block poweroff || /sbin/poweroff -f &` before the existing SendCtrlAltDel path. The synchronous `sync` is the load-bearing piece — it blocks until every dirty page hits virtio-blk and lands in the on-host root.ext4. Whether poweroff completes before SIGKILL fires is incidental; sync has already run. If SSH is unreachable, stop falls back to the old SendCtrlAltDel behaviour so a broken-network guest can't make stop hang. The SSH dial is bounded by a 5s timeout so a half-broken guest can't extend the overall stop window past gracefulShutdownWait. Also adds two smoke scenarios: - `workspace + stop/start`: prepare -> stop -> start -> assert marker survives. This is the regression that caught the bug. - `vm exec`: end-to-end coverage for `d59425a` — auto-cd into the prepared workspace, exit-code propagation, dirty-host warning, --auto-prepare resync, refusal on stopped VM. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 15:41:32 -03:00
Thales Maciel	71a332a6a1	cli: maturity polish — color, error translation, tabwriter consistency Adds three small but high-leverage presentation tweaks for v0.1: 1. internal/cli/style is a new ~70 LOC package with Pass/Fail/Warn/Dim/Bold helpers. Each is TTY-gated and obeys NO_COLOR. No external dep. Wired into the doctor PASS/FAIL/WARN status, the "banger:" error prefix on stderr, and the dim 'ready in <elapsed>' line. 2. internal/cli/errors translates rpc.ErrorResponse into user-facing text. operation_failed becomes invisible (the message wins); not_found, already_exists, bad_request, bad_version, unauthorized, unknown_method get short labels; unknown codes pass through. The daemon-attached op_id lands in dim parens — paste into journalctl --grep to find the daemon log line that produced the failure. 3. Tabwriter config converges on (0, 8, 2, ' ', 0) across every list/table command. The vm prune confirmation table picked up the right config; system install + system status switched from bare "key: value\n" lines to tabular form. printVMSpecLine drops its Unicode middle dot for an ASCII '|' so terminals without UTF-8 render cleanly. Tests cover translateRPCError for every code and check that the style helpers are no-ops on non-TTY output and under NO_COLOR. Smoke status greps switch from "key: value" to "key value" to match the new format. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 22:27:07 -03:00
Thales Maciel	59e48e830b	daemon: split owner daemon from root helper Move the supported systemd path to two services: an owner-user bangerd for orchestration and a narrow root helper for bridge/tap, NAT/resolver, dm/loop, and Firecracker ownership. This removes repeated sudo from daily vm and image flows without leaving the general daemon running as root. Add install metadata, system install/status/restart/uninstall commands, and a system-owned runtime layout. Keep user SSH/config material in the owner home, lock file_sync to the owner home, and move daemon known_hosts handling out of the old root-owned control path. Route privileged lifecycle steps through typed privilegedOps calls, harden the two systemd units, and rewrite smoke plus docs around the supported service model. Verified with make build, make test, make lint, and make smoke on the supported systemd host path.	2026-04-26 12:43:17 -03:00
Thales Maciel	caa6a2b996	model: validate VM names as DNS labels at CLI + daemon A VM name flows into five places that all have narrower grammars than "arbitrary string": - the guest's /etc/hostname (vm_disk.patchRootOverlay) - the guest's /etc/hosts (same) - the <name>.vm DNS record (vmdns.RecordName) - the kernel command line (system.BuildBootArgs*) - VM-dir file-path fragments (layout.VMsDir/<id>, etc.) Nothing in the chain was validating the input. A name with whitespace, newline, dot, slash, colon, or = would produce broken hostnames, weird DNS labels, smuggled kernel cmdline tokens, or (in the worst case) surprising traversal through the on-disk layout. Not host shell injection — we already avoid shelling out with the raw name — but a real correctness and supportability bug. New: model.ValidateVMName. Rules: - 1..63 chars (DNS label max per RFC 1123; also a comfortable /etc/hostname cap) - lowercase ASCII letters, digits, '-' only - no leading or trailing '-' - no normalization — the name is the user-visible identifier (store key, `ssh <name>.vm`, `vm show`); silently rewriting "MyVM" → "myvm" would hand the user back something different than they typed Called from two places: - internal/cli/commands_vm.go vmCreateParamsFromFlags — rejects bad `--name` values before any RPC. Empty name still passes through so the daemon can generate one. - internal/daemon/vm_create.go reserveVM — defense in depth for any non-CLI RPC caller (SDK, direct JSON over the socket). Tests: - internal/model/vm_name_test.go — exhaustive character-class matrix (space, newline, tab, dot, slash, colon, equals, quote, control chars, unicode letters, uppercase, leading/trailing hyphen, over-length, max-length-exact, digits-only). - internal/cli TestVMCreateParamsFromFlagsRejectsInvalidName — CLI wire-through + empty-name passthrough. - internal/daemon TestReserveVMRejectsInvalidName — daemon defense-in-depth (including `box/../evil` path-traversal). 
- scripts/smoke.sh — end-to-end rejection + no-leaked-row assertion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 14:06:40 -03:00
Thales Maciel	235758e5b2	workspace: drop --readonly flag — advisory only against root guests --readonly ran `chmod -R a-w` over the workspace after copying, but every banger guest boots as root, and root bypasses DAC mode checks. So a user running `vm workspace prepare ... --readonly` got the mode bits set to 0444 but `echo x >> file` in the guest still succeeded. The flag promised enforcement it couldn't deliver. The feature also doesn't match the product model: workspaces are prepared precisely so the guest CAN edit them, and `workspace export` exists to pull those edits back as a patch. A "read-only workspace" contradicts that loop. Removed: - CLI flag `--readonly` on `vm workspace prepare` - api.VMWorkspacePrepareParams.ReadOnly field - model.WorkspacePrepareResult.ReadOnly field - daemon chmod dispatch in prepareVMWorkspaceGuestIO - smoke scenario pinning the (advisory) mode-bit behavior - misleading "exportbox-readonly" VM name in an unrelated export test (the test is about not mutating the real git index; renamed to exportbox-noindex-mutation) If real enforcement becomes a user need later, the right primitive is `chattr +i` (immutable bit — root CAN'T write) or a ro bind-mount. Reintroducing a new flag is cheaper than debugging what the current one actually guarantees. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 13:04:33 -03:00
Thales Maciel	bafe816fc7	smoke: cover the gaps — NAT, vm ports/restart/kill/prune, workspace variants, ssh-config Audit of banger's advertised CLI surface vs. what smoke was exercising turned up several gaps where a regression would have shipped silently. New scenarios: - NAT: asserts the per-VM POSTROUTING MASQUERADE rule is installed with --nat (scoped to the guest /32), idempotent across stop/start, and torn down on delete. End-to-end curl tests don't work here because the bridge IP and uplink IP both belong to the host — a guest reaching the uplink lands on host-local loopback whether MASQUERADE is set up or not — so the test pins the iptables rule itself. Skipped if passwordless `sudo iptables` isn't available. - vm ports: sshd :22 must be visible with the <name>.vm endpoint (not localhost, not the raw guest IP — the daemon prefers the DNS record when one exists). - vm restart: dedicated verb, not a stop+start alias. Asserts a fresh boot_id to prove the kernel actually recycled. - vm kill --signal KILL: forceful termination path (distinct from `vm stop`'s graceful Ctrl-Alt-Del). Post-kill state must be 'stopped' (not 'error') and the dm-snapshot must be cleaned up. - vm prune -f: batch delete of non-running VMs while preserving any that are still running. Regression guard for the case where prune could wipe a live session. - workspace prepare --readonly: mode bits on /root/repo/<file> must drop all write bits. Enforcement is advisory against a root guest, so the test asserts the bits, not EACCES. - workspace prepare --mode full_copy: alternate transfer path (tarred into rootfs, no overlay) still lands the repo contents at /root/repo. - workspace export --base-commit: guest-side commits captured in the patch when the pre-commit SHA is pinned. The feature's whole reason for existing; it had zero coverage. Includes a control assertion that the plain (no --base-commit) export does NOT see the committed file. 
- ssh-config --install / --uninstall: HOME-isolated to a smoke tempdir so we don't touch the invoking user's ~/.ssh/config. Seeds a pre-existing config to catch any regression where install clobbers instead of appending. Asserts idempotency (second install doesn't duplicate the Include line) and clean round-trip (uninstall leaves the user's own content intact). Coverage deltas from smoke (vs the last run): internal/hostnat 14.1% → 64.1% (+50pp — NAT rule dance) internal/daemon/opstate 56.2% → 87.5% (+31pp) internal/daemon 43.4% → 49.4% (+6pp) internal/cli 36.1% → 40.4% (+4pp) internal/daemon/workspace 64.1% → 67.5% (+3pp) Scenario count: 12 → 21. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 12:50:39 -03:00
Thales Maciel	b4afe13b2a	daemon: fix vm start (on a stopped VM) + regression coverage Two defects compounded to make `vm create X` → `vm stop X` → `vm start X` → `vm ssh X` fail with `not_running: vm X is not running` even though `vm show` reports `state=running`. 1. firecracker-go-sdk's startVMM spawns a goroutine that SIGTERMs firecracker when the ctx passed to Machine.Start cancels — and retains that ctx for the lifetime of the VMM, not just the boot phase. Our Machine.Start wrapper was plumbing the caller's ctx through, which on `vm.start` is the RPC request ctx. daemon.go's handleConn cancels reqCtx via `defer cancel()` right after writing the response. Net effect: firecracker is killed ~150ms after the `vm start` RPC "completes", invisibly, and the next `vm ssh` sees a dead PID. `vm.create` side-stepped the bug because BeginVMCreate detaches to context.Background() before calling startVMLocked; `vm.start` used the RPC ctx directly. Fix: Machine.Start now passes context.Background() to the SDK. We own firecracker lifecycle explicitly (StopVM / KillVM / cleanupRuntime), so ctx-driven cancellation here was never actually wired into anything useful. 2. With (1) fixed, the same scenario exposed a second defect: patchRootOverlay's e2cp/e2rm refuses to touch the dm-snapshot with "Inode bitmap checksum does not match bitmap" on a restart, because the COW holds stale free-block/free-inode counters from the previous guest boot. Kernel ext4 is fine with this; e2fsprogs is not. Fix: run `e2fsck -fy` on the snapshot between the dm_snapshot and patch_root_overlay stages. Idempotent on a fresh snapshot, reconciles the bitmaps on a reused COW. Regression coverage: - scripts/repro-restart-bug.sh — minimal create→stop→start→ssh reproducer with rich on-failure diagnostics (daemon log trace, firecracker.log tail, handles.json, pgrep-by-apiSock, apiSock stat). Exits non-zero if the bug returns. 
- scripts/smoke.sh — lifecycle scenario (create/ssh/stop/start/ssh/delete) and vm-set scenario (--vcpu 2 → stop → set --vcpu 4 → start → assert nproc=4). Both were pulled when the bug was first found; now restored. Supporting: - internal/system/system.ExitCode — extracts exec.ExitError's code without forcing callers to import os/exec. Needed by the e2fsck caller (policy test pins os/exec to the shell-out packages). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 12:01:46 -03:00
Thales Maciel	e94e7c4dcc	smoke: workspace export scenario + smoke-fresh target + fix the export bug it caught The export round-trip (`vm create` → `workspace prepare` → guest edit → `workspace export`) exposed a reproducible failure on Debian bookworm guests: `git read-tree HEAD --index-output=/tmp/...` returns exit 128 "unable to write new index file" when the target lives on tmpfs while `.git` is on the workspace overlay. Move the temp index into `$(git rev-parse --git-dir)` so it shares a filesystem with `.git/index` and the lockfile + rename + hardlink dance git does internally works. Alongside: - new workspace-export smoke scenario that would have caught this at the boundary between daemon and guest git - `make smoke-fresh` = `smoke-clean && smoke` for release-time runs that want first-install paths (migrations, image pull) stamped into the coverage report Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 11:34:55 -03:00
Thales Maciel	672d7151e9	smoke: five more scenarios + fix exit-code propagation bug the new ones caught Five new smoke scenarios layered on top of the existing bare + workspace vm-run tests: - exit-code propagation: `sh -c 'exit 42'` must rc=42 - workspace dry-run: --dry-run lists tracked files without a VM - workspace --include-untracked: opt-in ships files outside the git index (regression guard on the security-default flip from review 4) - concurrent vm runs: two --rm invocations in parallel both succeed (stresses per-VM locks, createVMMu reservation window, tap pool) - invalid spec rejection: --vcpu 0 must fail with no VM row left behind (the "cleanup on partial failure" path the review flagged) The exit-code scenario caught a real bug on first run: `banger vm run --rm -- sh -c 'exit 42'` returned rc=0, not 42. Root cause in internal/cli/ssh.go's sshCommandArgs: extra args were appended to the ssh argv verbatim, relying on ssh(1)'s implicit space-join to deliver the remote command. That works for single tokens (echo hello) but re-tokenises multi-word commands on the remote side: `ssh host sh -c 'exit 42'` becomes remote `sh -c exit 42`, where `42` is $0 for the already-completed `exit`, and the exit code the user asked for is lost. Fix: shell-quote every element of extra (`'sh'` `'-c'` `'exit 42'`) and join them into a single trailing argv entry. ssh's space-join then produces exactly the command the user typed on the remote shell. TestSSHCommandArgs was updated to pin the quoting; the existing TestRunVMRunCommandModePropagatesExitCode test needed a one-word quote tweak (`false` → `'false'`). Smoke run after fix passes all seven scenarios in ~2 min on warm state. cmd/banger coverage jumped to 100% (the invalid-spec scenario hits the error-reporting path that wasn't covered before). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 19:37:07 -03:00
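The quoting fix, sketched in Go. This assumes POSIX single-quote escaping is what the commit means by shell-quoting. `shellQuote` and `remoteCommand` are illustrative names, not the actual `sshCommandArgs` code. Each word is wrapped in single quotes, with an embedded `'` written as `'\''`. The quoted words are then joined into a single argv entry, so the remote shell re-tokenises them into exactly the original words.

```go
package main

import (
	"fmt"
	"strings"
)

// shellQuote wraps s in single quotes for a POSIX shell; an embedded single
// quote is written as '\'' (close quote, escaped quote, reopen quote).
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// remoteCommand quotes every element of extra and joins them with spaces,
// producing one string meant to be the last ssh argv entry.
func remoteCommand(extra []string) string {
	quoted := make([]string, len(extra))
	for i, arg := range extra {
		quoted[i] = shellQuote(arg)
	}
	return strings.Join(quoted, " ")
}

func main() {
	argv := append([]string{"ssh", "host"}, remoteCommand([]string{"sh", "-c", "exit 42"}))
	fmt.Println(argv[2])
}
```

With this shape the remote shell sees `'sh' '-c' 'exit 42'`, so `exit 42` stays a single argument to `-c`.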
Thales Maciel	5f81332b0a	make smoke: end-to-end boot suite with coverage from real VM runs The unit + integration tests can't cross machine.Start — the SDK boundary would need a fake firecracker that reimplements the control-plane HTTP API, and the ongoing maintenance cost of keeping that fake honest with upstream kills the value. Instead, add a pre-release smoke target that drives REAL Firecracker + real KVM, captures coverage from the -cover-instrumented binaries, and surfaces per-package deltas so regressions in the boot path don't ship silently. scripts/smoke.sh: - Isolated XDG_{CONFIG,STATE,CACHE,RUNTIME} so the smoke run can't touch real user state (state/cache persist under build/smoke/xdg for fast reruns; runtime is mktemp'd fresh per-run because sockets can't be reused) - Preflight: `banger doctor` must pass; UDP :42069 must be free (otherwise the user's real daemon is up and the smoke daemon can't bind its DNS listener — fail with an actionable message) - Scenario 1 — bare: `banger vm run --rm -- echo smoke-bare-ok` exercises create → start → socket ownership chown → machine.Start → SDK waitForSocket race → vsock agent readiness → guest SSH wait → exec → cleanup → delete - Scenario 2 — workspace: creates a throwaway git repo, runs `banger vm run --rm <repo> -- cat /root/repo/smoke-file.txt`, verifies the tracked file reached the guest (exercises workDisk capability PrepareHost + workspace.prepare) - `banger daemon stop` at the end so instrumented binaries flush GOCOVERDIR pods before the script exits Makefile additions: - smoke-build: builds banger/bangerd under build/smoke/bin/ with `go build -cover` - smoke: runs the script with GOCOVERDIR set, reports per-package coverage via `go tool covdata percent` - smoke-coverage-html: textfmt + go tool cover for a browsable report - smoke-clean: nukes build/smoke/ including the persisted XDG state Bonus fix uncovered during the first smoke run: doctor treated a missing state.db as a FAIL ("out of memory" from 
SQLite SQLITE_CANTOPEN), which red-flagged every fresh install. Split the store check: DB file absent → PASS with "will be created on first daemon start" detail; DB present but unreadable → FAIL as before. New TestDoctorReport_StoreMissingSurfacesAsPassForFreshInstall pins the behaviour. Concrete coverage delta from the first successful smoke run (compared to `make coverage-total`'s unit-test-only 37.8%): internal/firecracker 43.6% → 75.0% internal/daemon/workspace 33.8% → 60.8% internal/store 40.1% → 56.3% internal/guest 63.7% → 57.4% (different mix: smoke exercises real SSH; unit tests cover more error branches) The packages the review flagged are the ones that moved most — which is the point. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 18:59:57 -03:00
Thales Maciel	25a1466947	supply chain: verify signatures and pins across image + kernel builds Three independent hardenings, addressing a review finding that the kernel and image build pipelines were relying on HTTPS alone for artifact integrity. scripts/make-generic-kernel.sh - Fetch the detached PGP signature (linux-<ver>.tar.sign) alongside the tarball and verify it with gpg before extraction. An isolated $GNUPGHOME under the tempdir keeps the kernel signers out of the invoking user's keyring. - Import the three kernel.org release signing keys (Greg KH / Linus / Sasha Levin) from keyserver.ubuntu.com, falling back to keys.openpgp.org. Ubuntu comes first because keys.openpgp.org strips unverified UIDs on upload, leaving gpg with UID-less keys it refuses to trust. - Require VALIDSIG (cryptographic proof) rather than GOODSIG (printed even for expired keys) before proceeding. Verified end-to-end against a clean tarball (accepts) and a byte-flipped tampered copy (rejects with BADSIG). - gpg + gpgv + xz added to the required-tools check. images/golden/Dockerfile - Pin Docker's apt signing key by fingerprint. After downloading /etc/apt/keyrings/docker.asc we gpg --show-keys --with-colons it, extract the fpr, and compare against the expected 9DC858229FC7DD38854AE2D88D81803C0EBFCD88. A tampered or swapped key aborts the build before any apt repo metadata is fetched. - Replace `curl https://mise.run | sh` with a pinned GitHub release binary (mise v2026.4.18, linux-x64) verified against its published sha256. Refuses to build on unknown architectures rather than silently installing a binary we have no hash for. - Add gnupg to the ESSENTIAL apt-get install so the fingerprint check has gpg available. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 19:38:13 -03:00
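A sketch of the two gpg checks in Go. It assumes gpg's machine-readable output formats, namely `--status-fd` lines and `--with-colons` records. The function names are hypothetical, and the scripts themselves do this in shell. A `VALIDSIG` status line carries the signing key's fingerprint as its first argument. In a colon listing, each `fpr` record holds the fingerprint in field 10, and the first `fpr` belongs to the primary key.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// dockerKeyFpr is the pinned fingerprint quoted in the commit message.
const dockerKeyFpr = "9DC858229FC7DD38854AE2D88D81803C0EBFCD88"

// validSigFingerprint scans gpg --status-fd output and returns the key
// fingerprint from a VALIDSIG line; GOODSIG alone is not accepted.
func validSigFingerprint(status string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 3 && fields[0] == "[GNUPG:]" && fields[1] == "VALIDSIG" {
			return fields[2], true
		}
	}
	return "", false
}

// colonFingerprints extracts fingerprints from `gpg --show-keys --with-colons`
// output: "fpr" records carry the fingerprint in field 10 (index 9).
func colonFingerprints(out string) []string {
	var fprs []string
	for _, line := range strings.Split(out, "\n") {
		f := strings.Split(line, ":")
		if len(f) > 9 && f[0] == "fpr" {
			fprs = append(fprs, f[9])
		}
	}
	return fprs
}

func main() {
	// Placeholder status output; real lines come from gpg --status-fd.
	status := "[GNUPG:] GOODSIG 0123456789ABCDEF Example Signer\n" +
		"[GNUPG:] VALIDSIG 0123456789ABCDEF0123456789ABCDEF01234567 2024-01-01\n"
	fpr, ok := validSigFingerprint(status)
	fmt.Println("valid signature:", ok, fpr)

	colons := "pub:-:4096:1:8D81803C0EBFCD88:::::::\nfpr:::::::::" + dockerKeyFpr + ":\n"
	fprs := colonFingerprints(colons)
	fmt.Println("docker key pinned:", len(fprs) > 0 && fprs[0] == dockerKeyFpr)
}
```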
Thales Maciel	afe91e805a	drop unused bench-create script + Makefile target The script carried a python3 dep for one json.dumps on a VM name that's always alphanumeric-plus-dashes anyway, it was never wired into CI or docs, and `time banger vm create` covers the same need ad hoc when anyone wants to measure create latency. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 13:33:09 -03:00
Thales Maciel	6083e2dde5	Prune legacy void/alpine + customize.sh flows The golden-image Dockerfile + catalog pipeline replaces the entire manual rootfs-build stack. With that shipped, the per-distro shell flows are dead code. Removed: - scripts/customize.sh, scripts/interactive.sh, scripts/verify.sh - scripts/make-rootfs{,-void,-alpine}.sh - scripts/register-{void,alpine}-image.sh - scripts/make-{void,alpine}-kernel.sh - internal/imagepreset/ (only consumer was `banger internal packages`, which fed customize.sh) - examples/{void,alpine}.config.toml - Makefile targets: rootfs, rootfs-void, rootfs-alpine, void-kernel, alpine-kernel, void-register, alpine-register, void-vm, alpine-vm, verify-void, verify-alpine, plus the ALPINE_RELEASE / _IMAGE_NAME / _VM_NAME variables The void-6.12 kernel catalog entry is also gone — golden image pairs with generic-6.12 and nothing else in the catalog depended on it. Consolidated: imagemgr now holds the small DebianBasePackages list + package-hash helper inline, so the `image build --from-image` flow (still supported) no longer pulls from a separate imagepreset package. Net: 3,815 lines deleted, 59 added. No runtime functionality removed beyond the `banger internal packages` subcommand (hidden, used only by the deleted customize.sh). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 15:39:53 -03:00
Thales Maciel	75baf2e415	publish-golden-image: content-addressed tarball names Embed the sha256 prefix in the uploaded filename so every rebuild lives at a unique URL. Cloudflare's edge cache (and any similar CDN in front of R2) can never serve stale bytes for the URL the catalog points at. The R2 console offers no per-URL purge for this bucket layout, so making the URL itself content-addressed is the only durable fix. Also republishes the debian-bookworm catalog entry with the new filename. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 15:26:57 -03:00
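A sketch of content addressing in a few lines. It assumes a hypothetical 12-hex-character prefix and a `<base>-<prefix>.tar.zst` layout; the commit does not say how long the embedded prefix is. The same bytes always map to the same name, and any change in content yields a new URL.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentAddressedName embeds a sha256 prefix of data in the file name.
// The 12-character prefix length and the layout are hypothetical choices.
func contentAddressedName(base string, data []byte) string {
	sum := sha256.Sum256(data)
	return fmt.Sprintf("%s-%s.tar.zst", base, hex.EncodeToString(sum[:])[:12])
}

func main() {
	fmt.Println(contentAddressedName("debian-bookworm-x86_64", []byte("rebuild 1")))
	fmt.Println(contentAddressedName("debian-bookworm-x86_64", []byte("rebuild 2")))
}
```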
Thales Maciel	ab5627aec2	imagecat: publish debian-bookworm golden image First entry in the image catalog. Verified end-to-end: - https://images.thaloco.com/debian-bookworm-x86_64.tar.zst reachable - sha256 071495e6... matches - bundle unpacks to rootfs.ext4 (4 GiB) + manifest.json with the expected name/distro/arch/kernel_ref. publish-golden-image.sh tweaks: - default RCLONE_REMOTE from 'r2' to 'banger-images' (matches the rclone config actually in use here). - rclone copyto now passes --s3-no-check-bucket and --no-check-dest so scoped R2 tokens without HeadBucket/HeadObject permission still upload cleanly. To use: restart bangerd so it picks up the new embedded catalog, then `banger image pull debian-bookworm`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 13:25:42 -03:00
Thales Maciel	d22d05555c	scripts: bundle-based golden image pipeline Replaces the OCI-push flow with a bundle-based one that mirrors the kernel catalog (publish-kernel.sh / kernelcat). - scripts/make-golden-bundle.sh: docker build → docker create → docker export | banger internal make-bundle → .tar.zst. Defaults target debian-bookworm / generic-6.12 / x86_64; pinned --size 4G to leave headroom for first-boot installs and in-VM apt use. - scripts/publish-golden-image.sh: rewritten to call make-golden-bundle, rclone upload to R2 (banger-images bucket, images.thaloco.com), and jq-patch internal/imagecat/catalog.json with URL / sha256 / size. --skip-upload stops after bundle build and copies to dist/. make-bundle default ext4 sizing also bumped from +25% to +50% headroom (mkfs.ext4 needs room for inode tables, block-group metadata, journal, and the default 5% reserved-blocks margin). The old 25% was too tight for the ~950 MB golden rootfs and aborted with "Could not allocate block". End-to-end smoke (local): golden Dockerfile → 286 MB tar.zst bundle with correct manifest, valid ext4, and all banger units + vsock agent present. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 15:38:04 -03:00
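The +50% headroom rule as arithmetic, sketched in Go. Rounding up to a whole MiB is an assumption of this sketch, not something the commit states, and `ext4SizeBytes` is a hypothetical name.

```go
package main

import "fmt"

const mib = 1 << 20

// ext4SizeBytes returns contentBytes plus 50% headroom (room for inode
// tables, block-group metadata, the journal and the 5% reserved blocks),
// rounded up to a whole MiB. The MiB rounding is this sketch's assumption.
func ext4SizeBytes(contentBytes int64) int64 {
	size := contentBytes + contentBytes/2
	return (size + mib - 1) / mib * mib
}

func main() {
	rootfs := int64(950_000_000) // roughly the golden rootfs size from the commit
	old := (rootfs + rootfs/4 + mib - 1) / mib
	fmt.Printf("+25%%: %d MiB, +50%%: %d MiB\n", old, ext4SizeBytes(rootfs)/mib)
}
```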
Thales Maciel	da471b0640	Golden image Dockerfile + local build script Debian bookworm with two clearly-labeled sections: - ESSENTIAL: systemd, openssh-server, ca-certificates, curl, iproute2. - OPINION: git, jq, ripgrep, fd, build-essential, shellcheck, mise, Docker CE (+ Compose v2 + buildx), tmux, htop, and friends. Per-VM identity stripped at build time: /etc/machine-id cleared, SSH host keys removed with a ssh.service drop-in that runs `ssh-keygen -A` on first start so each VM gets a unique set. The script is a parameterized wrapper around `docker build`; it also supports `--push` to an OCI registry, which will be removed once the bundle pipeline is in place. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 15:11:40 -03:00
Thales Maciel	8f4be112c2	Generic kernel + init= boot path for OCI-pulled images Closes the full arc: banger kernel pull + image pull + vm create + vm ssh now works end-to-end against docker.io/library/debian:bookworm with zero manual image building. Generic kernel: - New scripts/make-generic-kernel.sh builds vmlinux from upstream kernel.org sources using Firecracker's official minimal config (configs/firecracker-x86_64-6.1.config). All critical drivers (virtio_blk, virtio_net, ext4, vsock) compiled in — no modules, no initramfs needed. - Published as generic-6.12 in the catalog (kernels.thaloco.com). - catalog.json updated with the new entry. Direct-boot init= override (vm_lifecycle.go): - For images without an initrd (direct-boot / OCI-pulled), banger now passes init=/usr/local/libexec/banger-first-boot on the kernel cmdline. The script runs as PID 1, mounts /proc /sys /dev /run, checks for systemd — if present execs it immediately; if not (container images), installs systemd-sysv + openssh-server via the guest's package manager, then execs systemd. - Also passes kernel-level ip= parameter via BuildBootArgsWithKernelIP so the kernel configures the network interface before init runs (container images don't ship iproute2, so the userspace bootstrap script can't call ip(8)). - Masks dev-ttyS0.device and dev-vdb.device systemd units that otherwise wait 90s for udev events that never fire in Firecracker guests started from container rootfses. first-boot.sh rewritten as universal init wrapper: - Works as PID 1 (mounts essential filesystems) OR as a systemd oneshot (existing behavior). - Installs both systemd-sysv AND openssh-server (container images have neither). - Dispatch updated: debian, alpine, fedora, arch, opensuse families + ID_LIKE fallback. All tests updated. 
Opencode capability skip for direct-boot images: - The opencode readiness check (WaitReady on vsock port 4096) now returns nil for images without an initrd, since pulled container images don't ship the opencode service. Without this, the VM would be marked as error for lacking an opinionated add-on. Docs: README and kernel-catalog.md updated to recommend generic-6.12 as the default kernel for OCI-pulled images. AGENTS.md notes the new build script. Verified live: - banger kernel pull generic-6.12 - banger image pull docker.io/library/debian:bookworm --kernel-ref generic-6.12 - banger vm create --image debian-bookworm --name testbox --nat - banger vm ssh testbox -- "id; uname -r; systemctl is-active banger-vsock-agent" → uid=0(root), kernel 6.12.8, Debian bookworm, vsock-agent active, sshd running, SSH working. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 20:12:56 -03:00
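A sketch of how the direct-boot cmdline could be assembled. It assumes a hypothetical `bootConfig` type and `eth0` as the guest interface. It also assumes that both `init=` and `ip=` are added only for images without an initrd. It is not the real `BuildBootArgsWithKernelIP`. The `ip=` value follows the kernel's `client:server:gateway:netmask:hostname:device:autoconf` format, with the server field left empty.

```go
package main

import (
	"fmt"
	"strings"
)

// firstBootInit is the path named in the commit message.
const firstBootInit = "/usr/local/libexec/banger-first-boot"

// bootConfig is a hypothetical subset of what the daemon knows about a VM.
type bootConfig struct {
	HasInitrd                             bool
	GuestIP, GatewayIP, Netmask, Hostname string
}

// bootArgs appends init= and a static kernel ip= for direct-boot images.
// "eth0" and gating both on HasInitrd are assumptions of this sketch.
func bootArgs(base string, c bootConfig) string {
	args := []string{base}
	if !c.HasInitrd {
		args = append(args,
			"init="+firstBootInit,
			fmt.Sprintf("ip=%s::%s:%s:%s:eth0:off", c.GuestIP, c.GatewayIP, c.Netmask, c.Hostname),
		)
	}
	return strings.Join(args, " ")
}

func main() {
	// The base args are placeholders, not banger's actual defaults.
	fmt.Println(bootArgs("console=ttyS0 reboot=k panic=1", bootConfig{
		GuestIP: "172.16.0.2", GatewayIP: "172.16.0.1", Netmask: "255.255.255.0", Hostname: "testbox",
	}))
}
```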
Thales Maciel	fa95849f5a	Phase 5: kernel catalog publish flow + docs Manual publish flow for the kernel catalog, designed for the current no-CI, private-repo state of banger. scripts/publish-kernel.sh <name>: - Reads $BANGER_KERNELS_DIR/<name>/ (the canonical layout produced by `banger kernel import`). - Pulls distro / arch / kernel_version from the local manifest. - Packages vmlinux + optional initrd.img + optional modules/ as <name>-<arch>.tar.zst with zstd -19. - Computes sha256 + size. - rclone copyto -> r2:banger-kernels/<file>. - HEAD-checks https://kernels.thaloco.com/<file> to catch public-access misconfig before declaring success. - jq-patches internal/kernelcat/catalog.json: replaces any prior entry with the same name, then sorts entries by name. - Prints next-step git+make commands; does not commit or rebuild automatically. Environment overrides RCLONE_REMOTE / RCLONE_BUCKET / BASE_URL / BANGER_KERNELS_DIR for non-default setups. docs/kernel-catalog.md covers the architecture (embedded JSON + external tarballs), end-user flow, the add/update/remove playbook, naming and tarball-layout conventions, the trust model (sha256 in embedded catalog catches transport/swap; no signing yet), and where the bucket lives. README.md gains a kernel-catalog example next to the existing image register example. AGENTS.md points at publish-kernel.sh and the docs. .gitignore now excludes .env so accidental drops of R2 credentials don't follow into commits. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 15:56:56 -03:00
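The script does its catalog patch step with jq: replace any entry with the same name, then sort by name. The same logic in Go, assuming hypothetical `catalogEntry` field names and JSON tags:

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// catalogEntry uses hypothetical field names; the real catalog.json schema
// may differ.
type catalogEntry struct {
	Name   string `json:"name"`
	URL    string `json:"url,omitempty"`
	SHA256 string `json:"sha256,omitempty"`
	Size   int64  `json:"size,omitempty"`
}

// upsertEntry drops any entry with the same name, appends e, and sorts the
// result by name.
func upsertEntry(entries []catalogEntry, e catalogEntry) []catalogEntry {
	out := make([]catalogEntry, 0, len(entries)+1)
	for _, old := range entries {
		if old.Name != e.Name {
			out = append(out, old)
		}
	}
	out = append(out, e)
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	entries := []catalogEntry{{Name: "zz-example"}, {Name: "generic-6.12", SHA256: "old"}}
	entries = upsertEntry(entries, catalogEntry{Name: "generic-6.12", SHA256: "new", Size: 1})
	out, err := json.MarshalIndent(entries, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```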
Thales Maciel	7192ba24ae	Phase 3: banger kernel import bridges make--kernel.sh output `banger kernel import <name> --from <dir>` copies a staged kernel bundle into the local catalog. <dir> is the output of `make void-kernel` or `make alpine-kernel` (build/manual/void-kernel/ or build/manual/alpine-kernel/). kernelcat.DiscoverPaths locates artifacts under <dir>: 1. Prefers metadata.json (written by make-void-kernel.sh). 2. Falls back to globbing: boot/vmlinux- or vmlinuz-* (Alpine fallback), boot/initramfs-*, lib/modules/<latest>. The daemon's KernelImport copies kernel + optional initrd via system.CopyFilePreferClone and modules via system.CopyDirContents (no-sudo mode — catalog lives under ~/.local/state), computes SHA256 over the kernel, and writes the manifest via kernelcat.WriteLocal. While wiring this up, fixed a latent bug in system.CopyDirContents: filepath.Join(sourceDir, ".") silently drops the trailing dot, so `cp -a source source/contents target/` was copying the whole source directory (including its basename) instead of just its contents. Replaced the join with a manual "/." suffix. imagemgr.StageBootArtifacts (the only existing caller) silently benefits. scripts/register-void-image.sh and scripts/register-alpine-image.sh are rewritten to use `banger kernel import … && banger image register --kernel-ref …` instead of the find-and-pass-paths dance. Preserves the same user-facing commands and env vars. Tests cover: metadata.json preference, glob fallback, Alpine vmlinuz fallback, kernel-missing error, round-trip copy into the catalog, and the --from required flag. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 14:53:49 -03:00
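The `CopyDirContents` bug is easy to reproduce. `filepath.Join` cleans its result, and cleaning removes a trailing `.` element. The helper names below are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// brokenContentsArg reproduces the bug: filepath.Join cleans its result,
// which drops the trailing ".", so cp copies the directory itself.
func brokenContentsArg(src string) string {
	return filepath.Join(src, ".")
}

// contentsArg appends "/." by hand so `cp -a <src>/. <dst>/` copies only
// what is inside src. Trailing slashes are trimmed first.
func contentsArg(src string) string {
	return strings.TrimRight(src, "/") + "/."
}

func main() {
	fmt.Println("broken:", []string{"cp", "-a", brokenContentsArg("/tmp/src"), "/tmp/dst/"})
	fmt.Println("fixed: ", []string{"cp", "-a", contentsArg("/tmp/src"), "/tmp/dst/"})
}
```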
Thales Maciel	797a9de1ce	Install claude and pi through mise Provisioning was still installing `claude` and `pi` through a separate npm-global prefix even after the guest images had switched to `mise` for Node and opencode. That left two competing install paths and made the runtime layout harder to reason about. Switch the Debian and Void image setup flows to install `claude` and `pi` as `mise` npm tools, assert their shims exist after `mise reshim`, and symlink `node`, `npm`, `opencode`, `claude`, and `pi` directly from the mise shim directory into `/usr/local/bin`. Update the imagebuild test expectations and bump the Void rootfs default size to 4G so the larger default toolset still fits reliably.	2026-04-13 18:29:02 -03:00
Thales Maciel	37c4c091ec	Add guest sessions and agent VM defaults Add daemon-backed workspace and guest-session primitives so host orchestrators can prepare /root/repo, launch long-lived guest commands, and attach to pipe-mode sessions over the local stdio mux bridge. Persist richer session metadata and launch diagnostics, preflight guest cwd/command requirements, make pipe-mode attach rehydratable from guest state after daemon restart, and allow submodules when workspace prepare runs in full_copy mode. At the same time, stop vm run from auto-attaching opencode, make it print next-step commands instead, and make glibc guest images more agent-ready by installing node, opencode, claude, and pi while syncing opencode/claude/pi auth files into work disks on VM start. Validation: - GOCACHE=/tmp/banger-gocache go test ./... - make build - banger vm workspace prepare --help - banger vm session --help - banger vm session start --help - banger vm session attach --help	2026-04-12 23:48:42 -03:00
Thales Maciel	497e6dca3d	Rename experimental Void image to void Replace the old `void-exp` repository defaults with `void` so the Make targets, registration helper, example config, verification messaging, and sample test fixtures all line up with the new managed image name. Keep the scope to repo-facing naming only: config overrides, helper output, and test fixtures now expect `void`, while runtime compatibility for existing local `void-exp` VMs remains an operational concern outside this commit. Validation: go test ./..., make build, and a local `banger vm create --image void` smoke boot with ssh and opencode ports up.	2026-04-01 20:15:28 -03:00
Thales Maciel	70bc6d07d0	Fix void-kernel output directory setup Replace the stale `RUNTIME_DIR` mkdir in the experimental Void kernel helper with creation of the parent directory for `OUT_DIR`, which is the current BANGER_MANUAL_DIR/custom --out-dir flow used by the Make target. This restores `make void-kernel` without requiring an extra environment override. Validation: make void-kernel ARGS='--out-dir /tmp/banger-void-kernel-verify-$$'.	2026-04-01 19:42:30 -03:00
Thales Maciel	092d848620	Wait for real guest vsock health before opencode Make vm create wait for the guest-side vsock /healthz endpoint instead of only waiting for the host socket path, so the wait_vsock_agent stage reflects actual guest readiness. Start banger-vsock-agent earlier in the Alpine OpenRC graph and report later /ports failures as guest-service waits rather than vsock-agent waits, which makes the progress output match what the guest is really doing. Validate with go test ./..., a rebuilt managed alpine image, and a fresh vm create --image alpine --name alp --nat that now progresses through wait_vsock_agent -> wait_guest_ready -> wait_opencode -> ready.	2026-03-21 21:14:22 -03:00
Thales Maciel	a166068fab	Add an experimental Alpine image flow Stage a complete Alpine x86_64 image stack so `… --image alpine` works like the existing manual Void path instead of relying on Debian-oriented image builds. Add make targets plus kernel/rootfs/register helpers that download pinned Alpine artifacts, extract a Firecracker-compatible vmlinux, build a matching mkinitfs initramfs, seed OpenRC services, and register/promote a managed image named alpine. Fold in the bring-up fixes discovered during boot validation: use rootfstype=ext4 in shared boot args, install libgcc/libstdc++ for the opencode binary, and give opencode more time to become ready on cold boots. Validate with go test ./..., the Alpine helper builds, image promotion, and `banger vm create --image alpine --name alp --nat` plus guest service and port checks.	2026-03-21 20:25:55 -03:00
Thales Maciel	572bf32424	Remove runtime-bundle image dependencies Hard-cut banger away from source-checkout runtime bundles as an implicit source of image and host defaults. Managed images now own their full boot set, image build starts from an existing registered image, and daemon startup no longer synthesizes a default image from host paths. Resolve Firecracker from PATH or firecracker_bin, make SSH keys config-owned with an auto-managed XDG default, replace the external name generator and package manifests with Go code, and keep the vsock helper as a companion binary instead of a user-managed runtime asset. Update the manual scripts, web/CLI forms, config surface, and docs around the new build/manual flow and explicit image registration semantics. Validation: GOCACHE=/tmp/banger-gocache go test ./..., bash -n scripts/*.sh, and make build.	2026-03-21 18:34:53 -03:00
Thales Maciel	01c7cb5e65	Reorganize the source checkout layout Separate tracked source from generated artifacts so the repo root stops accumulating helper scripts, manifests, and local runtime outputs. Move manual shell entrypoints under scripts/, manifests under config/, and the Firecracker API reference under docs/reference/. Make build and runtimebundle now target build/bin, build/runtime, and build/dist as the canonical source-checkout paths. Update runtime discovery, helper scripts, tests, and docs to follow the new layout while keeping legacy source-checkout runtime fallbacks for existing local bundles during migration. Validated with bash -n on the moved scripts, make build, and GOCACHE=/tmp/banger-gocache go test ./....	2026-03-21 17:22:57 -03:00
Thales Maciel	c8d9a122f9	Speed up VM create with work seeds Beat VM create wall time without changing VM semantics. Generate a work-seed ext4 sidecar during image builds and rootfs rebuilds, then clone and resize that seed for each new VM instead of rebuilding /root from scratch. Plumb the new seed artifact through config, runtime metadata, store state, runtime-bundle defaults, doctor checks, and default-image reconciliation so older images still fall back cleanly. Add a daemon TAP pool to keep idle bridge-attached devices warm, expose stage timing in lifecycle logs, add a create/SSH benchmark script plus Make target, and teach verify.sh that tap-pool-* devices are reusable capacity rather than cleanup leaks. Validated with go test ./..., make build, ./verify.sh, and make bench-create ARGS="--runs 2".	2026-03-18 21:22:12 -03:00
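A minimal TAP pool sketch. It assumes a hypothetical `create` hook that would actually add the device and attach it to the bridge, and it assumes devices are returned to the pool on release. Only the `tap-pool-*` naming comes from the commit. Idle devices are handed out before any new one is created.

```go
package main

import (
	"fmt"
	"sync"
)

// tapPool hands out TAP device names, preferring warm idle ones. create is
// a hypothetical hook standing in for whatever adds and bridges the device.
type tapPool struct {
	mu     sync.Mutex
	idle   []string
	next   int
	create func(name string) error
}

// newDeviceLocked creates one new device; callers must hold p.mu.
func (p *tapPool) newDeviceLocked() (string, error) {
	name := fmt.Sprintf("tap-pool-%d", p.next)
	p.next++
	if err := p.create(name); err != nil {
		return "", err
	}
	return name, nil
}

// Warm pre-creates devices until at least n are idle.
func (p *tapPool) Warm(n int) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	for len(p.idle) < n {
		name, err := p.newDeviceLocked()
		if err != nil {
			return err
		}
		p.idle = append(p.idle, name)
	}
	return nil
}

// Acquire returns an idle device if there is one, else creates a new one.
func (p *tapPool) Acquire() (string, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if n := len(p.idle); n > 0 {
		name := p.idle[n-1]
		p.idle = p.idle[:n-1]
		return name, nil
	}
	return p.newDeviceLocked()
}

// Release returns a device to the idle list for the next VM.
func (p *tapPool) Release(name string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.idle = append(p.idle, name)
}

func main() {
	created := 0
	p := &tapPool{create: func(string) error { created++; return nil }}
	if err := p.Warm(2); err != nil {
		panic(err)
	}
	tap, _ := p.Acquire()
	fmt.Println("got", tap, "devices created:", created)
}
```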

44 commits