diff --git a/AGENTS.md b/AGENTS.md index 8050f32..5e15ebf 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -25,7 +25,6 @@ Always run `make build` before commit. - `./build/bin/banger image promote ` copies an unmanaged image into daemon-owned managed artifacts. - `scripts/make-generic-kernel.sh` builds a Firecracker-optimized vmlinux from upstream sources. `scripts/publish-kernel.sh ` publishes it to the kernel catalog. - `scripts/publish-golden-image.sh` rebuilds + publishes the golden image bundle and patches the image catalog. -- `scripts/publish-banger-release.sh ` cuts a banger release. Full runbook in `docs/release-process.md`. ## Image Model diff --git a/CHANGELOG.md b/CHANGELOG.md index e706114..e753034 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,193 +10,6 @@ changed between versions. ## [Unreleased] -## [v0.1.10] - 2026-05-03 - -### Added - -- README now includes an animated demo GIF showing the typical - sandbox lifecycle (`vm run`, host-side `ssh demo.vm`, stop/start - with file persistence, `vm exec`, `curl http://demo.vm`). The - recording script lives at `assets/demo.tape` and is rendered with - [VHS](https://github.com/charmbracelet/vhs). - -## [v0.1.9] - 2026-05-01 - -### Fixed - -- `vm exec` no longer falls back to `cd /root/repo` on VMs that have - no recorded workspace. Previously, running `vm exec` against a plain - VM (one that never had `vm workspace prepare` / `vm run ./repo`) - blew up with `cd: /root/repo: No such file or directory` — surfaced - via the login shell's mise activate hook because `bash -lc` sources - profile.d before the explicit cd. Now the auto-cd only fires when - the user passes `--guest-path` or the VM actually has a workspace - recorded; otherwise the command runs from root's home. Mise wrapping - is unchanged — without a `.mise.toml` it's a no-op. - -### Changed - -- `vm exec --guest-path` default in `--help` now reads "from last - workspace prepare; otherwise root's home" (was "or /root/repo"). 
- Anyone who relied on the implicit `/root/repo` default for a VM that - has a repo there but no workspace record must now pass - `--guest-path /root/repo` explicitly. - -### Notes - -- Internal: smoke-test harness ported from `scripts/smoke.sh` to a - Go test suite under `internal/smoketest`. `make smoke` is unchanged - for maintainers; no user-visible effect. - -## [v0.1.8] - 2026-05-01 - -### Fixed - -- `.vm` resolution from the host (NSS path: curl, ssh hostname, - etc.) now works on systemd-resolved hosts. The root helper's - `validateResolverAddr` was rejecting the `host:port` form - (`127.0.0.1:42069`) that banger constructs to point resolved at the - in-process DNS server, so the auto-wire silently failed at every - daemon startup. `dig @127.0.0.1` worked because that bypasses NSS; - any tool going through glibc's resolver chain didn't. -- Validator now accepts both bare IPs and `IP:port` (matching what - `resolvectl dns` itself accepts) with new test coverage for the - port'd form. - -### Notes - -- Existing v0.1.x installs that already booted with the broken - validator have stale per-link resolved state. After updating to - v0.1.8, run `sudo banger system restart` once to re-trigger the - auto-wire, or restart the host. systemd-resolved restarts also - wipe per-link state — banger restores it on its own daemon - startup but won't re-run for an already-running daemon. - -## [v0.1.7] - 2026-05-01 - -### Added - -- `vm run -d` / `--detach` creates the VM, runs workspace prep + tooling - bootstrap, then exits without attaching to ssh. Reconnect later with - `banger vm ssh `. The combos `-d --rm` and `-d -- ` are - rejected before VM creation. -- `vm run --no-bootstrap` skips the mise tooling install entirely; useful - when a workspace has a `.mise.toml` you don't want banger to act on. -- `banger doctor --verbose` / `-v` prints every check with details. - Without it, doctor's default output now collapses (see Changed). 
- -### Changed - -- **`vm run` refuses early when bootstrap can't succeed.** Previously, a - workspace containing `.mise.toml` or `.tool-versions` without `--nat` - set silently failed the bootstrap into a log file and dropped you into - ssh with tools missing. It now refuses before VM creation with - `tooling bootstrap requires --nat (or pass --no-bootstrap to skip)`. - Existing scripts that relied on the silent-failure path will need to - add `--nat` or `--no-bootstrap`. -- **`banger doctor` default output is now compact.** A healthy host - collapses to a single line (`all N checks passed`); failing or warning - checks print only the affected entries plus a summary footer - (`N passed, M warnings, K failures`). Pass `--verbose` for the full - per-check output. Anything parsing the previous always-verbose output - needs to switch to `doctor --verbose`. - -### Fixed - -- The detached bootstrap path runs synchronously (foreground, tee'd to - the existing log file) so the CLI only returns once installs finish. - Interactive mode keeps today's nohup'd background behaviour so the ssh - session starts promptly. - -## [v0.1.6] - 2026-04-29 - -### Fixed - -- v0.1.4's "running VMs survive daemon restart" fix was incomplete: - the binary-level reconcile path was correct, but `/run/banger` (the - daemon's runtime dir) was being wiped on every daemon stop because - systemd defaults to `RuntimeDirectoryPreserve=no`. The api-sock - symlinks the helper had created for live VMs vanished with it, - and `findByJailerPidfile` couldn't resolve them to find the chroot - + pidfile. v0.1.6 sets `RuntimeDirectoryPreserve=yes` on both - unit templates so the symlinks (and helper RPC sock) survive - the restart window. Live-verified: FC PID and guest boot_id both - unchanged across a full helper+daemon restart cycle with a VM - running. 
-- v0.1.4's CHANGELOG correction stands: existing v0.1.x installs - (where x < 6) need a one-time `sudo banger system install` after - updating to v0.1.6 to pick up both the new `KillMode=process` and - the new `RuntimeDirectoryPreserve=yes` directives. `banger update` - swaps binaries, not unit files. - -## [v0.1.5] - 2026-04-29 - -No functional changes. Verification release for v0.1.4: the previous -release shipped the running-VMs-survive-update fix, but updating -*to* v0.1.4 from v0.1.3 used v0.1.3's buggy driver, so the fix -couldn't be verified live in that direction. v0.1.5 exists so a -host on v0.1.4 can update to it and observe a running VM survive -end-to-end with v0.1.4 in the driver seat. - -## [v0.1.4] - 2026-04-29 - -### Fixed - -- Daemon restarts no longer kill running VMs. Two changes together: - - The `bangerd-root.service` and `bangerd.service` unit templates - now set `KillMode=process`. The default (`control-group`) sent - SIGKILL to every process in the unit's cgroup on stop/restart, - including the jailer-spawned firecracker children — fork/exec - doesn't escape a systemd cgroup. With `KillMode=process` only - the unit's main PID is signalled; firecracker children survive. - - `fcproc.FindPID` now also looks up jailer'd firecracker - processes via the pidfile jailer writes at - `/firecracker.pid` (sibling of the api-sock target). - Previously the only lookup path was `pgrep -n -f `, - which can't see jailer'd processes because their cmdline only - carries the chroot-relative `--api-sock /firecracker.socket`. - Reconcile after a daemon restart now correctly re-attaches to - surviving guests instead of mistaking them for stale and tearing - down their dm-snapshot. 
- -### Notes - -- v0.1.0's CHANGELOG line "daemon restarts do not interrupt running - guests" was wrong: it was true at the systemd cgroup layer in - theory but the default `KillMode` defeated it, and even with - `KillMode=process` the daemon's reconcile would mistake - surviving FCs for stale and tear them down. v0.1.4 is the version - where this actually works end-to-end. -- Updating from v0.1.0–v0.1.3 to v0.1.4 still kills running VMs - because the *driver* of the update is the buggy older binary. - Updates from v0.1.4 onward preserve running VMs across the - helper+daemon restart that `banger update` performs. -- Existing v0.1.0–v0.1.3 installs that update to v0.1.4 do NOT - automatically pick up the new unit files — `banger update` swaps - binaries, not systemd units. Run `sudo banger system install` once - on those hosts after updating to refresh the units. New v0.1.4+ - installs get the correct units from the start. - -## [v0.1.3] - 2026-04-29 - -No functional changes. Verification release: v0.1.2 fixed -`banger update`'s install.toml handling, but the fix only takes -effect when v0.1.2 (or later) is the driver of an update. v0.1.3 -exists so a host running v0.1.2 can update to it and confirm the -fix works end-to-end with the new code in the driver seat. - -## [v0.1.2] - 2026-04-29 - -### Fixed - -- `banger update` now writes the freshly-installed binary's commit - and built_at fields to `/etc/banger/install.toml`, not the running - CLI's. Previously install.toml's `version` was correct after an - update but `commit` + `built_at` still pointed at the pre-update - binary's identity, which made `banger doctor` raise a false-positive - "CLI/install drift" warning on every update. Caught by the v0.1.0 - → v0.1.1 live update smoke-test. - ## [v0.1.1] - 2026-04-29 ### Added @@ -312,15 +125,6 @@ root filesystem and network, and exits on demand. the swap rather than starting up against an incompatible store. - Linux only. amd64 only. KVM required. 
-[Unreleased]: https://git.thaloco.com/thaloco/banger/compare/v0.1.10...HEAD -[v0.1.10]: https://git.thaloco.com/thaloco/banger/releases/tag/v0.1.10 -[v0.1.9]: https://git.thaloco.com/thaloco/banger/releases/tag/v0.1.9 -[v0.1.8]: https://git.thaloco.com/thaloco/banger/releases/tag/v0.1.8 -[v0.1.7]: https://git.thaloco.com/thaloco/banger/releases/tag/v0.1.7 -[v0.1.6]: https://git.thaloco.com/thaloco/banger/releases/tag/v0.1.6 -[v0.1.5]: https://git.thaloco.com/thaloco/banger/releases/tag/v0.1.5 -[v0.1.4]: https://git.thaloco.com/thaloco/banger/releases/tag/v0.1.4 -[v0.1.3]: https://git.thaloco.com/thaloco/banger/releases/tag/v0.1.3 -[v0.1.2]: https://git.thaloco.com/thaloco/banger/releases/tag/v0.1.2 +[Unreleased]: https://git.thaloco.com/thaloco/banger/compare/v0.1.1...HEAD [v0.1.1]: https://git.thaloco.com/thaloco/banger/releases/tag/v0.1.1 [v0.1.0]: https://git.thaloco.com/thaloco/banger/releases/tag/v0.1.0 diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md deleted file mode 100644 index ec83255..0000000 --- a/CONTRIBUTING.md +++ /dev/null @@ -1,62 +0,0 @@ -# Contributing - -## Build from source - -```bash -make build -sudo ./build/bin/banger system install --owner "$USER" -``` - -`make build` produces three binaries under `./build/bin/`: - -- `banger` — the user-facing CLI -- `bangerd` — the owner-user daemon (exposes `/run/banger/bangerd.sock`) -- `banger-vsock-agent` — the in-guest companion - -`system install` copies them into `/usr/local`, writes install -metadata under `/etc/banger`, lays down `bangerd.service` and -`bangerd-root.service`, and starts both. After that, daily commands -like `banger vm run` are unprivileged. - -To inspect or refresh the services: - -```bash -banger system status -sudo banger system restart -``` - -The two-service split (owner daemon + privileged root helper) is -explained in [`docs/privileges.md`](docs/privileges.md), including -the exact capability set the root helper holds. - -## Tests - -```bash -make test # go test ./... 
-make coverage # per-package + total statement coverage -make lint # gofmt + go vet + shellcheck -``` - -The smoke suite (`make smoke`) builds coverage-instrumented binaries, -installs them as a temporary systemd service, and runs end-to-end -scenarios against real Firecracker. Requires a KVM-capable host and -`sudo`. The suite lives under `internal/smoketest/` (build-tagged -`smoke`); `make smoke-list` prints scenario names; `make smoke-one -SCENARIO=` runs just one (comma-separated for several). See -the smoke comments in the `Makefile` for details. - -## Pre-commit hook - -```bash -make install-hooks -``` - -Points `core.hooksPath` at `.githooks/`, which runs lint + test + -build on every commit. Bypass with `git commit --no-verify`; revert -with `git config --unset core.hooksPath`. - -## Internals - -- [`docs/privileges.md`](docs/privileges.md) — daemon split, capability set, trust model. -- [`docs/release-process.md`](docs/release-process.md) — cutting and signing a release. -- [`AGENTS.md`](AGENTS.md) — repo-wide notes for code agents. diff --git a/Makefile b/Makefile index 640f615..780f87b 100644 --- a/Makefile +++ b/Makefile @@ -25,6 +25,7 @@ SMOKE_DIR := $(BUILD_DIR)/smoke SMOKE_BIN_DIR := $(SMOKE_DIR)/bin SMOKE_COVER_DIR := $(SMOKE_DIR)/covdata SMOKE_XDG_DIR := $(SMOKE_DIR)/xdg +SMOKE_SCRIPT := scripts/smoke.sh VERSION ?= $(shell git describe --tags --exact-match 2>/dev/null || echo dev) COMMIT ?= $(shell git rev-parse --verify HEAD 2>/dev/null || echo unknown) BUILT_AT ?= $(shell date -u +%Y-%m-%dT%H:%M:%SZ) @@ -60,9 +61,9 @@ help: ' make tidy Run go mod tidy' \ ' make clean Remove built Go binaries and coverage artefacts' \ ' make smoke Build instrumented binaries, run the supported systemd smoke suite, report coverage (needs KVM + sudo)' \ - ' make smoke JOBS=N Override parallelism (default: nproc, capped at 8). JOBS=1 forces serial.' 
\ - ' make smoke-list Print the list of smoke scenarios (no build, no install)' \ - ' make smoke-one SCENARIO=NAME Run a single smoke scenario (still does the install preamble; comma-separated for several)' \ + ' make smoke JOBS=N Override parallelism (default: nproc, capped at 8 by the script). JOBS=1 forces serial.' \ + ' make smoke-list Print the list of smoke scenarios with descriptions (no build, no install)' \ + ' make smoke-one SCENARIO=NAME Run a single smoke scenario (still does the install preamble)' \ ' make smoke-fresh smoke-clean + smoke — purges stale smoke-owned installs before a clean supported-path run' \ ' make smoke-coverage-html HTML coverage report from the last smoke run' \ ' make smoke-clean Remove the smoke build tree and purge any stale smoke-owned system install' \ @@ -163,17 +164,17 @@ clean: # Smoke test suite. Builds the three banger binaries with -cover # instrumentation under $(SMOKE_BIN_DIR), installs them as temporary -# bangerd.service + bangerd-root.service, runs the Go scenarios under -# internal/smoketest (built with -tags=smoke), copies service covdata -# out of /var/lib/banger, then purges the smoke-owned install on exit. +# bangerd.service + bangerd-root.service, runs scripts/smoke.sh, copies +# service covdata out of /var/lib/banger, then purges the smoke-owned +# install on exit. # -# This touches global systemd state. The harness refuses to overwrite a -# pre-existing non-smoke install and drops a marker file under -# /etc/banger so `make smoke-clean` can recover a stale smoke-owned -# install after an interrupted run. +# Unlike the old per-user daemon path, this touches global systemd +# state. The smoke script refuses to overwrite a pre-existing non-smoke +# install and uses a marker file so `make smoke-clean` can recover a +# stale smoke-owned install after an interrupted run. # # Requires a KVM-capable Linux host with sudo. This is a pre-release -# gate, not CI — the Go unit suite (`make test`) is what runs everywhere. 
+# gate, not CI — the Go test suite is what runs everywhere. smoke-build: $(SMOKE_BIN_DIR)/.built $(SMOKE_BIN_DIR)/.built: $(BUILD_INPUTS) go.mod go.sum @@ -183,11 +184,10 @@ $(SMOKE_BIN_DIR)/.built: $(BUILD_INPUTS) go.mod go.sum CGO_ENABLED=0 GOOS=linux GOARCH=amd64 $(GO) build -ldflags '$(GO_LDFLAGS)' -o "$(SMOKE_BIN_DIR)/banger-vsock-agent" ./cmd/banger-vsock-agent touch "$@" -# JOBS defaults to nproc; SMOKE_JOBS clamps it at 8. Each parallel slot -# runs a smoke-tuned VM, and over-subscribing the host pushes -# waitForSSH past its 60s deadline. Floored at 1 so JOBS=1 still works. +# JOBS defaults to nproc (the script caps at 8). Override with +# `make smoke JOBS=1` for a fully serial run, or any specific N for +# tighter parallelism. JOBS ?= $(shell nproc 2>/dev/null || echo 1) -SMOKE_JOBS := $(shell n=$(JOBS); [ $$n -lt 1 ] && n=1; [ $$n -gt 8 ] && n=8; echo $$n) smoke: smoke-build rm -rf "$(SMOKE_COVER_DIR)" @@ -195,31 +195,27 @@ smoke: smoke-build BANGER_SMOKE_BIN_DIR="$(abspath $(SMOKE_BIN_DIR))" \ BANGER_SMOKE_COVER_DIR="$(abspath $(SMOKE_COVER_DIR))" \ BANGER_SMOKE_XDG_DIR="$(abspath $(SMOKE_XDG_DIR))" \ - $(GO) test -tags=smoke -count=1 -v -parallel $(SMOKE_JOBS) -timeout 30m ./internal/smoketest + bash "$(SMOKE_SCRIPT)" --jobs $(JOBS) @echo '' @echo 'Smoke coverage:' @$(GO) tool covdata percent -i="$(SMOKE_COVER_DIR)" -# smoke-list parses the test scaffold for scenario names. Cheap: no -# smoke-build dep, no env vars, no test binary spawned. +# smoke-list is intentionally cheap: no smoke-build dep, no env vars. +# The script's --list path short-circuits before any side-effect or +# env validation, so this works on a fresh checkout. smoke-list: - @grep -oE 't\.Run\("[a-z_]+", *test[A-Za-z]+\)' internal/smoketest/smoke_test.go \ - | sed -E 's/t\.Run\("([a-z_]+)".*/ \1/' - -# smoke-one runs one scenario (or a comma-separated list) with the -# install preamble. 
Comma list becomes a regex alternation so multiple -# scenarios can be selected without invoking go test by hand. -SCENARIO_PATTERN := $(shell echo '$(SCENARIO)' | tr ',' '|') + @bash "$(SMOKE_SCRIPT)" --list +# smoke-one runs one scenario (or a comma-separated list) with the same +# install preamble as the full suite. Useful when iterating on a specific +# scenario — see `make smoke-list` for names. smoke-one: smoke-build rm -rf "$(SMOKE_COVER_DIR)" mkdir -p "$(SMOKE_COVER_DIR)" "$(SMOKE_XDG_DIR)" BANGER_SMOKE_BIN_DIR="$(abspath $(SMOKE_BIN_DIR))" \ BANGER_SMOKE_COVER_DIR="$(abspath $(SMOKE_COVER_DIR))" \ BANGER_SMOKE_XDG_DIR="$(abspath $(SMOKE_XDG_DIR))" \ - $(GO) test -tags=smoke -count=1 -v -timeout 30m \ - -run "TestSmoke/.*/($(SCENARIO_PATTERN))$$" \ - ./internal/smoketest + bash "$(SMOKE_SCRIPT)" --scenario "$(SCENARIO)" smoke-coverage-html: smoke $(GO) tool covdata textfmt -i="$(SMOKE_COVER_DIR)" -o="$(SMOKE_DIR)/cover.out" diff --git a/README.md b/README.md index ab2a8e6..57c6eb6 100644 --- a/README.md +++ b/README.md @@ -2,171 +2,347 @@ One-command development sandboxes on Firecracker microVMs. -![banger demo](assets/banger.gif) - -Spin up a clean Linux VM with your repo and tooling preloaded, drop -into ssh, and tear it down — all from one command. banger is built -for the dev loop, not the server use case: guests are short-lived, -single-user, reachable at `.vm` from your host, and disposable. +**Requirements:** Linux + KVM (`/dev/kvm`), `firecracker` on PATH (or `firecracker_bin` in config). banger v0.1.0 is tested against [Firecracker v1.14.1](https://github.com/firecracker-microvm/firecracker/releases/tag/v1.14.1) and supports any Firecracker ≥ v1.5.0. `banger doctor` warns when the installed version sits outside the tested range, and prints a distro-aware install hint when it's missing. 
## Quick start -**Requirements**: -- Linux x86_64 with KVM -- Systemd -- [Firecracker >= v1.5](https://github.com/firecracker-microvm/firecracker) - -Install: - ```bash curl -fsSL https://releases.thaloco.com/banger/install.sh | bash +banger vm run --name sandbox ``` -The installer downloads the signed release, then prompts for sudo for install. -[Read more about how banger uses sudo](#Security) - -Verify host configuration: -```bash -banger doctor -``` - -First VM: ->The first run may take a couple minutes for the bundle download. ->Subsequent `vm run`s are expected to take from 1 to 3 seconds. +The installer runs as you, downloads + verifies the latest signed +release, then prompts before re-execing `sudo` for the system-install +step (writing `/usr/local/bin` + creating systemd units). If you'd +rather audit the script first: ```bash -banger vm run --name my-vm +curl -fsSL https://releases.thaloco.com/banger/install.sh -o install.sh +less install.sh +bash install.sh ``` -This auto-pulls the default image and drops you into an interactive ssh session. -Disconnecting an interactive session leaves the VM running, -`--rm` auto-deletes the VM when the session or command exits. +Or build from source: + +```bash +make build +sudo ./build/bin/banger system install --owner "$USER" +banger vm run --name sandbox +``` + +That's it. `banger vm run` auto-pulls the default golden image (a pre-built +Debian rootfs with sshd, mise, and the usual dev tools: Debian bookworm with +systemd, sshd, Docker CE, git, jq, and mise) and kernel, creates a VM, starts +it, and drops you into an interactive ssh session. First run takes a couple minutes (bundle +download); subsequent `vm run`s are seconds. 
+ +## Supported host path + +banger's supported host/runtime path is: + +- Linux on `x86_64 / amd64` +- `systemd` as the host init/service manager +- `bangerd.service` running as the installed owner user +- `bangerd-root.service` running as the privileged host helper + +Other setups may work with manual adaptation, but they are not the +supported operating model for this repo. + +## Requirements + +- **x86_64 / amd64 Linux** — arm64 is not supported today. The companion + binaries, the published kernel catalog, and the OCI import path all + assume `linux/amd64`. `banger doctor` surfaces this as a failing + check on other architectures. +- **systemd on the host** — this is the supported service-management + path. banger's supported install/run model is the owner-user + `bangerd.service` plus the privileged `bangerd-root.service` + installed by `banger system install`. +- `/dev/kvm` +- `sudo` for the install/admin commands (`system install`, + `system restart`, `system uninstall`) +- Firecracker on `PATH`, or `firecracker_bin` set in config +- host tools checked by `banger doctor` + +## Build + install + +```bash +make build +sudo ./build/bin/banger system install --owner "$USER" +``` + +This installs two systemd units, copies the current `banger`, +`bangerd`, and `banger-vsock-agent` binaries into `/usr/local`, writes +install metadata under `/etc/banger`, and starts both services: + +- `bangerd.service` runs as the configured owner user and exposes the + public CLI socket at `/run/banger/bangerd.sock`. +- `bangerd-root.service` runs as root and handles the narrow set of + privileged host operations over the private helper socket at + `/run/banger-root/bangerd-root.sock`. + +After that, normal daily commands such as `banger vm run` and +`banger image pull` are unprivileged. + +This `systemd` service flow is the supported path. If you're not on a +host that can run both services, you're outside the supported host +model even if some pieces happen to work. 
+ +The split matters: + +- `bangerd.service` runs as the owner user, keeps its writable state in + `/var/lib/banger`, `/var/cache/banger`, and `/run/banger`, and sees + the owner home read-only. +- `bangerd-root.service` is the only process that keeps elevated host + capabilities, and that capability set is limited to the host-kernel + primitives banger actually uses (`CAP_CHOWN`, `CAP_DAC_OVERRIDE`, + `CAP_FOWNER`, `CAP_KILL`, `CAP_MKNOD`, `CAP_NET_ADMIN`, `CAP_NET_RAW`, + `CAP_SETGID`, `CAP_SETUID`, `CAP_SYS_ADMIN`, `CAP_SYS_CHROOT`). + +To inspect or refresh the services: + +```bash +banger system status +sudo banger system restart +``` + +To remove the system services: + +```bash +sudo banger system uninstall +``` + +Add `--purge` if you also want to remove system-owned VM/image/cache +state under `/var/lib/banger`, `/var/cache/banger`, `/run/banger`, and +`/run/banger-root`. User config stays in place under your home +directory: + +- `~/.config/banger/` — config, optional `ssh_config` +- `~/.local/state/banger/ssh/` — user SSH key + known_hosts + +### Shell completion + +`banger` ships completion scripts for bash, zsh, fish, and +powershell. Tab-completion covers subcommands, flags, and live +resource names (VM, image, kernel) looked up from the installed +services. With the services down, resource completion silently +returns nothing — no file-completion fallback. + +```bash +# bash (system-wide) +banger completion bash | sudo tee /etc/bash_completion.d/banger + +# zsh (user-local; ~/.zfunc must be on fpath) +banger completion zsh > ~/.zfunc/_banger + +# fish +banger completion fish > ~/.config/fish/completions/banger.fish +``` + +`banger completion --help` shows the shell-specific loading +recipes. 
## `vm run` +One command, four common shapes: + ```bash -banger vm run ./my-repo # copy /my-repo into /root/repo — drops into ssh +banger vm run # bare sandbox — drops into ssh +banger vm run ./repo # workspace at /root/repo — drops into ssh banger vm run ./repo -- make test # workspace + run command, exits with its status banger vm run --rm -- script.sh # ephemeral: VM is deleted on exit -banger vm run -d ./repo --nat # detached: prep + bootstrap, exit (no ssh attach) ``` -If a repository is passed, banger copies your repo's git-tracked files -into `/root/repo` and runs a `mise` bootstrap from `.mise.toml` / -`.tool-versions` if either is present. The bootstrap reaches the -public internet, so workspaces with mise manifests require `--nat`; -pass `--no-bootstrap` to skip the install entirely. Untracked files -are skipped by default — pass `--include-untracked` to ship them -too, or `--dry-run` to preview the file list. +- **Bare mode** gives you a clean shell. +- **Workspace mode** (path given) copies the repo's git-tracked files + into `/root/repo` and kicks off a best-effort `mise` tooling + bootstrap from the repo's `.mise.toml` / `.tool-versions`. Log: + `/root/.cache/banger/vm-run-tooling-.log`. Untracked files + (including local `.env`, scratch notes, credentials that aren't + gitignored) are skipped by default — pass `--include-untracked` to + also ship them. Pass `--dry-run` to print the exact file list and + exit without creating a VM. +- **Command mode** (`-- `) runs the command in the guest; exit + code propagates through `banger`. -In **command mode** (`-- `), the exit code propagates through -`banger`. In **detached mode** (`-d`), banger creates the VM, runs -workspace prep + bootstrap synchronously, then exits — no ssh -attach. Reconnect later with `banger vm ssh `. +Disconnecting from an interactive session leaves the VM running. Use +`vm stop` / `vm delete` to clean up — or pass `--rm` so the VM +auto-deletes once the session / command exits. 
-### Other VM verbs +`--branch`, `--from`, `--include-untracked`, and `--dry-run` apply +only to workspace mode. `--rm` skips the delete when the initial ssh +wait times out, so a wedged sshd leaves the VM alive for `banger vm +logs` inspection. -The CLI tries to feel familiar — every command and subcommand has -`--help`. Beyond `vm run`: `vm list` shows running VMs (`--all` for -every state), `vm ssh ` reconnects to one, `vm exec -- -` runs a command without a shell, `vm stop` / `vm kill` shut a -VM down (graceful / hard), `vm delete` removes a stopped one, and -`vm prune` sweeps every non-running VM. +## Hostnames: reaching `.vm` -### `--nat`: outbound internet - -By default, a guest can't reach the internet. -Pass `--nat` to enable it (host-side MASQUERADE): - -```bash -banger vm run --nat ./repo -- npm install -``` - -`--nat` works on `vm run` and `vm create`. To toggle on an existing -VM: `banger vm set --nat ` (or `--no-nat` to remove it). - -## Hostnames: `.vm` - -banger's daemon runs a DNS server for the `.vm` zone. With host-side -DNS routing, `curl http://sandbox.vm:3000` works from anywhere on -the host — no IP juggling. On systemd-resolved hosts, banger wires -this up automatically; everywhere else there's a manual recipe in +banger's owner daemon runs a DNS server for the `.vm` zone. With +host-side DNS routing you can `curl http://sandbox.vm:3000` from +anywhere on the host — no copy-pasting guest IPs. On +systemd-resolved hosts the owner daemon asks the root helper to +auto-wire this and that is the supported path. Everywhere else +there's a best-effort manual recipe. See [`docs/dns-routing.md`](docs/dns-routing.md). -For `ssh sandbox.vm` (instead of `banger vm ssh sandbox`): +### Optional: `ssh .vm` shortcut + +`banger vm ssh ` works out of the box. 
If you'd also like plain +`ssh sandbox.vm` from any terminal (using banger's key + known_hosts), +opt in: ```bash -banger ssh-config --install +banger ssh-config --install # adds `Include ~/.config/banger/ssh_config` + # to ~/.ssh/config in a marker-fenced block +banger ssh-config --uninstall # reverse it +banger ssh-config # show the include line to paste manually ``` -That adds a marker-fenced `Include` line to `~/.ssh/config`. -`banger ssh-config --uninstall` reverses it. +banger never touches `~/.ssh/config` on its own — the daemon keeps its +own known_hosts under `/var/lib/banger/ssh/known_hosts`, while +`banger ssh-config` keeps the user-facing file fresh at +`~/.config/banger/ssh_config`; whether and how it's +pulled into your SSH config is up to you. + +## Image catalog + +`banger image pull ` fetches a pre-built bundle from the +embedded catalog. `vm run` calls this for you on demand. + +Today's catalog: + +| Name | What it is | +|------|-----------| +| `debian-bookworm` | Debian 12 slim + sshd + docker + dev tools | + +See [`docs/image-catalog.md`](docs/image-catalog.md) for the bundle +format and how to publish a new entry. ## Config -`~/.config/banger/config.toml`. All keys are optional: +Config lives at `~/.config/banger/config.toml`. All keys optional. + +Most commonly set: + +- `default_image_name` — image used when `--image` is omitted + (default `debian-bookworm`, auto-pulled from the catalog if not + local). +- `ssh_key_path` — host SSH key. If unset, banger creates + `~/.local/state/banger/ssh/id_ed25519`. Accepts absolute paths or + `~/`-anchored paths; `~/foo` expands against `$HOME`. Relative + paths are rejected at config load. +- `firecracker_bin` — override the auto-resolved `PATH` lookup. + +Full key reference: [`docs/config.md`](docs/config.md). + +### `vm_defaults` — sizing for new VMs + +Every `vm run` / `vm create` prints a `spec:` line up front showing +the vCPU, RAM, and disk the VM will get. 
When the flags aren't set, +those values come from: + +1. `[vm_defaults]` in config (if present, wins). +2. Host-derived heuristics (roughly: `cpus/4` capped at 4, `ram/8` + capped at 8 GiB, 8 GiB disk). +3. Built-in constants (floor). + +`banger doctor` prints the effective defaults with provenance. ```toml [vm_defaults] vcpu = 4 memory_mib = 4096 disk_size = "16G" - -[[file_sync]] -host = "~/.config/git/config" -guest = "~/.config/git/config" - -[[file_sync]] -host = "~/.aws" -guest = "~/.aws" ``` -`vm_defaults` overrides banger's host-derived sizing. `file_sync` -copies host files into the VM's work disk at create time — handy -for credentials and dotfiles you want in every sandbox. Full -reference: [`docs/config.md`](docs/config.md). +All keys optional — omit whichever you want banger to decide. + +### `file_sync` — host → guest file copies + +```toml +[[file_sync]] +host = "~/.aws" # whole directory, recursive +guest = "~/.aws" + +[[file_sync]] +host = "~/.config/gh/hosts.yml" +guest = "~/.config/gh/hosts.yml" + +[[file_sync]] +host = "~/bin/my-script" +guest = "~/bin/my-script" +mode = "0755" # optional; default 0600 for files +``` + +Runs at `vm create` time. Each entry copies `host` → `guest` onto +the VM's work disk (mounted at `/root` in the guest). Guest paths +must live under `~/` or `/root/...`. Host paths must live under the +installed owner's home directory; `~/...` is the intended form, and +absolute paths are accepted only when they still point inside that +home. Default is no entries — add the ones you want. A top-level +symlink is followed only when its resolved target stays inside the +owner home. Symlinks encountered while recursing into a synced +directory are skipped with a warning — they'd otherwise leak files +from outside the named tree (e.g. a symlink inside `~/.aws` pointing +to an unrelated credential dir). ## Updating ```bash banger update --check # is a newer release available? 
sudo banger update # download, verify, swap, restart, run doctor +sudo banger update --to v0.1.1 +sudo banger update --dry-run ``` -The release tarball is cosign-verified against a public key embedded -in the running binary. On any post-swap failure, banger auto-restores -the previous install. See [`docs/privileges.md`](docs/privileges.md) -for the trust model. +`banger update` pulls the release manifest from +`https://releases.thaloco.com/banger/manifest.json`, downloads the +release tarball + `SHA256SUMS` + `SHA256SUMS.sig`, verifies the +cosign signature against the public key embedded in the running +binary, hashes the tarball, atomically swaps the three banger +binaries, restarts both systemd services, and runs `banger doctor`. +On any failure post-swap, it auto-restores the previous install +from `.previous` backups before surfacing the original error. -## Uninstalling +Refuses to start while any banger operation is in flight. No +background update checks; updates only happen when you ask. See +[`docs/privileges.md`](docs/privileges.md) for the trust model. -```bash -sudo banger system uninstall # remove services + binaries; keep state -sudo banger system uninstall --purge # also wipe VMs, images, caches under /var/lib/banger -``` +## Advanced -User config (`~/.config/banger/`) and SSH key -(`~/.local/state/banger/ssh/`) stay put either way — delete them by -hand if you want a full clean slate. +The common path is `vm run`. Power-user flows (`vm create`, OCI pull +for arbitrary images, `image register`, manual workspace prepare) are +documented in [`docs/advanced.md`](docs/advanced.md). ## Security -Guest VMs are single-user dev sandboxes, not multi-tenant servers. -sshd accepts only the host SSH key (no passwords, no -kbd-interactive), and guests are reachable only through the host -bridge (`172.16.0.0/24`). Don't expose the bridge or guest IPs to -an untrusted network. +Guest VMs are single-user development sandboxes, not multi-tenant +servers. 
Each guest's sshd is configured with: -The privileged surface lives entirely in `bangerd-root.service` and -is documented in [`docs/privileges.md`](docs/privileges.md). +``` +PermitRootLogin prohibit-password +PubkeyAuthentication yes +PasswordAuthentication no +KbdInteractiveAuthentication no +AuthorizedKeysFile /root/.ssh/authorized_keys +``` + +The host SSH key is the only authentication mechanism. `StrictModes` +is on (sshd's default); banger normalises `/root`, `/root/.ssh`, and +`authorized_keys` perms at provisioning time so the default passes. + +VMs are reachable only through the host bridge network +(`172.16.0.0/24` by default). Do not expose the bridge interface or +guest IPs to an untrusted network. ## Further reading -- [`docs/config.md`](docs/config.md) — full config reference. -- [`docs/dns-routing.md`](docs/dns-routing.md) — `.vm` host-side resolution. -- [`docs/image-catalog.md`](docs/image-catalog.md) — image bundles and how to publish. -- [`docs/kernel-catalog.md`](docs/kernel-catalog.md) — kernel bundles. -- [`docs/oci-import.md`](docs/oci-import.md) — pulling arbitrary OCI images. -- [`docs/advanced.md`](docs/advanced.md) — `vm create`, scripting, custom rootfs. -- [`docs/privileges.md`](docs/privileges.md) — trust model, capability set, daemon split. -- [`CONTRIBUTING.md`](CONTRIBUTING.md) — building from source, running tests. +- [`docs/config.md`](docs/config.md) — full config key reference. +- [`docs/dns-routing.md`](docs/dns-routing.md) — resolving + `.vm` hostnames from the host. +- [`docs/image-catalog.md`](docs/image-catalog.md) — bundle format + and publishing. +- [`docs/kernel-catalog.md`](docs/kernel-catalog.md) — kernel + bundles. +- [`docs/oci-import.md`](docs/oci-import.md) — pulling arbitrary + OCI images. +- [`docs/advanced.md`](docs/advanced.md) — power-user flows. 
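The sizing order described in the `vm_defaults` section of the README diff above (config first, then host-derived heuristics, then a built-in floor) can be sketched roughly as follows. This is a minimal illustration, not banger's actual implementation: the `Sizing` type, the `resolveSizing` and `clamp` names, the floor constants, and the use of an integer `DiskGiB` in place of the `disk_size = "16G"` string are all assumptions made for the example. Only the `cpus/4` capped at 4, `ram/8` capped at 8 GiB, and 8 GiB disk figures come from the README, which itself calls them rough.

```go
package main

import "fmt"

// Sizing is a hypothetical stand-in for the vCPU / RAM / disk triple
// that the README says is printed on the `spec:` line.
type Sizing struct {
	VCPU      int
	MemoryMiB int
	DiskGiB   int
}

// Assumed floor values. The README only says built-in constants act
// as a floor; it does not state what they are.
const (
	floorVCPU      = 1
	floorMemoryMiB = 1024
	defaultDiskGiB = 8
)

// clamp restricts v to the inclusive range [lo, hi].
func clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// resolveSizing picks each value independently: a non-zero field in cfg
// (the [vm_defaults] table) wins; otherwise the host-derived heuristic
// is used, bounded below by the assumed floor.
func resolveSizing(cfg Sizing, hostCPUs, hostMemMiB int) Sizing {
	out := Sizing{
		VCPU:      clamp(hostCPUs/4, floorVCPU, 4),
		MemoryMiB: clamp(hostMemMiB/8, floorMemoryMiB, 8*1024),
		DiskGiB:   defaultDiskGiB,
	}
	if cfg.VCPU > 0 {
		out.VCPU = cfg.VCPU
	}
	if cfg.MemoryMiB > 0 {
		out.MemoryMiB = cfg.MemoryMiB
	}
	if cfg.DiskGiB > 0 {
		out.DiskGiB = cfg.DiskGiB
	}
	return out
}

func main() {
	// A 16-core, 32 GiB host with only vcpu set in config.
	fmt.Printf("%+v\n", resolveSizing(Sizing{VCPU: 2}, 16, 32*1024))
}
```

Resolving each key on its own mirrors the README's note that all `vm_defaults` keys are optional and any omitted one is left for banger to decide; `banger doctor` remains the authoritative source for the effective values on a given host.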
diff --git a/assets/banger.gif b/assets/banger.gif deleted file mode 100644 index 2f88c5a..0000000 Binary files a/assets/banger.gif and /dev/null differ diff --git a/assets/demo.tape b/assets/demo.tape deleted file mode 100644 index d68741a..0000000 --- a/assets/demo.tape +++ /dev/null @@ -1,112 +0,0 @@ -# banger hero demo — VHS tape -# Render with: vhs assets/demo.tape - -Output assets/banger.gif - -Require banger -Require ssh -Require curl - -Set Shell "bash" -Set FontSize 14 -Set LineHeight 1.4 -Set Width 1200 -Set Height 720 -Set Padding 20 -Set Theme "Catppuccin Frappe" -Set TypingSpeed 66ms - -# Off-camera reset: enable bash syntax highlighting via ble.sh, prompt -# styling, drop any prior demo VM, and clear the screen. -Hide -Type "source ~/.local/share/blesh/ble.sh --noattach" -Enter -Sleep 200ms -Type "bleopt complete_auto_complete= complete_auto_history=" -Enter -Sleep 100ms -Type `export PS1="\n$PS1"` -Enter -Sleep 200ms -Type "[[ ${BLE_VERSION-} ]] && ble-attach" -Enter -Sleep 400ms -Type "ble-face -s syntax_error fg=red" -Enter -Sleep 100ms -Type "banger vm kill demo 2>/dev/null; banger vm delete demo 2>/dev/null; clear" -Enter -Sleep 500ms -Show - -Type "banger vm run --nat --name demo" -Enter -Wait+Line /demo:~#/ -Sleep 1.4s - -Type "uname -a" -Enter -Sleep 1.4s - -Type "exit" -Enter -Wait -Sleep 700ms - -Type "banger vm list" -Enter -Wait -Sleep 1.8s - -Type "ssh demo.vm" -Enter -Wait+Line /demo:~#/ -Sleep 500ms - -Type "touch foo bar baz" -Enter -Sleep 700ms - -Type "ls" -Enter -Sleep 1.4s - -Type "exit" -Enter -Sleep 700ms - -Type "banger vm stop demo" -Enter -Wait -Sleep 1s - -Type "banger vm start demo" -Enter -Wait -Sleep 1s - -Type "banger vm exec demo -- ls" -Enter -Wait -Sleep 1.4s - -Type "banger vm exec demo -- docker run -d -p 80:80 nginx" -Enter -Wait -Sleep 1.6s - -Type "banger vm ports demo" -Enter -Wait -Sleep 2s - -Type "curl http://demo.vm" -Sleep 1.2s -Enter -Wait -Sleep 4s - -Type "banger vm kill demo && banger vm delete demo" 
-Enter -Wait -Sleep 3s diff --git a/docs/oci-import-internals.md b/docs/oci-import-internals.md index 2607aa1..434a01e 100644 --- a/docs/oci-import-internals.md +++ b/docs/oci-import-internals.md @@ -11,9 +11,7 @@ - **`Pull`** wraps `go-containerregistry`'s `remote.Image` with the `linux/amd64` platform pinned. Layer blobs cache under - `/var/cache/banger/oci/blobs/` (system install) or - `~/.cache/banger/oci/blobs/` (dev mode) and populate lazily during - flatten. + `~/.cache/banger/oci/blobs/` and populate lazily during flatten. - **`Flatten`** replays layers oldest-first into a staging directory, applies whiteouts, rejects unsafe paths plus filenames that banger's debugfs ownership fixup cannot encode safely. Returns a `Metadata` diff --git a/docs/oci-import.md b/docs/oci-import.md index 841aed7..6160b7c 100644 --- a/docs/oci-import.md +++ b/docs/oci-import.md @@ -90,16 +90,12 @@ Unknown distros fall back to `ID_LIKE`, then error cleanly. ## Paths -Paths below assume the system install (`banger system install`). When -running `bangerd` directly without the helper, the same files live -under `~/.cache/banger/` and `~/.local/state/banger/` instead. - | What | Where | |------|-------| -| Layer blob cache | `/var/cache/banger/oci/blobs/sha256/` | -| Staging dir | `/var/lib/banger/images/.staging/` | +| Layer blob cache | `~/.cache/banger/oci/blobs/sha256/` | +| Staging dir | `~/.local/state/banger/images/.staging/` | | Extraction scratch | `$TMPDIR/banger-pull-/` | -| Published image | `/var/lib/banger/images//rootfs.ext4` | +| Published image | `~/.local/state/banger/images//rootfs.ext4` | ## Cache lifecycle diff --git a/docs/release-process.md b/docs/release-process.md deleted file mode 100644 index 510ac06..0000000 --- a/docs/release-process.md +++ /dev/null @@ -1,189 +0,0 @@ -# Release process - -Maintainer-facing runbook for cutting and publishing a new banger -release. 
End users don't need any of this — they pick up new releases -through `banger update` or the curl-piped `install.sh`. - -## What ships in a release - -Each release publishes four objects to the R2 bucket served at -`https://releases.thaloco.com/banger/`: - -| Object | Path | Notes | -|---|---|---| -| Tarball | `/banger--linux-amd64.tar.gz` | `banger`, `bangerd`, `banger-vsock-agent` at the root, no subdirs | -| Hashes | `/SHA256SUMS` | One line for the tarball, GNU `sha256sum` format | -| Signature | `/SHA256SUMS.sig` | base64-encoded ASN.1 ECDSA cosign-blob signature over `SHA256SUMS` | -| Manifest | `manifest.json` (bucket root) | Describes every published release; `latest_stable` points at the most recent | - -`install.sh` lives at the bucket root too (unversioned) so the -`curl … | bash` URL stays stable across releases. - -## Trust model recap - -Every release is cosign-signed. The public key is pinned in two places -that MUST stay in sync: - -- `internal/updater/verify_signature.go` — `BangerReleasePublicKey` - used by `banger update`. -- `scripts/install.sh` — embedded copy used by the curl-piped installer - before any banger binary is on disk. - -`scripts/publish-banger-release.sh` aborts the upload if the two copies -diverge — that's the only mechanism keeping them coupled, so don't -edit either alone. - -The signed payload is `SHA256SUMS`, which in turn covers the tarball. -Verification uses the Go standard library (`crypto/ecdsa.VerifyASN1`) -on the update path and `openssl dgst -verify` on the install-script -path. cosign is needed only for **signing**. - -## Pre-flight checklist - -Run these before tagging or publishing: - -1. **`make smoke`** — the full systemd-driven scenario suite must be - green. The smoke harness exercises the real install + update path - end to end; if it's red, do not cut. -2. **CHANGELOG entry.** Add a `## [vX.Y.Z] - YYYY-MM-DD` section under - `## [Unreleased]` describing what changed. 
Use the - [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) sub-headings - (`### Added`, `### Fixed`, `### Notes`). -3. **Bump the link table** at the bottom of `CHANGELOG.md`: - ```markdown - [Unreleased]: …/compare/vX.Y.Z...HEAD - [vX.Y.Z]: …/releases/tag/vX.Y.Z - ``` -4. **Note unit-file changes loudly** in the CHANGELOG entry. `banger - update` swaps binaries only — it does NOT rewrite - `/etc/systemd/system/bangerd*.service`. If this release changed - `renderSystemdUnit` / `renderRootHelperSystemdUnit`, the entry must - tell existing-install users to run `sudo banger system install` - once after updating to pick up the new units. v0.1.4 and v0.1.6 - are reference examples. - -Commit the CHANGELOG change, push to `main`, and confirm CI is green. - -## Cutting the release - -Order matters: publish first, then tag. - -1. **Run the publish script:** - - ```sh - scripts/publish-banger-release.sh vX.Y.Z - ``` - - The script: - - Builds `banger`, `bangerd`, `banger-vsock-agent` with `-ldflags` - baking the version, the current commit SHA, and a UTC build - timestamp into `internal/buildinfo`. - - Tarballs the three binaries (bare basenames at the tar root — - `internal/updater/StageTarball` rejects anything else). - - Computes `SHA256SUMS`, signs it with `cosign sign-blob` (no - transparency log, no bundle format — banger verifies the bare - ASN.1 DER signature directly). - - Verifies the signature against the public key extracted from - `internal/updater/verify_signature.go`, then diffs that against - the public key embedded in `scripts/install.sh`. Either failure - aborts before upload. - - Pulls the existing `manifest.json` from the bucket, appends the - new release entry, points `latest_stable` at it, and uploads - everything via rclone. - - Uploads `scripts/install.sh` to the bucket root so the curl-piped - installer stays current. - -2. 
**Tag and push:** - - ```sh - git tag vX.Y.Z - git push --tags - ``` - - Tagging happens AFTER publishing so the tag only exists if the - release actually shipped. - -3. **Verify from a clean machine:** - - ```sh - curl -fsSL https://releases.thaloco.com/banger/manifest.json | jq .latest_stable - curl -fsSL https://releases.thaloco.com/banger/install.sh | head -20 - banger update --check # on an existing install - ``` - -## Verification releases - -If a release fixes anything in the update flow itself — -`runUpdate` (`internal/cli/commands_update.go`), the systemd unit -templates, or the helper/daemon restart sequencing — cut a follow-up -no-op verification release immediately. The reason: `banger update` -runs the OLD binary as the driver of the swap. A fix in vN can't be -observed end-to-end on a vN-1 host updating to vN, because vN-1 is -still in the driver seat. vN+1 with no functional changes lets a host -on vN update to it and observe the fix live with vN as the driver. - -Examples in CHANGELOG.md: v0.1.3 follows v0.1.2's update-flow fix; -v0.1.5 follows v0.1.4's daemon-restart fix. - -The verification-release CHANGELOG section is short and explicit: -> No functional changes. Verification release for vN: … - -## Patch vs minor - -banger follows [SemVer](https://semver.org/spec/v2.0.0.html). For -v0.1.x, the practical contract: - -- **Patch (v0.1.x):** bug fixes, internal refactors, anything that - doesn't change the exposed API/CLI behavior. -- **Minor (v0.2.x):** any change to the **exposed API behavior or - contract**. The vsock guest-agent protocol is the canonical example — - a minor bump means existing VMs created against the older minor need - to be re-pulled. Other minor-trigger changes: removing a CLI flag, - changing a stable RPC method's request/response shape, breaking the - on-disk store schema in a non-forward-compatible way. - -If in doubt, prefer the higher bump. 
Patch releases that turn out to -have broken a contract are the worst-of-both: users update without -warning, then break. - -## Sibling catalogs - -Kernel and golden-image releases ship through the same gate. The -`internal/kernelcat/catalog.json` and `internal/imagecat/catalog.json` -manifests are `go:embed`-ed at build time, so a new entry only -reaches users when banger itself is re-released. In practice: - -1. Run `scripts/publish-kernel.sh ` or - `scripts/publish-golden-image.sh …` to upload the artefact and - patch the appropriate `catalog.json` in the working tree. -2. Commit the catalog change with whatever banger fix or feature it's - landing alongside. -3. Cut a banger release the normal way; the new catalog entry ships - with the next `banger` binary. - -The kernel and image catalogs each have their own R2 bucket -(`kernels.thaloco.com`, `images.thaloco.com`) so versioning of the -artefacts is independent of banger's release cadence — but -**discoverability** is gated by the banger release that embeds the -catalog pointer. - -## When something goes wrong mid-release - -- **Signature verification fails locally** in - `publish-banger-release.sh`: confirm `internal/updater/verify_signature.go` - contains the same public key as `cosign.pub` in the repo root. If - the script reports drift between `verify_signature.go` and - `install.sh`, run `diff` between the two `BEGIN PUBLIC KEY` blocks - and resolve before rerunning. -- **rclone upload fails partway through:** the script uploads tarball, - hashes, signature, and manifest in that order. Re-running is safe; - rclone will overwrite. Until the manifest is uploaded, no client - sees the new release — so a partial upload is invisible. -- **Manifest already names the version** (re-cutting): the publish - script's `jq` filter dedupes by `version`, so re-running with the - same `vX.Y.Z` cleanly replaces the entry. 
-- **Already tagged but the release is bad:** delete the tag locally - AND on the remote (`git push --delete origin vX.Y.Z`), revert the - CHANGELOG entry, fix the bug, and start the cycle over with a fresh - patch number. Do NOT re-use the version — installed clients have - already cached its `SHA256SUMS` against the manifest. diff --git a/internal/cli/banger.go b/internal/cli/banger.go index 7c40e5a..281325a 100644 --- a/internal/cli/banger.go +++ b/internal/cli/banger.go @@ -34,14 +34,10 @@ The most common workflow is one command: banger vm run bare sandbox, drops into ssh banger vm run ./repo ships a repo into /root/repo, drops into ssh banger vm run ./repo -- make test ships a repo, runs the command, exits with its status - banger vm run --rm -- script.sh --rm: VM auto-deletes when the session/command exits - banger vm run --nat ./repo --nat: outbound internet (required when .mise.toml installs tools) - banger vm run -d ./repo --nat -d/--detach: prep workspace + bootstrap, exit without ssh For a longer-lived VM, use 'banger vm create' to provision and 'banger vm ssh ' to attach. 'banger ps' lists running VMs; -'banger vm list --all' shows stopped ones too. Guests are reachable -at .vm from the host once 'banger ssh-config --install' is run. +'banger vm list --all' shows stopped ones too. First-time setup, in order: sudo banger system install install the systemd services @@ -75,8 +71,7 @@ to diagnose host readiness problems. } func (d *deps) newDoctorCommand() *cobra.Command { - var verbose bool - cmd := &cobra.Command{ + return &cobra.Command{ Use: "doctor", Short: "Check host and runtime readiness", Long: strings.TrimSpace(` @@ -90,10 +85,8 @@ Run 'banger doctor': - after upgrading the host kernel or banger itself - when 'banger vm run' fails with an unclear error -By default, prints failing and warning checks only and a summary -footer; a healthy host collapses to a single line. Pass --verbose to -print every check with its details. 
Exit code is non-zero if any -check fails. Warnings are reported but do not fail the run. +Exit code is non-zero if any check fails. Warnings are reported but +do not fail the run. `), Args: noArgsUsage("usage: banger doctor"), RunE: func(cmd *cobra.Command, args []string) error { @@ -101,7 +94,7 @@ check fails. Warnings are reported but do not fail the run. if err != nil { return err } - if err := printDoctorReport(cmd.OutOrStdout(), report, verbose); err != nil { + if err := printDoctorReport(cmd.OutOrStdout(), report); err != nil { return err } if report.HasFailures() { @@ -110,8 +103,6 @@ check fails. Warnings are reported but do not fail the run. return nil }, } - cmd.Flags().BoolVarP(&verbose, "verbose", "v", false, "show every check (default: only failures and warnings)") - return cmd } func newVersionCommand() *cobra.Command { diff --git a/internal/cli/bangerd_test.go b/internal/cli/bangerd_test.go deleted file mode 100644 index fa60b76..0000000 --- a/internal/cli/bangerd_test.go +++ /dev/null @@ -1,194 +0,0 @@ -package cli - -import ( - "bytes" - "database/sql" - "os" - "path/filepath" - "strings" - "testing" - - "banger/internal/store" - - "github.com/spf13/cobra" - _ "modernc.org/sqlite" -) - -func TestNewBangerdCommandSubcommands(t *testing.T) { - cmd := NewBangerdCommand() - if cmd.Use != "bangerd" { - t.Errorf("Use = %q, want bangerd", cmd.Use) - } - for _, flag := range []string{"system", "root-helper", "check-migrations"} { - if cmd.Flag(flag) == nil { - t.Errorf("flag %q missing", flag) - } - } -} - -func TestLastID(t *testing.T) { - tests := []struct { - name string - in []int - want int - }{ - {"nil", nil, 0}, - {"empty", []int{}, 0}, - {"single", []int{7}, 7}, - {"sorted ascending", []int{1, 2, 3}, 3}, - {"unsorted, max in middle", []int{1, 99, 5}, 99}, - {"duplicates", []int{4, 4, 2, 4}, 4}, - {"negative ignored", []int{-3, -1, 0}, 0}, - } - for _, tc := range tests { - t.Run(tc.name, func(t *testing.T) { - if got := lastID(tc.in); got != 
tc.want { - t.Fatalf("lastID(%v) = %d, want %d", tc.in, got, tc.want) - } - }) - } -} - -// stubExit replaces bangerdExit for the test and returns a pointer to -// the captured exit code (-1 = not called) and a restore func. -func stubExit(t *testing.T) *int { - t.Helper() - called := -1 - prev := bangerdExit - bangerdExit = func(code int) { called = code } - t.Cleanup(func() { bangerdExit = prev }) - return &called -} - -// pointHomeAtTempDB sets XDG_STATE_HOME (and HOME, which Resolve falls -// back to) so that paths.Resolve().DBPath lands at /banger/state.db. -// Returns the DB path. -func pointHomeAtTempDB(t *testing.T) string { - t.Helper() - tmp := t.TempDir() - t.Setenv("HOME", tmp) - t.Setenv("XDG_STATE_HOME", tmp) - t.Setenv("XDG_CONFIG_HOME", tmp) - t.Setenv("XDG_CACHE_HOME", tmp) - t.Setenv("XDG_RUNTIME_DIR", tmp) - dir := filepath.Join(tmp, "banger") - if err := os.MkdirAll(dir, 0o700); err != nil { - t.Fatalf("mkdir state dir: %v", err) - } - return filepath.Join(dir, "state.db") -} - -func TestRunCheckMigrationsCompatible(t *testing.T) { - dbPath := pointHomeAtTempDB(t) - s, err := store.Open(dbPath) - if err != nil { - t.Fatalf("store.Open: %v", err) - } - _ = s.Close() - - exit := stubExit(t) - cmd := &cobra.Command{} - var out bytes.Buffer - cmd.SetOut(&out) - - if err := runCheckMigrations(cmd, false); err != nil { - t.Fatalf("runCheckMigrations: %v", err) - } - if *exit != -1 { - t.Errorf("bangerdExit called with %d, want no call", *exit) - } - if !strings.HasPrefix(out.String(), "compatible:") { - t.Errorf("stdout = %q, want prefix \"compatible:\"", out.String()) - } -} - -func TestRunCheckMigrationsMigrationsNeeded(t *testing.T) { - dbPath := pointHomeAtTempDB(t) - // Hand-craft a DB that has schema_migrations with only the baseline - // row — InspectSchemaState classifies this as "migrations needed". 
- dsn := "file:" + dbPath + "?_pragma=foreign_keys(1)" - db, err := sql.Open("sqlite", dsn) - if err != nil { - t.Fatalf("sql.Open: %v", err) - } - if _, err := db.Exec(`CREATE TABLE schema_migrations (id INTEGER PRIMARY KEY, name TEXT NOT NULL, applied_at TEXT NOT NULL)`); err != nil { - t.Fatalf("create table: %v", err) - } - if _, err := db.Exec(`INSERT INTO schema_migrations VALUES (1, 'baseline', '2026-01-01T00:00:00Z')`); err != nil { - t.Fatalf("insert baseline: %v", err) - } - _ = db.Close() - - exit := stubExit(t) - cmd := &cobra.Command{} - var out bytes.Buffer - cmd.SetOut(&out) - - if err := runCheckMigrations(cmd, false); err != nil { - t.Fatalf("runCheckMigrations: %v", err) - } - if *exit != 1 { - t.Errorf("bangerdExit called with %d, want 1", *exit) - } - if !strings.HasPrefix(out.String(), "migrations needed:") { - t.Errorf("stdout = %q, want prefix \"migrations needed:\"", out.String()) - } -} - -func TestRunCheckMigrationsIncompatible(t *testing.T) { - dbPath := pointHomeAtTempDB(t) - s, err := store.Open(dbPath) - if err != nil { - t.Fatalf("store.Open: %v", err) - } - _ = s.Close() - - // Inject an unknown migration id directly so the binary's known set - // is a strict subset — InspectSchemaState classifies as incompatible. 
- dsn := "file:" + dbPath - db, err := sql.Open("sqlite", dsn) - if err != nil { - t.Fatalf("sql.Open: %v", err) - } - if _, err := db.Exec(`INSERT INTO schema_migrations VALUES (9999, 'from_the_future', '2030-01-01T00:00:00Z')`); err != nil { - t.Fatalf("insert future row: %v", err) - } - _ = db.Close() - - exit := stubExit(t) - cmd := &cobra.Command{} - var out bytes.Buffer - cmd.SetOut(&out) - - if err := runCheckMigrations(cmd, false); err != nil { - t.Fatalf("runCheckMigrations: %v", err) - } - if *exit != 2 { - t.Errorf("bangerdExit called with %d, want 2", *exit) - } - if !strings.HasPrefix(out.String(), "incompatible:") { - t.Errorf("stdout = %q, want prefix \"incompatible:\"", out.String()) - } -} - -func TestRunCheckMigrationsInspectError(t *testing.T) { - // Point at a state dir with a non-DB file at state.db so Inspect - // fails to open it. The function should wrap the error with the path. - dbPath := pointHomeAtTempDB(t) - if err := os.WriteFile(dbPath, []byte("not a sqlite file"), 0o600); err != nil { - t.Fatalf("write garbage: %v", err) - } - - stubExit(t) - cmd := &cobra.Command{} - var out bytes.Buffer - cmd.SetOut(&out) - - err := runCheckMigrations(cmd, false) - if err == nil { - t.Fatal("runCheckMigrations: nil error, want wrapped inspect error") - } - if !strings.Contains(err.Error(), dbPath) { - t.Errorf("error %q does not mention DB path %q", err.Error(), dbPath) - } -} diff --git a/internal/cli/cli_test.go b/internal/cli/cli_test.go index f39a962..e924a18 100644 --- a/internal/cli/cli_test.go +++ b/internal/cli/cli_test.go @@ -133,15 +133,12 @@ func TestDoctorCommandPrintsReportAndFailsOnHardFailures(t *testing.T) { t.Fatalf("Execute() error = %v, want doctor failure", err) } output := stdout.String() - if strings.Contains(output, "PASS\truntime bundle") { - t.Fatalf("output = %q, brief default should hide PASS rows", output) + if !strings.Contains(output, "PASS\truntime bundle") { + t.Fatalf("output = %q, want runtime bundle pass", output) 
} if !strings.Contains(output, "FAIL\tfeature nat") { t.Fatalf("output = %q, want feature nat fail", output) } - if !strings.Contains(output, "1 passed, 0 warnings, 1 failure") { - t.Fatalf("output = %q, want summary footer", output) - } } func TestDoctorCommandReturnsUnderlyingError(t *testing.T) { @@ -588,7 +585,7 @@ func TestRunVMCreatePollsUntilDone(t *testing.T) { } var stderr bytes.Buffer - got, err := d.runVMCreate(context.Background(), "/tmp/bangerd.sock", &stderr, api.VMCreateParams{Name: "devbox"}, false) + got, err := d.runVMCreate(context.Background(), "/tmp/bangerd.sock", &stderr, api.VMCreateParams{Name: "devbox"}) if err != nil { t.Fatalf("d.runVMCreate: %v", err) } @@ -643,7 +640,7 @@ func TestVMCreateProgressRendererSuppressesDuplicateLines(t *testing.T) { func TestVMRunProgressRendererSuppressesDuplicateLines(t *testing.T) { var stderr bytes.Buffer - renderer := newVMRunProgressRenderer(&stderr, true) + renderer := newVMRunProgressRenderer(&stderr) renderer.render("waiting for guest ssh") renderer.render("waiting for guest ssh") @@ -661,67 +658,6 @@ func TestVMRunProgressRendererSuppressesDuplicateLines(t *testing.T) { } } -// TestVMRunProgressRendererInlineRewrites covers the TTY default: each -// render call rewrites the same line via \r + clear-to-EOL instead of -// emitting a newline, so the user sees one moving status line until -// commitLine / clear / the caller's own newline closes it out. 
-func TestVMRunProgressRendererInlineRewrites(t *testing.T) { - var stderr bytes.Buffer - renderer := &vmRunProgressRenderer{out: &stderr, enabled: true, inline: true} - - renderer.render("waiting for guest ssh") - renderer.render("preparing guest workspace") - renderer.commitLine("vm devbox running; reconnect with: banger vm ssh devbox") - - got := stderr.String() - wantPrefix := "\r\x1b[K[vm run] waiting for guest ssh" + - "\r\x1b[K[vm run] preparing guest workspace" + - "\r\x1b[K[vm run] vm devbox running; reconnect with: banger vm ssh devbox\n" - if got != wantPrefix { - t.Fatalf("inline output = %q, want %q", got, wantPrefix) - } -} - -// TestVMRunProgressRendererClearWipesActiveLine guards the path used -// before sshExec/runSSHSession: clear() must erase the live inline -// line so the next writer (the ssh session, a warning, the user's -// command output) starts from column 0 without a trailing status. -func TestVMRunProgressRendererClearWipesActiveLine(t *testing.T) { - var stderr bytes.Buffer - renderer := &vmRunProgressRenderer{out: &stderr, enabled: true, inline: true} - - renderer.render("attaching to guest") - renderer.clear() - // clear() on an already-cleared renderer is a no-op (active=false). - renderer.clear() - - got := stderr.String() - want := "\r\x1b[K[vm run] attaching to guest\r\x1b[K" - if got != want { - t.Fatalf("after clear stderr = %q, want %q", got, want) - } -} - -// TestVMCreateProgressRendererInlineRewrites mirrors the vm_run inline -// test for the create-side renderer so both progress paths stay in -// sync if either is touched in isolation. 
-func TestVMCreateProgressRendererInlineRewrites(t *testing.T) { - var stderr bytes.Buffer - renderer := &vmCreateProgressRenderer{out: &stderr, enabled: true, inline: true} - - renderer.render(api.VMCreateOperation{Stage: "prepare_work_disk", Detail: "cloning work seed"}) - renderer.render(api.VMCreateOperation{Stage: "wait_vsock_agent", Detail: "waiting for guest vsock agent"}) - renderer.clear() - - got := stderr.String() - want := "\r\x1b[K[vm create] preparing work disk: cloning work seed" + - "\r\x1b[K[vm create] waiting for vsock agent: waiting for guest vsock agent" + - "\r\x1b[K" - if got != want { - t.Fatalf("inline output = %q, want %q", got, want) - } -} - func TestWithHeartbeatNoOpForNonTTY(t *testing.T) { var buf bytes.Buffer called := false @@ -801,50 +737,6 @@ func TestAbsolutizeImageRegisterPaths(t *testing.T) { } } -func TestAbsolutizePaths(t *testing.T) { - tmp := t.TempDir() - wd, err := os.Getwd() - if err != nil { - t.Fatalf("Getwd: %v", err) - } - if err := os.Chdir(tmp); err != nil { - t.Fatalf("Chdir: %v", err) - } - t.Cleanup(func() { _ = os.Chdir(wd) }) - - empty := "" - abs := "/already/absolute/path" - rel1 := filepath.Join("a", "b") - rel2 := "./c/d" - - if err := absolutizePaths(&empty, &abs, &rel1, &rel2); err != nil { - t.Fatalf("absolutizePaths: %v", err) - } - - if empty != "" { - t.Errorf("empty value mutated: %q", empty) - } - if abs != "/already/absolute/path" { - t.Errorf("absolute value mutated: %q", abs) - } - if !filepath.IsAbs(rel1) { - t.Errorf("rel1 not absolutized: %q", rel1) - } - if !filepath.IsAbs(rel2) { - t.Errorf("rel2 not absolutized: %q", rel2) - } - // Sanity: relative paths should land under tmp. 
- if !strings.HasPrefix(rel1, tmp) { - t.Errorf("rel1 = %q, want prefix %q", rel1, tmp) - } -} - -func TestAbsolutizePathsNoArgs(t *testing.T) { - if err := absolutizePaths(); err != nil { - t.Fatalf("absolutizePaths() with no args: %v", err) - } -} - func TestPrintImageListTableShowsRootfsSizes(t *testing.T) { rootfs := filepath.Join(t.TempDir(), "rootfs.ext4") if err := os.WriteFile(rootfs, nil, 0o644); err != nil { @@ -1385,9 +1277,6 @@ func TestRunVMRunWorkspacePreparesAndAttaches(t *testing.T) { &repo, nil, false, - false, - false, - false, ) if err != nil { t.Fatalf("d.runVMRun: %v", err) @@ -1464,9 +1353,6 @@ func TestVMRunPrintsPostCreateProgress(t *testing.T) { &repo, nil, false, - false, - false, - false, ) if err != nil { t.Fatalf("d.runVMRun: %v", err) @@ -1542,9 +1428,6 @@ func TestRunVMRunWarnsWhenToolingHarnessStartFails(t *testing.T) { &repo, nil, false, - false, - false, - false, ) if err != nil { t.Fatalf("d.runVMRun: %v", err) @@ -1596,9 +1479,6 @@ func TestRunVMRunBareModeSkipsWorkspaceAndTooling(t *testing.T) { nil, nil, false, - false, - false, - false, ) if err != nil { t.Fatalf("d.runVMRun: %v", err) @@ -1642,10 +1522,7 @@ func TestRunVMRunRMDeletesAfterSessionExits(t *testing.T) { api.VMCreateParams{Name: "tmpbox"}, nil, nil, - true, // --rm, - false, - false, - false, + true, // --rm ) if err != nil { t.Fatalf("d.runVMRun: %v", err) @@ -1695,10 +1572,7 @@ func TestRunVMRunRMSkipsDeleteOnSSHWaitTimeout(t *testing.T) { api.VMCreateParams{Name: "slowvm"}, nil, nil, - true, // --rm, - false, - false, - false, + true, // --rm ) if err == nil { t.Fatal("want timeout error") @@ -1741,9 +1615,6 @@ func TestRunVMRunSSHTimeoutReturnsActionableError(t *testing.T) { nil, nil, false, - false, - false, - false, ) if err == nil { t.Fatal("want timeout error") @@ -1793,9 +1664,6 @@ func TestRunVMRunCommandModePropagatesExitCode(t *testing.T) { nil, []string{"false"}, false, - false, - false, - false, ) var exitErr ExitCodeError if !errors.As(err, 
&exitErr) || exitErr.Code != 7 { diff --git a/internal/cli/commands_system.go b/internal/cli/commands_system.go index f1099ac..50768b0 100644 --- a/internal/cli/commands_system.go +++ b/internal/cli/commands_system.go @@ -300,13 +300,6 @@ func renderSystemdUnit(meta installmeta.Metadata) string { "ExecStart=" + systemBangerdBin + " --system", "Restart=on-failure", "RestartSec=1s", - // KillMode=process: only signal the main PID on stop/restart. - // The default (control-group) sends SIGKILL to every process in - // the unit's cgroup, including descendants — and during `banger - // update` we restart this unit, which would terminate any - // in-flight subprocesses spawned by the daemon. The daemon - // shuts its own children down explicitly when needed. - "KillMode=process", "Environment=PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "Environment=TMPDIR=/run/banger", "UMask=0077", @@ -329,13 +322,6 @@ func renderSystemdUnit(meta installmeta.Metadata) string { "CacheDirectoryMode=0700", "RuntimeDirectory=banger", "RuntimeDirectoryMode=0700", - // Keep /run/banger across stop/restart so the api-sock symlinks - // the helper creates for live VMs aren't wiped between the daemon - // stopping and the new daemon's reconcile re-attaching to them. - // Without this, `banger update` restarts the daemon, /run/banger - // is wiped, the api-sock symlinks vanish, and rediscoverHandles - // can't resolve the chroot path it needs to read jailer's pidfile. - "RuntimeDirectoryPreserve=yes", } if coverDir := strings.TrimSpace(os.Getenv(systemCoverDirEnv)); coverDir != "" { lines = append(lines, "Environment=GOCOVERDIR="+systemdQuote(coverDir)) @@ -364,34 +350,6 @@ func renderRootHelperSystemdUnit() string { "ExecStart=" + systemBangerdBin + " --root-helper", "Restart=on-failure", "RestartSec=1s", - // KillMode=process + SendSIGKILL=no together make the helper - // safe to restart while banger-launched firecrackers are - // running. 
firecracker lives in this unit's cgroup (jailer - // doesn't open a sub-cgroup), so: - // - // - Default control-group mode SIGKILLs every process in - // the cgroup on stop. - // - KillMode=process limits the initial SIGTERM to the - // helper main PID; systemd leaves remaining cgroup - // processes alone (and logs "Unit process N (firecracker) - // remains running after unit stopped"). - // - SendSIGKILL=no disables the FinalKillSignal escalation - // that would otherwise SIGKILL leftovers after the timeout. - // - // One more pitfall: the firecracker SDK installs a default - // signal-forwarding goroutine in the helper that catches - // SIGTERM (etc.) and forwards it to every firecracker child. - // We disable that explicitly via ForwardSignals: []os.Signal{} - // in firecracker.buildConfig — without that override, systemd - // signaling the helper main would propagate to every running - // VM regardless of what these directives do. - // - // `banger system uninstall` and the daemon's vm-stop path - // explicitly stop firecracker processes when actually needed, - // so we don't lose the systemd-driven kill as a real safety - // net — banger drives those kills itself. - "KillMode=process", - "SendSIGKILL=no", "Environment=PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "Environment=TMPDIR=" + installmeta.DefaultRootHelperRuntimeDir, "UMask=0077", @@ -413,12 +371,6 @@ func renderRootHelperSystemdUnit() string { "ReadWritePaths=/var/lib/banger", "RuntimeDirectory=banger-root", "RuntimeDirectoryMode=0711", - // Same rationale as bangerd.service: the helper-managed - // /run/banger-root holds the helper's RPC socket and any - // per-VM scratch state; preserving it across restart keeps - // the daemon's reconnect path and reconcile re-attachment - // from racing against systemd's runtime-dir cleanup. 
-		"RuntimeDirectoryPreserve=yes",
 	}
 	if coverDir := strings.TrimSpace(os.Getenv(rootCoverDirEnv)); coverDir != "" {
 		lines = append(lines, "Environment=GOCOVERDIR="+systemdQuote(coverDir))
diff --git a/internal/cli/commands_update.go b/internal/cli/commands_update.go
index d4313ac..1c0ee3f 100644
--- a/internal/cli/commands_update.go
+++ b/internal/cli/commands_update.go
@@ -30,12 +30,10 @@ const stagingTarballName = "release.tar.gz"
 
 func (d *deps) newUpdateCommand() *cobra.Command {
 	var (
-		checkOnly   bool
-		dryRun      bool
-		force       bool
-		toVersion   string
-		manifestURL string
-		pubkeyFile  string
+		checkOnly bool
+		dryRun    bool
+		force     bool
+		toVersion string
 	)
 	cmd := &cobra.Command{
 		Use: "update",
@@ -70,12 +68,10 @@ talks to systemd. Run with sudo.
 		Args: noArgsUsage("usage: banger update [--check] [--dry-run] [--force] [--to vX.Y.Z]"),
 		RunE: func(cmd *cobra.Command, args []string) error {
 			return d.runUpdate(cmd, runUpdateOpts{
-				checkOnly:   checkOnly,
-				dryRun:      dryRun,
-				force:       force,
-				toVersion:   toVersion,
-				manifestURL: manifestURL,
-				pubkeyFile:  pubkeyFile,
+				checkOnly: checkOnly,
+				dryRun:    dryRun,
+				force:     force,
+				toVersion: toVersion,
 			})
 		},
 	}
@@ -83,53 +79,23 @@ talks to systemd. Run with sudo.
 	cmd.Flags().BoolVar(&dryRun, "dry-run", false, "fetch and verify, but do not swap or restart anything")
 	cmd.Flags().BoolVar(&force, "force", false, "skip in-flight-op refusal and post-restart doctor verification")
 	cmd.Flags().StringVar(&toVersion, "to", "", "specific release version to install (default: latest_stable from manifest)")
-	// Hidden test/dev hooks: redirect the updater at a non-default
-	// manifest URL and trust a non-default cosign public key. Used by
-	// the smoke suite to drive a real update against locally-built
-	// release artefacts. Production users have no reason to touch
-	// these; they are not advertised in --help.
-	cmd.Flags().StringVar(&manifestURL, "manifest-url", "", "")
-	cmd.Flags().StringVar(&pubkeyFile, "pubkey-file", "", "")
-	_ = cmd.Flags().MarkHidden("manifest-url")
-	_ = cmd.Flags().MarkHidden("pubkey-file")
 	return cmd
 }
 
 type runUpdateOpts struct {
-	checkOnly   bool
-	dryRun      bool
-	force       bool
-	toVersion   string
-	manifestURL string
-	pubkeyFile  string
+	checkOnly bool
+	dryRun    bool
+	force     bool
+	toVersion string
 }
 
 func (d *deps) runUpdate(cmd *cobra.Command, opts runUpdateOpts) error {
 	ctx := cmd.Context()
 	out := cmd.OutOrStdout()
 
-	// Resolve the test/dev override flags up front so a bad
-	// --pubkey-file fails fast before any network round-trips.
-	pubKeyPEM := updater.BangerReleasePublicKey
-	if strings.TrimSpace(opts.pubkeyFile) != "" {
-		body, err := os.ReadFile(opts.pubkeyFile)
-		if err != nil {
-			return fmt.Errorf("read --pubkey-file: %w", err)
-		}
-		pubKeyPEM = string(body)
-	}
-
 	// Discover.
 	client := &http.Client{Timeout: 30 * time.Second}
-	var (
-		manifest updater.Manifest
-		err      error
-	)
-	if strings.TrimSpace(opts.manifestURL) != "" {
-		manifest, err = updater.FetchManifestFrom(ctx, client, opts.manifestURL)
-	} else {
-		manifest, err = updater.FetchManifest(ctx, client)
-	}
+	manifest, err := updater.FetchManifest(ctx, client)
 	if err != nil {
 		return fmt.Errorf("discover: %w", err)
 	}
@@ -176,7 +142,7 @@ func (d *deps) runUpdate(cmd *cobra.Command, opts runUpdateOpts) error {
 	if err != nil {
 		return fmt.Errorf("download: %w", err)
 	}
-	if err := updater.FetchAndVerifySignatureWithKey(ctx, client, target, sumsBody, pubKeyPEM); err != nil {
+	if err := updater.FetchAndVerifySignature(ctx, client, target, sumsBody); err != nil {
 		// Don't leave the staged tarball around — it failed
 		// signature verification and shouldn't be re-runnable.
 		_ = os.Remove(tarballPath)
@@ -213,21 +179,15 @@ func (d *deps) runUpdate(cmd *cobra.Command, opts runUpdateOpts) error {
 		return fmt.Errorf("swap: %w (rolled back)", err)
 	}
 
-	// Restart services + wait for the new daemon. A `systemctl restart`
-	// that fails has typically already STOPPED the unit, so the prior
-	// binary on disk isn't running anywhere — Rollback() must be paired
-	// with a re-restart to bring the rolled-back binary back into a
-	// running state. That's rollbackAndRestart's job; rollbackAndWrap
-	// is for the swap-step failures earlier where the restart never
-	// fired and the old binary is still in memory.
+	// Restart services + wait for the new daemon.
 	if err := d.runSystemctl(ctx, "restart", installmeta.DefaultRootHelperService); err != nil {
-		return rollbackAndRestart(ctx, d, swap, "restart helper", err)
+		return rollbackAndWrap(swap, "restart helper", err)
 	}
 	if err := d.runSystemctl(ctx, "restart", installmeta.DefaultService); err != nil {
-		return rollbackAndRestart(ctx, d, swap, "restart daemon", err)
+		return rollbackAndWrap(swap, "restart daemon", err)
 	}
 	if err := d.waitForDaemonReady(ctx, socketPath); err != nil {
-		return rollbackAndRestart(ctx, d, swap, "wait daemon ready", err)
+		return rollbackAndWrap(swap, "wait daemon ready", err)
 	}
 
 	// Verify with doctor unless --force says otherwise.
@@ -238,20 +198,13 @@ func (d *deps) runUpdate(cmd *cobra.Command, opts runUpdateOpts) error {
 	}
 
 	// Finalise: refresh install metadata, drop backups, clean staging.
-	// Read the new binary's identity by exec'ing it; buildinfo.Current()
-	// reflects the OLD running CLI (we're it), so the commit + built_at
-	// have to come from the freshly-swapped /usr/local/bin/banger or
-	// install.toml ends up with mixed-version fields.
-	newInfo, err := readInstalledBuildinfo(ctx, targets.Banger)
-	if err != nil {
-		fmt.Fprintf(out, "warning: read installed buildinfo: %v\n", err)
-		// Fall back to the manifest version + the running binary's
-		// commit/built_at. install.toml drift is a doctor warning,
-		// not a broken host, so don't fail the update.
-		old := buildinfo.Current()
-		newInfo = buildinfo.Info{Version: target.Version, Commit: old.Commit, BuiltAt: old.BuiltAt}
-	}
-	if err := installmeta.UpdateBuildInfo(installmeta.DefaultPath, newInfo.Version, newInfo.Commit, newInfo.BuiltAt); err != nil {
+	info := buildinfo.Current()
+	// We just installed `target.Version` — info.Version still reflects
+	// the OLD running binary (we're it). The new bangerd encodes its
+	// own version; for install.toml we record what we INSTALLED.
+	if err := installmeta.UpdateBuildInfo(installmeta.DefaultPath, target.Version, info.Commit, info.BuiltAt); err != nil {
+		// Don't fail the update for this — the install is healthy;
+		// install.toml drift is a doctor warning, not a broken host.
 		fmt.Fprintf(out, "warning: update install metadata: %v\n", err)
 	}
 	if err := updater.CleanupBackups(swap); err != nil {
@@ -330,51 +283,6 @@ func sanityRunStaged(ctx context.Context, staged updater.StagedRelease, expected
 	return nil
 }
 
-// readInstalledBuildinfo execs the just-swapped banger binary, parses
-// its three-line `version` output, and returns the parsed identity.
-// Used to refresh install.toml after an update so the on-disk record
-// reflects the binary that's actually installed — buildinfo.Current()
-// in the running process is the OLD binary's identity, not the one we
-// just put on disk.
-//
-// Output shape (from internal/cli/banger.go versionString):
-//
-// version: vX.Y.Z
-// commit:
-// built_at:
-func readInstalledBuildinfo(ctx context.Context, bangerPath string) (buildinfo.Info, error) {
-	out, err := exec.CommandContext(ctx, bangerPath, "version").Output()
-	if err != nil {
-		return buildinfo.Info{}, fmt.Errorf("exec %s version: %w", bangerPath, err)
-	}
-	return parseVersionOutput(string(out))
-}
-
-// parseVersionOutput extracts the three identity fields from
-// `banger version`. Split out of readInstalledBuildinfo so it can be
-// unit-tested without exec'ing a real binary.
-func parseVersionOutput(out string) (buildinfo.Info, error) { - var info buildinfo.Info - for _, line := range strings.Split(out, "\n") { - k, v, ok := strings.Cut(line, ":") - if !ok { - continue - } - switch strings.TrimSpace(k) { - case "version": - info.Version = strings.TrimSpace(v) - case "commit": - info.Commit = strings.TrimSpace(v) - case "built_at": - info.BuiltAt = strings.TrimSpace(v) - } - } - if info.Version == "" || info.Commit == "" || info.BuiltAt == "" { - return buildinfo.Info{}, fmt.Errorf("could not parse version/commit/built_at from %q", strings.TrimSpace(out)) - } - return info, nil -} - // runPostUpdateDoctor invokes `banger doctor` on the JUST-INSTALLED // CLI (not d.doctor — that's the in-process implementation; we want // to exercise the new binary end-to-end). diff --git a/internal/cli/commands_update_test.go b/internal/cli/commands_update_test.go deleted file mode 100644 index 7207008..0000000 --- a/internal/cli/commands_update_test.go +++ /dev/null @@ -1,79 +0,0 @@ -package cli - -import "testing" - -func TestParseVersionOutput(t *testing.T) { - cases := []struct { - name string - in string - wantVersion string - wantCommit string - wantBuilt string - wantErr bool - }{ - { - name: "happy path — three-line shape from banger version", - in: `version: v0.1.2 -commit: a0b5c7fa3ca95a37ba99b35280fc75e5647b59e8 -built_at: 2026-04-29T17:34:45Z -`, - wantVersion: "v0.1.2", - wantCommit: "a0b5c7fa3ca95a37ba99b35280fc75e5647b59e8", - wantBuilt: "2026-04-29T17:34:45Z", - }, - { - name: "tolerates extra whitespace around the values", - in: ` version : v0.1.2 - commit : abc123 - built_at : 2026-01-01T00:00:00Z`, - wantVersion: "v0.1.2", - wantCommit: "abc123", - wantBuilt: "2026-01-01T00:00:00Z", - }, - { - name: "missing commit field is rejected", - in: "version: v0.1.2\nbuilt_at: 2026-01-01T00:00:00Z\n", - wantErr: true, - }, - { - name: "empty input is rejected", - in: "", - wantErr: true, - }, - { - name: "unrelated lines are ignored", - in: 
`banger v0.1.2 -some other diagnostic line: with a colon -version: v0.1.2 -commit: abc -built_at: 2026-01-01T00:00:00Z -`, - wantVersion: "v0.1.2", - wantCommit: "abc", - wantBuilt: "2026-01-01T00:00:00Z", - }, - } - for _, tc := range cases { - t.Run(tc.name, func(t *testing.T) { - got, err := parseVersionOutput(tc.in) - if tc.wantErr { - if err == nil { - t.Fatalf("want error, got nil; parsed=%+v", got) - } - return - } - if err != nil { - t.Fatalf("unexpected error: %v", err) - } - if got.Version != tc.wantVersion { - t.Errorf("Version: got %q, want %q", got.Version, tc.wantVersion) - } - if got.Commit != tc.wantCommit { - t.Errorf("Commit: got %q, want %q", got.Commit, tc.wantCommit) - } - if got.BuiltAt != tc.wantBuilt { - t.Errorf("BuiltAt: got %q, want %q", got.BuiltAt, tc.wantBuilt) - } - }) - } -} diff --git a/internal/cli/commands_vm.go b/internal/cli/commands_vm.go index d30dfb2..8228a5b 100644 --- a/internal/cli/commands_vm.go +++ b/internal/cli/commands_vm.go @@ -35,11 +35,8 @@ provisions ssh, and drops you into the guest in one command. Use longer-lived VM you'll come back to. 
Quick reference: - banger vm run interactive sandbox (stays alive on disconnect) - banger vm run --rm -- script.sh ephemeral: VM auto-deletes on exit - banger vm run ./repo -- make test ship a repo, run a command, exit with its status - banger vm run --nat ./repo --nat: outbound internet (required for mise bootstrap) - banger vm run -d ./repo --nat -d/--detach: prep + bootstrap, exit (no ssh attach) + banger vm run ephemeral sandbox; --rm to delete on exit + banger vm run ./repo -- make test ship a repo, run a command, exit banger vm create --name dev persistent VM; pair with 'vm ssh' banger vm ssh open a shell in a running VM banger vm exec -- make test run a command in the workspace with mise toolchain @@ -48,7 +45,6 @@ Quick reference: banger vm delete stop + remove disks banger ps / banger vm list running / all VMs (use --all) banger vm logs guest console + daemon log - banger vm set --nat toggle NAT on an existing VM (--no-nat to remove) banger vm workspace prepare/export ship a repo in, pull diffs back `), Example: strings.TrimSpace(` @@ -95,9 +91,6 @@ func (d *deps) newVMRunCommand() *cobra.Command { removeOnExit bool includeUntracked bool dryRun bool - detach bool - skipBootstrap bool - verbose bool ) cmd := &cobra.Command{ Use: "run [path] [-- command args...]", @@ -105,43 +98,16 @@ func (d *deps) newVMRunCommand() *cobra.Command { Long: strings.TrimSpace(` Create a sandbox VM and either drop into an interactive shell or run a command. -Modes: +Three modes: banger vm run bare sandbox, drops into ssh banger vm run ./repo workspace sandbox, drops into ssh at /root/repo - banger vm run ./repo -- make test workspace + run command, exit with its status - banger vm run --rm -- script.sh ephemeral: VM auto-deletes when the session/command exits - banger vm run -d ./repo workspace + bootstrap, exit (reconnect with 'vm ssh') - -Workspace mode (path argument): - Passing a path copies the repo's git-tracked files into /root/repo - inside the guest. 
Untracked files are skipped by default — pass - --include-untracked to ship them too, or --dry-run to preview the - file list without creating a VM. - -Outbound internet (--nat): - Guests have no internet access by default. Pass --nat to enable - host-side MASQUERADE so the VM can reach the public network. NAT is - required whenever the workspace declares mise tooling (see below). - Toggle on an existing VM with 'banger vm set --nat '. - -Tooling bootstrap (workspace mode): - When the workspace contains a .mise.toml or .tool-versions, vm run - installs the listed tools via mise on first boot. The bootstrap - needs internet, so --nat must be set. Pass --no-bootstrap to skip - it entirely (no NAT requirement). - -Exit behaviour: - In command mode (-- ), the guest command's exit code propagates - through banger. Without --rm, the VM stays alive after the session - or command exits — reconnect with 'banger vm ssh '. With --rm, - the VM is deleted on exit (stdout/stderr are preserved). + banger vm run ./repo -- make test workspace, runs command, exits with its status `), Args: cobra.ArbitraryArgs, Example: strings.TrimSpace(` banger vm run banger vm run ../repo --name agent-box --branch feature/demo banger vm run ../repo -- make test - banger vm run -d ../repo --nat banger vm run -- uname -a `), RunE: func(cmd *cobra.Command, args []string) error { @@ -163,12 +129,6 @@ Exit behaviour: if sourcePath == "" && strings.TrimSpace(branchName) != "" { return errors.New("--branch requires a path argument") } - if detach && removeOnExit { - return errors.New("cannot combine --detach with --rm") - } - if detach && len(commandArgs) > 0 { - return errors.New("cannot combine --detach with a guest command") - } var repoPtr *vmRunRepo if sourcePath != "" { @@ -214,7 +174,7 @@ Exit behaviour: if err != nil { return err } - return d.runVMRun(cmd.Context(), layout.SocketPath, cfg, cmd.InOrStdin(), cmd.OutOrStdout(), cmd.ErrOrStderr(), params, repoPtr, commandArgs, removeOnExit, detach, 
skipBootstrap, verbose) + return d.runVMRun(cmd.Context(), layout.SocketPath, cfg, cmd.InOrStdin(), cmd.OutOrStdout(), cmd.ErrOrStderr(), params, repoPtr, commandArgs, removeOnExit) }, } cmd.Flags().StringVar(&name, "name", "", "vm name") @@ -223,15 +183,12 @@ Exit behaviour: cmd.Flags().IntVar(&memory, "memory", defaults.MemoryMiB, "memory in MiB") cmd.Flags().StringVar(&systemOverlaySize, "system-overlay-size", model.FormatSizeBytes(defaults.SystemOverlaySizeByte), "system overlay size") cmd.Flags().StringVar(&workDiskSize, "disk-size", model.FormatSizeBytes(defaults.WorkDiskSizeBytes), "work disk size") - cmd.Flags().BoolVar(&natEnabled, "nat", false, "enable outbound internet from the guest (host-side MASQUERADE; required when the workspace declares mise tooling)") + cmd.Flags().BoolVar(&natEnabled, "nat", false, "enable NAT") cmd.Flags().StringVar(&branchName, "branch", "", "create and switch to a new guest branch") cmd.Flags().StringVar(&fromRef, "from", "HEAD", "git ref to branch from when --branch is set (default: HEAD)") - cmd.Flags().BoolVar(&removeOnExit, "rm", false, "ephemeral mode: delete the VM (and its disks) after the ssh session / command exits") + cmd.Flags().BoolVar(&removeOnExit, "rm", false, "delete the VM after the ssh session / command exits") cmd.Flags().BoolVar(&includeUntracked, "include-untracked", false, "also copy untracked non-ignored files into the guest workspace (default: tracked files only)") cmd.Flags().BoolVar(&dryRun, "dry-run", false, "list the files that would be copied into the guest workspace and exit without creating a VM") - cmd.Flags().BoolVarP(&detach, "detach", "d", false, "detached mode: create the VM, run workspace prep + bootstrap synchronously, exit without ssh attach (reconnect with 'vm ssh')") - cmd.Flags().BoolVar(&skipBootstrap, "no-bootstrap", false, "skip the mise tooling bootstrap (no --nat requirement)") - cmd.Flags().BoolVarP(&verbose, "verbose", "v", false, "show every progress line instead of a single 
rewriting status line") _ = cmd.RegisterFlagCompletionFunc("image", d.completeImageNames) return cmd } @@ -395,7 +352,6 @@ func (d *deps) newVMCreateCommand() *cobra.Command { workDiskSize = model.FormatSizeBytes(defaults.WorkDiskSizeBytes) natEnabled bool noStart bool - verbose bool ) cmd := &cobra.Command{ Use: "create", @@ -423,7 +379,7 @@ Use 'vm create' for a longer-lived VM you'll come back to. Use if err != nil { return err } - vm, err := d.runVMCreate(cmd.Context(), layout.SocketPath, cmd.ErrOrStderr(), params, verbose) + vm, err := d.runVMCreate(cmd.Context(), layout.SocketPath, cmd.ErrOrStderr(), params) if err != nil { return err } @@ -436,9 +392,8 @@ Use 'vm create' for a longer-lived VM you'll come back to. Use cmd.Flags().IntVar(&memory, "memory", defaults.MemoryMiB, "memory in MiB") cmd.Flags().StringVar(&systemOverlaySize, "system-overlay-size", model.FormatSizeBytes(defaults.SystemOverlaySizeByte), "system overlay size") cmd.Flags().StringVar(&workDiskSize, "disk-size", model.FormatSizeBytes(defaults.WorkDiskSizeBytes), "work disk size") - cmd.Flags().BoolVar(&natEnabled, "nat", false, "enable outbound internet from the guest (host-side MASQUERADE)") + cmd.Flags().BoolVar(&natEnabled, "nat", false, "enable NAT") cmd.Flags().BoolVar(&noStart, "no-start", false, "create without starting") - cmd.Flags().BoolVarP(&verbose, "verbose", "v", false, "show every progress line instead of a single rewriting status line") _ = cmd.RegisterFlagCompletionFunc("image", d.completeImageNames) return cmd } diff --git a/internal/cli/daemon_lifecycle_test.go b/internal/cli/daemon_lifecycle_test.go index f4c7779..7b946f7 100644 --- a/internal/cli/daemon_lifecycle_test.go +++ b/internal/cli/daemon_lifecycle_test.go @@ -142,7 +142,6 @@ func TestRenderSystemdUnitIncludesHardeningDirectives(t *testing.T) { "Wants=network-online.target bangerd-root.service", "After=bangerd-root.service", "Requires=bangerd-root.service", - "KillMode=process", "UMask=0077", 
"Environment=TMPDIR=/run/banger", "NoNewPrivileges=yes", @@ -164,7 +163,6 @@ func TestRenderSystemdUnitIncludesHardeningDirectives(t *testing.T) { "CacheDirectoryMode=0700", "RuntimeDirectory=banger", "RuntimeDirectoryMode=0700", - "RuntimeDirectoryPreserve=yes", `ReadOnlyPaths="/home/alice/dev home"`, } { if !strings.Contains(unit, want) { @@ -178,15 +176,6 @@ func TestRenderRootHelperSystemdUnitIncludesRequiredCapabilities(t *testing.T) { for _, want := range []string{ "ExecStart=/usr/local/bin/bangerd --root-helper", - // Both directives are load-bearing for "VM survives helper - // restart": KillMode=process limits the initial SIGTERM to - // the helper main, SendSIGKILL=no disables the SIGKILL - // escalation. The helper itself does the cgroup reparent - // (see roothelper.reparentToBangerFCCgroup) — without - // that, even these directives leave firecracker exposed to - // systemd's stop-time cleanup. - "KillMode=process", - "SendSIGKILL=no", "Environment=TMPDIR=/run/banger-root", "NoNewPrivileges=yes", "PrivateTmp=yes", @@ -198,7 +187,6 @@ func TestRenderRootHelperSystemdUnitIncludesRequiredCapabilities(t *testing.T) { "ReadWritePaths=/var/lib/banger", "RuntimeDirectory=banger-root", "RuntimeDirectoryMode=0711", - "RuntimeDirectoryPreserve=yes", } { if !strings.Contains(unit, want) { t.Fatalf("unit = %q, want %q", unit, want) diff --git a/internal/cli/printers.go b/internal/cli/printers.go index afedbc8..d4ea646 100644 --- a/internal/cli/printers.go +++ b/internal/cli/printers.go @@ -272,34 +272,9 @@ func printKernelCatalogTable(out anyWriter, entries []api.KernelCatalogEntry) er // -- doctor printer ------------------------------------------------- -func printDoctorReport(out anyWriter, report system.Report, verbose bool) error { +func printDoctorReport(out anyWriter, report system.Report) error { colorWriter, _ := out.(io.Writer) - - var passes, warns, fails int - for _, c := range report.Checks { - switch c.Status { - case system.CheckStatusPass: - 
passes++ - case system.CheckStatusWarn: - warns++ - case system.CheckStatusFail: - fails++ - } - } - - if !verbose && warns == 0 && fails == 0 { - msg := fmt.Sprintf("all %d checks passed", passes) - if colorWriter != nil { - msg = style.Pass(colorWriter, msg) - } - _, err := fmt.Fprintln(out, msg) - return err - } - for _, check := range report.Checks { - if !verbose && check.Status == system.CheckStatusPass { - continue - } status := strings.ToUpper(string(check.Status)) if colorWriter != nil { switch check.Status { @@ -320,19 +295,5 @@ func printDoctorReport(out anyWriter, report system.Report, verbose bool) error } } } - - if !verbose { - if _, err := fmt.Fprintf(out, "\n%d passed, %s, %s\n", passes, pluralCount(warns, "warning"), pluralCount(fails, "failure")); err != nil { - return err - } - } - return nil } - -func pluralCount(n int, word string) string { - if n == 1 { - return fmt.Sprintf("%d %s", n, word) - } - return fmt.Sprintf("%d %ss", n, word) -} diff --git a/internal/cli/printers_test.go b/internal/cli/printers_test.go deleted file mode 100644 index 3018ca8..0000000 --- a/internal/cli/printers_test.go +++ /dev/null @@ -1,88 +0,0 @@ -package cli - -import ( - "bytes" - "strings" - "testing" - - "banger/internal/system" -) - -func TestPrintDoctorReport_BriefAllPass(t *testing.T) { - report := system.Report{} - report.AddPass("first", "detail one") - report.AddPass("second", "detail two") - report.AddPass("third") - - var buf bytes.Buffer - if err := printDoctorReport(&buf, report, false); err != nil { - t.Fatalf("printDoctorReport: %v", err) - } - - got := buf.String() - want := "all 3 checks passed\n" - if got != want { - t.Fatalf("brief all-pass output\n got: %q\nwant: %q", got, want) - } -} - -func TestPrintDoctorReport_BriefHidesPassDetails(t *testing.T) { - report := system.Report{} - report.AddPass("first", "detail one") - report.AddWarn("second", "warn detail") - report.AddPass("third", "detail three") - report.AddFail("fourth", "fail detail") - 
- var buf bytes.Buffer - if err := printDoctorReport(&buf, report, false); err != nil { - t.Fatalf("printDoctorReport: %v", err) - } - - got := buf.String() - if strings.Contains(got, "PASS") || strings.Contains(got, "first") || strings.Contains(got, "third") { - t.Fatalf("brief mode leaked PASS rows: %q", got) - } - for _, want := range []string{"WARN\tsecond", "warn detail", "FAIL\tfourth", "fail detail"} { - if !strings.Contains(got, want) { - t.Fatalf("missing %q in brief output: %q", want, got) - } - } - if !strings.Contains(got, "2 passed, 1 warning, 1 failure") { - t.Fatalf("missing summary footer in: %q", got) - } -} - -func TestPrintDoctorReport_BriefSummaryPlurals(t *testing.T) { - report := system.Report{} - report.AddPass("a") - report.AddWarn("b") - report.AddWarn("c") - - var buf bytes.Buffer - if err := printDoctorReport(&buf, report, false); err != nil { - t.Fatalf("printDoctorReport: %v", err) - } - if !strings.Contains(buf.String(), "1 passed, 2 warnings, 0 failures") { - t.Fatalf("plural counts wrong: %q", buf.String()) - } -} - -func TestPrintDoctorReport_VerboseShowsEverything(t *testing.T) { - report := system.Report{} - report.AddPass("first", "detail one") - report.AddWarn("second", "warn detail") - - var buf bytes.Buffer - if err := printDoctorReport(&buf, report, true); err != nil { - t.Fatalf("printDoctorReport: %v", err) - } - got := buf.String() - for _, want := range []string{"PASS\tfirst", "detail one", "WARN\tsecond", "warn detail"} { - if !strings.Contains(got, want) { - t.Fatalf("verbose mode missing %q: %q", want, got) - } - } - if strings.Contains(got, "passed,") { - t.Fatalf("verbose mode should not print summary footer: %q", got) - } -} diff --git a/internal/cli/vm_create.go b/internal/cli/vm_create.go index 144050f..63c0858 100644 --- a/internal/cli/vm_create.go +++ b/internal/cli/vm_create.go @@ -61,14 +61,14 @@ func printVMSpecLine(out io.Writer, params api.VMCreateParams) { // gets the spec line up front and the progress 
renderer thereafter. // On context cancel we cooperate with the daemon to cancel the // in-flight op so it doesn't leak partially-created VM state. -func (d *deps) runVMCreate(ctx context.Context, socketPath string, stderr io.Writer, params api.VMCreateParams, verbose bool) (model.VMRecord, error) { +func (d *deps) runVMCreate(ctx context.Context, socketPath string, stderr io.Writer, params api.VMCreateParams) (model.VMRecord, error) { start := time.Now() printVMSpecLine(stderr, params) begin, err := d.vmCreateBegin(ctx, socketPath, params) if err != nil { return model.VMRecord{}, err } - renderer := newVMCreateProgressRenderer(stderr, verbose) + renderer := newVMCreateProgressRenderer(stderr) renderer.render(begin.Operation) op := begin.Operation @@ -76,7 +76,6 @@ func (d *deps) runVMCreate(ctx context.Context, socketPath string, stderr io.Wri if op.Done { renderer.render(op) if op.Success && op.VM != nil { - renderer.clear() elapsed := formatVMCreateElapsed(time.Since(start)) _, _ = fmt.Fprintf(stderr, "[vm create] ready in %s\n", style.Dim(stderr, elapsed)) return *op.VM, nil @@ -114,22 +113,13 @@ func (d *deps) runVMCreate(ctx context.Context, socketPath string, stderr io.Wri type vmCreateProgressRenderer struct { out io.Writer enabled bool - inline bool - active bool lastLine string } -// newVMCreateProgressRenderer wires up progress for `vm create`. On -// non-TTY writers it stays disabled (CI/test logs already capture the -// spec + ready lines); on TTY it rewrites a single line via \r unless -// verbose is set or BANGER_NO_PROGRESS is exported, in which case it -// falls back to one line per stage. 
-func newVMCreateProgressRenderer(out io.Writer, verbose bool) *vmCreateProgressRenderer { - tty := writerSupportsProgress(out) +func newVMCreateProgressRenderer(out io.Writer) *vmCreateProgressRenderer { return &vmCreateProgressRenderer{ out: out, - enabled: tty, - inline: tty && !verbose && !progressDisabledByEnv(), + enabled: writerSupportsProgress(out), } } @@ -142,32 +132,9 @@ func (r *vmCreateProgressRenderer) render(op api.VMCreateOperation) { return } r.lastLine = line - if r.inline { - _, _ = fmt.Fprint(r.out, "\r\x1b[K", line) - r.active = true - return - } _, _ = fmt.Fprintln(r.out, line) } -// clear resets the live inline line so the caller can write a clean -// terminating message. No-op outside inline mode. -func (r *vmCreateProgressRenderer) clear() { - if r == nil || !r.enabled || !r.inline || !r.active { - return - } - _, _ = fmt.Fprint(r.out, "\r\x1b[K") - r.active = false - r.lastLine = "" -} - -// progressDisabledByEnv is the BANGER_NO_PROGRESS escape hatch — a -// non-empty value forces line-per-stage output even on a TTY, so users -// can pipe `script(1)` / tmux capture without \r artifacts. -func progressDisabledByEnv() bool { - return strings.TrimSpace(os.Getenv("BANGER_NO_PROGRESS")) != "" -} - // writerSupportsProgress returns true only when out is a terminal. // Keeps stage lines + heartbeat dots out of piped / logged output // where they'd just be noise. diff --git a/internal/cli/vm_exec.go b/internal/cli/vm_exec.go index 2ec862a..cfd8453 100644 --- a/internal/cli/vm_exec.go +++ b/internal/cli/vm_exec.go @@ -21,14 +21,13 @@ func (d *deps) newVMExecCommand() *cobra.Command { Use: "exec -- [args...]", Short: "Run a command in the VM workspace with the repo toolchain", Long: strings.TrimSpace(` -Run a command inside a persistent VM, wrapping it with 'mise exec' so -all mise-managed tools (Go, Node, Python, etc.) are on PATH. 
+Run a command inside a persistent VM, automatically cd-ing into the +prepared workspace and wrapping the command with 'mise exec' so all +mise-managed tools (Go, Node, Python, etc.) are on PATH. -If the VM has a prepared workspace (from 'vm workspace prepare' or -'vm run ./repo'), the command runs from that directory and a stale- -workspace warning is printed when the host repo has advanced since the -last prepare; pass --auto-prepare to re-sync first. Otherwise the -command runs from root's home directory. --guest-path overrides both. +The workspace path comes from the last 'vm workspace prepare' or +'vm run ./repo' on this VM. If the host repo has advanced since then, +banger warns; pass --auto-prepare to re-sync the workspace first. Exit code of the guest command is propagated verbatim. `), @@ -77,14 +76,13 @@ Exit code of the guest command is propagated verbatim. return fmt.Errorf("vm %q is not running (state: %s)", vm.Name, vm.State) } - // Resolve effective guest workspace path. Empty means "no - // cd": run from the SSH session's default cwd ($HOME). We - // only auto-cd when the user explicitly passed --guest-path - // or the VM actually has a recorded workspace — otherwise - // arbitrary VMs (no repo) would fail with cd errors. + // Resolve effective guest workspace path. execGuestPath := strings.TrimSpace(guestPath) if execGuestPath == "" { - execGuestPath = strings.TrimSpace(vm.Workspace.GuestPath) + execGuestPath = vm.Workspace.GuestPath + } + if execGuestPath == "" { + execGuestPath = "/root/repo" } // Dirty-workspace check: compare stored HEAD with current host HEAD. @@ -132,18 +130,15 @@ Exit code of the guest command is propagated verbatim. 
return nil }, } - cmd.Flags().StringVar(&guestPath, "guest-path", "", "workspace directory in the guest (default: from last workspace prepare; otherwise root's home)") + cmd.Flags().StringVar(&guestPath, "guest-path", "", "workspace directory in the guest (default: from last workspace prepare, or /root/repo)") cmd.Flags().BoolVar(&autoPrepare, "auto-prepare", false, "re-sync the workspace from the host repo before running if it's stale") _ = cmd.RegisterFlagCompletionFunc("guest-path", cobra.NoFileCompletions) return cmd } -// buildVMExecScript returns the bash -lc argument that runs the -// command through mise exec when mise is available, falling back to a -// plain exec if it's not. When guestPath is non-empty, the script -// cd's into it first (workspace mode); when empty, the command runs -// from the SSH session's default cwd so VMs without a prepared -// workspace don't blow up on a non-existent /root/repo. Each command +// buildVMExecScript returns the bash -lc argument that cd's into the +// workspace and runs the command through mise exec when mise is +// available, falling back to a plain exec if it's not. Each command // argument is shell-quoted so spaces and special characters survive // the bash re-parse inside the -lc string. 
 func buildVMExecScript(guestPath string, command []string) string {
@@ -152,15 +147,12 @@ func buildVMExecScript(guestPath string, command []string) string {
 		parts[i] = shellQuote(a)
 	}
 	quotedCmd := strings.Join(parts, " ")
-	body := fmt.Sprintf(
-		"if command -v mise >/dev/null 2>&1; then mise exec -- %s; else %s; fi",
+	return fmt.Sprintf(
+		"cd %s && if command -v mise >/dev/null 2>&1; then mise exec -- %s; else %s; fi",
+		shellQuote(guestPath),
 		quotedCmd,
 		quotedCmd,
 	)
-	if guestPath == "" {
-		return body
-	}
-	return fmt.Sprintf("cd %s && %s", shellQuote(guestPath), body)
 }
 
 // vmExecDirtyCheck compares the HEAD commit stored in the VM's
diff --git a/internal/cli/vm_exec_test.go b/internal/cli/vm_exec_test.go
deleted file mode 100644
index e57f5af..0000000
--- a/internal/cli/vm_exec_test.go
+++ /dev/null
@@ -1,35 +0,0 @@
-package cli
-
-import (
-	"strings"
-	"testing"
-)
-
-func TestBuildVMExecScriptWithGuestPath(t *testing.T) {
-	got := buildVMExecScript("/root/repo", []string{"make", "test"})
-	want := "cd '/root/repo' && if command -v mise >/dev/null 2>&1; then mise exec -- 'make' 'test'; else 'make' 'test'; fi"
-	if got != want {
-		t.Fatalf("buildVMExecScript with path:\n got: %q\n want: %q", got, want)
-	}
-}
-
-func TestBuildVMExecScriptWithoutGuestPath(t *testing.T) {
-	got := buildVMExecScript("", []string{"whoami"})
-	want := "if command -v mise >/dev/null 2>&1; then mise exec -- 'whoami'; else 'whoami'; fi"
-	if got != want {
-		t.Fatalf("buildVMExecScript without path:\n got: %q\n want: %q", got, want)
-	}
-	if strings.Contains(got, "cd ") {
-		t.Fatalf("expected no cd when guestPath is empty, got: %q", got)
-	}
-}
-
-func TestBuildVMExecScriptShellQuotesPathWithSpaces(t *testing.T) {
-	got := buildVMExecScript("/tmp/with space", []string{"echo", "a b"})
-	if !strings.Contains(got, "cd '/tmp/with space'") {
-		t.Fatalf("expected guest path to be shell-quoted, got: %q", got)
-	}
-	if !strings.Contains(got, "mise exec -- 'echo' 'a b'") {
-		t.Fatalf("expected command args to be shell-quoted, got: %q", got)
-	}
-}
diff --git a/internal/cli/vm_run.go b/internal/cli/vm_run.go
index 2a8f60b..1b8b182 100644
--- a/internal/cli/vm_run.go
+++ b/internal/cli/vm_run.go
@@ -114,23 +114,6 @@ func (d *deps) vmRunPreflightRepo(ctx context.Context, rawPath string) (string,
 	return sourcePath, nil
 }
 
-// repoHasMiseFiles reports whether the repo at sourcePath contains a
-// mise tooling manifest. Used as a host-side preflight: when --nat is
-// off and a manifest is present, vm run refuses early instead of
-// committing to a VM that will silently fail to install tools.
-func repoHasMiseFiles(sourcePath string) (bool, error) {
-	for _, name := range []string{".mise.toml", ".tool-versions"} {
-		info, err := os.Stat(filepath.Join(sourcePath, name))
-		if err == nil && !info.IsDir() {
-			return true, nil
-		}
-		if err != nil && !errors.Is(err, os.ErrNotExist) {
-			return false, fmt.Errorf("inspect %s: %w", name, err)
-		}
-	}
-	return false, nil
-}
-
 // splitVMRunArgs partitions cobra positional args into the optional path
 // argument and the trailing command (everything after a `--` separator).
 // The path slice may contain 0..1 entries; the command slice may be empty.
@@ -149,19 +132,9 @@ func splitVMRunArgs(cmd *cobra.Command, args []string) (pathArgs, commandArgs []
 // for guest ssh, optionally materialise a workspace and kick off the
 // tooling bootstrap, then either attach interactively or run the
 // user's command and propagate its exit status.
-func (d *deps) runVMRun(ctx context.Context, socketPath string, cfg model.DaemonConfig, stdin io.Reader, stdout, stderr io.Writer, params api.VMCreateParams, repo *vmRunRepo, command []string, removeOnExit, detach, skipBootstrap, verbose bool) error {
-	if repo != nil && !skipBootstrap && !params.NATEnabled {
-		hasMise, err := repoHasMiseFiles(repo.sourcePath)
-		if err != nil {
-			return err
-		}
-		if hasMise {
-			return errors.New("tooling bootstrap requires --nat (or pass --no-bootstrap to skip)")
-		}
-	}
-	progress := newVMRunProgressRenderer(stderr, verbose)
-	defer progress.clear()
-	vm, err := d.runVMCreate(ctx, socketPath, stderr, params, verbose)
+func (d *deps) runVMRun(ctx context.Context, socketPath string, cfg model.DaemonConfig, stdin io.Reader, stdout, stderr io.Writer, params api.VMCreateParams, repo *vmRunRepo, command []string, removeOnExit bool) error {
+	progress := newVMRunProgressRenderer(stderr)
+	vm, err := d.runVMCreate(ctx, socketPath, stderr, params)
 	if err != nil {
 		return err
 	}
@@ -184,10 +157,8 @@ func (d *deps) runVMRun(ctx context.Context, socketPath string, cfg model.Daemon
 			cleanupCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
 			defer cancel()
 			if err := d.vmDelete(cleanupCtx, socketPath, vmRef); err != nil {
-				progress.clear()
 				printVMRunWarning(stderr, fmt.Sprintf("--rm cleanup failed: %v (leaked vm %q; delete manually)", err, vmRef))
 			} else if err := removeUserKnownHosts(vm); err != nil {
-				progress.clear()
 				printVMRunWarning(stderr, fmt.Sprintf("known_hosts cleanup failed: %v", err))
 			}
 		}()
@@ -226,7 +197,6 @@ func (d *deps) runVMRun(ctx context.Context, socketPath string, cfg model.Daemon
 			fromRef = repo.fromRef
 		}
 		if !repo.includeUntracked {
-			progress.clear()
 			d.noteUntrackedSkipped(ctx, stderr, repo.sourcePath)
 		}
 		prepared, err := d.vmWorkspacePrepare(ctx, socketPath, api.VMWorkspacePrepareParams{
@@ -244,29 +214,23 @@ func (d *deps) runVMRun(ctx context.Context, socketPath string, cfg model.Daemon
 		// The prepare RPC already did the full git inspection on the
 		// daemon side; grab what the tooling harness needs from its
 		// result instead of re-inspecting here.
-		if len(command) == 0 && !skipBootstrap {
+		if len(command) == 0 {
 			client, err := d.guestDial(ctx, sshAddress, cfg.SSHKeyPath)
 			if err != nil {
 				return fmt.Errorf("vm %q is running but guest ssh is unavailable: %w", vmRef, err)
 			}
-			if err := d.startVMRunToolingHarness(ctx, client, prepared.Workspace.RepoRoot, prepared.Workspace.RepoName, progress, detach, stderr); err != nil {
-				progress.clear()
+			if err := d.startVMRunToolingHarness(ctx, client, prepared.Workspace.RepoRoot, prepared.Workspace.RepoName, progress); err != nil {
 				printVMRunWarning(stderr, fmt.Sprintf("guest tooling bootstrap start failed: %v", err))
 			}
 			_ = client.Close()
 		}
 	}
-	if detach {
-		progress.commitLine(fmt.Sprintf("vm %s running; reconnect with: banger vm ssh %s", vmRef, vmRef))
-		return nil
-	}
 	sshArgs, err := sshCommandArgs(cfg, vm.Runtime.GuestIP, command)
 	if err != nil {
 		return fmt.Errorf("vm %q is running but ssh args could not be built: %w", vmRef, err)
 	}
 	if len(command) > 0 {
 		progress.render("running command in guest")
-		progress.clear()
 		if err := d.sshExec(ctx, stdin, stdout, stderr, sshArgs); err != nil {
 			var exitErr *exec.ExitError
 			if errors.As(err, &exitErr) {
@@ -277,7 +241,6 @@ func (d *deps) runVMRun(ctx context.Context, socketPath string, cfg model.Daemon
 		return nil
 	}
 	progress.render("attaching to guest")
-	progress.clear()
 	return d.runSSHSession(ctx, socketPath, vmRef, stdin, stdout, stderr, sshArgs, removeOnExit)
 }
 
@@ -297,13 +260,7 @@ func vmRunToolingHarnessLogPath(repoName string) string {
 // script inside the guest. repoRoot / repoName both come from the
 // daemon's workspace.prepare RPC response so the CLI doesn't have
 // to re-inspect the git tree.
-//
-// When wait is true (used by --detach), the harness runs in the
-// foreground so the CLI can return only after bootstrap finishes;
-// the harness's stdout is streamed to syncOut for live visibility.
-// When wait is false (interactive mode), the harness is nohup'd so
-// the user's ssh session can start while bootstrap continues.
-func (d *deps) startVMRunToolingHarness(ctx context.Context, client vmRunGuestClient, repoRoot, repoName string, progress *vmRunProgressRenderer, wait bool, syncOut io.Writer) error {
+func (d *deps) startVMRunToolingHarness(ctx context.Context, client vmRunGuestClient, repoRoot, repoName string, progress *vmRunProgressRenderer) error {
 	if progress != nil {
 		progress.render("starting guest tooling bootstrap")
 	}
@@ -312,20 +269,6 @@ func (d *deps) startVMRunToolingHarness(ctx context.Context, client vmRunGuestCl
 	if err := client.UploadFile(ctx, vmRunToolingHarnessPath(repoName), 0o755, []byte(vmRunToolingHarnessScript(plan)), &uploadLog); err != nil {
 		return formatVMRunStepError("upload guest tooling bootstrap", err, uploadLog.String())
 	}
-	if wait {
-		var launchLog bytes.Buffer
-		out := io.Writer(&launchLog)
-		if syncOut != nil {
-			out = io.MultiWriter(syncOut, &launchLog)
-		}
-		if err := client.RunScript(ctx, vmRunToolingHarnessSyncScript(repoName), out); err != nil {
-			return formatVMRunStepError("run guest tooling bootstrap", err, launchLog.String())
-		}
-		if progress != nil {
-			progress.render("guest tooling bootstrap done (log: " + vmRunToolingHarnessLogPath(repoName) + ")")
-		}
-		return nil
-	}
 	var launchLog bytes.Buffer
 	if err := client.RunScript(ctx, vmRunToolingHarnessLaunchScript(repoName), &launchLog); err != nil {
 		return formatVMRunStepError("launch guest tooling bootstrap", err, launchLog.String())
@@ -424,20 +367,6 @@ func vmRunToolingHarnessLaunchScript(repoName string) string {
 	return script.String()
 }
 
-// vmRunToolingHarnessSyncScript is the foreground variant used by
-// --detach: it tees the harness output to both the log file and the
-// caller's stdout so the host-side CLI can stream live progress while
-// still preserving the log for later inspection.
-func vmRunToolingHarnessSyncScript(repoName string) string {
-	var script strings.Builder
-	script.WriteString("set -uo pipefail\n")
-	fmt.Fprintf(&script, "HELPER=%s\n", shellQuote(vmRunToolingHarnessPath(repoName)))
-	fmt.Fprintf(&script, "LOG=%s\n", shellQuote(vmRunToolingHarnessLogPath(repoName)))
-	script.WriteString("mkdir -p \"$(dirname \"$LOG\")\"\n")
-	script.WriteString("bash \"$HELPER\" 2>&1 | tee \"$LOG\"\n")
-	return script.String()
-}
-
 func formatVMRunStepError(action string, err error, log string) error {
 	log = strings.TrimSpace(log)
 	if log == "" {
@@ -449,24 +378,13 @@ func formatVMRunStepError(action string, err error, log string) error {
 type vmRunProgressRenderer struct {
 	out      io.Writer
 	enabled  bool
-	inline   bool
-	active   bool
 	lastLine string
 }
 
-// newVMRunProgressRenderer wires up progress for `vm run`. Unlike the
-// vm_create renderer, this one emits in line mode even on non-TTY
-// writers (covers tests and piped output that the existing tooling
-// already parses); inline mode kicks in only when stderr is a TTY,
-// verbose is unset, and BANGER_NO_PROGRESS is unset.
-func newVMRunProgressRenderer(out io.Writer, verbose bool) *vmRunProgressRenderer {
-	if out == nil {
-		return &vmRunProgressRenderer{}
-	}
+func newVMRunProgressRenderer(out io.Writer) *vmRunProgressRenderer {
 	return &vmRunProgressRenderer{
 		out:     out,
-		enabled: true,
-		inline:  writerSupportsProgress(out) && !verbose && !progressDisabledByEnv(),
+		enabled: out != nil,
 	}
 }
 
@@ -479,47 +397,6 @@ func (r *vmRunProgressRenderer) render(detail string) {
 		return
 	}
 	r.lastLine = line
-	if r.inline {
-		_, _ = fmt.Fprint(r.out, "\r\x1b[K", line)
-		r.active = true
-		return
-	}
-	_, _ = fmt.Fprintln(r.out, line)
-}
-
-// clear erases the live inline line so the caller can write a clean
-// terminating message (warning, ssh attach, command output). No-op
-// outside inline mode.
-func (r *vmRunProgressRenderer) clear() {
-	if r == nil || !r.enabled || !r.inline || !r.active {
-		return
-	}
-	_, _ = fmt.Fprint(r.out, "\r\x1b[K")
-	r.active = false
-	r.lastLine = ""
-}
-
-// commitLine prints detail as a final, persistent line. In inline
-// mode it overwrites the live status; in line mode it just appends.
-// Used for terminal messages like the --detach hand-off summary.
-func (r *vmRunProgressRenderer) commitLine(detail string) {
-	if r == nil || !r.enabled {
-		return
-	}
-	line := formatVMRunProgress(detail)
-	if line == "" {
-		return
-	}
-	if r.inline {
-		_, _ = fmt.Fprint(r.out, "\r\x1b[K", line, "\n")
-		r.active = false
-		r.lastLine = ""
-		return
-	}
-	if line == r.lastLine {
-		return
-	}
-	r.lastLine = line
 	_, _ = fmt.Fprintln(r.out, line)
 }
 
diff --git a/internal/cli/vm_run_test.go b/internal/cli/vm_run_test.go
deleted file mode 100644
index cab4f5d..0000000
--- a/internal/cli/vm_run_test.go
+++ /dev/null
@@ -1,278 +0,0 @@
-package cli
-
-import (
-	"bytes"
-	"context"
-	"io"
-	"os"
-	"path/filepath"
-	"strings"
-	"testing"
-	"time"
-
-	"banger/internal/api"
-	"banger/internal/model"
-	"banger/internal/toolingplan"
-)
-
-func TestVMRunRejectsDetachWithRm(t *testing.T) {
-	cmd := NewBangerCommand()
-	cmd.SetArgs([]string{"vm", "run", "-d", "--rm"})
-
-	err := cmd.Execute()
-	if err == nil || !strings.Contains(err.Error(), "cannot combine --detach with --rm") {
-		t.Fatalf("Execute() error = %v, want --detach + --rm rejection", err)
-	}
-}
-
-func TestVMRunRejectsDetachWithCommand(t *testing.T) {
-	cmd := NewBangerCommand()
-	cmd.SetArgs([]string{"vm", "run", "-d", "--", "whoami"})
-
-	err := cmd.Execute()
-	if err == nil || !strings.Contains(err.Error(), "cannot combine --detach with a guest command") {
-		t.Fatalf("Execute() error = %v, want --detach + command rejection", err)
-	}
-}
-
-func TestRepoHasMiseFiles(t *testing.T) {
-	dir := t.TempDir()
-	got, err := repoHasMiseFiles(dir)
-	if err != nil {
-		t.Fatalf("repoHasMiseFiles(empty): %v", err)
-	}
-	if got {
-		t.Fatalf("repoHasMiseFiles(empty) = true, want false")
-	}
-
-	if err := os.WriteFile(filepath.Join(dir, ".mise.toml"), []byte(""), 0o600); err != nil {
-		t.Fatalf("write .mise.toml: %v", err)
-	}
-	got, err = repoHasMiseFiles(dir)
-	if err != nil {
-		t.Fatalf("repoHasMiseFiles(.mise.toml): %v", err)
-	}
-	if !got {
-		t.Fatalf("repoHasMiseFiles(.mise.toml) = false, want true")
-	}
-
-	dir2 := t.TempDir()
-	if err := os.WriteFile(filepath.Join(dir2, ".tool-versions"), []byte(""), 0o600); err != nil {
-		t.Fatalf("write .tool-versions: %v", err)
-	}
-	got, err = repoHasMiseFiles(dir2)
-	if err != nil {
-		t.Fatalf("repoHasMiseFiles(.tool-versions): %v", err)
-	}
-	if !got {
-		t.Fatalf("repoHasMiseFiles(.tool-versions) = false, want true")
-	}
-}
-
-// runVMRunDepsRunningVM returns a deps wired so runVMRun reaches a
-// point where it would create a VM and proceed — used by precondition
-// tests that should refuse before any of these fakes get called.
-func runVMRunDepsRunningVM(t *testing.T) (*deps, *model.VMRecord) {
-	t.Helper()
-	d := defaultDeps()
-	vm := &model.VMRecord{
-		ID:   "vm-id",
-		Name: "devbox",
-		Runtime: model.VMRuntime{
-			State:   model.VMStateRunning,
-			GuestIP: "172.16.0.2",
-			DNSName: "devbox.vm",
-		},
-	}
-	d.vmCreateBegin = func(context.Context, string, api.VMCreateParams) (api.VMCreateBeginResult, error) {
-		return api.VMCreateBeginResult{Operation: api.VMCreateOperation{ID: "op-1", Stage: "ready", Done: true, Success: true, VM: vm}}, nil
-	}
-	d.guestWaitForSSH = func(context.Context, string, string, time.Duration) error { return nil }
-	d.vmWorkspacePrepare = func(context.Context, string, api.VMWorkspacePrepareParams) (api.VMWorkspacePrepareResult, error) {
-		return api.VMWorkspacePrepareResult{Workspace: model.WorkspacePrepareResult{VMID: vm.ID, GuestPath: "/root/repo", RepoName: "repo", RepoRoot: "/tmp/repo"}}, nil
-	}
-	d.buildVMRunToolingPlan = func(context.Context, string) toolingplan.Plan {
-		return toolingplan.Plan{}
-	}
-	d.vmHealth = func(context.Context, string, string) (api.VMHealthResult, error) {
-		return api.VMHealthResult{Healthy: true}, nil
-	}
-	d.sshExec = func(context.Context, io.Reader, io.Writer, io.Writer, []string) error { return nil }
-	return d, vm
-}
-
-func TestRunVMRunRefusesBootstrapWithoutNAT(t *testing.T) {
-	repoRoot := t.TempDir()
-	if err := os.WriteFile(filepath.Join(repoRoot, ".mise.toml"), []byte(""), 0o600); err != nil {
-		t.Fatalf("write .mise.toml: %v", err)
-	}
-
-	d := defaultDeps()
-	d.vmCreateBegin = func(context.Context, string, api.VMCreateParams) (api.VMCreateBeginResult, error) {
-		t.Fatal("vmCreateBegin should not be called when NAT precondition refuses")
-		return api.VMCreateBeginResult{}, nil
-	}
-
-	repo := vmRunRepo{sourcePath: repoRoot}
-	var stdout, stderr bytes.Buffer
-	err := d.runVMRun(
-		context.Background(),
-		"/tmp/bangerd.sock",
-		model.DaemonConfig{SSHKeyPath: "/tmp/id_ed25519"},
-		strings.NewReader(""),
-		&stdout, &stderr,
-		api.VMCreateParams{Name: "devbox", NATEnabled: false},
-		&repo,
-		nil,
-		false, false, false, false,
-	)
-	if err == nil || !strings.Contains(err.Error(), "tooling bootstrap requires --nat") {
-		t.Fatalf("runVMRun = %v, want NAT precondition refusal", err)
-	}
-}
-
-func TestRunVMRunBootstrapPreconditionRespectsNoBootstrap(t *testing.T) {
-	repoRoot := t.TempDir()
-	if err := os.WriteFile(filepath.Join(repoRoot, ".mise.toml"), []byte(""), 0o600); err != nil {
-		t.Fatalf("write .mise.toml: %v", err)
-	}
-
-	d, _ := runVMRunDepsRunningVM(t)
-	dialed := false
-	d.guestDial = func(context.Context, string, string) (vmRunGuestClient, error) {
-		dialed = true
-		return &testVMRunGuestClient{}, nil
-	}
-
-	repo := vmRunRepo{sourcePath: repoRoot}
-	var stdout, stderr bytes.Buffer
-	err := d.runVMRun(
-		context.Background(),
-		"/tmp/bangerd.sock",
-		model.DaemonConfig{SSHKeyPath: "/tmp/id_ed25519"},
-		strings.NewReader(""),
-		&stdout, &stderr,
-		api.VMCreateParams{Name: "devbox", NATEnabled: false},
-		&repo,
-		nil,
-		false, false, true, false, // skipBootstrap = true
-	)
-	if err != nil {
-		t.Fatalf("runVMRun: %v", err)
-	}
-	if dialed {
-		t.Fatal("guestDial should not be called when --no-bootstrap is set")
-	}
-}
-
-func TestRunVMRunBootstrapPreconditionPassesWithoutMiseFiles(t *testing.T) {
-	repoRoot := t.TempDir() // empty repo, no mise files
-
-	d, _ := runVMRunDepsRunningVM(t)
-	dialed := false
-	d.guestDial = func(context.Context, string, string) (vmRunGuestClient, error) {
-		dialed = true
-		return &testVMRunGuestClient{}, nil
-	}
-
-	repo := vmRunRepo{sourcePath: repoRoot}
-	var stdout, stderr bytes.Buffer
-	err := d.runVMRun(
-		context.Background(),
-		"/tmp/bangerd.sock",
-		model.DaemonConfig{SSHKeyPath: "/tmp/id_ed25519"},
-		strings.NewReader(""),
-		&stdout, &stderr,
-		api.VMCreateParams{Name: "devbox", NATEnabled: false},
-		&repo,
-		nil,
-		false, false, false, false,
-	)
-	if err != nil {
-		t.Fatalf("runVMRun: %v", err)
-	}
-	// Bootstrap dispatch happens (no mise file gating) but dial still
-	// gets called because the harness pipeline runs.
-	if !dialed {
-		t.Fatal("guestDial should be called for bootstrap dispatch")
-	}
-}
-
-func TestRunVMRunDetachSkipsSshAttach(t *testing.T) {
-	d, _ := runVMRunDepsRunningVM(t)
-	d.guestDial = func(context.Context, string, string) (vmRunGuestClient, error) {
-		return &testVMRunGuestClient{}, nil
-	}
-	sshExecCalls := 0
-	d.sshExec = func(context.Context, io.Reader, io.Writer, io.Writer, []string) error {
-		sshExecCalls++
-		return nil
-	}
-
-	var stdout, stderr bytes.Buffer
-	err := d.runVMRun(
-		context.Background(),
-		"/tmp/bangerd.sock",
-		model.DaemonConfig{SSHKeyPath: "/tmp/id_ed25519"},
-		strings.NewReader(""),
-		&stdout, &stderr,
-		api.VMCreateParams{Name: "devbox"},
-		nil, // bare mode
-		nil, // no command
-		false, true, false, false, // detach = true
-	)
-	if err != nil {
-		t.Fatalf("runVMRun: %v", err)
-	}
-	if sshExecCalls != 0 {
-		t.Fatalf("sshExec called %d times, want 0 in detach mode", sshExecCalls)
-	}
-	if !strings.Contains(stderr.String(), "reconnect with: banger vm ssh devbox") {
-		t.Fatalf("stderr = %q, want reconnect hint", stderr.String())
-	}
-}
-
-func TestRunVMRunDetachUsesSyncBootstrapPath(t *testing.T) {
-	repoRoot := t.TempDir()
-
-	d, _ := runVMRunDepsRunningVM(t)
-	fakeClient := &testVMRunGuestClient{}
-	d.guestDial = func(context.Context, string, string) (vmRunGuestClient, error) {
-		return fakeClient, nil
-	}
-	sshExecCalls := 0
-	d.sshExec = func(context.Context, io.Reader, io.Writer, io.Writer, []string) error {
-		sshExecCalls++
-		return nil
-	}
-
-	repo := vmRunRepo{sourcePath: repoRoot}
-	var stdout, stderr bytes.Buffer
-	err := d.runVMRun(
-		context.Background(),
-		"/tmp/bangerd.sock",
-		model.DaemonConfig{SSHKeyPath: "/tmp/id_ed25519"},
-		strings.NewReader(""),
-		&stdout, &stderr,
-		api.VMCreateParams{Name: "devbox", NATEnabled: true},
-		&repo,
-		nil,
-		false, true, false, false, // detach = true
-	)
-	if err != nil {
-		t.Fatalf("runVMRun: %v", err)
-	}
-	if sshExecCalls != 0 {
-		t.Fatalf("sshExec called %d times, want 0 in detach mode", sshExecCalls)
-	}
-	if len(fakeClient.uploads) != 1 {
-		t.Fatalf("uploads = %d, want 1 (harness upload)", len(fakeClient.uploads))
-	}
-	// Sync mode should invoke the tee'd wrapper, not the nohup launcher.
-	if strings.Contains(fakeClient.launchScript, "nohup") {
-		t.Fatalf("detach mode should not use nohup launcher; got: %q", fakeClient.launchScript)
-	}
-	if !strings.Contains(fakeClient.launchScript, "tee") {
-		t.Fatalf("detach mode should tee output to log; got: %q", fakeClient.launchScript)
-	}
-}
diff --git a/internal/daemon/capabilities.go b/internal/daemon/capabilities.go
index b99ba4a..89fa5e9 100644
--- a/internal/daemon/capabilities.go
+++ b/internal/daemon/capabilities.go
@@ -247,9 +247,6 @@ func (c workDiskCapability) PrepareHost(ctx context.Context, vm *model.VMRecord,
 	if err := c.ws.ensureAuthorizedKeyOnWorkDisk(ctx, vm, image, prep); err != nil {
 		return err
 	}
-	if err := c.ws.ensureHushLoginOnWorkDisk(ctx, vm); err != nil {
-		return err
-	}
 	if err := c.ws.ensureGitIdentityOnWorkDisk(ctx, vm); err != nil {
 		return err
 	}
diff --git a/internal/daemon/fcproc/fcproc.go b/internal/daemon/fcproc/fcproc.go
index fd23402..1d3eaac 100644
--- a/internal/daemon/fcproc/fcproc.go
+++ b/internal/daemon/fcproc/fcproc.go
@@ -25,16 +25,6 @@ import (
 	"banger/internal/system"
 )
 
-// errFirecrackerPIDNotFound is returned by findByJailerPidfile when the
-// pidfile is missing, unreadable, or doesn't point at a live firecracker
-// process. Surfaces to callers as a "this VM isn't running" signal, not
-// as a hard failure.
-var errFirecrackerPIDNotFound = errors.New("firecracker pid not found")
-
-// procDir is the kernel's per-process inspection directory. Var so tests
-// can swap in a fake /proc-shaped fixture in t.TempDir().
-var procDir = "/proc"
-
 // ErrWaitForExitTimeout is returned by WaitForExit when the deadline passes
 // before the process exits. Callers use errors.Is to detect it.
 var ErrWaitForExitTimeout = errors.New("timed out waiting for VM to exit")
@@ -266,35 +256,9 @@ func chownChmodNoFollow(ctx context.Context, runner Runner, path string, uid, gi
 	return nil
 }
 
-// FindPID returns the PID of the firecracker process backing apiSock.
-//
-// Two strategies, tried in order:
-//
-//  1. pgrep -n -f apiSock — cheap, works for direct (non-jailer) launches
-//     because the host-side socket path appears verbatim in firecracker's
-//     cmdline.
-//  2. Jailer pidfile — for jailer'd firecrackers, pgrep can't match
-//     because the cmdline only carries the chroot-relative
-//     `--api-sock /firecracker.socket`. Jailer (v1.x) writes the
-//     post-exec firecracker PID to `/firecracker.pid` by default.
-//     Read it; verify the PID is alive and its comm is `firecracker`.
-//     Caller must run with read access to the pidfile (root in the
-//     system-mode helper; daemon UID in dev mode where banger doesn't
-//     drop privs).
-//
-// This is what makes post-restart reconcile re-attach to surviving
-// guests instead of mistaking them for stale.
+// FindPID returns the PID of the firecracker process listening on apiSock,
+// located via pgrep.
 func (m *Manager) FindPID(ctx context.Context, apiSock string) (int, error) {
-	if pid, err := m.findPIDByPgrep(ctx, apiSock); err == nil && pid > 0 {
-		return pid, nil
-	}
-	if pid, err := findByJailerPidfile(apiSock); err == nil && pid > 0 {
-		return pid, nil
-	}
-	return 0, errFirecrackerPIDNotFound
-}
-
-func (m *Manager) findPIDByPgrep(ctx context.Context, apiSock string) (int, error) {
 	out, err := m.runner.Run(ctx, "pgrep", "-n", "-f", apiSock)
 	if err != nil {
 		return 0, err
@@ -302,43 +266,6 @@ func (m *Manager) findPIDByPgrep(ctx context.Context, apiSock string) (int, erro
 	return strconv.Atoi(strings.TrimSpace(string(out)))
 }
 
-// findByJailerPidfile reads the jailer-written pidfile that lives at
-// `/firecracker.pid` (sibling of the api socket inside the
-// chroot), verifies the PID is alive and its /proc//comm is
-// `firecracker`, and returns it.
-//
-// Returns errFirecrackerPIDNotFound when the api-sock isn't a symlink
-// (direct launch — pidfile shape doesn't apply), the pidfile is
-// missing or unreadable (VM stopped, or caller lacks privileges),
-// the pidfile content is garbage, or the PID points at a process
-// that's gone or never was firecracker.
-func findByJailerPidfile(apiSock string) (int, error) {
-	target, err := os.Readlink(apiSock)
-	if err != nil {
-		return 0, errFirecrackerPIDNotFound
-	}
-	if !filepath.IsAbs(target) {
-		target = filepath.Join(filepath.Dir(apiSock), target)
-	}
-	pidPath := filepath.Join(filepath.Dir(target), "firecracker.pid")
-	pidBytes, err := os.ReadFile(pidPath)
-	if err != nil {
-		return 0, errFirecrackerPIDNotFound
-	}
-	pid, err := strconv.Atoi(strings.TrimSpace(string(pidBytes)))
-	if err != nil || pid <= 0 {
-		return 0, errFirecrackerPIDNotFound
-	}
-	commBytes, err := os.ReadFile(filepath.Join(procDir, strconv.Itoa(pid), "comm"))
-	if err != nil {
-		return 0, errFirecrackerPIDNotFound
-	}
-	if strings.TrimSpace(string(commBytes)) != "firecracker" {
-		return 0, errFirecrackerPIDNotFound
-	}
-	return pid, nil
-}
-
 // ResolvePID prefers pgrep and falls back to the firecracker machine PID.
 // Returns 0 if neither source yields a PID.
 func (m *Manager) ResolvePID(ctx context.Context, machine *firecracker.Machine, apiSock string) int {
diff --git a/internal/daemon/fcproc/findpid_jailer_test.go b/internal/daemon/fcproc/findpid_jailer_test.go
deleted file mode 100644
index ae89deb..0000000
--- a/internal/daemon/fcproc/findpid_jailer_test.go
+++ /dev/null
@@ -1,173 +0,0 @@
-package fcproc
-
-import (
-	"errors"
-	"fmt"
-	"os"
-	"path/filepath"
-	"testing"
-)
-
-// pidfileFixture builds the on-disk shape findByJailerPidfile inspects:
-// a /proc-like tree (one entry per pid with comm), an api-sock symlink
-// pointing into a faux chroot, and the chroot's firecracker.pid file.
-type pidfileFixture struct {
-	root    string
-	proc    string
-	runtime string
-	chroots string
-}
-
-func newPidfileFixture(t *testing.T) *pidfileFixture {
-	t.Helper()
-	root := t.TempDir()
-	f := &pidfileFixture{
-		root:    root,
-		proc:    filepath.Join(root, "proc"),
-		runtime: filepath.Join(root, "runtime"),
-		chroots: filepath.Join(root, "chroots"),
-	}
-	for _, dir := range []string{f.proc, f.runtime, f.chroots} {
-		if err := os.MkdirAll(dir, 0o755); err != nil {
-			t.Fatalf("mkdir %s: %v", dir, err)
-		}
-	}
-	prev := procDir
-	procDir = f.proc
-	t.Cleanup(func() { procDir = prev })
-	return f
-}
-
-// addProc writes /proc//comm. Mirrors the real /proc shape (comm
-// has a trailing newline; production code TrimSpaces it).
-func (f *pidfileFixture) addProc(t *testing.T, pid int, comm string) {
-	t.Helper()
-	pidDir := filepath.Join(f.proc, fmt.Sprint(pid))
-	if err := os.MkdirAll(pidDir, 0o755); err != nil {
-		t.Fatalf("mkdir %s: %v", pidDir, err)
-	}
-	if err := os.WriteFile(filepath.Join(pidDir, "comm"), []byte(comm+"\n"), 0o644); err != nil {
-		t.Fatalf("write comm: %v", err)
-	}
-}
-
-// buildVMSocket lays out the chroot for a VM and returns the api-sock
-// path the test points findByJailerPidfile at. pidfileContent is what
-// `cat /firecracker.pid` will return — pass an empty string to
-// skip writing the pidfile.
-func (f *pidfileFixture) buildVMSocket(t *testing.T, vmid, pidfileContent string) (apiSock string) {
-	t.Helper()
-	chroot := filepath.Join(f.chroots, vmid, "root")
-	if err := os.MkdirAll(chroot, 0o755); err != nil {
-		t.Fatalf("mkdir chroot: %v", err)
-	}
-	socketTarget := filepath.Join(chroot, "firecracker.socket")
-	if err := os.WriteFile(socketTarget, nil, 0o600); err != nil {
-		t.Fatalf("write socket placeholder: %v", err)
-	}
-	if pidfileContent != "" {
-		if err := os.WriteFile(filepath.Join(chroot, "firecracker.pid"), []byte(pidfileContent), 0o600); err != nil {
-			t.Fatalf("write pidfile: %v", err)
-		}
-	}
-	apiSock = filepath.Join(f.runtime, "fc-"+vmid+".sock")
-	if err := os.Symlink(socketTarget, apiSock); err != nil {
-		t.Fatalf("symlink api sock: %v", err)
-	}
-	return apiSock
-}
-
-func TestFindByJailerPidfileHappyPath(t *testing.T) {
-	f := newPidfileFixture(t)
-	apiSock := f.buildVMSocket(t, "abc", "100\n")
-	f.addProc(t, 100, "firecracker")
-
-	got, err := findByJailerPidfile(apiSock)
-	if err != nil {
-		t.Fatalf("unexpected error: %v", err)
-	}
-	if got != 100 {
-		t.Fatalf("pid = %d, want 100", got)
-	}
-}
-
-func TestFindByJailerPidfileMissingPidfile(t *testing.T) {
-	f := newPidfileFixture(t)
-	// VM exists in the chroot layout but no pidfile (e.g. VM was created
-	// but never started, or stopped and pidfile cleared).
-	apiSock := f.buildVMSocket(t, "abc", "")
-
-	_, err := findByJailerPidfile(apiSock)
-	if !errors.Is(err, errFirecrackerPIDNotFound) {
-		t.Fatalf("err = %v, want errFirecrackerPIDNotFound", err)
-	}
-}
-
-func TestFindByJailerPidfileStalePID(t *testing.T) {
-	f := newPidfileFixture(t)
-	// Pidfile points at a PID with no /proc entry — the FC died but the
-	// pidfile was left behind. Reconcile must treat this as "not running"
-	// so the rediscoverHandles path can mark the VM stopped cleanly.
-	apiSock := f.buildVMSocket(t, "abc", "100\n")
-	// Deliberately don't addProc(100, ...).
-
-	_, err := findByJailerPidfile(apiSock)
-	if !errors.Is(err, errFirecrackerPIDNotFound) {
-		t.Fatalf("err = %v, want errFirecrackerPIDNotFound", err)
-	}
-}
-
-func TestFindByJailerPidfileWrongComm(t *testing.T) {
-	f := newPidfileFixture(t)
-	// PID was recycled by the kernel and now belongs to some other
-	// process. The comm check is what catches this — pidfile content is
-	// untrusted across reboots / PID-wraparound.
-	apiSock := f.buildVMSocket(t, "abc", "100\n")
-	f.addProc(t, 100, "bash")
-
-	_, err := findByJailerPidfile(apiSock)
-	if !errors.Is(err, errFirecrackerPIDNotFound) {
-		t.Fatalf("err = %v, want errFirecrackerPIDNotFound", err)
-	}
-}
-
-func TestFindByJailerPidfileGarbageContent(t *testing.T) {
-	f := newPidfileFixture(t)
-	apiSock := f.buildVMSocket(t, "abc", "not-a-pid\n")
-
-	_, err := findByJailerPidfile(apiSock)
-	if !errors.Is(err, errFirecrackerPIDNotFound) {
-		t.Fatalf("err = %v, want errFirecrackerPIDNotFound", err)
-	}
-}
-
-func TestFindByJailerPidfileNonSymlinkApiSock(t *testing.T) {
-	f := newPidfileFixture(t)
-	// Direct (non-jailer) launches produce a regular-file api sock with
-	// no chroot beside it. Pidfile lookup can't help; fall through cleanly.
-	apiSock := filepath.Join(f.runtime, "direct-launch.sock")
-	if err := os.WriteFile(apiSock, nil, 0o600); err != nil {
-		t.Fatalf("write apiSock: %v", err)
-	}
-
-	_, err := findByJailerPidfile(apiSock)
-	if !errors.Is(err, errFirecrackerPIDNotFound) {
-		t.Fatalf("err = %v, want errFirecrackerPIDNotFound", err)
-	}
-}
-
-func TestFindByJailerPidfileTrimsWhitespace(t *testing.T) {
-	f := newPidfileFixture(t)
-	// Some FC versions write the pidfile with stray whitespace; the
-	// parser must tolerate it.
-	apiSock := f.buildVMSocket(t, "abc", " 100 \n\n")
-	f.addProc(t, 100, "firecracker")
-
-	got, err := findByJailerPidfile(apiSock)
-	if err != nil {
-		t.Fatalf("unexpected error: %v", err)
-	}
-	if got != 100 {
-		t.Fatalf("pid = %d, want 100", got)
-	}
-}
diff --git a/internal/daemon/sshd_config_test.go b/internal/daemon/sshd_config_test.go
index 46cae4a..5b89e2f 100644
--- a/internal/daemon/sshd_config_test.go
+++ b/internal/daemon/sshd_config_test.go
@@ -20,11 +20,6 @@ func TestSshdGuestConfig_Hardened(t *testing.T) {
 		"PasswordAuthentication no",
 		"KbdInteractiveAuthentication no",
 		"AuthorizedKeysFile /root/.ssh/authorized_keys",
-		// Quiet-login: short-lived sandboxes don't need the Debian
-		// MOTD or the "Last login" line. .hushlogin in /root covers
-		// pam_motd; these two cover sshd's own paths.
-		"PrintMotd no",
-		"PrintLastLog no",
 	}
 	for _, line := range mustContain {
 		if !strings.Contains(cfg, line) {
diff --git a/internal/daemon/tap_pool.go b/internal/daemon/tap_pool.go
index d91debf..c0e5f60 100644
--- a/internal/daemon/tap_pool.go
+++ b/internal/daemon/tap_pool.go
@@ -6,7 +6,6 @@ import (
 	"strconv"
 	"strings"
 	"sync"
-	"sync/atomic"
 )
 
 const tapPoolPrefix = "tap-pool-"
@@ -17,16 +16,8 @@ type tapPool struct {
 	mu      sync.Mutex
 	entries []string
 	next    int
-	warming bool
 }
 
-// maxConcurrentTapWarmup caps the number of `priv.create_tap` RPCs the
-// warmup loop runs in parallel. Each tap creation is ~4 root-helper
-// shell-outs serialized within one RPC handler; running too many at
-// once just contends on netlink. 8 is the production sweet spot for
-// SMOKE_JOBS=8.
-const maxConcurrentTapWarmup = 8
-
 // initializeTapPool seeds the monotonic pool index from the set of
 // tap names already in use by running/stopped VMs, so newly warmed
 // pool entries don't collide with existing ones. Callers (Daemon.Open)
@@ -50,23 +41,6 @@ func (n *HostNetwork) ensureTapPool(ctx context.Context) {
 	if n.config.TapPoolSize <= 0 {
 		return
 	}
-
-	// Dedupe concurrent warmup invocations. Releases trigger a fresh
-	// ensureTapPool in a goroutine; without this, N parallel releases
-	// would each spin up their own warmup loop racing on n.tapPool.next.
-	n.tapPool.mu.Lock()
-	if n.tapPool.warming {
-		n.tapPool.mu.Unlock()
-		return
-	}
-	n.tapPool.warming = true
-	n.tapPool.mu.Unlock()
-	defer func() {
-		n.tapPool.mu.Lock()
-		n.tapPool.warming = false
-		n.tapPool.mu.Unlock()
-	}()
-
 	for {
 		select {
 		case <-ctx.Done():
@@ -77,54 +51,28 @@ func (n *HostNetwork) ensureTapPool(ctx context.Context) {
 		}
 
 		n.tapPool.mu.Lock()
-		deficit := n.config.TapPoolSize - len(n.tapPool.entries)
-		if deficit <= 0 {
+		if len(n.tapPool.entries) >= n.config.TapPoolSize {
 			n.tapPool.mu.Unlock()
 			return
 		}
-		batch := deficit
-		if batch > maxConcurrentTapWarmup {
-			batch = maxConcurrentTapWarmup
-		}
-		// Reserve names up front so concurrent goroutines can't collide
-		// on n.tapPool.next.
- names := make([]string, batch) - for i := range names { - names[i] = fmt.Sprintf("%s%d", tapPoolPrefix, n.tapPool.next) - n.tapPool.next++ - } + tapName := fmt.Sprintf("%s%d", tapPoolPrefix, n.tapPool.next) + n.tapPool.next++ n.tapPool.mu.Unlock() - var ( - wg sync.WaitGroup - progress atomic.Int32 - ) - for _, tapName := range names { - wg.Add(1) - go func(tapName string) { - defer wg.Done() - if err := n.createTap(ctx, tapName); err != nil { - if n.logger != nil { - n.logger.Warn("tap pool warmup failed", "tap_device", tapName, "error", err.Error()) - } - return - } - n.tapPool.mu.Lock() - n.tapPool.entries = append(n.tapPool.entries, tapName) - n.tapPool.mu.Unlock() - progress.Add(1) - if n.logger != nil { - n.logger.Debug("tap added to idle pool", "tap_device", tapName) - } - }(tapName) - } - wg.Wait() - - // Whole batch failed → bail rather than burn names indefinitely - // (the original sequential loop bailed on first error too). - if progress.Load() == 0 { + if err := n.createTap(ctx, tapName); err != nil { + if n.logger != nil { + n.logger.Warn("tap pool warmup failed", "tap_device", tapName, "error", err.Error()) + } return } + + n.tapPool.mu.Lock() + n.tapPool.entries = append(n.tapPool.entries, tapName) + n.tapPool.mu.Unlock() + + if n.logger != nil { + n.logger.Debug("tap added to idle pool", "tap_device", tapName) + } } } diff --git a/internal/daemon/vm_authsync.go b/internal/daemon/vm_authsync.go index 117014a..b4feaaa 100644 --- a/internal/daemon/vm_authsync.go +++ b/internal/daemon/vm_authsync.go @@ -86,15 +86,6 @@ func provisionAuthorizedKey(ctx context.Context, runner system.CommandRunner, im return system.WriteExt4FileOwned(ctx, runner, imagePath, "/.ssh/authorized_keys", 0o600, 0, 0, merged) } -// ensureHushLoginOnWorkDisk lands /root/.hushlogin in the guest by -// writing /.hushlogin at the root of the work disk (which mounts at -// /root inside the guest). 
pam_motd checks $HOME/.hushlogin and stays -// silent when it exists — combined with sshd's PrintMotd no / PrintLastLog no -// that suppresses the Debian-style banner on `banger vm run`. -func (s *WorkspaceService) ensureHushLoginOnWorkDisk(ctx context.Context, vm *model.VMRecord) error { - return system.WriteExt4FileOwned(ctx, s.runner, vm.Runtime.WorkDiskPath, "/.hushlogin", 0o644, 0, 0, nil) -} - func (s *WorkspaceService) ensureGitIdentityOnWorkDisk(ctx context.Context, vm *model.VMRecord) error { runner := s.runner if runner == nil { diff --git a/internal/daemon/vm_disk.go b/internal/daemon/vm_disk.go index fe5db6d..5d689f5 100644 --- a/internal/daemon/vm_disk.go +++ b/internal/daemon/vm_disk.go @@ -159,16 +159,6 @@ func (s *VMService) ensureWorkDisk(ctx context.Context, vm *model.VMRecord, imag // Pins the lookup path so the banger-written file always wins, // regardless of distro default ($HOME/.ssh/authorized_keys) and // regardless of any per-image weirdness. -// -// - PrintMotd no / PrintLastLog no -// Banger VMs are short-lived sandboxes. The Debian-style MOTD -// ("Linux ... GNU/Linux comes with ABSOLUTELY NO WARRANTY …") and -// the "Last login" line are pure noise for `vm run -- echo hi` -// style invocations. Pair this with the .hushlogin landed on the -// work disk (see ensureHushLoginOnWorkDisk) so pam_motd also stays -// silent on distros that read /etc/motd through PAM rather than -// sshd. The work disk mounts at /root, so the file has to live on -// that disk — a write to the rootfs overlay would be shadowed. 
func sshdGuestConfig() string { return strings.Join([]string{ "PermitRootLogin prohibit-password", @@ -176,8 +166,6 @@ func sshdGuestConfig() string { "PasswordAuthentication no", "KbdInteractiveAuthentication no", "AuthorizedKeysFile /root/.ssh/authorized_keys", - "PrintMotd no", - "PrintLastLog no", "", }, "\n") } diff --git a/internal/daemon/vm_lifecycle.go b/internal/daemon/vm_lifecycle.go index ca0aad7..e759bc6 100644 --- a/internal/daemon/vm_lifecycle.go +++ b/internal/daemon/vm_lifecycle.go @@ -131,27 +131,44 @@ func (s *VMService) stopVMLocked(ctx context.Context, current model.VMRecord) (v } return vm, nil } + pid := s.vmHandles(vm.ID).PID op.stage("graceful_shutdown") - // Reach into the guest over SSH to force a sync + queue a poweroff. - // The sync is what keeps stop() from losing data: every dirty page - // the guest hasn't flushed through virtio-blk to the work disk is - // written out before this RPC returns. Once sync completes, - // root.ext4 on the host is consistent and cleanupRuntime's SIGKILL - // is safe — there is no benefit to waiting for the guest's - // poweroff.target to finish, so we skip waitForExit entirely. + // Reach into the guest over SSH to force a sync + queue a poweroff + // before falling back on FC's SendCtrlAltDel. The sync is what + // keeps stop() from losing data: every dirty page the guest hasn't + // flushed through virtio-blk to the work disk is written out + // before this RPC returns. Without it, files freshly created via + // `vm workspace prepare` can disappear across stop+start, because + // the 10-second wait_for_exit window expires (FC doesn't exit on + // SendCtrlAltDel — Debian routes ctrl-alt-del.target → reboot.target, + // not poweroff) and the fallback SIGKILL drops everything still + // in FC's userspace I/O path. // - // When SSH is unreachable (broken sshd, network down, drifted host - // key) we drop straight to SIGKILL via cleanupRuntime. 
The - // previous fallback was SendCtrlAltDel + a 10-second wait for FC - // to exit, but on Debian ctrl+alt+del routes to reboot.target, so - // FC never exits on it — the wait was always a wasted 10s. We pay - // the data-loss cost we already paid before (after the timeout - // expired the old code SIGKILLed too), but without the latency. + // `systemctl --no-block poweroff` is queued for the same reason + // SendCtrlAltDel was here originally — it's how stop() asks the + // guest to halt. That request is best-effort; FC may or may not + // exit before the SIGKILL fallback fires. Either way, sync + // already ran, so the on-host root.ext4 is consistent regardless. + // + // SendCtrlAltDel survives as a fallback for guests where SSH + // itself is unreachable (broken sshd, network down, drifted host + // key); it doesn't fix the data-loss path, but it's the existing + // last-resort signal and is at least no worse than today. if err := s.requestGuestPoweroff(ctx, vm); err != nil { if s.logger != nil { - s.logger.Warn("guest ssh poweroff failed; SIGKILL without sync", + s.logger.Warn("guest ssh poweroff failed; falling back to ctrl+alt+del", append(vmLogAttrs(vm), "error", err.Error())...) } + if fallbackErr := s.net.sendCtrlAltDel(ctx, vm.Runtime.APISockPath); fallbackErr != nil { + return model.VMRecord{}, fallbackErr + } + } + op.stage("wait_for_exit", "pid", pid) + if err := s.net.waitForExit(ctx, pid, vm.Runtime.APISockPath, gracefulShutdownWait); err != nil { + if !errors.Is(err, errWaitForExitTimeout) { + return model.VMRecord{}, err + } + op.stage("graceful_shutdown_timeout", "pid", pid) } op.stage("cleanup_runtime") if err := s.cleanupRuntime(ctx, vm, true); err != nil { @@ -173,16 +190,16 @@ func (s *VMService) stopVMLocked(ctx context.Context, current model.VMRecord) (v // comment in stopVMLocked. Returns the dial / SSH error if the guest // is unreachable; the caller treats that as a fallback signal. // -// Bounded by a hard 2-second SSH-dial timeout. 
A reachable guest on -// the host bridge dials in single-digit milliseconds; if we haven't -// connected in 2s the guest is effectively gone, so we fail fast and -// let the caller SIGKILL rather than burning latency on a doomed dial. +// Bounded by a hard 5-second SSH-dial timeout so a half-broken guest +// doesn't extend the overall stop window past the existing +// gracefulShutdownWait. If the dial doesn't succeed in that window we +// surface an error and let the caller take the SendCtrlAltDel path. func (s *VMService) requestGuestPoweroff(ctx context.Context, vm model.VMRecord) error { guestIP := strings.TrimSpace(vm.Runtime.GuestIP) if guestIP == "" { return errors.New("guest IP unknown") } - dialCtx, cancel := context.WithTimeout(ctx, 2*time.Second) + dialCtx, cancel := context.WithTimeout(ctx, 5*time.Second) defer cancel() address := net.JoinHostPort(guestIP, "22") client, err := guest.Dial(dialCtx, address, s.config.SSHKeyPath, s.layout.KnownHostsPath) diff --git a/internal/daemon/vm_test.go b/internal/daemon/vm_test.go index a747104..131c55f 100644 --- a/internal/daemon/vm_test.go +++ b/internal/daemon/vm_test.go @@ -1592,7 +1592,7 @@ func TestDeleteStoppedNATVMDoesNotFailWithoutTapDevice(t *testing.T) { } } -func TestStopVMSIGKILLsWhenSSHUnreachable(t *testing.T) { +func TestStopVMFallsBackToForcedCleanupAfterGracefulTimeout(t *testing.T) { ctx := context.Background() db := openDaemonStore(t) apiSock := filepath.Join(t.TempDir(), "fc.sock") @@ -1606,6 +1606,12 @@ func TestStopVMSIGKILLsWhenSSHUnreachable(t *testing.T) { } }) + oldGracefulWait := gracefulShutdownWait + gracefulShutdownWait = 50 * time.Millisecond + t.Cleanup(func() { + gracefulShutdownWait = oldGracefulWait + }) + vm := testVM("stubborn", "image-stubborn", "172.16.0.23") vm.State = model.VMStateRunning vm.Runtime.State = model.VMStateRunning @@ -1616,6 +1622,8 @@ func TestStopVMSIGKILLsWhenSSHUnreachable(t *testing.T) { scriptedRunner: &scriptedRunner{ t: t, steps: []runnerStep{ + 
sudoStep("", nil, "chmod", "600", apiSock), + sudoStep("", nil, "chown", "-h", fmt.Sprintf("%d:%d", os.Getuid(), os.Getgid()), apiSock), {call: runnerCall{name: "pgrep", args: []string{"-n", "-f", apiSock}}, out: []byte(strconv.Itoa(fake.Process.Pid) + "\n")}, sudoStep("", nil, "kill", "-KILL", strconv.Itoa(fake.Process.Pid)), }, diff --git a/internal/firecracker/client.go b/internal/firecracker/client.go index 3a96acf..93a346a 100644 --- a/internal/firecracker/client.go +++ b/internal/firecracker/client.go @@ -196,15 +196,6 @@ func buildConfig(cfg MachineConfig) sdk.Config { Smt: sdk.Bool(false), }, VMID: cfg.VMID, - // Disable the SDK's signal-forwarding goroutine. Default - // (nil) makes the SDK install a handler that catches - // SIGTERM/SIGINT/SIGHUP/SIGQUIT/SIGABRT in the parent process - // and forwards them to the firecracker child — which means - // `systemctl stop bangerd-root.service` (sends SIGTERM to the - // helper) ends up signaling every firecracker the helper has - // launched, killing every running VM. Empty slice (not nil) - // short-circuits setupSignals at len()==0. - ForwardSignals: []os.Signal{}, } if cfg.Jailer != nil { // The path fields above are already chroot-translated by the @@ -286,10 +277,9 @@ func buildProcessRunner(cfg MachineConfig, logFile *os.File) *exec.Cmd { args = []string{"--api-sock", cfg.SocketPath, "--id", cfg.VMID} } var cmd *exec.Cmd - switch { - case os.Geteuid() == 0: + if os.Geteuid() == 0 { cmd = exec.Command(bin, args...) - default: + } else { cmd = exec.Command("sudo", append([]string{"-n", "-E", bin}, args...)...) 
} cmd.Stdin = nil diff --git a/internal/installmeta/installmeta_test.go b/internal/installmeta/installmeta_test.go index 1b9044c..3901d88 100644 --- a/internal/installmeta/installmeta_test.go +++ b/internal/installmeta/installmeta_test.go @@ -1,11 +1,7 @@ package installmeta import ( - "errors" - "os" - "os/user" "path/filepath" - "strconv" "testing" "time" ) @@ -35,157 +31,6 @@ func TestSaveLoadRoundTrip(t *testing.T) { } } -func TestSaveCreatesParentDir(t *testing.T) { - path := filepath.Join(t.TempDir(), "nested", "dir", "install.toml") - meta := Metadata{OwnerUser: "dev", OwnerUID: 1, OwnerGID: 1, OwnerHome: "/home/dev"} - if err := Save(path, meta); err != nil { - t.Fatalf("Save: %v", err) - } - if _, err := os.Stat(path); err != nil { - t.Fatalf("file not written: %v", err) - } -} - -func TestSaveRejectsInvalidMetadata(t *testing.T) { - path := filepath.Join(t.TempDir(), "install.toml") - if err := Save(path, Metadata{OwnerUID: 1, OwnerGID: 1, OwnerHome: "/home/dev"}); err == nil { - t.Fatal("Save() = nil, want validation error") - } - if _, err := os.Stat(path); !errors.Is(err, os.ErrNotExist) { - t.Fatalf("Save wrote a file despite validation error: stat err = %v", err) - } -} - -func TestLoadMissingFile(t *testing.T) { - _, err := Load(filepath.Join(t.TempDir(), "missing.toml")) - if !errors.Is(err, os.ErrNotExist) { - t.Fatalf("Load() = %v, want os.ErrNotExist", err) - } -} - -func TestLoadInvalidTOML(t *testing.T) { - path := filepath.Join(t.TempDir(), "install.toml") - if err := os.WriteFile(path, []byte("not = valid = toml\n"), 0o644); err != nil { - t.Fatal(err) - } - if _, err := Load(path); err == nil { - t.Fatal("Load() = nil, want TOML parse error") - } -} - -func TestLoadRejectsInvalidPersistedMetadata(t *testing.T) { - // File parses but fails Validate (no owner_user) — Load must surface - // the validation error rather than returning a zero-value Metadata. 
- path := filepath.Join(t.TempDir(), "install.toml") - if err := os.WriteFile(path, []byte("owner_uid = 1\nowner_gid = 1\nowner_home = \"/home/dev\"\n"), 0o644); err != nil { - t.Fatal(err) - } - if _, err := Load(path); err == nil { - t.Fatal("Load() = nil, want validation error") - } -} - -func TestValidate(t *testing.T) { - tests := []struct { - name string - m Metadata - ok bool - }{ - {"valid", Metadata{OwnerUser: "dev", OwnerUID: 1, OwnerGID: 1, OwnerHome: "/home/dev"}, true}, - {"missing owner_user", Metadata{OwnerUID: 1, OwnerGID: 1, OwnerHome: "/home/dev"}, false}, - {"whitespace owner_user", Metadata{OwnerUser: " ", OwnerUID: 1, OwnerGID: 1, OwnerHome: "/home/dev"}, false}, - {"negative uid", Metadata{OwnerUser: "dev", OwnerUID: -1, OwnerGID: 1, OwnerHome: "/home/dev"}, false}, - {"negative gid", Metadata{OwnerUser: "dev", OwnerUID: 1, OwnerGID: -1, OwnerHome: "/home/dev"}, false}, - {"empty home", Metadata{OwnerUser: "dev", OwnerUID: 1, OwnerGID: 1, OwnerHome: ""}, false}, - {"relative home", Metadata{OwnerUser: "dev", OwnerUID: 1, OwnerGID: 1, OwnerHome: "home/dev"}, false}, - } - for _, tc := range tests { - t.Run(tc.name, func(t *testing.T) { - err := tc.m.Validate() - if tc.ok && err != nil { - t.Fatalf("Validate() = %v, want nil", err) - } - if !tc.ok && err == nil { - t.Fatal("Validate() = nil, want error") - } - }) - } -} - -func TestLookupOwnerEmpty(t *testing.T) { - if _, err := LookupOwner(""); err == nil { - t.Fatal("LookupOwner(\"\") = nil, want error") - } - if _, err := LookupOwner(" "); err == nil { - t.Fatal("LookupOwner(\" \") = nil, want error") - } -} - -func TestLookupOwnerMissing(t *testing.T) { - if _, err := LookupOwner("definitely-no-such-user-banger-test"); err == nil { - t.Fatal("LookupOwner(missing) = nil, want error") - } -} - -func TestLookupOwnerCurrentUser(t *testing.T) { - cur, err := user.Current() - if err != nil { - t.Skipf("user.Current: %v", err) - } - got, err := LookupOwner(cur.Username) - if err != nil { - 
t.Fatalf("LookupOwner(%q): %v", cur.Username, err) - } - wantUID, _ := strconv.Atoi(cur.Uid) - wantGID, _ := strconv.Atoi(cur.Gid) - if got.OwnerUser != cur.Username || got.OwnerUID != wantUID || got.OwnerGID != wantGID || got.OwnerHome != cur.HomeDir { - t.Fatalf("LookupOwner = %+v, want user=%s uid=%d gid=%d home=%s", - got, cur.Username, wantUID, wantGID, cur.HomeDir) - } -} - -func TestUpdateBuildInfo(t *testing.T) { - path := filepath.Join(t.TempDir(), "install.toml") - original := Metadata{ - OwnerUser: "dev", - OwnerUID: 1000, - OwnerGID: 1000, - OwnerHome: "/home/dev", - InstalledAt: time.Unix(1710000000, 0).UTC(), - Version: "v0.1.0", - Commit: "old", - BuiltAt: "2026-01-01T00:00:00Z", - } - if err := Save(path, original); err != nil { - t.Fatalf("Save: %v", err) - } - - if err := UpdateBuildInfo(path, " v0.2.0 ", " new ", " 2026-04-30T00:00:00Z "); err != nil { - t.Fatalf("UpdateBuildInfo: %v", err) - } - - got, err := Load(path) - if err != nil { - t.Fatalf("Load: %v", err) - } - if got.Version != "v0.2.0" || got.Commit != "new" || got.BuiltAt != "2026-04-30T00:00:00Z" { - t.Fatalf("build fields = %q/%q/%q, want trimmed values", got.Version, got.Commit, got.BuiltAt) - } - // Identity must be preserved. 
- if got.OwnerUser != original.OwnerUser || got.OwnerUID != original.OwnerUID || - got.OwnerGID != original.OwnerGID || got.OwnerHome != original.OwnerHome || - !got.InstalledAt.Equal(original.InstalledAt) { - t.Fatalf("identity changed: got %+v, want %+v", got, original) - } -} - -func TestUpdateBuildInfoMissingFile(t *testing.T) { - err := UpdateBuildInfo(filepath.Join(t.TempDir(), "missing.toml"), "v1", "c", "t") - if !errors.Is(err, os.ErrNotExist) { - t.Fatalf("UpdateBuildInfo() = %v, want os.ErrNotExist", err) - } -} - func TestValidateRejectsMissingOwner(t *testing.T) { err := Metadata{OwnerUID: 1000, OwnerGID: 1000, OwnerHome: "/home/dev"}.Validate() if err == nil { diff --git a/internal/roothelper/roothelper.go b/internal/roothelper/roothelper.go index 3aec14e..f164b5d 100644 --- a/internal/roothelper/roothelper.go +++ b/internal/roothelper/roothelper.go @@ -1296,24 +1296,18 @@ func validateIPv4(ip string) error { return nil } -// validateResolverAddr confirms s parses as an IP address, optionally -// with a ":port" suffix. resolvectl accepts both bare IPs and the -// "IP:port" form (used to point at a non-default DNS port — banger's -// in-process server binds to 127.0.0.1:42069). Reject anything that -// doesn't parse so a compromised daemon can't wedge resolved with -// garbage input. +// validateResolverAddr confirms s parses as an IP address (v4 or v6). +// resolvectl accepts either; reject anything that doesn't parse so a +// compromised daemon can't wedge resolved with garbage input. 
func validateResolverAddr(s string) error { s = strings.TrimSpace(s) if s == "" { return errors.New("resolver address is required") } - if net.ParseIP(s) != nil { - return nil + if net.ParseIP(s) == nil { + return fmt.Errorf("invalid resolver address %q", s) } - if host, _, err := net.SplitHostPort(s); err == nil && net.ParseIP(host) != nil { - return nil - } - return fmt.Errorf("invalid resolver address %q", s) + return nil } func validateTapName(tapName string) error { diff --git a/internal/roothelper/roothelper_test.go b/internal/roothelper/roothelper_test.go index 441a1e4..ac698c3 100644 --- a/internal/roothelper/roothelper_test.go +++ b/internal/roothelper/roothelper_test.go @@ -566,11 +566,8 @@ func TestValidateResolverAddr(t *testing.T) { }{ {name: "ipv4", arg: "192.168.1.1", ok: true}, {name: "ipv6", arg: "fe80::1", ok: true}, - {name: "ipv4_with_port", arg: "127.0.0.1:42069", ok: true}, - {name: "ipv6_with_port", arg: "[fe80::1]:42069", ok: true}, {name: "empty", arg: "", ok: false}, {name: "garbage", arg: "resolver.example", ok: false}, - {name: "garbage_with_port", arg: "resolver.example:53", ok: false}, } { tc := tc t.Run(tc.name, func(t *testing.T) { diff --git a/internal/smoketest/doc.go b/internal/smoketest/doc.go deleted file mode 100644 index af7d17e..0000000 --- a/internal/smoketest/doc.go +++ /dev/null @@ -1,24 +0,0 @@ -//go:build smoke - -// Package smoketest is the end-to-end smoke gate for banger's supported -// two-service systemd model. It runs only when the build is tagged -// `smoke`, which keeps it out of `go test ./...` on contributor -// machines and CI. -// -// The suite touches global host state: it installs instrumented -// bangerd.service + bangerd-root.service, drives real Firecracker/KVM -// scenarios, copies covdata back out, then purges the smoke-owned -// install on exit. It refuses to run if a non-smoke install is already -// on the host (see the marker file under /etc/banger). 
-// -// The harness expects three env vars, normally set by `make smoke`: -// -// BANGER_SMOKE_BIN_DIR — instrumented banger / bangerd / vsock-agent -// BANGER_SMOKE_COVER_DIR — coverage output directory (GOCOVERDIR) -// BANGER_SMOKE_XDG_DIR — scratch root for fake homes, fake repos, etc. -// -// Coverage: the test binary itself is not instrumented, but every -// banger / bangerd subprocess it spawns is, and writes covdata into -// BANGER_SMOKE_COVER_DIR. Service-side covdata under /var/lib/banger -// is copied out at teardown. -package smoketest diff --git a/internal/smoketest/fixtures_test.go b/internal/smoketest/fixtures_test.go deleted file mode 100644 index b6e1105..0000000 --- a/internal/smoketest/fixtures_test.go +++ /dev/null @@ -1,50 +0,0 @@ -//go:build smoke - -package smoketest - -import ( - "fmt" - "os" - "os/exec" - "path/filepath" -) - -// setupRepoFixture builds the throwaway git repo at runtimeDir/fake-repo -// that every repodir-class scenario consumes. Mirrors -// scripts/smoke.sh:441-456. The path is stored in the package-level -// repoDir so scenarios can reference it directly. -func setupRepoFixture() error { - repoDir = filepath.Join(runtimeDir, "fake-repo") - if err := os.MkdirAll(repoDir, 0o755); err != nil { - return fmt.Errorf("setupRepoFixture: mkdir %s: %w", repoDir, err) - } - steps := [][]string{ - {"git", "init", "-q", "-b", "main"}, - {"git", "config", "commit.gpgsign", "false"}, - {"git", "config", "user.name", "smoke"}, - {"git", "config", "user.email", "smoke@smoke"}, - } - for _, args := range steps { - cmd := exec.Command(args[0], args[1:]...) 
- cmd.Dir = repoDir - if out, err := cmd.CombinedOutput(); err != nil { - return fmt.Errorf("setupRepoFixture: %s: %w\n%s", args, err, out) - } - } - marker := filepath.Join(repoDir, "smoke-file.txt") - if err := os.WriteFile(marker, []byte("smoke-workspace-marker\n"), 0o644); err != nil { - return fmt.Errorf("setupRepoFixture: write marker: %w", err) - } - commit := [][]string{ - {"git", "add", "."}, - {"git", "commit", "-q", "-m", "init"}, - } - for _, args := range commit { - cmd := exec.Command(args[0], args[1:]...) - cmd.Dir = repoDir - if out, err := cmd.CombinedOutput(); err != nil { - return fmt.Errorf("setupRepoFixture: %s: %w\n%s", args, err, out) - } - } - return nil -} diff --git a/internal/smoketest/helpers_test.go b/internal/smoketest/helpers_test.go deleted file mode 100644 index 4379e73..0000000 --- a/internal/smoketest/helpers_test.go +++ /dev/null @@ -1,201 +0,0 @@ -//go:build smoke - -package smoketest - -import ( - "bytes" - "os" - "os/exec" - "strings" - "testing" - "time" -) - -// result captures the output and exit status of a banger invocation. -// stdout / stderr are kept separate so assertions can target one or the -// other (matches the bash suite's `out=$(cmd)` vs `2>&1` patterns). -type result struct { - stdout string - stderr string - rc int -} - -// runCmd executes the given exec.Cmd, capturing stdout and stderr into -// the returned result. Non-zero exits are returned as a non-zero rc, not -// as an error — scenarios decide for themselves whether non-zero is a -// failure or the assertion under test. 
-func runCmd(t *testing.T, cmd *exec.Cmd) result { - t.Helper() - var outBuf, errBuf bytes.Buffer - cmd.Stdout = &outBuf - cmd.Stderr = &errBuf - err := cmd.Run() - res := result{stdout: outBuf.String(), stderr: errBuf.String()} - if err != nil { - if exitErr, ok := err.(*exec.ExitError); ok { - res.rc = exitErr.ExitCode() - } else { - t.Fatalf("exec %s: %v\nstderr: %s", strings.Join(cmd.Args, " "), err, res.stderr) - } - } - return res -} - -// banger runs the instrumented `banger` binary with the given arguments -// and returns the captured result. GOCOVERDIR is inherited from the -// process environment (TestMain exports it), so child covdata lands -// under BANGER_SMOKE_COVER_DIR automatically. -func banger(t *testing.T, args ...string) result { - t.Helper() - return runCmd(t, exec.Command(bangerBin, args...)) -} - -// mustBanger runs `banger` and Fatals if it exits non-zero. Returns the -// captured stdout for downstream `wantContains`. Most happy-path -// scenarios use this; scenarios that assert on non-zero exits use -// banger() directly. -func mustBanger(t *testing.T, args ...string) string { - t.Helper() - res := banger(t, args...) - if res.rc != 0 { - t.Fatalf("banger %s: exit %d\nstdout: %s\nstderr: %s", - strings.Join(args, " "), res.rc, res.stdout, res.stderr) - } - return res.stdout -} - -// sudoBanger runs `banger` under `sudo env GOCOVERDIR=...`. Sudo strips -// the env by default; explicit re-export keeps coverage flowing for -// scenarios that exercise the privileged path (system install / restart -// / update / daemon stop). -func sudoBanger(t *testing.T, args ...string) result { - t.Helper() - full := append([]string{"env", "GOCOVERDIR=" + coverDir, bangerBin}, args...) - return runCmd(t, exec.Command("sudo", full...)) -} - -// wantContains asserts that haystack contains needle. label is a short -// human-readable identifier for the failure message. 
-func wantContains(t *testing.T, haystack, needle, label string) { - t.Helper() - if !strings.Contains(haystack, needle) { - t.Fatalf("%s missing %q\ngot: %s", label, needle, haystack) - } -} - -// wantNotContains is the negative-assertion counterpart. Used by -// scenarios that verify a warning has been suppressed (e.g. the post- -// auto-prepare clean-state check in vm_exec) or that an export patch -// did NOT capture a guest-side commit. -func wantNotContains(t *testing.T, haystack, needle, label string) { - t.Helper() - if strings.Contains(haystack, needle) { - t.Fatalf("%s unexpectedly contains %q\ngot: %s", label, needle, haystack) - } -} - -// wantExit asserts the captured result exited with want. Used for -// scenarios that test exit-code propagation or refusal paths. -func wantExit(t *testing.T, got result, want int, label string) { - t.Helper() - if got.rc != want { - t.Fatalf("%s: exit %d, want %d\nstdout: %s\nstderr: %s", - label, got.rc, want, got.stdout, got.stderr) - } -} - -// vmDelete removes a VM, ignoring failure. Used in t.Cleanup hooks -// where the VM may already be gone (deleted by the scenario itself). -func vmDelete(name string) { - cmd := exec.Command(bangerBin, "vm", "delete", name) - _ = cmd.Run() -} - -// vmCreate creates a VM with the given name and registers a cleanup -// hook to delete it. extraArgs is forwarded after `vm create --name X` -// so callers can pass --vcpu N / --nat / --no-start / etc. Fatals if -// creation fails — every scenario that uses vmCreate needs the VM up. -func vmCreate(t *testing.T, name string, extraArgs ...string) { - t.Helper() - args := append([]string{"vm", "create", "--name", name}, extraArgs...) - mustBanger(t, args...) - t.Cleanup(func() { vmDelete(name) }) -} - -// bangerHome runs `banger` with HOME overridden to the given directory. -// Used by ssh-config scenarios that mutate ~/.ssh/config under a fake -// home so the test doesn't touch the contributor's real config. 
-func bangerHome(t *testing.T, home string, args ...string) result { - t.Helper() - cmd := exec.Command(bangerBin, args...) - cmd.Env = append(os.Environ(), "HOME="+home) - return runCmd(t, cmd) -} - -// mustBangerHome is bangerHome + Fatal-on-non-zero. Returns stdout. -func mustBangerHome(t *testing.T, home string, args ...string) string { - t.Helper() - res := bangerHome(t, home, args...) - if res.rc != 0 { - t.Fatalf("banger %s (HOME=%s): exit %d\nstdout: %s\nstderr: %s", - strings.Join(args, " "), home, res.rc, res.stdout, res.stderr) - } - return res.stdout -} - -// waitForSSH polls `banger vm ssh -- true` until SSH answers, -// up to 120 seconds. The original bash suite used 60s and occasionally -// flaked under load (post-update VM, large parallel pool); 120s gives -// enough headroom for the post-update / post-rollback paths where the -// daemon has just restarted, without making genuine breakage slow to -// surface. -func waitForSSH(t *testing.T, name string) { - t.Helper() - const timeout = 120 * time.Second - deadline := time.Now().Add(timeout) - for time.Now().Before(deadline) { - cmd := exec.Command(bangerBin, "vm", "ssh", name, "--", "true") - if err := cmd.Run(); err == nil { - return - } - time.Sleep(1 * time.Second) - } - t.Fatalf("vm %q ssh did not come up within %s", name, timeout) -} - -// requirePasswordlessSudo skips the test if `sudo -n true` cannot run. -// Mirrors the bash `if ! sudo -n true 2>/dev/null; then return 0; fi` -// pattern used by scenarios that exercise privileged paths. -func requirePasswordlessSudo(t *testing.T) { - t.Helper() - if err := exec.Command("sudo", "-n", "true").Run(); err != nil { - t.Skip("passwordless sudo unavailable") - } -} - -// requireSudoIptables skips the test if iptables can't be queried under -// `sudo -n`. Used by the NAT scenario whose assertions read POSTROUTING. 
-func requireSudoIptables(t *testing.T) { - t.Helper() - if err := exec.Command("sudo", "-n", "iptables", "-t", "nat", "-S", "POSTROUTING").Run(); err != nil { - t.Skip("passwordless sudo iptables unavailable") - } -} - -// installedVersion reads `/usr/local/bin/banger --version` and returns -// the version token. This is the *installed* binary that `banger update` -// swaps out — the smoke CLI under $BANGER_SMOKE_BIN_DIR is separate -// (and unaffected by update). Mirrors the bash `installed_version` -// helper at scripts/smoke.sh:1156-1162. -func installedVersion(t *testing.T) string { - t.Helper() - out, err := exec.Command("/usr/local/bin/banger", "--version").Output() - if err != nil { - t.Fatalf("read installed version: %v", err) - } - parts := strings.Fields(string(out)) - if len(parts) < 2 { - t.Fatalf("unparseable installed --version output: %q", string(out)) - } - return parts[1] -} diff --git a/internal/smoketest/release_server_test.go b/internal/smoketest/release_server_test.go deleted file mode 100644 index 45d5398..0000000 --- a/internal/smoketest/release_server_test.go +++ /dev/null @@ -1,310 +0,0 @@ -//go:build smoke - -package smoketest - -import ( - "archive/tar" - "compress/gzip" - "crypto/ecdsa" - "crypto/elliptic" - "crypto/rand" - "crypto/sha256" - "crypto/x509" - "encoding/base64" - "encoding/pem" - "fmt" - "io" - "net/http" - "net/http/httptest" - "os" - "os/exec" - "path/filepath" - "strings" - "sync" -) - -// Release-server state set up lazily by prepareSmokeReleases. The HTTP -// server stays up for the duration of TestMain (shut down in teardown). -// smokeRelOnce serializes concurrent first-callers; smokeRelErr is the -// stored result for replay so subsequent callers see the same outcome. 
-var ( - smokeRelOnce sync.Once - smokeRelErr error - manifestURL string - pubkeyFile string - releaseHTTPServer *httptest.Server - releaseRelDir string - smokeRelKey *ecdsa.PrivateKey -) - -const ( - smokeReleaseGood = "v0.smoke.0" - smokeReleaseBroken = "v0.smoke.broken-bangerd" -) - -// prepareSmokeReleases is the Go port of scripts/smoke.sh's -// prepare_smoke_releases. It generates an ECDSA P-256 keypair (matching -// cosign blob signatures, which are ASN.1 DER ECDSA over SHA256(body), -// base64-encoded), builds two coverage-instrumented release tarballs -// signed with that key, writes a manifest, and stands up an httptest -// file server. The hidden --manifest-url / --pubkey-file flags on -// `banger update` redirect the updater at this fake bucket. -// -// Idempotent. The first caller pays the build/server cost; later -// callers replay the cached result. -func prepareSmokeReleases() error { - smokeRelOnce.Do(func() { - smokeRelErr = doPrepareSmokeReleases() - }) - return smokeRelErr -} - -func doPrepareSmokeReleases() error { - releaseRelDir = filepath.Join(scratchRoot, "release") - if err := os.RemoveAll(releaseRelDir); err != nil { - return fmt.Errorf("clean release dir: %w", err) - } - if err := os.MkdirAll(releaseRelDir, 0o755); err != nil { - return fmt.Errorf("mkdir release dir: %w", err) - } - - priv, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader) - if err != nil { - return fmt.Errorf("generate ECDSA key: %w", err) - } - smokeRelKey = priv - - pubDER, err := x509.MarshalPKIXPublicKey(&priv.PublicKey) - if err != nil { - return fmt.Errorf("marshal pub key: %w", err) - } - pubPEM := pem.EncodeToMemory(&pem.Block{Type: "PUBLIC KEY", Bytes: pubDER}) - pubkeyFile = filepath.Join(releaseRelDir, "cosign.pub") - if err := os.WriteFile(pubkeyFile, pubPEM, 0o644); err != nil { - return fmt.Errorf("write pub key: %w", err) - } - - if err := buildSmokeReleaseTarball(smokeReleaseGood); err != nil { - return err - } - if err := 
buildSmokeReleaseTarball(smokeReleaseBroken); err != nil { - return err - } - - releaseHTTPServer = httptest.NewServer(http.FileServer(http.Dir(releaseRelDir))) - manifestPath := filepath.Join(releaseRelDir, "manifest.json") - if err := writeSmokeManifest(manifestPath, releaseHTTPServer.URL); err != nil { - return err - } - manifestURL = releaseHTTPServer.URL + "/manifest.json" - return nil -} - -func shutdownReleaseServer() { - if releaseHTTPServer != nil { - releaseHTTPServer.Close() - } -} - -// buildSmokeReleaseTarball is the Go port of build_smoke_release_tarball -// from scripts/smoke.sh. It compiles banger / bangerd / banger-vsock-agent -// with the requested Version baked in, packages them as a gzip tarball, -// and writes SHA256SUMS + SHA256SUMS.sig alongside. -// -// The v0.smoke.broken-* family ships a shell-script bangerd that passes -// `--check-migrations` (so the swap proceeds) but exits non-zero in -// service mode (so the post-swap restart fails and rollbackAndWrap -// fires). Same trick the bash version uses. -func buildSmokeReleaseTarball(version string) error { - outDir := filepath.Join(releaseRelDir, version) - stage := filepath.Join(outDir, ".stage") - if err := os.MkdirAll(stage, 0o755); err != nil { - return fmt.Errorf("mkdir stage: %w", err) - } - - ldflags := "-X banger/internal/buildinfo.Version=" + version + - " -X banger/internal/buildinfo.Commit=smoke" + - " -X banger/internal/buildinfo.BuiltAt=2026-04-30T00:00:00Z" - - root, err := repoRoot() - if err != nil { - return err - } - - build := func(target, output string, extraEnv ...string) error { - cmd := exec.Command("go", "build", "-ldflags", ldflags, "-o", output, target) - cmd.Dir = root - if len(extraEnv) > 0 { - cmd.Env = append(os.Environ(), extraEnv...) 
- } - if out, err := cmd.CombinedOutput(); err != nil { - return fmt.Errorf("build %s@%s: %w\n%s", target, version, err, out) - } - return nil - } - - if err := build("./cmd/banger", filepath.Join(stage, "banger")); err != nil { - return err - } - - if strings.HasPrefix(version, "v0.smoke.broken-") { - const brokenScript = `#!/bin/sh -case "$*" in - *--check-migrations*) - printf 'compatible: smoke broken-bangerd pretends to be ready\n' - exit 0 - ;; - *) - printf 'smoke broken-bangerd: refusing to run as daemon\n' >&2 - exit 1 - ;; -esac -` - if err := os.WriteFile(filepath.Join(stage, "bangerd"), []byte(brokenScript), 0o755); err != nil { - return fmt.Errorf("write broken bangerd: %w", err) - } - } else { - if err := build("./cmd/bangerd", filepath.Join(stage, "bangerd")); err != nil { - return err - } - } - - if err := build("./cmd/banger-vsock-agent", filepath.Join(stage, "banger-vsock-agent"), - "CGO_ENABLED=0", "GOOS=linux", "GOARCH=amd64"); err != nil { - return err - } - - tarballName := fmt.Sprintf("banger-%s-linux-amd64.tar.gz", version) - tarballPath := filepath.Join(outDir, tarballName) - if err := writeTarGz(stage, tarballPath); err != nil { - return fmt.Errorf("tar %s: %w", version, err) - } - - body, err := os.ReadFile(tarballPath) - if err != nil { - return fmt.Errorf("read tarball: %w", err) - } - hash := sha256.Sum256(body) - sumsBody := fmt.Sprintf("%x %s\n", hash, tarballName) - if err := os.WriteFile(filepath.Join(outDir, "SHA256SUMS"), []byte(sumsBody), 0o644); err != nil { - return fmt.Errorf("write SHA256SUMS: %w", err) - } - - sig, err := signCosignBlob(smokeRelKey, []byte(sumsBody)) - if err != nil { - return fmt.Errorf("sign SHA256SUMS for %s: %w", version, err) - } - if err := os.WriteFile(filepath.Join(outDir, "SHA256SUMS.sig"), []byte(sig), 0o644); err != nil { - return fmt.Errorf("write sig: %w", err) - } - - return os.RemoveAll(stage) -} - -// signCosignBlob produces a cosign-compatible blob signature: ASN.1 DER -// ECDSA over 
SHA256(body), base64 encoded with no newline. This is the -// exact wire format cosign produces and the Go updater verifies, and -// matches the bash chain `openssl dgst -sha256 -sign | base64 -w0`. -func signCosignBlob(priv *ecdsa.PrivateKey, body []byte) (string, error) { - hash := sha256.Sum256(body) - sig, err := ecdsa.SignASN1(rand.Reader, priv, hash[:]) - if err != nil { - return "", err - } - return base64.StdEncoding.EncodeToString(sig), nil -} - -// writeTarGz packages every regular file in srcDir at the root of a -// gzip tarball at dst. Mirrors the bash `tar czf` of the staged binary -// trio (banger, bangerd, banger-vsock-agent). -func writeTarGz(srcDir, dst string) error { - out, err := os.Create(dst) - if err != nil { - return err - } - defer out.Close() - gw := gzip.NewWriter(out) - defer gw.Close() - tw := tar.NewWriter(gw) - defer tw.Close() - - entries, err := os.ReadDir(srcDir) - if err != nil { - return err - } - for _, e := range entries { - if !e.Type().IsRegular() { - continue - } - path := filepath.Join(srcDir, e.Name()) - st, err := os.Stat(path) - if err != nil { - return err - } - hdr := &tar.Header{ - Name: e.Name(), - Mode: int64(st.Mode().Perm()), - Size: st.Size(), - ModTime: st.ModTime(), - } - if err := tw.WriteHeader(hdr); err != nil { - return err - } - f, err := os.Open(path) - if err != nil { - return err - } - if _, err := io.Copy(tw, f); err != nil { - f.Close() - return err - } - f.Close() - } - return nil -} - -func writeSmokeManifest(path, base string) error { - body := fmt.Sprintf(`{ - "schema_version": 1, - "latest_stable": %q, - "releases": [ - { - "version": %q, - "tarball_url": "%s/%s/banger-%s-linux-amd64.tar.gz", - "sha256sums_url": "%s/%s/SHA256SUMS", - "sha256sums_sig_url": "%s/%s/SHA256SUMS.sig", - "released_at": "2026-04-29T00:00:00Z" - }, - { - "version": %q, - "tarball_url": "%s/%s/banger-%s-linux-amd64.tar.gz", - "sha256sums_url": "%s/%s/SHA256SUMS", - "sha256sums_sig_url": "%s/%s/SHA256SUMS.sig", - 
"released_at": "2026-04-30T00:00:00Z" - } - ] -} -`, - smokeReleaseGood, - smokeReleaseGood, - base, smokeReleaseGood, smokeReleaseGood, - base, smokeReleaseGood, - base, smokeReleaseGood, - smokeReleaseBroken, - base, smokeReleaseBroken, smokeReleaseBroken, - base, smokeReleaseBroken, - base, smokeReleaseBroken, - ) - return os.WriteFile(path, []byte(body), 0o644) -} - -// repoRoot resolves the repo root (where go.mod lives) from the test -// binary's cwd. `go test` runs each package's tests from that package's -// source dir, so internal/smoketest -> ../.. lands at the root. -func repoRoot() (string, error) { - cwd, err := os.Getwd() - if err != nil { - return "", err - } - return filepath.Abs(filepath.Join(cwd, "..", "..")) -} diff --git a/internal/smoketest/scenarios_global_test.go b/internal/smoketest/scenarios_global_test.go deleted file mode 100644 index b75ea49..0000000 --- a/internal/smoketest/scenarios_global_test.go +++ /dev/null @@ -1,368 +0,0 @@ -//go:build smoke - -package smoketest - -import ( - "os/exec" - "regexp" - "strings" - "testing" -) - -// testInvalidSpec is the Go port of scenario_invalid_spec. Asserts that -// `vm run --rm --vcpu 0 ...` is rejected and that no VM row is leaked -// in the process. Global-class because it asserts on host-wide vm-list -// counts; running concurrently with pure-class VM creation would race. -func testInvalidSpec(t *testing.T) { - preCount := vmListAllCount(t) - - res := banger(t, "vm", "run", "--rm", "--vcpu", "0", "--", "echo", "unused") - if res.rc == 0 { - t.Fatalf("invalid spec: vm run unexpectedly succeeded with --vcpu 0\nstdout: %s\nstderr: %s", - res.stdout, res.stderr) - } - - postCount := vmListAllCount(t) - if preCount != postCount { - t.Fatalf("invalid spec leaked a VM row: pre=%d, post=%d", preCount, postCount) - } -} - -// vmListAllCount returns the line count of `banger vm list --all`. 
-// Mirrors the bash `vm list --all | wc -l` idiom; the absolute count -// doesn't matter, only that it doesn't change across the rejected -// invocation. -func vmListAllCount(t *testing.T) int { - t.Helper() - out := mustBanger(t, "vm", "list", "--all") - return strings.Count(out, "\n") -} - -// testVMPrune ports scenario_vm_prune. `vm prune -f` should remove -// stopped VMs while preserving running ones. Global-class because it -// asserts on host-wide vm-list contents. -func testVMPrune(t *testing.T) { - mustBanger(t, "vm", "create", "--name", "smoke-prune-running") - t.Cleanup(func() { vmDelete("smoke-prune-running") }) - mustBanger(t, "vm", "create", "--name", "smoke-prune-stopped") - t.Cleanup(func() { vmDelete("smoke-prune-stopped") }) - mustBanger(t, "vm", "stop", "smoke-prune-stopped") - - mustBanger(t, "vm", "prune", "-f") - - if banger(t, "vm", "show", "smoke-prune-running").rc != 0 { - t.Fatalf("vm prune: running VM was deleted (regression!)") - } - if banger(t, "vm", "show", "smoke-prune-stopped").rc == 0 { - t.Fatalf("vm prune: stopped VM survived prune") - } -} - -// guestIPRE captures `"guest_ip": "172.16.0.X"` from `vm show` JSON. -// Used by testNAT to map VMs to their POSTROUTING rule subjects. -var guestIPRE = regexp.MustCompile(`"guest_ip":\s*"([^"]+)"`) - -// vmGuestIP returns the guest_ip field from `vm show`. Fatals if -// missing — every running VM has one. -func vmGuestIP(t *testing.T, name string) string { - t.Helper() - show := mustBanger(t, "vm", "show", name) - m := guestIPRE.FindStringSubmatch(show) - if len(m) != 2 { - t.Fatalf("could not read guest_ip from vm show %q:\n%s", name, show) - } - return m[1] -} - -// testNAT ports scenario_nat. Verifies that `--nat` installs a per-VM -// MASQUERADE rule, that the rule survives stop/start, and that delete -// cleans it up. The control VM (no --nat) must NOT have a rule. 
-func testNAT(t *testing.T) { - requireSudoIptables(t) - - mustBanger(t, "vm", "create", "--name", "smoke-nat", "--nat") - t.Cleanup(func() { vmDelete("smoke-nat") }) - mustBanger(t, "vm", "create", "--name", "smoke-nocnat") - t.Cleanup(func() { vmDelete("smoke-nocnat") }) - - natIP := vmGuestIP(t, "smoke-nat") - ctlIP := vmGuestIP(t, "smoke-nocnat") - - postrouting := iptablesPostrouting(t) - natRule := "-s " + natIP + "/32" - if !strings.Contains(postrouting, natRule) || !strings.Contains(postrouting, "MASQUERADE") { - t.Fatalf("NAT: --nat VM has no POSTROUTING MASQUERADE rule for %s; got:\n%s", natIP, postrouting) - } - if strings.Contains(postrouting, "-s "+ctlIP+"/32") { - t.Fatalf("NAT: control VM unexpectedly has a MASQUERADE rule for %s", ctlIP) - } - - mustBanger(t, "vm", "stop", "smoke-nat") - mustBanger(t, "vm", "start", "smoke-nat") - postrouting = iptablesPostrouting(t) - count := strings.Count(postrouting, natRule) - if count != 1 { - t.Fatalf("NAT: MASQUERADE rule count for %s = %d after restart, want 1", natIP, count) - } - - mustBanger(t, "vm", "delete", "smoke-nat") - mustBanger(t, "vm", "delete", "smoke-nocnat") - postrouting = iptablesPostrouting(t) - if strings.Contains(postrouting, natRule) { - t.Fatalf("NAT: delete left a MASQUERADE rule behind for %s", natIP) - } -} - -func iptablesPostrouting(t *testing.T) string { - t.Helper() - out, err := exec.Command("sudo", "-n", "iptables", "-t", "nat", "-S", "POSTROUTING").Output() - if err != nil { - t.Fatalf("read iptables POSTROUTING: %v", err) - } - return string(out) -} - -// testInvalidName ports scenario_invalid_name. A handful of malformed -// names must all be rejected and none of them may leak a VM row. 
-func testInvalidName(t *testing.T) { - preCount := vmListAllCount(t) - for _, bad := range []string{"MyBox", "my box", "box.vm", "-box"} { - res := banger(t, "vm", "create", "--name", bad, "--no-start") - if res.rc == 0 { - t.Fatalf("invalid name: vm create accepted %q", bad) - } - } - if postCount := vmListAllCount(t); postCount != preCount { - t.Fatalf("invalid name leaked VM row(s): pre=%d, post=%d", preCount, postCount) - } -} - -// updateBaseArgs are the manifest/pubkey flags every update scenario -// needs to redirect the updater away from the production R2 bucket -// and at our smoke release server. Built lazily because manifestURL / -// pubkeyFile are populated by prepareSmokeReleases. -func updateBaseArgs() []string { - return []string{"--manifest-url", manifestURL, "--pubkey-file", pubkeyFile} -} - -// testUpdateCheck ports scenario_update_check. `update --check` must -// succeed against the smoke release server and announce the available -// version on stdout. -func testUpdateCheck(t *testing.T) { - if err := prepareSmokeReleases(); err != nil { - t.Fatalf("prepare smoke releases: %v", err) - } - args := append([]string{"update", "--check"}, updateBaseArgs()...) - res := banger(t, args...) - if res.rc != 0 { - t.Fatalf("update --check failed: rc=%d\nstdout: %s\nstderr: %s", - res.rc, res.stdout, res.stderr) - } - wantContains(t, res.stdout+res.stderr, "update available: ", "update --check stdout") -} - -// testUpdateToUnknown ports scenario_update_to_unknown. Asking for a -// version not in the manifest must fail before any host mutation — -// the installed binary's version stays put. -func testUpdateToUnknown(t *testing.T) { - if err := prepareSmokeReleases(); err != nil { - t.Fatalf("prepare smoke releases: %v", err) - } - preVer := installedVersion(t) - args := append([]string{"update", "--to", "v9.9.9"}, updateBaseArgs()...) - res := banger(t, args...) 
- if res.rc == 0 { - t.Fatalf("update --to v9.9.9: exit 0 (out: %s%s)", res.stdout, res.stderr) - } - combined := strings.ToLower(res.stdout + res.stderr) - if !strings.Contains(combined, "not found") { - t.Fatalf("update --to v9.9.9: error doesn't say 'not found'; got: %s%s", res.stdout, res.stderr) - } - if postVer := installedVersion(t); preVer != postVer { - t.Fatalf("update --to v9.9.9 mutated the install: %s -> %s", preVer, postVer) - } -} - -// testUpdateNoRoot ports scenario_update_no_root. Non-sudo invocation -// of `update --to` must refuse with a root-required error and leave -// the install untouched. -func testUpdateNoRoot(t *testing.T) { - if err := prepareSmokeReleases(); err != nil { - t.Fatalf("prepare smoke releases: %v", err) - } - preVer := installedVersion(t) - args := append([]string{"update", "--to", smokeReleaseGood}, updateBaseArgs()...) - res := banger(t, args...) - if res.rc == 0 { - t.Fatalf("update without sudo: exit 0 (out: %s%s)", res.stdout, res.stderr) - } - combined := strings.ToLower(res.stdout + res.stderr) - if !strings.Contains(combined, "root") { - t.Fatalf("update without sudo: error doesn't mention root; got: %s%s", res.stdout, res.stderr) - } - if postVer := installedVersion(t); preVer != postVer { - t.Fatalf("update without sudo mutated the install: %s -> %s", preVer, postVer) - } -} - -// testUpdateDryRun ports scenario_update_dry_run. `--dry-run` fetches -// + verifies the new release but must not swap the binary. -func testUpdateDryRun(t *testing.T) { - requirePasswordlessSudo(t) - if err := prepareSmokeReleases(); err != nil { - t.Fatalf("prepare smoke releases: %v", err) - } - preVer := installedVersion(t) - args := append([]string{"update", "--to", smokeReleaseGood, "--dry-run"}, updateBaseArgs()...) - res := sudoBanger(t, args...) 
- if res.rc != 0 { - t.Fatalf("update --dry-run failed: %s%s", res.stdout, res.stderr) - } - wantContains(t, res.stdout+res.stderr, "dry-run:", "update --dry-run stdout") - if postVer := installedVersion(t); preVer != postVer { - t.Fatalf("update --dry-run swapped the binary: %s -> %s", preVer, postVer) - } -} - -// vmBootID reads /proc/sys/kernel/random/boot_id from the guest. The -// kernel regenerates it on every boot, so an unchanged value across a -// daemon restart proves the firecracker process survived. Used by both -// update scenarios that assert "the VM stays alive". -func vmBootID(t *testing.T, name string) string { - t.Helper() - out, _ := exec.Command(bangerBin, "vm", "ssh", name, "--", "cat", "/proc/sys/kernel/random/boot_id").Output() - return strings.TrimSpace(string(out)) -} - -var installTomlVersionRE = regexp.MustCompile(`(?m)^version\s*=\s*"([^"]+)"`) - -// installedTomlVersion reads /etc/banger/install.toml's version field -// (under sudo since the dir is not always world-readable). -func installedTomlVersion(t *testing.T) string { - t.Helper() - out, err := exec.Command("sudo", "cat", "/etc/banger/install.toml").Output() - if err != nil { - t.Fatalf("read /etc/banger/install.toml: %v", err) - } - m := installTomlVersionRE.FindStringSubmatch(string(out)) - if len(m) != 2 { - t.Fatalf("install.toml: no version field in:\n%s", out) - } - return m[1] -} - -// testUpdateKeepsVMAlive ports scenario_update_keeps_vm_alive. The -// long-running update scenario: a real swap to v0.smoke.0, must not -// reboot the running VM, must update the install metadata, and the VM -// must still answer SSH afterwards. 
-func testUpdateKeepsVMAlive(t *testing.T) { - requirePasswordlessSudo(t) - if err := prepareSmokeReleases(); err != nil { - t.Fatalf("prepare smoke releases: %v", err) - } - const name = "smoke-update" - vmCreate(t, name) - waitForSSH(t, name) - preBoot := vmBootID(t, name) - if preBoot == "" { - t.Fatalf("pre-update boot_id capture failed") - } - preVer := installedVersion(t) - - args := append([]string{"update", "--to", smokeReleaseGood}, updateBaseArgs()...) - if res := sudoBanger(t, args...); res.rc != 0 { - t.Fatalf("update --to %s failed: %s%s", smokeReleaseGood, res.stdout, res.stderr) - } - - postVer := installedVersion(t) - if postVer != smokeReleaseGood { - t.Fatalf("post-update /usr/local/bin/banger version = %s, want %s", postVer, smokeReleaseGood) - } - if preVer == postVer { - t.Fatalf("update did not change the binary version (pre==post=%s)", postVer) - } - if metaVer := installedTomlVersion(t); metaVer != smokeReleaseGood { - t.Fatalf("install.toml version = %q, want %s", metaVer, smokeReleaseGood) - } - - waitForSSH(t, name) - postBoot := vmBootID(t, name) - if postBoot == "" { - t.Fatalf("post-update boot_id read failed") - } - if preBoot != postBoot { - t.Fatalf("VM rebooted during update: boot_id %s -> %s", preBoot, postBoot) - } -} - -// testUpdateRollbackKeepsVMAlive ports scenario_update_rollback_keeps_vm_alive. -// Rollback drill: install the broken-bangerd release, which passes the -// pre-swap migration sanity but fails as a service. runUpdate's -// rollbackAndWrap must restore the previous binaries, and the VM must -// survive the whole drill. 
-func testUpdateRollbackKeepsVMAlive(t *testing.T) { - requirePasswordlessSudo(t) - if err := prepareSmokeReleases(); err != nil { - t.Fatalf("prepare smoke releases: %v", err) - } - preVer := installedVersion(t) - - const name = "smoke-rollback" - vmCreate(t, name) - waitForSSH(t, name) - preBoot := vmBootID(t, name) - if preBoot == "" { - t.Fatalf("pre-drill boot_id capture failed") - } - - args := append([]string{"update", "--to", smokeReleaseBroken}, updateBaseArgs()...) - res := sudoBanger(t, args...) - if res.rc == 0 { - t.Fatalf("rollback drill: update returned exit 0 despite broken bangerd\nstdout: %s\nstderr: %s", - res.stdout, res.stderr) - } - - if postVer := installedVersion(t); postVer != preVer { - t.Fatalf("rollback drill: post-rollback version = %s, want %s", postVer, preVer) - } - - waitForSSH(t, name) - postBoot := vmBootID(t, name) - if postBoot == "" { - t.Fatalf("post-rollback boot_id read failed") - } - if preBoot != postBoot { - t.Fatalf("VM rebooted during rollback drill: boot_id %s -> %s", preBoot, postBoot) - } -} - -// testDaemonAdmin ports scenario_daemon_admin. MUST be the last global -// scenario in the run order: `banger daemon stop` tears the installed -// services down, so anything after it that talks to the daemon would -// fail. The teardown path re-stops idempotently. 
-func testDaemonAdmin(t *testing.T) { - socket := strings.TrimSpace(mustBanger(t, "daemon", "socket")) - if socket != "/run/banger/bangerd.sock" { - t.Fatalf("daemon socket: got %q, want /run/banger/bangerd.sock", socket) - } - - migOut, err := exec.Command(bangerdBin, "--system", "--check-migrations").CombinedOutput() - if err != nil { - t.Fatalf("bangerd --check-migrations: %v\n%s", err, migOut) - } - if !strings.HasPrefix(strings.TrimSpace(string(migOut)), "compatible:") { - t.Fatalf("bangerd --check-migrations: stdout missing 'compatible:' prefix; got: %s", migOut) - } - - requirePasswordlessSudo(t) - if res := sudoBanger(t, "daemon", "stop"); res.rc != 0 { - t.Fatalf("banger daemon stop: %s%s", res.stdout, res.stderr) - } - status, _ := exec.Command(bangerBin, "system", "status").Output() - if !regexp.MustCompile(`(?m)^active\s+inactive`).Match(status) { - t.Fatalf("owner daemon still active after daemon stop:\n%s", status) - } - if !regexp.MustCompile(`(?m)^helper_active\s+inactive`).Match(status) { - t.Fatalf("root helper still active after daemon stop:\n%s", status) - } -} diff --git a/internal/smoketest/scenarios_pure_test.go b/internal/smoketest/scenarios_pure_test.go deleted file mode 100644 index fd92add..0000000 --- a/internal/smoketest/scenarios_pure_test.go +++ /dev/null @@ -1,311 +0,0 @@ -//go:build smoke - -package smoketest - -import ( - "os" - "os/exec" - "path/filepath" - "regexp" - "strings" - "sync" - "testing" -) - -// testBareRun is the Go port of scenario_bare_run from -// scripts/smoke.sh. Bare ephemeral VM run: create + start + ssh + -// echo + --rm. -func testBareRun(t *testing.T) { - t.Parallel() - out := mustBanger(t, "vm", "run", "--rm", "--", "echo", "smoke-bare-ok") - wantContains(t, out, "smoke-bare-ok", "bare vm run stdout") -} - -// testExitCode is the Go port of scenario_exit_code. Asserts that -// `vm run -- sh -c 'exit 42'` propagates rc=42 verbatim. 
-func testExitCode(t *testing.T) { - t.Parallel() - res := banger(t, "vm", "run", "--rm", "--", "sh", "-c", "exit 42") - wantExit(t, res, 42, "exit-code propagation") -} - -// testConcurrentRun fires two `vm run --rm` invocations simultaneously -// and asserts both succeed and emit their respective markers. Bash uses -// `& ; wait`; Go uses two goroutines that capture the result and a -// WaitGroup. Note: t.Fatalf cannot be called from a goroutine, so the -// children write to result slots and assertions run on the main goroutine. -func testConcurrentRun(t *testing.T) { - t.Parallel() - var wg sync.WaitGroup - var resA, resB result - run := func(dst *result, marker string) { - defer wg.Done() - cmd := exec.Command(bangerBin, "vm", "run", "--rm", "--", "echo", marker) - var out, errBuf strings.Builder - cmd.Stdout = &out - cmd.Stderr = &errBuf - err := cmd.Run() - dst.stdout = out.String() - dst.stderr = errBuf.String() - if err != nil { - if exitErr, ok := err.(*exec.ExitError); ok { - dst.rc = exitErr.ExitCode() - } else { - dst.rc = -1 - dst.stderr += "\nexec error: " + err.Error() - } - } - } - wg.Add(2) - go run(&resA, "smoke-concurrent-a") - go run(&resB, "smoke-concurrent-b") - wg.Wait() - wantExit(t, resA, 0, "concurrent A exit") - wantExit(t, resB, 0, "concurrent B exit") - wantContains(t, resA.stdout, "smoke-concurrent-a", "concurrent A stdout") - wantContains(t, resB.stdout, "smoke-concurrent-b", "concurrent B stdout") -} - -// testDetachRun ports scenario_detach_run. Verifies -d combined with -// --rm or with a guest command is rejected before VM creation, then -// that -d --name leaves the VM running and ssh-able. 
-func testDetachRun(t *testing.T) { - t.Parallel() - - res := banger(t, "vm", "run", "-d", "--rm") - if res.rc == 0 { - t.Fatalf("detach: -d --rm should be rejected before VM creation") - } - - res = banger(t, "vm", "run", "-d", "--", "echo", "hi") - if res.rc == 0 { - t.Fatalf("detach: -d -- should be rejected before VM creation") - } - - const name = "smoke-detach" - mustBanger(t, "vm", "run", "-d", "--name", name) - t.Cleanup(func() { vmDelete(name) }) - - show := mustBanger(t, "vm", "show", name) - wantContains(t, show, `"state": "running"`, "detach: post-detach state") - - out := mustBanger(t, "vm", "ssh", name, "--", "echo", "detach-marker") - wantContains(t, out, "detach-marker", "detach: ssh stdout") -} - -// testBootstrapPrecondition ports scenario_bootstrap_precondition. -// A workspace with .mise.toml requires NAT (or --no-bootstrap) to run. -// The fake repo lives in a TempDir so it doesn't pollute the shared -// repodir fixture used by repodir-class scenarios. -func testBootstrapPrecondition(t *testing.T) { - t.Parallel() - miseRepo := t.TempDir() - gitInit := func(args ...string) { - t.Helper() - cmd := exec.Command(args[0], args[1:]...) 
- cmd.Dir = miseRepo - if out, err := cmd.CombinedOutput(); err != nil { - t.Fatalf("setup mise repo: %s: %v\n%s", args, err, out) - } - } - gitInit("git", "init", "-q") - gitInit("git", "-c", "user.email=smoke@banger", "-c", "user.name=smoke", - "commit", "--allow-empty", "-q", "-m", "init") - if err := os.WriteFile(filepath.Join(miseRepo, ".mise.toml"), []byte("[tools]\n"), 0o644); err != nil { - t.Fatalf("write .mise.toml: %v", err) - } - gitInit("git", "add", ".mise.toml") - gitInit("git", "-c", "user.email=smoke@banger", "-c", "user.name=smoke", - "commit", "-q", "-m", "add mise") - - res := banger(t, "vm", "run", "--rm", miseRepo, "--", "echo", "nope") - if res.rc == 0 { - t.Fatalf("bootstrap: workspace with .mise.toml should refuse without --nat / --no-bootstrap") - } - - out := mustBanger(t, "vm", "run", "--rm", "--no-bootstrap", miseRepo, "--", "echo", "no-bootstrap-ok") - wantContains(t, out, "no-bootstrap-ok", "bootstrap: --no-bootstrap stdout") -} - -// testVMLifecycle ports scenario_vm_lifecycle. Drives an explicit -// create / show / ssh / stop / start / ssh / delete and asserts the -// state transitions are visible in `vm show`. 
-func testVMLifecycle(t *testing.T) { - t.Parallel() - const name = "smoke-lifecycle" - vmCreate(t, name) - - show := mustBanger(t, "vm", "show", name) - wantContains(t, show, `"state": "running"`, "post-create state") - - waitForSSH(t, name) - out := mustBanger(t, "vm", "ssh", name, "--", "echo", "hello-1") - wantContains(t, out, "hello-1", "vm ssh #1") - - mustBanger(t, "vm", "stop", name) - show = mustBanger(t, "vm", "show", name) - wantContains(t, show, `"state": "stopped"`, "post-stop state") - - mustBanger(t, "vm", "start", name) - show = mustBanger(t, "vm", "show", name) - wantContains(t, show, `"state": "running"`, "post-start state") - - waitForSSH(t, name) - out = mustBanger(t, "vm", "ssh", name, "--", "echo", "hello-2") - wantContains(t, out, "hello-2", "vm ssh #2 (post-restart)") - - mustBanger(t, "vm", "delete", name) - res := banger(t, "vm", "show", name) - if res.rc == 0 { - t.Fatalf("vm show still finds %q after delete\nstdout: %s", name, res.stdout) - } -} - -// testVMSet ports scenario_vm_set. Creates with --vcpu 2, asserts -// guest sees 2 CPUs, reconfigures to 4 while stopped, asserts guest -// sees 4 after restart. -func testVMSet(t *testing.T) { - t.Parallel() - const name = "smoke-set" - vmCreate(t, name, "--vcpu", "2") - waitForSSH(t, name) - - out := mustBanger(t, "vm", "ssh", name, "--", "nproc") - if got := strings.TrimSpace(out); got != "2" { - t.Fatalf("vm set: initial nproc got %q, want 2", got) - } - - mustBanger(t, "vm", "stop", name) - mustBanger(t, "vm", "set", name, "--vcpu", "4") - mustBanger(t, "vm", "start", name) - waitForSSH(t, name) - - out = mustBanger(t, "vm", "ssh", name, "--", "nproc") - if got := strings.TrimSpace(out); got != "4" { - t.Fatalf("vm set: post-reconfig nproc got %q, want 4 (spec change didn't land)", got) - } -} - -// testVMRestart ports scenario_vm_restart. 
Reads /proc boot_id before -// and after `vm restart`; the kernel regenerates it on every boot, so -// distinct values prove the verb actually rebooted the guest. -func testVMRestart(t *testing.T) { - t.Parallel() - const name = "smoke-restart" - vmCreate(t, name) - waitForSSH(t, name) - - bootBefore := strings.TrimSpace(mustBanger(t, "vm", "ssh", name, "--", "cat", "/proc/sys/kernel/random/boot_id")) - if bootBefore == "" { - t.Fatalf("vm restart: could not read initial boot_id") - } - - mustBanger(t, "vm", "restart", name) - waitForSSH(t, name) - - bootAfter := strings.TrimSpace(mustBanger(t, "vm", "ssh", name, "--", "cat", "/proc/sys/kernel/random/boot_id")) - if bootAfter == "" { - t.Fatalf("vm restart: could not read post-restart boot_id") - } - if bootBefore == bootAfter { - t.Fatalf("vm restart: boot_id unchanged (%s); verb didn't actually reboot the guest", bootBefore) - } -} - -// dmDevRE captures the dm-snapshot device name from `vm show` JSON. -// Used by testVMKill to check that `vm kill --signal KILL` cleans up -// the dm device alongside the firecracker process. -var dmDevRE = regexp.MustCompile(`"dm_dev":\s*"(fc-rootfs-[^"]+)"`) - -// testVMKill ports scenario_vm_kill. `vm kill --signal KILL` must stop -// the VM and clean up its dm-snapshot device. The dm-name capture -// degrades gracefully — older builds without the field still pass the -// state-check half. 
-func testVMKill(t *testing.T) { - t.Parallel() - const name = "smoke-kill" - vmCreate(t, name) - - show := mustBanger(t, "vm", "show", name) - var dmName string - if m := dmDevRE.FindStringSubmatch(show); len(m) == 2 { - dmName = m[1] - } - - mustBanger(t, "vm", "kill", "--signal", "KILL", name) - show = mustBanger(t, "vm", "show", name) - wantContains(t, show, `"state": "stopped"`, "post-kill state") - - if dmName != "" { - out, _ := exec.Command("sudo", "-n", "dmsetup", "ls").CombinedOutput() - for _, line := range strings.Split(string(out), "\n") { - fields := strings.Fields(line) - if len(fields) > 0 && fields[0] == dmName { - t.Fatalf("vm kill: dm device %q still mapped (cleanup didn't run)", dmName) - } - } - } -} - -// testVMPorts ports scenario_vm_ports. Asserts `vm ports` reports the -// guest's sshd listener under the VM's DNS name. -func testVMPorts(t *testing.T) { - t.Parallel() - const name = "smoke-ports" - vmCreate(t, name) - waitForSSH(t, name) - - out := mustBanger(t, "vm", "ports", name) - wantContains(t, out, "smoke-ports.vm:22", "vm ports stdout (host:port)") - wantContains(t, out, "sshd", "vm ports stdout (process name)") -} - -// testSSHConfig ports scenario_ssh_config. Drives ssh-config -// install/uninstall against a fake $HOME so the contributor's real -// ~/.ssh/config is never touched. Verifies idempotent install, -// preservation of pre-existing user content, and clean uninstall. 
-func testSSHConfig(t *testing.T) { - t.Parallel() - fakeHome := t.TempDir() - if err := os.MkdirAll(filepath.Join(fakeHome, ".ssh"), 0o700); err != nil { - t.Fatalf("mkdir .ssh: %v", err) - } - cfg := filepath.Join(fakeHome, ".ssh", "config") - if err := os.WriteFile(cfg, []byte("Host myserver\n HostName example.invalid\n"), 0o600); err != nil { - t.Fatalf("write fake config: %v", err) - } - - mustBangerHome(t, fakeHome, "ssh-config", "--install") - cfgBytes, err := os.ReadFile(cfg) - if err != nil { - t.Fatalf("read fake config after install: %v", err) - } - body := string(cfgBytes) - if !strings.Contains(body, "\nInclude ") && !strings.HasPrefix(body, "Include ") { - t.Fatalf("ssh-config: install didn't add Include line:\n%s", body) - } - wantContains(t, body, "Host myserver", "ssh-config: install must preserve user content") - - mustBangerHome(t, fakeHome, "ssh-config", "--install") - cfgBytes, _ = os.ReadFile(cfg) - body = string(cfgBytes) - includeCount := 0 - for _, line := range strings.Split(body, "\n") { - if strings.HasPrefix(line, "Include ") && strings.Contains(line, "banger") { - includeCount++ - } - } - if includeCount != 1 { - t.Fatalf("ssh-config: install not idempotent (Include appeared %d times)", includeCount) - } - - mustBangerHome(t, fakeHome, "ssh-config", "--uninstall") - cfgBytes, _ = os.ReadFile(cfg) - body = string(cfgBytes) - for _, line := range strings.Split(body, "\n") { - if strings.HasPrefix(line, "Include ") && strings.Contains(line, "banger") { - t.Fatalf("ssh-config: uninstall left the Include line behind:\n%s", body) - } - } - wantContains(t, body, "Host myserver", "ssh-config: uninstall must keep user content") -} diff --git a/internal/smoketest/scenarios_repodir_test.go b/internal/smoketest/scenarios_repodir_test.go deleted file mode 100644 index 65f1e22..0000000 --- a/internal/smoketest/scenarios_repodir_test.go +++ /dev/null @@ -1,205 +0,0 @@ -//go:build smoke - -package smoketest - -import ( - "os" - "os/exec" - 
"path/filepath" - "strings" - "testing" -) - -// testWorkspaceRun ports scenario_workspace_run. Ships the throwaway -// git repo to a fresh VM and reads the marker file from the guest. -func testWorkspaceRun(t *testing.T) { - out := mustBanger(t, "vm", "run", "--rm", repoDir, "--", "cat", "/root/repo/smoke-file.txt") - wantContains(t, out, "smoke-workspace-marker", "workspace vm run guest read") -} - -// testWorkspaceDryrun ports scenario_workspace_dryrun. `--dry-run` -// lists the tracked files and the resolved transfer mode without -// creating a VM. -func testWorkspaceDryrun(t *testing.T) { - out := mustBanger(t, "vm", "run", "--dry-run", repoDir) - wantContains(t, out, "smoke-file.txt", "dry-run file list") - wantContains(t, out, "mode: tracked only", "dry-run mode line") -} - -// testIncludeUntracked ports scenario_include_untracked. Drops an -// untracked file in the fixture and asserts --include-untracked picks -// it up. The cleanup hook removes the file even if the scenario fails -// so downstream repodir scenarios see the original tree. -func testIncludeUntracked(t *testing.T) { - untracked := filepath.Join(repoDir, "smoke-untracked.txt") - if err := os.WriteFile(untracked, []byte("untracked-marker\n"), 0o644); err != nil { - t.Fatalf("write untracked file: %v", err) - } - t.Cleanup(func() { _ = os.Remove(untracked) }) - - out := mustBanger(t, "vm", "run", "--rm", "--include-untracked", repoDir, - "--", "cat", "/root/repo/smoke-untracked.txt") - wantContains(t, out, "untracked-marker", "include-untracked guest read") -} - -// testWorkspaceExport ports scenario_workspace_export. Round-trips a -// guest-side edit back out as a patch via `vm workspace export`. 
-func testWorkspaceExport(t *testing.T) { - const name = "smoke-export" - vmCreate(t, name, "--image", "debian-bookworm") - mustBanger(t, "vm", "workspace", "prepare", name, repoDir) - mustBanger(t, "vm", "ssh", name, "--", "sh", "-c", - "echo guest-edit > /root/repo/new-guest-file.txt") - - patch := filepath.Join(runtimeDir, "smoke-export.diff") - mustBanger(t, "vm", "workspace", "export", name, "--output", patch) - - st, err := os.Stat(patch) - if err != nil { - t.Fatalf("export: stat patch %s: %v", patch, err) - } - if st.Size() == 0 { - t.Fatalf("export: patch file empty at %s", patch) - } - body, err := os.ReadFile(patch) - if err != nil { - t.Fatalf("export: read patch: %v", err) - } - wantContains(t, string(body), "new-guest-file.txt", "export: patch must reference new-guest-file.txt") -} - -// testWorkspaceFullCopy ports scenario_workspace_full_copy. Verifies -// the alternate transfer path (--mode full_copy) lands the same fixture -// in the guest. -func testWorkspaceFullCopy(t *testing.T) { - const name = "smoke-fc" - vmCreate(t, name) - mustBanger(t, "vm", "workspace", "prepare", name, repoDir, "--mode", "full_copy") - - out := mustBanger(t, "vm", "ssh", name, "--", "cat", "/root/repo/smoke-file.txt") - wantContains(t, out, "smoke-workspace-marker", "full_copy: marker missing in guest") -} - -// testWorkspaceBasecommit ports scenario_workspace_basecommit. Confirms -// that `vm workspace export` without --base-commit captures only the -// working-copy diff, while --base-commit also captures guest-side -// commits made on top of HEAD. 
-func testWorkspaceBasecommit(t *testing.T) { - const name = "smoke-basecommit" - vmCreate(t, name) - mustBanger(t, "vm", "workspace", "prepare", name, repoDir) - - baseSHA := strings.TrimSpace(mustBanger(t, "vm", "ssh", name, "--", - "sh", "-c", "cd /root/repo && git rev-parse HEAD")) - if len(baseSHA) != 40 { - t.Fatalf("export base: bad base sha: %q", baseSHA) - } - - mustBanger(t, "vm", "ssh", name, "--", "sh", "-c", - "cd /root/repo && "+ - "git -c user.email=smoke@smoke -c user.name=smoke checkout -b smoke-branch >/dev/null 2>&1 && "+ - "echo committed-marker > smoke-committed.txt && "+ - "git add smoke-committed.txt && "+ - "git -c user.email=smoke@smoke -c user.name=smoke commit -q -m 'guest side'") - - plain := filepath.Join(runtimeDir, "smoke-plain.diff") - mustBanger(t, "vm", "workspace", "export", name, "--output", plain) - if body, err := os.ReadFile(plain); err == nil { - wantNotContains(t, string(body), "smoke-committed.txt", - "export base: plain export must NOT capture guest-side commit") - } - - base := filepath.Join(runtimeDir, "smoke-base.diff") - mustBanger(t, "vm", "workspace", "export", name, "--base-commit", baseSHA, "--output", base) - st, err := os.Stat(base) - if err != nil || st.Size() == 0 { - t.Fatalf("export base: --base-commit patch empty/missing: stat=%v err=%v", st, err) - } - body, _ := os.ReadFile(base) - wantContains(t, string(body), "smoke-committed.txt", - "export base: --base-commit patch must include committed marker") -} - -// testWorkspaceRestart ports scenario_workspace_restart. Verifies the -// workspace marker survives a stop/start cycle (rootfs persistence). 
-func testWorkspaceRestart(t *testing.T) { - const name = "smoke-wsrestart" - vmCreate(t, name) - mustBanger(t, "vm", "workspace", "prepare", name, repoDir) - - pre := mustBanger(t, "vm", "ssh", name, "--", "cat", "/root/repo/smoke-file.txt") - wantContains(t, pre, "smoke-workspace-marker", "workspace stop/start: pre-cycle marker") - - mustBanger(t, "vm", "stop", name) - mustBanger(t, "vm", "start", name) - waitForSSH(t, name) - - post := mustBanger(t, "vm", "ssh", name, "--", "cat", "/root/repo/smoke-file.txt") - wantContains(t, post, "smoke-workspace-marker", "workspace stop/start: post-cycle marker") -} - -// testVMExec ports scenario_vm_exec. The longest scenario in the suite -// — covers auto-cd, exit-code propagation, stale-workspace detection, -// --auto-prepare resync, and the not-running refusal. The repodir -// commit added mid-scenario is rolled back via t.Cleanup so subsequent -// repodir-chain scenarios see the original fixture state. -func testVMExec(t *testing.T) { - const name = "smoke-exec" - vmCreate(t, name) - mustBanger(t, "vm", "workspace", "prepare", name, repoDir) - - show := mustBanger(t, "vm", "show", name) - wantContains(t, show, `"guest_path": "/root/repo"`, - "vm exec: workspace.guest_path not persisted") - - out := mustBanger(t, "vm", "exec", name, "--", "cat", "smoke-file.txt") - wantContains(t, out, "smoke-workspace-marker", "vm exec: workspace marker") - - if got := strings.TrimSpace(mustBanger(t, "vm", "exec", name, "--", "pwd")); got != "/root/repo" { - t.Fatalf("vm exec: pwd got %q, want /root/repo (auto-cd didn't happen)", got) - } - - res := banger(t, "vm", "exec", name, "--", "sh", "-c", "exit 17") - wantExit(t, res, 17, "vm exec: exit-code propagation") - - // Advance host HEAD so the workspace goes stale, register the - // rollback before mutating so a Fatal anywhere below still - // restores the fixture. 
- t.Cleanup(func() { - cmd := exec.Command("git", "reset", "--hard", "HEAD~1", "-q") - cmd.Dir = repoDir - _ = cmd.Run() - }) - for _, args := range [][]string{ - {"sh", "-c", "echo post-prepare-marker > smoke-exec-new.txt"}, - {"git", "add", "smoke-exec-new.txt"}, - {"git", "commit", "-q", "-m", "add smoke-exec-new.txt after prepare"}, - } { - cmd := exec.Command(args[0], args[1:]...) - cmd.Dir = repoDir - if out, err := cmd.CombinedOutput(); err != nil { - t.Fatalf("vm exec: stage host commit: %s: %v\n%s", args, err, out) - } - } - - stale := banger(t, "vm", "exec", name, "--", "ls", "smoke-exec-new.txt") - if stale.rc == 0 { - t.Fatalf("vm exec: stale workspace already had the new file (dirty path didn't take effect)") - } - wantContains(t, stale.stderr, "workspace stale", "vm exec: stale-workspace warning on stderr") - wantContains(t, stale.stderr, "--auto-prepare", "vm exec: stale warning must mention --auto-prepare") - - auto := mustBanger(t, "vm", "exec", name, "--auto-prepare", "--", "cat", "smoke-exec-new.txt") - wantContains(t, auto, "post-prepare-marker", "vm exec: --auto-prepare didn't re-sync new file") - - clean := banger(t, "vm", "exec", name, "--", "true") - wantExit(t, clean, 0, "vm exec: post-auto-prepare run") - wantNotContains(t, clean.stderr, "workspace stale", "vm exec: stale warning persisted after --auto-prepare") - - mustBanger(t, "vm", "stop", name) - stopped := banger(t, "vm", "exec", name, "--", "true") - if stopped.rc == 0 { - t.Fatalf("vm exec: exec on stopped VM unexpectedly succeeded") - } - wantContains(t, stopped.stderr, "not running", "vm exec: stopped-VM error message") -} diff --git a/internal/smoketest/smoke_main_test.go b/internal/smoketest/smoke_main_test.go deleted file mode 100644 index e03b3ce..0000000 --- a/internal/smoketest/smoke_main_test.go +++ /dev/null @@ -1,305 +0,0 @@ -//go:build smoke - -package smoketest - -import ( - "errors" - "fmt" - "io" - "os" - "os/exec" - "os/user" - "path/filepath" - "regexp" - "strings" 
- "testing" -) - -// Package-level state set up in TestMain and consumed by every test. -// Lowercase, file-scope; tests in this package don't share globals -// with other packages because of the build tag. -var ( - bangerBin string - bangerdBin string - vsockBin string - coverDir string - scratchRoot string - runtimeDir string - repoDir string - smokeOwner string -) - -const ( - serviceCoverDir = "/var/lib/banger" - smokeMarker = "/etc/banger/.smoke-owned" - ownerService = "bangerd.service" - rootService = "bangerd-root.service" -) - -// smokeConfigTOML is the smoke-tuned daemon config dropped at -// /etc/banger/config.toml after install (mirrors scripts/smoke.sh:404-415). -// Small VMs by default — scenarios that need full-size resources override -// --vcpu / --memory / --disk-size explicitly. -const smokeConfigTOML = `# Smoke-tuned defaults — every VM starts small unless the scenario -# overrides --vcpu / --memory / --disk-size explicitly. -[vm_defaults] -vcpu = 2 -memory_mib = 1024 -disk_size = "2G" -system_overlay_size = "2G" -` - -func TestMain(m *testing.M) { - // `go test -list ...` (used by `make smoke-list`) just enumerates - // the test names. Skip the install preamble and let m.Run() print - // the listing — env vars + KVM aren't needed for discovery. - if isListMode() { - os.Exit(m.Run()) - } - - if err := requireEnv(); err != nil { - fmt.Fprintf(os.Stderr, "[smoke] %v\n", err) - // Skip cleanly when run outside `make smoke`. Returning 0 - // prevents `go test` from being mistaken for a real failure - // when a contributor accidentally runs the smoke package - // directly without the harness env. - os.Exit(0) - } - - // Export GOCOVERDIR so every banger / bangerd subprocess this - // test binary spawns lands its covdata under BANGER_SMOKE_COVER_DIR. - // The test binary itself is not instrumented; only the smoke - // binaries are (they were built with `go build -cover`). 
- if err := os.Setenv("GOCOVERDIR", coverDir); err != nil { - fmt.Fprintf(os.Stderr, "[smoke] setenv GOCOVERDIR: %v\n", err) - os.Exit(1) - } - - if err := installPreamble(); err != nil { - fmt.Fprintf(os.Stderr, "[smoke] install preamble failed: %v\n", err) - os.Exit(1) - } - - if err := setupRepoFixture(); err != nil { - fmt.Fprintf(os.Stderr, "[smoke] fixture setup failed: %v\n", err) - teardown() - os.Exit(1) - } - - code := m.Run() - teardown() - os.Exit(code) -} - -// isListMode returns true when the test binary was invoked with the -// `-test.list` flag, which `go test -list ...` translates into. In that -// mode the harness only enumerates names and never spawns a test, so -// requireEnv / installPreamble would needlessly block discovery on a -// fresh checkout (no KVM, no sudo). -func isListMode() bool { - for _, a := range os.Args[1:] { - if a == "-test.list" || strings.HasPrefix(a, "-test.list=") { - return true - } - } - return false -} - -// requireEnv reads and validates the three BANGER_SMOKE_* env vars and -// confirms the binaries they point at exist and are executable. Returns -// a single descriptive error so a contributor running by hand sees -// exactly which variable is missing. 
-func requireEnv() error { - binDir := os.Getenv("BANGER_SMOKE_BIN_DIR") - if binDir == "" { - return errors.New("BANGER_SMOKE_BIN_DIR not set; run via `make smoke`") - } - cov := os.Getenv("BANGER_SMOKE_COVER_DIR") - if cov == "" { - return errors.New("BANGER_SMOKE_COVER_DIR not set; run via `make smoke`") - } - xdg := os.Getenv("BANGER_SMOKE_XDG_DIR") - if xdg == "" { - return errors.New("BANGER_SMOKE_XDG_DIR not set; run via `make smoke`") - } - - bangerBin = filepath.Join(binDir, "banger") - bangerdBin = filepath.Join(binDir, "bangerd") - vsockBin = filepath.Join(binDir, "banger-vsock-agent") - coverDir = cov - scratchRoot = xdg - - for _, bin := range []string{bangerBin, bangerdBin, vsockBin} { - st, err := os.Stat(bin) - if err != nil { - return fmt.Errorf("smoke binary missing: %s: %w", bin, err) - } - if st.Mode()&0o111 == 0 { - return fmt.Errorf("smoke binary not executable: %s", bin) - } - } - - if err := os.MkdirAll(coverDir, 0o755); err != nil { - return fmt.Errorf("mkdir cover dir: %w", err) - } - // Reset the scratch root each run — leftover state from a prior - // crashed run would otherwise leak into this one's fixtures. - if err := os.RemoveAll(scratchRoot); err != nil { - return fmt.Errorf("clean scratch root: %w", err) - } - if err := os.MkdirAll(scratchRoot, 0o755); err != nil { - return fmt.Errorf("mkdir scratch root: %w", err) - } - - rt, err := os.MkdirTemp(scratchRoot, "runtime-") - if err != nil { - return fmt.Errorf("mktemp runtime: %w", err) - } - runtimeDir = rt - - u, err := user.Current() - if err != nil { - return fmt.Errorf("user.Current: %w", err) - } - smokeOwner = u.Username - - return nil -} - -// installPreamble mirrors scripts/smoke.sh's install_preamble. Refuses to -// overwrite a non-smoke install, otherwise installs the instrumented -// services, runs doctor, drops the smoke-tuned config, and restarts. 
-func installPreamble() error { - if installExists() { - if markerExists() { - fmt.Fprintln(os.Stderr, "[smoke] found stale smoke-owned install; purging it first") - _ = exec.Command("sudo", "env", "GOCOVERDIR="+coverDir, bangerBin, - "system", "uninstall", "--purge").Run() - } else { - return errors.New("banger is already installed on this host; supported-path smoke refuses to overwrite a non-smoke install") - } - } - - // Wipe the user-side known_hosts. Fresh VMs reuse guest IPs with - // new host keys every run; a stale entry trips StrictHostKeyChecking. - // scripts/smoke.sh:374-380 explains why this is host-side, not - // daemon-side state. - if home, err := os.UserHomeDir(); err == nil { - _ = os.Remove(filepath.Join(home, ".local", "state", "banger", "ssh", "known_hosts")) - } - - fmt.Fprintln(os.Stderr, "[smoke] installing smoke-owned services") - install := exec.Command("sudo", "env", - "GOCOVERDIR="+coverDir, - "BANGER_SYSTEM_GOCOVERDIR="+serviceCoverDir, - "BANGER_ROOT_HELPER_GOCOVERDIR="+serviceCoverDir, - bangerBin, "system", "install", "--owner", smokeOwner, - ) - if out, err := install.CombinedOutput(); err != nil { - return fmt.Errorf("system install: %w\n%s", err, out) - } - if out, err := exec.Command("sudo", "touch", smokeMarker).CombinedOutput(); err != nil { - return fmt.Errorf("touch smoke marker: %w\n%s", err, out) - } - - if err := assertServicesActive("after install"); err != nil { - return err - } - - fmt.Fprintln(os.Stderr, "[smoke] doctor: checking host readiness") - if out, err := exec.Command(bangerBin, "doctor").CombinedOutput(); err != nil { - return fmt.Errorf("doctor reported failures; fix the host before running smoke:\n%s", out) - } - - fmt.Fprintln(os.Stderr, "[smoke] writing smoke-tuned daemon config") - if err := writeSmokeConfig(); err != nil { - return err - } - - fmt.Fprintln(os.Stderr, "[smoke] system restart: services should come back cleanly") - restart := exec.Command("sudo", "env", "GOCOVERDIR="+coverDir, - bangerBin, 
"system", "restart") - if out, err := restart.CombinedOutput(); err != nil { - return fmt.Errorf("system restart: %w\n%s", err, out) - } - return assertServicesActive("after restart") -} - -// installExists checks /etc/banger/install.toml under sudo (the dir is -// not always world-readable). -func installExists() bool { - return exec.Command("sudo", "test", "-f", "/etc/banger/install.toml").Run() == nil -} - -func markerExists() bool { - return exec.Command("sudo", "test", "-f", smokeMarker).Run() == nil -} - -var ( - statusOwnerRE = regexp.MustCompile(`(?m)^active\s+active\b`) - statusHelperRE = regexp.MustCompile(`(?m)^helper_active\s+active\b`) -) - -func assertServicesActive(label string) error { - out, err := exec.Command(bangerBin, "system", "status").CombinedOutput() - if err != nil { - return fmt.Errorf("system status %s: %w\n%s", label, err, out) - } - if !statusOwnerRE.Match(out) { - return fmt.Errorf("owner daemon not active %s:\n%s", label, out) - } - if !statusHelperRE.Match(out) { - return fmt.Errorf("root helper not active %s:\n%s", label, out) - } - return nil -} - -// writeSmokeConfig drops smokeConfigTOML at /etc/banger/config.toml via -// `sudo tee`. tee is the path of least resistance for "write to a root- -// owned file from a non-root process". -func writeSmokeConfig() error { - cmd := exec.Command("sudo", "tee", "/etc/banger/config.toml") - cmd.Stdin = strings.NewReader(smokeConfigTOML) - cmd.Stdout = io.Discard - cmd.Stderr = os.Stderr - if err := cmd.Run(); err != nil { - return fmt.Errorf("write smoke config: %w", err) - } - return nil -} - -// teardown is the equivalent of scripts/smoke.sh's `cleanup` trap. It -// best-efforts every step — partial failures during teardown should -// not mask the test outcome. 
-func teardown() { - shutdownReleaseServer() - stopServicesForCoverage() - collectServiceCoverage() - _ = exec.Command("sudo", "env", "GOCOVERDIR="+coverDir, bangerBin, - "system", "uninstall", "--purge").Run() - _ = os.RemoveAll(scratchRoot) -} - -func stopServicesForCoverage() { - _ = exec.Command("sudo", "systemctl", "stop", ownerService, rootService).Run() -} - -// collectServiceCoverage copies covmeta.* / covcounters.* out of -// /var/lib/banger into BANGER_SMOKE_COVER_DIR, chowning to the test -// user so subsequent `go tool covdata` invocations can read them. -// Mirrors the inline `sudo bash -lc '...'` in scripts/smoke.sh:307-325. -func collectServiceCoverage() { - uid := fmt.Sprint(os.Getuid()) - gid := fmt.Sprint(os.Getgid()) - const script = ` -shopt -s nullglob -for file in "$1"/covmeta.* "$1"/covcounters.*; do - base="${file##*/}" - cp "$file" "$2/$base" - chown "$3:$4" "$2/$base" - chmod 0644 "$2/$base" -done -` - _ = exec.Command("sudo", "bash", "-c", script, "bash", - serviceCoverDir, coverDir, uid, gid).Run() -} diff --git a/internal/smoketest/smoke_test.go b/internal/smoketest/smoke_test.go deleted file mode 100644 index 53544b7..0000000 --- a/internal/smoketest/smoke_test.go +++ /dev/null @@ -1,72 +0,0 @@ -//go:build smoke - -package smoketest - -import "testing" - -// TestSmoke is the single top-level test that pins run-order across -// scenario classes: -// -// - "pool" runs pure scenarios concurrently (each calls t.Parallel) -// alongside the repodir chain, which runs its own subtests -// sequentially. The pool subtest only returns once every t.Parallel -// child has finished. -// - "global" runs after pool, serially, in registry order. These -// scenarios assert host-wide state (iptables, vm row counts, -// ssh-config under a fake HOME, the update / rollback flow, daemon -// stop) and would race with the parallel pool. -// -// `go test -parallel N` controls fan-out within the pool. 
`-run -// TestSmoke/pool/bare_run` runs a single scenario without changing -// the install preamble path. -func TestSmoke(t *testing.T) { - t.Run("pool", func(t *testing.T) { - // Pure scenarios — t.Parallel inside each, fan out under -parallel. - t.Run("bare_run", testBareRun) - t.Run("exit_code", testExitCode) - t.Run("concurrent_run", testConcurrentRun) - t.Run("detach_run", testDetachRun) - t.Run("bootstrap_precondition", testBootstrapPrecondition) - t.Run("vm_lifecycle", testVMLifecycle) - t.Run("vm_set", testVMSet) - t.Run("vm_restart", testVMRestart) - t.Run("vm_kill", testVMKill) - t.Run("vm_ports", testVMPorts) - t.Run("ssh_config", testSSHConfig) - - // Repodir chain — single virtual job in the pool. Subtests run - // sequentially because they share the throwaway git repo at - // repoDir and mutate it; t.Parallel() is intentionally absent. - // The chain itself competes with the pure scenarios for a - // parallel slot at this outer level. - t.Run("repodir_chain", func(t *testing.T) { - t.Parallel() - t.Run("workspace_run", testWorkspaceRun) - t.Run("workspace_dryrun", testWorkspaceDryrun) - t.Run("include_untracked", testIncludeUntracked) - t.Run("workspace_export", testWorkspaceExport) - t.Run("workspace_full_copy", testWorkspaceFullCopy) - t.Run("workspace_basecommit", testWorkspaceBasecommit) - t.Run("workspace_restart", testWorkspaceRestart) - t.Run("vm_exec", testVMExec) - }) - }) - - // Global scenarios — serial, after the pool drains. Order matters: - // daemon_admin tears the installed services down and must be LAST. - // The order otherwise mirrors scripts/smoke.sh's SMOKE_SCENARIOS - // registry so the run shape is comparable. 
- t.Run("global", func(t *testing.T) { - t.Run("vm_prune", testVMPrune) - t.Run("nat", testNAT) - t.Run("invalid_spec", testInvalidSpec) - t.Run("invalid_name", testInvalidName) - t.Run("update_check", testUpdateCheck) - t.Run("update_to_unknown", testUpdateToUnknown) - t.Run("update_no_root", testUpdateNoRoot) - t.Run("update_dry_run", testUpdateDryRun) - t.Run("update_keeps_vm_alive", testUpdateKeepsVMAlive) - t.Run("update_rollback_keeps_vm_alive", testUpdateRollbackKeepsVMAlive) - t.Run("daemon_admin", testDaemonAdmin) - }) -} diff --git a/internal/updater/manifest.go b/internal/updater/manifest.go index 1ae35d0..96156f8 100644 --- a/internal/updater/manifest.go +++ b/internal/updater/manifest.go @@ -75,23 +75,15 @@ type Release struct { // Release. const ManifestSchemaVersion = 1 -// FetchManifest downloads the release manifest from the embedded -// canonical URL and validates its shape. Returns an error if the -// server is unreachable, returns non-2xx, exceeds the size cap, or -// the schema_version is newer than this CLI knows. +// FetchManifest downloads the release manifest and validates its +// shape. Returns an error if the server is unreachable, returns +// non-2xx, exceeds the size cap, or the schema_version is newer +// than this CLI knows. func FetchManifest(ctx context.Context, client *http.Client) (Manifest, error) { - return FetchManifestFrom(ctx, client, manifestURL) -} - -// FetchManifestFrom is FetchManifest against an explicit URL. Used by -// the smoke suite (via `banger update --manifest-url …`) to drive the -// updater against a locally-served fake manifest. Production callers -// stick with FetchManifest. 
-func FetchManifestFrom(ctx context.Context, client *http.Client, url string) (Manifest, error) { if client == nil { client = http.DefaultClient } - req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil) + req, err := http.NewRequestWithContext(ctx, http.MethodGet, manifestURL, nil) if err != nil { return Manifest{}, err } diff --git a/internal/updater/verify_signature.go b/internal/updater/verify_signature.go index d2a9985..e239743 100644 --- a/internal/updater/verify_signature.go +++ b/internal/updater/verify_signature.go @@ -61,26 +61,18 @@ var ErrSignatureRequired = errors.New("banger release public key is the placehol // VerifyBlobSignature checks that sigBase64 is a valid cosign-blob // signature over body, made with the private counterpart of -// BangerReleasePublicKey. +// BangerReleasePublicKey. cosign's blob signature format is a +// base64-encoded ASN.1-DER ECDSA signature over SHA256(body) — that's +// what the package's ecdsa.VerifyASN1 verifies natively. +// +// Refuses outright if the embedded public key is still the build- +// time placeholder, so an unset key can't slip through as +// "verification disabled." func VerifyBlobSignature(body, sigBase64 []byte) error { - return VerifyBlobSignatureWithKey(body, sigBase64, BangerReleasePublicKey) -} - -// VerifyBlobSignatureWithKey is VerifyBlobSignature against an -// explicit PEM-encoded public key. Used by the smoke suite (via -// `banger update --pubkey-file …`) so an end-to-end update test can -// trust a locally-generated keypair without rebuilding the binary. -// -// Refuses outright if pubKeyPEM is the build-time placeholder so an -// unset key can't slip through as "verification disabled". -// -// cosign's blob signature format is a base64-encoded ASN.1-DER ECDSA -// signature over SHA256(body) — that's what ecdsa.VerifyASN1 takes. 
-func VerifyBlobSignatureWithKey(body, sigBase64 []byte, pubKeyPEM string) error { - if isPlaceholderKey(pubKeyPEM) { + if isPlaceholderKey(BangerReleasePublicKey) { return ErrSignatureRequired } - block, _ := pem.Decode([]byte(pubKeyPEM)) + block, _ := pem.Decode([]byte(BangerReleasePublicKey)) if block == nil { return fmt.Errorf("decode banger release public key: no PEM block") } @@ -104,21 +96,15 @@ func VerifyBlobSignatureWithKey(body, sigBase64 []byte, pubKeyPEM string) error } // FetchAndVerifySignature pulls the SHA256SUMS.sig URL from the -// release, downloads it (capped), and verifies it against sumsBody. -// Returns nil on a clean pass, or an error describing exactly why -// verification failed. +// release, downloads it (capped), and verifies it against +// sumsBody. Returns nil on a clean pass, or an error describing +// exactly why verification failed. // // If release.SHA256SumsSigURL is empty, treat that as "release was // not signed" — refuse rather than silently proceeding. v0.1.0 // requires every release to be cosign-signed; an unsigned release // is a manifest publishing bug we'd rather catch loudly. func FetchAndVerifySignature(ctx context.Context, client *http.Client, release Release, sumsBody []byte) error { - return FetchAndVerifySignatureWithKey(ctx, client, release, sumsBody, BangerReleasePublicKey) -} - -// FetchAndVerifySignatureWithKey is FetchAndVerifySignature against -// an explicit PEM-encoded public key. 
-func FetchAndVerifySignatureWithKey(ctx context.Context, client *http.Client, release Release, sumsBody []byte, pubKeyPEM string) error { if strings.TrimSpace(release.SHA256SumsSigURL) == "" { return fmt.Errorf("release %s has no sha256sums_sig_url; refusing to install an unsigned release", release.Version) } @@ -129,7 +115,7 @@ func FetchAndVerifySignatureWithKey(ctx context.Context, client *http.Client, re if err != nil { return fmt.Errorf("fetch signature: %w", err) } - if err := VerifyBlobSignatureWithKey(sumsBody, sig, pubKeyPEM); err != nil { + if err := VerifyBlobSignature(sumsBody, sig); err != nil { return fmt.Errorf("verify SHA256SUMS signature: %w", err) } return nil diff --git a/internal/updater/verify_smoke_check_test.go b/internal/updater/verify_smoke_check_test.go deleted file mode 100644 index 6929880..0000000 --- a/internal/updater/verify_smoke_check_test.go +++ /dev/null @@ -1,54 +0,0 @@ -package updater - -import ( - "os/exec" - "path/filepath" - "testing" -) - -// TestVerifyBlobSignatureWithOpenSSL is a confidence test for the -// smoke release-builder path: openssl's `dgst -sha256 -sign` produces -// the exact same encoding cosign emits for blob signatures (base64 -// ASN.1 ECDSA over SHA256(body)). If this ever stops verifying, the -// smoke update scenarios will silently skip the signature check — -// catching it here avoids a heisenbug in scripts/smoke.sh. 
-func TestVerifyBlobSignatureWithOpenSSL(t *testing.T) { - if _, err := exec.LookPath("openssl"); err != nil { - t.Skip("openssl not on PATH") - } - dir := t.TempDir() - keyPath := filepath.Join(dir, "cosign.key") - pubPath := filepath.Join(dir, "cosign.pub") - bodyPath := filepath.Join(dir, "body.txt") - sigPath := filepath.Join(dir, "body.sig") - - mustRun := func(name string, args ...string) { - t.Helper() - out, err := exec.Command(name, args...).CombinedOutput() - if err != nil { - t.Fatalf("%s %v: %v\n%s", name, args, err, string(out)) - } - } - - mustRun("openssl", "ecparam", "-name", "prime256v1", "-genkey", "-noout", "-out", keyPath) - mustRun("openssl", "ec", "-in", keyPath, "-pubout", "-out", pubPath) - mustRun("sh", "-c", "printf 'banger smoke release sums\n' > "+bodyPath) - mustRun("sh", "-c", "openssl dgst -sha256 -sign "+keyPath+" "+bodyPath+" | base64 -w0 > "+sigPath) - - body := readFile(t, bodyPath) - sig := readFile(t, sigPath) - pub := readFile(t, pubPath) - - if err := VerifyBlobSignatureWithKey(body, sig, string(pub)); err != nil { - t.Fatalf("VerifyBlobSignatureWithKey: %v", err) - } -} - -func readFile(t *testing.T, p string) []byte { - t.Helper() - out, err := exec.Command("cat", p).Output() - if err != nil { - t.Fatalf("read %s: %v", p, err) - } - return out -} diff --git a/scripts/install.sh b/scripts/install.sh index 9b8f0fd..a19edd5 100755 --- a/scripts/install.sh +++ b/scripts/install.sh @@ -168,16 +168,9 @@ About to install banger $TARGET_VERSION (requires sudo): /etc/systemd/system/bangerd.service (background daemon) /etc/systemd/system/bangerd-root.service (privileged helper) -banger needs your permission to: - - • set up VM networking (bridges, NAT, DNS routing for .vm) - • manage VM storage (rootfs snapshots, loop devices, image files) - • launch and stop firecracker processes under jailer isolation - • install the binaries to /usr/local and the systemd units above - -Once installed, day-to-day commands like 'banger vm run' and 
-'banger image pull' run as you. Only the narrow set of operations -above goes through the privileged helper service. +Why sudo: banger needs permission to automatically manage network +access for the VMs you launch. The privileged work runs in a small +helper service; the rest runs as you. For details, see: $TRUST_DOC_URL @@ -227,8 +220,8 @@ banger $TARGET_VERSION installed. Next steps: banger doctor # confirm host readiness + banger image pull debian-bookworm # fetch a default image banger vm run # boot a sandbox - banger ssh-config --install # optional: enable 'ssh .vm' Updates land via: banger update --check diff --git a/scripts/smoke.sh b/scripts/smoke.sh new file mode 100644 index 0000000..0df7744 --- /dev/null +++ b/scripts/smoke.sh @@ -0,0 +1,1038 @@ +#!/usr/bin/env bash +# +# scripts/smoke.sh — end-to-end smoke suite for banger's supported +# two-service systemd model. +# +# Installs instrumented binaries as temporary bangerd.service + +# bangerd-root.service, drives real Firecracker/KVM scenarios, collects +# covdata from both services plus the CLI, then purges the smoke-owned +# install on exit. +# +# Because the supported path is global host state, smoke refuses to +# overwrite a pre-existing non-smoke install. If a prior smoke crashed, +# rerun `make smoke-clean` or `make smoke`; the smoke marker lets the +# harness purge only its own stale install safely. +# +# Scratch files live under $BANGER_SMOKE_XDG_DIR (historic name kept for +# make-compat). Service state uses the real supported system paths and is +# purged by the smoke cleanup path. 
+# +# Usage: +# scripts/smoke.sh # full suite, serial +# scripts/smoke.sh --list # cheap discovery, no install +# scripts/smoke.sh --scenario NAME # single scenario +# scripts/smoke.sh --scenario a,b,c # comma list, registry order +# scripts/smoke.sh --jobs N # parallel dispatch (default 1) +# scripts/smoke.sh -h | --help # this help +# +# Exit codes: +# 0 success +# 1 assertion failed +# 2 usage error (unknown scenario, bad flag) +# 77 scenario explicitly selected but env can't run it (autotools "skip") + +set -euo pipefail + +log() { printf '[smoke] %s\n' "$*" >&2; } +die() { printf '[smoke] FAIL: %s\n' "$*" >&2; exit 1; } +usage_die() { printf '[smoke] usage: %s\n' "$*" >&2; exit 2; } + +wait_for_ssh() { + local vm="$1" + local deadline=$(( $(date +%s) + 60 )) + while (( $(date +%s) < deadline )); do + if "$BANGER" vm ssh "$vm" -- true >/dev/null 2>&1; then + return 0 + fi + sleep 1 + done + return 1 +} + +# --------------------------------------------------------------------- +# Scenario registry. Order in SMOKE_SCENARIOS is the run order for full +# suite mode and the order shown in --list. Class drives parallelism: +# pure — independent VMs, parallel-safe +# repodir — share $repodir mutations; serial chain in registry order +# global — assert host-global state (iptables, vm row counts, ssh-config +# on a fake HOME); run serially after everything else +# Names are bash function suffixes — `scenario_` must exist. 
+# --------------------------------------------------------------------- +SMOKE_SCENARIOS=( + bare_run + workspace_run + exit_code + workspace_dryrun + include_untracked + workspace_export + concurrent_run + vm_lifecycle + vm_set + vm_restart + vm_kill + vm_prune + vm_ports + workspace_full_copy + workspace_basecommit + workspace_restart + vm_exec + ssh_config + nat + invalid_spec + invalid_name +) + +declare -A SMOKE_DESCS=( + [bare_run]="bare vm run: create + start + ssh + echo + --rm" + [workspace_run]="workspace vm run: ship git repo, read file in guest" + [exit_code]="exit-code propagation: guest sh -c 'exit 42' returns rc=42" + [workspace_dryrun]="workspace dry-run: list tracked files without a VM" + [include_untracked]="--include-untracked ships files outside the git index" + [workspace_export]="workspace export round-trip: guest edit -> patch marker" + [concurrent_run]="two parallel --rm invocations both succeed" + [vm_lifecycle]="explicit create / stop / start / ssh / delete" + [vm_set]="reconfigure vcpu while stopped; guest sees new count" + [vm_restart]="restart verb: boot_id changes" + [vm_kill]="vm kill --signal KILL: stopped, no leaked dm device" + [vm_prune]="prune -f removes stopped VMs, preserves running ones" + [vm_ports]="vm ports: sshd :22 visible via VM DNS name" + [workspace_full_copy]="workspace prepare --mode full_copy: alternate transfer path" + [workspace_basecommit]="workspace export --base-commit: guest commits captured" + [workspace_restart]="workspace prepare -> stop -> start preserves marker" + [vm_exec]="vm exec: auto-cd, exit-code, stale-warn, --auto-prepare resync" + [ssh_config]="ssh-config --install / --uninstall: idempotent, HOME-isolated" + [nat]="--nat installs per-VM MASQUERADE; control VM does not" + [invalid_spec]="--vcpu 0 rejected, no VM row leaked" + [invalid_name]="bad names (uppercase/space/dot/leading-hyphen) all rejected" +) + +declare -A SMOKE_CLASS=( + [bare_run]=pure + [workspace_run]=repodir + [exit_code]=pure + 
[workspace_dryrun]=repodir + [include_untracked]=repodir + [workspace_export]=repodir + [concurrent_run]=pure + [vm_lifecycle]=pure + [vm_set]=pure + [vm_restart]=pure + [vm_kill]=pure + [vm_prune]=global + [vm_ports]=pure + [workspace_full_copy]=repodir + [workspace_basecommit]=repodir + [workspace_restart]=repodir + [vm_exec]=repodir + [ssh_config]=pure + [nat]=global + [invalid_spec]=global + [invalid_name]=global +) + +usage() { + cat <<'EOF' +scripts/smoke.sh — banger end-to-end smoke suite + +Usage: + scripts/smoke.sh run the full suite (serial) + scripts/smoke.sh --list list all scenarios (no install) + scripts/smoke.sh --scenario NAME run a single scenario + scripts/smoke.sh --scenario a,b,c run a comma-separated list + scripts/smoke.sh --jobs N parallel dispatch (default 1) + scripts/smoke.sh -h | --help this help + +Notes: + --list works on a fresh checkout — no sudo, no KVM, no smoke-build. + --jobs N caps at min(N, 8). Smoke-tuned VMs default to 1 GiB RAM / + 2 GiB work disk, so 8 parallel slots fit comfortably on most hosts. + Scenarios in the 'repodir' class share fixture mutations and run as + a serial chain regardless of --jobs. Scenarios in 'global' (vm prune, + NAT, invalid-spec/name) run serially after the parallel pool because + they assert host-wide state. + +Exit codes: 0 ok, 1 fail, 2 usage error, 77 explicit selection skipped. +EOF +} + +list_scenarios() { + local name + for name in "${SMOKE_SCENARIOS[@]}"; do + printf ' %-22s %s\n' "$name" "${SMOKE_DESCS[$name]}" + done +} + +# --------------------------------------------------------------------- +# Argument parsing. Done before env-var checks so --list / --help work +# on a fresh checkout, and so a typo in --scenario fails before we +# touch sudo / system install. 
+# --------------------------------------------------------------------- +SMOKE_LIST=0 +SMOKE_FILTER="" +SMOKE_EXPLICIT=0 +SMOKE_JOBS=1 + +while (( $# > 0 )); do + case "$1" in + --list) + SMOKE_LIST=1; shift ;; + --scenario) + [[ $# -ge 2 ]] || usage_die "--scenario requires a name (see --list)" + SMOKE_FILTER="$2"; SMOKE_EXPLICIT=1; shift 2 ;; + --scenario=*) + SMOKE_FILTER="${1#--scenario=}"; SMOKE_EXPLICIT=1; shift ;; + --jobs) + [[ $# -ge 2 ]] || usage_die "--jobs requires N" + SMOKE_JOBS="$2"; shift 2 ;; + --jobs=*) + SMOKE_JOBS="${1#--jobs=}"; shift ;; + -h|--help) + usage; exit 0 ;; + *) + usage_die "unknown argument: $1 (try --help)" ;; + esac +done + +if (( SMOKE_LIST )); then + list_scenarios + exit 0 +fi + +# Validate --jobs. +if ! [[ "$SMOKE_JOBS" =~ ^[1-9][0-9]*$ ]]; then + usage_die "--jobs must be a positive integer; got '$SMOKE_JOBS'" +fi +if (( SMOKE_JOBS > 8 )); then + log "capping --jobs at 8 (each parallel slot boots its own smoke-tuned 1 GiB VM)" + SMOKE_JOBS=8 +fi + +# Resolve --scenario filter into SMOKE_SELECTED in registry order. +SMOKE_SELECTED=() +if [[ -n "$SMOKE_FILTER" ]]; then + declare -A _requested=() + IFS=',' read -r -a _names <<<"$SMOKE_FILTER" + for name in "${_names[@]}"; do + name="${name// /}" + [[ -n "$name" ]] || continue + if [[ -z "${SMOKE_DESCS[$name]+x}" ]]; then + printf '[smoke] unknown scenario: %s\n' "$name" >&2 + printf '[smoke] available scenarios:\n' >&2 + list_scenarios >&2 + exit 2 + fi + _requested[$name]=1 + done + for name in "${SMOKE_SCENARIOS[@]}"; do + if [[ -n "${_requested[$name]+x}" ]]; then + SMOKE_SELECTED+=("$name") + fi + done + unset _requested _names +else + SMOKE_SELECTED=("${SMOKE_SCENARIOS[@]}") +fi + +if (( ${#SMOKE_SELECTED[@]} == 0 )); then + usage_die "no scenarios selected" +fi + +# --------------------------------------------------------------------- +# Env checks. Required for any scenario; not required for --list/--help.
+# --------------------------------------------------------------------- +: "${BANGER_SMOKE_BIN_DIR:?must point at the instrumented binary dir, set by make smoke}" +: "${BANGER_SMOKE_COVER_DIR:?must point at the coverage dir, set by make smoke}" +: "${BANGER_SMOKE_XDG_DIR:?must point at the smoke scratch root, set by make smoke}" + +BANGER="$BANGER_SMOKE_BIN_DIR/banger" +BANGERD="$BANGER_SMOKE_BIN_DIR/bangerd" +VSOCK_AGENT="$BANGER_SMOKE_BIN_DIR/banger-vsock-agent" + +for bin in "$BANGER" "$BANGERD" "$VSOCK_AGENT"; do + [[ -x "$bin" ]] || die "binary missing or not executable: $bin" +done + +scratch_root="$BANGER_SMOKE_XDG_DIR" +runtime_dir= +repodir= +smoke_owner="$(id -un)" +smoke_marker='/etc/banger/.smoke-owned' +service_cover_dir='/var/lib/banger' +owner_service='bangerd.service' +root_service='bangerd-root.service' + +mkdir -p "$BANGER_SMOKE_COVER_DIR" +rm -rf "$scratch_root" +mkdir -p "$scratch_root" +runtime_dir="$(mktemp -d "$scratch_root/runtime-XXXXXX")" + +# The CLI binary itself is instrumented, so keep its covdata local. 
+export GOCOVERDIR="$BANGER_SMOKE_COVER_DIR" + +cleanup_export_vm() { + "$BANGER" vm delete smoke-export >/dev/null 2>&1 || true +} + +cleanup_prune() { + "$BANGER" vm delete smoke-prune-running >/dev/null 2>&1 || true + "$BANGER" vm delete smoke-prune-stopped >/dev/null 2>&1 || true +} + +collect_service_coverage() { + local uid gid + uid="$(id -u)" + gid="$(id -g)" + sudo bash -lc ' + set -euo pipefail + shopt -s nullglob + dst="$1" + uid="$2" + gid="$3" + src="$4" + for file in "$src"/covmeta.* "$src"/covcounters.*; do + base="${file##*/}" + cp "$file" "$dst/$base" + chown "$uid:$gid" "$dst/$base" + chmod 0644 "$dst/$base" + done + ' bash "$BANGER_SMOKE_COVER_DIR" "$uid" "$gid" "$service_cover_dir" +} + +stop_services_for_coverage() { + sudo systemctl stop "$owner_service" "$root_service" >/dev/null 2>&1 || true +} + +sudo_banger() { + sudo env GOCOVERDIR="$BANGER_SMOKE_COVER_DIR" "$@" +} + +cleanup() { + set +e + for vm in \ + smoke-lifecycle smoke-set smoke-restart smoke-kill smoke-ports smoke-fc \ + smoke-basecommit smoke-exec smoke-wsrestart smoke-nat smoke-nocnat; do + "$BANGER" vm delete "$vm" >/dev/null 2>&1 || true + done + cleanup_export_vm + cleanup_prune + stop_services_for_coverage + collect_service_coverage + sudo_banger "$BANGER" system uninstall --purge >/dev/null 2>&1 || true + rm -rf "$scratch_root" +} +trap cleanup EXIT + +install_preamble() { + if sudo test -f /etc/banger/install.toml; then + if sudo test -f "$smoke_marker"; then + log 'found stale smoke-owned install; purging it first' + sudo_banger "$BANGER" system uninstall --purge >/dev/null 2>&1 || true + else + die 'banger is already installed on this host; supported-path smoke refuses to overwrite a non-smoke install' + fi + fi + + # Wipe the user-side known_hosts. `system uninstall --purge` clears + # /var/lib/banger but the user-state known_hosts at + # ~/.local/state/banger/ssh/known_hosts is by-design left alone — it's + # the user's data, not the daemon's. 
Smoke creates VMs that reuse + # guest IPs (172.16.0.2 etc.) with fresh host keys every run, so a + # leftover entry from a prior run trips StrictHostKeyChecking and + # the daemon's wait-for-ssh sees only timeouts. Removing the file + # is safe — the daemon recreates it on first connect. + rm -f "$HOME/.local/state/banger/ssh/known_hosts" 2>/dev/null || true + + log 'installing smoke-owned services' + sudo env \ + GOCOVERDIR="$BANGER_SMOKE_COVER_DIR" \ + BANGER_SYSTEM_GOCOVERDIR="$service_cover_dir" \ + BANGER_ROOT_HELPER_GOCOVERDIR="$service_cover_dir" \ + "$BANGER" system install --owner "$smoke_owner" >/dev/null \ + || die 'system install failed' + sudo touch "$smoke_marker" + + local status_out + status_out="$("$BANGER" system status)" || die 'system status failed after install' + grep -qE '^active +active' <<<"$status_out" || die "owner daemon not active after install: $status_out" + grep -qE '^helper_active +active' <<<"$status_out" || die "root helper not active after install: $status_out" + + log 'doctor: checking host readiness' + if ! "$BANGER" doctor; then + die 'doctor reported failures; fix the host before running smoke' + fi + + # Drop a smoke-tuned config in place before the restart so the + # respawned daemon picks up small VM defaults: 2 vCPU / 1 GiB RAM / + # 2 GiB work disk / 2 GiB system overlay. Smoke scenarios assert + # behaviour, not capacity — full-size 4-vCPU / 8 GiB / 8 GiB / 8 GiB + # VMs are pure overhead here, and the size matters once `--jobs` + # multiplies it across slots. `vm_set` overrides --vcpu explicitly, + # so its 2→4 reconfigure check is unaffected by this default. + log 'writing smoke-tuned daemon config' + sudo tee /etc/banger/config.toml >/dev/null <<'TOML' || die 'failed to write smoke config' +# Smoke-tuned defaults — every VM starts small unless the scenario +# overrides --vcpu / --memory / --disk-size explicitly. 
+[vm_defaults] +vcpu = 2 +memory_mib = 1024 +disk_size = "2G" +system_overlay_size = "2G" +TOML + + log 'system restart: services should come back cleanly' + sudo_banger "$BANGER" system restart >/dev/null || die 'system restart failed' + status_out="$("$BANGER" system status)" || die 'system status failed after restart' + grep -qE '^active +active' <<<"$status_out" || die "owner daemon not active after restart: $status_out" + grep -qE '^helper_active +active' <<<"$status_out" || die "root helper not active after restart: $status_out" +} + +# setup_fixtures builds the throwaway git repo at $repodir that every +# 'repodir'-class scenario consumes. Pulled out of scenario_workspace_run +# so single-scenario invocations (e.g. --scenario workspace_dryrun) get +# the fixture even when the scenario that historically created it is +# not selected. +setup_fixtures() { + log 'setup_fixtures: preparing throwaway git repo for repodir-class scenarios' + repodir="$runtime_dir/fake-repo" + mkdir -p "$repodir" + ( + cd "$repodir" + git init -q -b main + git config commit.gpgsign false + git config user.name smoke + git config user.email smoke@smoke + echo 'smoke-workspace-marker' > smoke-file.txt + git add . + git commit -q -m init + ) +} + +# --------------------------------------------------------------------- +# Scenario implementations. Each is a function `scenario_` that +# logs its description first and then runs assertions. Bodies are the +# pre-refactor inline blocks, modulo the workspace_run fixture move. +# --------------------------------------------------------------------- + +scenario_bare_run() { + log "${SMOKE_DESCS[bare_run]}" + local bare_out + bare_out="$("$BANGER" vm run --rm -- echo smoke-bare-ok)" || die "bare vm run exit $?" 
+ grep -q 'smoke-bare-ok' <<<"$bare_out" || die "bare vm run stdout missing marker: $bare_out" +} + +scenario_workspace_run() { + log "${SMOKE_DESCS[workspace_run]}" + local ws_out + ws_out="$("$BANGER" vm run --rm "$repodir" -- cat /root/repo/smoke-file.txt)" || die "workspace vm run exit $?" + grep -q 'smoke-workspace-marker' <<<"$ws_out" || die "workspace vm run didn't ship smoke-file.txt: $ws_out" +} + +scenario_exit_code() { + log "${SMOKE_DESCS[exit_code]}" + local rc + set +e + "$BANGER" vm run --rm -- sh -c 'exit 42' + rc=$? + set -e + [[ "$rc" -eq 42 ]] || die "exit-code propagation: got rc=$rc, want 42" +} + +scenario_workspace_dryrun() { + log "${SMOKE_DESCS[workspace_dryrun]}" + local dry_out + dry_out="$("$BANGER" vm run --dry-run "$repodir")" || die "dry-run exit $?" + grep -q 'smoke-file.txt' <<<"$dry_out" || die "dry-run didn't list smoke-file.txt: $dry_out" + grep -q 'mode: tracked only' <<<"$dry_out" || die "dry-run mode line missing or wrong: $dry_out" +} + +scenario_include_untracked() { + log "${SMOKE_DESCS[include_untracked]}" + echo 'untracked-marker' > "$repodir/smoke-untracked.txt" + local inc_out + inc_out="$("$BANGER" vm run --rm --include-untracked "$repodir" -- cat /root/repo/smoke-untracked.txt)" || die "include-untracked vm run exit $?" + grep -q 'untracked-marker' <<<"$inc_out" || die "--include-untracked didn't ship the untracked file: $inc_out" + # Self-cleanup: scenario added an untracked file, scenario removes it. + rm -f "$repodir/smoke-untracked.txt" +} + +scenario_workspace_export() { + log "${SMOKE_DESCS[workspace_export]}" + "$BANGER" vm create --name smoke-export --image debian-bookworm >/dev/null \ + || die "export: vm create exit $?" + "$BANGER" vm workspace prepare smoke-export "$repodir" >/dev/null \ + || die "export: workspace prepare exit $?" + "$BANGER" vm ssh smoke-export -- sh -c 'echo guest-edit > /root/repo/new-guest-file.txt' \ + || die "export: guest-side file write exit $?" 
+ local export_patch="$runtime_dir/smoke-export.diff" + "$BANGER" vm workspace export smoke-export --output "$export_patch" \ + || die "export: workspace export exit $?" + [[ -s "$export_patch" ]] || die "export: patch file empty at $export_patch" + grep -q 'new-guest-file.txt' "$export_patch" \ + || die "export: patch missing new-guest-file.txt marker (head: $(head -c 400 "$export_patch"))" + cleanup_export_vm +} + +scenario_concurrent_run() { + log "${SMOKE_DESCS[concurrent_run]}" + local tmpA="$runtime_dir/concurrent-a.out" + local tmpB="$runtime_dir/concurrent-b.out" + "$BANGER" vm run --rm -- echo smoke-concurrent-a > "$tmpA" 2>&1 & + local pidA=$! + "$BANGER" vm run --rm -- echo smoke-concurrent-b > "$tmpB" 2>&1 & + local pidB=$! + wait "$pidA" || die "concurrent VM A exited non-zero: $(cat "$tmpA")" + wait "$pidB" || die "concurrent VM B exited non-zero: $(cat "$tmpB")" + grep -q 'smoke-concurrent-a' "$tmpA" || die "concurrent VM A missing marker: $(cat "$tmpA")" + grep -q 'smoke-concurrent-b' "$tmpB" || die "concurrent VM B missing marker: $(cat "$tmpB")" +} + +scenario_vm_lifecycle() { + log "${SMOKE_DESCS[vm_lifecycle]}" + local lifecycle_name=smoke-lifecycle + local show_out ssh_out rc + "$BANGER" vm create --name "$lifecycle_name" >/dev/null || die "vm create $lifecycle_name failed" + show_out="$("$BANGER" vm show "$lifecycle_name")" || die "vm show after create failed" + grep -q '"state": "running"' <<<"$show_out" || die "post-create state not running: $show_out" + + wait_for_ssh "$lifecycle_name" || die 'vm lifecycle: ssh did not come up after create' + ssh_out="$("$BANGER" vm ssh "$lifecycle_name" -- echo hello-1)" || die "vm ssh #1 failed" + grep -q 'hello-1' <<<"$ssh_out" || die "vm ssh #1 missing marker: $ssh_out" + + "$BANGER" vm stop "$lifecycle_name" >/dev/null || die "vm stop failed" + show_out="$("$BANGER" vm show "$lifecycle_name")" || die "vm show after stop failed" + grep -q '"state": "stopped"' <<<"$show_out" || die "post-stop state not 
stopped: $show_out" + + "$BANGER" vm start "$lifecycle_name" >/dev/null || die "vm start (from stopped) failed" + show_out="$("$BANGER" vm show "$lifecycle_name")" || die "vm show after start failed" + grep -q '"state": "running"' <<<"$show_out" || die "post-start state not running: $show_out" + + wait_for_ssh "$lifecycle_name" || die 'vm lifecycle: ssh did not come up after restart' + ssh_out="$("$BANGER" vm ssh "$lifecycle_name" -- echo hello-2)" || die "vm ssh #2 (post-restart) failed" + grep -q 'hello-2' <<<"$ssh_out" || die "vm ssh #2 missing marker: $ssh_out" + + "$BANGER" vm delete "$lifecycle_name" >/dev/null || die "vm delete failed" + set +e + "$BANGER" vm show "$lifecycle_name" >/dev/null 2>&1 + rc=$? + set -e + [[ "$rc" -ne 0 ]] || die "vm show still finds $lifecycle_name after delete" +} + +scenario_vm_set() { + log "${SMOKE_DESCS[vm_set]}" + local nproc_before nproc_after rc + "$BANGER" vm create --name smoke-set --vcpu 2 >/dev/null || die 'vm set: create failed' + wait_for_ssh smoke-set || die 'vm set: initial ssh did not come up' + + set +e + nproc_before="$("$BANGER" vm ssh smoke-set -- nproc 2>/dev/null)" + rc=$? + set -e + [[ "$rc" -eq 0 ]] || die "vm set: initial nproc ssh exit $rc" + [[ "$(printf '%s' "$nproc_before" | tr -d '[:space:]')" == "2" ]] \ + || die "vm set: initial nproc got '$nproc_before', want 2" + + "$BANGER" vm stop smoke-set >/dev/null || die 'vm set: stop failed' + "$BANGER" vm set smoke-set --vcpu 4 >/dev/null || die 'vm set: reconfigure failed' + "$BANGER" vm start smoke-set >/dev/null || die 'vm set: restart failed' + wait_for_ssh smoke-set || die 'vm set: post-reconfig ssh did not come up' + + set +e + nproc_after="$("$BANGER" vm ssh smoke-set -- nproc 2>/dev/null)" + rc=$? 
+ set -e + [[ "$rc" -eq 0 ]] || die "vm set: post-reconfig nproc ssh exit $rc" + [[ "$(printf '%s' "$nproc_after" | tr -d '[:space:]')" == "4" ]] \ + || die "vm set: post-reconfig nproc got '$nproc_after', want 4 (spec change didn't land)" + + "$BANGER" vm delete smoke-set >/dev/null || die 'vm set: delete failed' +} + +scenario_vm_restart() { + log "${SMOKE_DESCS[vm_restart]}" + local boot_before boot_after + "$BANGER" vm create --name smoke-restart >/dev/null || die 'vm restart: create failed' + wait_for_ssh smoke-restart || die 'vm restart: initial ssh never came up' + boot_before="$("$BANGER" vm ssh smoke-restart -- cat /proc/sys/kernel/random/boot_id | tr -d '[:space:]')" + [[ -n "$boot_before" ]] || die 'vm restart: could not read initial boot_id' + + "$BANGER" vm restart smoke-restart >/dev/null || die 'vm restart: verb failed' + wait_for_ssh smoke-restart || die 'vm restart: ssh did not come up after restart' + boot_after="$("$BANGER" vm ssh smoke-restart -- cat /proc/sys/kernel/random/boot_id | tr -d '[:space:]')" + [[ -n "$boot_after" ]] || die 'vm restart: could not read post-restart boot_id' + [[ "$boot_before" != "$boot_after" ]] \ + || die "vm restart: boot_id unchanged ($boot_before); verb didn't actually reboot the guest" + + "$BANGER" vm delete smoke-restart >/dev/null || die 'vm restart: delete failed' +} + +scenario_vm_kill() { + log "${SMOKE_DESCS[vm_kill]}" + local dm_name show_out + "$BANGER" vm create --name smoke-kill >/dev/null || die 'vm kill: create failed' + dm_name="$("$BANGER" vm show smoke-kill 2>/dev/null | awk -F'"' '/"dm_dev"|fc-rootfs-/ {for(i=1;i<=NF;i++) if($i~/^fc-rootfs-/) print $i}' | head -1 || true)" + "$BANGER" vm kill --signal KILL smoke-kill >/dev/null || die 'vm kill: verb failed' + show_out="$("$BANGER" vm show smoke-kill)" || die 'vm kill: show after kill failed' + grep -q '"state": "stopped"' <<<"$show_out" || die "vm kill: post-kill state not stopped: $show_out" + if [[ -n "$dm_name" ]]; then + if sudo -n dmsetup ls 
2>/dev/null | awk '{print $1}' | grep -qx "$dm_name"; then + die "vm kill: dm device $dm_name still mapped (cleanup didn't run)" + fi + fi + "$BANGER" vm delete smoke-kill >/dev/null || die 'vm kill: delete failed' +} + +scenario_vm_prune() { + log "${SMOKE_DESCS[vm_prune]}" + "$BANGER" vm create --name smoke-prune-running >/dev/null || die 'vm prune: create running failed' + "$BANGER" vm create --name smoke-prune-stopped >/dev/null || die 'vm prune: create stopped failed' + "$BANGER" vm stop smoke-prune-stopped >/dev/null || die 'vm prune: stop the stopped one failed' + + "$BANGER" vm prune -f >/dev/null || die 'vm prune: verb failed' + + "$BANGER" vm show smoke-prune-running >/dev/null 2>&1 || die 'vm prune: running VM was deleted (regression!)' + if "$BANGER" vm show smoke-prune-stopped >/dev/null 2>&1; then + die 'vm prune: stopped VM survived prune' + fi + + "$BANGER" vm delete smoke-prune-running >/dev/null || die 'vm prune: cleanup delete failed' +} + +scenario_vm_ports() { + log "${SMOKE_DESCS[vm_ports]}" + local ports_out + "$BANGER" vm create --name smoke-ports >/dev/null || die 'vm ports: create failed' + wait_for_ssh smoke-ports || die 'vm ports: ssh did not come up' + + ports_out="$("$BANGER" vm ports smoke-ports 2>&1)" \ + || die "vm ports: verb failed: $ports_out" + grep -q 'smoke-ports.vm:22' <<<"$ports_out" \ + || die "vm ports: expected 'smoke-ports.vm:22' in output; got: $ports_out" + grep -q 'sshd' <<<"$ports_out" \ + || die "vm ports: expected process 'sshd' in output; got: $ports_out" + + "$BANGER" vm delete smoke-ports >/dev/null || die 'vm ports: delete failed' +} + +scenario_workspace_full_copy() { + log "${SMOKE_DESCS[workspace_full_copy]}" + local fc_out + "$BANGER" vm create --name smoke-fc >/dev/null || die 'workspace fc: create failed' + "$BANGER" vm workspace prepare smoke-fc "$repodir" --mode full_copy >/dev/null \ + || die 'workspace fc: prepare --mode full_copy failed' + fc_out="$("$BANGER" vm ssh smoke-fc -- cat 
/root/repo/smoke-file.txt)" \ + || die 'workspace fc: guest read failed' + grep -q 'smoke-workspace-marker' <<<"$fc_out" \ + || die "workspace fc: marker missing in full_copy workspace: $fc_out" + + "$BANGER" vm delete smoke-fc >/dev/null || die 'workspace fc: delete failed' +} + +scenario_workspace_basecommit() { + log "${SMOKE_DESCS[workspace_basecommit]}" + "$BANGER" vm create --name smoke-basecommit >/dev/null || die 'export base: create failed' + "$BANGER" vm workspace prepare smoke-basecommit "$repodir" >/dev/null \ + || die 'export base: prepare failed' + + local base_sha + base_sha="$("$BANGER" vm ssh smoke-basecommit -- sh -c 'cd /root/repo && git rev-parse HEAD' | tr -d '[:space:]')" + [[ "${#base_sha}" -eq 40 ]] || die "export base: bad base sha: $base_sha" + + "$BANGER" vm ssh smoke-basecommit -- sh -c "cd /root/repo && git -c user.email=smoke@smoke -c user.name=smoke checkout -b smoke-branch >/dev/null 2>&1 && echo committed-marker > smoke-committed.txt && git add smoke-committed.txt && git -c user.email=smoke@smoke -c user.name=smoke commit -q -m 'guest side'" \ + || die 'export base: guest-side commit failed' + + local plain_patch="$runtime_dir/smoke-plain.diff" + "$BANGER" vm workspace export smoke-basecommit --output "$plain_patch" \ + || die 'export base: plain export failed' + if [[ -f "$plain_patch" ]] && grep -q 'smoke-committed.txt' "$plain_patch"; then + die 'export base: plain export unexpectedly captured the guest-side commit' + fi + + local base_patch="$runtime_dir/smoke-base.diff" + "$BANGER" vm workspace export smoke-basecommit --base-commit "$base_sha" --output "$base_patch" \ + || die 'export base: --base-commit export failed' + [[ -s "$base_patch" ]] || die 'export base: patch file empty' + grep -q 'smoke-committed.txt' "$base_patch" \ + || die "export base: --base-commit patch missing committed marker (head: $(head -c 400 "$base_patch"))" + + "$BANGER" vm delete smoke-basecommit >/dev/null || die 'export base: delete failed' +} + 
+scenario_workspace_restart() { + log "${SMOKE_DESCS[workspace_restart]}" + "$BANGER" vm create --name smoke-wsrestart >/dev/null \ + || die 'workspace stop/start: create failed' + "$BANGER" vm workspace prepare smoke-wsrestart "$repodir" >/dev/null \ + || die 'workspace stop/start: prepare failed' + + # Sanity: marker is present before the stop/start cycle. + local pre_out + pre_out="$("$BANGER" vm ssh smoke-wsrestart -- cat /root/repo/smoke-file.txt)" \ + || die 'workspace stop/start: pre-cycle ssh read failed' + grep -q 'smoke-workspace-marker' <<<"$pre_out" \ + || die "workspace stop/start: marker missing pre-cycle: $pre_out" + + "$BANGER" vm stop smoke-wsrestart >/dev/null \ + || die 'workspace stop/start: stop failed' + "$BANGER" vm start smoke-wsrestart >/dev/null \ + || die 'workspace stop/start: start after stop failed (rootfs corrupt?)' + wait_for_ssh smoke-wsrestart \ + || die 'workspace stop/start: ssh did not come up after restart' + + local post_out + post_out="$("$BANGER" vm ssh smoke-wsrestart -- cat /root/repo/smoke-file.txt)" \ + || die 'workspace stop/start: post-cycle ssh read failed' + grep -q 'smoke-workspace-marker' <<<"$post_out" \ + || die "workspace stop/start: marker lost across stop/start: $post_out" + + "$BANGER" vm delete smoke-wsrestart >/dev/null \ + || die 'workspace stop/start: delete failed' +} + +scenario_vm_exec() { + log "${SMOKE_DESCS[vm_exec]}" + local show_out exec_cat exec_pwd rc + "$BANGER" vm create --name smoke-exec >/dev/null || die 'vm exec: create failed' + "$BANGER" vm workspace prepare smoke-exec "$repodir" >/dev/null \ + || die 'vm exec: workspace prepare failed' + + # WORKSPACE column populated in vm show after prepare. + show_out="$("$BANGER" vm show smoke-exec)" || die 'vm exec: vm show after prepare failed' + grep -q '"guest_path": "/root/repo"' <<<"$show_out" \ + || die "vm exec: workspace.guest_path not persisted on VM record: $show_out" + + # Basic happy path: cd happens, file is read from the workspace. 
+ exec_cat="$("$BANGER" vm exec smoke-exec -- cat smoke-file.txt)" \ + || die "vm exec: cat smoke-file.txt failed" + grep -q 'smoke-workspace-marker' <<<"$exec_cat" \ + || die "vm exec: stdout missing workspace marker: $exec_cat" + + # pwd confirms the auto-cd into the prepared guest path. + exec_pwd="$("$BANGER" vm exec smoke-exec -- pwd | tr -d '[:space:]')" \ + || die 'vm exec: pwd failed' + [[ "$exec_pwd" == "/root/repo" ]] \ + || die "vm exec: pwd got '$exec_pwd', want '/root/repo' (auto-cd didn't happen)" + + # Exit-code propagation: 17 must come back as 17, verbatim. + set +e + "$BANGER" vm exec smoke-exec -- sh -c 'exit 17' >/dev/null 2>&1 + rc=$? + set -e + [[ "$rc" -eq 17 ]] || die "vm exec: exit-code propagation got rc=$rc, want 17" + + # Dirty detection: advance host HEAD, run `vm exec` without --auto-prepare, + # expect a stale-workspace warning on stderr and the new file NOT present in + # the guest (workspace was not re-synced). + ( + cd "$repodir" + echo 'post-prepare-marker' > smoke-exec-new.txt + git add smoke-exec-new.txt + git commit -q -m 'add smoke-exec-new.txt after prepare' + ) + local stale_stderr="$runtime_dir/smoke-exec-stale.err" + local ls_rc + set +e + "$BANGER" vm exec smoke-exec -- ls smoke-exec-new.txt >/dev/null 2>"$stale_stderr" + ls_rc=$? + set -e + [[ "$ls_rc" -ne 0 ]] \ + || die 'vm exec: stale workspace unexpectedly already had the new file (dirty path didn'"'"'t take effect)' + grep -q 'workspace stale' "$stale_stderr" \ + || die "vm exec: stale-workspace warning missing on stderr; got: $(cat "$stale_stderr")" + grep -q -- '--auto-prepare' "$stale_stderr" \ + || die "vm exec: stale warning didn't mention --auto-prepare hint; got: $(cat "$stale_stderr")" + + # --auto-prepare: re-syncs workspace, then runs the command. New file appears. 
+ local auto_out + auto_out="$("$BANGER" vm exec smoke-exec --auto-prepare -- cat smoke-exec-new.txt)" \ + || die 'vm exec: --auto-prepare run failed' + grep -q 'post-prepare-marker' <<<"$auto_out" \ + || die "vm exec: --auto-prepare didn't re-sync new file; got: $auto_out" + + # After auto-prepare, the warning must NOT reappear on the next exec — + # stored HEAD should now match the host. + local clean_stderr="$runtime_dir/smoke-exec-clean.err" + "$BANGER" vm exec smoke-exec -- true 2>"$clean_stderr" \ + || die 'vm exec: post-auto-prepare exec failed' + if grep -q 'workspace stale' "$clean_stderr"; then + die "vm exec: stale warning persisted after --auto-prepare; got: $(cat "$clean_stderr")" + fi + + # Self-cleanup: scenario added a host-side commit, scenario rolls it back + # so downstream repodir-class scenarios see the original tree. + ( + cd "$repodir" + git reset --hard HEAD~1 -q + ) + + # Refusal when VM is not running: exec on a stopped VM must error out + # with a clear "not running" message. Done last so we can delete from + # the stopped state without needing a restart. + "$BANGER" vm stop smoke-exec >/dev/null || die 'vm exec: stop for not-running test failed' + local stopped_err + set +e + stopped_err="$("$BANGER" vm exec smoke-exec -- true 2>&1)" + rc=$? 
+ set -e + [[ "$rc" -ne 0 ]] || die 'vm exec: exec on stopped VM unexpectedly succeeded' + grep -q 'not running' <<<"$stopped_err" \ + || die "vm exec: stopped-VM error missing 'not running' phrase: $stopped_err" + + "$BANGER" vm delete smoke-exec >/dev/null || die 'vm exec: delete failed' +} + +scenario_ssh_config() { + log "${SMOKE_DESCS[ssh_config]}" + local fake_home="$scratch_root/fake-home" + mkdir -p "$fake_home/.ssh" + printf 'Host myserver\n HostName example.invalid\n' > "$fake_home/.ssh/config" + + ( + export HOME="$fake_home" + "$BANGER" ssh-config --install >/dev/null || die 'ssh-config: install failed' + grep -q '^Include ' "$fake_home/.ssh/config" \ + || die "ssh-config: install didn't add Include line to ~/.ssh/config" + grep -q '^Host myserver' "$fake_home/.ssh/config" \ + || die 'ssh-config: install clobbered pre-existing content (!!)' + + "$BANGER" ssh-config --install >/dev/null || die 'ssh-config: second install failed' + local include_count + include_count="$(grep -c '^Include .*banger' "$fake_home/.ssh/config")" + [[ "$include_count" == "1" ]] \ + || die "ssh-config: install not idempotent (Include appeared $include_count times)" + + "$BANGER" ssh-config --uninstall >/dev/null || die 'ssh-config: uninstall failed' + if grep -q '^Include .*banger' "$fake_home/.ssh/config"; then + die 'ssh-config: uninstall left the Include line behind' + fi + grep -q '^Host myserver' "$fake_home/.ssh/config" \ + || die 'ssh-config: uninstall nuked user content (!!)' + ) +} + +scenario_nat() { + log "${SMOKE_DESCS[nat]}" + if ! sudo -n iptables -t nat -S POSTROUTING >/dev/null 2>&1; then + # Env-skip semantics: + # - implicit (no --scenario, or mixed --scenario list): soft-skip. + # - explicit (only "nat" selected): exit 77 to distinguish from + # a real failure for callers that care. 
+        if (( SMOKE_EXPLICIT == 1 )) && (( ${#SMOKE_SELECTED[@]} == 1 )) \
+            && [[ "${SMOKE_SELECTED[0]}" == "nat" ]]; then
+            log 'NAT: passwordless sudo iptables unavailable; explicit selection — exiting 77 (autotools skip)'
+            exit 77
+        fi
+        log 'NAT: skipping — passwordless sudo iptables unavailable'
+        return 0
+    fi
+
+    "$BANGER" vm create --name smoke-nat --nat >/dev/null || die 'NAT: create --nat failed'
+    "$BANGER" vm create --name smoke-nocnat >/dev/null || die 'NAT: control create failed'
+
+    local nat_ip ctl_ip postrouting rule_count
+    nat_ip="$("$BANGER" vm show smoke-nat 2>/dev/null | awk -F'"' '/"guest_ip"/ {print $4}')"
+    ctl_ip="$("$BANGER" vm show smoke-nocnat 2>/dev/null | awk -F'"' '/"guest_ip"/ {print $4}')"
+    [[ -n "$nat_ip" && -n "$ctl_ip" ]] || die "NAT: couldn't read guest IPs (nat='$nat_ip', ctl='$ctl_ip')"
+
+    postrouting="$(sudo -n iptables -t nat -S POSTROUTING 2>/dev/null || true)"
+    grep -q -- "-s $nat_ip/32.*-j MASQUERADE" <<<"$postrouting" \
+        || die "NAT: --nat VM has no POSTROUTING MASQUERADE rule for $nat_ip; got:"$'\n'"$postrouting"
+    if grep -q -- "-s $ctl_ip/32.*-j MASQUERADE" <<<"$postrouting"; then
+        die "NAT: control VM unexpectedly has a MASQUERADE rule for $ctl_ip"
+    fi
+
+    "$BANGER" vm stop smoke-nat >/dev/null || die 'NAT: stop --nat VM failed'
+    "$BANGER" vm start smoke-nat >/dev/null || die 'NAT: restart --nat VM failed'
+    postrouting="$(sudo -n iptables -t nat -S POSTROUTING 2>/dev/null || true)"
+    rule_count="$(grep -c -- "-s $nat_ip/32.*-j MASQUERADE" <<<"$postrouting" || true)"
+    [[ "$rule_count" == "1" ]] \
+        || die "NAT: MASQUERADE rule count for $nat_ip = $rule_count after restart, want 1"
+
+    "$BANGER" vm delete smoke-nat >/dev/null || die 'NAT: delete --nat VM failed'
+    "$BANGER" vm delete smoke-nocnat >/dev/null || die 'NAT: delete control VM failed'
+    postrouting="$(sudo -n iptables -t nat -S POSTROUTING 2>/dev/null || true)"
+    if grep -q -- "-s $nat_ip/32.*-j MASQUERADE" <<<"$postrouting"; then
+        die "NAT: delete left a MASQUERADE rule behind for $nat_ip"
+    fi
+}
+
+scenario_invalid_spec() {
+    log "${SMOKE_DESCS[invalid_spec]}"
+    local pre_vms post_vms rc
+    pre_vms="$("$BANGER" vm list --all 2>/dev/null | wc -l)"
+    set +e
+    "$BANGER" vm run --rm --vcpu 0 -- echo unused >/dev/null 2>&1
+    rc=$?
+    set -e
+    [[ "$rc" -ne 0 ]] || die 'invalid spec: vm run succeeded despite --vcpu 0'
+    post_vms="$("$BANGER" vm list --all 2>/dev/null | wc -l)"
+    [[ "$pre_vms" == "$post_vms" ]] || die "invalid spec leaked a VM row: pre=$pre_vms, post=$post_vms"
+}
+
+scenario_invalid_name() {
+    log "${SMOKE_DESCS[invalid_name]}"
+    local pre_vms post_vms rc
+    pre_vms="$("$BANGER" vm list --all 2>/dev/null | wc -l)"
+    for bad in 'MyBox' 'my box' 'box.vm' '-box'; do
+        set +e
+        "$BANGER" vm create --name "$bad" --no-start >/dev/null 2>&1
+        rc=$?
+        set -e
+        [[ "$rc" -ne 0 ]] || die "invalid name: vm create accepted '$bad'"
+    done
+    post_vms="$("$BANGER" vm list --all 2>/dev/null | wc -l)"
+    [[ "$pre_vms" == "$post_vms" ]] \
+        || die "invalid name leaked VM row(s): pre=$pre_vms, post=$post_vms"
+}
+
+# ---------------------------------------------------------------------
+# Dispatchers.
+# ---------------------------------------------------------------------
+
+# run_serial calls each named scenario in-process. die() exits the
+# script with rc=1 on any failure (current behavior). Stdout is
+# unbuffered — identical to the pre-refactor experience.
+run_serial() {
+    local name
+    for name in "$@"; do
+        "scenario_$name"
+    done
+}
+
+# run_repodir_chain runs the repodir scenarios serially (registry order)
+# inside a subshell so it can be backgrounded as one virtual job in the
+# parallel pool. Buffered stdout/stderr go to one logfile.
+run_repodir_chain() {
+    local logfile="$runtime_dir/parallel-repodir.log"
+    local rc=0
+    (
+        local name
+        for name in "$@"; do
+            "scenario_$name" || exit 1
+        done
+    ) >"$logfile" 2>&1 || rc=$?
+    return $rc
+}
+
+# run_one_buffered runs a single scenario in a subshell with stdout/stderr
+# captured to a per-scenario logfile. On failure the buffer is dumped on
+# the main stderr; on success only the one-line PASS is shown.
+run_one_buffered() {
+    local name=$1
+    local logfile="$runtime_dir/parallel-$name.log"
+    local rc=0
+    ( "scenario_$name" ) >"$logfile" 2>&1 || rc=$?
+    if (( rc == 0 )); then
+        printf '[smoke] %s: PASS\n' "$name" >&2
+    else
+        printf '[smoke] %s: FAIL (rc=%d)\n' "$name" "$rc" >&2
+        sed 's/^/[smoke:'"$name"'] /' "$logfile" >&2
+    fi
+    return $rc
+}
+
+# run_parallel splits the selection into pure singletons + a single fused
+# repodir chain (if any), runs them all in a slot-limited pool, then
+# runs global scenarios serially in registry order. Reports per-scenario
+# outcomes; final exit is non-zero iff any sub-job failed.
+run_parallel() {
+    local jobs=$1; shift
+    local selected=("$@")
+
+    local pure=() repodir_chain=() global=()
+    local name
+    for name in "${selected[@]}"; do
+        case "${SMOKE_CLASS[$name]}" in
+            pure) pure+=("$name") ;;
+            repodir) repodir_chain+=("$name") ;;
+            global) global+=("$name") ;;
+        esac
+    done
+
+    # Build the parallel-pool job list. The repodir chain (if any) is one
+    # virtual job — it runs its scenarios serially inside a subshell and
+    # competes with pure scenarios for a slot.
+    local pool=()
+    for name in "${pure[@]}"; do
+        pool+=("pure:$name")
+    done
+    if (( ${#repodir_chain[@]} > 0 )); then
+        pool+=("repodir:$(IFS=' '; echo "${repodir_chain[*]}")")
+    fi
+
+    log "parallel pool: ${#pool[@]} job(s), ${#global[@]} global; jobs=$jobs"
+
+    declare -A pid_kind=()
+    declare -A pid_label=()
+    local active=0
+    local failures=0
+
+    local job kind payload
+    for job in "${pool[@]}"; do
+        kind="${job%%:*}"
+        payload="${job#*:}"
+        while (( active >= jobs )); do
+            if ! wait -n; then
+                failures=$(( failures + 1 ))
+            fi
+            active=$(( active - 1 ))
+        done
+        if [[ "$kind" == "pure" ]]; then
+            run_one_buffered "$payload" &
+        else
+            # repodir chain: payload is a space-separated list of names
+            # shellcheck disable=SC2086
+            ( run_repodir_chain $payload ) &
+            local p=$!
+            pid_kind[$p]=repodir
+            pid_label[$p]="$payload"
+        fi
+        active=$(( active + 1 ))
+    done
+
+    # Drain remaining jobs.
+    while (( active > 0 )); do
+        if ! wait -n; then
+            failures=$(( failures + 1 ))
+        fi
+        active=$(( active - 1 ))
+    done
+
+    # Emit a one-line report for the repodir chain if it ran.
+    if (( ${#repodir_chain[@]} > 0 )); then
+        local logfile="$runtime_dir/parallel-repodir.log"
+        if [[ -s "$logfile" ]]; then
+            log "repodir chain log:"
+            sed 's/^/[smoke:repodir] /' "$logfile" >&2
+        fi
+    fi
+
+    if (( failures > 0 )); then
+        log "parallel pool: $failures job(s) failed"
+        exit 1
+    fi
+
+    # Global scenarios: serial, in registry order, current behavior.
+    if (( ${#global[@]} > 0 )); then
+        log "global pool: ${#global[@]} scenario(s) (serial)"
+        run_serial "${global[@]}"
+    fi
+}
+
+# ---------------------------------------------------------------------
+# Main.
+# ---------------------------------------------------------------------
+install_preamble
+setup_fixtures
+
+if (( SMOKE_JOBS == 1 )); then
+    run_serial "${SMOKE_SELECTED[@]}"
+else
+    run_parallel "$SMOKE_JOBS" "${SMOKE_SELECTED[@]}"
+fi
+
+if (( ${#SMOKE_SELECTED[@]} == ${#SMOKE_SCENARIOS[@]} )); then
+    log 'all scenarios passed'
+else
+    log "scenario(s) passed: ${SMOKE_SELECTED[*]}"
+fi