smoke: discoverable scenarios + selectable runs + parallel dispatch

`scripts/smoke.sh` was a 600-line linear script: no way to see what it
covers without reading the whole thing, and no way to run a single
scenario when iterating. Every iteration paid the full ~5-10 min suite,
which made fast feedback loops painful enough to avoid the suite.

Refactor into a registry + per-scenario functions:

- Top-of-file SMOKE_SCENARIOS (ordered) + SMOKE_DESCS (one-line desc per
  scenario) + SMOKE_CLASS (pure / repodir / global) drive both listing
  and dispatch. The 21 existing scenario blocks become scenario_<name>
  functions. Bodies are the inline blocks verbatim, modulo the workspace
  fixture move described below.
- New CLI: --list (cheap discovery, no install / no env-vars),
  --scenario NAME (or NAME,NAME,...), --jobs N (parallel dispatch),
  -h / --help.
- New setup_fixtures runs once after the install/doctor/restart preamble
  and produces the throwaway git repo at $repodir that 'repodir'-class
  scenarios consume. Lifted out of scenario_workspace_run so single-
  scenario invocations (e.g. --scenario workspace_dryrun) get the
  fixture even when the scenario that historically built it isn't
  selected.
- Wipe ~/.local/state/banger/ssh/known_hosts in the install preamble.
  `system uninstall --purge` clears /var/lib/banger but the user-side
  known_hosts persists by design — and smoke creates VMs that reuse
  guest IPs (172.16.0.2 etc.) with fresh host keys every run, so a
  leftover entry trips StrictHostKeyChecking and the daemon's wait-
  for-ssh sees only timeouts. This was the real cause of the "guest
  ssh did not come up" flakes that surface across smoke iterations.

Parallel dispatch:

- --jobs N opts into a slot-limited pool: 'pure' scenarios fan out as
  individual jobs; 'repodir' scenarios fuse into a single serial chain
  (since they mutate $repodir in registry order); 'global' scenarios
  run serially after the pool, one at a time.
- Cap is min(N, 8) — each parallel slot runs an 8 GiB VM, so RAM is
  the binding constraint.
- Parallel-mode stdout/stderr per scenario buffer to per-scenario
  logs and emit one PASS/FAIL line on completion; on FAIL the buffer
  is dumped. Serial mode (--jobs 1, the default) keeps stdout
  unbuffered exactly as before.
- Parallelism is documented as experimental in --help: it surfaces
  real daemon-side concurrency bugs (image auto-pull manifest race,
  work-seed-refresh race on the shared work-seed.ext4) that don't
  appear in serial mode and that need their own fix in the daemon.
  Serial (--jobs 1) is the reliable path; --jobs N is for fast-
  iteration dev work where occasional re-runs are acceptable.

Exit codes: 0 ok, 1 assertion failed, 2 usage error (unknown
scenario, missing SCENARIO=), 77 explicit selection skipped (NAT
when sudo iptables is unavailable AND nat is the only selected
scenario; soft-skip otherwise).

Makefile additions:

- `make smoke-list` — cheap discovery, no smoke-build dep, no env vars.
- `make smoke-one SCENARIO=name` — single-scenario run, full preamble.
  MAKECMDGOALS guard catches missing SCENARIO= before any rebuild.
- `make smoke JOBS=N` — passes through to the script's --jobs N.
- Help text covers all three.

Verified: serial full suite passes 21/21 in ~140s on this host;
make smoke-one SCENARIO=workspace_restart runs the recently-added
regression test alone in ~50s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Thales Maciel 2026-04-27 16:56:57 -03:00
parent c9358ab390
commit 115eec8576
No known key found for this signature in database
GPG key ID: 33112E6833C34679
2 changed files with 872 additions and 402 deletions

View file

@ -33,7 +33,16 @@ GO_LDFLAGS := -X banger/internal/buildinfo.Version=$(VERSION) -X banger/internal
.DEFAULT_GOAL := help
.PHONY: help build banger bangerd test fmt tidy clean install uninstall lint lint-go lint-shell coverage coverage-html coverage-total coverage-combined coverage-combined-html smoke smoke-build smoke-coverage-html smoke-clean smoke-fresh
# `make smoke-one` requires SCENARIO=. Validate before any prerequisite
# (notably smoke-build) so a typo'd invocation doesn't pay for a Go
# rebuild before learning it's wrong.
ifneq (,$(filter smoke-one,$(MAKECMDGOALS)))
ifndef SCENARIO
$(error smoke-one needs SCENARIO=name (see `make smoke-list` for names))
endif
endif
.PHONY: help build banger bangerd test fmt tidy clean install uninstall lint lint-go lint-shell coverage coverage-html coverage-total coverage-combined coverage-combined-html smoke smoke-build smoke-list smoke-one smoke-coverage-html smoke-clean smoke-fresh
help:
@printf '%s\n' \
@ -52,6 +61,9 @@ help:
' make tidy Run go mod tidy' \
' make clean Remove built Go binaries and coverage artefacts' \
' make smoke Build instrumented binaries, run the supported systemd smoke suite, report coverage (needs KVM + sudo)' \
' make smoke JOBS=N Same, but dispatch parallel-safe scenarios across N slots (1-8; default 1)' \
' make smoke-list Print the list of smoke scenarios with descriptions (no build, no install)' \
' make smoke-one SCENARIO=NAME Run a single smoke scenario (still does the install preamble)' \
' make smoke-fresh smoke-clean + smoke — purges stale smoke-owned installs before a clean supported-path run' \
' make smoke-coverage-html HTML coverage report from the last smoke run' \
' make smoke-clean Remove the smoke build tree and purge any stale smoke-owned system install'
@ -170,11 +182,28 @@ smoke: smoke-build
BANGER_SMOKE_BIN_DIR="$(abspath $(SMOKE_BIN_DIR))" \
BANGER_SMOKE_COVER_DIR="$(abspath $(SMOKE_COVER_DIR))" \
BANGER_SMOKE_XDG_DIR="$(abspath $(SMOKE_XDG_DIR))" \
bash "$(SMOKE_SCRIPT)"
bash "$(SMOKE_SCRIPT)" $(if $(JOBS),--jobs $(JOBS))
@echo ''
@echo 'Smoke coverage:'
@$(GO) tool covdata percent -i="$(SMOKE_COVER_DIR)"
# smoke-list is intentionally cheap: no smoke-build dep, no env vars.
# The script's --list path short-circuits before any side-effect or
# env validation, so this works on a fresh checkout.
smoke-list:
@bash "$(SMOKE_SCRIPT)" --list
# smoke-one runs one scenario (or a comma-separated list) with the same
# install preamble as the full suite. Useful when iterating on a specific
# scenario — see `make smoke-list` for names.
smoke-one: smoke-build
rm -rf "$(SMOKE_COVER_DIR)"
mkdir -p "$(SMOKE_COVER_DIR)" "$(SMOKE_XDG_DIR)"
BANGER_SMOKE_BIN_DIR="$(abspath $(SMOKE_BIN_DIR))" \
BANGER_SMOKE_COVER_DIR="$(abspath $(SMOKE_COVER_DIR))" \
BANGER_SMOKE_XDG_DIR="$(abspath $(SMOKE_XDG_DIR))" \
bash "$(SMOKE_SCRIPT)" --scenario "$(SCENARIO)"
smoke-coverage-html: smoke
$(GO) tool covdata textfmt -i="$(SMOKE_COVER_DIR)" -o="$(SMOKE_DIR)/cover.out"
$(GO) tool cover -html="$(SMOKE_DIR)/cover.out" -o "$(SMOKE_DIR)/cover.html"