make smoke: end-to-end boot suite with coverage from real VM runs

The unit + integration tests can't cross machine.Start — the SDK
boundary would need a fake firecracker that reimplements the
control-plane HTTP API, and the ongoing maintenance cost of keeping
that fake honest with upstream kills the value. Instead, add a
pre-release smoke target that drives REAL Firecracker + real KVM,
captures coverage from the -cover-instrumented binaries, and
surfaces per-package deltas so regressions in the boot path don't
ship silently.

scripts/smoke.sh:
  - Isolated XDG_{CONFIG,STATE,CACHE,RUNTIME} so the smoke run can't
    touch real user state (state/cache persist under build/smoke/xdg
    for fast reruns; runtime is mktemp'd fresh per-run because
    sockets can't be reused)
  - Preflight: `banger doctor` must pass; UDP :42069 must be free
    (otherwise the user's real daemon is up and the smoke daemon
    can't bind its DNS listener — fail with an actionable message)
  - Scenario 1 — bare: `banger vm run --rm -- echo smoke-bare-ok`
    exercises create → start → socket ownership chown → machine.Start
    → SDK waitForSocket race → vsock agent readiness → guest SSH
    wait → exec → cleanup → delete
  - Scenario 2 — workspace: creates a throwaway git repo, runs
    `banger vm run --rm <repo> -- cat /root/repo/smoke-file.txt`,
    verifies the tracked file reached the guest (exercises
    workDisk capability PrepareHost + workspace.prepare)
  - `banger daemon stop` at the end so instrumented binaries flush
    GOCOVERDIR pods before the script exits
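
The port preflight can be sketched in isolation as below. This is a minimal illustration, not the script's exact code: `port_in_use` is a hypothetical helper name, it folds the script's awk + grep pair into one awk call, and it reads `ss -Huln`-style lines on stdin so the matching logic can be tried without a live socket.

```shell
#!/usr/bin/env bash
# port_in_use PORT: hypothetical helper. Reads `ss -Huln`-style lines on
# stdin and succeeds if any local address (4th column) ends in ":PORT"
# or ".PORT". Anchoring on the separator keeps 2069 from matching 42069.
port_in_use() {
    local port="$1"
    awk -v p="$port" '$4 ~ ("[:.]" p "$") { found = 1 } END { exit !found }'
}

# Sample input shaped like `ss -Huln` output (no header row).
sample='UNCONN 0 0 127.0.0.1:42069 0.0.0.0:*
UNCONN 0 0 127.0.0.53%lo:53 0.0.0.0:*'

if port_in_use 42069 <<<"$sample"; then
    echo 'port 42069 is bound'
fi
```

On a real host the input would come from `ss -Huln | port_in_use 42069`. Doing the match in a single awk process also avoids an early-exiting `grep -q` interacting with `set -o pipefail`.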

Makefile additions:
  - smoke-build: builds banger/bangerd under build/smoke/bin/ with
    `go build -cover`
  - smoke: runs the script with GOCOVERDIR set, reports per-package
    coverage via `go tool covdata percent`
  - smoke-coverage-html: textfmt + go tool cover for a browsable
    report
  - smoke-clean: nukes build/smoke/ including the persisted XDG
    state
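
As a rough guide, the four targets correspond to Go toolchain invocations along these lines. This is a sketch written as shell functions rather than the Makefile itself: the `./cmd/...` package paths, the `build/smoke/cover` directory, and the `coverage.txt` / `coverage.html` file names are assumptions. The script also expects `banger-vsock-agent` in the same bin directory; how that binary is built is not described above, so it is left out here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# build/smoke/{bin,xdg} follow the commit message; cover/ is an assumption.
SMOKE_DIR="${SMOKE_DIR:-build/smoke}"
BIN_DIR="$SMOKE_DIR/bin"
COVER_DIR="$SMOKE_DIR/cover"
XDG_DIR="$SMOKE_DIR/xdg"

# smoke-build: coverage-instrumented binaries (needs Go 1.20+).
smoke_build() {
    mkdir -p "$BIN_DIR"
    go build -cover -o "$BIN_DIR/banger" ./cmd/banger    # assumed path
    go build -cover -o "$BIN_DIR/bangerd" ./cmd/bangerd  # assumed path
}

# smoke: run the script, then print per-package coverage.
smoke() {
    BANGER_SMOKE_BIN_DIR="$BIN_DIR" \
    BANGER_SMOKE_COVER_DIR="$COVER_DIR" \
    BANGER_SMOKE_XDG_DIR="$XDG_DIR" \
        scripts/smoke.sh
    go tool covdata percent -i="$COVER_DIR"
}

# smoke-coverage-html: convert covdata to a text profile, then render HTML.
smoke_coverage_html() {
    go tool covdata textfmt -i="$COVER_DIR" -o="$SMOKE_DIR/coverage.txt"
    go tool cover -html="$SMOKE_DIR/coverage.txt" -o "$SMOKE_DIR/coverage.html"
}

# smoke-clean: remove everything, including the persisted XDG state.
smoke_clean() {
    rm -rf "$SMOKE_DIR"
}
```

Nothing runs when the file is sourced; each function has to be called explicitly, mirroring one make target per call.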

Bonus fix uncovered during the first smoke run: doctor treated a
missing state.db as a FAIL ("out of memory" from SQLite
SQLITE_CANTOPEN), which red-flagged every fresh install. Split
the store check: DB file absent → PASS with "will be created on
first daemon start" detail; DB present but unreadable → FAIL as
before. New TestDoctorReport_StoreMissingSurfacesAsPassForFreshInstall
pins the behaviour.
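
The split can be illustrated in shell as follows. This is only a rendering of the decision logic: the real check lives in the Go doctor code, `check_store` is a hypothetical name, and file readability stands in for actually opening the database with SQLite.

```shell
#!/usr/bin/env bash
# check_store PATH: hypothetical shell rendering of the split store check.
# Prints one status line; returns 0 for PASS and 1 for FAIL.
check_store() {
    local db="$1"
    if [[ ! -e "$db" ]]; then
        # Fresh install: nothing to open yet, so this is not a failure.
        echo "PASS store: $db absent; will be created on first daemon start"
        return 0
    fi
    if [[ ! -r "$db" ]]; then
        # File exists but cannot be read: keep failing as before.
        echo "FAIL store: $db exists but is not readable"
        return 1
    fi
    echo "PASS store: $db present and readable"
}
```

For example, `check_store "$XDG_STATE_HOME/banger/state.db"` would report PASS on a machine where the daemon has never started (the path shown is an assumption about where the state file lives).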

Concrete coverage delta from the first successful smoke run
(compared to `make coverage-total`'s unit-test-only 37.8%):

  internal/firecracker        43.6% → 75.0%
  internal/daemon/workspace   33.8% → 60.8%
  internal/store              40.1% → 56.3%
  internal/guest              63.7% → 57.4%  (different mix: smoke
                                              exercises real SSH;
                                              unit tests cover more
                                              error branches)

The packages the review flagged are the ones that moved most —
which is the point.
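
The percentage-point deltas behind that table can be recomputed with a few lines of awk. The three-column "package before after" input is an assumption made for illustration; it is not the raw output of `go tool covdata percent`, and `coverage_delta` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# coverage_delta: reads "package before after" rows on stdin and prints
# each row with the signed difference in percentage points.
coverage_delta() {
    awk '{ printf "%-28s %5.1f%% -> %5.1f%%  (%+.1f pts)\n", $1, $2, $3, $3 - $2 }'
}

coverage_delta <<'EOF'
internal/firecracker 43.6 75.0
internal/daemon/workspace 33.8 60.8
internal/store 40.1 56.3
internal/guest 63.7 57.4
EOF
```

For the figures above this gives +31.4, +27.0, +16.2 and -6.3 points respectively.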

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thales Maciel 2026-04-22 18:59:57 -03:00
parent 52612b516b
commit 5f81332b0a
5 changed files with 221 additions and 18 deletions

scripts/smoke.sh Executable file

@@ -0,0 +1,117 @@
#!/usr/bin/env bash
#
# scripts/smoke.sh — end-to-end smoke suite for banger.
#
# Drives a real create → start → ssh → exec → delete cycle against
# real Firecracker + real KVM on the host. Intended as a pre-release
# gate: the Go unit + integration tests don't and can't cover the
# post-machine.Start path (socket ownership, guest boot, vsock agent
# wait, guest SSH, workspace prepare). If this suite fails, don't
# ship.
#
# State lives under $BANGER_SMOKE_XDG_DIR (set by `make smoke`,
# defaults to build/smoke/xdg). It's ISOLATED from the invoking
# user's real banger install via XDG_{CONFIG,STATE,CACHE,RUNTIME}
# overrides, but PERSISTED across runs — so the first smoke pulls
# the golden image, subsequent smokes reuse it. `make smoke-clean`
# wipes it.
#
# Invoked via `make smoke`, which sets the three env vars below.
# Don't run this directly unless you know they're set.
set -euo pipefail
log() { printf '[smoke] %s\n' "$*" >&2; }
die() { printf '[smoke] FAIL: %s\n' "$*" >&2; exit 1; }
: "${BANGER_SMOKE_BIN_DIR:?must point at the instrumented binary dir, set by make smoke}"
: "${BANGER_SMOKE_COVER_DIR:?must point at the coverage dir, set by make smoke}"
: "${BANGER_SMOKE_XDG_DIR:?must point at the isolated XDG root, set by make smoke}"
BANGER="$BANGER_SMOKE_BIN_DIR/banger"
BANGERD="$BANGER_SMOKE_BIN_DIR/bangerd"
VSOCK_AGENT="$BANGER_SMOKE_BIN_DIR/banger-vsock-agent"
for bin in "$BANGER" "$BANGERD" "$VSOCK_AGENT"; do
    [[ -x "$bin" ]] || die "binary missing or not executable: $bin"
done
# Persistent XDG dirs (state, cache, config) so repeated smoke
# runs reuse the pulled golden image instead of re-downloading
# ~300MB each time. Runtime dir needs to be fresh per-run because
# it holds sockets the daemon cleans up on stop and refuses to
# reuse if any are stale.
mkdir -p \
    "$BANGER_SMOKE_XDG_DIR/config" \
    "$BANGER_SMOKE_XDG_DIR/state" \
    "$BANGER_SMOKE_XDG_DIR/cache"
runtime_dir="$(mktemp -d -t banger-smoke-runtime-XXXXXX)"
# shellcheck disable=SC2064
trap "rm -rf '$runtime_dir'" EXIT
chmod 0700 "$runtime_dir"
export XDG_CONFIG_HOME="$BANGER_SMOKE_XDG_DIR/config"
export XDG_STATE_HOME="$BANGER_SMOKE_XDG_DIR/state"
export XDG_CACHE_HOME="$BANGER_SMOKE_XDG_DIR/cache"
export XDG_RUNTIME_DIR="$runtime_dir"
# Point banger at its companion binaries inside the smoke build.
export BANGER_DAEMON_BIN="$BANGERD"
export BANGER_VSOCK_AGENT_BIN="$VSOCK_AGENT"
# Instrumented binaries dump coverage here on clean exit.
export GOCOVERDIR="$BANGER_SMOKE_COVER_DIR"
mkdir -p "$GOCOVERDIR"
# Any smoke daemon left behind from a prior run that crashed mid-
# scenario would reuse the stale socket path and confuse
# ensureDaemon. Best-effort stop; ignore if nothing is running.
"$BANGER" daemon stop >/dev/null 2>&1 || true
# banger's vmDNS binds 127.0.0.1:42069 (UDP) hard. If the user's
# real (non-smoke) daemon is running, its listener holds the port
# and the smoke daemon's Open() fails before any scenario runs.
# Fail fast with an actionable message — don't guess whether to
# stop the user's daemon for them.
if command -v ss >/dev/null 2>&1 && ss -Huln 2>/dev/null | awk '{print $4}' | grep -q '[:.]42069$'; then
    die 'port 127.0.0.1:42069 is already bound (likely your real banger daemon); stop it with `banger daemon stop` and re-run `make smoke`'
fi
# --- doctor -----------------------------------------------------------
log 'doctor: checking host readiness'
if ! "$BANGER" doctor; then
    die 'doctor reported failures; fix the host before running smoke'
fi
# --- bare vm run ------------------------------------------------------
log "bare vm run: create + start + ssh + exec 'echo smoke-bare-ok' + --rm"
bare_out="$("$BANGER" vm run --rm -- echo smoke-bare-ok)" || die "bare vm run exit $?"
grep -q 'smoke-bare-ok' <<<"$bare_out" || die "bare vm run stdout missing marker: $bare_out"
# --- workspace vm run -------------------------------------------------
log 'workspace vm run: preparing a throwaway git repo'
repodir="$runtime_dir/fake-repo"
mkdir -p "$repodir"
(
    cd "$repodir"
    git init -q -b main
    git config commit.gpgsign false
    git config user.name smoke
    git config user.email smoke@smoke
    echo 'smoke-workspace-marker' > smoke-file.txt
    git add .
    git commit -q -m init
)
log "workspace vm run: create + start + workspace prepare + cat guest file + --rm"
ws_out="$("$BANGER" vm run --rm "$repodir" -- cat /root/repo/smoke-file.txt)" || die "workspace vm run exit $?"
grep -q 'smoke-workspace-marker' <<<"$ws_out" || die "workspace vm run didn't ship smoke-file.txt: $ws_out"
# --- daemon stop (flushes coverage) -----------------------------------
log 'stopping daemon so instrumented binaries flush coverage'
"$BANGER" daemon stop >/dev/null 2>&1 || true
# Give the daemon a moment to write its covdata pod before the trap
# tears down runtime_dir.
sleep 0.5
log 'all scenarios passed'
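
A fixed `sleep 0.5` is a timing guess. One possible alternative, sketched here and not part of this commit, is to poll the coverage directory until counter files appear. `wait_for_covdata` is a hypothetical helper; it relies on the Go coverage runtime writing `covcounters.*` files at exit, and it assumes the coverage directory was emptied before the run so that stale files from an earlier run do not satisfy the check.

```shell
#!/usr/bin/env bash
# wait_for_covdata DIR [TIMEOUT_SECS]: hypothetical helper. Polls DIR until
# at least one covcounters.* file exists or the timeout elapses.
# Returns 0 once a file is seen, 1 on timeout.
wait_for_covdata() {
    local dir="$1" timeout="${2:-5}"
    local deadline=$(( SECONDS + timeout ))
    while (( SECONDS <= deadline )); do
        # compgen -G succeeds only if the glob matches at least one path.
        if compgen -G "$dir/covcounters.*" >/dev/null; then
            return 0
        fi
        sleep 0.1
    done
    return 1
}
```

In the script this would replace the sleep with something like `wait_for_covdata "$GOCOVERDIR" 5 || die 'no coverage data written'`. It only confirms that some process flushed its counters, not that every instrumented binary has finished.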