banger/scripts/smoke.sh
Thales Maciel 115eec8576
smoke: discoverable scenarios + selectable runs + parallel dispatch
`scripts/smoke.sh` was a 600-line linear script: no way to see what it
covers without reading the whole thing, and no way to run a single
scenario when iterating. Every iteration paid the full ~5-10 min suite,
which made fast feedback loops painful enough to avoid the suite.

Refactor into a registry + per-scenario functions:

- Top-of-file SMOKE_SCENARIOS (ordered) + SMOKE_DESCS (one-line desc per
  scenario) + SMOKE_CLASS (pure / repodir / global) drive both listing
  and dispatch. The 21 existing scenario blocks become scenario_<name>
  functions. Bodies are the inline blocks verbatim, modulo the workspace
  fixture move described below.
- New CLI: --list (cheap discovery, no install / no env-vars),
  --scenario NAME (or NAME,NAME,...), --jobs N (parallel dispatch),
  -h / --help.
- New setup_fixtures runs once after the install/doctor/restart preamble
  and produces the throwaway git repo at $repodir that 'repodir'-class
  scenarios consume. Lifted out of scenario_workspace_run so single-
  scenario invocations (e.g. --scenario workspace_dryrun) get the
  fixture even when the scenario that historically built it isn't
  selected.
- Wipe ~/.local/state/banger/ssh/known_hosts in the install preamble.
  `system uninstall --purge` clears /var/lib/banger but the user-side
  known_hosts persists by design — and smoke creates VMs that reuse
  guest IPs (172.16.0.2 etc.) with fresh host keys every run, so a
  leftover entry trips StrictHostKeyChecking and the daemon's wait-
  for-ssh sees only timeouts. This was the real cause of the "guest
  ssh did not come up" flakes that surface across smoke iterations.
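
The registry-plus-dispatch shape described above can be sketched in miniature. This is a hedged illustration, not the script's own code: `DEMO_SCENARIOS`, `DEMO_DESCS`, `demo_list`, `demo_run`, and the three toy scenarios are hypothetical names chosen for the example. It assumes bash 4+ for associative arrays.

```bash
#!/usr/bin/env bash
# Hypothetical miniature of a registry-driven test script.
set -euo pipefail

# Ordered registry: drives both listing and run order.
DEMO_SCENARIOS=(alpha beta gamma)
declare -A DEMO_DESCS=(
    [alpha]="first demo scenario"
    [beta]="second demo scenario"
    [gamma]="third demo scenario"
)

# One function per registry entry, named scenario_<name>.
scenario_alpha() { echo "ran alpha"; }
scenario_beta()  { echo "ran beta"; }
scenario_gamma() { echo "ran gamma"; }

# Print one line per scenario, in registry order.
demo_list() {
    local name
    for name in "${DEMO_SCENARIOS[@]}"; do
        printf '%-8s %s\n' "$name" "${DEMO_DESCS[$name]}"
    done
}

# Resolve a comma-separated filter, then dispatch the selected names
# in registry order (not in the order they were typed).
demo_run() {
    local filter="$1" name
    local -A requested=()
    local -a names=()
    IFS=',' read -r -a names <<<"$filter"
    for name in "${names[@]}"; do
        if [[ -z "${DEMO_DESCS[$name]+x}" ]]; then
            echo "unknown scenario: $name" >&2
            return 2
        fi
        requested[$name]=1
    done
    for name in "${DEMO_SCENARIOS[@]}"; do
        if [[ -n "${requested[$name]+x}" ]]; then
            "scenario_$name"
        fi
    done
}

demo_run "gamma,alpha"
```

Run as-is, the last line prints `ran alpha` followed by `ran gamma`: the selection is honoured, but the registry decides the order.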

Parallel dispatch:

- --jobs N opts into a slot-limited pool: 'pure' scenarios fan out as
  individual jobs; 'repodir' scenarios fuse into a single serial chain
  (since they mutate $repodir in registry order); 'global' scenarios
  run serially after the pool, one at a time.
- Cap is min(N, 8) — each parallel slot runs an 8 GiB VM, so RAM is
  the binding constraint.
- In parallel mode, each scenario's stdout/stderr is buffered to a
  per-scenario log and a single PASS/FAIL line is emitted on
  completion; on FAIL the buffered log is dumped. Serial mode
  (--jobs 1, the default) keeps stdout unbuffered exactly as before.
- Parallelism is documented as experimental in --help: it surfaces
  real daemon-side concurrency bugs (image auto-pull manifest race,
  work-seed-refresh race on the shared work-seed.ext4) that don't
  appear in serial mode and that need their own fix in the daemon.
  Serial (--jobs 1) is the reliable path; --jobs N is for fast-
  iteration dev work where occasional re-runs are acceptable.
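
As a rough illustration of the slot-limited pool with buffered per-scenario output, here is a minimal sketch. It is not the dispatcher from the script (which lies outside this excerpt): `run_buffered`, `run_pool`, and the toy scenarios are hypothetical, the 'repodir' chain and the trailing 'global' phase are left out, and `wait -n` assumes bash 4.3 or newer.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins for real scenarios.
scenario_ok_a() { echo "a: fine"; }
scenario_ok_b() { echo "b: fine"; }
scenario_bad()  { echo "bad: something broke"; return 1; }

# Run one scenario with stdout/stderr buffered to a log. Record its exit
# status, print one PASS/FAIL line, and dump the log only on failure.
run_buffered() {
    local name="$1" logdir="$2" rc=0
    ( "scenario_$name" ) >"$logdir/$name.log" 2>&1 || rc=$?
    echo "$rc" >"$logdir/$name.rc"
    if (( rc == 0 )); then
        echo "PASS $name"
    else
        printf 'FAIL %s\n%s\n' "$name" "$(sed 's/^/    /' "$logdir/$name.log")"
    fi
}

# Fan the named scenarios out across at most $1 background jobs.
run_pool() {
    local slots="$1"; shift
    local logdir name running=0 failed=0
    logdir="$(mktemp -d)"
    for name in "$@"; do
        if (( running >= slots )); then
            wait -n || true              # block until any one job exits
            running=$(( running - 1 ))
        fi
        run_buffered "$name" "$logdir" &
        running=$(( running + 1 ))
    done
    wait                                 # drain whatever is still running
    for name in "$@"; do
        [[ "$(cat "$logdir/$name.rc")" == 0 ]] || failed=1
    done
    rm -rf "$logdir"
    return "$failed"
}

run_pool 2 ok_a bad ok_b || echo "pool finished with failures"
```

The pass/fail verdict is read from the recorded status files rather than from `wait -n`, so a job that exits before the pool gets around to waiting on it is still counted. PASS/FAIL lines arrive in completion order, which can differ between runs.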

Exit codes: 0 ok, 1 assertion failed, 2 usage error (unknown
scenario, missing SCENARIO=), 77 explicit selection skipped (NAT
when sudo iptables is unavailable AND nat is the only selected
scenario; soft-skip otherwise).
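
A caller that wants to treat 77 differently from a real failure could map the codes along these lines. This is a sketch under assumptions: `describe_smoke_rc` and `run_and_describe` are hypothetical helpers, not part of the script, and the final line uses `sh -c 'exit 77'` as a stand-in for an actual smoke run.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Map an exit status to a short label, following the exit-code table.
describe_smoke_rc() {
    case "$1" in
        0)  echo "ok" ;;
        1)  echo "assertion failed" ;;
        2)  echo "usage error" ;;
        77) echo "skipped" ;;
        *)  echo "unexpected ($1)" ;;
    esac
}

# Run a command and report its status without tripping `set -e`.
run_and_describe() {
    local rc=0
    "$@" || rc=$?
    describe_smoke_rc "$rc"
}

run_and_describe sh -c 'exit 77'
```

In real use the wrapped command would be something like `scripts/smoke.sh --scenario nat`, so that a host without usable iptables reports a skip rather than a failure.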

Makefile additions:

- `make smoke-list` — cheap discovery, no smoke-build dep, no env vars.
- `make smoke-one SCENARIO=name` — single-scenario run, full preamble.
  MAKECMDGOALS guard catches missing SCENARIO= before any rebuild.
- `make smoke JOBS=N` — passes through to the script's --jobs N.
- Help text covers all three.

Verified: serial full suite passes 21/21 in ~140s on this host;
make smoke-one SCENARIO=workspace_restart runs the recently-added
regression test alone in ~50s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 16:56:57 -03:00


#!/usr/bin/env bash
#
# scripts/smoke.sh — end-to-end smoke suite for banger's supported
# two-service systemd model.
#
# Installs instrumented binaries as temporary bangerd.service +
# bangerd-root.service, drives real Firecracker/KVM scenarios, collects
# covdata from both services plus the CLI, then purges the smoke-owned
# install on exit.
#
# Because the supported path is global host state, smoke refuses to
# overwrite a pre-existing non-smoke install. If a prior smoke crashed,
# rerun `make smoke-clean` or `make smoke`; the smoke marker lets the
# harness purge only its own stale install safely.
#
# Scratch files live under $BANGER_SMOKE_XDG_DIR (historic name kept for
# make-compat). Service state uses the real supported system paths and is
# purged by the smoke cleanup path.
#
# Usage:
#   scripts/smoke.sh                    # full suite, serial
#   scripts/smoke.sh --list             # cheap discovery, no install
#   scripts/smoke.sh --scenario NAME    # single scenario
#   scripts/smoke.sh --scenario a,b,c   # comma list, registry order
#   scripts/smoke.sh --jobs N           # parallel dispatch (default 1)
#   scripts/smoke.sh -h | --help        # this help
#
# Exit codes:
#   0   success
#   1   assertion failed
#   2   usage error (unknown scenario, bad flag)
#   77  scenario explicitly selected but env can't run it (autotools "skip")
set -euo pipefail
log() { printf '[smoke] %s\n' "$*" >&2; }
die() { printf '[smoke] FAIL: %s\n' "$*" >&2; exit 1; }
usage_die() { printf '[smoke] usage: %s\n' "$*" >&2; exit 2; }
wait_for_ssh() {
    local vm="$1"
    local deadline=$(( $(date +%s) + 60 ))
    while (( $(date +%s) < deadline )); do
        if "$BANGER" vm ssh "$vm" -- true >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
    done
    return 1
}
# ---------------------------------------------------------------------
# Scenario registry. Order in SMOKE_SCENARIOS is the run order for full
# suite mode and the order shown in --list. Class drives parallelism:
#   pure    — independent VMs, parallel-safe
#   repodir — share $repodir mutations; serial chain in registry order
#   global  — assert host-global state (iptables, vm row counts, ssh-config
#             on a fake HOME); run serially after everything else
# Names are bash function suffixes — `scenario_<name>` must exist.
# ---------------------------------------------------------------------
SMOKE_SCENARIOS=(
bare_run
workspace_run
exit_code
workspace_dryrun
include_untracked
workspace_export
concurrent_run
vm_lifecycle
vm_set
vm_restart
vm_kill
vm_prune
vm_ports
workspace_full_copy
workspace_basecommit
workspace_restart
vm_exec
ssh_config
nat
invalid_spec
invalid_name
)
declare -A SMOKE_DESCS=(
[bare_run]="bare vm run: create + start + ssh + echo + --rm"
[workspace_run]="workspace vm run: ship git repo, read file in guest"
[exit_code]="exit-code propagation: guest sh -c 'exit 42' returns rc=42"
[workspace_dryrun]="workspace dry-run: list tracked files without a VM"
[include_untracked]="--include-untracked ships files outside the git index"
[workspace_export]="workspace export round-trip: guest edit -> patch marker"
[concurrent_run]="two parallel --rm invocations both succeed"
[vm_lifecycle]="explicit create / stop / start / ssh / delete"
[vm_set]="reconfigure vcpu while stopped; guest sees new count"
[vm_restart]="restart verb: boot_id changes"
[vm_kill]="vm kill --signal KILL: stopped, no leaked dm device"
[vm_prune]="prune -f removes stopped VMs, preserves running ones"
[vm_ports]="vm ports: sshd :22 visible via VM DNS name"
[workspace_full_copy]="workspace prepare --mode full_copy: alternate transfer path"
[workspace_basecommit]="workspace export --base-commit: guest commits captured"
[workspace_restart]="workspace prepare -> stop -> start preserves marker"
[vm_exec]="vm exec: auto-cd, exit-code, stale-warn, --auto-prepare resync"
[ssh_config]="ssh-config --install / --uninstall: idempotent, HOME-isolated"
[nat]="--nat installs per-VM MASQUERADE; control VM does not"
[invalid_spec]="--vcpu 0 rejected, no VM row leaked"
[invalid_name]="bad names (uppercase/space/dot/leading-hyphen) all rejected"
)
declare -A SMOKE_CLASS=(
[bare_run]=pure
[workspace_run]=repodir
[exit_code]=pure
[workspace_dryrun]=repodir
[include_untracked]=repodir
[workspace_export]=repodir
[concurrent_run]=pure
[vm_lifecycle]=pure
[vm_set]=pure
[vm_restart]=pure
[vm_kill]=pure
[vm_prune]=pure
[vm_ports]=pure
[workspace_full_copy]=repodir
[workspace_basecommit]=repodir
[workspace_restart]=repodir
[vm_exec]=repodir
[ssh_config]=pure
[nat]=global
[invalid_spec]=global
[invalid_name]=global
)
usage() {
    cat <<'EOF'
scripts/smoke.sh — banger end-to-end smoke suite
Usage:
  scripts/smoke.sh                    run the full suite (serial)
  scripts/smoke.sh --list             list all scenarios (no install)
  scripts/smoke.sh --scenario NAME    run a single scenario
  scripts/smoke.sh --scenario a,b,c   run a comma-separated list
  scripts/smoke.sh --jobs N           parallel dispatch (default 1)
  scripts/smoke.sh -h | --help        this help
Notes:
  --list works on a fresh checkout — no sudo, no KVM, no smoke-build.
  --jobs N caps at min(N, 8); each parallel slot runs an 8 GiB VM.
  Scenarios in the 'repodir' class share fixture mutations and run as
  a serial chain regardless of --jobs.
  Parallelism (--jobs >1) is experimental: it surfaces real concurrency
  bugs in the daemon's image-pull and work-seed-refresh paths that don't
  appear in serial mode. Use serial (--jobs 1, the default) for reliable
  CI-style runs; use --jobs N when you can tolerate a few re-runs to
  debug something fast.
Exit codes: 0 ok, 1 fail, 2 usage error, 77 explicit selection skipped.
EOF
}
list_scenarios() {
    local name
    for name in "${SMOKE_SCENARIOS[@]}"; do
        printf ' %-22s %s\n' "$name" "${SMOKE_DESCS[$name]}"
    done
}
# ---------------------------------------------------------------------
# Argument parsing. Done before env-var checks so --list / --help work
# on a fresh checkout, and so a typo in --scenario fails before we
# touch sudo / system install.
# ---------------------------------------------------------------------
SMOKE_LIST=0
SMOKE_FILTER=""
SMOKE_EXPLICIT=0
SMOKE_JOBS=1
while (( $# > 0 )); do
    case "$1" in
        --list)
            SMOKE_LIST=1; shift ;;
        --scenario)
            [[ $# -ge 2 ]] || usage_die "--scenario requires a name (see --list)"
            SMOKE_FILTER="$2"; SMOKE_EXPLICIT=1; shift 2 ;;
        --scenario=*)
            SMOKE_FILTER="${1#--scenario=}"; SMOKE_EXPLICIT=1; shift ;;
        --jobs)
            [[ $# -ge 2 ]] || usage_die "--jobs requires N"
            SMOKE_JOBS="$2"; shift 2 ;;
        --jobs=*)
            SMOKE_JOBS="${1#--jobs=}"; shift ;;
        -h|--help)
            usage; exit 0 ;;
        *)
            usage_die "unknown argument: $1 (try --help)" ;;
    esac
done
if (( SMOKE_LIST )); then
    list_scenarios
    exit 0
fi
# Validate --jobs.
if ! [[ "$SMOKE_JOBS" =~ ^[1-9][0-9]*$ ]]; then
    usage_die "--jobs must be a positive integer; got '$SMOKE_JOBS'"
fi
if (( SMOKE_JOBS > 8 )); then
    log "capping --jobs at 8 (each parallel slot runs an 8 GiB VM)"
    SMOKE_JOBS=8
fi
# Resolve --scenario filter into SMOKE_SELECTED in registry order.
SMOKE_SELECTED=()
if [[ -n "$SMOKE_FILTER" ]]; then
    declare -A _requested=()
    IFS=',' read -r -a _names <<<"$SMOKE_FILTER"
    for name in "${_names[@]}"; do
        name="${name// /}"
        [[ -n "$name" ]] || continue
        if [[ -z "${SMOKE_DESCS[$name]+x}" ]]; then
            printf '[smoke] unknown scenario: %s\n' "$name" >&2
            printf '[smoke] available scenarios:\n' >&2
            list_scenarios >&2
            exit 2
        fi
        _requested[$name]=1
    done
    for name in "${SMOKE_SCENARIOS[@]}"; do
        if [[ -n "${_requested[$name]+x}" ]]; then
            SMOKE_SELECTED+=("$name")
        fi
    done
    unset _requested _names
else
    SMOKE_SELECTED=("${SMOKE_SCENARIOS[@]}")
fi
if (( ${#SMOKE_SELECTED[@]} == 0 )); then
    usage_die "no scenarios selected"
fi
# ---------------------------------------------------------------------
# Env checks. Required for any scenario; not required for --list/--help.
# ---------------------------------------------------------------------
: "${BANGER_SMOKE_BIN_DIR:?must point at the instrumented binary dir, set by make smoke}"
: "${BANGER_SMOKE_COVER_DIR:?must point at the coverage dir, set by make smoke}"
: "${BANGER_SMOKE_XDG_DIR:?must point at the smoke scratch root, set by make smoke}"
BANGER="$BANGER_SMOKE_BIN_DIR/banger"
BANGERD="$BANGER_SMOKE_BIN_DIR/bangerd"
VSOCK_AGENT="$BANGER_SMOKE_BIN_DIR/banger-vsock-agent"
for bin in "$BANGER" "$BANGERD" "$VSOCK_AGENT"; do
    [[ -x "$bin" ]] || die "binary missing or not executable: $bin"
done
scratch_root="$BANGER_SMOKE_XDG_DIR"
runtime_dir=
repodir=
smoke_owner="$(id -un)"
smoke_marker='/etc/banger/.smoke-owned'
service_cover_dir='/var/lib/banger'
owner_service='bangerd.service'
root_service='bangerd-root.service'
mkdir -p "$BANGER_SMOKE_COVER_DIR"
rm -rf "$scratch_root"
mkdir -p "$scratch_root"
runtime_dir="$(mktemp -d "$scratch_root/runtime-XXXXXX")"
# The CLI binary itself is instrumented, so keep its covdata local.
export GOCOVERDIR="$BANGER_SMOKE_COVER_DIR"
cleanup_export_vm() {
    "$BANGER" vm delete smoke-export >/dev/null 2>&1 || true
}
cleanup_prune() {
    "$BANGER" vm delete smoke-prune-running >/dev/null 2>&1 || true
    "$BANGER" vm delete smoke-prune-stopped >/dev/null 2>&1 || true
}
collect_service_coverage() {
    local uid gid
    uid="$(id -u)"
    gid="$(id -g)"
    sudo bash -lc '
        set -euo pipefail
        shopt -s nullglob
        dst="$1"
        uid="$2"
        gid="$3"
        src="$4"
        for file in "$src"/covmeta.* "$src"/covcounters.*; do
            base="${file##*/}"
            cp "$file" "$dst/$base"
            chown "$uid:$gid" "$dst/$base"
            chmod 0644 "$dst/$base"
        done
    ' bash "$BANGER_SMOKE_COVER_DIR" "$uid" "$gid" "$service_cover_dir"
}
stop_services_for_coverage() {
    sudo systemctl stop "$owner_service" "$root_service" >/dev/null 2>&1 || true
}
sudo_banger() {
    sudo env GOCOVERDIR="$BANGER_SMOKE_COVER_DIR" "$@"
}
cleanup() {
    set +e
    for vm in \
        smoke-lifecycle smoke-set smoke-restart smoke-kill smoke-ports smoke-fc \
        smoke-basecommit smoke-exec smoke-wsrestart smoke-nat smoke-nocnat; do
        "$BANGER" vm delete "$vm" >/dev/null 2>&1 || true
    done
    cleanup_export_vm
    cleanup_prune
    stop_services_for_coverage
    collect_service_coverage
    sudo_banger "$BANGER" system uninstall --purge >/dev/null 2>&1 || true
    rm -rf "$scratch_root"
}
trap cleanup EXIT
install_preamble() {
    if sudo test -f /etc/banger/install.toml; then
        if sudo test -f "$smoke_marker"; then
            log 'found stale smoke-owned install; purging it first'
            sudo_banger "$BANGER" system uninstall --purge >/dev/null 2>&1 || true
        else
            die 'banger is already installed on this host; supported-path smoke refuses to overwrite a non-smoke install'
        fi
    fi
    # Wipe the user-side known_hosts. `system uninstall --purge` clears
    # /var/lib/banger but the user-state known_hosts at
    # ~/.local/state/banger/ssh/known_hosts is by-design left alone — it's
    # the user's data, not the daemon's. Smoke creates VMs that reuse
    # guest IPs (172.16.0.2 etc.) with fresh host keys every run, so a
    # leftover entry from a prior run trips StrictHostKeyChecking and
    # the daemon's wait-for-ssh sees only timeouts. Removing the file
    # is safe — the daemon recreates it on first connect.
    rm -f "$HOME/.local/state/banger/ssh/known_hosts" 2>/dev/null || true
    log 'installing smoke-owned services'
    sudo env \
        GOCOVERDIR="$BANGER_SMOKE_COVER_DIR" \
        BANGER_SYSTEM_GOCOVERDIR="$service_cover_dir" \
        BANGER_ROOT_HELPER_GOCOVERDIR="$service_cover_dir" \
        "$BANGER" system install --owner "$smoke_owner" >/dev/null \
        || die 'system install failed'
    sudo touch "$smoke_marker"
    local status_out
    status_out="$("$BANGER" system status)" || die 'system status failed after install'
    grep -qE '^active +active' <<<"$status_out" || die "owner daemon not active after install: $status_out"
    grep -qE '^helper_active +active' <<<"$status_out" || die "root helper not active after install: $status_out"
    log 'doctor: checking host readiness'
    if ! "$BANGER" doctor; then
        die 'doctor reported failures; fix the host before running smoke'
    fi
    log 'system restart: services should come back cleanly'
    sudo_banger "$BANGER" system restart >/dev/null || die 'system restart failed'
    status_out="$("$BANGER" system status)" || die 'system status failed after restart'
    grep -qE '^active +active' <<<"$status_out" || die "owner daemon not active after restart: $status_out"
    grep -qE '^helper_active +active' <<<"$status_out" || die "root helper not active after restart: $status_out"
}
# setup_fixtures builds the throwaway git repo at $repodir that every
# 'repodir'-class scenario consumes. Pulled out of scenario_workspace_run
# so single-scenario invocations (e.g. --scenario workspace_dryrun) get
# the fixture even when the scenario that historically created it is
# not selected.
setup_fixtures() {
    log 'setup_fixtures: preparing throwaway git repo for repodir-class scenarios'
    repodir="$runtime_dir/fake-repo"
    mkdir -p "$repodir"
    (
        cd "$repodir"
        git init -q -b main
        git config commit.gpgsign false
        git config user.name smoke
        git config user.email smoke@smoke
        echo 'smoke-workspace-marker' > smoke-file.txt
        git add .
        git commit -q -m init
    )
}
# ---------------------------------------------------------------------
# Scenario implementations. Each is a function `scenario_<name>` that
# logs its description first and then runs assertions. Bodies are the
# pre-refactor inline blocks, modulo the workspace_run fixture move.
# ---------------------------------------------------------------------
scenario_bare_run() {
    log "${SMOKE_DESCS[bare_run]}"
    local bare_out
    bare_out="$("$BANGER" vm run --rm -- echo smoke-bare-ok)" || die "bare vm run exit $?"
    grep -q 'smoke-bare-ok' <<<"$bare_out" || die "bare vm run stdout missing marker: $bare_out"
}
scenario_workspace_run() {
    log "${SMOKE_DESCS[workspace_run]}"
    local ws_out
    ws_out="$("$BANGER" vm run --rm "$repodir" -- cat /root/repo/smoke-file.txt)" || die "workspace vm run exit $?"
    grep -q 'smoke-workspace-marker' <<<"$ws_out" || die "workspace vm run didn't ship smoke-file.txt: $ws_out"
}
scenario_exit_code() {
    log "${SMOKE_DESCS[exit_code]}"
    local rc
    set +e
    "$BANGER" vm run --rm -- sh -c 'exit 42'
    rc=$?
    set -e
    [[ "$rc" -eq 42 ]] || die "exit-code propagation: got rc=$rc, want 42"
}
scenario_workspace_dryrun() {
    log "${SMOKE_DESCS[workspace_dryrun]}"
    local dry_out
    dry_out="$("$BANGER" vm run --dry-run "$repodir")" || die "dry-run exit $?"
    grep -q 'smoke-file.txt' <<<"$dry_out" || die "dry-run didn't list smoke-file.txt: $dry_out"
    grep -q 'mode: tracked only' <<<"$dry_out" || die "dry-run mode line missing or wrong: $dry_out"
}
scenario_include_untracked() {
    log "${SMOKE_DESCS[include_untracked]}"
    echo 'untracked-marker' > "$repodir/smoke-untracked.txt"
    local inc_out
    inc_out="$("$BANGER" vm run --rm --include-untracked "$repodir" -- cat /root/repo/smoke-untracked.txt)" || die "include-untracked vm run exit $?"
    grep -q 'untracked-marker' <<<"$inc_out" || die "--include-untracked didn't ship the untracked file: $inc_out"
    # Self-cleanup: scenario added an untracked file, scenario removes it.
    rm -f "$repodir/smoke-untracked.txt"
}
scenario_workspace_export() {
    log "${SMOKE_DESCS[workspace_export]}"
    "$BANGER" vm create --name smoke-export --image debian-bookworm >/dev/null \
        || die "export: vm create exit $?"
    "$BANGER" vm workspace prepare smoke-export "$repodir" >/dev/null \
        || die "export: workspace prepare exit $?"
    "$BANGER" vm ssh smoke-export -- sh -c 'echo guest-edit > /root/repo/new-guest-file.txt' \
        || die "export: guest-side file write exit $?"
    local export_patch="$runtime_dir/smoke-export.diff"
    "$BANGER" vm workspace export smoke-export --output "$export_patch" \
        || die "export: workspace export exit $?"
    [[ -s "$export_patch" ]] || die "export: patch file empty at $export_patch"
    grep -q 'new-guest-file.txt' "$export_patch" \
        || die "export: patch missing new-guest-file.txt marker (head: $(head -c 400 "$export_patch"))"
    cleanup_export_vm
}
scenario_concurrent_run() {
    log "${SMOKE_DESCS[concurrent_run]}"
    local tmpA="$runtime_dir/concurrent-a.out"
    local tmpB="$runtime_dir/concurrent-b.out"
    "$BANGER" vm run --rm -- echo smoke-concurrent-a > "$tmpA" 2>&1 &
    local pidA=$!
    "$BANGER" vm run --rm -- echo smoke-concurrent-b > "$tmpB" 2>&1 &
    local pidB=$!
    wait "$pidA" || die "concurrent VM A exited non-zero: $(cat "$tmpA")"
    wait "$pidB" || die "concurrent VM B exited non-zero: $(cat "$tmpB")"
    grep -q 'smoke-concurrent-a' "$tmpA" || die "concurrent VM A missing marker: $(cat "$tmpA")"
    grep -q 'smoke-concurrent-b' "$tmpB" || die "concurrent VM B missing marker: $(cat "$tmpB")"
}
scenario_vm_lifecycle() {
    log "${SMOKE_DESCS[vm_lifecycle]}"
    local lifecycle_name=smoke-lifecycle
    local show_out ssh_out rc
    "$BANGER" vm create --name "$lifecycle_name" >/dev/null || die "vm create $lifecycle_name failed"
    show_out="$("$BANGER" vm show "$lifecycle_name")" || die "vm show after create failed"
    grep -q '"state": "running"' <<<"$show_out" || die "post-create state not running: $show_out"
    wait_for_ssh "$lifecycle_name" || die 'vm lifecycle: ssh did not come up after create'
    ssh_out="$("$BANGER" vm ssh "$lifecycle_name" -- echo hello-1)" || die "vm ssh #1 failed"
    grep -q 'hello-1' <<<"$ssh_out" || die "vm ssh #1 missing marker: $ssh_out"
    "$BANGER" vm stop "$lifecycle_name" >/dev/null || die "vm stop failed"
    show_out="$("$BANGER" vm show "$lifecycle_name")" || die "vm show after stop failed"
    grep -q '"state": "stopped"' <<<"$show_out" || die "post-stop state not stopped: $show_out"
    "$BANGER" vm start "$lifecycle_name" >/dev/null || die "vm start (from stopped) failed"
    show_out="$("$BANGER" vm show "$lifecycle_name")" || die "vm show after start failed"
    grep -q '"state": "running"' <<<"$show_out" || die "post-start state not running: $show_out"
    wait_for_ssh "$lifecycle_name" || die 'vm lifecycle: ssh did not come up after restart'
    ssh_out="$("$BANGER" vm ssh "$lifecycle_name" -- echo hello-2)" || die "vm ssh #2 (post-restart) failed"
    grep -q 'hello-2' <<<"$ssh_out" || die "vm ssh #2 missing marker: $ssh_out"
    "$BANGER" vm delete "$lifecycle_name" >/dev/null || die "vm delete failed"
    set +e
    "$BANGER" vm show "$lifecycle_name" >/dev/null 2>&1
    rc=$?
    set -e
    [[ "$rc" -ne 0 ]] || die "vm show still finds $lifecycle_name after delete"
}
scenario_vm_set() {
    log "${SMOKE_DESCS[vm_set]}"
    local nproc_before nproc_after rc
    "$BANGER" vm create --name smoke-set --vcpu 2 >/dev/null || die 'vm set: create failed'
    wait_for_ssh smoke-set || die 'vm set: initial ssh did not come up'
    set +e
    nproc_before="$("$BANGER" vm ssh smoke-set -- nproc 2>/dev/null)"
    rc=$?
    set -e
    [[ "$rc" -eq 0 ]] || die "vm set: initial nproc ssh exit $rc"
    [[ "$(printf '%s' "$nproc_before" | tr -d '[:space:]')" == "2" ]] \
        || die "vm set: initial nproc got '$nproc_before', want 2"
    "$BANGER" vm stop smoke-set >/dev/null || die 'vm set: stop failed'
    "$BANGER" vm set smoke-set --vcpu 4 >/dev/null || die 'vm set: reconfigure failed'
    "$BANGER" vm start smoke-set >/dev/null || die 'vm set: restart failed'
    wait_for_ssh smoke-set || die 'vm set: post-reconfig ssh did not come up'
    set +e
    nproc_after="$("$BANGER" vm ssh smoke-set -- nproc 2>/dev/null)"
    rc=$?
    set -e
    [[ "$rc" -eq 0 ]] || die "vm set: post-reconfig nproc ssh exit $rc"
    [[ "$(printf '%s' "$nproc_after" | tr -d '[:space:]')" == "4" ]] \
        || die "vm set: post-reconfig nproc got '$nproc_after', want 4 (spec change didn't land)"
    "$BANGER" vm delete smoke-set >/dev/null || die 'vm set: delete failed'
}
scenario_vm_restart() {
    log "${SMOKE_DESCS[vm_restart]}"
    local boot_before boot_after
    "$BANGER" vm create --name smoke-restart >/dev/null || die 'vm restart: create failed'
    wait_for_ssh smoke-restart || die 'vm restart: initial ssh never came up'
    boot_before="$("$BANGER" vm ssh smoke-restart -- cat /proc/sys/kernel/random/boot_id | tr -d '[:space:]')"
    [[ -n "$boot_before" ]] || die 'vm restart: could not read initial boot_id'
    "$BANGER" vm restart smoke-restart >/dev/null || die 'vm restart: verb failed'
    wait_for_ssh smoke-restart || die 'vm restart: ssh did not come up after restart'
    boot_after="$("$BANGER" vm ssh smoke-restart -- cat /proc/sys/kernel/random/boot_id | tr -d '[:space:]')"
    [[ -n "$boot_after" ]] || die 'vm restart: could not read post-restart boot_id'
    [[ "$boot_before" != "$boot_after" ]] \
        || die "vm restart: boot_id unchanged ($boot_before); verb didn't actually reboot the guest"
    "$BANGER" vm delete smoke-restart >/dev/null || die 'vm restart: delete failed'
}
scenario_vm_kill() {
    log "${SMOKE_DESCS[vm_kill]}"
    local dm_name show_out
    "$BANGER" vm create --name smoke-kill >/dev/null || die 'vm kill: create failed'
    dm_name="$("$BANGER" vm show smoke-kill 2>/dev/null | awk -F'"' '/"dm_dev"|fc-rootfs-/ {for(i=1;i<=NF;i++) if($i~/^fc-rootfs-/) print $i}' | head -1 || true)"
    "$BANGER" vm kill --signal KILL smoke-kill >/dev/null || die 'vm kill: verb failed'
    show_out="$("$BANGER" vm show smoke-kill)" || die 'vm kill: show after kill failed'
    grep -q '"state": "stopped"' <<<"$show_out" || die "vm kill: post-kill state not stopped: $show_out"
    if [[ -n "$dm_name" ]]; then
        if sudo -n dmsetup ls 2>/dev/null | awk '{print $1}' | grep -qx "$dm_name"; then
            die "vm kill: dm device $dm_name still mapped (cleanup didn't run)"
        fi
    fi
    "$BANGER" vm delete smoke-kill >/dev/null || die 'vm kill: delete failed'
}
scenario_vm_prune() {
    log "${SMOKE_DESCS[vm_prune]}"
    "$BANGER" vm create --name smoke-prune-running >/dev/null || die 'vm prune: create running failed'
    "$BANGER" vm create --name smoke-prune-stopped >/dev/null || die 'vm prune: create stopped failed'
    "$BANGER" vm stop smoke-prune-stopped >/dev/null || die 'vm prune: stop the stopped one failed'
    "$BANGER" vm prune -f >/dev/null || die 'vm prune: verb failed'
    "$BANGER" vm show smoke-prune-running >/dev/null 2>&1 || die 'vm prune: running VM was deleted (regression!)'
    if "$BANGER" vm show smoke-prune-stopped >/dev/null 2>&1; then
        die 'vm prune: stopped VM survived prune'
    fi
    "$BANGER" vm delete smoke-prune-running >/dev/null || die 'vm prune: cleanup delete failed'
}
scenario_vm_ports() {
    log "${SMOKE_DESCS[vm_ports]}"
    local ports_out
    "$BANGER" vm create --name smoke-ports >/dev/null || die 'vm ports: create failed'
    wait_for_ssh smoke-ports || die 'vm ports: ssh did not come up'
    ports_out="$("$BANGER" vm ports smoke-ports 2>&1)" \
        || die "vm ports: verb failed: $ports_out"
    grep -q 'smoke-ports.vm:22' <<<"$ports_out" \
        || die "vm ports: expected 'smoke-ports.vm:22' in output; got: $ports_out"
    grep -q 'sshd' <<<"$ports_out" \
        || die "vm ports: expected process 'sshd' in output; got: $ports_out"
    "$BANGER" vm delete smoke-ports >/dev/null || die 'vm ports: delete failed'
}
scenario_workspace_full_copy() {
    log "${SMOKE_DESCS[workspace_full_copy]}"
    local fc_out
    "$BANGER" vm create --name smoke-fc >/dev/null || die 'workspace fc: create failed'
    "$BANGER" vm workspace prepare smoke-fc "$repodir" --mode full_copy >/dev/null \
        || die 'workspace fc: prepare --mode full_copy failed'
    fc_out="$("$BANGER" vm ssh smoke-fc -- cat /root/repo/smoke-file.txt)" \
        || die 'workspace fc: guest read failed'
    grep -q 'smoke-workspace-marker' <<<"$fc_out" \
        || die "workspace fc: marker missing in full_copy workspace: $fc_out"
    "$BANGER" vm delete smoke-fc >/dev/null || die 'workspace fc: delete failed'
}
scenario_workspace_basecommit() {
    log "${SMOKE_DESCS[workspace_basecommit]}"
    "$BANGER" vm create --name smoke-basecommit >/dev/null || die 'export base: create failed'
    "$BANGER" vm workspace prepare smoke-basecommit "$repodir" >/dev/null \
        || die 'export base: prepare failed'
    local base_sha
    base_sha="$("$BANGER" vm ssh smoke-basecommit -- sh -c 'cd /root/repo && git rev-parse HEAD' | tr -d '[:space:]')"
    [[ "${#base_sha}" -eq 40 ]] || die "export base: bad base sha: $base_sha"
    "$BANGER" vm ssh smoke-basecommit -- sh -c "cd /root/repo && git -c user.email=smoke@smoke -c user.name=smoke checkout -b smoke-branch >/dev/null 2>&1 && echo committed-marker > smoke-committed.txt && git add smoke-committed.txt && git -c user.email=smoke@smoke -c user.name=smoke commit -q -m 'guest side'" \
        || die 'export base: guest-side commit failed'
    local plain_patch="$runtime_dir/smoke-plain.diff"
    "$BANGER" vm workspace export smoke-basecommit --output "$plain_patch" \
        || die 'export base: plain export failed'
    if [[ -f "$plain_patch" ]] && grep -q 'smoke-committed.txt' "$plain_patch"; then
        die 'export base: plain export unexpectedly captured the guest-side commit'
    fi
    local base_patch="$runtime_dir/smoke-base.diff"
    "$BANGER" vm workspace export smoke-basecommit --base-commit "$base_sha" --output "$base_patch" \
        || die 'export base: --base-commit export failed'
    [[ -s "$base_patch" ]] || die 'export base: patch file empty'
    grep -q 'smoke-committed.txt' "$base_patch" \
        || die "export base: --base-commit patch missing committed marker (head: $(head -c 400 "$base_patch"))"
    "$BANGER" vm delete smoke-basecommit >/dev/null || die 'export base: delete failed'
}
scenario_workspace_restart() {
    log "${SMOKE_DESCS[workspace_restart]}"
    "$BANGER" vm create --name smoke-wsrestart >/dev/null \
        || die 'workspace stop/start: create failed'
    "$BANGER" vm workspace prepare smoke-wsrestart "$repodir" >/dev/null \
        || die 'workspace stop/start: prepare failed'
    # Sanity: marker is present before the stop/start cycle.
    local pre_out
    pre_out="$("$BANGER" vm ssh smoke-wsrestart -- cat /root/repo/smoke-file.txt)" \
        || die 'workspace stop/start: pre-cycle ssh read failed'
    grep -q 'smoke-workspace-marker' <<<"$pre_out" \
        || die "workspace stop/start: marker missing pre-cycle: $pre_out"
    "$BANGER" vm stop smoke-wsrestart >/dev/null \
        || die 'workspace stop/start: stop failed'
    "$BANGER" vm start smoke-wsrestart >/dev/null \
        || die 'workspace stop/start: start after stop failed (rootfs corrupt?)'
    wait_for_ssh smoke-wsrestart \
        || die 'workspace stop/start: ssh did not come up after restart'
    local post_out
    post_out="$("$BANGER" vm ssh smoke-wsrestart -- cat /root/repo/smoke-file.txt)" \
        || die 'workspace stop/start: post-cycle ssh read failed'
    grep -q 'smoke-workspace-marker' <<<"$post_out" \
        || die "workspace stop/start: marker lost across stop/start: $post_out"
    "$BANGER" vm delete smoke-wsrestart >/dev/null \
        || die 'workspace stop/start: delete failed'
}
scenario_vm_exec() {
log "${SMOKE_DESCS[vm_exec]}"
local show_out exec_cat exec_pwd rc
"$BANGER" vm create --name smoke-exec >/dev/null || die 'vm exec: create failed'
"$BANGER" vm workspace prepare smoke-exec "$repodir" >/dev/null \
|| die 'vm exec: workspace prepare failed'
# WORKSPACE column populated in vm show after prepare.
show_out="$("$BANGER" vm show smoke-exec)" || die 'vm exec: vm show after prepare failed'
grep -q '"guest_path": "/root/repo"' <<<"$show_out" \
|| die "vm exec: workspace.guest_path not persisted on VM record: $show_out"
# Basic happy path: cd happens, file is read from the workspace.
exec_cat="$("$BANGER" vm exec smoke-exec -- cat smoke-file.txt)" \
|| die "vm exec: cat smoke-file.txt failed"
grep -q 'smoke-workspace-marker' <<<"$exec_cat" \
|| die "vm exec: stdout missing workspace marker: $exec_cat"
# pwd confirms the auto-cd into the prepared guest path.
exec_pwd="$("$BANGER" vm exec smoke-exec -- pwd | tr -d '[:space:]')" \
|| die 'vm exec: pwd failed'
[[ "$exec_pwd" == "/root/repo" ]] \
|| die "vm exec: pwd got '$exec_pwd', want '/root/repo' (auto-cd didn't happen)"
# Exit-code propagation: 17 must come back as 17, verbatim.
set +e
"$BANGER" vm exec smoke-exec -- sh -c 'exit 17' >/dev/null 2>&1
rc=$?
set -e
[[ "$rc" -eq 17 ]] || die "vm exec: exit-code propagation got rc=$rc, want 17"
# Dirty detection: advance host HEAD, run `vm exec` without --auto-prepare,
# expect a stale-workspace warning on stderr and the new file NOT present in
# the guest (workspace was not re-synced).
(
cd "$repodir"
echo 'post-prepare-marker' > smoke-exec-new.txt
git add smoke-exec-new.txt
git commit -q -m 'add smoke-exec-new.txt after prepare'
)
local stale_stderr="$runtime_dir/smoke-exec-stale.err"
local ls_rc
set +e
"$BANGER" vm exec smoke-exec -- ls smoke-exec-new.txt >/dev/null 2>"$stale_stderr"
ls_rc=$?
set -e
[[ "$ls_rc" -ne 0 ]] \
|| die 'vm exec: stale workspace unexpectedly already had the new file (dirty path didn'"'"'t take effect)'
grep -q 'workspace stale' "$stale_stderr" \
|| die "vm exec: stale-workspace warning missing on stderr; got: $(cat "$stale_stderr")"
grep -q -- '--auto-prepare' "$stale_stderr" \
|| die "vm exec: stale warning didn't mention --auto-prepare hint; got: $(cat "$stale_stderr")"
# --auto-prepare: re-syncs workspace, then runs the command. New file appears.
local auto_out
auto_out="$("$BANGER" vm exec smoke-exec --auto-prepare -- cat smoke-exec-new.txt)" \
|| die 'vm exec: --auto-prepare run failed'
grep -q 'post-prepare-marker' <<<"$auto_out" \
|| die "vm exec: --auto-prepare didn't re-sync new file; got: $auto_out"
# After auto-prepare, the warning must NOT reappear on the next exec —
# stored HEAD should now match the host.
local clean_stderr="$runtime_dir/smoke-exec-clean.err"
"$BANGER" vm exec smoke-exec -- true 2>"$clean_stderr" \
|| die 'vm exec: post-auto-prepare exec failed'
if grep -q 'workspace stale' "$clean_stderr"; then
die "vm exec: stale warning persisted after --auto-prepare; got: $(cat "$clean_stderr")"
fi
# Self-cleanup: scenario added a host-side commit, scenario rolls it back
# so downstream repodir-class scenarios see the original tree.
(
cd "$repodir"
git reset -q --hard HEAD~1
)
# Refusal when VM is not running: exec on a stopped VM must error out
# with a clear "not running" message. Done last so we can delete from
# the stopped state without needing a restart.
"$BANGER" vm stop smoke-exec >/dev/null || die 'vm exec: stop for not-running test failed'
local stopped_err
set +e
stopped_err="$("$BANGER" vm exec smoke-exec -- true 2>&1)"
rc=$?
set -e
[[ "$rc" -ne 0 ]] || die 'vm exec: exec on stopped VM unexpectedly succeeded'
grep -q 'not running' <<<"$stopped_err" \
|| die "vm exec: stopped-VM error missing 'not running' phrase: $stopped_err"
"$BANGER" vm delete smoke-exec >/dev/null || die 'vm exec: delete failed'
}
scenario_ssh_config() {
log "${SMOKE_DESCS[ssh_config]}"
local fake_home="$scratch_root/fake-home"
mkdir -p "$fake_home/.ssh"
printf 'Host myserver\n HostName example.invalid\n' > "$fake_home/.ssh/config"
(
export HOME="$fake_home"
"$BANGER" ssh-config --install >/dev/null || die 'ssh-config: install failed'
grep -q '^Include ' "$fake_home/.ssh/config" \
|| die "ssh-config: install didn't add Include line to ~/.ssh/config"
grep -q '^Host myserver' "$fake_home/.ssh/config" \
|| die 'ssh-config: install clobbered pre-existing content (!!)'
"$BANGER" ssh-config --install >/dev/null || die 'ssh-config: second install failed'
local include_count
include_count="$(grep -c '^Include .*banger' "$fake_home/.ssh/config")"
[[ "$include_count" == "1" ]] \
|| die "ssh-config: install not idempotent (Include appeared $include_count times)"
"$BANGER" ssh-config --uninstall >/dev/null || die 'ssh-config: uninstall failed'
if grep -q '^Include .*banger' "$fake_home/.ssh/config"; then
die 'ssh-config: uninstall left the Include line behind'
fi
grep -q '^Host myserver' "$fake_home/.ssh/config" \
|| die 'ssh-config: uninstall nuked user content (!!)'
)
}
scenario_nat() {
log "${SMOKE_DESCS[nat]}"
if ! sudo -n iptables -t nat -S POSTROUTING >/dev/null 2>&1; then
# Env-skip semantics:
# - implicit (no --scenario, or mixed --scenario list): soft-skip.
# - explicit (only "nat" selected): exit 77 to distinguish from
# a real failure for callers that care.
if (( SMOKE_EXPLICIT == 1 )) && (( ${#SMOKE_SELECTED[@]} == 1 )) \
&& [[ "${SMOKE_SELECTED[0]}" == "nat" ]]; then
log 'NAT: passwordless sudo iptables unavailable; explicit selection — exiting 77 (autotools skip)'
exit 77
fi
log 'NAT: skipping — passwordless sudo iptables unavailable'
return 0
fi
"$BANGER" vm create --name smoke-nat --nat >/dev/null || die 'NAT: create --nat failed'
"$BANGER" vm create --name smoke-nocnat >/dev/null || die 'NAT: control create failed'
local nat_ip ctl_ip postrouting rule_count
nat_ip="$("$BANGER" vm show smoke-nat 2>/dev/null | awk -F'"' '/"guest_ip"/ {print $4}')"
ctl_ip="$("$BANGER" vm show smoke-nocnat 2>/dev/null | awk -F'"' '/"guest_ip"/ {print $4}')"
[[ -n "$nat_ip" && -n "$ctl_ip" ]] || die "NAT: couldn't read guest IPs (nat='$nat_ip', ctl='$ctl_ip')"
postrouting="$(sudo -n iptables -t nat -S POSTROUTING 2>/dev/null || true)"
grep -q -- "-s $nat_ip/32.*-j MASQUERADE" <<<"$postrouting" \
|| die "NAT: --nat VM has no POSTROUTING MASQUERADE rule for $nat_ip; got:"$'\n'"$postrouting"
if grep -q -- "-s $ctl_ip/32.*-j MASQUERADE" <<<"$postrouting"; then
die "NAT: control VM unexpectedly has a MASQUERADE rule for $ctl_ip"
fi
"$BANGER" vm stop smoke-nat >/dev/null || die 'NAT: stop --nat VM failed'
"$BANGER" vm start smoke-nat >/dev/null || die 'NAT: restart --nat VM failed'
postrouting="$(sudo -n iptables -t nat -S POSTROUTING 2>/dev/null || true)"
rule_count="$(grep -c -- "-s $nat_ip/32.*-j MASQUERADE" <<<"$postrouting" || true)"
[[ "$rule_count" == "1" ]] \
|| die "NAT: MASQUERADE rule count for $nat_ip = $rule_count after restart, want 1"
"$BANGER" vm delete smoke-nat >/dev/null || die 'NAT: delete --nat VM failed'
"$BANGER" vm delete smoke-nocnat >/dev/null || die 'NAT: delete control VM failed'
postrouting="$(sudo -n iptables -t nat -S POSTROUTING 2>/dev/null || true)"
if grep -q -- "-s $nat_ip/32.*-j MASQUERADE" <<<"$postrouting"; then
die "NAT: delete left a MASQUERADE rule behind for $nat_ip"
fi
}
scenario_invalid_spec() {
log "${SMOKE_DESCS[invalid_spec]}"
local pre_vms post_vms rc
pre_vms="$("$BANGER" vm list --all 2>/dev/null | wc -l)"
set +e
"$BANGER" vm run --rm --vcpu 0 -- echo unused >/dev/null 2>&1
rc=$?
set -e
[[ "$rc" -ne 0 ]] || die 'invalid spec: vm run succeeded despite --vcpu 0'
post_vms="$("$BANGER" vm list --all 2>/dev/null | wc -l)"
[[ "$pre_vms" == "$post_vms" ]] || die "invalid spec leaked a VM row: pre=$pre_vms, post=$post_vms"
}
scenario_invalid_name() {
log "${SMOKE_DESCS[invalid_name]}"
local pre_vms post_vms rc bad
pre_vms="$("$BANGER" vm list --all 2>/dev/null | wc -l)"
for bad in 'MyBox' 'my box' 'box.vm' '-box'; do
set +e
"$BANGER" vm create --name "$bad" --no-start >/dev/null 2>&1
rc=$?
set -e
[[ "$rc" -ne 0 ]] || die "invalid name: vm create accepted '$bad'"
done
post_vms="$("$BANGER" vm list --all 2>/dev/null | wc -l)"
[[ "$pre_vms" == "$post_vms" ]] \
|| die "invalid name leaked VM row(s): pre=$pre_vms, post=$post_vms"
}
# ---------------------------------------------------------------------
# Dispatchers.
# ---------------------------------------------------------------------
# run_serial calls each named scenario in-process. die() exits the whole
# script with rc=1 on the first failure, as before the refactor. Output
# is not buffered, so it streams exactly as it did pre-refactor.
run_serial() {
local name
for name in "$@"; do
"scenario_$name"
done
}
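# run_repodir_chain runs the repodir scenarios serially (registry order)
# inside a subshell so it can be backgrounded as one virtual job in the
# parallel pool. The chain's stdout/stderr is buffered into one logfile.
run_repodir_chain() {
local logfile="$runtime_dir/parallel-repodir.log"
local rc=0
# Avoid `( ... ) || rc=$?` and `scenario || exit 1`: bash ignores errexit
# for everything on the left of `||`, so a failing command without an
# explicit `|| die` would be missed here but caught in serial mode.
set +e
(
set -e
local name
for name in "$@"; do
"scenario_$name"
done
) >"$logfile" 2>&1
rc=$?
set -e
return "$rc"
}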
# run_one_buffered runs a single scenario in a subshell with stdout/stderr
# captured to a per-scenario logfile. On failure the buffer is replayed to
# the main stderr; on success only the one-line PASS is shown.
run_one_buffered() {
local name=$1
local logfile="$runtime_dir/parallel-$name.log"
local rc=0
# Not written as `( ... ) || rc=$?`: bash ignores errexit for everything
# on the left of `||`, which would let unguarded failures slip through.
set +e
( set -e; "scenario_$name" ) >"$logfile" 2>&1
rc=$?
set -e
if (( rc == 0 )); then
printf '[smoke] %s: PASS\n' "$name" >&2
else
printf '[smoke] %s: FAIL (rc=%d)\n' "$name" "$rc" >&2
sed 's/^/[smoke:'"$name"'] /' "$logfile" >&2
fi
return $rc
}
# run_parallel splits the selection into pure singletons + a single fused
# repodir chain (if any), runs them all in a slot-limited pool, then
# runs global scenarios serially in registry order. Pure scenarios report
# PASS/FAIL individually; if any pool job fails, the script exits 1
# before the global scenarios run.
run_parallel() {
local jobs=$1; shift
local selected=("$@")
local pure=() repodir_chain=() global=()
local name
for name in "${selected[@]}"; do
case "${SMOKE_CLASS[$name]}" in
pure) pure+=("$name") ;;
repodir) repodir_chain+=("$name") ;;
global) global+=("$name") ;;
esac
done
# Build the parallel-pool job list. The repodir chain (if any) is one
# virtual job — it runs its scenarios serially inside a subshell and
# competes with pure scenarios for a slot.
local pool=()
for name in "${pure[@]}"; do
pool+=("pure:$name")
done
if (( ${#repodir_chain[@]} > 0 )); then
pool+=("repodir:$(IFS=' '; echo "${repodir_chain[*]}")")
fi
log "parallel pool: ${#pool[@]} job(s), ${#global[@]} global; jobs=$jobs"
declare -A pid_kind=()
declare -A pid_label=()
local active=0
local failures=0
local job kind payload
for job in "${pool[@]}"; do
kind="${job%%:*}"
payload="${job#*:}"
while (( active >= jobs )); do
if ! wait -n; then
failures=$(( failures + 1 ))
fi
active=$(( active - 1 ))
done
if [[ "$kind" == "pure" ]]; then
run_one_buffered "$payload" &
else
# repodir chain: payload is a space-separated list of names
# shellcheck disable=SC2086
( run_repodir_chain $payload ) &
local p=$!
pid_kind[$p]=repodir
pid_label[$p]="$payload"
fi
active=$(( active + 1 ))
done
# Drain remaining jobs.
while (( active > 0 )); do
if ! wait -n; then
failures=$(( failures + 1 ))
fi
active=$(( active - 1 ))
done
# Replay the repodir chain's buffered log if the chain ran and wrote output.
if (( ${#repodir_chain[@]} > 0 )); then
local logfile="$runtime_dir/parallel-repodir.log"
if [[ -s "$logfile" ]]; then
log "repodir chain log:"
sed 's/^/[smoke:repodir] /' "$logfile" >&2
fi
fi
if (( failures > 0 )); then
log "parallel pool: $failures job(s) failed"
exit 1
fi
# Global scenarios: serial, in registry order, current behavior.
if (( ${#global[@]} > 0 )); then
log "global pool: ${#global[@]} scenario(s) (serial)"
run_serial "${global[@]}"
fi
}
# ---------------------------------------------------------------------
# Main.
# ---------------------------------------------------------------------
install_preamble
setup_fixtures
if (( SMOKE_JOBS == 1 )); then
run_serial "${SMOKE_SELECTED[@]}"
else
run_parallel "$SMOKE_JOBS" "${SMOKE_SELECTED[@]}"
fi
if (( ${#SMOKE_SELECTED[@]} == ${#SMOKE_SCENARIOS[@]} )); then
log 'all scenarios passed'
else
log "scenario(s) passed: ${SMOKE_SELECTED[*]}"
fi