Pre-release polish: be explicit about which firecracker versions
banger has been validated against, and give users a one-line install
suggestion when the binary is missing rather than the previous
generic "install firecracker or set firecracker_bin".
internal/firecracker/version.go (new):
* MinSupportedVersion = "1.5.0" — the floor banger refuses to
launch below. Bumping this is a deliberate decision, paired
with whatever helper feature started requiring the newer
firecracker.
* KnownTestedVersion = "1.14.1" — what banger's smoke suite
actually runs against today.
* SemVer + Compare + ParseVersionOutput, table-tested. The parser
tolerates the trailing "exiting successfully" log line that
firecracker tacks onto --version; only the canonical
"Firecracker vX.Y.Z" line matters.
* QueryVersion shells `<bin> --version` through a CommandRunner-
shaped interface; doesn't import internal/system to keep the
firecracker package leaf-clean.
internal/daemon/doctor.go:
* New addFirecrackerVersionCheck replaces the previous bare
RequireExecutable preflight for firecracker. Three outcomes:
PASS within [Min, Tested], WARN above Tested (newer firecracker
usually works but is outside the tested window), FAIL below Min
or when the binary is missing.
* On missing binary, surfaces a distro-aware install command via
parseOSReleaseIDs(/etc/os-release) → guessFirecrackerInstall
Command. Pinned suggestions for debian (apt), arch/manjaro
(paru), and nixos (nix-env). Other distros get only the upstream
Releases URL — guessing wrong sends users on a wild goose chase.
* runtimeChecks no longer includes the firecracker preflight; the
new check subsumes it.
README.md:
* Requirements line now spells out the tested-against version
(v1.14.1) and the supported floor (≥ v1.5.0), and points at
`banger doctor` for the version check + install hint.
Tests: ParseVersionOutput across canonical/prerelease/garbage inputs,
SemVer.Compare across major/minor/patch boundaries, MustParseSemVer
panics on malformed inputs. Doctor-side: PASS on tested version,
FAIL below Min, WARN above Tested, FAIL with upstream URL when
missing, install-hint dispatch table covering debian/ubuntu (via
ID_LIKE)/arch/manjaro/nixos/fedora-fallback/missing-os-release.
The renamed TestDoctorReport_MissingFirecrackerFails... now asserts
against the new check name. Live `banger doctor` reports
"v1.14.1 at /usr/bin/firecracker (within tested range; min v1.5.0,
tested v1.14.1)" against the smoke host.
Smoke bare_run still green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
validateFirecrackerPID was a substring check on /proc/<pid>/cmdline:
"contains 'firecracker'". Good enough to refuse init/sshd/the test
binary, but on a shared host where multiple users run firecracker
the helper would happily SIGKILL someone else's VM. The owner-UID
daemon could weaponise the helper as an arbitrary "kill any
firecracker on this box" primitive.
Replace the substring gate with two stronger acceptance modes:
* Cgroup match (the supported path): /proc/<pid>/cgroup contains
bangerd-root.service. systemd assigns every direct child of the
helper unit into that cgroup at fork; the kernel keeps it there
for the process's lifetime, so no daemon-UID code can forge it.
Other users' firecracker processes live in different cgroups
(user@<uid>.service, foreign service slices) and fail this
check. Also robust across helper restarts: KillMode=control-group
on the unit kills children when the service goes down, so an
"orphan banger firecracker in some other cgroup" is rare by
construction.
* --api-sock fallback: cmdline carries `--api-sock <path>` with
the path under banger's RuntimeDir. Covers the legacy direct
(no-jailer) launch path, and gives daemon reconcile a way to
clean up the rare orphan that lands outside the service cgroup
after a hard helper crash.
Tried /proc/<pid>/root first — pivot_root semantics make jailer'd
firecracker read its root as "/" from any namespace, so the symlink
is useless as a banger-managed fingerprint. Cgroup is the right
signal.
Also added a signal allowlist: priv.signal_process now rejects
anything outside {TERM, KILL, INT, HUP, QUIT, USR1, USR2, ABRT}
(case-insensitive, with or without SIG prefix). STOP/CONT, real-time
signals, and numeric forms are refused — the helper running as root
must not be a generic "send arbitrary signal to my pid" primitive.
priv.kill_process is unaffected (it always sends KILL).
Tests: validateSignalName covers allowlist + numeric/STOP/RTMIN
rejection; extractFirecrackerAPISock pins the three flag forms
(--api-sock VAL, --api-sock=VAL, -a VAL); pathIsUnder gets a small
table; existing TestValidateFirecrackerPID still rejects PID 0,
PID 1, and the test process itself. Doctor's non-system-mode test
gained a t.TempDir-backed install path so it stops being
environment-dependent on machines that happen to have
/etc/banger/install.toml.
Smoke at JOBS=4 still green — every banger-launched firecracker
sails through the cgroup match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`docs/privileges.md` now documents what the install promises (helper +
daemon services active, sockets at 0600 ownerUID, units carrying the
hardening directives, firecracker root-owned + non-writable). Doctor
verifies the running install matches: drift between the doc and the
filesystem would silently weaken the trust model otherwise.
In system mode (install.toml present):
* helper service / owner daemon service: `systemctl is-active`.
* helper socket / daemon socket: stat-and-compare mode + uid against
the registered owner.
* helper unit hardening / daemon unit hardening: scan the rendered
unit for NoNewPrivileges, ProtectSystem=strict, ProtectHome
(=yes for the helper, =read-only for the daemon), RestrictSUIDSGID,
LockPersonality, and the helper's CapabilityBoundingSet line. The
daemon unit also pins User=<registered owner>.
* firecracker binary ownership: regular file, not a symlink, mode
not group/world writable, executable, owned by uid 0 — same
constraints validateRootExecutable enforces at launch, surfaced
once at doctor time so a misconfigured binary fails fast with a
clearer error than the helper's open-time rejection.
In non-system mode (no /etc/banger/install.toml) doctor emits a single
WARN row pointing at docs/privileges.md > 'Running outside the system
install'. A PASS would imply guarantees the install isn't actually
providing.
Tests cover both branches: the non-system warn pins its message
substrings; system-mode pins that every check name shows up; and the
helpers (socket-perms, unit-hardening, executable-ownership) have
direct table-style negative tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
banger hasn't shipped a public release — every "legacy", "pre-opt-in",
"previously", "migration note", "no longer" reference in the tree is
pinning against a state no real user's install has ever been in.
That scaffolding has weight: it's a coordinate system future readers
have to decode, and it keeps dead code alive.
Removed (code):
- internal/daemon/ssh_client_config.go
- vmSSHConfigIncludeBegin / vmSSHConfigIncludeEnd constants and
every `removeManagedBlock(existing, vm...)` call they enabled
(legacy inline `Host *.vm` block scrub)
- cleanupLegacySSHConfigDir (+ its caller in syncVMSSHClientConfig)
— wiped a pre-opt-in sibling file under $ConfigDir/ssh
- sameDirOrParent + resolvePathForComparison — only ever used
by cleanupLegacySSHConfigDir
- the "also check legacy marker" fallback in
UserSSHIncludeInstalled / UninstallUserSSHInclude
- internal/store/migrations.go
- migrateDropDeadImageColumns (migration 2) + its slice entry
- dropColumnIfExists (orphaned after the above)
- addColumnIfMissing + the whole "columns added across the pre-
versioning lifetime" block at the end of migrateBaseline —
subsumed into the baseline CREATE TABLE
- `packages_path TEXT` column on the images table (the
throwaway migration 2 dropped it, but there was never any
reader)
- internal/daemon/vm.go
- vmDNSRecordName local wrapper — was justified as "avoid
pulling vmdns into every file"; three of four callers already
imported vmdns directly, so inline the one stray call
- internal/cli/cli_test.go
- TestLegacyRemovedCommandIsRejected (`tui` subcommand never
shipped)
Removed / simplified (tests):
- ssh_client_config_test.go: dropped TestSameDirOrParentHandlesSymlinks,
TestSyncVMSSHClientConfigPreservesUserKeyInLegacyDir,
TestSyncVMSSHClientConfigNarrowsCleanupToLegacyFile,
TestSyncVMSSHClientConfigLeavesUnexpectedLegacyContents,
TestInstallUserSSHIncludeMigratesLegacyInlineBlock, plus the
"legacy posture" regression strings in the remaining happy-path
test; TestUninstallUserSSHIncludeRemovesBothMarkerBlocks collapsed
to a single-block test
- migrations_test.go: dropped TestMigrateDropDeadImageColumns_AcrossInstallPaths,
TestDropColumnIfExistsIsIdempotent; TestOpenReadOnlyDoesNotRunMigrations
simplified to test against the baseline marker
Removed (docs):
- README.md "**Migration note.**" blockquote about the SSH-key path move
- docs/advanced.md parenthetical "(the old behaviour)"
Reworded (comments):
- Dropped "Previously this file also contained LogLevel DEBUG3..."
history from vm_disk.go's sshdGuestConfig doc
- Dropped "Call sites that previously read vm.Runtime.{PID,...}"
from vm_handles.go; now documents the current contract
- Dropped "Pre-v0.1 the defaults are" scaffolding in doctor_test.go
- Dropped "no longer does its own git inspection" phrasing in vm_run.go
- Dropped the "(also cleans up legacy inline block from pre-opt-in
builds)" aside on the `ssh-config` CLI docstring
- Renamed test var `legacyKey` → `existingKey` in vm_test.go; its
purpose was "pre-existing authorized_keys line," not banger-legacy
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The unit + integration tests can't cross machine.Start — the SDK
boundary would need a fake firecracker that reimplements the
control-plane HTTP API, and the ongoing maintenance cost of keeping
that fake honest with upstream kills the value. Instead, add a
pre-release smoke target that drives REAL Firecracker + real KVM,
captures coverage from the -cover-instrumented binaries, and
surfaces per-package deltas so regressions in the boot path don't
ship silently.
scripts/smoke.sh:
- Isolated XDG_{CONFIG,STATE,CACHE,RUNTIME} so the smoke run can't
touch real user state (state/cache persist under build/smoke/xdg
for fast reruns; runtime is mktemp'd fresh per-run because
sockets can't be reused)
- Preflight: `banger doctor` must pass; UDP :42069 must be free
(otherwise the user's real daemon is up and the smoke daemon
can't bind its DNS listener — fail with an actionable message)
- Scenario 1 — bare: `banger vm run --rm -- echo smoke-bare-ok`
exercises create → start → socket ownership chown → machine.Start
→ SDK waitForSocket race → vsock agent readiness → guest SSH
wait → exec → cleanup → delete
- Scenario 2 — workspace: creates a throwaway git repo, runs
`banger vm run --rm <repo> -- cat /root/repo/smoke-file.txt`,
verifies the tracked file reached the guest (exercises
workDisk capability PrepareHost + workspace.prepare)
- `banger daemon stop` at the end so instrumented binaries flush
GOCOVERDIR pods before the script exits
Makefile additions:
- smoke-build: builds banger/bangerd under build/smoke/bin/ with
`go build -cover`
- smoke: runs the script with GOCOVERDIR set, reports per-package
coverage via `go tool covdata percent`
- smoke-coverage-html: textfmt + go tool cover for a browsable
report
- smoke-clean: nukes build/smoke/ including the persisted XDG
state
Bonus fix uncovered during the first smoke run: doctor treated a
missing state.db as a FAIL ("out of memory" from SQLite
SQLITE_CANTOPEN), which red-flagged every fresh install. Split
the store check: DB file absent → PASS with "will be created on
first daemon start" detail; DB present but unreadable → FAIL as
before. New TestDoctorReport_StoreMissingSurfacesAsPassForFreshInstall
pins the behaviour.
Concrete coverage delta from the first successful smoke run
(compared to `make coverage-total`'s unit-test-only 37.8%):
internal/firecracker 43.6% → 75.0%
internal/daemon/workspace 33.8% → 60.8%
internal/store 40.1% → 56.3%
internal/guest 63.7% → 57.4% (different mix: smoke
exercises real SSH;
unit tests cover more
error branches)
The packages the review flagged are the ones that moved most —
which is the point.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three thematic test files pinning behavior surfaces that had none
before, following the review's recommendation to plug concrete
error/cleanup branches rather than chase a coverage percentage.
doctor_test.go
Covers Daemon.doctorReport end-to-end with a permissive runner +
fake executables on PATH. Pins: store error surfaces as fail,
store success as pass, missing firecracker kills the host-runtime
check, the three default capability feature checks (work disk,
vm dns, nat) are emitted, vm-defaults is always-pass with
provenance. Previously 0% — now the Doctor() command's contract
with the CLI is under guard.
workspace_rejection_test.go
Covers the four early-exit branches of PrepareVMWorkspace that
the existing happy-path + lock-release tests never hit: malformed
mode, --from without --branch, VM not running, VM not found.
Each one returns before any SSH I/O, so the fake-firecracker
infra the happy-path test needs is unnecessary — a bare wired
daemon with a stored VMRecord suffices.
nat_capability_test.go
Covers natCapability.ApplyConfigChange (unchanged flag → no-op,
VM not alive → no-op, toggle on live VM → runner reached) and
natCapability.Cleanup (NAT disabled → no-op, runtime handles
missing → defensive no-op, full wiring → ensureNAT(false)). A
countingRunner + startFakeFirecracker fixture stands in for the
real host plumbing, with waitForVMAlive polling past the
exec -a race window that startFakeFirecracker exposes on
loaded CI boxes.
make coverage-total 37.8% → 38.6%. The number isn't the point —
these tests exist so the next refactor in this area has to
break an explicit assertion to drift.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>