banger

Author	SHA1	Message	Date
Thales Maciel	fae28e3d8b	update: docs + publish script for the self-update feature README gets a top-level Updating section; docs/privileges.md gains a step-by-step trust-model writeup of `banger update`. The new scripts/publish-banger-release.sh drives the manual release cut: build, tar, sha256sum, cosign sign-blob, verify against the embedded public key, jq-merge into manifest.json, rclone upload to the R2 bucket. Refuses outright if the embedded key is still the placeholder so we can't accidentally publish an unverifiable release. Also folds in gofmt drift accumulated across the updater package and a few sibling files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 12:43:46 -03:00
Thales Maciel	775525b592	cli,doctor: --version flag + CLI/install drift check Two pre-release polish items on the version-display surface. * --version on both binaries: cobra's Version field on the banger and bangerd roots renders a one-line summary (banger v0.1.0 (commit abcd1234, built 2026-04-28T20:45:50Z)). The SetVersionTemplate override drops cobra's "{{.Name}} version" prefix — our string is already a complete sentence. The multi-line `banger version` subcommand is unchanged for callers that want the full SHA / built_at on separate lines. * Doctor "banger version" row: prints the running CLI's version + short commit + built-at, plus what /etc/banger/install.toml recorded at install time. Disagreement is the most common version-skew pitfall (stale CLI against fresh daemon, or vice versa) and a one-line warn is friendlier than tracking that down from a launch failure. Drift detection is suppressed when either side is dev/unknown (untagged build) — comparing a dev CLI against a tagged install is the developer-machine case, not a real problem. formatVersionLine is in internal/cli (banger.go) and reused by bangerd.go via a strings.Replace because bangerd's version line should say "bangerd" not "banger". Slightly tilt-feeling but cheaper than parameterising the helper for one caller. Tests: TestVersionsDriftToleratesDevAndUnknown pins the four branches (match, version diff, commit diff, dev-suppression). The existing version-format test already runs through formatVersionLine indirectly. Live exercise: $ banger --version banger dev (commit 1c1ca7d6, built 2026-04-28T20:52:33Z) $ bangerd --version bangerd dev (commit 1c1ca7d6, built 2026-04-28T20:52:33Z) $ banger doctor | head ... PASS banger version - CLI dev (commit 1c1ca7d6, built 2026-04-28T20:52:33Z) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 17:53:32 -03:00
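The drift rule described in 775525b592 (warn on a version or commit mismatch, stay quiet when either side is an untagged build) might look roughly like the sketch below. The `buildInfo` type and the function names are hypothetical; the commit only names the test `TestVersionsDriftToleratesDevAndUnknown`, and treating an empty version string like "unknown" is an assumption.

```go
package main

import "fmt"

// buildInfo is a hypothetical pair of fields; the real structs are not shown
// in the commit log.
type buildInfo struct {
	Version string
	Commit  string
}

// isUntagged reports whether a version string carries no release information.
// Treating the empty string like "unknown" is an assumption.
func isUntagged(v string) bool {
	return v == "" || v == "dev" || v == "unknown"
}

// versionsDrift reports whether the running CLI and the recorded install
// disagree. Comparison is skipped when either side is an untagged build.
func versionsDrift(cli, installed buildInfo) bool {
	if isUntagged(cli.Version) || isUntagged(installed.Version) {
		return false
	}
	return cli.Version != installed.Version || cli.Commit != installed.Commit
}

func main() {
	installed := buildInfo{Version: "v0.1.0", Commit: "abcd1234"}
	fmt.Println(versionsDrift(buildInfo{Version: "v0.1.0", Commit: "abcd1234"}, installed)) // false
	fmt.Println(versionsDrift(buildInfo{Version: "v0.2.0", Commit: "abcd1234"}, installed)) // true
	fmt.Println(versionsDrift(buildInfo{Version: "dev", Commit: "1c1ca7d6"}, installed))    // false
}
```

The three calls in `main` exercise the match, version-diff, and dev-suppression branches; a commit-only mismatch on the same tag would also report drift.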
Thales Maciel	1c1ca7d6a4	doctor: pin firecracker version range, distro-aware install hint Pre-release polish: be explicit about which firecracker versions banger has been validated against, and give users a one-line install suggestion when the binary is missing rather than the previous generic "install firecracker or set firecracker_bin". internal/firecracker/version.go (new): * MinSupportedVersion = "1.5.0" — the floor banger refuses to launch below. Bumping this is a deliberate decision, paired with whatever helper feature started requiring the newer firecracker. * KnownTestedVersion = "1.14.1" — what banger's smoke suite actually runs against today. * SemVer + Compare + ParseVersionOutput, table-tested. The parser tolerates the trailing "exiting successfully" log line that firecracker tacks onto --version; only the canonical "Firecracker vX.Y.Z" line matters. * QueryVersion shells `<bin> --version` through a CommandRunner-shaped interface; doesn't import internal/system to keep the firecracker package leaf-clean. internal/daemon/doctor.go: * New addFirecrackerVersionCheck replaces the previous bare RequireExecutable preflight for firecracker. Three outcomes: PASS within [Min, Tested], WARN above Tested (newer firecracker usually works but is outside the tested window), FAIL below Min or when the binary is missing. * On missing binary, surfaces a distro-aware install command via parseOSReleaseIDs(/etc/os-release) → guessFirecrackerInstallCommand. Pinned suggestions for debian (apt), arch/manjaro (paru), and nixos (nix-env). Other distros get only the upstream Releases URL — guessing wrong sends users on a wild goose chase. * runtimeChecks no longer includes the firecracker preflight; the new check subsumes it. README.md: * Requirements line now spells out the tested-against version (v1.14.1) and the supported floor (≥ v1.5.0), and points at `banger doctor` for the version check + install hint. 
Tests: ParseVersionOutput across canonical/prerelease/garbage inputs, SemVer.Compare across major/minor/patch boundaries, MustParseSemVer panics on malformed inputs. Doctor-side: PASS on tested version, FAIL below Min, WARN above Tested, FAIL with upstream URL when missing, install-hint dispatch table covering debian/ubuntu (via ID_LIKE)/arch/manjaro/nixos/fedora-fallback/missing-os-release. The renamed TestDoctorReport_MissingFirecrackerFails... now asserts against the new check name. Live `banger doctor` reports "v1.14.1 at /usr/bin/firecracker (within tested range; min v1.5.0, tested v1.14.1)" against the smoke host. Smoke bare_run still green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 17:47:42 -03:00
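As a rough illustration of the version check in 1c1ca7d6a4, the sketch below parses the canonical "Firecracker vX.Y.Z" line, ignores whatever else `--version` prints, and maps the result onto the three outcomes. All identifiers here are lowercase stand-ins rather than the real `internal/firecracker` API, and dropping any pre-release suffix after the patch number is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// semVer is a simplified stand-in for the SemVer type named in the commit.
type semVer struct{ Major, Minor, Patch int }

// versionLine matches only the canonical line; other output lines are ignored.
var versionLine = regexp.MustCompile(`(?m)^Firecracker v(\d+)\.(\d+)\.(\d+)`)

// parseVersionOutput extracts major.minor.patch from `--version` output.
func parseVersionOutput(out string) (semVer, error) {
	m := versionLine.FindStringSubmatch(out)
	if m == nil {
		return semVer{}, fmt.Errorf("no \"Firecracker vX.Y.Z\" line in output")
	}
	var v semVer
	for i, dst := range []*int{&v.Major, &v.Minor, &v.Patch} {
		n, err := strconv.Atoi(m[i+1])
		if err != nil {
			return semVer{}, err
		}
		*dst = n
	}
	return v, nil
}

// compare returns -1, 0, or 1, ordering by major, then minor, then patch.
func (a semVer) compare(b semVer) int {
	pairs := [][2]int{{a.Major, b.Major}, {a.Minor, b.Minor}, {a.Patch, b.Patch}}
	for _, p := range pairs {
		if p[0] < p[1] {
			return -1
		}
		if p[0] > p[1] {
			return 1
		}
	}
	return 0
}

// classify maps a version onto the doctor outcomes: FAIL below the floor,
// WARN above the tested version, PASS inside [minV, tested].
func classify(v, minV, tested semVer) string {
	switch {
	case v.compare(minV) < 0:
		return "FAIL"
	case v.compare(tested) > 0:
		return "WARN"
	default:
		return "PASS"
	}
}

func main() {
	minV, tested := semVer{1, 5, 0}, semVer{1, 14, 1}
	out := "Firecracker v1.14.1\n\n(trailing log line)\n" // placeholder trailing text
	v, err := parseVersionOutput(out)
	if err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println(classify(v, minV, tested)) // PASS
}
```

The missing-binary FAIL and the install hint are separate concerns and are left out of this sketch.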
Thales Maciel	3805b093b4	roothelper: tie kill/signal authorization to banger-launched firecracker validateFirecrackerPID was a substring check on /proc/<pid>/cmdline: "contains 'firecracker'". Good enough to refuse init/sshd/the test binary, but on a shared host where multiple users run firecracker the helper would happily SIGKILL someone else's VM. The owner-UID daemon could weaponise the helper as an arbitrary "kill any firecracker on this box" primitive. Replace the substring gate with two stronger acceptance modes: * Cgroup match (the supported path): /proc/<pid>/cgroup contains bangerd-root.service. systemd assigns every direct child of the helper unit into that cgroup at fork; the kernel keeps it there for the process's lifetime, so no daemon-UID code can forge it. Other users' firecracker processes live in different cgroups (user@<uid>.service, foreign service slices) and fail this check. Also robust across helper restarts: KillMode=control-group on the unit kills children when the service goes down, so an "orphan banger firecracker in some other cgroup" is rare by construction. * --api-sock fallback: cmdline carries `--api-sock <path>` with the path under banger's RuntimeDir. Covers the legacy direct (no-jailer) launch path, and gives daemon reconcile a way to clean up the rare orphan that lands outside the service cgroup after a hard helper crash. Tried /proc/<pid>/root first — pivot_root semantics make jailer'd firecracker read its root as "/" from any namespace, so the symlink is useless as a banger-managed fingerprint. Cgroup is the right signal. Also added a signal allowlist: priv.signal_process now rejects anything outside {TERM, KILL, INT, HUP, QUIT, USR1, USR2, ABRT} (case-insensitive, with or without SIG prefix). STOP/CONT, real-time signals, and numeric forms are refused — the helper running as root must not be a generic "send arbitrary signal to my pid" primitive. priv.kill_process is unaffected (it always sends KILL). 
Tests: validateSignalName covers allowlist + numeric/STOP/RTMIN rejection; extractFirecrackerAPISock pins the three flag forms (--api-sock VAL, --api-sock=VAL, -a VAL); pathIsUnder gets a small table; existing TestValidateFirecrackerPID still rejects PID 0, PID 1, and the test process itself. Doctor's non-system-mode test gained a t.TempDir-backed install path so it stops being environment-dependent on machines that happen to have /etc/banger/install.toml. Smoke at JOBS=4 still green — every banger-launched firecracker sails through the cgroup match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 16:00:41 -03:00
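The signal allowlist and the `--api-sock` fallback from 3805b093b4 can be sketched as below. The function names echo the ones mentioned in the test notes, but the bodies are guesses at the behaviour described, not the repository's code; `/run/banger` is a hypothetical RuntimeDir, and the argument list is assumed to come from splitting `/proc/<pid>/cmdline` on NUL bytes.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// allowedSignals mirrors the allowlist in the commit message.
var allowedSignals = map[string]bool{
	"TERM": true, "KILL": true, "INT": true, "HUP": true,
	"QUIT": true, "USR1": true, "USR2": true, "ABRT": true,
}

// validateSignalName accepts allowlisted names case-insensitively, with or
// without a SIG prefix. Numeric forms, STOP/CONT, and real-time signals are
// not in the map and are therefore rejected.
func validateSignalName(name string) error {
	n := strings.TrimPrefix(strings.ToUpper(strings.TrimSpace(name)), "SIG")
	if !allowedSignals[n] {
		return fmt.Errorf("signal %q is not in the allowlist", name)
	}
	return nil
}

// extractAPISock handles the three flag forms: --api-sock VAL,
// --api-sock=VAL, and -a VAL.
func extractAPISock(args []string) (string, bool) {
	for i, arg := range args {
		switch {
		case strings.HasPrefix(arg, "--api-sock="):
			return strings.TrimPrefix(arg, "--api-sock="), true
		case (arg == "--api-sock" || arg == "-a") && i+1 < len(args):
			return args[i+1], true
		}
	}
	return "", false
}

// pathIsUnder reports whether path sits strictly below dir after cleaning,
// so "/run/banger-evil/x" and "/run/banger/../etc/x" both fail.
func pathIsUnder(path, dir string) bool {
	if !filepath.IsAbs(path) || !filepath.IsAbs(dir) {
		return false
	}
	rel, err := filepath.Rel(filepath.Clean(dir), filepath.Clean(path))
	if err != nil {
		return false
	}
	return rel != "." && rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

func main() {
	fmt.Println(validateSignalName("sigterm") == nil) // true
	fmt.Println(validateSignalName("STOP") == nil)    // false
	args := strings.Split("firecracker\x00--api-sock\x00/run/banger/vm1/fc.sock", "\x00")
	sock, ok := extractAPISock(args)
	fmt.Println(ok && pathIsUnder(sock, "/run/banger")) // true
}
```

The cgroup match, which the commit treats as the supported path, is not shown; this covers only the allowlist and the fallback.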
Thales Maciel	3e6d0cee89	doctor: surface security-posture drift in `banger doctor` `docs/privileges.md` now documents what the install promises (helper + daemon services active, sockets at 0600 ownerUID, units carrying the hardening directives, firecracker root-owned + non-writable). Doctor verifies the running install matches: drift between the doc and the filesystem would silently weaken the trust model otherwise. In system mode (install.toml present): * helper service / owner daemon service: `systemctl is-active`. * helper socket / daemon socket: stat-and-compare mode + uid against the registered owner. * helper unit hardening / daemon unit hardening: scan the rendered unit for NoNewPrivileges, ProtectSystem=strict, ProtectHome (=yes for the helper, =read-only for the daemon), RestrictSUIDSGID, LockPersonality, and the helper's CapabilityBoundingSet line. The daemon unit also pins User=<registered owner>. * firecracker binary ownership: regular file, not a symlink, mode not group/world writable, executable, owned by uid 0 — same constraints validateRootExecutable enforces at launch, surfaced once at doctor time so a misconfigured binary fails fast with a clearer error than the helper's open-time rejection. In non-system mode (no /etc/banger/install.toml) doctor emits a single WARN row pointing at docs/privileges.md > 'Running outside the system install'. A PASS would imply guarantees the install isn't actually providing. Tests cover both branches: the non-system warn pins its message substrings; system-mode pins that every check name shows up; and the helpers (socket-perms, unit-hardening, executable-ownership) have direct table-style negative tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 14:58:34 -03:00
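The unit-hardening scan in 3e6d0cee89 amounts to checking that a rendered unit file carries a set of directives. A minimal sketch follows, assuming exact `Key=Value` matching after trimming whitespace; the `=yes` values for the boolean directives are an assumption (the commit names the directives but gives values only for ProtectSystem and ProtectHome), and `missingDirectives` is a hypothetical name.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// missingDirectives returns the wanted "Key=Value" entries that do not appear
// in the unit text. Comment lines and section headers never match because
// they are skipped or lack an "=".
func missingDirectives(unit string, want []string) []string {
	present := make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(unit))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, ";") {
			continue
		}
		key, val, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		present[strings.TrimSpace(key)+"="+strings.TrimSpace(val)] = true
	}
	var missing []string
	for _, d := range want {
		if !present[d] {
			missing = append(missing, d)
		}
	}
	return missing
}

func main() {
	// Helper-unit expectations; the "=yes" boolean values are assumed.
	want := []string{
		"NoNewPrivileges=yes",
		"ProtectSystem=strict",
		"ProtectHome=yes",
		"RestrictSUIDSGID=yes",
		"LockPersonality=yes",
	}
	unit := "[Service]\nNoNewPrivileges=yes\nProtectSystem=strict\nProtectHome=read-only\nRestrictSUIDSGID=yes\nLockPersonality=yes\n"
	fmt.Println(missingDirectives(unit, want)) // [ProtectHome=yes]
}
```

In the example, a helper unit rendered with the daemon's `ProtectHome=read-only` is reported as missing `ProtectHome=yes`, which is the kind of drift the doctor row is meant to surface.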
Thales Maciel	700a1e6e60	cleanup: drop pre-v0.1 migration scaffolding + legacy-behavior refs banger hasn't shipped a public release — every "legacy", "pre-opt-in", "previously", "migration note", "no longer" reference in the tree is pinning against a state no real user's install has ever been in. That scaffolding has weight: it's a coordinate system future readers have to decode, and it keeps dead code alive. Removed (code): - internal/daemon/ssh_client_config.go - vmSSHConfigIncludeBegin / vmSSHConfigIncludeEnd constants and every `removeManagedBlock(existing, vm...)` call they enabled (legacy inline `Host .vm` block scrub) - cleanupLegacySSHConfigDir (+ its caller in syncVMSSHClientConfig) — wiped a pre-opt-in sibling file under $ConfigDir/ssh - sameDirOrParent + resolvePathForComparison — only ever used by cleanupLegacySSHConfigDir - the "also check legacy marker" fallback in UserSSHIncludeInstalled / UninstallUserSSHInclude - internal/store/migrations.go - migrateDropDeadImageColumns (migration 2) + its slice entry - dropColumnIfExists (orphaned after the above) - addColumnIfMissing + the whole "columns added across the pre-versioning lifetime" block at the end of migrateBaseline — subsumed into the baseline CREATE TABLE - `packages_path TEXT` column on the images table (the throwaway migration 2 dropped it, but there was never any reader) - internal/daemon/vm.go - vmDNSRecordName local wrapper — was justified as "avoid pulling vmdns into every file"; three of four callers already imported vmdns directly, so inline the one stray call - internal/cli/cli_test.go - TestLegacyRemovedCommandIsRejected (`tui` subcommand never shipped) Removed / simplified (tests): - ssh_client_config_test.go: dropped TestSameDirOrParentHandlesSymlinks, TestSyncVMSSHClientConfigPreservesUserKeyInLegacyDir, TestSyncVMSSHClientConfigNarrowsCleanupToLegacyFile, TestSyncVMSSHClientConfigLeavesUnexpectedLegacyContents, TestInstallUserSSHIncludeMigratesLegacyInlineBlock, plus the "legacy 
posture" regression strings in the remaining happy-path test; TestUninstallUserSSHIncludeRemovesBothMarkerBlocks collapsed to a single-block test - migrations_test.go: dropped TestMigrateDropDeadImageColumns_AcrossInstallPaths, TestDropColumnIfExistsIsIdempotent; TestOpenReadOnlyDoesNotRunMigrations simplified to test against the baseline marker Removed (docs): - README.md "Migration note.*" blockquote about the SSH-key path move - docs/advanced.md parenthetical "(the old behaviour)" Reworded (comments): - Dropped "Previously this file also contained LogLevel DEBUG3..." history from vm_disk.go's sshdGuestConfig doc - Dropped "Call sites that previously read vm.Runtime.{PID,...}" from vm_handles.go; now documents the current contract - Dropped "Pre-v0.1 the defaults are" scaffolding in doctor_test.go - Dropped "no longer does its own git inspection" phrasing in vm_run.go - Dropped the "(also cleans up legacy inline block from pre-opt-in builds)" aside on the `ssh-config` CLI docstring - Renamed test var `legacyKey` → `existingKey` in vm_test.go; its purpose was "pre-existing authorized_keys line," not banger-legacy Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 13:56:32 -03:00
Thales Maciel	5f81332b0a	make smoke: end-to-end boot suite with coverage from real VM runs The unit + integration tests can't cross machine.Start — the SDK boundary would need a fake firecracker that reimplements the control-plane HTTP API, and the ongoing maintenance cost of keeping that fake honest with upstream kills the value. Instead, add a pre-release smoke target that drives REAL Firecracker + real KVM, captures coverage from the -cover-instrumented binaries, and surfaces per-package deltas so regressions in the boot path don't ship silently. scripts/smoke.sh: - Isolated XDG_{CONFIG,STATE,CACHE,RUNTIME} so the smoke run can't touch real user state (state/cache persist under build/smoke/xdg for fast reruns; runtime is mktemp'd fresh per-run because sockets can't be reused) - Preflight: `banger doctor` must pass; UDP :42069 must be free (otherwise the user's real daemon is up and the smoke daemon can't bind its DNS listener — fail with an actionable message) - Scenario 1 — bare: `banger vm run --rm -- echo smoke-bare-ok` exercises create → start → socket ownership chown → machine.Start → SDK waitForSocket race → vsock agent readiness → guest SSH wait → exec → cleanup → delete - Scenario 2 — workspace: creates a throwaway git repo, runs `banger vm run --rm <repo> -- cat /root/repo/smoke-file.txt`, verifies the tracked file reached the guest (exercises workDisk capability PrepareHost + workspace.prepare) - `banger daemon stop` at the end so instrumented binaries flush GOCOVERDIR pods before the script exits Makefile additions: - smoke-build: builds banger/bangerd under build/smoke/bin/ with `go build -cover` - smoke: runs the script with GOCOVERDIR set, reports per-package coverage via `go tool covdata percent` - smoke-coverage-html: textfmt + go tool cover for a browsable report - smoke-clean: nukes build/smoke/ including the persisted XDG state Bonus fix uncovered during the first smoke run: doctor treated a missing state.db as a FAIL ("out of memory" from 
SQLite SQLITE_CANTOPEN), which red-flagged every fresh install. Split the store check: DB file absent → PASS with "will be created on first daemon start" detail; DB present but unreadable → FAIL as before. New TestDoctorReport_StoreMissingSurfacesAsPassForFreshInstall pins the behaviour. Concrete coverage delta from the first successful smoke run (compared to `make coverage-total`'s unit-test-only 37.8%): internal/firecracker 43.6% → 75.0% internal/daemon/workspace 33.8% → 60.8% internal/store 40.1% → 56.3% internal/guest 63.7% → 57.4% (different mix: smoke exercises real SSH; unit tests cover more error branches) The packages the review flagged are the ones that moved most — which is the point. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 18:59:57 -03:00
Thales Maciel	88bc466d58	tests: targeted coverage for doctor, workspace rejections, and nat capability Three thematic test files pinning behavior surfaces that had none before, following the review's recommendation to plug concrete error/cleanup branches rather than chase a coverage percentage. doctor_test.go Covers Daemon.doctorReport end-to-end with a permissive runner + fake executables on PATH. Pins: store error surfaces as fail, store success as pass, missing firecracker kills the host-runtime check, the three default capability feature checks (work disk, vm dns, nat) are emitted, vm-defaults is always-pass with provenance. Previously 0% — now the Doctor() command's contract with the CLI is under guard. workspace_rejection_test.go Covers the four early-exit branches of PrepareVMWorkspace that the existing happy-path + lock-release tests never hit: malformed mode, --from without --branch, VM not running, VM not found. Each one returns before any SSH I/O, so the fake-firecracker infra the happy-path test needs is unnecessary — a bare wired daemon with a stored VMRecord suffices. nat_capability_test.go Covers natCapability.ApplyConfigChange (unchanged flag → no-op, VM not alive → no-op, toggle on live VM → runner reached) and natCapability.Cleanup (NAT disabled → no-op, runtime handles missing → defensive no-op, full wiring → ensureNAT(false)). A countingRunner + startFakeFirecracker fixture stands in for the real host plumbing, with waitForVMAlive polling past the exec -a race window that startFakeFirecracker exposes on loaded CI boxes. make coverage-total 37.8% → 38.6%. The number isn't the point — these tests exist so the next refactor in this area has to break an explicit assertion to drift. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 12:58:12 -03:00

8 commits