Prerequisite for `banger update`. Before swapping a staged binary
into place, the updater needs to confirm the new bangerd recognises
the running install's DB schema. Without this, an operator could end
up with a service that won't open its store after the binary swap +
restart.
* store.InspectSchemaState(path): opens the DB read-only (reusing
OpenReadOnly's mode=ro DSN), reads the schema_migrations table,
and classifies the relationship between applied and known IDs:
SchemaCompatible (lockstep), SchemaMigrationsNeeded (binary
newer, will auto-migrate on first Open), or SchemaIncompatible
(DB has applied IDs the binary doesn't know about).
Missing schema_migrations table is treated as "all migrations
pending" rather than an error — matches the fresh-install case.
* bangerd --check-migrations: opens the configured DB read-only,
prints a one-line classification, and exits 0/1/2. The exit
code is the contract:
0 — compatible
1 — migrations needed (binary newer; safe to swap)
2 — incompatible (binary older than DB; abort the swap)
Honours --system to pick between system StateDir and user mode.
* bangerdExit indirection so future tests can capture the exit
code without terminating the test process. Production points
at os.Exit.
Tests cover the four classifications: compatible (fully migrated
DB), migrations-needed (only baseline applied), incompatible
(synthetic id=99 inserted), and missing-table (fresh DB). Live
exercise on this dev host returned `migrations needed: pending [3]
(binary will apply on first Open)` and exit 1, matching the
contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduces three interconnected features for persistent VM workflows:
1. `banger vm exec <vm> -- <cmd>`: runs a command in the prepared
workspace, automatically cd-ing into the guest path and wrapping
via `mise exec --` so mise-managed tools are on PATH. Falls back
to a plain exec when mise isn't available. Exit code propagates
verbatim.
2. Workspace persistence: workspace.prepare now stores the guest path,
host source path, and HEAD commit into a new `workspace_json` column
on the vms table (migration 3). This state survives daemon restarts
and informs both dirty-checking and auto-prepare.
3. Dirty detection: `vm exec` compares the stored HEAD commit against
the current host repo HEAD. When stale it warns and, with
--auto-prepare, re-syncs the workspace before running.
Also:
- WORKSPACE column added to `banger ps` / `vm list`
- `banger vm` quick reference updated with `vm exec` entry
The 'docker' bit on model.Image was unused at runtime — every code
path that branched on it had been removed earlier, leaving only the
field, the SQL column, the --docker flag, and the
#feature:docker sentinel that BuildMetadataPackages emitted into a
hash file. None of those have callers anymore.
Strip the field from the model, the API params, the SQLite column,
the CLI flag, and BuildMetadataPackages's signature. Add migration
2 (drop_images_docker) so existing installs lose the column on next
daemon start. ALTER TABLE ... DROP COLUMN is fine: SQLite has
supported it since 3.35 (2021).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
banger hasn't shipped a public release — every "legacy", "pre-opt-in",
"previously", "migration note", "no longer" reference in the tree is
pinning against a state no real user's install has ever been in.
That scaffolding has weight: it's a coordinate system future readers
have to decode, and it keeps dead code alive.
Removed (code):
- internal/daemon/ssh_client_config.go
- vmSSHConfigIncludeBegin / vmSSHConfigIncludeEnd constants and
every `removeManagedBlock(existing, vm...)` call they enabled
(legacy inline `Host *.vm` block scrub)
- cleanupLegacySSHConfigDir (+ its caller in syncVMSSHClientConfig)
— wiped a pre-opt-in sibling file under $ConfigDir/ssh
- sameDirOrParent + resolvePathForComparison — only ever used
by cleanupLegacySSHConfigDir
- the "also check legacy marker" fallback in
UserSSHIncludeInstalled / UninstallUserSSHInclude
- internal/store/migrations.go
- migrateDropDeadImageColumns (migration 2) + its slice entry
- dropColumnIfExists (orphaned after the above)
- addColumnIfMissing + the whole "columns added across the pre-
versioning lifetime" block at the end of migrateBaseline —
subsumed into the baseline CREATE TABLE
- `packages_path TEXT` column on the images table (the
throwaway migration 2 dropped it, but there was never any
reader)
- internal/daemon/vm.go
- vmDNSRecordName local wrapper — was justified as "avoid
pulling vmdns into every file"; three of four callers already
imported vmdns directly, so inline the one stray call
- internal/cli/cli_test.go
- TestLegacyRemovedCommandIsRejected (`tui` subcommand never
shipped)
Removed / simplified (tests):
- ssh_client_config_test.go: dropped TestSameDirOrParentHandlesSymlinks,
TestSyncVMSSHClientConfigPreservesUserKeyInLegacyDir,
TestSyncVMSSHClientConfigNarrowsCleanupToLegacyFile,
TestSyncVMSSHClientConfigLeavesUnexpectedLegacyContents,
TestInstallUserSSHIncludeMigratesLegacyInlineBlock, plus the
"legacy posture" regression strings in the remaining happy-path
test; TestUninstallUserSSHIncludeRemovesBothMarkerBlocks collapsed
to a single-block test
- migrations_test.go: dropped TestMigrateDropDeadImageColumns_AcrossInstallPaths,
TestDropColumnIfExistsIsIdempotent; TestOpenReadOnlyDoesNotRunMigrations
simplified to test against the baseline marker
Removed (docs):
- README.md "**Migration note.**" blockquote about the SSH-key path move
- docs/advanced.md parenthetical "(the old behaviour)"
Reworded (comments):
- Dropped "Previously this file also contained LogLevel DEBUG3..."
history from vm_disk.go's sshdGuestConfig doc
- Dropped "Call sites that previously read vm.Runtime.{PID,...}"
from vm_handles.go; now documents the current contract
- Dropped "Pre-v0.1 the defaults are" scaffolding in doctor_test.go
- Dropped "no longer does its own git inspection" phrasing in vm_run.go
- Dropped the "(also cleans up legacy inline block from pre-opt-in
builds)" aside on the `ssh-config` CLI docstring
- Renamed test var `legacyKey` → `existingKey` in vm_test.go; its
purpose was "pre-existing authorized_keys line," not banger-legacy
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three gaps from the coverage plan, none of which were covered before.
internal/store/migrations_test.go:
TestRunMigrationsIgnoresUnknownAppliedIDs — simulates a DB
written by a newer banger opened by an older one: schema_migrations
carries an id (9001) the current binary doesn't know about. The
runner must leave the alien row alone AND still apply its own
known migrations. Without this, forward-then-backward upgrades or
running two daemon versions against the same state dir would
either fail or start destructively reinterpreting rows.
TestDropColumnIfExistsIsIdempotent — pins the "run twice, no harm"
property. A daemon restart after migration 2 succeeded on a fresh
install must not fail because the column is already gone.
dropColumnIfExists is what makes that idempotent.
internal/store/store_test.go:
TestOpenRejectsCorruptDB — writes garbage to state.db, Open must
error cleanly (not panic, not silently overwrite). Also verifies
the garbage bytes are untouched so the operator can hand the
file to a recovery tool.
TestOpenReadOnlyRejectsMissingDB — the doctor path must not
silently create an empty DB when none exists; that would make
"no VMs yet" and "your state is missing" indistinguishable.
Package function coverage nudged 39.1% → 40.1%.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`banger doctor` used to call store.Open, which unconditionally runs
migrations on the way up. Diagnostics mutating persistent state is a
surprise — particularly now that migration 2 drops a column, so a
plain `doctor` invocation against an old DB would silently schema-
evolve it.
Add store.OpenReadOnly: separate DSN builder with mode=ro and a
minimal pragma set (foreign_keys, busy_timeout — no journal_mode=WAL,
no wal_autocheckpoint), skips runMigrations, and pings on open so a
missing DB fails up front rather than at first query. doctor.go now
uses OpenReadOnly; the existing storeErr fallback path surfaces any
failure as a failing check, unchanged.
Tests pin two invariants:
- OpenReadOnly against a DB whose migration 2 marker was removed and
packages_path re-added must leave both alone (i.e. no drift is
applied behind the user's back).
- Any write attempted through the read-only handle is rejected at
the driver layer (belt-and-braces for future refactors).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three drift items surfaced in review, each dead on arrival and each
worth trusting a little more at v0.1.0.
config: drop MetricsPollInterval. The field was parsed from TOML
(metrics_poll_interval), stored on DaemonConfig, and ignored by every
consumer — only StatsPollInterval drives the background poll loop.
Users setting it in config.toml saw zero effect. Removed from the TOML
surface, the model constant, and the config test.
daemon: delete ensureDefaultImage. No callers, body was `_ = ctx;
return nil`. Dead since whatever flow used to call it got removed.
store: drop packages_path from the images table. The column was
carried by the baseline migration but never referenced by UpsertImage
(no INSERT / UPDATE mention) or any Go model field — a ghost from a
build pipeline that no longer exists. Added migration id=2
(drop_dead_image_columns) with an idempotent dropColumnIfExists
helper: fresh installs run baseline (creates the column) + 2 (drops
it); legacy DBs where the column was never added get a no-op. Updated
the direct-INSERT SQL in TestGetImageRejectsMalformedTimestamp to
drop the column reference, and added a migration test covering both
install paths (fresh + legacy).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
reserveVM's duplicate-name guard routed through Daemon.FindVM, which
falls back to prefix-matching on both ids and names when no exact
match is found. That turns the uniqueness check into a correctness
bug: a brand-new VM name can be rejected because it happens to
prefix an existing VM's id, or an existing VM's name. So `vm create
--name beta` fails when `beta-sandbox` already exists.
Swap in a dedicated store.GetVMByName that does a literal `WHERE
name = ?` lookup, and use it from reserveVM. FindVM keeps its
prefix-matching behaviour for user-facing lookup paths (`vm ssh
<partial>`, `vm stop <partial>`) where "did you mean" semantics
are the feature.
Tests:
- TestReserveVMAllowsNameThatPrefixesExistingVM — seeds a VM whose
id + name both start with "longname", then reserves two new VMs
named "longname" and "longname-sandbox". Both must succeed.
Under the old FindVM-based check, both would fail.
- TestReserveVMRejectsExactDuplicateName — actual collisions are
still rejected after the swap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The old migrate() helper only knew how to re-run a fixed slab of CREATE
TABLE IF NOT EXISTS plus per-column ensureColumnExists calls. That worked
while every schema change was a benign additive column; it falls apart
as soon as we need a data backfill, an index, a rename, or anything that
has to happen exactly once in a known order.
Replaces it with a schema_migrations table + ordered []migration slice.
Each migration has a unique id, a human-readable name, and a func(*Tx)
body; the runner opens a transaction per migration so DDL and any data
changes either both land and get recorded or both roll back together,
leaving the DB in a state where retrying on next Open() reapplies from
the same point.
Migration 1 ("baseline") collapses the current schema into one entry:
fresh databases apply it in one shot; existing dev databases see
idempotent `CREATE TABLE IF NOT EXISTS` + `ALTER TABLE … ADD COLUMN`
statements that succeed as no-ops, and the only net effect is the
schema_migrations row that brings them into the versioned system.
Tests cover fresh apply, idempotent re-open, skipping already-applied
ids, rollback on body error (the transient table the migration created
must not survive), and duplicate-id rejection.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cuts the daemon-managed guest-session machinery (start/list/show/
logs/stop/kill/attach/send). The feature shipped aimed at agent-
orchestration workflows (programmatic stdin piping into a long-lived
guest process) that aren't driving any concrete user today, and the
~2.3K LOC of daemon surface area — attach bridge, FIFO keepalive,
controller registry, sessionstream framing, SQLite persistence — was
locking in an API we'd have to keep through v0.1.0.
Anything session-flavoured that people actually need today can be
done with `vm ssh + tmux` or `vm run -- cmd`.
Deleted:
- internal/cli/commands_vm_session.go
- internal/daemon/{guest_sessions,session_lifecycle,session_attach,session_stream,session_controller}.go
- internal/daemon/session/ (guest-session helpers package)
- internal/sessionstream/ (framing package)
- internal/daemon/guest_sessions_test.go
- internal/store/guest_session_test.go
- GuestSession* types from internal/{api,model}
- Store UpsertGuestSession/GetGuestSession/ListGuestSessionsByVM/DeleteGuestSession + scanner helpers
- guest.session.* RPC dispatch entries
- 5 CLI session tests, 2 completion tests, 2 printer tests
Extracted:
- ShellQuote + FormatStepError lifted to internal/daemon/workspace/util.go
(only non-session consumer); workspace package now self-contained
- internal/daemon/guest_ssh.go keeps guestSSHClient + dialGuest +
waitForGuestSSH — still used by workspace prepare/export
- internal/daemon/fake_firecracker_test.go preserves the test helper
that used to live in guest_sessions_test.go
Store schema: CREATE TABLE guest_sessions and its column migrations
removed. Existing dev DBs keep an orphan table (harmless, pre-v0.1.0).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Separates what a VM IS (durable intent + identity + deterministic
derived paths — `VMRuntime`) from what is CURRENTLY TRUE about it
(firecracker PID, tap device, loop devices, dm-snapshot target — new
`VMHandles`). The durable state lives in the SQLite `vms` row; the
transient state lives in an in-memory cache on the daemon plus a
per-VM `handles.json` scratch file inside VMDir, rebuilt at startup
from OS inspection. Nothing kernel-level rides the SQLite schema
anymore.
Why:
Persisting ephemeral process handles to SQLite forced reconcile to
treat "running with a stale PID" as a first-class case and mix it
with real state transitions. The schema described what we last
observed, not what the VM is. Every time the observation model
shifted (tap pool, DM naming, pgrep fallback) the reconcile logic
grew a new branch. Splitting lets each layer own what it's good at:
durable records describe intent, in-memory cache + scratch file
describe momentary reality.
Shape:
- `model.VMHandles` = PID, TapDevice, BaseLoop, COWLoop, DMName,
DMDev. Never in SQLite.
- `VMRuntime` keeps: State, GuestIP, APISockPath, VSockPath,
VSockCID, LogPath, MetricsPath, DNSName, VMDir, SystemOverlay,
WorkDiskPath, LastError. All durable or deterministic.
- `handleCache` on `*Daemon` — mutex-guarded map + scratch-file
plumbing (`writeHandlesFile` / `readHandlesFile` /
`rediscoverHandles`). See `internal/daemon/vm_handles.go`.
- `d.vmAlive(vm)` replaces the 20+ inline
`vm.State==Running && ProcessRunning(vm.Runtime.PID, apiSock)`
spreads. Single source of truth for liveness.
- Startup reconcile: per running VM, load the scratch file, pgrep
the api sock, either keep (cache seeded from scratch) or demote
to stopped (scratch handles passed to cleanupRuntime first so DM
/ loops / tap actually get torn down).
Verification:
- `go test ./...` green.
- Live: `banger vm run --name handles-test -- cat /etc/hostname`
starts; `handles.json` appears in VMDir with the expected PID,
tap, loops, DM.
- `kill -9 $(pgrep bangerd)` while the VM is running, re-invoke the
CLI, daemon auto-starts, reconcile recognises the VM as alive,
`banger vm ssh` still connects, `banger vm delete` cleans up.
Tests added:
- vm_handles_test.go: scratch-file roundtrip, missing/corrupt file
behaviour, cache concurrency, rediscoverHandles prefers pgrep
over scratch, returns scratch contents even when process is
dead (so cleanup can tear down kernel state).
- vm_test.go: reconcile test rewritten to exercise the new flow
(write scratch → reconcile reads it → verifies process is gone →
issues dmsetup/losetup teardown).
ARCHITECTURE.md updated; `handles` added to Daemon field docs.
Add daemon-backed workspace and guest-session primitives so host
orchestrators can prepare /root/repo, launch long-lived guest commands,
and attach to pipe-mode sessions over the local stdio mux bridge.
Persist richer session metadata and launch diagnostics, preflight guest
cwd/command requirements, make pipe-mode attach rehydratable from guest
state after daemon restart, and allow submodules when workspace prepare
runs in full_copy mode.
At the same time, stop vm run from auto-attaching opencode, make it
print next-step commands instead, and make glibc guest images more
agent-ready by installing node, opencode, claude, and pi while syncing
opencode/claude/pi auth files into work disks on VM start.
Validation:
- GOCACHE=/tmp/banger-gocache go test ./...
- make build
- banger vm workspace prepare --help
- banger vm session --help
- banger vm session start --help
- banger vm session attach --help
Hard-cut banger away from source-checkout runtime bundles as an implicit source of\nimage and host defaults. Managed images now own their full boot set,\nimage build starts from an existing registered image, and daemon startup\nno longer synthesizes a default image from host paths.\n\nResolve Firecracker from PATH or firecracker_bin, make SSH keys config-owned\nwith an auto-managed XDG default, replace the external name generator and\npackage manifests with Go code, and keep the vsock helper as a companion\nbinary instead of a user-managed runtime asset.\n\nUpdate the manual scripts, web/CLI forms, config surface, and docs around\nthe new build/manual flow and explicit image registration semantics.\n\nValidation: GOCACHE=/tmp/banger-gocache go test ./..., bash -n scripts/*.sh,\nand make build.
Stop relying on ad hoc rootfs handling by adding image promotion, managed work-seed fingerprint metadata, and lazy self-healing for older managed images after the first create.
Rebuild guest images with baked SSH access, a guest NIC bootstrap, and default opencode services, and add the staged Void kernel/initramfs/modules workflow so void-exp uses a matching Void boot stack.
Replace the opaque blocking vm.create RPC with a begin/status flow that prints live stages in the CLI while still waiting for vsock health and opencode on guest port 4096.
Validate with GOCACHE=/tmp/banger-gocache go test ./... and live void-exp create/delete smoke runs.
Beat VM create wall time without changing VM semantics.
Generate a work-seed ext4 sidecar during image builds and rootfs rebuilds, then clone and resize that seed for each new VM instead of rebuilding /root from scratch. Plumb the new seed artifact through config, runtime metadata, store state, runtime-bundle defaults, doctor checks, and default-image reconciliation so older images still fall back cleanly.
Add a daemon TAP pool to keep idle bridge-attached devices warm, expose stage timing in lifecycle logs, add a create/SSH benchmark script plus Make target, and teach verify.sh that tap-pool-* devices are reusable capacity rather than cleanup leaks.
Validated with go test ./..., make build, ./verify.sh, and make bench-create ARGS="--runs 2".
Multi-VM delete exposed two separate regressions: NAT teardown was still running after stopped VMs had already dropped their tap metadata, and the store was relying on one-off SQLite pragmas instead of configuring every pooled connection.
Skip NAT cleanup when the runtime no longer has the network handles needed to identify rules, and move the SQLite profile into the DSN so WAL, busy timeouts, foreign keys, and the other connection-scoped settings apply consistently across the pool. Keep the write mutex in place for concurrent mutations, and update the daemon/store tests to use valid image fixtures now that foreign key enforcement is real.
Validated with go test ./... and make build.
Dangerous lifecycle, store, system, and RPC paths still had little or no automated confidence, and the live smoke harness failed opaquely when guest boot timing drifted. This adds targeted unit coverage for store allocation and decode failures, system helper failure ordering and cleanup, RPC error handling, and daemon lookup/reconcile/editing/stats/preflight edge cases.
It also makes verify.sh wait for daemon-observable VM readiness before SSH, reuse a bounded boot deadline for the SSH phase, and dump VM metadata, logs, tap state, socket state, and NAT rules on timeout so host-level failures are diagnosable instead of surfacing only connection refused.
Validation: go test ./..., go test ./... -cover, bash -n verify.sh. No live ./verify.sh boot was run in this environment.
Replace the shell-only user workflow with `banger` and `bangerd`: Cobra commands, XDG/SQLite-backed state, managed VM and image lifecycle, and a Bubble Tea TUI for browsing and operating VMs.\n\nKeep Firecracker orchestration behind the daemon so VM specs become persistent objects, and add repo entrypoints for building, installing, and documenting the new flow while still delegating rootfs customization to the existing shell tooling.\n\nHarden the control plane around real usage by reclaiming Firecracker API sockets for the user, restarting stale daemons after rebuilds, and returning the correct `vm.create` payload so the CLI and TUI creation flow work reliably.\n\nValidation: `go test ./...`, `make build`, and a host-side smoke test with `./banger vm create --name codex-smoke`.