banger hasn't shipped a public release — every "legacy", "pre-opt-in",
"previously", "migration note", "no longer" reference in the tree is
pinning against a state no real user's install has ever been in.
That scaffolding has weight: it's a coordinate system future readers
have to decode, and it keeps dead code alive.
Removed (code):
- internal/daemon/ssh_client_config.go
- vmSSHConfigIncludeBegin / vmSSHConfigIncludeEnd constants and
every `removeManagedBlock(existing, vm...)` call they enabled
(legacy inline `Host *.vm` block scrub)
- cleanupLegacySSHConfigDir (+ its caller in syncVMSSHClientConfig)
— wiped a pre-opt-in sibling file under $ConfigDir/ssh
- sameDirOrParent + resolvePathForComparison — only ever used
by cleanupLegacySSHConfigDir
- the "also check legacy marker" fallback in
UserSSHIncludeInstalled / UninstallUserSSHInclude
- internal/store/migrations.go
- migrateDropDeadImageColumns (migration 2) + its slice entry
- dropColumnIfExists (orphaned after the above)
- addColumnIfMissing + the whole "columns added across the pre-
versioning lifetime" block at the end of migrateBaseline —
subsumed into the baseline CREATE TABLE
- `packages_path TEXT` column on the images table (the
throwaway migration 2 dropped it, but there was never any
reader)
- internal/daemon/vm.go
- vmDNSRecordName local wrapper — was justified as "avoid
pulling vmdns into every file"; three of four callers already
imported vmdns directly, so inline the one stray call
- internal/cli/cli_test.go
- TestLegacyRemovedCommandIsRejected (`tui` subcommand never
shipped)
Removed / simplified (tests):
- ssh_client_config_test.go: dropped TestSameDirOrParentHandlesSymlinks,
TestSyncVMSSHClientConfigPreservesUserKeyInLegacyDir,
TestSyncVMSSHClientConfigNarrowsCleanupToLegacyFile,
TestSyncVMSSHClientConfigLeavesUnexpectedLegacyContents,
TestInstallUserSSHIncludeMigratesLegacyInlineBlock, plus the
"legacy posture" regression strings in the remaining happy-path
test; TestUninstallUserSSHIncludeRemovesBothMarkerBlocks collapsed
to a single-block test
- migrations_test.go: dropped TestMigrateDropDeadImageColumns_AcrossInstallPaths,
TestDropColumnIfExistsIsIdempotent; TestOpenReadOnlyDoesNotRunMigrations
simplified to test against the baseline marker
Removed (docs):
- README.md "**Migration note.**" blockquote about the SSH-key path move
- docs/advanced.md parenthetical "(the old behaviour)"
Reworded (comments):
- Dropped "Previously this file also contained LogLevel DEBUG3..."
history from vm_disk.go's sshdGuestConfig doc
- Dropped "Call sites that previously read vm.Runtime.{PID,...}"
from vm_handles.go; now documents the current contract
- Dropped "Pre-v0.1 the defaults are" scaffolding in doctor_test.go
- Dropped "no longer does its own git inspection" phrasing in vm_run.go
- Dropped the "(also cleans up legacy inline block from pre-opt-in
builds)" aside on the `ssh-config` CLI docstring
- Renamed test var `legacyKey` → `existingKey` in vm_test.go; its
purpose was "pre-existing authorized_keys line," not banger-legacy
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three gaps from the coverage plan, none of which were covered before.
internal/store/migrations_test.go:
TestRunMigrationsIgnoresUnknownAppliedIDs — simulates a DB
written by a newer banger opened by an older one: schema_migrations
carries an id (9001) the current binary doesn't know about. The
runner must leave the alien row alone AND still apply its own
known migrations. Without this, forward-then-backward upgrades or
running two daemon versions against the same state dir would
either fail or start destructively reinterpreting rows.
TestDropColumnIfExistsIsIdempotent — pins the "run twice, no harm"
property. A daemon restart after migration 2 succeeded on a fresh
install must not fail because the column is already gone.
dropColumnIfExists is what makes that idempotent.
internal/store/store_test.go:
TestOpenRejectsCorruptDB — writes garbage to state.db, Open must
error cleanly (not panic, not silently overwrite). Also verifies
the garbage bytes are untouched so the operator can hand the
file to a recovery tool.
TestOpenReadOnlyRejectsMissingDB — the doctor path must not
silently create an empty DB when none exists; that would make
"no VMs yet" and "your state is missing" indistinguishable.
Package function coverage nudged 39.1% → 40.1%.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`banger doctor` used to call store.Open, which unconditionally runs
migrations on the way up. Diagnostics mutating persistent state is a
surprise — particularly now that migration 2 drops a column, so a
plain `doctor` invocation against an old DB would silently schema-
evolve it.
Add store.OpenReadOnly: separate DSN builder with mode=ro and a
minimal pragma set (foreign_keys, busy_timeout — no journal_mode=WAL,
no wal_autocheckpoint), skips runMigrations, and pings on open so a
missing DB fails up front rather than at first query. doctor.go now
uses OpenReadOnly; the existing storeErr fallback path surfaces any
failure as a failing check, unchanged.
Tests pin two invariants:
- OpenReadOnly against a DB whose migration 2 marker was removed and
packages_path re-added must leave both alone (i.e. no drift is
applied behind the user's back).
- Any write attempted through the read-only handle is rejected at
the driver layer (belt-and-braces for future refactors).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three drift items surfaced in review, each dead on arrival and each
worth trusting a little more at v0.1.0.
config: drop MetricsPollInterval. The field was parsed from TOML
(metrics_poll_interval), stored on DaemonConfig, and ignored by every
consumer — only StatsPollInterval drives the background poll loop.
Users setting it in config.toml saw zero effect. Removed from the TOML
surface, the model constant, and the config test.
daemon: delete ensureDefaultImage. No callers, body was `_ = ctx;
return nil`. Dead since whatever flow used to call it got removed.
store: drop packages_path from the images table. The column was
carried by the baseline migration but never referenced by UpsertImage
(no INSERT / UPDATE mention) or any Go model field — a ghost from a
build pipeline that no longer exists. Added migration id=2
(drop_dead_image_columns) with an idempotent dropColumnIfExists
helper: fresh installs run baseline (creates the column) + 2 (drops
it); legacy DBs where the column was never added get a no-op. Updated
the direct-INSERT SQL in TestGetImageRejectsMalformedTimestamp to
drop the column reference, and added a migration test covering both
install paths (fresh + legacy).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
reserveVM's duplicate-name guard routed through Daemon.FindVM, which
falls back to prefix-matching on both ids and names when no exact
match is found. That turns the uniqueness check into a correctness
bug: a brand-new VM name can be rejected because it happens to
prefix an existing VM's id, or an existing VM's name. So `vm create
--name beta` fails when `beta-sandbox` already exists.
Swap in a dedicated store.GetVMByName that does a literal `WHERE
name = ?` lookup, and use it from reserveVM. FindVM keeps its
prefix-matching behaviour for user-facing lookup paths (`vm ssh
<partial>`, `vm stop <partial>`) where "did you mean" semantics
are the feature.
Tests:
- TestReserveVMAllowsNameThatPrefixesExistingVM — seeds a VM whose
id + name both start with "longname", then reserves two new VMs
named "longname" and "longname-sandbox". Both must succeed.
Under the old FindVM-based check, both would fail.
- TestReserveVMRejectsExactDuplicateName — actual collisions are
still rejected after the swap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The old migrate() helper only knew how to re-run a fixed slab of CREATE
TABLE IF NOT EXISTS plus per-column ensureColumnExists calls. That worked
while every schema change was a benign additive column; it falls apart
as soon as we need a data backfill, an index, a rename, or anything that
has to happen exactly once in a known order.
Replaces it with a schema_migrations table + ordered []migration slice.
Each migration has a unique id, a human-readable name, and a func(*Tx)
body; the runner opens a transaction per migration so DDL and any data
changes either both land and get recorded or both roll back together,
leaving the DB in a state where retrying on next Open() reapplies from
the same point.
Migration 1 ("baseline") collapses the current schema into one entry:
fresh databases apply it in one shot; existing dev databases see
idempotent `CREATE TABLE IF NOT EXISTS` + `ALTER TABLE … ADD COLUMN`
statements that succeed as no-ops, and the only net effect is the
schema_migrations row that brings them into the versioned system.
Tests cover fresh apply, idempotent re-open, skipping already-applied
ids, rollback on body error (the transient table the migration created
must not survive), and duplicate-id rejection.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cuts the daemon-managed guest-session machinery (start/list/show/
logs/stop/kill/attach/send). The feature shipped aimed at agent-
orchestration workflows (programmatic stdin piping into a long-lived
guest process) that aren't driving any concrete user today, and the
~2.3K LOC of daemon surface area — attach bridge, FIFO keepalive,
controller registry, sessionstream framing, SQLite persistence — was
locking in an API we'd have to keep through v0.1.0.
Anything session-flavoured that people actually need today can be
done with `vm ssh + tmux` or `vm run -- cmd`.
Deleted:
- internal/cli/commands_vm_session.go
- internal/daemon/{guest_sessions,session_lifecycle,session_attach,session_stream,session_controller}.go
- internal/daemon/session/ (guest-session helpers package)
- internal/sessionstream/ (framing package)
- internal/daemon/guest_sessions_test.go
- internal/store/guest_session_test.go
- GuestSession* types from internal/{api,model}
- Store UpsertGuestSession/GetGuestSession/ListGuestSessionsByVM/DeleteGuestSession + scanner helpers
- guest.session.* RPC dispatch entries
- 5 CLI session tests, 2 completion tests, 2 printer tests
Extracted:
- ShellQuote + FormatStepError lifted to internal/daemon/workspace/util.go
(only non-session consumer); workspace package now self-contained
- internal/daemon/guest_ssh.go keeps guestSSHClient + dialGuest +
waitForGuestSSH — still used by workspace prepare/export
- internal/daemon/fake_firecracker_test.go preserves the test helper
that used to live in guest_sessions_test.go
Store schema: CREATE TABLE guest_sessions and its column migrations
removed. Existing dev DBs keep an orphan table (harmless, pre-v0.1.0).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Separates what a VM IS (durable intent + identity + deterministic
derived paths — `VMRuntime`) from what is CURRENTLY TRUE about it
(firecracker PID, tap device, loop devices, dm-snapshot target — new
`VMHandles`). The durable state lives in the SQLite `vms` row; the
transient state lives in an in-memory cache on the daemon plus a
per-VM `handles.json` scratch file inside VMDir, rebuilt at startup
from OS inspection. Nothing kernel-level rides the SQLite schema
anymore.
Why:
Persisting ephemeral process handles to SQLite forced reconcile to
treat "running with a stale PID" as a first-class case and mix it
with real state transitions. The schema described what we last
observed, not what the VM is. Every time the observation model
shifted (tap pool, DM naming, pgrep fallback) the reconcile logic
grew a new branch. Splitting lets each layer own what it's good at:
durable records describe intent, in-memory cache + scratch file
describe momentary reality.
Shape:
- `model.VMHandles` = PID, TapDevice, BaseLoop, COWLoop, DMName,
DMDev. Never in SQLite.
- `VMRuntime` keeps: State, GuestIP, APISockPath, VSockPath,
VSockCID, LogPath, MetricsPath, DNSName, VMDir, SystemOverlay,
WorkDiskPath, LastError. All durable or deterministic.
- `handleCache` on `*Daemon` — mutex-guarded map + scratch-file
plumbing (`writeHandlesFile` / `readHandlesFile` /
`rediscoverHandles`). See `internal/daemon/vm_handles.go`.
- `d.vmAlive(vm)` replaces the 20+ inline
`vm.State==Running && ProcessRunning(vm.Runtime.PID, apiSock)`
spreads. Single source of truth for liveness.
- Startup reconcile: per running VM, load the scratch file, pgrep
the api sock, either keep (cache seeded from scratch) or demote
to stopped (scratch handles passed to cleanupRuntime first so DM
/ loops / tap actually get torn down).
Verification:
- `go test ./...` green.
- Live: `banger vm run --name handles-test -- cat /etc/hostname`
starts; `handles.json` appears in VMDir with the expected PID,
tap, loops, DM.
- `kill -9 $(pgrep bangerd)` while the VM is running, re-invoke the
CLI, daemon auto-starts, reconcile recognises the VM as alive,
`banger vm ssh` still connects, `banger vm delete` cleans up.
Tests added:
- vm_handles_test.go: scratch-file roundtrip, missing/corrupt file
behaviour, cache concurrency, rediscoverHandles prefers pgrep
over scratch, returns scratch contents even when process is
dead (so cleanup can tear down kernel state).
- vm_test.go: reconcile test rewritten to exercise the new flow
(write scratch → reconcile reads it → verifies process is gone →
issues dmsetup/losetup teardown).
ARCHITECTURE.md updated; `handles` added to Daemon field docs.
Add daemon-backed workspace and guest-session primitives so host
orchestrators can prepare /root/repo, launch long-lived guest commands,
and attach to pipe-mode sessions over the local stdio mux bridge.
Persist richer session metadata and launch diagnostics, preflight guest
cwd/command requirements, make pipe-mode attach rehydratable from guest
state after daemon restart, and allow submodules when workspace prepare
runs in full_copy mode.
At the same time, stop vm run from auto-attaching opencode, make it
print next-step commands instead, and make glibc guest images more
agent-ready by installing node, opencode, claude, and pi while syncing
opencode/claude/pi auth files into work disks on VM start.
Validation:
- GOCACHE=/tmp/banger-gocache go test ./...
- make build
- banger vm workspace prepare --help
- banger vm session --help
- banger vm session start --help
- banger vm session attach --help
Hard-cut banger away from source-checkout runtime bundles as an implicit source of\nimage and host defaults. Managed images now own their full boot set,\nimage build starts from an existing registered image, and daemon startup\nno longer synthesizes a default image from host paths.\n\nResolve Firecracker from PATH or firecracker_bin, make SSH keys config-owned\nwith an auto-managed XDG default, replace the external name generator and\npackage manifests with Go code, and keep the vsock helper as a companion\nbinary instead of a user-managed runtime asset.\n\nUpdate the manual scripts, web/CLI forms, config surface, and docs around\nthe new build/manual flow and explicit image registration semantics.\n\nValidation: GOCACHE=/tmp/banger-gocache go test ./..., bash -n scripts/*.sh,\nand make build.
Stop relying on ad hoc rootfs handling by adding image promotion, managed work-seed fingerprint metadata, and lazy self-healing for older managed images after the first create.
Rebuild guest images with baked SSH access, a guest NIC bootstrap, and default opencode services, and add the staged Void kernel/initramfs/modules workflow so void-exp uses a matching Void boot stack.
Replace the opaque blocking vm.create RPC with a begin/status flow that prints live stages in the CLI while still waiting for vsock health and opencode on guest port 4096.
Validate with GOCACHE=/tmp/banger-gocache go test ./... and live void-exp create/delete smoke runs.
Beat VM create wall time without changing VM semantics.
Generate a work-seed ext4 sidecar during image builds and rootfs rebuilds, then clone and resize that seed for each new VM instead of rebuilding /root from scratch. Plumb the new seed artifact through config, runtime metadata, store state, runtime-bundle defaults, doctor checks, and default-image reconciliation so older images still fall back cleanly.
Add a daemon TAP pool to keep idle bridge-attached devices warm, expose stage timing in lifecycle logs, add a create/SSH benchmark script plus Make target, and teach verify.sh that tap-pool-* devices are reusable capacity rather than cleanup leaks.
Validated with go test ./..., make build, ./verify.sh, and make bench-create ARGS="--runs 2".
Multi-VM delete exposed two separate regressions: NAT teardown was still running after stopped VMs had already dropped their tap metadata, and the store was relying on one-off SQLite pragmas instead of configuring every pooled connection.
Skip NAT cleanup when the runtime no longer has the network handles needed to identify rules, and move the SQLite profile into the DSN so WAL, busy timeouts, foreign keys, and the other connection-scoped settings apply consistently across the pool. Keep the write mutex in place for concurrent mutations, and update the daemon/store tests to use valid image fixtures now that foreign key enforcement is real.
Validated with go test ./... and make build.
Dangerous lifecycle, store, system, and RPC paths still had little or no automated confidence, and the live smoke harness failed opaquely when guest boot timing drifted. This adds targeted unit coverage for store allocation and decode failures, system helper failure ordering and cleanup, RPC error handling, and daemon lookup/reconcile/editing/stats/preflight edge cases.
It also makes verify.sh wait for daemon-observable VM readiness before SSH, reuse a bounded boot deadline for the SSH phase, and dump VM metadata, logs, tap state, socket state, and NAT rules on timeout so host-level failures are diagnosable instead of surfacing only connection refused.
Validation: go test ./..., go test ./... -cover, bash -n verify.sh. No live ./verify.sh boot was run in this environment.
Replace the shell-only user workflow with `banger` and `bangerd`: Cobra commands, XDG/SQLite-backed state, managed VM and image lifecycle, and a Bubble Tea TUI for browsing and operating VMs.\n\nKeep Firecracker orchestration behind the daemon so VM specs become persistent objects, and add repo entrypoints for building, installing, and documenting the new flow while still delegating rootfs customization to the existing shell tooling.\n\nHarden the control plane around real usage by reclaiming Firecracker API sockets for the user, restarting stale daemons after rebuilds, and returning the correct `vm.create` payload so the CLI and TUI creation flow work reliably.\n\nValidation: `go test ./...`, `make build`, and a host-side smoke test with `./banger vm create --name codex-smoke`.