Commit graph

15 commits

Author SHA1 Message Date
59e48e830b
daemon: split owner daemon from root helper
Move the supported systemd path to two services: an owner-user bangerd for
orchestration and a narrow root helper for bridge/tap, NAT/resolver, dm/loop,
and Firecracker ownership. This removes repeated sudo from daily vm and image
flows without leaving the general daemon running as root.

Add install metadata, system install/status/restart/uninstall commands, and a
system-owned runtime layout. Keep user SSH/config material in the owner home,
lock file_sync to the owner home, and move daemon known_hosts handling out of
the old root-owned control path.

Route privileged lifecycle steps through typed privilegedOps calls, harden the
two systemd units, and rewrite smoke plus docs around the supported service
model.

Verified with make build, make test, make lint, and make smoke on the
supported systemd host path.
2026-04-26 12:43:17 -03:00
5f81332b0a
make smoke: end-to-end boot suite with coverage from real VM runs
The unit + integration tests can't cross machine.Start — the SDK
boundary would need a fake firecracker that reimplements the
control-plane HTTP API, and the ongoing maintenance cost of keeping
that fake honest with upstream kills the value. Instead, add a
pre-release smoke target that drives REAL Firecracker + real KVM,
captures coverage from the -cover-instrumented binaries, and
surfaces per-package deltas so regressions in the boot path don't
ship silently.

scripts/smoke.sh:
  - Isolated XDG_{CONFIG,STATE,CACHE,RUNTIME} so the smoke run can't
    touch real user state (state/cache persist under build/smoke/xdg
    for fast reruns; runtime is mktemp'd fresh per-run because
    sockets can't be reused)
  - Preflight: `banger doctor` must pass; UDP :42069 must be free
    (otherwise the user's real daemon is up and the smoke daemon
    can't bind its DNS listener — fail with an actionable message)
  - Scenario 1 — bare: `banger vm run --rm -- echo smoke-bare-ok`
    exercises create → start → socket ownership chown → machine.Start
    → SDK waitForSocket race → vsock agent readiness → guest SSH
    wait → exec → cleanup → delete
  - Scenario 2 — workspace: creates a throwaway git repo, runs
    `banger vm run --rm <repo> -- cat /root/repo/smoke-file.txt`,
    verifies the tracked file reached the guest (exercises
    workDisk capability PrepareHost + workspace.prepare)
  - `banger daemon stop` at the end so instrumented binaries flush
    GOCOVERDIR pods before the script exits

Makefile additions:
  - smoke-build: builds banger/bangerd under build/smoke/bin/ with
    `go build -cover`
  - smoke: runs the script with GOCOVERDIR set, reports per-package
    coverage via `go tool covdata percent`
  - smoke-coverage-html: textfmt + go tool cover for a browsable
    report
  - smoke-clean: nukes build/smoke/ including the persisted XDG
    state

Bonus fix uncovered during the first smoke run: doctor treated a
missing state.db as a FAIL ("out of memory" from SQLite
SQLITE_CANTOPEN), which red-flagged every fresh install. Split
the store check: DB file absent → PASS with "will be created on
first daemon start" detail; DB present but unreadable → FAIL as
before. New TestDoctorReport_StoreMissingSurfacesAsPassForFreshInstall
pins the behaviour.

Concrete coverage delta from the first successful smoke run
(compared to `make coverage-total`'s unit-test-only 37.8%):

  internal/firecracker        43.6% → 75.0%
  internal/daemon/workspace   33.8% → 60.8%
  internal/store              40.1% → 56.3%
  internal/guest              63.7% → 57.4%  (different mix: smoke
                                              exercises real SSH;
                                              unit tests cover more
                                              error branches)

The packages the review flagged are the ones that moved most —
which is the point.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 18:59:57 -03:00
ecb18ce6ca
seams: move the last four package globals onto instance fields
Three test seams were still package-level mutable vars, which tests
had to swap before use. That's the classic path to flaky parallel
tests — two goroutines fighting over the same global fake. Push each
down to the struct that owns the behaviour.

internal/daemon/dns_routing.go
  lookupExecutableFunc + vmDNSAddrFunc → fields on *HostNetwork,
  defaulted at newHostNetwork time. dns_routing_test builds
  HostNetwork{..., lookupExecutable: stub, vmDNSAddr: stub} inline,
  no more t.Cleanup dance around package-level vars.

internal/daemon/preflight.go + doctor.go
  vsockHostDevicePath (mutable string) → vsockHostDevice field on
  *VMService, defaulted via defaultVsockHostDevice constant in
  newVMService. Preflight reads s.vsockHostDevice; doctor reads
  d.vm.vsockHostDevice. Logger test sets d.vm.vsockHostDevice = tmp
  after wireServices.

internal/daemon/workspace/workspace.go
  HostCommandOutputFunc → *Inspector struct with a Runner field.
  Every git-using helper (GitOutput, GitTrimmedOutput,
  GitResolvedConfigValue, RunHostCommand, ListSubmodules,
  ListOverlayPaths, CountUntrackedPaths, InspectRepo,
  ImportRepoToGuest, PrepareRepoCopy) is now a method on *Inspector.
  NewInspector() wraps the real host runner for production;
  WorkspaceService holds one via repoInspector, CLI deps holds one
  too. cli_test.go's submodule-rejection test builds its own
  Inspector with a scripted Runner instead of patching a global.
  Pure helpers (FinalizeScript, ResolveSourcePath, ParsePrepareMode,
  ShellQuote, FormatStepError, GitFileURL, ParseNullSeparatedOutput)
  stay free functions since they don't touch the host.

Sentinel: grep for HostCommandOutputFunc, lookupExecutableFunc,
vmDNSAddrFunc, vsockHostDevicePath is now empty across internal/.
make lint test green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 12:07:14 -03:00
2685bc73f8
doctor: open the state DB read-only so inspection never mutates it
`banger doctor` used to call store.Open, which unconditionally runs
migrations on the way up. Diagnostics mutating persistent state is a
surprise — particularly now that migration 2 drops a column, so a
plain `doctor` invocation against an old DB would silently schema-
evolve it.

Add store.OpenReadOnly: separate DSN builder with mode=ro and a
minimal pragma set (foreign_keys, busy_timeout — no journal_mode=WAL,
no wal_autocheckpoint), skips runMigrations, and pings on open so a
missing DB fails up front rather than at first query. doctor.go now
uses OpenReadOnly; the existing storeErr fallback path surfaces any
failure as a failing check, unchanged.

Tests pin two invariants:
- OpenReadOnly against a DB whose migration 2 marker was removed and
  packages_path re-added must leave both alone (i.e. no drift is
  applied behind the user's back).
- Any write attempted through the read-only handle is rejected at
  the driver layer (belt-and-braces for future refactors).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 11:05:23 -03:00
16702bd5e1
daemon split (6/n): extract wireServices + drop lazy service getters
Factor the service + capability wiring out of Daemon.Open() into
wireServices(d), an idempotent helper that constructs HostNetwork,
ImageService, WorkspaceService, and VMService from whatever
infrastructure (runner, store, config, layout, logger, closing) is
already set on d. Open() calls it once after filling the composition
root; tests that build &Daemon{...} literals call it to get a working
service graph, preinstalling stubs on the fields they want to fake.

Drops the four lazy-init getters on *Daemon — d.hostNet(),
d.imageSvc(), d.workspaceSvc(), d.vmSvc() — whose sole purpose was
keeping test literals working. Every production call site now reads
d.net / d.img / d.ws / d.vm directly; the services are guaranteed
non-nil once Open returns. No behavior change.

Mechanical: all existing `d.xxxSvc()` calls (production + tests)
rewritten to field access; each `d := &Daemon{...}` in tests gets a
trailing wireServices(d) so the literal + wiring are side-by-side.
Tests that override a pre-built service (e.g. d.img = &ImageService{
bundleFetch: stub}) now set the override before wireServices so the
replacement propagates into VMService's peer pointer.

Also nil-guards HostNetwork.stopVMDNS and d.store in Close() so
partially-initialised daemons (pre-reconcile open failure) still
tear down cleanly — same contract the old lazy getters provided.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 15:55:28 -03:00
466a7c30c4
daemon split (4/5): extract *VMService service
Phase 4 of the daemon god-struct refactor. VM lifecycle, create-op
registry, handle cache, disk provisioning, stats polling, ports
query, and the per-VM lock set all move off *Daemon onto *VMService.

Daemon keeps thin forwarders only for FindVM / TouchVM (dispatch
surface) and is otherwise out of VM lifecycle. Lazy-init via
d.vmSvc() mirrors the earlier services so test literals like
\`&Daemon{store: db, runner: r}\` still get a functional service
without spelling one out.

Three small cleanups along the way:

  * preflight helpers (validateStartPrereqs / addBaseStartPrereqs
    / addBaseStartCommandPrereqs / validateWorkDiskResizePrereqs)
    move with the VM methods that call them.
  * cleanupRuntime / rebuildDNS move to *VMService, with
    HostNetwork primitives (findFirecrackerPID, cleanupDMSnapshot,
    killVMProcess, releaseTap, waitForExit, sendCtrlAltDel)
    reached through s.net instead of the hostNet() facade.
  * vsockAgentBinary becomes a package-level function so both
    *Daemon (doctor) and *VMService (preflight) call one entry
    point instead of each owning a forwarder method.

WorkspaceService's peer deps switch from eager method values to
closures — vmSvc() constructs VMService with WorkspaceService as a
peer, so resolving d.vmSvc().FindVM at construction time recursed
through workspaceSvc() → vmSvc(). Closures defer the lookup to call
time.

Pure code motion: build + unit tests green, lint clean. No RPC
surface or lock-ordering changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 20:57:05 -03:00
108f7a0600
ssh-config: make the ssh <name>.vm shortcut opt-in
Before this change, every daemon.Open() wrote a Host *.vm stanza into
~/.ssh/config in a marker-fenced block. That's a real footgun for users
who manage their SSH config declaratively (chezmoi, dotfiles, NixOS):
banger was mutating host state outside its own directory on every
daemon start, easy to miss and hard to audit.

New contract: the daemon only ever writes its own ssh_config file at
~/.config/banger/ssh_config. ~/.ssh/config is untouched unless the user
opts in. `banger vm ssh <name>` still works out of the box — the
shortcut only matters for plain `ssh sandbox.vm` from any terminal.

The opt-in surface is `banger ssh-config`:

  banger ssh-config              # prints path + include-line +
                                 # install/uninstall hints
  banger ssh-config --install    # adds `Include <bangerConfig>` to
                                 # ~/.ssh/config inside a marker-fenced
                                 # block; idempotent; migrates any
                                 # legacy inline Host *.vm block from
                                 # pre-opt-in builds
  banger ssh-config --uninstall  # removes the new Include block AND
                                 # any legacy inline block

Doctor gains a gentle warn-level note when banger's ssh_config exists
but the user hasn't wired it in — not a fail, since the shortcut is
convenience and `banger vm ssh` covers the essential case.

Tests cover: daemon writes banger file and does NOT touch ~/.ssh/config,
Install adds the block, Install is idempotent, Install migrates the
legacy inline block cleanly (removing it, preserving unrelated
entries, adding the new Include block), Uninstall removes both marker
variants, Uninstall is a no-op when ~/.ssh/config is absent, and
UserSSHIncludeInstalled detects both marker shapes.

README reframes the feature as optional convenience.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 13:57:26 -03:00
58464ac28c
docs + doctor: be honest about amd64-only support
The README sold the product as "Linux with /dev/kvm"; the deeper docs
admit that the Makefile pins companion builds to GOARCH=amd64, the
kernel catalog ships only x86_64 entries, and OCI import pulls
linux/amd64 layers. arm64 users who show up through the README only
discover that after install fails in non-obvious ways.

Two surface-level fixes:

- README requirements list leads with "x86_64 / amd64 Linux — arm64 is
  not supported today", with a short note on the three places that
  assumption lives so users understand it's not a last-mile gap.
- `banger doctor` now runs an architecture check that passes on amd64
  and FAILS (not warns) on anything else, referencing the three
  downstream assumptions. Hard-fail rather than warn so a user on an
  arm64 machine doesn't waste time chasing unrelated preflight items.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 13:03:50 -03:00
d38f580e00
doctor: surface state store open failure as failing check
Previously store.Open errors were silently swallowed, so `banger
doctor` could report green while the default-image check (and any
other store-dependent diagnostic) was silently skipped because
d.store was nil.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:35:27 -03:00
21b74639d8
vm defaults: host-aware sizing + spec line on spawn + doctor check
Replaces the static model.Default* constants that drove --vcpu / --memory
/ --disk-size with a three-layer resolver:

  1. [vm_defaults] in ~/.config/banger/config.toml (if set)
  2. host-derived heuristics (cpus/4 capped at 4; ram/8 capped at 8 GiB)
  3. baked-in constants (floor)

Visibility:

- Every `vm run` / `vm create` prints a `spec:` line before progress
  begins: `spec: 4 vcpu · 8192 MiB · 8G disk`. Matches the VM that
  actually gets created because the CLI is now the single source of
  truth — it resolves, populates the flag defaults, and forwards the
  explicit values to the daemon.
- `banger doctor` adds a "vm defaults" check showing per-field
  provenance (config|auto|builtin) and the config file path for
  overrides.
- `--help` shows the resolved defaults (e.g. `--vcpu int (default 4)`
  on an 8-core host).

No `banger config init` command, no first-run side effects, no writes
to the user's filesystem behind their back. Users who want explicit
control set the keys; everyone else gets sensible numbers that track
their hardware.
2026-04-19 13:06:51 -03:00
ac7974f5b9
Remove image build --from-image; doctor treats catalog images as OK
The `image build` flow spun up a transient Firecracker VM, SSHed in,
and ran a large bash provisioning script to derive a new managed
image from an existing one. It overlapped heavily with the golden-
image Dockerfile flow (same mise/docker/tmux/opencode install logic
duplicated in Go as `imagemgr.BuildProvisionScript`) and had far more
machinery: async op state, RPC begin/status/cancel, webui form +
operation page, preflight checks, API types, tests. For custom
images, writing a Dockerfile is simpler and more reproducible.

Removed end-to-end:
- CLI `image build` subcommand + `absolutizeImageBuildPaths`.
- Daemon: BuildImage method, imagebuild.go (transient-VM orchestration),
  image_build_ops.go (async begin/status/cancel), imagemgr/build.go
  (the 247-line provisioning script generator and all its append*
  helpers), validateImageBuildPrereqs + addImageBuildPrereqs.
- RPC dispatches for image.build / .begin / .status / .cancel.
- opstate registry `imageBuildOps`, daemon seam `imageBuild`,
  background pruner call.
- API types: ImageBuildParams, ImageBuildOperation, ImageBuildBeginResult,
  ImageBuildStatusParams, ImageBuildStatusResult; model type
  ImageBuildRequest.
- Web UI: Backend interface methods, handlers, form, routes, template
  branches (images.html build form, operation.html build branch,
  dashboard.html Build button).
- Tests that directly exercised BuildImage.

Doctor polish (task C):
- Drop the "image build" preflight section entirely (its raison d'être
  is gone).
- Default-image check now accepts "not local but in imagecat" as OK:
  vm create auto-pulls on first use. Only flag when the image is
  neither locally registered nor in the catalog.

Net: 24 files touched, 1,373 lines deleted, 25 added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:54:29 -03:00
572bf32424
Remove runtime-bundle image dependencies
Hard-cut banger away from source-checkout runtime bundles as an implicit source of\nimage and host defaults. Managed images now own their full boot set,\nimage build starts from an existing registered image, and daemon startup\nno longer synthesizes a default image from host paths.\n\nResolve Firecracker from PATH or firecracker_bin, make SSH keys config-owned\nwith an auto-managed XDG default, replace the external name generator and\npackage manifests with Go code, and keep the vsock helper as a companion\nbinary instead of a user-managed runtime asset.\n\nUpdate the manual scripts, web/CLI forms, config surface, and docs around\nthe new build/manual flow and explicit image registration semantics.\n\nValidation: GOCACHE=/tmp/banger-gocache go test ./..., bash -n scripts/*.sh,\nand make build.
2026-03-21 18:34:53 -03:00
3ed78fdcfc
Add experimental Void guest workflow and vsock agent
Make iterating on a Firecracker-friendly Void guest practical without replacing the Debian default image path.

Add local Void rootfs build/register/verify plumbing, a language-agnostic dev package baseline, and guest SSH/work-disk hardening so new images use the runtime bundle key, keep a normal root bash environment, and repair stale nested /root layouts on restart.

Replace the guest PING/PONG responder with an HTTP /healthz agent over vsock, rename the runtime bundle and config surface from ping helper to agent while still accepting the legacy keys, and route the post-SSH reminder through the new vm.health path.

Validated with GOCACHE=/tmp/banger-gocache go test ./..., make build, bash -n customize.sh make-rootfs-void.sh, and git diff --check.
2026-03-19 14:51:25 -03:00
08ef706e3f
Add vsock-backed SSH session reminders
Remind users when a VM is still running after 	hanger vm ssh exits instead of silently dropping them back to the host shell.\n\nAttach a Firecracker vsock device to each VM, persist the host vsock path/CID,\nadd a new guest-side banger-vsock-pingd responder to the runtime bundle and both\nimage-build paths, and expose a vm.ping RPC that the CLI and TUI call after SSH\nreturns. Doctor and start/build preflight now validate the helper plus\n/dev/vhost-vsock so the feature fails early and clearly.\n\nValidated with go mod tidy, bash -n customize.sh, git diff --check, make build,\nand GOCACHE=/tmp/banger-gocache go test ./... outside the sandbox because the\ndaemon tests need real Unix/UDP sockets. Rebuild the image/rootfs used for new\nVMs so the guest ping service is present.
2026-03-18 20:14:51 -03:00
4930d82cb9
Refactor VM lifecycle around capabilities
Make host-integrated VM features fit a standard Go extension path instead of adding more one-off branches through vm.go. This is the enabling refactor for future work like shared mounts, not the /work feature itself.

Add a daemon capability pipeline plus a structured guest-config builder, then move the existing /root work-disk mount, built-in DNS, and NAT wiring onto those hooks. Generalize Firecracker drive config at the same time so later storage features can extend machine setup without another hardcoded path.

Add banger doctor on top of the shared readiness checks, update the docs to describe the new architecture, and cover the new seams with guest-config, capability, report, CLI, and full go test verification. Also verify make build and a real ./banger doctor run on the host.
2026-03-18 19:28:26 -03:00