Commit graph

87 commits

Author SHA1 Message Date
2b6437d1b4
remove vm session feature
Cuts the daemon-managed guest-session machinery (start/list/show/
logs/stop/kill/attach/send). The feature shipped aimed at agent-
orchestration workflows (programmatic stdin piping into a long-lived
guest process) that aren't driving any concrete user today, and the
~2.3K LOC of daemon surface area — attach bridge, FIFO keepalive,
controller registry, sessionstream framing, SQLite persistence — was
locking in an API we'd have to keep through v0.1.0.

Anything session-flavoured that people actually need today can be
done with `vm ssh + tmux` or `vm run -- cmd`.

Deleted:
- internal/cli/commands_vm_session.go
- internal/daemon/{guest_sessions,session_lifecycle,session_attach,session_stream,session_controller}.go
- internal/daemon/session/ (guest-session helpers package)
- internal/sessionstream/ (framing package)
- internal/daemon/guest_sessions_test.go
- internal/store/guest_session_test.go
- GuestSession* types from internal/{api,model}
- Store UpsertGuestSession/GetGuestSession/ListGuestSessionsByVM/DeleteGuestSession + scanner helpers
- guest.session.* RPC dispatch entries
- 5 CLI session tests, 2 completion tests, 2 printer tests

Extracted:
- ShellQuote + FormatStepError lifted to internal/daemon/workspace/util.go
  (only non-session consumer); workspace package now self-contained
- internal/daemon/guest_ssh.go keeps guestSSHClient + dialGuest +
  waitForGuestSSH — still used by workspace prepare/export
- internal/daemon/fake_firecracker_test.go preserves the test helper
  that used to live in guest_sessions_test.go

Store schema: CREATE TABLE guest_sessions and its column migrations
removed. Existing dev DBs keep an orphan table (harmless, pre-v0.1.0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 12:47:58 -03:00
c42fcbe012
cli + daemon: move test seams off package globals onto injected structs
CLI: introduce internal/cli.deps which owns every RPC/SSH/host-command
seam the tree used to reach through mutable package vars. Command
builders, orchestrators, and the completion helpers become methods on
*deps. Tests construct their own deps per case, so fakes no longer leak
across cases and tests are free to run in parallel.

Daemon: move workspaceInspectRepoFunc + workspaceImportFunc onto the
Daemon struct (workspaceInspectRepo / workspaceImport), mirroring the
existing guestWaitForSSH / guestDial pattern. Workspace-prepare tests
drop t.Parallel() guards now that they no longer mutate process-wide
state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 19:03:55 -03:00
d38f580e00
doctor: surface state store open failure as failing check
Previously store.Open errors were silently swallowed, so `banger
doctor` could report green while the default-image check (and any
other store-dependent diagnostic) was silently skipped because
d.store was nil.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:35:27 -03:00
ae14b9499d
ssh: trust-on-first-use host key pinning everywhere
Guest host-key verification was off in all three SSH paths:

  * Go SSH (internal/guest/ssh.go) used ssh.InsecureIgnoreHostKey
  * `banger vm ssh` passed StrictHostKeyChecking=no
    + UserKnownHostsFile=/dev/null
  * `~/.ssh/config` Host *.vm shipped the same posture into the
    user's global config

Now each path verifies against a banger-owned known_hosts file at
`~/.local/state/banger/ssh/known_hosts` with TOFU semantics:

  * First dial to a VM pins the key.
  * Subsequent dials require an exact match. A mismatch fails with
    an explicit "possible MITM" error.
  * `vm delete` removes the entries so a future VM reusing the IP
    or name re-pins cleanly.
  * The user's `~/.ssh/known_hosts` is untouched.

Changes:

  internal/guest/known_hosts.go (new) — OpenSSH-compatible parser,
    TOFUHostKeyCallback, RemoveKnownHosts. Process-wide mutex
    around the file.
  internal/guest/ssh.go — Dial and WaitForSSH grew a knownHostsPath
    parameter threaded through the callback. Empty path keeps the
    insecure callback (tests + throwaway tools only; documented).
  internal/daemon/{guest_sessions,session_attach,session_lifecycle,
    session_stream}.go — call sites pass d.layout.KnownHostsPath.
  internal/daemon/ssh_client_config.go — the ~/.ssh/config Host *.vm
    block now points at banger's known_hosts and uses
    StrictHostKeyChecking=accept-new. Missing path → fail closed.
  internal/daemon/vm_lifecycle.go — deleteVMLocked drops known_hosts
    entries for the VM's IP and DNS name via removeVMKnownHosts.
  internal/cli/banger.go — sshCommandArgs swaps StrictHostKeyChecking
    no + /dev/null for banger's file + accept-new. Path resolution
    failure falls through to StrictHostKeyChecking=yes.
  internal/paths/paths.go — Layout gains SSHDir + KnownHostsPath;
    Ensure creates SSHDir at 0700.

Tests (internal/guest/known_hosts_test.go): pin on first use, accept
matching key on second dial, reject mismatch, empty path skips
checking, RemoveKnownHosts drops the entry, re-pin works after
remove. Existing daemon + cli tests updated to assert the new
posture and regression-guard against the old flags.

Live verified: vm run writes the pin to banger's known_hosts at 0600
inside a 0700 dir; banger vm ssh + ssh root@<vm>.vm both succeed
using the pin; vm delete clears it.
2026-04-19 16:46:03 -03:00
a59958d4f5
daemon: roll back host state on any Open() failure
Open() touched several pieces of host state before hitting the step
that returned the error:

  * SQLite handle (store.Open)
  * managed SSH client config block (ensureVMSSHClientConfig)
  * vm-DNS UDP listener goroutine (startVMDNS)
  * systemd-resolved per-interface routing (ensureVMDNSResolverRouting)

The only deferred cleanup guarded stopVMDNS. A reconcile() or
initializeTapPool() failure therefore left the listener running, the
resolver wiring in place, and the SQLite handle open. A subsequent
startup attempt ran into "port 42069 already in use" or silently
published stale state.

Fix: once `d` exists, defer `d.Close()` on `err != nil`. Close is
idempotent (sync.Once) and every teardown step (listener close, DNS
listener close, resolver revert, session registry close, store close)
is nil-guarded, so calling it on a daemon that never got past the
first startup step is safe.

Tests (internal/daemon/open_close_test.go):

  - TestCloseOnPartiallyInitialisedDaemon: Close survives a daemon
    with only store + closing channel, and with a vmDNS listener but
    nothing else. Catches regressions where a teardown step forgets
    to nil-check.
  - TestCloseIdempotentUnderConcurrency: 5 goroutines racing on
    Close() never panic (sync.Once + close(d.closing) survive).
  - TestOpenFailureRunsCloseCleanup: structural check that the
    `defer cleanup() if err != nil` pattern actually fires.

Live: `banger daemon stop` cleanly, `banger vm ls` restarts daemon
without a residual listener on port 42069.
2026-04-19 16:36:29 -03:00
d1b9a8c102
remove experimental web UI
The web UI shipped as "experimental" and was never finished — no nav
off the dashboard, no live updates, no settled design, never a
supported surface. It was opt-in by default already; leaving the code
in the tree for v0.1.0 only invited "does this work?" questions and
kept HostSummary/BangerSummary/SudoStatus types on the public RPC
surface that nothing else uses.

Removed:

  internal/webui/                         (all Go + templates + assets)
  internal/daemon/web.go                  (server start / Layout / Config / ListVMs / ListImages)
  internal/daemon/dashboard.go            (DashboardSummary aggregator)

Simplified:

  internal/api/types.go                   drop WebURL on PingResult, drop
                                          HostSummary / SudoStatus / BangerSummary /
                                          DashboardSummary / DashboardSummaryResult
  internal/model/types.go                 drop DaemonConfig.WebListenAddr
  internal/config/config.go               drop web_listen_addr from fileConfig + Load
  internal/daemon/daemon.go               drop webListener / webServer / webURL fields +
                                          startWebServer() call + ping WebURL population
  internal/cli/banger.go                  `daemon status` output no longer branches on web
  internal/daemon/{doc.go,ARCHITECTURE.md} drop web UI sections
  README.md                               drop web_listen_addr config bullet + security paragraph

Tests updated to reflect the new shape. Coverage 57.3 -> 58.9% (the
webui package was largely untested; its removal lifts the ratio
without moving the numerator). `banger daemon status` output and
--help are web-free. Lint + full suite green.
2026-04-19 14:28:08 -03:00
687fcf0b59
vm state: split transient kernel/process handles off the durable schema
Separates what a VM IS (durable intent + identity + deterministic
derived paths — `VMRuntime`) from what is CURRENTLY TRUE about it
(firecracker PID, tap device, loop devices, dm-snapshot target — new
`VMHandles`). The durable state lives in the SQLite `vms` row; the
transient state lives in an in-memory cache on the daemon plus a
per-VM `handles.json` scratch file inside VMDir, rebuilt at startup
from OS inspection. Nothing kernel-level rides the SQLite schema
anymore.

Why:

  Persisting ephemeral process handles to SQLite forced reconcile to
  treat "running with a stale PID" as a first-class case and mix it
  with real state transitions. The schema described what we last
  observed, not what the VM is. Every time the observation model
  shifted (tap pool, DM naming, pgrep fallback) the reconcile logic
  grew a new branch. Splitting lets each layer own what it's good at:
  durable records describe intent, in-memory cache + scratch file
  describe momentary reality.

Shape:

  - `model.VMHandles` = PID, TapDevice, BaseLoop, COWLoop, DMName,
    DMDev. Never in SQLite.
  - `VMRuntime` keeps: State, GuestIP, APISockPath, VSockPath,
    VSockCID, LogPath, MetricsPath, DNSName, VMDir, SystemOverlay,
    WorkDiskPath, LastError. All durable or deterministic.
  - `handleCache` on `*Daemon` — mutex-guarded map + scratch-file
    plumbing (`writeHandlesFile` / `readHandlesFile` /
    `rediscoverHandles`). See `internal/daemon/vm_handles.go`.
  - `d.vmAlive(vm)` replaces the 20+ inline
    `vm.State==Running && ProcessRunning(vm.Runtime.PID, apiSock)`
    spreads. Single source of truth for liveness.
  - Startup reconcile: per running VM, load the scratch file, pgrep
    the api sock, either keep (cache seeded from scratch) or demote
    to stopped (scratch handles passed to cleanupRuntime first so DM
    / loops / tap actually get torn down).

Verification:

  - `go test ./...` green.
  - Live: `banger vm run --name handles-test -- cat /etc/hostname`
    starts; `handles.json` appears in VMDir with the expected PID,
    tap, loops, DM.
  - `kill -9 $(pgrep bangerd)` while the VM is running, re-invoke the
    CLI, daemon auto-starts, reconcile recognises the VM as alive,
    `banger vm ssh` still connects, `banger vm delete` cleans up.

Tests added:

  - vm_handles_test.go: scratch-file roundtrip, missing/corrupt file
    behaviour, cache concurrency, rediscoverHandles prefers pgrep
    over scratch, returns scratch contents even when process is
    dead (so cleanup can tear down kernel state).
  - vm_test.go: reconcile test rewritten to exercise the new flow
    (write scratch → reconcile reads it → verifies process is gone →
    issues dmsetup/losetup teardown).

ARCHITECTURE.md updated; `handles` added to Daemon field docs.
2026-04-19 14:18:13 -03:00
2e6e64bc04
guest sshd: drop DEBUG3 + StrictModes no; normalise /root perms
Previously /etc/ssh/sshd_config.d/99-banger.conf landed with:

  LogLevel DEBUG3
  PermitRootLogin yes
  PubkeyAuthentication yes
  AuthorizedKeysFile /root/.ssh/authorized_keys
  StrictModes no

DEBUG3 was debug leftover that floods journald in normal use.
StrictModes no was a workaround for /root perm drift on the work
disk — the real fix is to make those perms correct at provisioning
time.

New drop-in:

  PermitRootLogin prohibit-password
  PubkeyAuthentication yes
  PasswordAuthentication no
  KbdInteractiveAuthentication no
  AuthorizedKeysFile /root/.ssh/authorized_keys

prohibit-password blocks password root login even if PasswordAuth
gets flipped on elsewhere; KbdInteractiveAuth no closes the last
interactive fallback; StrictModes is now on (sshd's default).

normaliseHomeDirPerms chown/chmods /root to 0755 root:root at every
work-disk mount (ensureAuthorizedKeyOnWorkDisk,
seedAuthorizedKeyOnExt4Image); the .ssh dir also explicitly
chown'd root:root. Verified end-to-end against a real VM:
`sshd -T` reports strictmodes yes and all five directives match.

Regression test (sshd_config_test.go) pins the allow-list and the
deny-list (DEBUG3, StrictModes no, bare `PermitRootLogin yes`) so
the next accidental reintroduction fails fast.

README's Security section updated to reflect the new posture.
2026-04-19 13:40:40 -03:00
6cd52d12f4
workspace prepare: release VM mutex before guest I/O
Previously withVMLockByRef held the per-VM mutex across InspectRepo,
waitForGuestSSH, dialGuest, ImportRepoToGuest (the tar stream!), and
the readonly chmod. A large repo could block `vm stop` / `vm delete`
/ `vm restart` on the same VM for however long the import took.

Split into two phases:

  1. VM mutex held briefly to validate state (running + PID alive)
     and snapshot the fields needed for SSH (guest IP, api sock).
  2. VM mutex released. Acquire workspaceLocks[id] — a separate
     per-VM mutex scoped to workspace.prepare / workspace.export —
     for the guest I/O phase.

Lifecycle ops (stop/delete/restart/set) only take vmLocks, so they
no longer queue behind a slow import. Two concurrent prepares on the
same VM still serialise via workspaceLocks so tar streams don't
interleave. ExportVMWorkspace also acquires workspaceLocks to avoid
snapshotting a half-streamed import.

Two regression tests (sequential — they swap package-level seams):

  ReleasesVMLockDuringGuestIO: stall the import fake, assert the VM
  mutex is acquirable from another goroutine during the stall.

  SerialisesConcurrentPreparesOnSameVM: 3 concurrent prepares, assert
  Import is only ever invoked 1-at-a-time per VM.

ARCHITECTURE.md documents the split + updated lock ordering.
2026-04-19 13:32:42 -03:00
99de42385f
workspace export: stop mutating the guest repo index
Previously `banger vm workspace export` ran `git add -A` against the
guest's real `.git/index`, so the observation step left staged
changes behind that users never asked for. Reconnecting later (ssh,
another export) surfaced them and looked like phantom work.

Route `git add -A` through a throwaway index file instead:

  tmp_idx=$(mktemp ...)
  trap 'rm -f "$tmp_idx"' EXIT
  git read-tree <ref> --index-output="$tmp_idx"
  GIT_INDEX_FILE="$tmp_idx" git add -A
  GIT_INDEX_FILE="$tmp_idx" git diff --cached <ref> --binary|--name-only

The real .git/index, working tree, and refs stay exactly as the user
left them. Same diff content — commits past <ref>, uncommitted edits,
and untracked files (minus .gitignore) all captured.

Regression test locks the invariant: every export script must route
add -A through GIT_INDEX_FILE and clean the temp index on exit. CLI
help text updated to say "non-mutating".
2026-04-19 13:20:56 -03:00
21b74639d8
vm defaults: host-aware sizing + spec line on spawn + doctor check
Replaces the static model.Default* constants that drove --vcpu / --memory
/ --disk-size with a three-layer resolver:

  1. [vm_defaults] in ~/.config/banger/config.toml (if set)
  2. host-derived heuristics (cpus/4 capped at 4; ram/8 capped at 8 GiB)
  3. baked-in constants (floor)

Visibility:

- Every `vm run` / `vm create` prints a `spec:` line before progress
  begins: `spec: 4 vcpu · 8192 MiB · 8G disk`. Matches the VM that
  actually gets created because the CLI is now the single source of
  truth — it resolves, populates the flag defaults, and forwards the
  explicit values to the daemon.
- `banger doctor` adds a "vm defaults" check showing per-field
  provenance (config|auto|builtin) and the config file path for
  overrides.
- `--help` shows the resolved defaults (e.g. `--vcpu int (default 4)`
  on an 8-core host).

No `banger config init` command, no first-run side effects, no writes
to the user's filesystem behind their back. Users who want explicit
control set the keys; everyone else gets sensible numbers that track
their hardware.
2026-04-19 13:06:51 -03:00
346eaba673
coverage: medium batch — hostnat runner, store guest-sessions, daemon helpers
Reuses existing fixtures (CommandRunner fakes, SQLite tempfile store,
pure-Go seams). No new infra needed.

  hostnat                  50% -> 98%   (iptables orchestration via fake runner)
  store                    78% -> 91%   (guest_sessions CRUD roundtrip)
  daemon/session           57% -> 95%   (script gen, state parse, snapshot apply)
  daemon/opstate           67% -> 100%  (Registry Insert/Get/Prune)
  daemon (firstNonEmpty)   slight bump

Total 54.0% -> 56.5%.
2026-04-18 18:03:37 -03:00
b5c13e3938
Remove opencode package + vm acp command (dead code)
The `internal/opencode` package and the `opencodeCapability` that
consumed it were hard-wired to wait for opencode on guest port 4096
when an image shipped an initrd. After the prune commits (void /
alpine / customize.sh / image build all removed), nothing banger
produces today carries an initrd, so the capability's wait path was
unreachable: every startup short-circuited to the "direct-boot, skip
opencode" branch.

Same logic for `banger vm acp`: it SSHes to `opencode acp --cwd
<path>`, a binary the golden image no longer ships. Users who run
their own image with opencode can still invoke
`ssh vm -- opencode acp --cwd /root/repo` directly — no banger
scaffolding required.

Removed:
- internal/opencode/ (whole package, 255 LOC incl. tests)
- internal/daemon/opencode.go (opencodeCapability)
- cli `vm acp` command + its helpers (runVMACP, sshACPCommandArgs,
  vmACPRemoteCommand) + their tests
- The opencodeCapability{} entry in registeredCapabilities() plus
  the test that pinned its presence
- `wait_opencode` progress-stage label from the vm-create renderer
- Stale mentions in daemon/doc.go, README, and webui test fixtures

~480 lines gone, 12 added. `banger/internal` is now 25 packages
instead of 26.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 16:54:37 -03:00
0933deaeb1
file_sync: config-driven replacement for hardcoded auth sync
Replace the three hardcoded host→guest credential syncs (opencode,
claude, pi) with a generic `[[file_sync]]` config list. Default is
empty — users opt in to exactly what they want synced, with no
surprise about which tools banger "supports".

```toml
[[file_sync]]
host = "~/.local/share/opencode/auth.json"
guest = "~/.local/share/opencode/auth.json"

[[file_sync]]
host = "~/.aws"          # directories are copied recursively
guest = "~/.aws"

[[file_sync]]
host = "~/bin/my-script"
guest = "~/bin/my-script"
mode = "0755"            # optional; default 0600 for files
```

Semantics:
- Host `~/...` expands against the host user's $HOME. Absolute host
  paths are used as-is.
- Guest must live under `~/` or `/root/...` — banger's work disk is
  mounted at /root in the guest, so that's the syncable namespace.
  Anything outside is rejected at config load.
- Validation at config load: reject empty paths, relative paths,
  `..` traversal, `~user/...`, malformed mode strings. Errors name
  the offending entry index.
- Missing host paths are a soft skip with a warn log (existing
  behaviour). Other errors (read, mkdir, install) abort VM create.
- File entries: `install -o 0 -g 0 -m <mode>` (default 0600).
- Directory entries: walked in Go; each source file is installed
  with its own source permissions preserved. The entry's `mode` is
  ignored for directories.

Removed (all dead after this):
- `ensureOpencodeAuthOnWorkDisk`, `ensureClaudeAuthOnWorkDisk`,
  `ensurePiAuthOnWorkDisk`, the shared `ensureAuthFileOnWorkDisk`,
  their `warn*Skipped` helpers, `resolveHost{Opencode,Claude,Pi}AuthPath`,
  and the work-disk relative-path + default display-path constants.
- The capability hook registering the three syncs now calls the
  generic `runFileSync` once.

Seven tests exercising the old codepath deleted; six new tests cover
the new runFileSync (no-op on empty config, file copy, custom mode,
missing-host-skip, overwrite, recursive directory). Config-layer
test adds happy-path parsing and a case-per-shape table of invalid
entries (empty, relative host, guest outside /root, '..' traversal,
`~user`, bad mode).

README updated: replaces the "Credential sync" section with a
"File sync" section showing the new config shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 16:40:11 -03:00
843314be5e
vm_authsync: s/repairing/provisioning/ in SSH work-disk stage
The "repairing SSH access on work disk" stage detail sounded
remedial, like something had gone wrong. It's just writing banger's
SSH key to /root/.ssh/authorized_keys on the work disk for the first
time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 16:29:18 -03:00
ac7974f5b9
Remove image build --from-image; doctor treats catalog images as OK
The `image build` flow spun up a transient Firecracker VM, SSHed in,
and ran a large bash provisioning script to derive a new managed
image from an existing one. It overlapped heavily with the golden-
image Dockerfile flow (same mise/docker/tmux/opencode install logic
duplicated in Go as `imagemgr.BuildProvisionScript`) and had far more
machinery: async op state, RPC begin/status/cancel, webui form +
operation page, preflight checks, API types, tests. For custom
images, writing a Dockerfile is simpler and more reproducible.

Removed end-to-end:
- CLI `image build` subcommand + `absolutizeImageBuildPaths`.
- Daemon: BuildImage method, imagebuild.go (transient-VM orchestration),
  image_build_ops.go (async begin/status/cancel), imagemgr/build.go
  (the 247-line provisioning script generator and all its append*
  helpers), validateImageBuildPrereqs + addImageBuildPrereqs.
- RPC dispatches for image.build / .begin / .status / .cancel.
- opstate registry `imageBuildOps`, daemon seam `imageBuild`,
  background pruner call.
- API types: ImageBuildParams, ImageBuildOperation, ImageBuildBeginResult,
  ImageBuildStatusParams, ImageBuildStatusResult; model type
  ImageBuildRequest.
- Web UI: Backend interface methods, handlers, form, routes, template
  branches (images.html build form, operation.html build branch,
  dashboard.html Build button).
- Tests that directly exercised BuildImage.

Doctor polish (task C):
- Drop the "image build" preflight section entirely (its raison d'être
  is gone).
- Default-image check now accepts "not local but in imagecat" as OK:
  vm create auto-pulls on first use. Only flag when the image is
  neither locally registered nor in the catalog.

Net: 24 files touched, 1,373 lines deleted, 25 added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:54:29 -03:00
6083e2dde5
Prune legacy void/alpine + customize.sh flows
The golden-image Dockerfile + catalog pipeline replaces the entire
manual rootfs-build stack. With that shipped, the per-distro shell
flows are dead code.

Removed:
- scripts/customize.sh, scripts/interactive.sh, scripts/verify.sh
- scripts/make-rootfs{,-void,-alpine}.sh
- scripts/register-{void,alpine}-image.sh
- scripts/make-{void,alpine}-kernel.sh
- internal/imagepreset/ (only consumer was `banger internal packages`,
  which fed customize.sh)
- examples/{void,alpine}.config.toml
- Makefile targets: rootfs, rootfs-void, rootfs-alpine, void-kernel,
  alpine-kernel, void-register, alpine-register, void-vm, alpine-vm,
  verify-void, verify-alpine, plus the ALPINE_RELEASE / *_IMAGE_NAME
  / *_VM_NAME variables

The void-6.12 kernel catalog entry is also gone — golden image pairs
with generic-6.12 and nothing else in the catalog depended on it.

Consolidated: imagemgr now holds the small DebianBasePackages list +
package-hash helper inline, so the `image build --from-image` flow
(still supported) no longer pulls from a separate imagepreset package.

Net: 3,815 lines deleted, 59 added. No runtime functionality removed
beyond the `banger internal packages` subcommand (hidden, used only
by the deleted customize.sh).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:39:53 -03:00
e0894376ea
vm create: auto-pull image and kernel from catalogs if missing
One-command sandbox: `banger vm run` on a fresh host now Just Works.
No prior `banger image pull` or `banger kernel pull` needed.

Changes:

- Default `default_image_name` flips from "default" to "debian-bookworm"
  so the golden image is the implicit target when `--image` is omitted.
- `CreateVM` resolves the image via a new `findOrAutoPullImage`: try
  the local store first, and on miss fall back to the embedded imagecat
  catalog + auto-pull. Emits a vm-create progress stage so the user
  sees "pulling from image catalog" in the create output.
- `resolveKernelInputs` gains context + the same pattern via
  `readOrAutoPullKernel`: try the local kernelcat, and on miss look up
  the embedded kernelcat and auto-pull. Fires whenever a bundle's
  manifest references a kernel the user hasn't pulled yet, not just
  during image pull — any CreateVM with an image that needs a kernel
  not yet local will resolve it.
- `--image` help text updated on both `vm run` and `vm create`.

Six tests cover local-hit-no-pull, auto-pull-on-miss, not-in-catalog
error propagation, and a non-ENOENT kernel read error does NOT trigger
a misleading "not in catalog" claim.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:10:26 -03:00
b2dcdf9757
vm_lifecycle: drop systemd.mask=dev-{ttyS0,vdb}.device
Both masks were added when the direct-boot path first landed for
container rootfses that didn't have anything mounted on /dev/vdb. The
golden image (and any pulled OCI image running under banger's
patchRootOverlay) has an /etc/fstab entry mounting /dev/vdb at /root —
masking dev-vdb.device makes systemd wait forever for a unit that can
never become active, and the work-disk mount never completes. dev-ttyS0
is a real serial console the image needs too. Drop both.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 14:58:42 -03:00
5bdc9985c2
image pull: dispatch to imagecat bundle path before OCI
PullImage now checks the embedded imagecat catalog first. If the
ref matches a catalog entry, it takes the bundle path:

  1. Fetch the .tar.zst bundle into a staging dir (rootfs.ext4 +
     manifest.json).
  2. Strip manifest.json (staging-only metadata).
  3. Stage kernel/initrd/modules alongside rootfs.ext4.
  4. Publish the staging dir and upsert the image row.

Bundle rootfs is already flattened + ownership-fixed + agent-
injected at build time, so the daemon-side work is strictly I/O —
no flatten, no mkfs, no debugfs.

Kernel resolution in the bundle path: --kernel-ref > entry.kernel_ref
> --kernel/--initrd/--modules.

If the ref doesn't match a catalog entry, PullImage falls through
to the existing OCI path unchanged (extracted into pullFromOCI).

New test seam: d.bundleFetch. Six unit tests cover happy path,
--kernel-ref override, existing-name rejection, kernel-required
error, fetch-failure cleanup, and the catalog → OCI fallthrough.

CLI help updated: image pull now documents both forms and takes
<name-or-oci-ref> instead of requiring an OCI ref.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:43:33 -03:00
8f4be112c2
Generic kernel + init= boot path for OCI-pulled images
Closes the full arc: banger kernel pull + image pull + vm create + vm ssh
now works end-to-end against docker.io/library/debian:bookworm with zero
manual image building.

Generic kernel:
 - New scripts/make-generic-kernel.sh builds vmlinux from upstream
   kernel.org sources using Firecracker's official minimal config
   (configs/firecracker-x86_64-6.1.config). All critical drivers
   (virtio_blk, virtio_net, ext4, vsock) compiled in — no modules,
   no initramfs needed.
 - Published as generic-6.12 in the catalog (kernels.thaloco.com).
 - catalog.json updated with the new entry.

Direct-boot init= override (vm_lifecycle.go):
 - For images without an initrd (direct-boot / OCI-pulled), banger now
   passes init=/usr/local/libexec/banger-first-boot on the kernel
   cmdline. The script runs as PID 1, mounts /proc /sys /dev /run,
   checks for systemd — if present execs it immediately; if not
   (container images), installs systemd-sysv + openssh-server via the
   guest's package manager, then execs systemd.
 - Also passes kernel-level ip= parameter via BuildBootArgsWithKernelIP
   so the kernel configures the network interface before init runs
   (container images don't ship iproute2, so the userspace bootstrap
   script can't call ip(8)).
 - Masks dev-ttyS0.device and dev-vdb.device systemd units that
   otherwise wait 90s for udev events that never fire in Firecracker
   guests started from container rootfses.

first-boot.sh rewritten as universal init wrapper:
 - Works as PID 1 (mounts essential filesystems) OR as a systemd
   oneshot (existing behavior).
 - Installs both systemd-sysv AND openssh-server (container images
   have neither).
 - Dispatch updated: debian, alpine, fedora, arch, opensuse families
   + ID_LIKE fallback. All tests updated.

Opencode capability skip for direct-boot images:
 - The opencode readiness check (WaitReady on vsock port 4096) now
   returns nil for images without an initrd, since pulled container
   images don't ship the opencode service. Without this, the VM
   would be marked as error for lacking an opinionated add-on.

Docs: README and kernel-catalog.md updated to recommend generic-6.12
as the default kernel for OCI-pulled images. AGENTS.md notes the new
build script.

Verified live:
 - banger kernel pull generic-6.12
 - banger image pull docker.io/library/debian:bookworm --kernel-ref generic-6.12
 - banger vm create --image debian-bookworm --name testbox --nat
 - banger vm ssh testbox -- "id; uname -r; systemctl is-active banger-vsock-agent"
 → uid=0(root), kernel 6.12.8, Debian bookworm, vsock-agent active,
   sshd running, SSH working.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 20:12:56 -03:00
491c8e1ebb
Phase B-2: pre-inject banger guest agents into pulled rootfs
New imagepull.InjectGuestAgents writes banger's guest-side assets
straight into the pulled ext4 so systemd will start them at first boot:

  /usr/local/bin/banger-vsock-agent             (binary, 0755)
  /usr/local/libexec/banger-network-bootstrap   (script, 0755)
  /etc/systemd/system/banger-network.service    (unit, 0644)
  /etc/systemd/system/banger-vsock-agent.service (unit, 0644)
  /etc/modules-load.d/banger-vsock.conf         (modules, 0644)

  plus enable-at-boot symlinks under
  /etc/systemd/system/multi-user.target.wants/

All writes + ownership + symlinks go through one `debugfs -w -f -`
invocation. No sudo required because the caller owns the ext4 file.
Script is deterministic: shallow-first mkdir, then write, then sif,
then symlink. "File exists" errors from mkdir on already-present
dirs are tolerated (debugfs keeps going past them with -f, and we
filter them out of the output scan).

Asset content reuses the existing guestnet.BootstrapScript /
SystemdServiceUnit / ConfigPath and vsockagent.ServiceUnit /
ModulesLoadConfig / GuestInstallPath — one source of truth, no
duplicated systemd unit strings.

Daemon wiring: new d.finalizePulledRootfs seam runs both
ApplyOwnership (B-1) and InjectGuestAgents as one phase between
BuildExt4 and StageBootArtifacts. The companion vsock-agent binary
is resolved via paths.CompanionBinaryPath. Existing daemon tests
stub the seam with a no-op to avoid needing a real companion
binary + debugfs in the test harness.

Tests: real-ext4 round-trip that builds a minimal ext4, runs
InjectGuestAgents, then verifies every expected path is present
via `debugfs stat`, plus uid=0 and mode 0755 on the vsock-agent
binary. Also: missing-binary rejection, ancestor-collection order
test. debugfs/mkfs.ext4 tests skip on hosts without the binaries.

After B-1+B-2, any OCI image that already ships sshd boots with
banger-network and banger-vsock-agent running; image pull is
one step from "useful rootfs primitive". B-3 (first-boot sshd
install) unlocks images that don't ship sshd.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 18:08:56 -03:00
43982a4ae3
Phase B-1: ownership fixup via debugfs pass
imagepull.Flatten now captures per-file uid/gid/mode/type from the
tar headers as it walks layers, returning a Metadata map alongside
the extracted tree. Whiteouts correctly drop the victim's metadata.
The returned Metadata feeds the new imagepull.ApplyOwnership, which
pipes a batched `set_inode_field` script to `debugfs -w -f -`.

Why: mkfs.ext4 -d copies the runner's on-disk uids verbatim, so
without this pass setuid binaries become setuid-nonroot and sshd
refuses to start on the resulting image. With the pass, a pulled
debian:bookworm has /usr/bin/sudo with uid=0 + setuid bit surviving
intact.

imagepull.BuildExt4 signature unchanged; ownership is applied as a
separate step by the daemon orchestrator between BuildExt4 and
StageBootArtifacts, keeping each helper focused. The seam
(d.pullAndFlatten) now returns (Metadata, error) for test stubs to
feed synthetic metadata.

StdinRunner is a new duck-typed extension next to CommandRunner;
the real system.Runner implements RunStdin, test mocks don't need
to unless they exercise stdin. Prevents every existing mock from
growing a new method.

Tests:
 - TestFlattenCapturesHeaderMetadata: setuid bit + mode survive the
   tar-header walk
 - TestApplyOwnershipRewritesUidGidMode: real debugfs round-trip —
   create ext4 with runner's uid, apply synthetic metadata setting
   uid=0 + setuid mode, verify via `debugfs -R stat` that the
   inode now has uid=0 and mode 04755
 - TestBuildOwnershipScriptDeterministic: sorted, well-formed
   sif script output

Debugfs and mkfs.ext4 tests skip if the binaries aren't on PATH.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 18:04:22 -03:00
a8c9983542
Phase 2: daemon PullImage orchestration
(d *Daemon).PullImage downloads an OCI image, flattens it into an
ext4 rootfs, and registers the result as a managed banger image.

Flow (internal/daemon/images_pull.go):
 1. Parse + validate the OCI ref via go-containerregistry/name.
 2. Derive a friendly default name from the ref ("debian-bookworm")
    when --name is omitted.
 3. Reject if an image with that name already exists.
 4. Resolve kernel info via the new shared resolveKernelInputs
    helper (refactored out of RegisterImage); ValidateKernelPaths
    checks the kernel triple alone.
 5. Acquire imageOpsMu, generate a fresh image id, and stage at
    <ImagesDir>/<id>.staging.
 6. imagepull.Pull → cache layers under OCICacheDir;
    imagepull.Flatten → temp rootfs tree under os.TempDir (so the
    state filesystem doesn't temporarily double in size).
 7. Default size: max(treeSize × 1.25, 1 GiB); --size override
    accepted.
 8. imagepull.BuildExt4 produces the rootfs.ext4 in the staging dir.
 9. imagemgr.StageBootArtifacts stages the kernel/initrd/modules
    into the same dir (reused unchanged).
 10. Atomic os.Rename(staging, finalDir) publishes the artifact dir.
 11. Persist model.Image with Managed=true. Failure at any step
     removes the staging dir; failure post-rename removes finalDir.

The pullAndFlatten field on Daemon is the test seam: tests stub it
to write a fixture tree into destDir and skip the real registry.

Refactor: extracted the "kernel-ref vs direct paths" resolution
out of RegisterImage into d.resolveKernelInputs so PullImage and
RegisterImage share one source of truth for that policy. Split
ValidateRegisterPaths into a kernel-only ValidateKernelPaths so
PullImage (which produces the rootfs itself) can validate just
the kernel triple without the rootfs check.

API: ImagePullParams { Ref, Name, KernelPath, InitrdPath,
ModulesDir, KernelRef, SizeBytes }. RPC dispatch case image.pull
mirrors image.register.

Tests cover: happy-path producing a managed image with all four
artifacts present + staging cleaned up, name-collision rejection,
missing-kernel rejection, and staging cleanup on a failed pull.
defaultImageNameFromRef handles tag/digest/no-suffix cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 17:27:32 -03:00
da4a6bf45b
Add lint targets, fix gofmt drift, broaden Makefile build inputs
Three small operational improvements.

1. Makefile build dependencies now cover everything under cmd/ and
   internal/, not just *.go. The previous GO_SOURCES find pattern
   missed embedded assets (catalog.json today, anything else added
   later), so editing a JSON manifest didn't trigger a rebuild and
   left the binary stale. New BUILD_INPUTS covers all files; go's own
   build cache absorbs any redundant invocations. GO_SOURCES is kept
   for fmt/lint targets which still want only Go files.

2. New `make lint` (default + lint-go + lint-shell):
   - lint-go: gofmt -l (fail if any output) and go vet ./...
   - lint-shell: shellcheck --severity=error on scripts/*.sh
   The shell floor is set at error-level for now; the legacy
   make-rootfs-*.sh / make-*-kernel.sh / customize.sh scripts have
   warning-level findings (sudo-cat redirects, heredoc quoting) that
   would block landing this if we tightened immediately. Documented
   as tech debt in docs/kernel-catalog.md alongside a note about
   eventually replacing the per-distro bash with a uniform Go tool.

3. gofmt drift fixed in internal/daemon/imagemgr/build.go,
   session/session.go, and vm_create_ops.go (trailing newline +
   gofmt's preferred function-definition wrapping). Now
   `make lint` passes cleanly; future drift will fail CI/local lint
   instead of accumulating.

AGENTS.md gains a one-line note on make lint.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 16:49:17 -03:00
f0668ee598
Phase 4: remote catalog + banger kernel pull
Introduces the headline feature of the kernel catalog: pulling a kernel
bundle over HTTP without any local build step.

Catalog format (internal/kernelcat/catalog.go):
 - Catalog { Version, Entries } + CatEntry { Name, Distro, Arch,
   KernelVersion, TarballURL, TarballSHA256, SizeBytes, Description }.
 - catalog.json is embedded via go:embed and ships with each banger
   binary. It starts empty (Phase 5's CI pipeline will populate it).
 - Lookup(name) returns the matching entry or os.ErrNotExist.

Fetch (internal/kernelcat/fetch.go):
 - HTTP GET with streaming SHA256 over the response body.
 - zstd-decode (github.com/klauspost/compress/zstd) -> tar extract into
   <kernelsDir>/<name>/.
 - Hardens against path-traversal tarball entries (members whose
   normalised path escapes the target dir, and unsafe symlink
   targets) and sha256-mismatch downloads; any failure removes the
   partially-populated target dir.
 - Regular files, directories, and safe symlinks are supported; other
   tar types (hardlinks, devices, fifos) are silently skipped.
 - After extraction, recomputes sha256 over the on-disk vmlinux and
   writes the manifest with Source="pull:<url>".

Daemon methods (internal/daemon/kernels.go):
 - KernelPull(ctx, {Name, Force}) - lookup in embedded catalog, refuse
   overwrite unless Force, delegate to kernelcat.Fetch.
 - KernelCatalog(ctx) - return the embedded catalog annotated per-entry
   with whether it has been pulled locally.

RPC: kernel.pull, kernel.catalog dispatch cases.

CLI:
 - `banger kernel pull <name> [--force]`.
 - `banger kernel list --available` prints the catalog with a
   pulled/available STATE column and a human-readable size.

Tests: fetch round-trip (extract + manifest + sha256), sha256 mismatch
rejection with cleanup, missing-vmlinux rejection, path-traversal
rejection, HTTP error propagation, catalog parsing, lookup,
pulled-status reconciliation. All 20 packages green.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 15:05:42 -03:00
7192ba24ae
Phase 3: banger kernel import bridges make-*-kernel.sh output
`banger kernel import <name> --from <dir>` copies a staged kernel
bundle into the local catalog. <dir> is the output of
`make void-kernel` or `make alpine-kernel` (build/manual/void-kernel/
or build/manual/alpine-kernel/).

kernelcat.DiscoverPaths locates artifacts under <dir>:
 1. Prefers metadata.json (written by make-void-kernel.sh).
 2. Falls back to globbing: boot/vmlinux-* or vmlinuz-* (Alpine
    fallback), boot/initramfs-*, lib/modules/<latest>.

The daemon's KernelImport copies kernel + optional initrd via
system.CopyFilePreferClone and modules via system.CopyDirContents
(no-sudo mode — catalog lives under ~/.local/state), computes SHA256
over the kernel, and writes the manifest via kernelcat.WriteLocal.

While wiring this up, fixed a latent bug in system.CopyDirContents:
filepath.Join(sourceDir, ".") silently drops the trailing dot, so
`cp -a source source/contents target/` was copying the whole source
directory (including its basename) instead of just its contents.
Replaced the join with a manual "/." suffix. imagemgr.StageBootArtifacts
(the only existing caller) silently benefits.

scripts/register-void-image.sh and scripts/register-alpine-image.sh
are rewritten to use `banger kernel import … && banger image register
--kernel-ref …` instead of the find-and-pass-paths dance. Preserves
the same user-facing commands and env vars.

Tests cover: metadata.json preference, glob fallback, Alpine vmlinuz
fallback, kernel-missing error, round-trip copy into the catalog, and
the --from required flag.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 14:53:49 -03:00
48e3a938cf
Phase 2: image register --kernel-ref resolves through the catalog
`banger image register --kernel-ref <name>` now substitutes for the
--kernel/--initrd/--modules triple. The daemon looks the name up via
kernelcat.ReadLocal under d.layout.KernelsDir, populates the three
paths from the resolved entry, then continues through the existing
validate/persist flow unchanged.

Passing both --kernel-ref and any of --kernel/--initrd/--modules is
rejected — at the CLI layer (before starting the daemon) and
defensively at the RPC layer. A missing catalog entry produces a clear
"run 'banger kernel list'" message.

Once registered, the image stores the resolved absolute paths, so
deleting the catalog entry later does not invalidate already-registered
images — managed image build still copies the kernel into its artifact
dir per imagemgr.StageBootArtifacts.

Tests cover: resolution success (absolute KernelPath populated from
catalog), mutual-exclusion rejection, and missing-entry error.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 14:25:50 -03:00
83cc3aee15
Phase 1: local kernel catalog scaffolding
Introduces a read/write kernel catalog on disk without any network
dependency, so later phases (image register --kernel-ref, import, pull)
can build on a working foundation.

Layout: adds KernelsDir to paths.Layout, ensured under
~/.local/state/banger/kernels/. Each cataloged kernel lives at
<KernelsDir>/<name>/ with a manifest.json alongside vmlinux and optional
initrd.img / modules/.

New internal/kernelcat package owns the disk format:
- Entry (Name, Distro, Arch, KernelVersion, SHA256, Source, ImportedAt)
- ValidateName (alphanumeric + dots/hyphens/underscores, no traversal)
- ReadLocal / ListLocal / WriteLocal / DeleteLocal
- SumFile helper

The daemon exposes three RPC methods dispatched in daemon.go:
kernel.list, kernel.show, kernel.delete. Implementations live in a new
internal/daemon/kernels.go and are thin wrappers over kernelcat using
d.layout.KernelsDir.

CLI: new top-level `banger kernel` with list / show / rm subcommands
mirroring the image-command pattern (ensureDaemon, RPC call, table or
JSON output). No sudo required — kernel ops are user-space only.

Users can now manually populate ~/.local/state/banger/kernels/<name>/
and see it via `banger kernel list`. Phase 2 wires --kernel-ref into
image register; Phase 3 adds `banger kernel import`; Phase 4 adds
remote pulls.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 14:21:10 -03:00
ca4865447c
Refresh daemon docs and mark web UI experimental
internal/daemon/doc.go and ARCHITECTURE.md were written before the
subpackage extractions and still referenced old structure (in-progress
phrasing, missing opstate/dmsnap/fcproc/imagemgr/session/workspace,
mentions of opRegistry by its old name). Both now describe the current
shape: composition root + six leaf subpackages, lock ordering rooted
at vmLocks[id], and the one intra-package dependency (workspace →
session for ShellQuote + FormatStepError).

README.md and AGENTS.md mark the local web UI as experimental. It is
still enabled by default at 127.0.0.1:7777, but the docs now state
plainly that its surface is not stable or hardened and not intended for
anything beyond single-user localhost use. AGENTS.md also points at
ARCHITECTURE.md for the subpackage layout.

No code changes; tests still green.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 16:44:11 -03:00
1d51370d26
Extract workspace subpackage with pure repo helpers
Moves the stateless parts of the workspace subsystem into
internal/daemon/workspace:

- RepoSpec struct + InspectRepo for host-side git inspection
- ImportRepoToGuest (taking a minimal GuestClient interface) with the
  full-copy and metadata-only / shallow-overlay paths
- FinalizeScript, PrepareRepoCopy, ResolveSourcePath
- ListSubmodules, ListOverlayPaths, ParsePrepareMode
- Git helpers (GitOutput, GitTrimmedOutput, GitResolvedConfigValue,
  ParseNullSeparatedOutput, RunHostCommand, GitFileURL) and the
  HostCommandOutputFunc test seam
- ShallowFetchDepth const

The subpackage imports internal/daemon/session for ShellQuote and
FormatStepError so both workspace and session pure helpers live in
their own subpackages with a clean session→workspace direction of use.

daemon/workspace.go shrinks from 481 → 156 LOC, keeping just the three
orchestrator methods (Export, Prepare, prepareLocked) that still touch
d.store, d.FindVM, d.dialGuest, d.waitForGuestSSH, and the VM lock set.
guestSessionHostCommandOutputFunc is removed from guest_sessions.go (its
only caller was workspace.go; the new package has its own copy).

All tests green.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 16:37:19 -03:00
37e02b1576
Extract session subpackage with pure guest-session helpers
Moves the stateless parts of the guest-session subsystem into
internal/daemon/session:

- consts (BackendSSH, attach/transport kinds, StateRoot, LogTailLineDefault)
- StateSnapshot plus ParseState / InspectStateFromDir / ApplyStateSnapshot / StateChanged
- 10 on-guest path helpers (StateDir, StdoutLogPath, StdinPipePath, …)
- 3 bash script generators (Script, InspectScript, SignalScript)
- small utilities (ShellQuote, ExitCode, CloneStringMap, TailFileContent,
  ProcessAlive + syscallKill test seam, FormatStepError)
- launch helpers (DefaultName, DefaultCWD, FailLaunch,
  NormalizeRequiredCommands, CWDPreflightScript, CommandPreflightScript,
  AttachInputCommand, AttachTailCommand, EnvLines)

Callers inside the daemon package import the new package under the
alias "sess" to avoid colliding with the local `session model.GuestSession`
variables threaded through the orchestrator code. guest_sessions.go
shrinks from 616 → 156 LOC; session_stream.go, session_attach.go,
session_lifecycle.go, workspace.go, and guest_sessions_test.go rewire to
the exported names.

The orchestrator methods (StartGuestSession, BeginGuestSessionAttach,
SendToGuestSession, GuestSessionLogs, refresh/inspect, sessionRegistry,
guestSessionController) stay on *Daemon. Full Manager-style extraction
would need prerequisite phases (operation protocol, workdisk helpers),
mirroring Phase 4a's trade-off.

All tests green.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 16:33:12 -03:00
c13c8b11af
Extract imagemgr subpackage with pure image helpers
Moves the stateless helpers of the image subsystem into
internal/daemon/imagemgr:

paths.go — path validators (ValidateRegisterPaths,
ValidatePromotePaths), artifact staging (StageBootArtifacts,
StageOptionalArtifactPath), metadata (BuildMetadataPackages,
WritePackagesMetadata).

build.go — ResizeRootfs, WriteBuildLog, and the full guest
provisioning script generator (BuildProvisionScript, BuildModulesCommand
and all private script-append helpers) along with the mise/tmux/opencode
version constants.

The orchestrator methods (BuildImage, RegisterImage, PromoteImage,
DeleteImage, runImageBuildNative) stay on *Daemon: they still touch
d.store, d.imageOpsMu, d.beginOperation, capability hooks, and
fcproc-wrapped Daemon helpers — extracting them needs prerequisite
phases (operation protocol, workdisk helpers, tap pool). This commit is
strictly the pure-helper extraction that can land cleanly today.

imagebuild.go shrinks from 453 -> 225 LOC (half gone). images.go shrinks
from 450 -> 374 LOC. imagebuild_test.go updated to call the exported
imagemgr.BuildProvisionScript. Zero behavior change; all tests green.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 16:24:22 -03:00
6e989914dd
Extract fcproc subpackage for firecracker process helpers
Moves the host-side firecracker primitives — bridge setup, socket dir,
binary resolution, tap creation, socket chown, PID lookup, resolve,
ctrl-alt-del, wait-for-exit, SIGKILL — plus the shared
ErrWaitForExitTimeout sentinel and a small waitForPath helper into
internal/daemon/fcproc.

Manager is stateless beyond its runner + config + logger. The daemon
package keeps thin forwarders (d.ensureBridge, d.createTap, etc.) so no
call site or test changes. A d.fc() helper builds a Manager on demand
from Daemon state, which lets tests keep constructing &Daemon{...}
literals without wiring fcproc explicitly.

This unblocks Phase 4 (imagemgr extraction): imagebuild.go's dependence
on d.createTap/d.firecrackerBinary/etc. can now be satisfied by
importing fcproc instead of reaching back to *Daemon.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 16:11:39 -03:00
fdab4a7e68
Extract opstate and dmsnap into subpackages
Two leaves of the daemon package that carry no back-references to Daemon
move out:

- internal/daemon/opstate: generic Registry[T AsyncOp]. The AsyncOp
  interface methods are capitalised (ID, IsDone, UpdatedAt, Cancel);
  vmCreateOperationState and imageBuildOperationState implement it.
- internal/daemon/dmsnap: Create, Cleanup, Remove plus the Handles type
  for device-mapper snapshot lifecycle. Takes an explicit Runner
  interface. The daemon-package snapshot.go keeps thin forwarders and a
  type alias so existing call sites and tests are untouched.

Skipped on purpose: tap_pool has too many Daemon-scoped dependencies
(config, store, closing, createTap) for a clean extraction at this
stage; nat.go is already a thin facade over internal/hostnat;
dns_routing.go tests tightly couple to package internals, so extraction
would be more churn than payoff. Each can be revisited when a
subsystem-level refactor forces the boundary.

All tests green.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 16:02:43 -03:00
59f2766139
Move subsystem state/locks off Daemon into owning types
Daemon no longer owns a coarse mu shared across unrelated concerns.
Each subsystem now carries its own state and lock:

- tapPool: entries, next, and mu move onto a new tapPool struct.
- sessionRegistry: sessionControllers + its mutex move off Daemon.
- opRegistry[T asyncOp]: generic registry collapses the two ad-hoc
  vm-create and image-build operation maps (and their mutexes) into one
  shared type; the Begin/Status/Cancel/Prune methods simplify.
- vmLockSet: the sync.Map of per-VM mutexes moves into its own type;
  lockVMID forwards.
- Daemon.mu splits into imageOpsMu (image-registry mutations) and
  createVMMu (CreateVM serialisation) so image ops and VM creates no
  longer block each other.

Lock ordering collapses to vmLocks[id] -> {createVMMu, imageOpsMu} ->
subsystem-local leaves. doc.go and ARCHITECTURE.md updated.

No behavior change; tests green.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 15:58:33 -03:00
ea0db1e17e
Split internal/daemon vm.go and guest_sessions.go by concern
vm.go (1529 LOC) splits into vm_create, vm_lifecycle, vm_set, vm_stats,
vm_disk, vm_authsync; firecracker/DNS/helpers stay in vm.go.

guest_sessions.go (1266 LOC) splits into session_controller,
session_lifecycle, session_attach, session_stream; scripts and helpers
stay in guest_sessions.go.

Mechanical move only. No behavior change. Adds doc.go and
ARCHITECTURE.md capturing subsystem map and current lock ordering as
the baseline for the upcoming subsystem extraction.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 15:47:08 -03:00
09590cbaa0
Fix Firecracker PID resolution and deprecated net.Error.Temporary
Use context.Background() for resolveFirecrackerPID so a cancelled
request context (client disconnect) doesn't prevent tracking the
spawned Firecracker process, leaving it orphaned on cleanup.

Drop ne.Temporary() check in accept loop; deprecated since Go 1.18
and unreliable. Retry on any net.Error instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 12:44:38 -03:00
0e764b0571
Fix two daemon bugs: Firecracker context and sessionControllers init
vm.go: Firecracker was launched with context.Background() instead of
the incoming request ctx. A cancelled or timed-out VM creation request
could not stop mid-flight Firecracker process spawning, leaving an
orphaned process and leaked resources. Replace the four firecrackerCtx
uses with ctx directly; the local variable is removed.

guest_sessions.go / daemon.go: sessionControllers map was lazily
initialized with a nil-check inside every mutating method. With d.mu
held this isn't a data race, but the pattern is fragile — any new
method that writes to the map without copying the guard can panic.
Initialize the map once in Open() alongside the other daemon maps and
channels, and remove the redundant nil-checks from setGuestSessionController
and claimGuestSessionController.
2026-04-14 19:53:26 -03:00
43dfda14f8
Fix TOCTOU race in lockVMID
The old pattern held vmLocksMu to get/create a *sync.Mutex, then
released vmLocksMu before calling lock.Lock(). In the gap between
the two operations a concurrent goroutine could observe the entry,
and any future cleanup path that deleted map entries could let a
third goroutine create a fresh *sync.Mutex for the same ID — leaving
two callers holding independent locks with no mutual exclusion.

Fix: replace the manual map + vmLocksMu pair with sync.Map and
LoadOrStore. LoadOrStore is atomic at the map level: exactly one
*sync.Mutex wins for each VM ID, with no release-then-reacquire
gap between the lookup and the insert. vmLocksMu is removed.
2026-04-14 19:50:04 -03:00
ff51b7ce21
workspace.export: add base_commit to capture worker git commits
Without base_commit, export diffs against the current guest HEAD.
If the worker ran git commit inside the VM, HEAD advanced and the
diff came back empty — committed work was silently lost.

With base_commit set to the head_commit from workspace.prepare,
the diff uses that fixed point instead. After git add -A the index
holds the full working state, so git diff --cached <base_commit>
captures everything: committed deltas (HEAD moved past base) and
any uncommitted changes on top, in one patch, applied with the
same git apply flow.

- WorkspaceExportParams gains base_commit
- WorkspaceExportResult echoes back the ref actually used
- CLI gains --base-commit flag
- Tests assert scripts use the caller-supplied ref and that
  omitting it falls back to HEAD
2026-04-14 16:13:05 -03:00
94c353f317
Add guest.session.send and vm.workspace.export RPCs
guest.session.send — write to a pipe-mode session's stdin without
holding the exclusive attach. The daemon dials a fresh SSH connection,
uploads the payload to a temp file, and cats it into the session's
named FIFO. Linux atomicity for writes ≤ PIPE_BUF covers all pi RPC
JSONL lines. Attach exclusivity is unchanged.

vm.workspace.export — pull changes from guest back to host. Runs
`git add -A && git diff --cached HEAD --binary` inside the guest via a
new RunScriptOutput helper on guest.Client (stdout-only capture,
distinct from RunScript which merges stderr). Returns a binary-safe
patch and a list of changed files. CLI writes the patch to stdout for
`| git apply` or to a file via --output.

RunScriptOutput is implemented as a direct SSH session (same pattern as
runSession) rather than going through StartCommand/StreamSession to
avoid closing the underlying Client, which is required since
ExportVMWorkspace calls it twice on the same connection.

New files: internal/daemon/workspace_test.go
2026-04-14 15:21:50 -03:00
797a9de1ce
Install claude and pi through mise
Provisioning was still installing `claude` and `pi` through a separate
npm-global prefix even after the guest images had switched to `mise` for
Node and opencode. That left two competing install paths and made the
runtime layout harder to reason about.

Switch the Debian and Void image setup flows to install `claude` and `pi`
as `mise` npm tools, assert their shims exist after `mise reshim`, and
symlink `node`, `npm`, `opencode`, `claude`, and `pi` directly from the
mise shim directory into `/usr/local/bin`.

Update the imagebuild test expectations and bump the Void rootfs default
size to 4G so the larger default toolset still fits reliably.
2026-04-13 18:29:02 -03:00
5e26fd7544
Fix guest session cwd preflight scripts
Guest session cwd and command preflight helpers were emitting literal
`\\n` separators, so the guest shell saw malformed one-line scripts and
could fail `preflight_cwd` even when `/root/repo` already existed.

Replace those builders with real newlines, and fix the nearby attach
helper commands that were making the same mistake.

Add a small daemon guest-SSH seam so workspace preparation and session
start can share a fake backend in tests, then cover the regression with
an end-to-end daemon test for `PrepareVMWorkspace` followed by
`StartGuestSession` on `/root/repo`.

Validation: `GOCACHE=/tmp/banger-gocache go test ./internal/daemon` and
`GOCACHE=/tmp/banger-gocache go test ./...`.
2026-04-13 18:26:19 -03:00
37c4c091ec
Add guest sessions and agent VM defaults
Add daemon-backed workspace and guest-session primitives so host
orchestrators can prepare /root/repo, launch long-lived guest commands,
and attach to pipe-mode sessions over the local stdio mux bridge.

Persist richer session metadata and launch diagnostics, preflight guest
cwd/command requirements, make pipe-mode attach rehydratable from guest
state after daemon restart, and allow submodules when workspace prepare
runs in full_copy mode.

At the same time, stop vm run from auto-attaching opencode, make it
print next-step commands instead, and make glibc guest images more
agent-ready by installing node, opencode, claude, and pi while syncing
opencode/claude/pi auth files into work disks on VM start.

Validation:
- GOCACHE=/tmp/banger-gocache go test ./...
- make build
- banger vm workspace prepare --help
- banger vm session --help
- banger vm session start --help
- banger vm session attach --help
2026-04-12 23:48:42 -03:00
497e6dca3d
Rename experimental Void image to void
Replace the old `void-exp` repository defaults with `void` so the Make targets,
registration helper, example config, verification messaging, and sample test
fixtures all line up with the new managed image name.

Keep the scope to repo-facing naming only: config overrides, helper output, and
test fixtures now expect `void`, while runtime compatibility for existing local
`void-exp` VMs remains an operational concern outside this commit.

Validation: go test ./..., make build, and a local `banger vm create --image void`
smoke boot with ssh and opencode ports up.
2026-04-01 20:15:28 -03:00
42b4a18c63
Sync Git identity into guest VMs
Populate guest /root/.gitconfig from host git config --global during work-disk preparation so plain VM shells can commit.

Resolve user.name and user.email from the source repo for vm run and write them only into the imported checkout, preserving repo-specific identity overrides.

Update mounted guest .gitconfig through a host temp file plus sudo install instead of direct git config --file writes, since the mounted root-owned work disk blocks Git lockfile creation.

Validated with GOCACHE=/tmp/banger-gocache go test ./..., make build, and a live alpine vm create smoke check for guest git config.
2026-03-22 19:30:36 -03:00
f798e1db33
Stamp shared build metadata into banger binaries
Treat `banger`, `bangerd`, and `banger-vsock-agent` as one release by
stamping the same version, commit SHA, and build timestamp into every
binary through a shared ldflag-backed `internal/buildinfo` package.

Add `banger version`, extend daemon ping/status to report the running
daemon's build tuple, and keep the guest helper linked to the same build
metadata without adding a new public version surface for it.

Validate with `GOCACHE=/tmp/banger-gocache go test ./...`, `make build`,
`./build/bin/banger version`, and `./build/bin/banger daemon status`
after the daemon restarts onto the new binary.
2026-03-22 17:14:06 -03:00
ea2db1e868
Configure direct SSH access for .vm hosts
Make daemon startup sync a managed `Host *.vm` block into `~/.ssh/config` so plain `ssh root@<vm>.vm` uses banger's managed key and the same publickey-only options as `banger vm ssh`.

Write the block directly instead of relying on a separate include file so it still applies when a user's SSH config ends inside another `Host` stanza, and remove the legacy managed include path. Add daemon tests that cover fresh config creation and managed-block replacement while preserving user entries.

Validate with `go test ./...`, `make build`, `ssh -G alp.vm`, and `ssh alp.vm true`.
2026-03-22 16:48:42 -03:00
b7f6d1fe1b
Route .vm DNS through systemd-resolved
Banger was already serving VM records on 127.0.0.1:42069, but hosts using systemd-resolved were not routing .vm queries there. That made direct lookups against the local server work while normal host resolution and commands like opencode attach <vm>.vm:4096 failed.\n\nSync resolvectl dns/domain/default-route settings onto the banger bridge when the daemon opens and whenever VM DNS records are published, and revert that bridge-scoped configuration on daemon shutdown. This uses sudo resolvectl because unprivileged resolved reconfiguration on this host requires interactive authentication.\n\nValidation: GOCACHE=/tmp/banger-gocache go test ./..., make build, daemon restart, resolvectl dns/domain br-fc, resolvectl query vrum.vm, and curl http://vrum.vm:4096.
2026-03-22 15:07:22 -03:00