Commit graph

6 commits

Author SHA1 Message Date
e47b8146dc
daemon: thread per-RPC op_id end-to-end
Today there's no way to correlate a CLI failure with a daemon log
line. operationLog records relative timing but no id, two concurrent
vm.start calls log indistinguishably, and the async
vmCreateOperationState.ID is user-facing yet never reaches the
journal. The root helper logs plain text to stderr while bangerd
logs JSON, so a merged journalctl is hard to grep across the
trust-boundary split.

Mint a per-RPC op id at dispatch entry, store it on context, and
include it as an "op_id" attr on every operationLog record. The
id is stamped onto every error response (including the early
short-circuit paths bad_version and unknown_method). rpc.Call
forwards the context op id on requests so a daemon RPC and the
helper RPCs it triggers all share one id. The helper now logs
JSON to match bangerd, adopts the inbound id, and emits a single
"helper rpc completed" / "helper rpc failed" line per call so
operators can see at a glance how long each privileged op took.

vmCreateOperationState.ID is now the same id dispatch generated
for vm.create.begin — one identifier between client status polls,
daemon logs, and helper logs.

The wire format gains two optional fields: rpc.Request.OpID and
rpc.ErrorResponse.OpID, both omitempty so older peers (and the
opposite direction) ignore them. ErrorResponse.Error() now appends
"(op-XXXXXX)" to its string form when set; existing callers that
just print err.Error() get the id for free.

Tests cover: dispatch stamps op_id on unknown_method, bad_version,
and handler-returned errors; rpc.Call exposes the typed
*ErrorResponse via errors.As so the CLI can read code/op_id; ctx
op_id is forwarded to the server in the request envelope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 22:13:44 -03:00
466a7c30c4
daemon split (4/5): extract *VMService service
Phase 4 of the daemon god-struct refactor. VM lifecycle, create-op
registry, handle cache, disk provisioning, stats polling, ports
query, and the per-VM lock set all move off *Daemon onto *VMService.

Daemon keeps thin forwarders only for FindVM / TouchVM (dispatch
surface) and is otherwise out of VM lifecycle. Lazy-init via
d.vmSvc() mirrors the earlier services so test literals like
\`&Daemon{store: db, runner: r}\` still get a functional service
without spelling one out.

Three small cleanups along the way:

  * preflight helpers (validateStartPrereqs / addBaseStartPrereqs
    / addBaseStartCommandPrereqs / validateWorkDiskResizePrereqs)
    move with the VM methods that call them.
  * cleanupRuntime / rebuildDNS move to *VMService, with
    HostNetwork primitives (findFirecrackerPID, cleanupDMSnapshot,
    killVMProcess, releaseTap, waitForExit, sendCtrlAltDel)
    reached through s.net instead of the hostNet() facade.
  * vsockAgentBinary becomes a package-level function so both
    *Daemon (doctor) and *VMService (preflight) call one entry
    point instead of each owning a forwarder method.

WorkspaceService's peer deps switch from eager method values to
closures — vmSvc() constructs VMService with WorkspaceService as a
peer, so resolving d.vmSvc().FindVM at construction time recursed
through workspaceSvc() → vmSvc(). Closures defer the lookup to call
time.

Pure code motion: build + unit tests green, lint clean. No RPC
surface or lock-ordering changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 20:57:05 -03:00
da4a6bf45b
Add lint targets, fix gofmt drift, broaden Makefile build inputs
Three small operational improvements.

1. Makefile build dependencies now cover everything under cmd/ and
   internal/, not just *.go. The previous GO_SOURCES find pattern
   missed embedded assets (catalog.json today, anything else added
   later), so editing a JSON manifest didn't trigger a rebuild and
   left the binary stale. New BUILD_INPUTS covers all files; go's own
   build cache absorbs any redundant invocations. GO_SOURCES is kept
   for fmt/lint targets which still want only Go files.

2. New `make lint` (default + lint-go + lint-shell):
   - lint-go: gofmt -l (fail if any output) and go vet ./...
   - lint-shell: shellcheck --severity=error on scripts/*.sh
   The shell floor is set at error-level for now; the legacy
   make-rootfs-*.sh / make-*-kernel.sh / customize.sh scripts have
   warning-level findings (sudo-cat redirects, heredoc quoting) that
   would block landing this if we tightened immediately. Documented
   as tech debt in docs/kernel-catalog.md alongside a note about
   eventually replacing the per-distro bash with a uniform Go tool.

3. gofmt drift fixed in internal/daemon/imagemgr/build.go,
   session/session.go, and vm_create_ops.go (trailing newline +
   gofmt's preferred function-definition wrapping). Now
   `make lint` passes cleanly; future drift will fail CI/local lint
   instead of accumulating.

AGENTS.md gains a one-line note on make lint.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 16:49:17 -03:00
fdab4a7e68
Extract opstate and dmsnap into subpackages
Two leaves of the daemon package that carry no back-references to Daemon
move out:

- internal/daemon/opstate: generic Registry[T AsyncOp]. The AsyncOp
  interface methods are capitalised (ID, IsDone, UpdatedAt, Cancel);
  vmCreateOperationState and imageBuildOperationState implement it.
- internal/daemon/dmsnap: Create, Cleanup, Remove plus the Handles type
  for device-mapper snapshot lifecycle. Takes an explicit Runner
  interface. The daemon-package snapshot.go keeps thin forwarders and a
  type alias so existing call sites and tests are untouched.

Skipped on purpose: tap_pool has too many Daemon-scoped dependencies
(config, store, closing, createTap) for a clean extraction at this
stage; nat.go is already a thin facade over internal/hostnat;
dns_routing.go tests tightly couple to package internals, so extraction
would be more churn than payoff. Each can be revisited when a
subsystem-level refactor forces the boundary.

All tests green.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 16:02:43 -03:00
59f2766139
Move subsystem state/locks off Daemon into owning types
Daemon no longer owns a coarse mu shared across unrelated concerns.
Each subsystem now carries its own state and lock:

- tapPool: entries, next, and mu move onto a new tapPool struct.
- sessionRegistry: sessionControllers + its mutex move off Daemon.
- opRegistry[T asyncOp]: generic registry collapses the two ad-hoc
  vm-create and image-build operation maps (and their mutexes) into one
  shared type; the Begin/Status/Cancel/Prune methods simplify.
- vmLockSet: the sync.Map of per-VM mutexes moves into its own type;
  lockVMID forwards.
- Daemon.mu splits into imageOpsMu (image-registry mutations) and
  createVMMu (CreateVM serialisation) so image ops and VM creates no
  longer block each other.

Lock ordering collapses to vmLocks[id] -> {createVMMu, imageOpsMu} ->
subsystem-local leaves. doc.go and ARCHITECTURE.md updated.

No behavior change; tests green.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 15:58:33 -03:00
30f0c0b54a
Manage image artifacts and show VM create progress
Stop relying on ad hoc rootfs handling by adding image promotion, managed work-seed fingerprint metadata, and lazy self-healing for older managed images after the first create.

Rebuild guest images with baked SSH access, a guest NIC bootstrap, and default opencode services, and add the staged Void kernel/initramfs/modules workflow so void-exp uses a matching Void boot stack.

Replace the opaque blocking vm.create RPC with a begin/status flow that prints live stages in the CLI while still waiting for vsock health and opencode on guest port 4096.

Validate with GOCACHE=/tmp/banger-gocache go test ./... and live void-exp create/delete smoke runs.
2026-03-21 14:48:01 -03:00