Today there's no way to correlate a CLI failure with a daemon log
line. operationLog records relative timing but no id, two concurrent
vm.start calls log indistinguishably, and the async
vmCreateOperationState.ID is user-facing yet never reaches the
journal. The root helper logs plain text to stderr while bangerd
logs JSON, so a merged journalctl is hard to grep across the
trust-boundary split.
Mint a per-RPC op id at dispatch entry, store it on context, and
include it as an "op_id" attr on every operationLog record. The
id is stamped onto every error response (including the early
short-circuit paths bad_version and unknown_method). rpc.Call
forwards the context op id on requests so a daemon RPC and the
helper RPCs it triggers all share one id. The helper now logs
JSON to match bangerd, adopts the inbound id, and emits a single
"helper rpc completed" / "helper rpc failed" line per call so
operators can see at a glance how long each privileged op took.
vmCreateOperationState.ID is now the same id dispatch generated
for vm.create.begin — one identifier between client status polls,
daemon logs, and helper logs.
The wire format gains two optional fields: rpc.Request.OpID and
rpc.ErrorResponse.OpID, both omitempty so older peers (and the
opposite direction) ignore them. ErrorResponse.Error() now appends
"(op-XXXXXX)" to its string form when set; existing callers that
just print err.Error() get the id for free.
Tests cover: dispatch stamps op_id on unknown_method, bad_version,
and handler-returned errors; rpc.Call exposes the typed
*ErrorResponse via errors.As so the CLI can read code/op_id; ctx
op_id is forwarded to the server in the request envelope.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 4 of the daemon god-struct refactor. VM lifecycle, create-op
registry, handle cache, disk provisioning, stats polling, ports
query, and the per-VM lock set all move off *Daemon onto *VMService.
Daemon keeps thin forwarders only for FindVM / TouchVM (dispatch
surface) and is otherwise out of VM lifecycle. Lazy-init via
d.vmSvc() mirrors the earlier services so test literals like
\`&Daemon{store: db, runner: r}\` still get a functional service
without spelling one out.
Three small cleanups along the way:
* preflight helpers (validateStartPrereqs / addBaseStartPrereqs
/ addBaseStartCommandPrereqs / validateWorkDiskResizePrereqs)
move with the VM methods that call them.
* cleanupRuntime / rebuildDNS move to *VMService, with
HostNetwork primitives (findFirecrackerPID, cleanupDMSnapshot,
killVMProcess, releaseTap, waitForExit, sendCtrlAltDel)
reached through s.net instead of the hostNet() facade.
* vsockAgentBinary becomes a package-level function so both
*Daemon (doctor) and *VMService (preflight) call one entry
point instead of each owning a forwarder method.
WorkspaceService's peer deps switch from eager method values to
closures — vmSvc() constructs VMService with WorkspaceService as a
peer, so resolving d.vmSvc().FindVM at construction time recursed
through workspaceSvc() → vmSvc(). Closures defer the lookup to call
time.
Pure code motion: build + unit tests green, lint clean. No RPC
surface or lock-ordering changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three small operational improvements.
1. Makefile build dependencies now cover everything under cmd/ and
internal/, not just *.go. The previous GO_SOURCES find pattern
missed embedded assets (catalog.json today, anything else added
later), so editing a JSON manifest didn't trigger a rebuild and
left the binary stale. New BUILD_INPUTS covers all files; go's own
build cache absorbs any redundant invocations. GO_SOURCES is kept
for fmt/lint targets which still want only Go files.
2. New `make lint` (default + lint-go + lint-shell):
- lint-go: gofmt -l (fail if any output) and go vet ./...
- lint-shell: shellcheck --severity=error on scripts/*.sh
The shell floor is set at error-level for now; the legacy
make-rootfs-*.sh / make-*-kernel.sh / customize.sh scripts have
warning-level findings (sudo-cat redirects, heredoc quoting) that
would block landing this if we tightened immediately. Documented
as tech debt in docs/kernel-catalog.md alongside a note about
eventually replacing the per-distro bash with a uniform Go tool.
3. gofmt drift fixed in internal/daemon/imagemgr/build.go,
session/session.go, and vm_create_ops.go (trailing newline +
gofmt's preferred function-definition wrapping). Now
`make lint` passes cleanly; future drift will fail CI/local lint
instead of accumulating.
AGENTS.md gains a one-line note on make lint.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two leaves of the daemon package that carry no back-references to Daemon
move out:
- internal/daemon/opstate: generic Registry[T AsyncOp]. The AsyncOp
interface methods are capitalised (ID, IsDone, UpdatedAt, Cancel);
vmCreateOperationState and imageBuildOperationState implement it.
- internal/daemon/dmsnap: Create, Cleanup, Remove plus the Handles type
for device-mapper snapshot lifecycle. Takes an explicit Runner
interface. The daemon-package snapshot.go keeps thin forwarders and a
type alias so existing call sites and tests are untouched.
Skipped on purpose: tap_pool has too many Daemon-scoped dependencies
(config, store, closing, createTap) for a clean extraction at this
stage; nat.go is already a thin facade over internal/hostnat;
dns_routing.go tests tightly couple to package internals, so extraction
would be more churn than payoff. Each can be revisited when a
subsystem-level refactor forces the boundary.
All tests green.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Daemon no longer owns a coarse mu shared across unrelated concerns.
Each subsystem now carries its own state and lock:
- tapPool: entries, next, and mu move onto a new tapPool struct.
- sessionRegistry: sessionControllers + its mutex move off Daemon.
- opRegistry[T asyncOp]: generic registry collapses the two ad-hoc
vm-create and image-build operation maps (and their mutexes) into one
shared type; the Begin/Status/Cancel/Prune methods simplify.
- vmLockSet: the sync.Map of per-VM mutexes moves into its own type;
lockVMID forwards.
- Daemon.mu splits into imageOpsMu (image-registry mutations) and
createVMMu (CreateVM serialisation) so image ops and VM creates no
longer block each other.
Lock ordering collapses to vmLocks[id] -> {createVMMu, imageOpsMu} ->
subsystem-local leaves. doc.go and ARCHITECTURE.md updated.
No behavior change; tests green.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Stop relying on ad hoc rootfs handling by adding image promotion, managed work-seed fingerprint metadata, and lazy self-healing for older managed images after the first create.
Rebuild guest images with baked SSH access, a guest NIC bootstrap, and default opencode services, and add the staged Void kernel/initramfs/modules workflow so void-exp uses a matching Void boot stack.
Replace the opaque blocking vm.create RPC with a begin/status flow that prints live stages in the CLI while still waiting for vsock health and opencode on guest port 4096.
Validate with GOCACHE=/tmp/banger-gocache go test ./... and live void-exp create/delete smoke runs.