Today there's no way to correlate a CLI failure with a daemon log
line. operationLog records relative timing but no id, two concurrent
vm.start calls log indistinguishably, and the async
vmCreateOperationState.ID is user-facing yet never reaches the
journal. The root helper logs plain text to stderr while bangerd
logs JSON, so a merged journalctl is hard to grep across the
trust-boundary split.
Mint a per-RPC op id at dispatch entry, store it on context, and
include it as an "op_id" attr on every operationLog record. The
id is stamped onto every error response (including the early
short-circuit paths bad_version and unknown_method). rpc.Call
forwards the context op id on requests so a daemon RPC and the
helper RPCs it triggers all share one id. The helper now logs
JSON to match bangerd, adopts the inbound id, and emits a single
"helper rpc completed" / "helper rpc failed" line per call so
operators can see at a glance how long each privileged op took.
vmCreateOperationState.ID is now the same id dispatch generated
for vm.create.begin — one identifier between client status polls,
daemon logs, and helper logs.
The wire format gains two optional fields: rpc.Request.OpID and
rpc.ErrorResponse.OpID, both omitempty so older peers (and the
opposite direction) ignore them. ErrorResponse.Error() now appends
"(op-XXXXXX)" to its string form when set; existing callers that
just print err.Error() get the id for free.
Tests cover: dispatch stamps op_id on unknown_method, bad_version,
and handler-returned errors; rpc.Call exposes the typed
*ErrorResponse via errors.As so the CLI can read code/op_id; ctx
op_id is forwarded to the server in the request envelope.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move the supported systemd path to two services: an owner-user bangerd for
orchestration and a narrow root helper for bridge/tap, NAT/resolver, dm/loop,
and Firecracker ownership. This removes repeated sudo from daily vm and image
flows without leaving the general daemon running as root.
Add install metadata, system install/status/restart/uninstall commands, and a
system-owned runtime layout. Keep user SSH/config material in the owner home,
lock file_sync to the owner home, and move daemon known_hosts handling out of
the old root-owned control path.
Route privileged lifecycle steps through typed privilegedOps calls, harden the
two systemd units, and rewrite smoke plus docs around the supported service
model.
Verified with make build, make test, make lint, and make smoke on the
supported systemd host path.
VMService carried guestWaitForSSH and guestDial fields + matching
constructor args ever since the daemon split, but nothing on
VMService ever read them. The live guest-SSH path runs on *Daemon
(d.waitForGuestSSH / d.dialGuest in guest_ssh.go); WorkspaceService
reaches those through closures wired in wireServices.
So the VMService copies were refactor residue: they made the
service look more decoupled than it actually is, and any future
test that stubbed VMService.guestDial would be stubbing nothing.
Delete the fields, the deps entries, the newVMService assignments,
and the wireServices passes.
Real seams on *Daemon are unchanged — those are the ones tests
(e.g. workspace_test.go) already set directly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three test seams were still package-level mutable vars, which tests
had to swap before use. That's the classic path to flaky parallel
tests — two goroutines fighting over the same global fake. Push each
down to the struct that owns the behaviour.
internal/daemon/dns_routing.go
lookupExecutableFunc + vmDNSAddrFunc → fields on *HostNetwork,
defaulted at newHostNetwork time. dns_routing_test builds
HostNetwork{..., lookupExecutable: stub, vmDNSAddr: stub} inline,
no more t.Cleanup dance around package-level vars.
internal/daemon/preflight.go + doctor.go
vsockHostDevicePath (mutable string) → vsockHostDevice field on
*VMService, defaulted via defaultVsockHostDevice constant in
newVMService. Preflight reads s.vsockHostDevice; doctor reads
d.vm.vsockHostDevice. Logger test sets d.vm.vsockHostDevice = tmp
after wireServices.
internal/daemon/workspace/workspace.go
HostCommandOutputFunc → *Inspector struct with a Runner field.
Every git-using helper (GitOutput, GitTrimmedOutput,
GitResolvedConfigValue, RunHostCommand, ListSubmodules,
ListOverlayPaths, CountUntrackedPaths, InspectRepo,
ImportRepoToGuest, PrepareRepoCopy) is now a method on *Inspector.
NewInspector() wraps the real host runner for production;
WorkspaceService holds one via repoInspector, CLI deps holds one
too. cli_test.go's submodule-rejection test builds its own
Inspector with a scripted Runner instead of patching a global.
Pure helpers (FinalizeScript, ResolveSourcePath, ParsePrepareMode,
ShellQuote, FormatStepError, GitFileURL, ParseNullSeparatedOutput)
stay free functions since they don't touch the host.
Sentinel: grep for HostCommandOutputFunc, lookupExecutableFunc,
vmDNSAddrFunc, vsockHostDevicePath is now empty across internal/.
make lint test green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stop passing *Daemon into capability hooks. Each capability
implementation is now a struct with explicit service-pointer fields
populated at wireServices time; the six dynamic-dispatch interfaces
(AddStartPreflight, PrepareHost, PostStart, Cleanup, ApplyConfigChange,
AddDoctorChecks) no longer have a *Daemon parameter. Capability
methods reach their dependencies through struct fields, not through
d.vm / d.ws / d.net.
- workDiskCapability carries {vm, ws, store, defaultImageName}
- dnsCapability carries {net}
- natCapability carries {vm, net, logger}
Daemon.defaultCapabilities() builds the production list from the
already-constructed services and is called from wireServices so
d.vmCaps is populated eagerly. Tests that preinstall d.vmCaps with
stubs still work — wireServices only overwrites an empty slice.
registeredCapabilities() is gone (every dispatch loop now reads
d.vmCaps directly). capabilities_test.go's testCapability fake drops
*Daemon from its method set to match the new interfaces.
This finishes the daemon service split: capability implementations
no longer reach through the composition root, there's no path back
to *Daemon from any service or capability, and test construction
goes through one explicit wireServices call instead of lazy getters.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Factor the service + capability wiring out of Daemon.Open() into
wireServices(d), an idempotent helper that constructs HostNetwork,
ImageService, WorkspaceService, and VMService from whatever
infrastructure (runner, store, config, layout, logger, closing) is
already set on d. Open() calls it once after filling the composition
root; tests that build &Daemon{...} literals call it to get a working
service graph, preinstalling stubs on the fields they want to fake.
Drops the four lazy-init getters on *Daemon — d.hostNet(),
d.imageSvc(), d.workspaceSvc(), d.vmSvc() — whose sole purpose was
keeping test literals working. Every production call site now reads
d.net / d.img / d.ws / d.vm directly; the services are guaranteed
non-nil once Open returns. No behavior change.
Mechanical: all existing `d.xxxSvc()` calls (production + tests)
rewritten to field access; each `d := &Daemon{...}` in tests gets a
trailing wireServices(d) so the literal + wiring are side-by-side.
Tests that override a pre-built service (e.g. d.img = &ImageService{
bundleFetch: stub}) now set the override before wireServices so the
replacement propagates into VMService's peer pointer.
Also nil-guards HostNetwork.stopVMDNS and d.store in Close() so
partially-initialised daemons (pre-reconcile open failure) still
tear down cleanly — same contract the old lazy getters provided.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 4 of the daemon god-struct refactor. VM lifecycle, create-op
registry, handle cache, disk provisioning, stats polling, ports
query, and the per-VM lock set all move off *Daemon onto *VMService.
Daemon keeps thin forwarders only for FindVM / TouchVM (dispatch
surface) and is otherwise out of VM lifecycle. Lazy-init via
d.vmSvc() mirrors the earlier services so test literals like
\`&Daemon{store: db, runner: r}\` still get a functional service
without spelling one out.
Three small cleanups along the way:
* preflight helpers (validateStartPrereqs / addBaseStartPrereqs
/ addBaseStartCommandPrereqs / validateWorkDiskResizePrereqs)
move with the VM methods that call them.
* cleanupRuntime / rebuildDNS move to *VMService, with
HostNetwork primitives (findFirecrackerPID, cleanupDMSnapshot,
killVMProcess, releaseTap, waitForExit, sendCtrlAltDel)
reached through s.net instead of the hostNet() facade.
* vsockAgentBinary becomes a package-level function so both
*Daemon (doctor) and *VMService (preflight) call one entry
point instead of each owning a forwarder method.
WorkspaceService's peer deps switch from eager method values to
closures — vmSvc() constructs VMService with WorkspaceService as a
peer, so resolving d.vmSvc().FindVM at construction time recursed
through workspaceSvc() → vmSvc(). Closures defer the lookup to call
time.
Pure code motion: build + unit tests green, lint clean. No RPC
surface or lock-ordering changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>