Update ARCHITECTURE.md's Composition section to reflect the finished
split: capabilities carry explicit service-pointer fields, nothing
reaches *Daemon at dispatch time, and wireServices(d) is the single
entry point that builds services + capabilities eagerly (from Open
in production, from tests after constructing &Daemon{...} literals).
Removes the paragraph admitting capability→*Daemon coupling and the
lazy-init getters justification, neither of which applies anymore.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
internal/daemon architecture
This document describes the current daemon package layout: the Daemon
composition root, the four services it wires together, the subpackages
that own stateless helpers, and the lock ordering every caller must
respect.
Composition
Daemon is a thin composition root. It holds shared infrastructure
(store, runner, logger, layout, config, listener) plus pointers to
four focused services. RPC dispatch is a pure forwarder into those
services; no lifecycle / image / workspace / networking behaviour
lives on *Daemon itself.
Daemon
├── *HostNetwork — bridge, tap pool, NAT, DNS, firecracker process,
│ DM snapshots, vsock readiness
├── *ImageService — register, promote, delete, pull (bundle + OCI),
│ kernel catalog, managed-seed refresh
├── *WorkspaceService — workspace.prepare / workspace.export, auth-key
│ + git-identity sync onto the work disk
└── *VMService — VM lifecycle (create/start/stop/restart/kill/
delete/set), stats polling, ports query,
handle cache, per-VM lock set, create-op
registry, preflight validation
Each service owns its own state. Cross-service calls go through narrow consumer-defined seams:
- WorkspaceService does not hold a *VMService pointer. It takes function-typed deps (vmResolver, aliveChecker, withVMLockByRef, imageResolver, imageWorkSeed), so it sees exactly the operations it needs and nothing more. Those deps are captured as closures so construction-order cycles don't recur.
- VMService holds direct pointers to *HostNetwork, *ImageService, and *WorkspaceService. Orchestrating a VM start really does compose all three (bridge + tap + image resolution + work-disk sync). Declaring a function-typed interface for every call would balloon the surface for no win: services are unexported, so package-external code can never reach them.
- Capability hooks do not take *Daemon. Each capability is a struct with explicit service-pointer fields (workDiskCapability{vm, ws, store, defaultImageName}, dnsCapability{net}, natCapability{vm, net, logger}) populated at wiring time. VMService invokes them through a capabilityHooks struct (a function-typed bag) populated at construction. Neither the service nor any capability has a *Daemon pointer.
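The following is a minimal sketch of the capability shape, built on heavily simplified stand-ins. Only the dnsCapability{net} field list comes from this document. The onStart hook name, the published slice, and runStartHooks are hypothetical. The sketch shows VMService reaching a capability only through a func value stored in capabilityHooks, with neither side holding a *Daemon.

```go
package main

// HostNetwork is a stand-in; only the field the DNS hook touches is shown.
type HostNetwork struct {
	published []string // hypothetical record of published DNS names
}

// dnsCapability mirrors dnsCapability{net}: one explicit service pointer, no *Daemon.
type dnsCapability struct {
	net *HostNetwork
}

// onStart is a hypothetical hook method name.
func (c dnsCapability) onStart(vmName string) error {
	c.net.published = append(c.net.published, vmName+".vm")
	return nil
}

// capabilityHooks is the function-typed bag VMService sees. It holds func
// values, never the capability structs or the Daemon.
type capabilityHooks struct {
	onStart []func(vmName string) error
}

type VMService struct {
	hooks capabilityHooks
}

// runStartHooks invokes each registered hook in order and stops at the first error.
func (s *VMService) runStartHooks(vmName string) error {
	for _, hook := range s.hooks.onStart {
		if err := hook(vmName); err != nil {
			return err
		}
	}
	return nil
}
```

At wiring time, the method value dns.onStart is what gets stored in the bag. VMService therefore never learns which struct, or which services, sit behind it.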
Services + capabilities are built eagerly by wireServices(d), called
once from Daemon.Open after the composition root's infrastructure is
populated, and once per test that constructs a &Daemon{...} literal.
Tests that want to stub a particular service or the capability list
assign the field before calling wireServices — the helper is
idempotent and skips anything already set.
Service state
HostNetwork (host_network.go, nat.go, dns_routing.go, tap_pool.go, snapshot.go)
- tapPool: TAP interface pool, owns its own lock.
- vmDNS *vmdns.Server: in-process DNS server for .vm names.
- No direct VM-state access. Where an operation needs a VM's tap name (e.g. ensureNAT), the signature takes guestIP and the tap name, so the caller (VMService) resolves them first.
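The sketch below illustrates that resolve-then-call split using hypothetical types. The vmRecord struct, the enableNAT method, and the natRules slice are illustrative stand-ins. A real ensureNAT would program the host firewall rather than append a string. Because HostNetwork sees only plain values, it needs no VMService pointer.

```go
package main

import (
	"errors"
	"fmt"
)

// HostNetwork never looks up VMs; callers hand it plain values.
type HostNetwork struct {
	natRules []string // stand-in for real firewall state
}

func (n *HostNetwork) ensureNAT(guestIP, tap string) error {
	if guestIP == "" || tap == "" {
		return errors.New("ensureNAT: guestIP and tap are required")
	}
	n.natRules = append(n.natRules, fmt.Sprintf("%s via %s", guestIP, tap))
	return nil
}

type vmRecord struct{ GuestIP, Tap string }

// VMService owns VM state, so it resolves guestIP and tap before calling in.
type VMService struct {
	net *HostNetwork
	vms map[string]vmRecord
}

func (s *VMService) enableNAT(id string) error {
	rec, ok := s.vms[id]
	if !ok {
		return fmt.Errorf("unknown VM %q", id)
	}
	return s.net.ensureNAT(rec.GuestIP, rec.Tap)
}
```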
ImageService (image_service.go, images.go, images_pull.go, image_seed.go, kernels.go)
- imageOpsMu sync.Mutex: the publication-window lock. Held only across the recheck-name + atomic-rename + UpsertImage commit atom. Slow work (network fetch, ext4 build, SSH-key seeding) runs unlocked.
- Test seams: pullAndFlatten, finalizePulledRootfs, and bundleFetch are struct fields (not package globals), so tests inject per-instance fakes.
WorkspaceService (workspace_service.go, workspace.go, vm_authsync.go)
- workspaceLocks vmLockSet: per-VM mutex scoped to workspace.prepare / workspace.export. These ops acquire vmLocks[id] (on VMService) only long enough to validate VM state and snapshot the fields they need. They then release it and acquire workspaceLocks[id] for the slow guest I/O phase. That keeps vm stop/delete/restart from queueing behind a running tar import.
- Test seams: workspaceInspectRepo and workspaceImport are per-instance fields.
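The validate-then-hand-off sequence can be sketched as follows. The sketch uses a minimal lockSet in place of vmLockSet and takes caller-supplied lookup and guestIO functions; all of these names are hypothetical. vmLocks[id] is released before workspaceLocks[id] is taken, so a lifecycle op on the same VM can proceed while the guest I/O runs.

```go
package main

import (
	"errors"
	"sync"
)

var errNotRunning = errors.New("vm is not running") // hypothetical

// lockSet is a minimal stand-in for vmLockSet: one mutex per VM ID, created lazily.
type lockSet struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func (s *lockSet) get(id string) *sync.Mutex {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.locks == nil {
		s.locks = make(map[string]*sync.Mutex)
	}
	m, ok := s.locks[id]
	if !ok {
		m = &sync.Mutex{}
		s.locks[id] = m
	}
	return m
}

// vmSnapshot holds the fields copied out while vmLocks[id] is held.
type vmSnapshot struct {
	guestIP string
	running bool
}

// prepareWorkspace validates under vmLocks[id], releases it, and only then
// takes workspaceLocks[id] for the slow guest I/O; the two are never held together.
func prepareWorkspace(vmLocks, workspaceLocks *lockSet, id string,
	lookup func(id string) vmSnapshot, guestIO func(guestIP string) error) error {
	vmMu := vmLocks.get(id)
	vmMu.Lock()
	snap := lookup(id)
	vmMu.Unlock()
	if !snap.running {
		return errNotRunning
	}

	wsMu := workspaceLocks.get(id)
	wsMu.Lock()
	defer wsMu.Unlock()
	return guestIO(snap.guestIP)
}
```

One consequence of this shape is a window between the two locks. The guest I/O phase has to tolerate the VM changing state after validation, which is what it costs to avoid blocking lifecycle ops.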
VMService (vm_service.go, vm_lifecycle.go, vm_create.go, vm_create_ops.go, vm_stats.go, vm_set.go, vm_disk.go, vm_handles.go, vm_authsync.go (via WorkspaceService), preflight.go, ports.go, vm.go)
- vmLocks vmLockSet: per-VM *sync.Mutex, one per VM ID. Held for the entire lifecycle op on that VM. For example, start holds it across preflight, bridge setup, firecracker spawn, and post-boot wiring (seconds to tens of seconds). Two start/stop/delete/set calls against the same VM therefore serialise; calls against different VMs run independently.
- createVMMu sync.Mutex: narrow reservation mutex. CreateVM resolves the image (possibly auto-pulling, which self-locks on imageOpsMu) and parses sizing flags outside this lock. It then holds createVMMu only to re-check that the requested VM name is still free, allocate the next guest IP, and insert the initial "created" row. The subsequent boot flow runs under the per-VM lock only.
- createOps opstate.Registry[*vmCreateOperationState]: in-flight async create operations; owns its own lock.
- handles *handleCache: in-memory map of per-VM transient kernel/process handles (PID, tap device, loop devices, DM target). Each VM directory holds a small handles.json scratch file so the cache can be rebuilt at daemon startup.
- Test seams: guestWaitForSSH and guestDial are per-instance fields.
Subpackages
Stateless helpers with no need for a service pointer live in
subpackages. Each takes explicit dependencies (typically a
system.Runner-compatible interface) and holds no global state beyond
small test seams.
| Subpackage | Purpose |
|---|---|
| internal/daemon/opstate | Generic Registry[T AsyncOp] for async-operation bookkeeping. |
| internal/daemon/dmsnap | Device-mapper COW snapshot create/cleanup/remove. |
| internal/daemon/fcproc | Firecracker process primitives (bridge, tap, binary, PID, kill, wait). |
| internal/daemon/imagemgr | Image subsystem pure helpers: validators, staging, build script gen. |
| internal/daemon/workspace | Workspace helpers: git inspection, copy prep, guest import script. |
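As an illustration, a generic registry in the style of opstate.Registry[T AsyncOp] might look like the sketch below. The AsyncOp method set (ID, Done), the Put/Get/Prune methods, and createOp are assumptions for illustration, not the package's actual API.

```go
package main

import "sync"

// AsyncOp is a hypothetical constraint; the real interface's methods may differ.
type AsyncOp interface {
	ID() string
	Done() bool
}

// Registry is a mutex-guarded map of in-flight operations keyed by ID.
// It owns its own lock, so callers never hold a service lock while using it.
type Registry[T AsyncOp] struct {
	mu  sync.Mutex
	ops map[string]T
}

func (r *Registry[T]) Put(op T) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.ops == nil {
		r.ops = make(map[string]T)
	}
	r.ops[op.ID()] = op
}

func (r *Registry[T]) Get(id string) (T, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	op, ok := r.ops[id]
	return op, ok
}

// Prune drops finished operations and reports how many were removed.
func (r *Registry[T]) Prune() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	n := 0
	for id, op := range r.ops {
		if op.Done() {
			delete(r.ops, id) // deleting during range is safe in Go
			n++
		}
	}
	return n
}

// createOp is a hypothetical stand-in for vmCreateOperationState.
type createOp struct {
	id   string
	done bool
}

func (o *createOp) ID() string { return o.id }
func (o *createOp) Done() bool { return o.done }
```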
All subpackages are leaves — no intra-daemon subpackage imports another.
Lock ordering
Acquire in this order, release in reverse. Never acquire in the opposite direction.
VMService.vmLocks[id] → WorkspaceService.workspaceLocks[id]
→ {VMService.createVMMu, ImageService.imageOpsMu}
→ subsystem-local locks
vmLocks[id] and workspaceLocks[id] are NEVER held at the same
time. workspace.prepare acquires vmLocks[id] just long enough to
validate VM state, releases it, then acquires workspaceLocks[id]
for the guest I/O phase. Regular lifecycle ops (start, stop,
delete, set) do NOT do this split — they hold vmLocks[id]
across the whole flow.
Subsystem-local locks (tapPool.mu, opstate.Registry mu,
handleCache.mu) are leaves. They do not contend with each other.
Notes:
- vmLocks[id] is the outer lock for any operation scoped to a single VM. It is acquired via VMService.withVMLockByID / withVMLockByRef. The callback runs under the lock, so treat the whole function body as a critical section.
- createVMMu is held only across the VM-name reservation + IP allocation + initial UpsertVM. Image resolution and the full boot flow happen outside it.
- imageOpsMu is held only across the publication atom (recheck name + atomic rename + UpsertImage, or the equivalent for Register / Promote / Delete). Network fetch, ext4 build, and file copies run unlocked.
- Holding a subsystem-local lock while calling into guest SSH is discouraged; copy needed state out under the lock and release before blocking I/O.
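The copy-out rule for a leaf lock is sketched below with a hypothetical handleCache holding a two-field handles struct. The entry is copied under handleCache.mu, the lock is released, and only then does the blocking call run. Because handles holds plain values, the copy cannot alias the cached entry.

```go
package main

import (
	"errors"
	"sync"
)

var errNoHandles = errors.New("no handles cached for VM") // hypothetical

// handles is a hypothetical subset of the per-VM transient state.
type handles struct {
	PID int
	Tap string
}

type handleCache struct {
	mu sync.Mutex
	m  map[string]handles
}

// snapshot copies one VM's entry out under the leaf lock.
func (c *handleCache) snapshot(id string) (handles, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	h, ok := c.m[id]
	return h, ok
}

// withHandles runs blocking work (e.g. a guest SSH call) on the copy,
// after c.mu has already been released.
func withHandles(c *handleCache, id string, blockingIO func(handles) error) error {
	h, ok := c.snapshot(id)
	if !ok {
		return errNoHandles
	}
	return blockingIO(h)
}
```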
Reconcile and background work
Daemon.reconcile(ctx) is the orchestrator run at startup. It
rehydrates the handle cache, reaps stale VMs, and republishes DNS
records. Daemon.backgroundLoop() is the ticker fan-out —
VMService.pollStats, VMService.stopStaleVMs, and
VMService.pruneVMCreateOperations run on independent tickers.
External API
Only internal/cli imports this package. The surface is:
- daemon.Open(ctx) (*Daemon, error)
- (*Daemon).Serve(ctx) error
- (*Daemon).Close() error
- daemon.Doctor(...): host diagnostics (no receiver).
All other methods live on the four services and are reached only
through the RPC dispatch switch in daemon.go. They are free to
move/rename during refactoring.