# `internal/daemon` architecture

This document describes the current daemon package layout: the `Daemon` composition root, the four services it wires together, the subpackages that own stateless helpers, and the lock ordering every caller must respect.

## Composition

`Daemon` is a thin composition root. It holds shared infrastructure (store, runner, logger, layout, config, listener) plus pointers to four focused services. RPC dispatch is a pure forwarder into those services; no lifecycle / image / workspace / networking behaviour lives on `*Daemon` itself.

```
Daemon
├── *HostNetwork      — bridge, tap pool, NAT, DNS, firecracker process,
│                       DM snapshots, vsock readiness
├── *ImageService     — register, promote, delete, pull (bundle + OCI),
│                       kernel catalog, managed-seed refresh
├── *WorkspaceService — workspace.prepare / workspace.export, auth-key
│                       + git-identity sync onto the work disk
└── *VMService        — VM lifecycle (create/start/stop/restart/kill/
                        delete/set), stats polling, ports query, handle
                        cache, per-VM lock set, create-op registry,
                        preflight validation
```

Each service owns its own state. Cross-service calls go through narrow consumer-defined seams:

- `WorkspaceService` does not hold a `*VMService` pointer. It takes function-typed deps (`vmResolver`, `aliveChecker`, `withVMLockByRef`, `imageResolver`, `imageWorkSeed`) so it sees exactly the operations it needs and nothing more. Those deps are captured as closures so construction-order cycles don't recur.
- `VMService` holds direct pointers to `*HostNetwork`, `*ImageService`, and `*WorkspaceService`. Orchestrating a VM start really does compose all three (bridge + tap + image resolution + work-disk sync), and declaring a function-typed interface for every call would balloon the surface for no win — services are unexported, so package-external code can never reach them.
- Capability hooks do not take `*Daemon`.
Each capability is a struct with explicit service-pointer fields (`workDiskCapability{vm, ws, store, defaultImageName}`, `dnsCapability{net}`, `natCapability{vm, net, logger}`) populated at wiring time. `VMService` invokes them through a `capabilityHooks` struct (a function-typed bag) populated at construction; neither the service nor any capability has a `*Daemon` pointer.

Services and capabilities are built eagerly by `wireServices(d)`, called once from `Daemon.Open` after the composition root's infrastructure is populated, and once per test that constructs a `&Daemon{...}` literal. Tests that want to stub a particular service or the capability list assign the field before calling `wireServices` — the helper is idempotent and skips anything already set.

## Service state

### `HostNetwork` (`host_network.go`, `nat.go`, `dns_routing.go`, `tap_pool.go`, `snapshot.go`)

- `tapPool` — TAP interface pool, owns its own lock.
- `vmDNS *vmdns.Server` — in-process DNS server for `.vm` names.
- No direct VM-state access. Where an operation needs a VM's tap name (e.g. `ensureNAT`), the signature takes `guestIP` + `tap` strings so the caller (VMService) resolves them first.

### `ImageService` (`image_service.go`, `images.go`, `images_pull.go`, `image_seed.go`, `kernels.go`)

- `imageOpsMu sync.Mutex` — the publication-window lock. Held only across the recheck-name + atomic-rename + `UpsertImage` commit atom. Slow work (network fetch, ext4 build, SSH-key seeding) runs unlocked.
- Test seams `pullAndFlatten`, `finalizePulledRootfs`, `bundleFetch` are struct fields (not package globals), so tests inject per-instance fakes.

### `WorkspaceService` (`workspace_service.go`, `workspace.go`, `vm_authsync.go`)

- `workspaceLocks vmLockSet` — per-VM mutex scoped to `workspace.prepare` / `workspace.export`.
  These ops acquire `vmLocks[id]` (on VMService) only long enough to validate VM state and snapshot the fields they need, then release it and acquire `workspaceLocks[id]` for the slow guest I/O phase. That keeps `vm stop` / `delete` / `restart` from queueing behind a running tar import.
- Test seams `workspaceInspectRepo`, `workspaceImport` are per-instance fields.

### `VMService` (`vm_service.go`, `vm_lifecycle.go`, `vm_create.go`, `vm_create_ops.go`, `vm_stats.go`, `vm_set.go`, `vm_disk.go`, `vm_handles.go`, `vm_authsync.go` (via WorkspaceService), `preflight.go`, `ports.go`, `vm.go`)

- `vmLocks vmLockSet` — per-VM `*sync.Mutex`, one per VM ID. Held for the **entire lifecycle op** on that VM: `start` holds it across preflight, bridge setup, firecracker spawn, and post-boot wiring (seconds to tens of seconds). Two `start`/`stop`/`delete`/`set` calls against the same VM therefore serialise; calls against different VMs run independently.
- `createVMMu sync.Mutex` — narrow **reservation** mutex. `CreateVM` resolves the image (possibly auto-pulling, which self-locks on `imageOpsMu`) and parses sizing flags outside this lock, then holds `createVMMu` only to re-check that the requested VM name is still free, allocate the next guest IP, and insert the initial "created" row. The subsequent boot flow runs under the per-VM lock only.
- `createOps opstate.Registry[*vmCreateOperationState]` — in-flight async create operations; owns its own lock.
- `handles *handleCache` — in-memory map of per-VM transient kernel/process handles (PID, tap device, loop devices, DM target). Each VM directory holds a small `handles.json` scratch file so the cache can be rebuilt at daemon startup.
- `vsockHostDevice` — path to `/dev/vhost-vsock` that the preflight and doctor checks `RequireFile` against. Defaulted in `wireServices`; tests point it at a tempfile to make the check pass without the kernel module loaded.
Guest-SSH test seams live on `*Daemon` (`d.guestWaitForSSH`, `d.guestDial`), not VMService — workspace prepare is the only path that reaches guest SSH, and it gets there through closures WorkspaceService captured at wiring time.

## Subpackages

Stateless helpers with no need for a service pointer live in subpackages. Each takes explicit dependencies (typically a `system.Runner`-compatible interface) and holds no global state beyond small test seams.

| Subpackage                   | Purpose                                                                |
| ---------------------------- | ---------------------------------------------------------------------- |
| `internal/daemon/opstate`    | Generic `Registry[T AsyncOp]` for async-operation bookkeeping.         |
| `internal/daemon/dmsnap`     | Device-mapper COW snapshot create/cleanup/remove.                      |
| `internal/daemon/fcproc`     | Firecracker process primitives (bridge, tap, binary, PID, kill, wait). |
| `internal/daemon/imagemgr`   | Image subsystem pure helpers: validators, staging, build script gen.   |
| `internal/daemon/workspace`  | Workspace helpers: git inspection, copy prep, guest import script.     |

All subpackages are leaves — no intra-daemon subpackage imports another.

## Lock ordering

Acquire in this order, release in reverse. Never acquire in the opposite direction.

```
VMService.vmLocks[id]
  → WorkspaceService.workspaceLocks[id]
  → {VMService.createVMMu, ImageService.imageOpsMu}
  → subsystem-local locks
```

`vmLocks[id]` and `workspaceLocks[id]` are NEVER held at the same time. `workspace.prepare` acquires `vmLocks[id]` just long enough to validate VM state, releases it, then acquires `workspaceLocks[id]` for the guest I/O phase. Regular lifecycle ops (`start`, `stop`, `delete`, `set`) do NOT do this split — they hold `vmLocks[id]` across the whole flow.

Subsystem-local locks (`tapPool.mu`, `opstate.Registry` mu, `handleCache.mu`) are leaves. They do not contend with each other.

Notes:

- `vmLocks[id]` is the outer lock for any operation scoped to a single VM.
  Acquired via `VMService.withVMLockByID` / `withVMLockByRef`. The callback runs under the lock — treat the whole function body as the critical section.
- `createVMMu` is held only across the VM-name reservation + IP allocation + initial `UpsertVM`. Image resolution and the full boot flow happen outside it.
- `imageOpsMu` is held only across the publication atom (recheck name + atomic rename + `UpsertImage`, or the equivalent for Register / Promote / Delete). Network fetch, ext4 build, and file copies run unlocked.
- Holding a subsystem-local lock while calling into guest SSH is discouraged; copy the needed state out under the lock and release it before blocking I/O.

## Reconcile and background work

`Daemon.reconcile(ctx)` is the orchestrator run at startup. It rehydrates the handle cache, reaps stale VMs, and republishes DNS records.

`Daemon.backgroundLoop()` is the ticker fan-out — `VMService.pollStats`, `VMService.stopStaleVMs`, and `VMService.pruneVMCreateOperations` run on independent tickers.

## External API

Only `internal/cli` imports this package. The surface is:

- `daemon.Open(ctx) (*Daemon, error)`
- `(*Daemon).Serve(ctx) error`
- `(*Daemon).Close() error`
- `daemon.Doctor(...)` — host diagnostics (no receiver).

All other methods live on the four services and are reached only through the RPC `dispatch` switch in `daemon.go`. They are free to move/rename during refactoring.