# `internal/daemon` architecture

This document describes the current daemon package layout: the `Daemon` composition root, the subpackages that own stateless helpers and shared primitives, and the lock ordering every caller must respect.

## Composition

`Daemon` is the composition root. Subsystem state and locks live on their owning types:

- Layout, config, store, runner, logger, pid — infrastructure handles.
- `vmLocks vmLockSet` — per-VM `*sync.Mutex`, one per VM ID. Held for the **entire lifecycle op** on that VM: a `start` holds it across preflight, bridge setup, firecracker spawn, and post-boot wiring (seconds to tens of seconds). Two `start`/`stop`/`delete`/`set` calls against the same VM therefore serialise; calls against different VMs run independently. If you need a slow guest-side operation to NOT block lifecycle ops on the same VM, scope it out of the lock explicitly the way `workspace.prepare` does (see below).
- `workspaceLocks vmLockSet` — per-VM mutex scoped to `workspace.prepare` / `workspace.export`. These ops acquire `vmLocks[id]` only long enough to validate VM state + snapshot the fields they need, release it, then acquire `workspaceLocks[id]` for the slow guest I/O phase. That keeps `vm stop` / `delete` / `restart` from queueing behind a running tar import.
- `handles *handleCache` — in-memory map of per-VM transient kernel/process handles (PID, tap device, loop devices, DM target). The cache is rebuildable: each VM directory holds a small `handles.json` scratch file that the daemon reads at startup to reconstruct the cache and verify processes against `/proc` via pgrep. Nothing in the durable `vms` SQLite row describes transient kernel state. See `internal/daemon/vm_handles.go`.
- `createVMMu sync.Mutex` — narrow **reservation** mutex. `CreateVM` resolves the image (possibly auto-pulling, which self-locks on `imageOpsMu`) and parses sizing flags outside this lock, then holds `createVMMu` only to re-check that the requested VM name is still free, allocate the next guest IP, and insert the initial "created" row. The subsequent boot flow runs under the per-VM lock only. Parallel `vm create` calls therefore overlap on image resolution and boot; they contend only across the millisecond-scale name+IP claim.
- `imageOpsMu sync.Mutex` — narrow **publication** mutex. `PullImage` (both bundle and OCI paths), `RegisterImage`, `PromoteImage`, and `DeleteImage` do their slow work (network fetch, ext4 build, ownership fixup, file copy, SSH-key seeding) without this lock and acquire it only for the commit atom: recheck the name is free, atomically rename the staging dir to its final home, upsert the store row. Two pulls for different images run fully in parallel; two pulls that race to the same name are resolved at the recheck — the loser fails fast and its staging dir is cleaned up.
- `createOps opstate.Registry[*vmCreateOperationState]` — in-flight VM create operations; owns its own lock.
- `tapPool tapPool` — TAP interface pool; owns its own lock.
- `listener`, `vmDNS` — networking.
- `vmCaps` — registered VM capability hooks.
- `pullAndFlatten`, `finalizePulledRootfs`, `bundleFetch`, `requestHandler`, `guestWaitForSSH`, `guestDial`, `workspaceInspectRepo`, `workspaceImport` — injectable seams used by tests.

## Subpackages

Stateless helpers that don't need the `Daemon` composition root have been lifted into subpackages. Lifecycle orchestration, image-registry orchestration, host networking bootstrap, background reconciliation, and the JSON-RPC dispatch all still live in this package — it is not "just orchestration." ~29 files and ~130 `func (d *Daemon)` methods share the root struct today.
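The narrow reservation/publication pattern used by `createVMMu` and `imageOpsMu` above can be sketched as follows. This is a minimal illustration, not the daemon's real code: the `registry` type and its `create` method are hypothetical stand-ins; the point is that slow work runs unlocked and the mutex guards only the re-check and the claim.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// registry plays the role of the daemon's name table. Its mutex, like
// createVMMu, is held only across the claim, never across slow work.
type registry struct {
	mu    sync.Mutex
	names map[string]bool
}

func (r *registry) create(name string, slowWork func()) error {
	slowWork() // image resolution, flag parsing, fetches: outside the lock

	r.mu.Lock()
	defer r.mu.Unlock()
	if r.names[name] { // re-check: a racing create may have claimed the name
		return errors.New("name already taken: " + name)
	}
	r.names[name] = true // millisecond-scale claim
	return nil
}

func main() {
	r := &registry{names: map[string]bool{}}
	fmt.Println(r.create("vm-a", func() {})) // <nil>
	fmt.Println(r.create("vm-a", func() {})) // name already taken: vm-a
}
```

The loser of the race fails fast at the re-check, mirroring how a losing image pull cleans up its staging dir rather than blocking behind the winner's slow work.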
A future project would be to split VM lifecycle, image management, and the background reconciler into services with explicit interfaces; that's out of scope for v0.1.0.

Each subpackage takes explicit dependencies (typically a `system.Runner`-compatible interface) and holds no global state beyond small test seams.

| Subpackage                  | Purpose                                                                |
| --------------------------- | ---------------------------------------------------------------------- |
| `internal/daemon/opstate`   | Generic `Registry[T AsyncOp]` for async-operation bookkeeping.         |
| `internal/daemon/dmsnap`    | Device-mapper COW snapshot create/cleanup/remove.                      |
| `internal/daemon/fcproc`    | Firecracker process primitives (bridge, tap, binary, PID, kill, wait). |
| `internal/daemon/imagemgr`  | Image subsystem pure helpers: validators, staging, build script gen.   |
| `internal/daemon/workspace` | Workspace helpers: git inspection, copy prep, guest import script.     |

All subpackages are leaves — no intra-daemon subpackage imports another.

## Lock ordering

Acquire in this order, release in reverse. Never acquire in the opposite direction.

```
vmLocks[id] → workspaceLocks[id] → {createVMMu, imageOpsMu} → subsystem-local locks
```

`vmLocks[id]` and `workspaceLocks[id]` are NEVER held at the same time. `workspace.prepare` acquires `vmLocks[id]` just long enough to validate VM state, releases it, then acquires `workspaceLocks[id]` for the guest I/O phase. Regular lifecycle ops (`start`, `stop`, `delete`, `set`) do NOT do this split — they hold `vmLocks[id]` across the whole flow.

Subsystem-local locks (`tapPool.mu`, the `opstate.Registry` mutex) are leaves; they do not contend with each other.

Notes:

- `vmLocks[id]` is the outer lock for any operation scoped to a single VM. It is acquired via `withVMLockByID` / `withVMLockByRef`; the callback runs under the lock, so treat the whole function body as a critical section.
- `createVMMu` is held only across the VM-name reservation + IP allocation + initial UpsertVM. Image resolution and the full boot flow happen outside it.
- `imageOpsMu` is held only across the publication atom (recheck name + atomic rename + UpsertImage, or the equivalent for Register / Promote / Delete). Network fetch, ext4 build, and file copies run unlocked.
- Holding a subsystem-local lock while calling into guest SSH is discouraged; copy needed state out under the lock and release it before blocking I/O.

## External API

Only `internal/cli` imports this package. The surface is:

- `daemon.Open(ctx) (*Daemon, error)`
- `(*Daemon).Serve(ctx) error`
- `(*Daemon).Close() error`
- `daemon.Doctor(...)` — host diagnostics (no receiver).

All other `*Daemon` methods are reached only through the RPC `dispatch` switch in `daemon.go` and are free to move/rename during refactoring.
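The two-phase split between `vmLocks[id]` and `workspaceLocks[id]` described under "Lock ordering" can be sketched as follows. This is a minimal illustration under assumed names: `lockSet` and `workspacePrepare` are hypothetical stand-ins for the daemon's real types, showing only the shape of the pattern (brief validate under the lifecycle lock, release, then slow I/O under the workspace lock).

```go
package main

import (
	"fmt"
	"sync"
)

// lockSet hands out one mutex per VM ID, lazily, like vmLockSet.
type lockSet struct {
	mu sync.Mutex
	m  map[string]*sync.Mutex
}

func (s *lockSet) get(id string) *sync.Mutex {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.m == nil {
		s.m = map[string]*sync.Mutex{}
	}
	if s.m[id] == nil {
		s.m[id] = &sync.Mutex{}
	}
	return s.m[id]
}

var vmLocks, workspaceLocks lockSet

func workspacePrepare(id string, slowImport func()) {
	// Phase 1: hold the lifecycle lock only to validate VM state and
	// snapshot the fields the import needs.
	l := vmLocks.get(id)
	l.Lock()
	// ...validate state, copy fields...
	l.Unlock() // released before slow I/O, so stop/delete won't queue behind us

	// Phase 2: slow guest I/O under the workspace lock only. The two locks
	// are never held at the same time.
	w := workspaceLocks.get(id)
	w.Lock()
	defer w.Unlock()
	slowImport()
}

func main() {
	workspacePrepare("vm-1", func() { fmt.Println("importing tar...") })
	fmt.Println("done")
}
```

Because the lifecycle lock is dropped before the import starts, a concurrent `vm stop` on the same VM contends only for the brief validation window, never for the duration of the tar import.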