diff --git a/internal/daemon/ARCHITECTURE.md b/internal/daemon/ARCHITECTURE.md index d1943aa..6e13796 100644 --- a/internal/daemon/ARCHITECTURE.md +++ b/internal/daemon/ARCHITECTURE.md @@ -10,14 +10,20 @@ primitives, and the lock ordering every caller must respect. owning types: - Layout, config, store, runner, logger, pid — infrastructure handles. -- `vmLocks vmLockSet` — per-VM `*sync.Mutex`, one per VM ID. Held only - across short, synchronous state validation and DB mutations so slow - guest I/O does not block lifecycle ops on the same VM. +- `vmLocks vmLockSet` — per-VM `*sync.Mutex`, one per VM ID. Held for + the **entire lifecycle op** on that VM: a `start` holds it across + preflight, bridge setup, firecracker spawn, and post-boot wiring + (seconds to tens of seconds). Two `start`/`stop`/`delete`/`set` calls + against the same VM therefore serialise; calls against different VMs + run independently. If you need a slow guest-side operation to NOT + block lifecycle ops on the same VM, scope it out of the lock + explicitly the way `workspace.prepare` does (see below). - `workspaceLocks vmLockSet` — per-VM mutex scoped to - `workspace.prepare` / `workspace.export`. Serialises concurrent - workspace operations on a single VM (two simultaneous tar imports - would clobber each other) without touching `vmLocks`, so - `vm stop` / `delete` / `restart` never queue behind a slow import. + `workspace.prepare` / `workspace.export`. These ops acquire + `vmLocks[id]` only long enough to validate VM state + snapshot the + fields they need, release it, then acquire `workspaceLocks[id]` for + the slow guest I/O phase. That keeps `vm stop` / `delete` / `restart` + from queueing behind a running tar import. - `handles *handleCache` — in-memory map of per-VM transient kernel/ process handles (PID, tap device, loop devices, DM target). The cache is rebuildable: each VM directory holds a small @@ -40,10 +46,18 @@ owning types: ## Subpackages -Pure helpers have moved into subpackages so the daemon package itself stays -focused on orchestration. Each subpackage takes explicit dependencies -(typically a `system.Runner`-compatible interface) and holds no global -state beyond small test seams. +Stateless helpers that don't need the `Daemon` composition root have +been lifted into subpackages. Lifecycle orchestration, image-registry +orchestration, host networking bootstrap, background reconciliation, +and the JSON-RPC dispatch all still live in this package — it is not +"just orchestration." ~29 files and ~130 `func (d *Daemon)` methods +share the root struct today. A future project would be to split VM +lifecycle, image management, and the background reconciler into +services with explicit interfaces; that's out of scope for v0.1.0. + +Each subpackage takes explicit dependencies (typically a +`system.Runner`-compatible interface) and holds no global state beyond +small test seams. | Subpackage | Purpose | | --------------------------------- | ---------------------------------------------------------------------- | @@ -67,7 +81,9 @@ vmLocks[id] → workspaceLocks[id] → {createVMMu, imageOpsMu} → subsys `vmLocks[id]` and `workspaceLocks[id]` are NEVER held at the same time. `workspace.prepare` acquires `vmLocks[id]` just long enough to validate VM state, releases it, then acquires `workspaceLocks[id]` -for the guest I/O phase. +for the guest I/O phase. Regular lifecycle ops (`start`, `stop`, +`delete`, `set`) do NOT do this split — they hold `vmLocks[id]` +across the whole flow. Subsystem-local locks (`tapPool.mu`, `opstate.Registry` mu) are leaves. They do not contend with each other. @@ -75,7 +91,8 @@ They do not contend with each other. Notes: - `vmLocks[id]` is the outer lock for any operation scoped to a single VM. - Acquired via `withVMLockByID` / `withVMLockByRef`. + Acquired via `withVMLockByID` / `withVMLockByRef`. The callback runs + under the lock — treat the whole function body as critical section. - `createVMMu` and `imageOpsMu` are narrow: each guards one family of mutations and is released before any blocking guest I/O. - Holding a subsystem-local lock while calling into guest SSH is diff --git a/internal/daemon/doc.go b/internal/daemon/doc.go index f83aeab..2c12cd1 100644 --- a/internal/daemon/doc.go +++ b/internal/daemon/doc.go @@ -67,6 +67,10 @@ // // vmLocks[id] → workspaceLocks[id] → {createVMMu, imageOpsMu} → subsystem-local locks // -// Subsystem-local locks (tapPool.mu, opstate.Registry mu) are leaves and -// do not contend with each other. See ARCHITECTURE.md for details. +// vmLocks[id] is held across entire lifecycle ops (start/stop/delete/set), +// not just a validation window — callers that want to avoid blocking +// lifecycle on slow guest I/O must explicitly split off to +// workspaceLocks[id] the way workspace.prepare does. Subsystem-local +// locks (tapPool.mu, opstate.Registry mu) are leaves and do not contend +// with each other. See ARCHITECTURE.md for details. package daemon