daemon: correct ARCHITECTURE doc to match actual package shape + lock scope
Two promises the doc was making that the code doesn't keep: 1. "Helpers moved out so the package stays focused on orchestration." The package still has ~29 files and ~130 func (d *Daemon) methods wiring VM lifecycle, image management, host networking, background reconciliation, and JSON-RPC dispatch. Calling it "just orchestration" sets readers up for surprise. Rewrite the subpackages preamble to say so, and flag the service split as a post-v0.1.0 project. 2. "vmLocks[id] is held only across short synchronous state validation and DB mutations." That's what workspace.prepare does; regular lifecycle ops (start/stop/delete/set) go through withVMLockByRef and hold the lock across the whole callback body, which for `start` means preflight + bridge + firecracker spawn + post-boot wiring. Rewrite the vmLocks bullet and the lock-ordering section to say that explicitly, so readers don't build "surely my long flow under the lock can't be what the doc means" reasoning on top of a false premise. Doc-only change. Code behaviour is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
34dd7644d8
commit
e69810610a
2 changed files with 36 additions and 15 deletions
|
|
@ -10,14 +10,20 @@ primitives, and the lock ordering every caller must respect.
|
|||
owning types:
|
||||
|
||||
- Layout, config, store, runner, logger, pid — infrastructure handles.
|
||||
- `vmLocks vmLockSet` — per-VM `*sync.Mutex`, one per VM ID. Held only
|
||||
across short, synchronous state validation and DB mutations so slow
|
||||
guest I/O does not block lifecycle ops on the same VM.
|
||||
- `vmLocks vmLockSet` — per-VM `*sync.Mutex`, one per VM ID. Held for
|
||||
the **entire lifecycle op** on that VM: a `start` holds it across
|
||||
preflight, bridge setup, firecracker spawn, and post-boot wiring
|
||||
(seconds to tens of seconds). Two `start`/`stop`/`delete`/`set` calls
|
||||
against the same VM therefore serialise; calls against different VMs
|
||||
run independently. If you need a slow guest-side operation to NOT
|
||||
block lifecycle ops on the same VM, scope it out of the lock
|
||||
explicitly the way `workspace.prepare` does (see below).
|
||||
- `workspaceLocks vmLockSet` — per-VM mutex scoped to
|
||||
`workspace.prepare` / `workspace.export`. Serialises concurrent
|
||||
workspace operations on a single VM (two simultaneous tar imports
|
||||
would clobber each other) without touching `vmLocks`, so
|
||||
`vm stop` / `delete` / `restart` never queue behind a slow import.
|
||||
`workspace.prepare` / `workspace.export`. These ops acquire
|
||||
`vmLocks[id]` only long enough to validate VM state + snapshot the
|
||||
fields they need, release it, then acquire `workspaceLocks[id]` for
|
||||
the slow guest I/O phase. That keeps `vm stop` / `delete` / `restart`
|
||||
from queueing behind a running tar import.
|
||||
- `handles *handleCache` — in-memory map of per-VM transient kernel/
|
||||
process handles (PID, tap device, loop devices, DM target). The
|
||||
cache is rebuildable: each VM directory holds a small
|
||||
|
|
@ -40,10 +46,18 @@ owning types:
|
|||
|
||||
## Subpackages
|
||||
|
||||
Pure helpers have moved into subpackages so the daemon package itself stays
|
||||
focused on orchestration. Each subpackage takes explicit dependencies
|
||||
(typically a `system.Runner`-compatible interface) and holds no global
|
||||
state beyond small test seams.
|
||||
Stateless helpers that don't need the `Daemon` composition root have
|
||||
been lifted into subpackages. Lifecycle orchestration, image-registry
|
||||
orchestration, host networking bootstrap, background reconciliation,
|
||||
and the JSON-RPC dispatch all still live in this package — it is not
|
||||
"just orchestration." ~29 files and ~130 `func (d *Daemon)` methods
|
||||
share the root struct today. A future project would be to split VM
|
||||
lifecycle, image management, and the background reconciler into
|
||||
services with explicit interfaces; that's out of scope for v0.1.0.
|
||||
|
||||
Each subpackage takes explicit dependencies (typically a
|
||||
`system.Runner`-compatible interface) and holds no global state beyond
|
||||
small test seams.
|
||||
|
||||
| Subpackage | Purpose |
|
||||
| --------------------------------- | ---------------------------------------------------------------------- |
|
||||
|
|
@ -67,7 +81,9 @@ vmLocks[id] → workspaceLocks[id] → {createVMMu, imageOpsMu} → subsys
|
|||
`vmLocks[id]` and `workspaceLocks[id]` are NEVER held at the same
|
||||
time. `workspace.prepare` acquires `vmLocks[id]` just long enough to
|
||||
validate VM state, releases it, then acquires `workspaceLocks[id]`
|
||||
for the guest I/O phase.
|
||||
for the guest I/O phase. Regular lifecycle ops (`start`, `stop`,
|
||||
`delete`, `set`) do NOT do this split — they hold `vmLocks[id]`
|
||||
across the whole flow.
|
||||
|
||||
Subsystem-local locks (`tapPool.mu`, `opstate.Registry` mu) are leaves.
|
||||
They do not contend with each other.
|
||||
|
|
@ -75,7 +91,8 @@ They do not contend with each other.
|
|||
Notes:
|
||||
|
||||
- `vmLocks[id]` is the outer lock for any operation scoped to a single VM.
|
||||
Acquired via `withVMLockByID` / `withVMLockByRef`.
|
||||
Acquired via `withVMLockByID` / `withVMLockByRef`. The callback runs
|
||||
under the lock — treat the whole function body as critical section.
|
||||
- `createVMMu` and `imageOpsMu` are narrow: each guards one family of
|
||||
mutations and is released before any blocking guest I/O.
|
||||
- Holding a subsystem-local lock while calling into guest SSH is
|
||||
|
|
|
|||
|
|
@ -67,6 +67,10 @@
|
|||
//
|
||||
// vmLocks[id] → workspaceLocks[id] → {createVMMu, imageOpsMu} → subsystem-local locks
|
||||
//
|
||||
// Subsystem-local locks (tapPool.mu, opstate.Registry mu) are leaves and
|
||||
// do not contend with each other. See ARCHITECTURE.md for details.
|
||||
// vmLocks[id] is held across entire lifecycle ops (start/stop/delete/set),
|
||||
// not just a validation window — callers that want to avoid blocking
|
||||
// lifecycle on slow guest I/O must explicitly split off to
|
||||
// workspaceLocks[id] the way workspace.prepare does. Subsystem-local
|
||||
// locks (tapPool.mu, opstate.Registry mu) are leaves and do not contend
|
||||
// with each other. See ARCHITECTURE.md for details.
|
||||
package daemon
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue