Phase 5 of the daemon god-struct refactor. Code motion landed in phases 1-4; this commit retells the architecture so the docs match the structure. ARCHITECTURE.md loses the "deferred v0.2 project" hedge about splitting services. The Composition section now describes the four services (HostNetwork, ImageService, WorkspaceService, VMService) that own behaviour, the consumer-defined seam pattern for cross-service calls, and the lazy-init getter pattern that keeps existing test literals compiling. doc.go inventories which methods live on which service, and the lock-ordering section gains the service prefixes (e.g. VMService.vmLocks instead of bare vmLocks) so readers don't have to guess which type owns which mutex. No code changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# `internal/daemon` architecture

This document describes the current daemon package layout: the `Daemon`
composition root, the four services it wires together, the subpackages
that own stateless helpers, and the lock ordering every caller must
respect.

## Composition

`Daemon` is a thin composition root. It holds shared infrastructure
(store, runner, logger, layout, config, listener) plus pointers to
four focused services. RPC dispatch is a pure forwarder into those
services; no lifecycle / image / workspace / networking behaviour
lives on `*Daemon` itself.

```
Daemon
├── *HostNetwork:      bridge, tap pool, NAT, DNS, firecracker process,
│                      DM snapshots, vsock readiness
├── *ImageService:     register, promote, delete, pull (bundle + OCI),
│                      kernel catalog, managed-seed refresh
├── *WorkspaceService: workspace.prepare / workspace.export, auth-key
│                      + git-identity sync onto the work disk
└── *VMService:        VM lifecycle (create/start/stop/restart/kill/
                       delete/set), stats polling, ports query,
                       handle cache, per-VM lock set, create-op
                       registry, preflight validation
```

Each service owns its own state. Cross-service calls go through narrow
consumer-defined seams:

- `WorkspaceService` does not hold a `*VMService` pointer. It takes
  function-typed deps (`vmResolver`, `aliveChecker`, `withVMLockByRef`,
  `imageResolver`, `imageWorkSeed`) so it sees exactly the operations
  it needs and nothing more. Those deps are captured as closures so
  construction-order cycles don't recur.
- `VMService` holds direct pointers to `*HostNetwork`, `*ImageService`,
  and `*WorkspaceService`. Orchestrating a VM start really does compose
  all three (bridge + tap + image resolution + work-disk sync), and
  declaring a function-typed interface for every call would balloon
  the surface for no win: services are unexported, so package-external
  code can never reach them.
- Capability hooks still take `*Daemon` as their receiver argument,
  but `VMService` calls into them through a `capabilityHooks` struct
  (a function-typed bag) populated at construction. The service has no
  `*Daemon` pointer.

Lazy-init getters (`d.hostNet()`, `d.imageSvc()`, `d.workspaceSvc()`,
`d.vmSvc()`) let existing test literals (`&Daemon{store: db, runner: r}`)
keep working: the getter constructs the service from whatever is on
the `Daemon` if nothing was pre-wired.
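
The seam and getter patterns above can be sketched in a few lines. This is a minimal illustration, not the package's code: the lowercase types, the `vm` record shape, and the `resolve` / `prepare` methods are hypothetical stand-ins, and only two of the five function-typed deps are shown. The getters here are not safe for concurrent first use; whether the real ones are is not stated in this document.

```go
package main

import "fmt"

// vm is a hypothetical stand-in for the real VM record.
type vm struct {
	ID    string
	Alive bool
}

// vmService is a stand-in for VMService.
type vmService struct {
	vms map[string]vm
}

func (s *vmService) resolve(ref string) (vm, error) {
	v, ok := s.vms[ref]
	if !ok {
		return vm{}, fmt.Errorf("vm %q not found", ref)
	}
	return v, nil
}

// workspaceService declares the seams it consumes as function-typed
// fields. It never holds a *vmService.
type workspaceService struct {
	vmResolver   func(ref string) (vm, error)
	aliveChecker func(v vm) bool
}

func (w *workspaceService) prepare(ref string) error {
	v, err := w.vmResolver(ref)
	if err != nil {
		return err
	}
	if !w.aliveChecker(v) {
		return fmt.Errorf("vm %q is not running", v.ID)
	}
	return nil
}

// daemon is the composition root. Both service pointers may be nil in
// a bare test literal; the getters fill them in on first use.
type daemon struct {
	vm *vmService
	ws *workspaceService
}

func (d *daemon) vmSvc() *vmService {
	if d.vm == nil {
		d.vm = &vmService{vms: map[string]vm{}}
	}
	return d.vm
}

func (d *daemon) workspaceSvc() *workspaceService {
	if d.ws == nil {
		d.ws = &workspaceService{
			// The closures call the getter at use time, so it does not
			// matter which service was constructed first.
			vmResolver:   func(ref string) (vm, error) { return d.vmSvc().resolve(ref) },
			aliveChecker: func(v vm) bool { return v.Alive },
		}
	}
	return d.ws
}

func main() {
	d := &daemon{} // bare literal, as a test would write it
	d.vmSvc().vms["dev"] = vm{ID: "dev", Alive: true}
	fmt.Println(d.workspaceSvc().prepare("dev"))
	fmt.Println(d.workspaceSvc().prepare("missing"))
}
```

A test can swap `vmResolver` for a fake on a single `workspaceService` value without constructing a `vmService` at all, which is the practical payoff of keeping the seam function-typed.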

## Service state

### `HostNetwork` (`host_network.go`, `nat.go`, `dns_routing.go`, `tap_pool.go`, `snapshot.go`)

- `tapPool`: TAP interface pool, owns its own lock.
- `vmDNS *vmdns.Server`: in-process DNS server for `.vm` names.
- No direct VM-state access. Where an operation needs a VM's tap name
  (e.g. `ensureNAT`), the signature takes `guestIP` + `tap` string so
  the caller (VMService) resolves them first.

### `ImageService` (`image_service.go`, `images.go`, `images_pull.go`, `image_seed.go`, `kernels.go`)

- `imageOpsMu sync.Mutex`: the publication-window lock. Held only
  across the recheck-name + atomic-rename + UpsertImage commit atom.
  Slow work (network fetch, ext4 build, SSH-key seeding) runs unlocked.
- Test seams `pullAndFlatten`, `finalizePulledRootfs`, `bundleFetch`
  are struct fields (not package globals), so tests inject per-instance
  fakes.
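
The publication-window idea can be illustrated as follows. This is a sketch under assumptions: the `images` map stands in for the store, the map write stands in for the atomic rename plus `UpsertImage`, and `pull` / `build` are hypothetical names. Cleanup of the staged artifact on a lost race is omitted.

```go
package main

import (
	"fmt"
	"sync"
)

type imageService struct {
	imageOpsMu sync.Mutex
	images     map[string]string // image name -> rootfs path (stand-in for the store)
}

// pull runs the slow build with no lock held, then takes imageOpsMu
// only for the recheck-and-publish step.
func (s *imageService) pull(name string, build func() (string, error)) error {
	staged, err := build() // slow: network fetch, ext4 build, key seeding
	if err != nil {
		return err
	}

	s.imageOpsMu.Lock()
	defer s.imageOpsMu.Unlock()
	// Recheck under the lock: another pull may have published the same
	// name while this one was building.
	if _, taken := s.images[name]; taken {
		return fmt.Errorf("image %q already exists", name)
	}
	s.images[name] = staged // stands in for atomic rename + UpsertImage
	return nil
}

func main() {
	s := &imageService{images: map[string]string{}}
	build := func() (string, error) { return "/staging/rootfs.ext4", nil }
	fmt.Println(s.pull("debian", build)) // first publisher wins
	fmt.Println(s.pull("debian", build)) // second one fails the recheck
}
```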

### `WorkspaceService` (`workspace_service.go`, `workspace.go`, `vm_authsync.go`)

- `workspaceLocks vmLockSet`: per-VM mutex scoped to
  `workspace.prepare` / `workspace.export`. These ops acquire
  `vmLocks[id]` (on VMService) only long enough to validate VM state
  and snapshot the fields they need, then release it and acquire
  `workspaceLocks[id]` for the slow guest I/O phase. That keeps
  `vm stop` / `delete` / `restart` from queueing behind a running tar
  import.
- Test seams `workspaceInspectRepo`, `workspaceImport` are per-instance
  fields.
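
The lock hand-off described above might look roughly like this. The `lockSet` type is a guess at what `vmLockSet` does (one lazily created mutex per VM ID); `vmState`, the package-level `vms` map, and `prepare` are hypothetical stand-ins for illustration only.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// lockSet hands out one mutex per VM ID, creating it on first use.
type lockSet struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func (s *lockSet) get(id string) *sync.Mutex {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.locks == nil {
		s.locks = map[string]*sync.Mutex{}
	}
	l, ok := s.locks[id]
	if !ok {
		l = &sync.Mutex{}
		s.locks[id] = l
	}
	return l
}

type vmState struct {
	Running bool
	GuestIP string
}

var (
	vmLocks        lockSet
	workspaceLocks lockSet
	vms            = map[string]vmState{"vm1": {Running: true, GuestIP: "10.0.0.2"}}
)

// prepare validates under vmLocks[id], releases it, and only then
// takes workspaceLocks[id] for the slow phase.
func prepare(id string, slowImport func(guestIP string) error) error {
	// Phase 1: short critical section; copy out what phase 2 needs.
	l := vmLocks.get(id)
	l.Lock()
	st, ok := vms[id]
	l.Unlock()
	if !ok || !st.Running {
		return errors.New("vm not running")
	}

	// Phase 2: the lifecycle lock is already released here.
	w := workspaceLocks.get(id)
	w.Lock()
	defer w.Unlock()
	return slowImport(st.GuestIP)
}

func main() {
	err := prepare("vm1", func(ip string) error {
		// While the import runs, a lifecycle op can still take vmLocks["vm1"].
		l := vmLocks.get("vm1")
		free := l.TryLock()
		if free {
			l.Unlock()
		}
		fmt.Println("importing into", ip, "lifecycle lock free:", free)
		return nil
	})
	fmt.Println(err)
}
```

The cost of the split is that VM state can change between the two phases, so the slow phase has to tolerate a VM that was stopped after validation.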

### `VMService` (`vm_service.go`, `vm_lifecycle.go`, `vm_create.go`, `vm_create_ops.go`, `vm_stats.go`, `vm_set.go`, `vm_disk.go`, `vm_handles.go`, `vm_authsync.go` (via WorkspaceService), `preflight.go`, `ports.go`, `vm.go`)

- `vmLocks vmLockSet`: per-VM `*sync.Mutex`, one per VM ID. Held for
  the **entire lifecycle op** on that VM: `start` holds it across
  preflight, bridge setup, firecracker spawn, and post-boot wiring
  (seconds to tens of seconds). Two `start`/`stop`/`delete`/`set`
  calls against the same VM therefore serialise; calls against
  different VMs run independently.
- `createVMMu sync.Mutex`: narrow **reservation** mutex. `CreateVM`
  resolves the image (possibly auto-pulling, which self-locks on
  `imageOpsMu`) and parses sizing flags outside this lock, then holds
  `createVMMu` only to re-check that the requested VM name is still
  free, allocate the next guest IP, and insert the initial "created"
  row. The subsequent boot flow runs under the per-VM lock only.
- `createOps opstate.Registry[*vmCreateOperationState]`: in-flight
  async create operations; owns its own lock.
- `handles *handleCache`: in-memory map of per-VM transient kernel/
  process handles (PID, tap device, loop devices, DM target). Each
  VM directory holds a small `handles.json` scratch file so the
  cache can be rebuilt at daemon startup.
- Test seams `guestWaitForSSH`, `guestDial` are per-instance fields.
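
As a rough illustration of the reservation mutex, the sketch below keeps everything except the name recheck, IP allocation, and row insert outside `createVMMu`. The `rows` map, the `10.0.0.x` allocation scheme, and the `reserve` / `createVM` names are assumptions made for the example, not the package's actual store or allocator.

```go
package main

import (
	"fmt"
	"sync"
)

type vmService struct {
	createVMMu sync.Mutex
	rows       map[string]string // VM name -> guest IP (stand-in for the store)
	nextHost   int               // last octet of the next free guest IP
}

// reserve is the only step of create that holds createVMMu.
func (s *vmService) reserve(name string) (string, error) {
	s.createVMMu.Lock()
	defer s.createVMMu.Unlock()
	if _, taken := s.rows[name]; taken {
		return "", fmt.Errorf("vm name %q is already in use", name)
	}
	ip := fmt.Sprintf("10.0.0.%d", s.nextHost)
	s.nextHost++
	s.rows[name] = ip // stands in for inserting the initial "created" row
	return ip, nil
}

func (s *vmService) createVM(name string, resolveImage func() error) (string, error) {
	// Image resolution may auto-pull and take its own lock; it runs
	// before createVMMu is touched.
	if err := resolveImage(); err != nil {
		return "", err
	}
	return s.reserve(name)
	// The boot flow would continue from here under the per-VM lock.
}

func main() {
	s := &vmService{rows: map[string]string{}, nextHost: 2}
	var wg sync.WaitGroup
	results := make(chan error, 4)
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, err := s.createVM("dev", func() error { return nil })
			results <- err
		}()
	}
	wg.Wait()
	close(results)

	wins := 0
	for err := range results {
		if err == nil {
			wins++
		}
	}
	fmt.Println("successful creates for the same name:", wins)
}
```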

## Subpackages

Stateless helpers with no need for a service pointer live in
subpackages. Each takes explicit dependencies (typically a
`system.Runner`-compatible interface) and holds no global state beyond
small test seams.

| Subpackage                   | Purpose                                                                |
| ---------------------------- | ---------------------------------------------------------------------- |
| `internal/daemon/opstate`    | Generic `Registry[T AsyncOp]` for async-operation bookkeeping.         |
| `internal/daemon/dmsnap`     | Device-mapper COW snapshot create/cleanup/remove.                      |
| `internal/daemon/fcproc`     | Firecracker process primitives (bridge, tap, binary, PID, kill, wait). |
| `internal/daemon/imagemgr`   | Image subsystem pure helpers: validators, staging, build script gen.   |
| `internal/daemon/workspace`  | Workspace helpers: git inspection, copy prep, guest import script.     |

All subpackages are leaves: no intra-daemon subpackage imports another.
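
For readers unfamiliar with the shape of a generic registry like the one in `opstate`, here is a minimal sketch. The method set of `AsyncOp` is not given in this document, so `OpID` and `Done` are guesses, as are the `Put` / `Get` / `Prune` methods and the `createOp` type.

```go
package main

import (
	"fmt"
	"sync"
)

// AsyncOp is a guessed constraint; the real interface may differ.
type AsyncOp interface {
	OpID() string
	Done() bool
}

// Registry tracks in-flight operations and owns its own lock.
type Registry[T AsyncOp] struct {
	mu  sync.Mutex
	ops map[string]T
}

func (r *Registry[T]) Put(op T) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.ops == nil {
		r.ops = map[string]T{}
	}
	r.ops[op.OpID()] = op
}

func (r *Registry[T]) Get(id string) (T, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	op, ok := r.ops[id]
	return op, ok
}

// Prune drops finished operations and reports how many were removed.
func (r *Registry[T]) Prune() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	n := 0
	for id, op := range r.ops {
		if op.Done() {
			delete(r.ops, id)
			n++
		}
	}
	return n
}

// createOp is a hypothetical operation type used only for the demo.
type createOp struct {
	id       string
	finished bool
}

func (o *createOp) OpID() string { return o.id }
func (o *createOp) Done() bool   { return o.finished }

func main() {
	var reg Registry[*createOp] // zero value is ready to use
	reg.Put(&createOp{id: "op-1", finished: true})
	reg.Put(&createOp{id: "op-2"})
	fmt.Println("pruned:", reg.Prune())
	_, ok := reg.Get("op-2")
	fmt.Println("op-2 still tracked:", ok)
}
```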

## Lock ordering

Acquire in this order, release in reverse. Never acquire in the
opposite direction.

```
VMService.vmLocks[id]  →  WorkspaceService.workspaceLocks[id]
                       →  {VMService.createVMMu, ImageService.imageOpsMu}
                       →  subsystem-local locks
```

`vmLocks[id]` and `workspaceLocks[id]` are NEVER held at the same
time. `workspace.prepare` acquires `vmLocks[id]` just long enough to
validate VM state, releases it, then acquires `workspaceLocks[id]`
for the guest I/O phase. Regular lifecycle ops (`start`, `stop`,
`delete`, `set`) do NOT do this split; they hold `vmLocks[id]`
across the whole flow.

Subsystem-local locks (`tapPool.mu`, the `opstate.Registry` mutex,
`handleCache.mu`) are leaves. They do not contend with each other.

Notes:

- `vmLocks[id]` is the outer lock for any operation scoped to a single
  VM. Acquired via `VMService.withVMLockByID` / `withVMLockByRef`. The
  callback runs under the lock, so treat the whole function body as a
  critical section.
- `createVMMu` is held only across the VM-name reservation + IP
  allocation + initial UpsertVM. Image resolution and the full boot
  flow happen outside it.
- `imageOpsMu` is held only across the publication atom (recheck name
  + atomic rename + UpsertImage, or the equivalent for Register /
  Promote / Delete). Network fetch, ext4 build, and file copies run
  unlocked.
- Holding a subsystem-local lock while calling into guest SSH is
  discouraged; copy needed state out under the lock and release before
  blocking I/O.
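
The outer-then-leaf discipline from these notes can be sketched as below. It is an illustration only: a single `vmLock` field stands in for `vmLocks[id]`, `withVMLock` for `withVMLockByID`, and the `handles` fields, `probe`, and `dial` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// handles mirrors a couple of the transient per-VM fields.
type handles struct {
	PID int
	Tap string
}

// handleCache is a leaf: its methods never take another lock.
type handleCache struct {
	mu sync.Mutex
	m  map[string]handles
}

func (c *handleCache) get(id string) (handles, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	h, ok := c.m[id]
	return h, ok // value copy, safe to use after the unlock
}

type vmService struct {
	vmLock  sync.Mutex // stands in for vmLocks[id] of one VM
	handles handleCache
}

// withVMLock holds the outer per-VM lock for the whole callback.
func (s *vmService) withVMLock(fn func() error) error {
	s.vmLock.Lock()
	defer s.vmLock.Unlock()
	return fn()
}

// probe takes the outer lock, then the leaf, and lets go of the leaf
// before the blocking dial.
func (s *vmService) probe(id string, dial func(tap string) error) error {
	return s.withVMLock(func() error {
		h, ok := s.handles.get(id) // leaf lock taken and released inside get
		if !ok {
			return fmt.Errorf("no handles for vm %q", id)
		}
		return dial(h.Tap) // only the outer lock is held here
	})
}

func main() {
	s := &vmService{handles: handleCache{m: map[string]handles{
		"vm1": {PID: 4242, Tap: "tap0"},
	}}}
	err := s.probe("vm1", func(tap string) error {
		leafFree := s.handles.mu.TryLock()
		if leafFree {
			s.handles.mu.Unlock()
		}
		fmt.Println("dialing via", tap, "leaf lock free:", leafFree)
		return nil
	})
	fmt.Println(err)
}
```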

## Reconcile and background work

`Daemon.reconcile(ctx)` is the orchestrator run at startup. It
rehydrates the handle cache, reaps stale VMs, and republishes DNS
records. `Daemon.backgroundLoop()` is the ticker fan-out:
`VMService.pollStats`, `VMService.stopStaleVMs`, and
`VMService.pruneVMCreateOperations` run on independent tickers.

## External API

Only `internal/cli` imports this package. The surface is:

- `daemon.Open(ctx) (*Daemon, error)`
- `(*Daemon).Serve(ctx) error`
- `(*Daemon).Close() error`
- `daemon.Doctor(...)`: host diagnostics (no receiver).

All other methods live on the four services and are reached only
through the RPC `dispatch` switch in `daemon.go`. They are free to
move/rename during refactoring.