Update ARCHITECTURE.md's Composition section to reflect the finished
split: capabilities carry explicit service-pointer fields, nothing
reaches *Daemon at dispatch time, and wireServices(d) is the single
entry point that builds services + capabilities eagerly (from Open
in production, from tests after constructing &Daemon{...} literals).
Removes the paragraph admitting capability→*Daemon coupling and the
lazy-init getters justification, neither of which applies anymore.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
186 lines
8.9 KiB
Markdown
# `internal/daemon` architecture

This document describes the current daemon package layout: the `Daemon`
composition root, the four services it wires together, the subpackages
that own stateless helpers, and the lock ordering every caller must
respect.

## Composition

`Daemon` is a thin composition root. It holds shared infrastructure
(store, runner, logger, layout, config, listener) plus pointers to
four focused services. RPC dispatch is a pure forwarder into those
services; no lifecycle / image / workspace / networking behaviour
lives on `*Daemon` itself.

```
Daemon
├── *HostNetwork      — bridge, tap pool, NAT, DNS, firecracker process,
│                       DM snapshots, vsock readiness
├── *ImageService     — register, promote, delete, pull (bundle + OCI),
│                       kernel catalog, managed-seed refresh
├── *WorkspaceService — workspace.prepare / workspace.export, auth-key
│                       + git-identity sync onto the work disk
└── *VMService        — VM lifecycle (create/start/stop/restart/kill/
                        delete/set), stats polling, ports query,
                        handle cache, per-VM lock set, create-op
                        registry, preflight validation
```

Each service owns its own state. Cross-service calls go through narrow
consumer-defined seams:

- `WorkspaceService` does not hold a `*VMService` pointer. It takes
  function-typed deps (`vmResolver`, `aliveChecker`, `withVMLockByRef`,
  `imageResolver`, `imageWorkSeed`) so it sees exactly the operations
  it needs and nothing more. Those deps are captured as closures so
  construction-order cycles don't recur.
- `VMService` holds direct pointers to `*HostNetwork`, `*ImageService`,
  and `*WorkspaceService`. Orchestrating a VM start really does compose
  all three (bridge + tap + image resolution + work-disk sync), and
  declaring a function-typed interface for every call would balloon
  the surface for no win — services are unexported, so package-external
  code can never reach them.
- Capability hooks do not take `*Daemon`. Each capability is a struct
  with explicit service-pointer fields (`workDiskCapability{vm, ws,
  store, defaultImageName}`, `dnsCapability{net}`, `natCapability{vm,
  net, logger}`) populated at wiring time. `VMService` invokes them
  through a `capabilityHooks` struct (function-typed bag) populated at
  construction; neither the service nor any capability has a `*Daemon`
  pointer.

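
The function-typed seam described for `WorkspaceService` can be sketched as follows. This is a minimal illustration, not the package's actual code: the `VM` record, the `Prepare` method, the `resolve` helper, and the `newWorkspaceService` constructor are hypothetical, and only the field names `vmResolver` and `aliveChecker` come from the text above.

```go
package main

import (
	"errors"
	"fmt"
)

// VM is a hypothetical stand-in for the daemon's VM record.
type VM struct {
	ID    string
	Alive bool
}

// VMService is a toy owner of VM state (assumed shape, for illustration only).
type VMService struct {
	vms map[string]VM
}

func (s *VMService) resolve(ref string) (VM, error) {
	vm, ok := s.vms[ref]
	if !ok {
		return VM{}, errors.New("vm not found: " + ref)
	}
	return vm, nil
}

// WorkspaceService declares only the operations it consumes, as
// function-typed fields. It never stores a *VMService.
type WorkspaceService struct {
	vmResolver   func(ref string) (VM, error)
	aliveChecker func(vm VM) bool
}

// Prepare resolves the VM and refuses to run against a stopped one.
func (w *WorkspaceService) Prepare(ref string) error {
	vm, err := w.vmResolver(ref)
	if err != nil {
		return err
	}
	if !w.aliveChecker(vm) {
		return fmt.Errorf("vm %s is not running", vm.ID)
	}
	return nil
}

// newWorkspaceService captures the VM service inside closures, so the
// workspace side sees two functions and nothing else.
func newWorkspaceService(vm *VMService) *WorkspaceService {
	return &WorkspaceService{
		vmResolver:   func(ref string) (VM, error) { return vm.resolve(ref) },
		aliveChecker: func(v VM) bool { return v.Alive },
	}
}

func main() {
	vms := &VMService{vms: map[string]VM{
		"dev":  {ID: "dev", Alive: true},
		"idle": {ID: "idle"},
	}}
	ws := newWorkspaceService(vms)
	fmt.Println(ws.Prepare("dev"))
	fmt.Println(ws.Prepare("idle"))
	fmt.Println(ws.Prepare("missing"))
}
```

A test can build a `WorkspaceService` literal with fake closures and never construct a `VMService` at all, which is the main payoff of the narrow seam.
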
Services + capabilities are built eagerly by `wireServices(d)`, called
once from `Daemon.Open` after the composition root's infrastructure is
populated, and once per test that constructs a `&Daemon{...}` literal.
Tests that want to stub a particular service or the capability list
assign the field before calling `wireServices` — the helper is
idempotent and skips anything already set.

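
A rough sketch of what eager, idempotent wiring with explicit capability fields might look like. The struct shapes are heavily simplified assumptions: the `postStart` hook name, the `published` slice, the `Start` method, and the `net`/`vm` field names on `Daemon` are invented for illustration; only `wireServices`, `capabilityHooks`, and `dnsCapability{net}` are named in the text.

```go
package main

import "fmt"

// HostNetwork is reduced to a record of published DNS names (assumption).
type HostNetwork struct {
	published []string
}

// dnsCapability carries an explicit service pointer instead of *Daemon.
type dnsCapability struct {
	net *HostNetwork
}

// postStart is a hypothetical hook that publishes the VM's .vm name.
func (c dnsCapability) postStart(vmID string) error {
	c.net.published = append(c.net.published, vmID+".vm")
	return nil
}

// capabilityHooks is a function-typed bag; the hook name is hypothetical.
type capabilityHooks struct {
	postStart []func(vmID string) error
}

type VMService struct {
	net   *HostNetwork
	hooks capabilityHooks
}

// Start runs every registered hook without knowing which capability owns it.
func (s *VMService) Start(vmID string) error {
	for _, hook := range s.hooks.postStart {
		if err := hook(vmID); err != nil {
			return err
		}
	}
	return nil
}

type Daemon struct {
	net *HostNetwork
	vm  *VMService
}

// wireServices builds anything that is still nil and leaves preset
// fields alone, so calling it twice is harmless.
func wireServices(d *Daemon) {
	if d.net == nil {
		d.net = &HostNetwork{}
	}
	if d.vm == nil {
		dns := dnsCapability{net: d.net}
		d.vm = &VMService{
			net:   d.net,
			hooks: capabilityHooks{postStart: []func(string) error{dns.postStart}},
		}
	}
}

func main() {
	stub := &HostNetwork{} // a test would assign its stub before wiring
	d := &Daemon{net: stub}
	wireServices(d)
	wireServices(d) // second call changes nothing
	if err := d.vm.Start("dev"); err != nil {
		fmt.Println("start failed:", err)
		return
	}
	fmt.Println(d.net == stub, stub.published)
}
```

Because the capability closes over `d.net` at wiring time, a stub assigned beforehand is what the hook ends up talking to.
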
## Service state

### `HostNetwork` (`host_network.go`, `nat.go`, `dns_routing.go`, `tap_pool.go`, `snapshot.go`)

- `tapPool` — TAP interface pool, owns its own lock.
- `vmDNS *vmdns.Server` — in-process DNS server for `.vm` names.
- No direct VM-state access. Where an operation needs a VM's tap name
  (e.g. `ensureNAT`), the signature takes `guestIP` + `tap` string so
  the caller (VMService) resolves them first.

### `ImageService` (`image_service.go`, `images.go`, `images_pull.go`, `image_seed.go`, `kernels.go`)

- `imageOpsMu sync.Mutex` — the publication-window lock. Held only
  across the recheck-name + atomic-rename + UpsertImage commit atom.
  Slow work (network fetch, ext4 build, SSH-key seeding) runs unlocked.
- Test seams `pullAndFlatten`, `finalizePulledRootfs`, `bundleFetch`
  are struct fields (not package globals), so tests inject per-instance
  fakes.

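
The publication-window pattern can be illustrated with a small sketch. It assumes a map standing in for the image store and a `build` callback standing in for the slow fetch/build phase; `Pull`'s signature and the `racePulls` helper are hypothetical, and the real commit atom also performs an atomic rename that is omitted here.

```go
package main

import (
	"fmt"
	"sync"
)

type ImageService struct {
	imageOpsMu sync.Mutex
	images     map[string]string // image name -> published path (stand-in for the store)
}

// Pull runs the slow build unlocked, then takes imageOpsMu only for the
// recheck-name + publish step.
func (s *ImageService) Pull(name string, build func() (string, error)) error {
	staged, err := build() // slow work: no lock held
	if err != nil {
		return err
	}

	s.imageOpsMu.Lock()
	defer s.imageOpsMu.Unlock()
	if _, taken := s.images[name]; taken {
		// Someone else published this name while we were building.
		return fmt.Errorf("image %q already exists", name)
	}
	s.images[name] = staged // the real code would rename + UpsertImage here
	return nil
}

// racePulls starts n concurrent pulls of the same name and reports how
// many of them managed to publish.
func racePulls(n int) int {
	svc := &ImageService{images: make(map[string]string)}
	var wg sync.WaitGroup
	var mu sync.Mutex
	wins := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			err := svc.Pull("debian", func() (string, error) {
				return fmt.Sprintf("/staging/debian-%d", i), nil
			})
			if err == nil {
				mu.Lock()
				wins++
				mu.Unlock()
			}
		}(i)
	}
	wg.Wait()
	return wins
}

func main() {
	fmt.Println("successful publications:", racePulls(8))
}
```

The recheck under the lock is what makes the unlocked build safe: concurrent pulls may all do the slow work, but only one of them commits.
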
### `WorkspaceService` (`workspace_service.go`, `workspace.go`, `vm_authsync.go`)

- `workspaceLocks vmLockSet` — per-VM mutex scoped to
  `workspace.prepare` / `workspace.export`. These ops acquire
  `vmLocks[id]` (on VMService) only long enough to validate VM state
  and snapshot the fields they need, then release it and acquire
  `workspaceLocks[id]` for the slow guest I/O phase. That keeps
  `vm stop` / `delete` / `restart` from queueing behind a running tar
  import.
- Test seams `workspaceInspectRepo`, `workspaceImport` are per-instance
  fields.

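
The two-phase locking described above might look roughly like this. The `vmLockSet` implementation, the `vmState` fields, and the `prepare` signature are assumptions made for the sketch; the point is only that the VM lock is released before the workspace lock is taken. `sync.Mutex.TryLock` requires Go 1.18 or later.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// vmLockSet hands out one mutex per VM ID (assumed implementation).
type vmLockSet struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func (s *vmLockSet) lockFor(id string) *sync.Mutex {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.locks == nil {
		s.locks = make(map[string]*sync.Mutex)
	}
	m, ok := s.locks[id]
	if !ok {
		m = &sync.Mutex{}
		s.locks[id] = m
	}
	return m
}

// vmState holds hypothetical fields that the slow phase needs.
type vmState struct {
	Running bool
	GuestIP string
}

// prepare validates and snapshots under vmLocks[id], releases it, and
// only then takes workspaceLocks[id] for the guest I/O phase.
func prepare(vmLocks, workspaceLocks *vmLockSet, vms map[string]vmState,
	id string, guestIO func(vmState) error) error {
	vl := vmLocks.lockFor(id)
	vl.Lock()
	snapshot, ok := vms[id] // value copy: safe to use after unlocking
	vl.Unlock()
	if !ok || !snapshot.Running {
		return errors.New("vm is not running: " + id)
	}

	wl := workspaceLocks.lockFor(id)
	wl.Lock()
	defer wl.Unlock()
	return guestIO(snapshot)
}

// vmLockFreeDuringIO reports whether vmLocks[id] can be taken while the
// guest I/O callback is running.
func vmLockFreeDuringIO() bool {
	vmLocks, workspaceLocks := &vmLockSet{}, &vmLockSet{}
	vms := map[string]vmState{"dev": {Running: true, GuestIP: "10.0.0.2"}}
	free := false
	err := prepare(vmLocks, workspaceLocks, vms, "dev", func(vmState) error {
		l := vmLocks.lockFor("dev")
		if l.TryLock() {
			free = true
			l.Unlock()
		}
		return nil
	})
	return err == nil && free
}

func main() {
	fmt.Println("vm lock free during guest I/O:", vmLockFreeDuringIO())
}
```
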
### `VMService` (`vm_service.go`, `vm_lifecycle.go`, `vm_create.go`, `vm_create_ops.go`, `vm_stats.go`, `vm_set.go`, `vm_disk.go`, `vm_handles.go`, `vm_authsync.go` (via WorkspaceService), `preflight.go`, `ports.go`, `vm.go`)

- `vmLocks vmLockSet` — per-VM `*sync.Mutex`, one per VM ID. Held for
  the **entire lifecycle op** on that VM: `start` holds it across
  preflight, bridge setup, firecracker spawn, and post-boot wiring
  (seconds to tens of seconds). Two `start`/`stop`/`delete`/`set`
  calls against the same VM therefore serialise; calls against
  different VMs run independently.
- `createVMMu sync.Mutex` — narrow **reservation** mutex. `CreateVM`
  resolves the image (possibly auto-pulling, which self-locks on
  `imageOpsMu`) and parses sizing flags outside this lock, then holds
  `createVMMu` only to re-check that the requested VM name is still
  free, allocate the next guest IP, and insert the initial "created"
  row. The subsequent boot flow runs under the per-VM lock only.
- `createOps opstate.Registry[*vmCreateOperationState]` — in-flight
  async create operations; owns its own lock.
- `handles *handleCache` — in-memory map of per-VM transient kernel/
  process handles (PID, tap device, loop devices, DM target). Each
  VM directory holds a small `handles.json` scratch file so the
  cache can be rebuilt at daemon startup.
- Test seams `guestWaitForSSH`, `guestDial` are per-instance fields.

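
As an illustration of the narrow reservation mutex, here is a minimal sketch. The `rows` map, the `nextHost` counter, the `10.0.0.x` addressing, the `CreateVM` signature, and the `distinctIPs` helper are all invented for the example; the real method persists a row via the store and then continues into the boot flow.

```go
package main

import (
	"fmt"
	"sync"
)

type VMService struct {
	createVMMu sync.Mutex
	rows       map[string]string // VM name -> guest IP (stand-in for the store)
	nextHost   int               // hypothetical IP allocator state
}

// CreateVM resolves the image outside the lock, then holds createVMMu
// only for name recheck + IP allocation + initial row insert.
func (s *VMService) CreateVM(name string, resolveImage func() error) (string, error) {
	if err := resolveImage(); err != nil { // may be slow; no lock held
		return "", err
	}

	s.createVMMu.Lock()
	defer s.createVMMu.Unlock()
	if _, taken := s.rows[name]; taken {
		return "", fmt.Errorf("vm name %q is already in use", name)
	}
	ip := fmt.Sprintf("10.0.0.%d", s.nextHost)
	s.nextHost++
	s.rows[name] = ip // the "created" row
	return ip, nil
}

// distinctIPs creates n VMs concurrently and counts the unique guest IPs.
func distinctIPs(n int) int {
	svc := &VMService{rows: make(map[string]string), nextHost: 2}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			_, _ = svc.CreateVM(fmt.Sprintf("vm-%d", i), func() error { return nil })
		}(i)
	}
	wg.Wait()
	seen := make(map[string]bool)
	for _, ip := range svc.rows {
		seen[ip] = true
	}
	return len(seen)
}

func main() {
	fmt.Println("unique guest IPs for 16 concurrent creates:", distinctIPs(16))
}
```
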
## Subpackages

Stateless helpers with no need for a service pointer live in
subpackages. Each takes explicit dependencies (typically a
`system.Runner`-compatible interface) and holds no global state beyond
small test seams.

| Subpackage                   | Purpose                                                                |
| ---------------------------- | ---------------------------------------------------------------------- |
| `internal/daemon/opstate`    | Generic `Registry[T AsyncOp]` for async-operation bookkeeping.         |
| `internal/daemon/dmsnap`     | Device-mapper COW snapshot create/cleanup/remove.                      |
| `internal/daemon/fcproc`     | Firecracker process primitives (bridge, tap, binary, PID, kill, wait). |
| `internal/daemon/imagemgr`   | Image subsystem pure helpers: validators, staging, build script gen.   |
| `internal/daemon/workspace`  | Workspace helpers: git inspection, copy prep, guest import script.     |

All subpackages are leaves — no intra-daemon subpackage imports another.

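
To give a feel for a generic registry of the `Registry[T AsyncOp]` kind, here is a speculative sketch. The document does not show the `AsyncOp` constraint or the registry's methods, so the `OpID`/`Finished` method set and the `Insert`/`Get`/`Prune` API below are assumptions, not the `opstate` package's real surface. Generics require Go 1.18 or later.

```go
package main

import (
	"fmt"
	"sync"
)

// AsyncOp is an assumed constraint; the real interface may differ.
type AsyncOp interface {
	OpID() string
	Finished() bool
}

// Registry tracks in-flight operations and owns its own lock.
type Registry[T AsyncOp] struct {
	mu  sync.Mutex
	ops map[string]T
}

// Insert records op under its ID, replacing any previous entry.
func (r *Registry[T]) Insert(op T) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.ops == nil {
		r.ops = make(map[string]T)
	}
	r.ops[op.OpID()] = op
}

// Get returns the operation with the given ID, if it is still tracked.
func (r *Registry[T]) Get(id string) (T, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	op, ok := r.ops[id]
	return op, ok
}

// Prune drops finished operations and returns how many were removed.
func (r *Registry[T]) Prune() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	removed := 0
	for id, op := range r.ops {
		if op.Finished() {
			delete(r.ops, id)
			removed++
		}
	}
	return removed
}

// createOp is a toy operation type used only for this example.
type createOp struct {
	id   string
	done bool
}

func (o *createOp) OpID() string   { return o.id }
func (o *createOp) Finished() bool { return o.done }

func main() {
	var reg Registry[*createOp]
	reg.Insert(&createOp{id: "op-1", done: true})
	reg.Insert(&createOp{id: "op-2"})
	fmt.Println("pruned:", reg.Prune())
	_, ok := reg.Get("op-2")
	fmt.Println("op-2 still tracked:", ok)
}
```
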
## Lock ordering

Acquire in this order, release in reverse. Never acquire in the
opposite direction.

```
VMService.vmLocks[id]  →  WorkspaceService.workspaceLocks[id]
                       →  {VMService.createVMMu, ImageService.imageOpsMu}
                       →  subsystem-local locks
```

`vmLocks[id]` and `workspaceLocks[id]` are NEVER held at the same
time. `workspace.prepare` acquires `vmLocks[id]` just long enough to
validate VM state, releases it, then acquires `workspaceLocks[id]`
for the guest I/O phase. Regular lifecycle ops (`start`, `stop`,
`delete`, `set`) do NOT do this split — they hold `vmLocks[id]`
across the whole flow.

Subsystem-local locks (`tapPool.mu`, `opstate.Registry` mu,
`handleCache.mu`) are leaves. They do not contend with each other.

Notes:

- `vmLocks[id]` is the outer lock for any operation scoped to a single
  VM. Acquired via `VMService.withVMLockByID` / `withVMLockByRef`. The
  callback runs under the lock — treat the whole function body as a
  critical section.
- `createVMMu` is held only across the VM-name reservation + IP
  allocation + initial UpsertVM. Image resolution and the full boot
  flow happen outside it.
- `imageOpsMu` is held only across the publication atom (recheck name +
  atomic rename + UpsertImage, or the equivalent for Register /
  Promote / Delete). Network fetch, ext4 build, and file copies run
  unlocked.
- Holding a subsystem-local lock while calling into guest SSH is
  discouraged; copy needed state out under the lock and release before
  blocking I/O.

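
The last note, copying state out under a leaf lock and releasing before blocking I/O, could be sketched like this. The `handles` fields, the `get` and `probeGuest` helpers, and the `dial` callback are assumptions made for the example; only the `handleCache` name and its `mu` lock appear in the text. `sync.Mutex.TryLock` requires Go 1.18 or later.

```go
package main

import (
	"fmt"
	"sync"
)

// handles holds a hypothetical subset of per-VM transient handles.
type handles struct {
	PID int
	Tap string
}

type handleCache struct {
	mu   sync.Mutex
	byVM map[string]handles
}

// get returns a value copy, so the caller owns the result after the
// lock has been released.
func (c *handleCache) get(id string) (handles, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	h, ok := c.byVM[id]
	return h, ok
}

// probeGuest copies what it needs under the lock, then performs the
// (potentially blocking) dial with no cache lock held.
func probeGuest(c *handleCache, id string, dial func(tap string) error) error {
	h, ok := c.get(id)
	if !ok {
		return fmt.Errorf("no handles cached for %s", id)
	}
	return dial(h.Tap)
}

// lockFreeDuringDial reports whether the cache lock is free while the
// dial callback is running.
func lockFreeDuringDial() bool {
	c := &handleCache{byVM: map[string]handles{"dev": {PID: 4242, Tap: "tap0"}}}
	free := false
	err := probeGuest(c, "dev", func(string) error {
		if c.mu.TryLock() {
			free = true
			c.mu.Unlock()
		}
		return nil
	})
	return err == nil && free
}

func main() {
	fmt.Println("cache lock free during dial:", lockFreeDuringDial())
}
```
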
## Reconcile and background work

`Daemon.reconcile(ctx)` is the orchestrator run at startup. It
rehydrates the handle cache, reaps stale VMs, and republishes DNS
records. `Daemon.backgroundLoop()` is the ticker fan-out —
`VMService.pollStats`, `VMService.stopStaleVMs`, and
`VMService.pruneVMCreateOperations` run on independent tickers.

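
One plausible shape for a ticker fan-out is shown below, purely as a sketch. The document does not say how `backgroundLoop` is implemented or how it is stopped, so the `job` type, the context-based shutdown, the intervals, and the `runForMillis` helper are all assumptions.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// job pairs a periodic task with its own interval (hypothetical type).
type job struct {
	every time.Duration
	run   func(ctx context.Context)
}

// backgroundLoop gives each job its own ticker and goroutine, and
// returns once ctx is cancelled and every goroutine has exited.
func backgroundLoop(ctx context.Context, jobs []job) {
	var wg sync.WaitGroup
	for _, j := range jobs {
		wg.Add(1)
		go func(j job) {
			defer wg.Done()
			t := time.NewTicker(j.every)
			defer t.Stop()
			for {
				select {
				case <-ctx.Done():
					return
				case <-t.C:
					j.run(ctx)
				}
			}
		}(j)
	}
	wg.Wait()
}

// runForMillis runs two counting jobs for roughly ms milliseconds and
// returns how often each one fired.
func runForMillis(ms int) (polls, prunes int64) {
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(ms)*time.Millisecond)
	defer cancel()
	var p, q int64
	backgroundLoop(ctx, []job{
		{every: 5 * time.Millisecond, run: func(context.Context) { atomic.AddInt64(&p, 1) }},
		{every: 20 * time.Millisecond, run: func(context.Context) { atomic.AddInt64(&q, 1) }},
	})
	return atomic.LoadInt64(&p), atomic.LoadInt64(&q)
}

func main() {
	polls, prunes := runForMillis(200)
	fmt.Println("poll ticks:", polls, "prune ticks:", prunes)
}
```

Independent tickers mean a slow run of one job delays only that job's next tick, not the others.
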
## External API

Only `internal/cli` imports this package. The surface is:

- `daemon.Open(ctx) (*Daemon, error)`
- `(*Daemon).Serve(ctx) error`
- `(*Daemon).Close() error`
- `daemon.Doctor(...)` — host diagnostics (no receiver).

All other methods live on the four services and are reached only
through the RPC `dispatch` switch in `daemon.go`. They are free to
move/rename during refactoring.