banger/internal/daemon/ARCHITECTURE.md
Thales Maciel 011b59a72f
daemon split (8/8): document capability decoupling + wireServices
Update ARCHITECTURE.md's Composition section to reflect the finished
split: capabilities carry explicit service-pointer fields, nothing
reaches *Daemon at dispatch time, and wireServices(d) is the single
entry point that builds services + capabilities eagerly (from Open
in production, from tests after constructing &Daemon{...} literals).

Removes the paragraph admitting capability→*Daemon coupling and the
lazy-init getters justification, neither of which applies anymore.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 15:59:39 -03:00


# `internal/daemon` architecture
This document describes the current daemon package layout: the `Daemon`
composition root, the four services it wires together, the subpackages
that own stateless helpers, and the lock ordering every caller must
respect.
## Composition
`Daemon` is a thin composition root. It holds shared infrastructure
(store, runner, logger, layout, config, listener) plus pointers to
four focused services. RPC dispatch is a pure forwarder into those
services; no lifecycle / image / workspace / networking behaviour
lives on `*Daemon` itself.
```
Daemon
├── *HostNetwork      — bridge, tap pool, NAT, DNS, firecracker process,
│                       DM snapshots, vsock readiness
├── *ImageService     — register, promote, delete, pull (bundle + OCI),
│                       kernel catalog, managed-seed refresh
├── *WorkspaceService — workspace.prepare / workspace.export, auth-key
│                       + git-identity sync onto the work disk
└── *VMService        — VM lifecycle (create/start/stop/restart/kill/
                        delete/set), stats polling, ports query,
                        handle cache, per-VM lock set, create-op
                        registry, preflight validation
```
Each service owns its own state. Cross-service calls go through narrow
consumer-defined seams:
- `WorkspaceService` does not hold a `*VMService` pointer. It takes
function-typed deps (`vmResolver`, `aliveChecker`, `withVMLockByRef`,
`imageResolver`, `imageWorkSeed`) so it sees exactly the operations
it needs and nothing more. Those deps are captured as closures so
construction-order cycles don't recur.
- `VMService` holds direct pointers to `*HostNetwork`, `*ImageService`,
and `*WorkspaceService`. Orchestrating a VM start really does compose
all three (bridge + tap + image resolution + work-disk sync), and
declaring a function-typed interface for every call would balloon
the surface for no win — services are unexported, so package-external
code can never reach them.
- Capability hooks do not take `*Daemon`. Each capability is a struct
with explicit service-pointer fields (`workDiskCapability{vm, ws,
store, defaultImageName}`, `dnsCapability{net}`, `natCapability{vm,
net, logger}`) populated at wiring time. `VMService` invokes them
through a `capabilityHooks` struct (function-typed bag) populated at
construction; neither the service nor any capability has a `*Daemon`
pointer.
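The function-typed seam described for `WorkspaceService` can be sketched as follows. This is a minimal illustration, not the package's actual code: `vmRecord`, `vmService.find`, `prepare`, and `wire` are hypothetical names, and only the field names `vmResolver` and `aliveChecker` come from the text above. The point it shows is that a closure reads its target at call time, so the consumer can be built before the service it calls into.

```go
package main

import (
	"errors"
	"fmt"
)

// vmRecord is a hypothetical stand-in for the daemon's VM model.
type vmRecord struct {
	ID    string
	Alive bool
}

// vmService is a hypothetical owner of VM state.
type vmService struct{ vms map[string]vmRecord }

func (s *vmService) find(ref string) (vmRecord, error) {
	v, ok := s.vms[ref]
	if !ok {
		return vmRecord{}, errors.New("no such vm: " + ref)
	}
	return v, nil
}

// workspaceService holds function-typed deps instead of a *vmService,
// so it can only reach the two operations it was handed.
type workspaceService struct {
	vmResolver   func(ref string) (vmRecord, error)
	aliveChecker func(v vmRecord) bool
}

// prepare validates the VM through the injected seams only.
func (s *workspaceService) prepare(ref string) error {
	v, err := s.vmResolver(ref)
	if err != nil {
		return fmt.Errorf("resolve: %w", err)
	}
	if !s.aliveChecker(v) {
		return errors.New("vm is not running: " + v.ID)
	}
	return nil
}

// wire builds the workspace service before the VM service exists.
// The closure reads the vms variable when it is called, not when it is
// created, so construction order does not matter.
func wire() *workspaceService {
	var vms *vmService
	ws := &workspaceService{
		vmResolver:   func(ref string) (vmRecord, error) { return vms.find(ref) },
		aliveChecker: func(v vmRecord) bool { return v.Alive },
	}
	vms = &vmService{vms: map[string]vmRecord{
		"dev":  {ID: "dev", Alive: true},
		"idle": {ID: "idle", Alive: false},
	}}
	return ws
}

func main() {
	ws := wire()
	for _, ref := range []string{"dev", "idle", "missing"} {
		fmt.Printf("%s: %v\n", ref, ws.prepare(ref))
	}
}
```

A test can swap either field for a fake without constructing a VM service at all, which is the practical payoff of the narrow seam.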
Services + capabilities are built eagerly by `wireServices(d)`, called
once from `Daemon.Open` after the composition root's infrastructure is
populated, and once per test that constructs a `&Daemon{...}` literal.
Tests that want to stub a particular service or the capability list
assign the field before calling `wireServices` — the helper is
idempotent and skips anything already set.
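The idempotent wiring rule can be illustrated with a small sketch. The types and field names below (`daemon`, `net`, `img`, the `name` tag) are assumptions made for the example; only the behaviour, "fill in what is nil and leave everything else alone", is taken from the paragraph above.

```go
package main

import "fmt"

// Hypothetical service types; the name field only tags which
// constructor produced the value.
type hostNetwork struct{ name string }
type imageService struct{ name string }

// daemon is a cut-down composition root with two service pointers.
type daemon struct {
	net *hostNetwork
	img *imageService
}

// wireServices fills in only the fields that are still nil. Calling it
// twice, or after a test has assigned a stub, leaves existing values
// untouched.
func wireServices(d *daemon) {
	if d.net == nil {
		d.net = &hostNetwork{name: "real"}
	}
	if d.img == nil {
		d.img = &imageService{name: "real"}
	}
}

func main() {
	// A test stubs the image service, then wires the rest.
	d := &daemon{img: &imageService{name: "stub"}}
	wireServices(d)
	wireServices(d) // second call is a no-op
	fmt.Println(d.net.name, d.img.name)
}
```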
## Service state
### `HostNetwork` (`host_network.go`, `nat.go`, `dns_routing.go`, `tap_pool.go`, `snapshot.go`)
- `tapPool` — TAP interface pool, owns its own lock.
- `vmDNS *vmdns.Server` — in-process DNS server for `.vm` names.
- No direct VM-state access. Where an operation needs a VM's tap name
(e.g. `ensureNAT`), the signature takes `guestIP` + `tap` string so
the caller (VMService) resolves them first.
### `ImageService` (`image_service.go`, `images.go`, `images_pull.go`, `image_seed.go`, `kernels.go`)
- `imageOpsMu sync.Mutex` — the publication-window lock. Held only
across the recheck-name + atomic-rename + UpsertImage commit atom.
Slow work (network fetch, ext4 build, SSH-key seeding) runs unlocked.
- Test seams `pullAndFlatten`, `finalizePulledRootfs`, `bundleFetch`
are struct fields (not package globals), so tests inject per-instance
fakes.
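A rough sketch of the publication-window idea follows. It assumes an in-memory map in place of the real store and a plain assignment in place of the atomic rename plus `UpsertImage`; `publish`, `newImageService`, and the `build` callback are hypothetical names. What it aims to show is the shape: slow work first, then a short locked section that rechecks the name and commits.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// imageService keeps a name -> rootfs path map as a stand-in for the store.
type imageService struct {
	imageOpsMu sync.Mutex
	images     map[string]string
}

func newImageService() *imageService {
	return &imageService{images: make(map[string]string)}
}

// publish runs the slow build unlocked, then takes imageOpsMu only for
// the recheck-and-commit step.
func (s *imageService) publish(name string, build func() (string, error)) error {
	staged, err := build() // network fetch, ext4 build, etc. would go here
	if err != nil {
		return fmt.Errorf("build %q: %w", name, err)
	}

	s.imageOpsMu.Lock()
	defer s.imageOpsMu.Unlock()
	// Recheck under the lock: another publish may have won the name
	// while this one was building.
	if _, taken := s.images[name]; taken {
		return errors.New("image name already taken: " + name)
	}
	s.images[name] = staged // stands in for atomic rename + UpsertImage
	return nil
}

func main() {
	s := newImageService()
	build := func() (string, error) { return "/staging/rootfs.ext4", nil }
	fmt.Println(s.publish("debian", build))
	fmt.Println(s.publish("debian", build))
}
```

The trade-off is that a losing publisher has already paid for the build; in exchange, no slow operation ever blocks other image operations.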
### `WorkspaceService` (`workspace_service.go`, `workspace.go`, `vm_authsync.go`)
- `workspaceLocks vmLockSet` — per-VM mutex scoped to
`workspace.prepare` / `workspace.export`. These ops acquire
`vmLocks[id]` (on VMService) only long enough to validate VM state
and snapshot the fields they need, then release it and acquire
`workspaceLocks[id]` for the slow guest I/O phase. That keeps
`vm stop` / `delete` / `restart` from queueing behind a running tar
import.
- Test seams `workspaceInspectRepo`, `workspaceImport` are per-instance
fields.
### `VMService` (`vm_service.go`, `vm_lifecycle.go`, `vm_create.go`, `vm_create_ops.go`, `vm_stats.go`, `vm_set.go`, `vm_disk.go`, `vm_handles.go`, `vm_authsync.go` (via WorkspaceService), `preflight.go`, `ports.go`, `vm.go`)
- `vmLocks vmLockSet` — per-VM `*sync.Mutex`, one per VM ID. Held for
the **entire lifecycle op** on that VM: `start` holds it across
preflight, bridge setup, firecracker spawn, and post-boot wiring
(seconds to tens of seconds). Two `start`/`stop`/`delete`/`set`
calls against the same VM therefore serialise; calls against
different VMs run independently.
- `createVMMu sync.Mutex` — narrow **reservation** mutex. `CreateVM`
resolves the image (possibly auto-pulling, which self-locks on
`imageOpsMu`) and parses sizing flags outside this lock, then holds
`createVMMu` only to re-check that the requested VM name is still
free, allocate the next guest IP, and insert the initial "created"
row. The subsequent boot flow runs under the per-VM lock only.
- `createOps opstate.Registry[*vmCreateOperationState]` — in-flight
async create operations; owns its own lock.
- `handles *handleCache` — in-memory map of per-VM transient kernel/
process handles (PID, tap device, loop devices, DM target). Each
VM directory holds a small `handles.json` scratch file so the
cache can be rebuilt at daemon startup.
- Test seams `guestWaitForSSH`, `guestDial` are per-instance fields.
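One plausible shape for a per-VM lock set is sketched below. The real `vmLockSet` is not shown in this document, so `lockSet`, `lockFor`, `with`, and `busy` are illustrative names; the sketch only demonstrates the stated property that operations on the same VM ID serialise while different IDs stay independent.

```go
package main

import (
	"fmt"
	"sync"
)

// lockSet hands out one mutex per ID, creating it on first use.
// The zero value is ready to use.
type lockSet struct {
	mu    sync.Mutex // guards the map only, never held across fn
	locks map[string]*sync.Mutex
}

func (s *lockSet) lockFor(id string) *sync.Mutex {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.locks == nil {
		s.locks = make(map[string]*sync.Mutex)
	}
	l, ok := s.locks[id]
	if !ok {
		l = &sync.Mutex{}
		s.locks[id] = l
	}
	return l
}

// with runs fn while holding the lock for id.
func (s *lockSet) with(id string, fn func() error) error {
	l := s.lockFor(id)
	l.Lock()
	defer l.Unlock()
	return fn()
}

// busy reports whether the lock for id is currently held. It exists
// for demonstration only and relies on Mutex.TryLock (Go 1.18+).
func (s *lockSet) busy(id string) bool {
	l := s.lockFor(id)
	if l.TryLock() {
		l.Unlock()
		return false
	}
	return true
}

func main() {
	var locks lockSet
	_ = locks.with("vm-a", func() error {
		fmt.Println("vm-a busy:", locks.busy("vm-a"))
		fmt.Println("vm-b busy:", locks.busy("vm-b"))
		return nil
	})
}
```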
## Subpackages
Stateless helpers with no need for a service pointer live in
subpackages. Each takes explicit dependencies (typically a
`system.Runner`-compatible interface) and holds no global state beyond
small test seams.
| Subpackage | Purpose |
| ---------------------------- | ---------------------------------------------------------------------- |
| `internal/daemon/opstate` | Generic `Registry[T AsyncOp]` for async-operation bookkeeping. |
| `internal/daemon/dmsnap` | Device-mapper COW snapshot create/cleanup/remove. |
| `internal/daemon/fcproc` | Firecracker process primitives (bridge, tap, binary, PID, kill, wait). |
| `internal/daemon/imagemgr` | Image subsystem pure helpers: validators, staging, build script gen. |
| `internal/daemon/workspace` | Workspace helpers: git inspection, copy prep, guest import script. |

All subpackages are leaves: no intra-daemon subpackage imports another.
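To make the `Registry[T AsyncOp]` row more concrete, here is a hedged sketch of a generic registry that owns its own lock. The method set of `AsyncOp` is not given in this document, so `OpID` and `Done`, as well as `Insert`, `Get`, `Prune`, and `createOp`, are assumptions chosen for the example and may differ from the real `opstate` package.

```go
package main

import (
	"fmt"
	"sync"
)

// AsyncOp is an assumed constraint: an operation with an ID that can
// report completion.
type AsyncOp interface {
	OpID() string
	Done() bool
}

// Registry tracks in-flight operations. The zero value is ready to use.
type Registry[T AsyncOp] struct {
	mu  sync.Mutex
	ops map[string]T
}

func (r *Registry[T]) Insert(op T) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.ops == nil {
		r.ops = make(map[string]T)
	}
	r.ops[op.OpID()] = op
}

func (r *Registry[T]) Get(id string) (T, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	op, ok := r.ops[id]
	return op, ok
}

// Prune drops finished operations and returns how many were removed.
func (r *Registry[T]) Prune() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	removed := 0
	for id, op := range r.ops {
		if op.Done() {
			delete(r.ops, id) // deleting during range is safe in Go
			removed++
		}
	}
	return removed
}

// createOp is a hypothetical operation type.
type createOp struct {
	id   string
	done bool
}

func (o *createOp) OpID() string { return o.id }
func (o *createOp) Done() bool   { return o.done }

func main() {
	var reg Registry[*createOp]
	reg.Insert(&createOp{id: "op-1", done: true})
	reg.Insert(&createOp{id: "op-2"})
	fmt.Println("pruned:", reg.Prune())
	_, ok := reg.Get("op-2")
	fmt.Println("op-2 still tracked:", ok)
}
```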
## Lock ordering
Acquire in this order, release in reverse. Never acquire in the
opposite direction.
```
VMService.vmLocks[id] → WorkspaceService.workspaceLocks[id]
→ {VMService.createVMMu, ImageService.imageOpsMu}
→ subsystem-local locks
```
The first arrow denotes sequence, not nesting: `vmLocks[id]` and
`workspaceLocks[id]` are NEVER held at the same time.
`workspace.prepare` acquires `vmLocks[id]` just long enough to
validate VM state, releases it, then acquires `workspaceLocks[id]`
for the guest I/O phase. Regular lifecycle ops (`start`, `stop`,
`delete`, `set`) do NOT do this split; they hold `vmLocks[id]`
across the whole flow.
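The two-phase split used by `workspace.prepare` might look roughly like this. The sketch collapses the per-VM lock sets into two plain mutexes for a single VM, and `vmState`, `service`, and the `guestIO` callback are hypothetical; the part taken from the text is the order: validate and snapshot under the VM lock, release it, then do the slow phase under the workspace lock.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// vmState is a hypothetical snapshot of the fields prepare needs.
type vmState struct {
	Running bool
	GuestIP string
}

// service models one VM: vmLock stands in for vmLocks[id] and wsLock
// for workspaceLocks[id].
type service struct {
	vmLock sync.Mutex
	wsLock sync.Mutex
	state  vmState
}

// prepare never holds both locks at once.
func (s *service) prepare(guestIO func(ip string) error) error {
	// Phase 1: short critical section, copy state out.
	s.vmLock.Lock()
	snap := s.state
	s.vmLock.Unlock()

	if !snap.Running {
		return errors.New("vm is not running")
	}

	// Phase 2: slow guest I/O under the workspace lock only, so
	// lifecycle ops on the same VM are not queued behind it.
	s.wsLock.Lock()
	defer s.wsLock.Unlock()
	return guestIO(snap.GuestIP)
}

func main() {
	s := &service{state: vmState{Running: true, GuestIP: "10.0.0.2"}}
	err := s.prepare(func(ip string) error {
		// While the import runs, the VM lock is free for stop/delete.
		free := s.vmLock.TryLock()
		if free {
			s.vmLock.Unlock()
		}
		fmt.Println("importing into", ip, "vm lock free:", free)
		return nil
	})
	fmt.Println("err:", err)
}
```

Because the snapshot can go stale once the VM lock is released, the slow phase has to tolerate the VM being stopped underneath it; that is the cost of not blocking lifecycle operations.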
Subsystem-local locks (`tapPool.mu`, the `opstate.Registry` mutex,
`handleCache.mu`) are leaves. They do not contend with each other.
Notes:
- `vmLocks[id]` is the outer lock for any operation scoped to a single
VM. It is acquired via `VMService.withVMLockByID` / `withVMLockByRef`.
The callback runs under the lock, so treat the whole function body as
a critical section.
- `createVMMu` is held only across the VM-name reservation + IP
allocation + initial UpsertVM. Image resolution and the full boot
flow happen outside it.
- `imageOpsMu` is held only across the publication atom (recheck name
+ atomic rename + UpsertImage, or the equivalent for Register /
Promote / Delete). Network fetch, ext4 build, and file copies run
unlocked.
- Holding a subsystem-local lock while calling into guest SSH is
discouraged; copy needed state out under the lock and release before
blocking I/O.
## Reconcile and background work
`Daemon.reconcile(ctx)` is the orchestrator run at startup. It
rehydrates the handle cache, reaps stale VMs, and republishes DNS
records. `Daemon.backgroundLoop()` is the ticker fan-out —
`VMService.pollStats`, `VMService.stopStaleVMs`, and
`VMService.pruneVMCreateOperations` run on independent tickers.
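A ticker fan-out of this kind is commonly written as one goroutine per task, each with its own `time.Ticker`. The sketch below assumes that shape; `runEvery` and `demo` are hypothetical helpers, the intervals are arbitrary, and the counters merely stand in for tasks such as stats polling or pruning.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runEvery starts a goroutine that calls task on its own ticker until
// ctx is cancelled. A slow task delays only its own ticker.
func runEvery(ctx context.Context, wg *sync.WaitGroup, every time.Duration, task func(context.Context)) {
	wg.Add(1)
	go func() {
		defer wg.Done()
		t := time.NewTicker(every)
		defer t.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-t.C:
				task(ctx)
			}
		}
	}()
}

// demo runs two independent tasks briefly and returns how often each ran.
func demo() (fast, slow int64) {
	ctx, cancel := context.WithCancel(context.Background())
	var wg sync.WaitGroup
	runEvery(ctx, &wg, 5*time.Millisecond, func(context.Context) { atomic.AddInt64(&fast, 1) })
	runEvery(ctx, &wg, 20*time.Millisecond, func(context.Context) { atomic.AddInt64(&slow, 1) })
	time.Sleep(120 * time.Millisecond)
	cancel()
	wg.Wait() // both goroutines have exited, so the counters are stable
	return fast, slow
}

func main() {
	fast, slow := demo()
	fmt.Println("fast task ran more often than slow task:", fast > slow)
}
```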
## External API
Only `internal/cli` imports this package. The surface is:
- `daemon.Open(ctx) (*Daemon, error)`
- `(*Daemon).Serve(ctx) error`
- `(*Daemon).Close() error`
- `daemon.Doctor(...)` — host diagnostics (no receiver).
All other methods live on the four services and are reached only
through the RPC `dispatch` switch in `daemon.go`. They are free to
move/rename during refactoring.
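For orientation, a pure-forwarder dispatch switch could be as small as the sketch below. The method name `"workspace.prepare"` appears in this document; `"vm.start"`, the signatures, and the string results are invented for the example, and the real `dispatch` in `daemon.go` may look quite different.

```go
package main

import "fmt"

// Hypothetical services with one method each.
type vmService struct{}

func (*vmService) start(ref string) (string, error) { return "started " + ref, nil }

type workspaceService struct{}

func (*workspaceService) prepare(ref string) (string, error) { return "prepared " + ref, nil }

// daemon holds service pointers and no behaviour of its own.
type daemon struct {
	vm *vmService
	ws *workspaceService
}

// dispatch maps an RPC method name onto a service call and nothing else.
func (d *daemon) dispatch(method, ref string) (string, error) {
	switch method {
	case "vm.start": // assumed method name
		return d.vm.start(ref)
	case "workspace.prepare":
		return d.ws.prepare(ref)
	default:
		return "", fmt.Errorf("unknown method %q", method)
	}
}

func main() {
	d := &daemon{vm: &vmService{}, ws: &workspaceService{}}
	for _, m := range []string{"vm.start", "workspace.prepare", "vm.fly"} {
		out, err := d.dispatch(m, "dev")
		fmt.Printf("%s -> %q, %v\n", m, out, err)
	}
}
```

Keeping the switch free of logic is what lets service methods move or be renamed without touching the external surface.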