diff --git a/internal/daemon/ARCHITECTURE.md b/internal/daemon/ARCHITECTURE.md
index 2358693..7fafabe 100644
--- a/internal/daemon/ARCHITECTURE.md
+++ b/internal/daemon/ARCHITECTURE.md
@@ -1,94 +1,135 @@
 # `internal/daemon` architecture
 
 This document describes the current daemon package layout: the `Daemon`
-composition root, the subpackages that own stateless helpers and shared
-primitives, and the lock ordering every caller must respect.
+composition root, the four services it wires together, the subpackages
+that own stateless helpers, and the lock ordering every caller must
+respect.
 
 ## Composition
 
-`Daemon` is the composition root. Subsystem state and locks live on their
-owning types:
+`Daemon` is a thin composition root. It holds shared infrastructure
+(store, runner, logger, layout, config, listener) plus pointers to
+four focused services. RPC dispatch is a pure forwarder into those
+services; no lifecycle / image / workspace / networking behaviour
+lives on `*Daemon` itself.
+
+```
+Daemon
+├── *HostNetwork      — bridge, tap pool, NAT, DNS, firecracker process,
+│                       DM snapshots, vsock readiness
+├── *ImageService     — register, promote, delete, pull (bundle + OCI),
+│                       kernel catalog, managed-seed refresh
+├── *WorkspaceService — workspace.prepare / workspace.export, auth-key
+│                       + git-identity sync onto the work disk
+└── *VMService        — VM lifecycle (create/start/stop/restart/kill/
+                        delete/set), stats polling, ports query,
+                        handle cache, per-VM lock set, create-op
+                        registry, preflight validation
+```
+
+Each service owns its own state. Cross-service calls go through narrow
+consumer-defined seams:
+
+- `WorkspaceService` does not hold a `*VMService` pointer. It takes
+  function-typed deps (`vmResolver`, `aliveChecker`, `withVMLockByRef`,
+  `imageResolver`, `imageWorkSeed`) so it sees exactly the operations
+  it needs and nothing more. Those deps are captured as closures so
+  construction-order cycles don't recur.
+- `VMService` holds direct pointers to `*HostNetwork`, `*ImageService`,
+  and `*WorkspaceService`. Orchestrating a VM start really does compose
+  all three (bridge + tap + image resolution + work-disk sync), and
+  declaring a function-typed interface for every call would balloon
+  the surface for no win — services are unexported, so package-external
+  code can never reach them.
+- Capability hooks still take `*Daemon` as their receiver argument,
+  but `VMService` calls into them through a `capabilityHooks` struct
+  (function-typed bag) populated at construction. The service has no
+  `*Daemon` pointer.
+
+Lazy-init getters (`d.hostNet()`, `d.imageSvc()`, `d.workspaceSvc()`,
+`d.vmSvc()`) let existing test literals (`&Daemon{store: db, runner: r}`)
+keep working — the getter constructs the service from whatever is on
+the `Daemon` if nothing was pre-wired.
+
+## Service state
+
+### `HostNetwork` (`host_network.go`, `nat.go`, `dns_routing.go`, `tap_pool.go`, `snapshot.go`)
+
+- `tapPool` — TAP interface pool, owns its own lock.
+- `vmDNS *vmdns.Server` — in-process DNS server for `.vm` names.
+- No direct VM-state access. Where an operation needs a VM's tap name
+  (e.g. `ensureNAT`), the signature takes `guestIP` + `tap` string so
+  the caller (VMService) resolves them first.
+
+### `ImageService` (`image_service.go`, `images.go`, `images_pull.go`, `image_seed.go`, `kernels.go`)
+
+- `imageOpsMu sync.Mutex` — the publication-window lock. Held only
+  across the recheck-name + atomic-rename + UpsertImage commit atom.
+  Slow work (network fetch, ext4 build, SSH-key seeding) runs unlocked.
+- Test seams `pullAndFlatten`, `finalizePulledRootfs`, `bundleFetch`
+  are struct fields (not package globals), so tests inject per-instance
+  fakes.
+
+### `WorkspaceService` (`workspace_service.go`, `workspace.go`, `vm_authsync.go`)
 
-- Layout, config, store, runner, logger, pid — infrastructure handles.
-- `vmLocks vmLockSet` — per-VM `*sync.Mutex`, one per VM ID. Held for
-  the **entire lifecycle op** on that VM: a `start` holds it across
-  preflight, bridge setup, firecracker spawn, and post-boot wiring
-  (seconds to tens of seconds). Two `start`/`stop`/`delete`/`set` calls
-  against the same VM therefore serialise; calls against different VMs
-  run independently. If you need a slow guest-side operation to NOT
-  block lifecycle ops on the same VM, scope it out of the lock
-  explicitly the way `workspace.prepare` does (see below).
 - `workspaceLocks vmLockSet` — per-VM mutex scoped to
   `workspace.prepare` / `workspace.export`. These ops acquire
-  `vmLocks[id]` only long enough to validate VM state + snapshot the
-  fields they need, release it, then acquire `workspaceLocks[id]` for
-  the slow guest I/O phase. That keeps `vm stop` / `delete` / `restart`
-  from queueing behind a running tar import.
-- `handles *handleCache` — in-memory map of per-VM transient kernel/
-  process handles (PID, tap device, loop devices, DM target). The
-  cache is rebuildable: each VM directory holds a small
-  `handles.json` scratch file that the daemon reads at startup to
-  reconstruct the cache and verify processes against `/proc` via
-  pgrep. Nothing in the durable `vms` SQLite row describes transient
-  kernel state. See `internal/daemon/vm_handles.go`.
+  `vmLocks[id]` (on VMService) only long enough to validate VM state
+  and snapshot the fields they need, then release it and acquire
+  `workspaceLocks[id]` for the slow guest I/O phase. That keeps
+  `vm stop` / `delete` / `restart` from queueing behind a running tar
+  import.
+- Test seams `workspaceInspectRepo`, `workspaceImport` are per-instance
+  fields.
+
+### `VMService` (`vm_service.go`, `vm_lifecycle.go`, `vm_create.go`, `vm_create_ops.go`, `vm_stats.go`, `vm_set.go`, `vm_disk.go`, `vm_handles.go`, `vm_authsync.go` (via WorkspaceService), `preflight.go`, `ports.go`, `vm.go`)
+
+- `vmLocks vmLockSet` — per-VM `*sync.Mutex`, one per VM ID. Held for
+  the **entire lifecycle op** on that VM: `start` holds it across
+  preflight, bridge setup, firecracker spawn, and post-boot wiring
+  (seconds to tens of seconds). Two `start`/`stop`/`delete`/`set`
+  calls against the same VM therefore serialise; calls against
+  different VMs run independently.
 - `createVMMu sync.Mutex` — narrow **reservation** mutex. `CreateVM`
   resolves the image (possibly auto-pulling, which self-locks on
   `imageOpsMu`) and parses sizing flags outside this lock, then holds
   `createVMMu` only to re-check that the requested VM name is still
   free, allocate the next guest IP, and insert the initial "created"
   row. The subsequent boot flow runs under the per-VM lock only.
-  Parallel `vm create` calls therefore overlap on image resolution and
-  boot; they contend only across the millisecond-scale name+IP claim.
-- `imageOpsMu sync.Mutex` — narrow **publication** mutex. `PullImage`
-  (both bundle and OCI paths), `RegisterImage`, `PromoteImage`, and
-  `DeleteImage` do their slow work (network fetch, ext4 build,
-  ownership fixup, file copy, SSH-key seeding) without this lock and
-  acquire it only for the commit atom: recheck name free, atomic
-  rename of the staging dir to its final home, upsert the store row.
-  Two pulls for different images run fully in parallel; two pulls that
-  race to the same name are resolved at the recheck — the loser fails
-  fast and its staging dir is cleaned up.
-- `createOps opstate.Registry[*vmCreateOperationState]` — in-flight VM
-  create operations; owns its own lock.
-- `tapPool tapPool` — TAP interface pool; owns its own lock.
-- `listener`, `vmDNS` — networking.
-- `vmCaps` — registered VM capability hooks.
-- `pullAndFlatten`, `finalizePulledRootfs`, `bundleFetch`,
-  `requestHandler`, `guestWaitForSSH`, `guestDial`,
-  `workspaceInspectRepo`, `workspaceImport` — injectable seams used by tests.
+- `createOps opstate.Registry[*vmCreateOperationState]` — in-flight
+  async create operations; owns its own lock.
+- `handles *handleCache` — in-memory map of per-VM transient kernel/
+  process handles (PID, tap device, loop devices, DM target). Each
+  VM directory holds a small `handles.json` scratch file so the
+  cache can be rebuilt at daemon startup.
+- Test seams `guestWaitForSSH`, `guestDial` are per-instance fields.
 
 ## Subpackages
 
-Stateless helpers that don't need the `Daemon` composition root have
-been lifted into subpackages. Lifecycle orchestration, image-registry
-orchestration, host networking bootstrap, background reconciliation,
-and the JSON-RPC dispatch all still live in this package — it is not
-"just orchestration." ~29 files and ~130 `func (d *Daemon)` methods
-share the root struct today. A future project would be to split VM
-lifecycle, image management, and the background reconciler into
-services with explicit interfaces; that's out of scope for v0.1.0.
-
-Each subpackage takes explicit dependencies (typically a
+Stateless helpers with no need for a service pointer live in
+subpackages. Each takes explicit dependencies (typically a
 `system.Runner`-compatible interface) and holds no global state beyond
 small test seams.
 
-| Subpackage                        | Purpose                                                                |
-| --------------------------------- | ---------------------------------------------------------------------- |
-| `internal/daemon/opstate`         | Generic `Registry[T AsyncOp]` for async-operation bookkeeping.         |
-| `internal/daemon/dmsnap`          | Device-mapper COW snapshot create/cleanup/remove.                      |
-| `internal/daemon/fcproc`          | Firecracker process primitives (bridge, tap, binary, PID, kill, wait). |
-| `internal/daemon/imagemgr`        | Image subsystem pure helpers: validators, staging, build script gen.   |
-| `internal/daemon/workspace`       | Workspace helpers: git inspection, copy prep, guest import script.     |
+| Subpackage                   | Purpose                                                                |
+| ---------------------------- | ---------------------------------------------------------------------- |
+| `internal/daemon/opstate`    | Generic `Registry[T AsyncOp]` for async-operation bookkeeping.         |
+| `internal/daemon/dmsnap`     | Device-mapper COW snapshot create/cleanup/remove.                      |
+| `internal/daemon/fcproc`     | Firecracker process primitives (bridge, tap, binary, PID, kill, wait). |
+| `internal/daemon/imagemgr`   | Image subsystem pure helpers: validators, staging, build script gen.   |
+| `internal/daemon/workspace`  | Workspace helpers: git inspection, copy prep, guest import script.     |
 
 All subpackages are leaves — no intra-daemon subpackage imports another.
 
 ## Lock ordering
 
-Acquire in this order, release in reverse. Never acquire in the opposite
-direction.
+Acquire in this order, release in reverse. Never acquire in the
+opposite direction.
 
 ```
-vmLocks[id] → workspaceLocks[id] → {createVMMu, imageOpsMu} → subsystem-local locks
+VMService.vmLocks[id] → WorkspaceService.workspaceLocks[id]
+  → {VMService.createVMMu, ImageService.imageOpsMu}
+  → subsystem-local locks
 ```
 
 `vmLocks[id]` and `workspaceLocks[id]` are NEVER held at the same
@@ -98,14 +139,15 @@ for the guest I/O phase. Regular lifecycle ops (`start`, `stop`,
 `delete`, `set`) do NOT do this split — they hold `vmLocks[id]` across
 the whole flow.
 
-Subsystem-local locks (`tapPool.mu`, `opstate.Registry` mu) are leaves.
-They do not contend with each other.
+Subsystem-local locks (`tapPool.mu`, `opstate.Registry` mu,
+`handleCache.mu`) are leaves. They do not contend with each other.
 
 Notes:
 
-- `vmLocks[id]` is the outer lock for any operation scoped to a single VM.
-  Acquired via `withVMLockByID` / `withVMLockByRef`. The callback runs
-  under the lock — treat the whole function body as critical section.
+- `vmLocks[id]` is the outer lock for any operation scoped to a single
+  VM. Acquired via `VMService.withVMLockByID` / `withVMLockByRef`. The
+  callback runs under the lock — treat the whole function body as
+  critical section.
 - `createVMMu` is held only across the VM-name reservation + IP
   allocation + initial UpsertVM. Image resolution and the full boot
   flow happen outside it.
@@ -117,6 +159,14 @@ Notes:
   discouraged; copy needed state out under the lock and release before
   blocking I/O.
 
+## Reconcile and background work
+
+`Daemon.reconcile(ctx)` is the orchestrator run at startup. It
+rehydrates the handle cache, reaps stale VMs, and republishes DNS
+records. `Daemon.backgroundLoop()` is the ticker fan-out —
+`VMService.pollStats`, `VMService.stopStaleVMs`, and
+`VMService.pruneVMCreateOperations` run on independent tickers.
+
 ## External API
 
 Only `internal/cli` imports this package. The surface is:
@@ -126,5 +176,6 @@ Only `internal/cli` imports this package. The surface is:
 - `(*Daemon).Close() error`
 - `daemon.Doctor(...)` — host diagnostics (no receiver).
 
-All other `*Daemon` methods are reached only through the RPC `dispatch`
-switch in `daemon.go` and are free to move/rename during refactoring.
+All other methods live on the four services and are reached only
+through the RPC `dispatch` switch in `daemon.go`. They are free to
+move/rename during refactoring.
diff --git a/internal/daemon/doc.go b/internal/daemon/doc.go
index 2c12cd1..784c5c6 100644
--- a/internal/daemon/doc.go
+++ b/internal/daemon/doc.go
@@ -1,76 +1,74 @@
 // Package daemon hosts the Banger daemon process.
 //
-// The daemon exposes a JSON-RPC endpoint over a Unix socket. It owns VM
-// lifecycle, image management, host networking bootstrap, and state
-// persistence via internal/store.
+// The daemon exposes a JSON-RPC endpoint over a Unix socket. The
+// *Daemon type is a thin composition root: it holds shared
+// infrastructure (store, runner, logger, layout, config, listener)
+// plus pointers to four focused services and forwards RPCs to them.
 //
-// The package is organised into cohesive groups. Pure stateless helpers for
-// each group have been lifted into subpackages; orchestrator methods
-// (Daemon receivers) stay here and compose them.
+// Services:
 //
-// Subpackages:
+//	*HostNetwork       Bridge / tap pool / NAT / DNS / firecracker
+//	                   process / DM snapshots / vsock readiness.
+//	                   Owns tapPool and vmDNS.
+//	*ImageService      Register / promote / delete / pull (bundle +
+//	                   OCI) / kernel catalog / managed-seed refresh.
+//	                   Owns imageOpsMu.
+//	*WorkspaceService  workspace.prepare / workspace.export + the
+//	                   per-VM authorised-key and git-identity sync
+//	                   that runs at start. Owns workspaceLocks.
+//	*VMService         VM lifecycle (create/start/stop/restart/kill/
+//	                   delete/set), stats, ports, preflight. Owns
+//	                   vmLocks, createVMMu, createOps, handles.
 //
-//	internal/daemon/opstate    Generic Registry[T AsyncOp] for async
-//	                           operations (VM create).
+// Subpackages (stateless helpers):
+//
+//	internal/daemon/opstate    Generic Registry[T AsyncOp].
 //	internal/daemon/dmsnap     Device-mapper COW snapshot lifecycle.
-//	internal/daemon/fcproc     Firecracker process helpers: bridge/tap,
-//	                           binary resolution, PID lookup, wait/kill.
-//	internal/daemon/imagemgr   Image subsystem helpers: path validation,
-//	                           artifact staging, guest provisioning script
-//	                           generator, metadata.
-//	internal/daemon/workspace  Workspace helpers: git repo inspection,
-//	                           shallow copy prep, guest-side import,
-//	                           finalize script generation, shell quoting.
+//	internal/daemon/fcproc     Firecracker process helpers.
+//	internal/daemon/imagemgr   Image subsystem helpers.
+//	internal/daemon/workspace  Workspace helpers.
 //
-// VM lifecycle (in this package):
+// File inventory:
 //
-//	vm_create.go          CreateVM and create-time disk provisioning
-//	vm_lifecycle.go       Start/Stop/Restart/Kill/Delete
-//	vm_set.go             SetVM mutation
-//	vm_stats.go           stats, health, ping, stale reaper
-//	vm_disk.go            system overlay, work disk provisioning
-//	vm_authsync.go        per-VM authorized_key, git identity, auth file sync
-//	vm_create_ops.go      async begin/status/cancel (uses opstate.Registry)
-//	vm_locks.go           vmLockSet: per-VM mutex set
-//	vm.go                 fcproc forwarders, DNS helpers, small utilities
-//	capabilities.go       pluggable capability hooks executed at VM start
-//	preflight.go          prereq validation for VM start
-//	snapshot.go           dmsnap forwarders + dmSnapshotHandles type alias
-//	ports.go              port forwarding inspection
+//	daemon.go             Composition root, Open/Close/Serve, dispatch,
+//	                      reconcile orchestrator, backgroundLoop.
+//	host_network.go       HostNetwork struct + constructor.
+//	image_service.go      ImageService struct + constructor + FindImage.
+//	workspace_service.go  WorkspaceService struct + constructor.
+//	vm_service.go         VMService struct + constructor + FindVM,
+//	                      TouchVM, withVMLock* family, lockVMID.
 //
-// Image management (in this package):
+//	nat.go, dns_routing.go, tap_pool.go, snapshot.go      HostNetwork methods.
+//	images.go, images_pull.go, image_seed.go, kernels.go  ImageService methods.
+//	workspace.go, vm_authsync.go                          WorkspaceService methods.
+//	vm_lifecycle.go, vm_create.go, vm_create_ops.go,
+//	vm_stats.go, vm_set.go, vm_disk.go, vm_handles.go,
+//	ports.go, preflight.go                                VMService methods.
 //
-//	images.go             register, promote, delete, find, list
-//	images_pull.go        image pull: catalog (bundle) + OCI paths
-//	image_seed.go         managed work-seed SSH fingerprint refresh
-//
-// Guest interaction (in this package):
-//
-//	guest_ssh.go          guestSSHClient, dialGuest, waitForGuestSSH
-//	ssh_client_config.go  daemon-managed SSH client key material
-//	workspace.go          ExportVMWorkspace, PrepareVMWorkspace
-//
-// Host bootstrap (in this package):
-//
-//	nat.go                NAT prereq registration
-//	dns_routing.go        systemd-resolved per-interface routing
-//	tap_pool.go           TAP interface pool (state in tapPool type)
-//
-// Core (in this package):
-//
-//	daemon.go             Daemon struct, Open/Close/Serve, dispatch
-//	doctor.go             host diagnostics
-//	logger.go             slog configuration
-//	runtime_assets.go     paths to bundled companion binaries
+//	vm.go                 Cross-service constants, rebuildDNS /
+//	                      cleanupRuntime / generateName (*VMService),
+//	                      and small stateless utilities.
+//	capabilities.go       Pluggable capability hooks executed at VM
+//	                      start. Hook methods take *Daemon; VMService
+//	                      reaches them through a capabilityHooks seam.
+//	vm_locks.go           vmLockSet primitive.
+//	guest_ssh.go          guestSSHClient, dialGuest, waitForGuestSSH.
+//	ssh_client_config.go  Daemon-managed SSH client key material.
+//	doctor.go             Host diagnostics.
+//	logger.go             slog configuration.
+//	runtime_assets.go     Companion-binary paths.
 //
 // Lock ordering:
 //
-//	vmLocks[id] → workspaceLocks[id] → {createVMMu, imageOpsMu} → subsystem-local locks
+//	VMService.vmLocks[id] → WorkspaceService.workspaceLocks[id]
+//	  → {VMService.createVMMu, ImageService.imageOpsMu}
+//	  → subsystem-local locks
 //
-// vmLocks[id] is held across entire lifecycle ops (start/stop/delete/set),
-// not just a validation window — callers that want to avoid blocking
-// lifecycle on slow guest I/O must explicitly split off to
-// workspaceLocks[id] the way workspace.prepare does. Subsystem-local
-// locks (tapPool.mu, opstate.Registry mu) are leaves and do not contend
-// with each other. See ARCHITECTURE.md for details.
+// vmLocks[id] and workspaceLocks[id] are NEVER held at the same
+// time. workspace.prepare acquires vmLocks[id] only long enough to
+// validate VM state, releases it, then acquires workspaceLocks[id]
+// for the slow guest I/O phase. Lifecycle ops (start/stop/delete/
+// set) hold vmLocks[id] across the whole flow. Subsystem-local
+// locks (tapPool.mu, opstate.Registry mu, handleCache.mu) are
+// leaves. See ARCHITECTURE.md for details.
 package daemon
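
Reviewer note: the lock split that both files describe (hold `vmLocks[id]` only to validate and snapshot, release it, then take `workspaceLocks[id]` for the slow guest I/O) may be easier to check against a small sketch. The code below is not taken from this change. `lockSet`, `vmState`, `daemonSketch`, and `prepareWorkspace` are hypothetical stand-ins for `vmLockSet`, the VM row, the services, and `workspace.prepare`, and the real signatures are assumed to differ.

```go
package main

import (
	"fmt"
	"sync"
)

// lockSet is a hypothetical stand-in for vmLockSet: one mutex per VM ID.
type lockSet struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

// get returns the mutex for id, creating it on first use.
func (s *lockSet) get(id string) *sync.Mutex {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.locks == nil {
		s.locks = make(map[string]*sync.Mutex)
	}
	m, ok := s.locks[id]
	if !ok {
		m = &sync.Mutex{}
		s.locks[id] = m
	}
	return m
}

// vmState is an illustrative snapshot of the fields the slow phase needs.
type vmState struct {
	Running bool
	GuestIP string
}

// daemonSketch bundles the two lock sets for brevity; in the diff they
// live on VMService and WorkspaceService respectively.
type daemonSketch struct {
	vmLocks        lockSet
	workspaceLocks lockSet
	vms            map[string]vmState
}

// prepareWorkspace validates and snapshots under vmLocks[id], releases it,
// and only then takes workspaceLocks[id] for the slow phase. The two locks
// are never held at the same time.
func (d *daemonSketch) prepareWorkspace(id string, slow func(vmState) error) error {
	vmMu := d.vmLocks.get(id)
	vmMu.Lock()
	snap, ok := d.vms[id]
	vmMu.Unlock()

	if !ok {
		return fmt.Errorf("vm %q not found", id)
	}
	if !snap.Running {
		return fmt.Errorf("vm %q is not running", id)
	}

	wsMu := d.workspaceLocks.get(id)
	wsMu.Lock()
	defer wsMu.Unlock()
	return slow(snap)
}

func main() {
	d := &daemonSketch{vms: map[string]vmState{
		"vm1": {Running: true, GuestIP: "10.0.0.2"},
	}}
	err := d.prepareWorkspace("vm1", func(s vmState) error {
		// vmLocks["vm1"] is free during the slow phase, so a concurrent
		// stop/delete on the same VM would not queue behind this call.
		mu := d.vmLocks.get("vm1")
		if !mu.TryLock() {
			return fmt.Errorf("vm lock unexpectedly held")
		}
		mu.Unlock()
		fmt.Println("importing into", s.GuestIP)
		return nil
	})
	fmt.Println("err:", err)
}
```

The `TryLock` call inside the callback is only there to demonstrate that the per-VM lifecycle lock is free during the slow phase. A lifecycle op in this model would instead hold `vmLocks[id]` for its whole body, as the notes above state.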