Separates what a VM IS (durable intent + identity + deterministic
derived paths — `VMRuntime`) from what is CURRENTLY TRUE about it
(firecracker PID, tap device, loop devices, dm-snapshot target — new
`VMHandles`). The durable state lives in the SQLite `vms` row; the
transient state lives in an in-memory cache on the daemon plus a
per-VM `handles.json` scratch file inside VMDir, rebuilt at startup
from OS inspection. Nothing kernel-level rides the SQLite schema
anymore.
Why:
Persisting ephemeral process handles to SQLite forced reconcile to
treat "running with a stale PID" as a first-class case and mix it
with real state transitions. The schema described what we last
observed, not what the VM is. Every time the observation model
shifted (tap pool, DM naming, pgrep fallback) the reconcile logic
grew a new branch. Splitting lets each layer own what it's good at:
durable records describe intent, in-memory cache + scratch file
describe momentary reality.
Shape:
- `model.VMHandles` = PID, TapDevice, BaseLoop, COWLoop, DMName,
DMDev. Never in SQLite.
- `VMRuntime` keeps: State, GuestIP, APISockPath, VSockPath,
VSockCID, LogPath, MetricsPath, DNSName, VMDir, SystemOverlay,
WorkDiskPath, LastError. All durable or deterministic.
- `handleCache` on `*Daemon` — mutex-guarded map + scratch-file
plumbing (`writeHandlesFile` / `readHandlesFile` /
`rediscoverHandles`). See `internal/daemon/vm_handles.go`.
- `d.vmAlive(vm)` replaces the 20+ inline
`vm.State==Running && ProcessRunning(vm.Runtime.PID, apiSock)`
checks scattered across the daemon. Single source of truth for
liveness.
- Startup reconcile: per running VM, load the scratch file, pgrep
the api sock, either keep (cache seeded from scratch) or demote
to stopped (scratch handles passed to cleanupRuntime first so DM
/ loops / tap actually get torn down).
Verification:
- `go test ./...` green.
- Live: `banger vm run --name handles-test -- cat /etc/hostname`
starts; `handles.json` appears in VMDir with the expected PID,
tap, loops, DM.
- `kill -9 $(pgrep bangerd)` while the VM is running, then re-invoke
the CLI: the daemon auto-starts, reconcile recognises the VM as
alive, `banger vm ssh` still connects, and `banger vm delete`
cleans up.
Tests added:
- vm_handles_test.go: scratch-file roundtrip, missing/corrupt file
behaviour, cache concurrency, rediscoverHandles prefers pgrep
over scratch, returns scratch contents even when process is
dead (so cleanup can tear down kernel state).
- vm_test.go: reconcile test rewritten to exercise the new flow
(write scratch → reconcile reads it → verifies process is gone →
issues dmsetup/losetup teardown).
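The reconcile decision these tests exercise can be sketched roughly as
follows. The `reconcileDeps` seams, the `outcome` type, and the function
name `reconcileRunning` are hypothetical stand-ins for the real
`readHandlesFile`, pgrep lookup, and `cleanupRuntime`. Only the keep/demote
logic follows the description in this message.

```go
package main

import "fmt"

// VMHandles is trimmed to three fields for the sketch.
type VMHandles struct {
	PID       int
	TapDevice string
	DMName    string
}

// reconcileDeps bundles hypothetical injectable seams.
type reconcileDeps struct {
	readScratch func(vmDir string) (VMHandles, bool) // stands in for readHandlesFile
	pgrepPID    func(apiSock string) (int, bool)      // live PID lookup by API socket
	cleanup     func(h VMHandles)                     // stands in for cleanupRuntime
}

// outcome is what reconcile decided for one VM recorded as running.
type outcome struct {
	State   string // "running" or "stopped"
	Handles VMHandles
	Cached  bool // true when the handle cache should be seeded
}

func reconcileRunning(d reconcileDeps, vmDir, apiSock string) outcome {
	h, _ := d.readScratch(vmDir) // zero value when the file is missing
	if pid, alive := d.pgrepPID(apiSock); alive {
		h.PID = pid // the pgrep result wins over the recorded PID
		return outcome{State: "running", Handles: h, Cached: true}
	}
	// Process is gone: tear down kernel state using the scratch handles
	// before the VM is demoted to stopped.
	d.cleanup(h)
	return outcome{State: "stopped"}
}

func main() {
	var cleaned []VMHandles
	deps := reconcileDeps{
		readScratch: func(string) (VMHandles, bool) {
			return VMHandles{PID: 100, TapDevice: "tap0", DMName: "vm-snap"}, true
		},
		pgrepPID: func(string) (int, bool) { return 0, false }, // process is dead
		cleanup:  func(h VMHandles) { cleaned = append(cleaned, h) },
	}
	out := reconcileRunning(deps, "/tmp/vm", "/tmp/vm/api.sock")
	fmt.Println(out.State, len(cleaned), cleaned[0].DMName)
}
```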
ARCHITECTURE.md updated; `handles` added to Daemon field docs.
# `internal/daemon` architecture

This document describes the current daemon package layout: the `Daemon`
composition root, the subpackages that own stateless helpers and shared
primitives, and the lock ordering every caller must respect.

## Composition

`Daemon` is the composition root. Subsystem state and locks live on their
owning types:

- Layout, config, store, runner, logger, pid: infrastructure handles.
- `vmLocks vmLockSet`: per-VM `*sync.Mutex`, one per VM ID. Held only
  across short, synchronous state validation and DB mutations so slow
  guest I/O does not block lifecycle ops on the same VM.
- `workspaceLocks vmLockSet`: per-VM mutex scoped to
  `workspace.prepare` / `workspace.export`. Serialises concurrent
  workspace operations on a single VM (two simultaneous tar imports
  would clobber each other) without touching `vmLocks`, so
  `vm stop` / `delete` / `restart` never queue behind a slow import.
- `handles *handleCache`: in-memory map of per-VM transient kernel/
  process handles (PID, tap device, loop devices, DM target). The
  cache is rebuildable: each VM directory holds a small
  `handles.json` scratch file that the daemon reads at startup to
  reconstruct the cache and verify processes against `/proc` via
  pgrep. Nothing in the durable `vms` SQLite row describes transient
  kernel state. See `internal/daemon/vm_handles.go`.
- `createVMMu sync.Mutex`: serialises `CreateVM` (guards name
  uniqueness + guest IP allocation window).
- `imageOpsMu sync.Mutex`: serialises image-registry mutations
  (`PullImage`, `RegisterImage`, `PromoteImage`, `DeleteImage`).
- `createOps opstate.Registry[*vmCreateOperationState]`: in-flight VM
  create operations; owns its own lock.
- `tapPool tapPool`: TAP interface pool; owns its own lock.
- `sessions sessionRegistry`: active guest session controllers; owns
  its own lock.
- `listener`, `webListener`, `webServer`, `webURL`, `vmDNS`: networking.
- `vmCaps`: registered VM capability hooks.
- `pullAndFlatten`, `finalizePulledRootfs`, `bundleFetch`,
  `requestHandler`, `guestWaitForSSH`, `guestDial`,
  `waitForGuestSessionReady`: injectable seams used by tests.

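As an illustration of the `handles` entry above, a mutex-guarded handle
cache could look roughly like this. The method names (`get`, `set`,
`clear`), the map key type, and the trimmed `VMHandles` struct are
assumptions made for the sketch; the real type lives in
`internal/daemon/vm_handles.go`.

```go
package main

import (
	"fmt"
	"sync"
)

// VMHandles is trimmed to two fields for the sketch.
type VMHandles struct {
	PID       int
	TapDevice string
}

// handleCache is an in-memory, rebuildable map keyed by VM ID.
type handleCache struct {
	mu   sync.Mutex
	byID map[string]VMHandles
}

func newHandleCache() *handleCache {
	return &handleCache{byID: make(map[string]VMHandles)}
}

// get returns a copy of the handles, so callers never share map state.
func (c *handleCache) get(id string) (VMHandles, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	h, ok := c.byID[id]
	return h, ok
}

func (c *handleCache) set(id string, h VMHandles) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.byID[id] = h
}

func (c *handleCache) clear(id string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.byID, id)
}

func main() {
	c := newHandleCache()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			c.set(fmt.Sprintf("vm-%d", n), VMHandles{PID: 1000 + n})
		}(i)
	}
	wg.Wait()
	h, ok := c.get("vm-3")
	fmt.Println(h.PID, ok)
}
```

Because the struct holds only value fields, returning it by value from
`get` gives each caller a private snapshot and keeps the lock a leaf.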
## Subpackages

Pure helpers have moved into subpackages so the daemon package itself stays
focused on orchestration. Each subpackage takes explicit dependencies
(typically a `system.Runner`-compatible interface) and holds no global
state beyond small test seams.

| Subpackage                  | Purpose                                                                |
| --------------------------- | ---------------------------------------------------------------------- |
| `internal/daemon/opstate`   | Generic `Registry[T AsyncOp]` for async-operation bookkeeping.         |
| `internal/daemon/dmsnap`    | Device-mapper COW snapshot create/cleanup/remove.                      |
| `internal/daemon/fcproc`    | Firecracker process primitives (bridge, tap, binary, PID, kill, wait). |
| `internal/daemon/imagemgr`  | Image subsystem pure helpers: validators, staging, build script gen.   |
| `internal/daemon/session`   | Guest-session helpers: state paths, scripts, parsing, utilities.       |
| `internal/daemon/workspace` | Workspace helpers: git inspection, copy prep, guest import script.     |

`workspace` imports `session` for `ShellQuote` and `FormatStepError`; all
other subpackages are leaves (no other intra-daemon subpackage imports).

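To make the `opstate` row concrete, here is a hedged sketch of a generic
registry that owns its own lock. The `AsyncOp` method set (`OpID`), the
`Insert` / `Get` / `Remove` names, and the `createOp` type are all
hypothetical. The source only states that `Registry[T AsyncOp]` exists and
does async-operation bookkeeping.

```go
package main

import (
	"fmt"
	"sync"
)

// AsyncOp is assumed to expose a stable identifier.
type AsyncOp interface {
	OpID() string
}

// Registry tracks in-flight operations; its zero value is ready to use.
type Registry[T AsyncOp] struct {
	mu  sync.Mutex
	ops map[string]T
}

func (r *Registry[T]) Insert(op T) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.ops == nil {
		r.ops = make(map[string]T)
	}
	r.ops[op.OpID()] = op
}

func (r *Registry[T]) Get(id string) (T, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	op, ok := r.ops[id]
	return op, ok
}

func (r *Registry[T]) Remove(id string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.ops, id)
}

// createOp is a hypothetical operation type used only for this example.
type createOp struct {
	id, stage string
}

func (o *createOp) OpID() string { return o.id }

func main() {
	var reg Registry[*createOp]
	reg.Insert(&createOp{id: "op-1", stage: "pulling image"})
	if op, ok := reg.Get("op-1"); ok {
		fmt.Println(op.stage)
	}
	reg.Remove("op-1")
	_, ok := reg.Get("op-1")
	fmt.Println(ok)
}
```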
## Lock ordering

Acquire in this order, release in reverse. Never acquire in the opposite
direction.

```
vmLocks[id] → workspaceLocks[id] → {createVMMu, imageOpsMu} → subsystem-local locks
```

`vmLocks[id]` and `workspaceLocks[id]` are NEVER held at the same
time. `workspace.prepare` acquires `vmLocks[id]` just long enough to
validate VM state, releases it, then acquires `workspaceLocks[id]`
for the guest I/O phase.

Subsystem-local locks (`tapPool.mu`, `sessionRegistry.mu`,
`opstate.Registry` mu, `guestSessionController.attachMu` /
`writeMu`) are leaves. They do not contend with each other.

Notes:

- `vmLocks[id]` is the outer lock for any operation scoped to a single VM.
  Acquired via `withVMLockByID` / `withVMLockByRef`.
- `createVMMu` and `imageOpsMu` are narrow: each guards one family of
  mutations and is released before any blocking guest I/O.
- Holding a subsystem-local lock while calling into guest SSH is
  discouraged; copy needed state out under the lock and release before
  blocking I/O.

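The two-phase pattern that `workspace.prepare` follows can be sketched as
below. The `vmLockSet` internals, the `with` helper, the `daemon` struct,
and the `running` map (a stand-in for the store) are assumptions for
illustration. Only the "validate under `vmLocks[id]`, release, then take
`workspaceLocks[id]`" sequence comes from the text above.

```go
package main

import (
	"fmt"
	"sync"
)

// vmLockSet hands out one mutex per VM ID, created on first use.
type vmLockSet struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func (s *vmLockSet) lockFor(id string) *sync.Mutex {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.locks == nil {
		s.locks = make(map[string]*sync.Mutex)
	}
	m, ok := s.locks[id]
	if !ok {
		m = &sync.Mutex{}
		s.locks[id] = m
	}
	return m
}

// with runs fn while holding the per-VM lock for id.
func (s *vmLockSet) with(id string, fn func() error) error {
	m := s.lockFor(id)
	m.Lock()
	defer m.Unlock()
	return fn()
}

type daemon struct {
	vmLocks        vmLockSet
	workspaceLocks vmLockSet
	running        map[string]bool // stand-in for the durable store
}

func (d *daemon) prepareWorkspace(id string, slowImport func() error) error {
	// Phase 1: short state validation under vmLocks[id].
	err := d.vmLocks.with(id, func() error {
		if !d.running[id] {
			return fmt.Errorf("vm %s is not running", id)
		}
		return nil
	})
	if err != nil {
		return err
	}
	// vmLocks[id] is released here; the two locks are never held together.
	// Phase 2: slow guest I/O under workspaceLocks[id] only.
	return d.workspaceLocks.with(id, slowImport)
}

func main() {
	d := &daemon{running: map[string]bool{"vm-1": true}}
	err := d.prepareWorkspace("vm-1", func() error {
		// Another goroutine (say, a stop request) can take vmLocks[id]
		// while the import is still in progress.
		done := make(chan struct{})
		go func() {
			defer close(done)
			_ = d.vmLocks.with("vm-1", func() error {
				fmt.Println("lifecycle op proceeded during import")
				return nil
			})
		}()
		<-done
		return nil
	})
	fmt.Println(err)
	fmt.Println(d.prepareWorkspace("vm-2", func() error { return nil }))
}
```

The VM may change state between the two phases, so a real implementation
would need the I/O phase to tolerate a VM that stopped after validation.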
## External API

Only `internal/cli` imports this package. The surface is:

- `daemon.Open(ctx) (*Daemon, error)`
- `(*Daemon).Serve(ctx) error`
- `(*Daemon).Close() error`
- `daemon.Doctor(...)`: host diagnostics (no receiver).

All other `*Daemon` methods are reached only through the RPC `dispatch`
switch in `daemon.go` and are free to move/rename during refactoring.

## Web UI

The optional web UI served at `web_listen_addr` is experimental. It is
enabled by default for local observability but is not considered a stable
or supported interface. Set `web_listen_addr = ""` in config to disable.