internal/daemon architecture

This document describes the current daemon package layout: the Daemon composition root, the four services it wires together, the subpackages that own stateless helpers, and the lock ordering every caller must respect.

Composition

Daemon is a thin composition root. It holds shared infrastructure (store, runner, logger, layout, config, listener) plus pointers to four focused services. RPC dispatch is a pure forwarder into those services; no lifecycle / image / workspace / networking behaviour lives on *Daemon itself.

Daemon
├── *HostNetwork      — bridge, tap pool, NAT, DNS, firecracker process,
│                       DM snapshots, vsock readiness
├── *ImageService     — register, promote, delete, pull (bundle + OCI),
│                       kernel catalog, managed-seed refresh
├── *WorkspaceService — workspace.prepare / workspace.export, auth-key
│                       + git-identity sync onto the work disk
└── *VMService        — VM lifecycle (create/start/stop/restart/kill/
                        delete/set), stats polling, ports query,
                        handle cache, per-VM lock set, create-op
                        registry, preflight validation

Each service owns its own state. Cross-service calls go through narrow consumer-defined seams:

  • WorkspaceService does not hold a *VMService pointer. It takes function-typed deps (vmResolver, aliveChecker, withVMLockByRef, imageResolver, imageWorkSeed) so it sees exactly the operations it needs and nothing more. Those deps are captured as closures so construction-order cycles don't recur. Both seam styles are sketched just after this list.
  • VMService holds direct pointers to *HostNetwork, *ImageService, and *WorkspaceService. Orchestrating a VM start really does compose all three (bridge + tap + image resolution + work-disk sync), and declaring a function-typed interface for every call would balloon the surface for no win — services are unexported, so package-external code can never reach them.
  • Capability hooks do not take *Daemon. Each capability is a struct with explicit service-pointer fields (workDiskCapability{vm, ws, store, defaultImageName}, dnsCapability{net}, natCapability{vm, net, logger}) populated at wiring time. VMService invokes them through a capabilityHooks struct (function-typed bag) populated at construction; neither the service nor any capability has a *Daemon pointer.
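A minimal sketch of both seam styles follows. The field names come from this document; the placeholder types and exact signatures are assumptions made for illustration, not the real code.

// Sketch only — placeholder types so the snippet stands alone.
package daemon

import "context"

type vmRef struct{ ID, Name string } // stand-in for the real VM reference type
type HostNetwork struct{}            // stub, real fields elided

// WorkspaceService never holds a *VMService pointer. It sees VM state only
// through function-typed deps, wired as closures so construction order
// doesn't matter.
type WorkspaceService struct {
	vmResolver      func(ctx context.Context, name string) (vmRef, error)
	aliveChecker    func(ctx context.Context, ref vmRef) (bool, error)
	withVMLockByRef func(ref vmRef, fn func() error) error
}

// A capability is a plain struct carrying exactly the service pointers it
// needs; nothing here can reach *Daemon.
type dnsCapability struct {
	net *HostNetwork
}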

Services + capabilities are built eagerly by wireServices(d), called once from Daemon.Open after the composition root's infrastructure is populated, and once per test that constructs a &Daemon{...} literal. Tests that want to stub a particular service or the capability list assign the field before calling wireServices — the helper is idempotent and skips anything already set.
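A self-contained sketch of that wiring pattern, with placeholder service types (the real constructors and fields differ):

// Sketch only — all types below are placeholders for the real services.
package daemon

type hostNetwork struct{}
type imageService struct{}
type workspaceService struct{}
type vmService struct {
	net        *hostNetwork
	images     *imageService
	workspaces *workspaceService
}

type Daemon struct {
	net        *hostNetwork
	images     *imageService
	workspaces *workspaceService
	vms        *vmService
}

// wireServices fills in only what is still nil, so a test can stub any field
// with a fake before calling it, and calling it twice is harmless.
func wireServices(d *Daemon) {
	if d.net == nil {
		d.net = &hostNetwork{}
	}
	if d.images == nil {
		d.images = &imageService{}
	}
	if d.workspaces == nil {
		d.workspaces = &workspaceService{}
	}
	if d.vms == nil {
		d.vms = &vmService{net: d.net, images: d.images, workspaces: d.workspaces}
	}
}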

Service state

HostNetwork (host_network.go, nat.go, dns_routing.go, tap_pool.go, snapshot.go)

  • tapPool — TAP interface pool, owns its own lock.
  • vmDNS *vmdns.Server — in-process DNS server for .vm names.
  • No direct VM-state access. Where an operation needs a VM's tap name (e.g. ensureNAT), the signature takes guestIP + tap string so the caller (VMService) resolves them first.
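Illustrative signature only (the real parameter list may differ): the point is that HostNetwork receives network-level inputs the caller has already resolved, never a VM record.

// Sketch only — the NAT body is elided and the exact signature is assumed.
package daemon

import (
	"context"
	"net/netip"
)

type HostNetwork struct{}

// ensureNAT takes guestIP + tap because the caller (VMService) resolved both
// before calling; HostNetwork never reads VM state itself.
func (n *HostNetwork) ensureNAT(ctx context.Context, guestIP netip.Addr, tap string) error {
	// NAT rule management would go here.
	return nil
}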

ImageService (image_service.go, images.go, images_pull.go, image_seed.go, kernels.go)

  • imageOpsMu sync.Mutex — the publication-window lock. Held only across the recheck-name + atomic-rename + UpsertImage commit atom. Slow work (network fetch, ext4 build, SSH-key seeding) runs unlocked; the pattern is sketched after this list.
  • Test seams pullAndFlatten, finalizePulledRootfs, bundleFetch are struct fields (not package globals), so tests inject per-instance fakes.
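A sketch of the publication-window pattern. pullAndFlatten is the seam named above; nameFree, finalPath, and the store interface are placeholder names invented for the sketch.

// Sketch only — helper and field names below are placeholders.
package daemon

import (
	"context"
	"os"
	"sync"
)

type imageStore interface {
	UpsertImage(ctx context.Context, name, path string) error
}

type ImageService struct {
	imageOpsMu     sync.Mutex
	store          imageStore
	pullAndFlatten func(ctx context.Context, name string) (stagedPath string, err error) // per-instance test seam
	finalPath      func(name string) string
	nameFree       func(name string) error
}

func (s *ImageService) pull(ctx context.Context, name string) error {
	// Slow phase, no lock held: network fetch, ext4 build, SSH-key seeding.
	staged, err := s.pullAndFlatten(ctx, name)
	if err != nil {
		return err
	}

	// Publication atom: recheck the name, atomic rename, commit the row.
	s.imageOpsMu.Lock()
	defer s.imageOpsMu.Unlock()
	if err := s.nameFree(name); err != nil {
		return err // another pull won the race while we were unlocked
	}
	final := s.finalPath(name)
	if err := os.Rename(staged, final); err != nil {
		return err
	}
	return s.store.UpsertImage(ctx, name, final)
}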

WorkspaceService (workspace_service.go, workspace.go, vm_authsync.go)

  • workspaceLocks vmLockSet — per-VM mutex scoped to workspace.prepare / workspace.export. These ops acquire vmLocks[id] (on VMService) only long enough to validate VM state and snapshot the fields they need, then release it and acquire workspaceLocks[id] for the slow guest I/O phase. That keeps vm stop / delete / restart from queueing behind a running tar import. The split is sketched after this list.
  • Test seams workspaceInspectRepo, workspaceImport are per-instance fields.
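A sketch of that lock split. The deps and the lockSet type are placeholders standing in for the real vmLockSet and the closures wired in wireServices.

// Sketch only — placeholder names throughout.
package daemon

import (
	"context"
	"errors"
	"sync"
)

type vmRef struct{ ID, Name string }

// vmSnapshot carries only the fields the slow phase needs, copied out under
// the VM lock.
type vmSnapshot struct{ GuestIP string }

// lockSet is a stand-in for the real vmLockSet: one mutex per VM ID.
type lockSet struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func (s *lockSet) get(id string) *sync.Mutex {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.locks == nil {
		s.locks = map[string]*sync.Mutex{}
	}
	if s.locks[id] == nil {
		s.locks[id] = &sync.Mutex{}
	}
	return s.locks[id]
}

type WorkspaceService struct {
	workspaceLocks  lockSet
	vmResolver      func(ctx context.Context, name string) (vmRef, error)
	aliveChecker    func(ctx context.Context, ref vmRef) (bool, error)
	withVMLockByRef func(ref vmRef, fn func() error) error
	snapshotVM      func(ref vmRef) vmSnapshot
	importIntoGuest func(ctx context.Context, snap vmSnapshot) error
}

func (w *WorkspaceService) prepare(ctx context.Context, name string) error {
	ref, err := w.vmResolver(ctx, name)
	if err != nil {
		return err
	}

	// Phase 1 — short critical section under vmLocks[id]: validate state and
	// copy out what the slow phase needs, nothing more.
	var snap vmSnapshot
	if err := w.withVMLockByRef(ref, func() error {
		alive, err := w.aliveChecker(ctx, ref)
		if err != nil {
			return err
		}
		if !alive {
			return errors.New("vm is not running")
		}
		snap = w.snapshotVM(ref)
		return nil
	}); err != nil {
		return err
	}

	// Phase 2 — slow guest I/O under workspaceLocks[id] only, so stop/delete/
	// restart of the same VM never queue behind it.
	lock := w.workspaceLocks.get(ref.ID)
	lock.Lock()
	defer lock.Unlock()
	return w.importIntoGuest(ctx, snap)
}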

VMService (vm_service.go, vm_lifecycle.go, vm_create.go, vm_create_ops.go, vm_stats.go, vm_set.go, vm_disk.go, vm_handles.go, vm_authsync.go (via WorkspaceService), preflight.go, ports.go, vm.go)

  • vmLocks vmLockSet — per-VM *sync.Mutex, one per VM ID. Held for the entire lifecycle op on that VM: start holds it across preflight, bridge setup, firecracker spawn, and post-boot wiring (seconds to tens of seconds). Two start/stop/delete/set calls against the same VM therefore serialise; calls against different VMs run independently.
  • createVMMu sync.Mutex — narrow reservation mutex. CreateVM resolves the image (possibly auto-pulling, which self-locks on imageOpsMu) and parses sizing flags outside this lock, then holds createVMMu only to re-check that the requested VM name is still free, allocate the next guest IP, and insert the initial "created" row. The subsequent boot flow runs under the per-VM lock only. The reservation window is sketched after this list.
  • createOps opstate.Registry[*vmCreateOperationState] — in-flight async create operations; owns its own lock.
  • handles *handleCache — in-memory map of per-VM transient kernel/process handles (PID, tap device, loop devices, DM target). Each VM directory holds a small handles.json scratch file so the cache can be rebuilt at daemon startup.
  • vsockHostDevice — path to /dev/vhost-vsock that the preflight and doctor checks run RequireFile against. Defaulted in wireServices; tests point it at a tempfile to make the check pass without the kernel module loaded. Guest-SSH test seams live on *Daemon (d.guestWaitForSSH, d.guestDial), not VMService — workspace prepare is the only path that reaches guest SSH, and it gets there through closures WorkspaceService captured at wiring time.
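A sketch of the createVMMu reservation window. The helpers (resolveImage, nameFree, nextGuestIP, insertCreated, boot) are illustrative names, not the real ones.

// Sketch only — helper fields are placeholders for the real implementation.
package daemon

import (
	"context"
	"sync"
)

type VMService struct {
	createVMMu    sync.Mutex
	resolveImage  func(ctx context.Context, image string) (imageID string, err error) // may auto-pull; self-locks on imageOpsMu
	nameFree      func(ctx context.Context, name string) error
	nextGuestIP   func(ctx context.Context) (string, error)
	insertCreated func(ctx context.Context, name, imageID, ip string) (vmID string, err error)
	boot          func(ctx context.Context, vmID string) error // runs under vmLocks[vmID] only
}

func (s *VMService) CreateVM(ctx context.Context, name, image string) (string, error) {
	// Slow, unlocked phase: image resolution may pull over the network.
	imageID, err := s.resolveImage(ctx, image)
	if err != nil {
		return "", err
	}

	// Reservation atom under createVMMu: recheck the name, allocate the guest
	// IP, insert the initial "created" row — nothing slower than that.
	s.createVMMu.Lock()
	if err := s.nameFree(ctx, name); err != nil {
		s.createVMMu.Unlock()
		return "", err // name was taken while we were resolving the image
	}
	ip, err := s.nextGuestIP(ctx)
	if err != nil {
		s.createVMMu.Unlock()
		return "", err
	}
	vmID, err := s.insertCreated(ctx, name, imageID, ip)
	s.createVMMu.Unlock()
	if err != nil {
		return "", err
	}

	// Boot flow runs outside createVMMu, serialised only by the per-VM lock.
	return vmID, s.boot(ctx, vmID)
}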

Subpackages

Stateless helpers with no need for a service pointer live in subpackages. Each takes explicit dependencies (typically a system.Runner-compatible interface) and holds no global state beyond small test seams.

Subpackage                   Purpose
internal/daemon/opstate      Generic Registry[T AsyncOp] for async-operation bookkeeping.
internal/daemon/dmsnap       Device-mapper COW snapshot create/cleanup/remove.
internal/daemon/fcproc       Firecracker process primitives (bridge, tap, binary, PID, kill, wait).
internal/daemon/imagemgr     Image subsystem pure helpers: validators, staging, build script gen.
internal/daemon/workspace    Workspace helpers: git inspection, copy prep, guest import script.

All subpackages are leaves — no intra-daemon subpackage imports another.
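The real opstate API is not reproduced here; the following is only a minimal sketch of what a generic, self-locking registry of the shape named in the table could look like.

// Sketch only — the real opstate constraint and method set may differ.
package opstate

import "sync"

// AsyncOp is assumed here to expose an ID; the real constraint may differ.
type AsyncOp interface {
	ID() string
}

// Registry owns its own lock, so callers never coordinate around it.
type Registry[T AsyncOp] struct {
	mu  sync.Mutex
	ops map[string]T
}

func (r *Registry[T]) Put(op T) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.ops == nil {
		r.ops = map[string]T{}
	}
	r.ops[op.ID()] = op
}

func (r *Registry[T]) Get(id string) (T, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	op, ok := r.ops[id]
	return op, ok
}

func (r *Registry[T]) Delete(id string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.ops, id)
}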

Lock ordering

Acquire in this order, release in reverse. Never acquire in the opposite direction.

VMService.vmLocks[id]  →  WorkspaceService.workspaceLocks[id]
                      →  {VMService.createVMMu, ImageService.imageOpsMu}
                      →  subsystem-local locks

vmLocks[id] and workspaceLocks[id] are NEVER held at the same time. workspace.prepare acquires vmLocks[id] just long enough to validate VM state, releases it, then acquires workspaceLocks[id] for the guest I/O phase. Regular lifecycle ops (start, stop, delete, set) do NOT do this split — they hold vmLocks[id] across the whole flow.

Subsystem-local locks (tapPool.mu, opstate.Registry mu, handleCache.mu) are leaves. They do not contend with each other.

Notes:

  • vmLocks[id] is the outer lock for any operation scoped to a single VM. Acquired via VMService.withVMLockByID / withVMLockByRef. The callback runs under the lock — treat the whole function body as critical section.
  • createVMMu is held only across the VM-name reservation + IP allocation + initial UpsertVM. Image resolution and the full boot flow happen outside it.
  • imageOpsMu is held only across the publication atom (recheck name + atomic rename + UpsertImage, or the equivalent for Register / Promote / Delete). Network fetch, ext4 build, and file copies run unlocked.
  • Holding a subsystem-local lock while calling into guest SSH is discouraged; copy needed state out under the lock and release before blocking I/O.

Reconcile and background work

Daemon.reconcile(ctx) is the orchestrator run at startup. It rehydrates the handle cache, reaps stale VMs, and republishes DNS records. Daemon.backgroundLoop() is the ticker fan-out — VMService.pollStats, VMService.stopStaleVMs, and VMService.pruneVMCreateOperations run on independent tickers.
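A sketch of that fan-out. The method names come from the paragraph above; the ctx parameter, the intervals, and the method signatures are assumptions.

// Sketch only — intervals and signatures are illustrative.
package daemon

import (
	"context"
	"time"
)

type vmService struct{}

func (s *vmService) pollStats(ctx context.Context)               {}
func (s *vmService) stopStaleVMs(ctx context.Context)            {}
func (s *vmService) pruneVMCreateOperations(ctx context.Context) {}

type Daemon struct{ vms *vmService }

// backgroundLoop gives each job its own ticker and goroutine, so a slow stats
// poll never delays the stale-VM reaper or the create-op pruner.
func (d *Daemon) backgroundLoop(ctx context.Context) {
	run := func(every time.Duration, job func(context.Context)) {
		go func() {
			t := time.NewTicker(every)
			defer t.Stop()
			for {
				select {
				case <-ctx.Done():
					return
				case <-t.C:
					job(ctx)
				}
			}
		}()
	}
	run(5*time.Second, d.vms.pollStats)             // stats polling
	run(time.Minute, d.vms.stopStaleVMs)            // stale-VM reaper
	run(time.Minute, d.vms.pruneVMCreateOperations) // create-op GC
}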

External API

Only internal/cli imports this package. The surface is:

  • daemon.Open(ctx) (*Daemon, error)
  • (*Daemon).Serve(ctx) error
  • (*Daemon).Close() error
  • daemon.Doctor(...) — host diagnostics (no receiver).

All other methods live on the four services and are reached only through the RPC dispatch switch in daemon.go. They are free to move/rename during refactoring.
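How a caller like internal/cli might drive that surface; the import path and signal wiring are assumptions, and error handling is simplified.

// Sketch only — not the real CLI entry point.
package main

import (
	"context"
	"log"
	"os/signal"
	"syscall"

	"banger/internal/daemon" // import path assumed from the repository layout
)

func main() {
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	d, err := daemon.Open(ctx)
	if err != nil {
		log.Fatalf("open daemon: %v", err)
	}
	defer d.Close()

	if err := d.Serve(ctx); err != nil {
		log.Printf("serve: %v", err)
	}
}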