banger/internal/daemon/ARCHITECTURE.md
Thales Maciel 59e48e830b
daemon: split owner daemon from root helper
Move the supported systemd path to two services: an owner-user bangerd for
orchestration and a narrow root helper for bridge/tap, NAT/resolver, dm/loop,
and Firecracker ownership. This removes repeated sudo from daily vm and image
flows without leaving the general daemon running as root.

Add install metadata, system install/status/restart/uninstall commands, and a
system-owned runtime layout. Keep user SSH/config material in the owner home,
lock file_sync to the owner home, and move daemon known_hosts handling out of
the old root-owned control path.

Route privileged lifecycle steps through typed privilegedOps calls, harden the
two systemd units, and rewrite smoke plus docs around the supported service
model.

Verified with make build, make test, make lint, and make smoke on the
supported systemd host path.
2026-04-26 12:43:17 -03:00


internal/daemon architecture

This document describes the current daemon package layout: the Daemon composition root, the four services it wires together, the subpackages that own stateless helpers, the privileged-ops seam used by the supported system install, and the lock ordering every caller must respect.

Supported service topology

On the supported host path (banger system install on a systemd host), banger runs as two cooperating services:

  • bangerd.service runs as the configured owner user. It owns the public RPC socket, store, image state, workspace prep, and the lifecycle state machine.
  • bangerd-root.service runs as root. It owns only the privileged host-kernel operations: bridge/tap, NAT/resolver routing, dm/loop snapshot plumbing, privileged ext4 mutation on dm devices, and firecracker process/socket ownership.

The owner daemon talks to the root helper through the privilegedOps seam. Non-system/dev paths still use the same seam, but it is backed by an in-process adapter instead of the helper RPC client.

Composition

Daemon is a thin composition root. It holds shared infrastructure (store, runner, logger, layout, config, listener, privileged-ops adapter) plus pointers to four focused services. RPC dispatch is a pure forwarder into those services; no lifecycle / image / workspace / networking behaviour lives on *Daemon itself.

Daemon
├── *HostNetwork      — bridge, tap pool, NAT, DNS, firecracker process,
│                       DM snapshots, vsock readiness
├── *ImageService     — register, promote, delete, pull (bundle + OCI),
│                       kernel catalog, managed-seed refresh
├── *WorkspaceService — workspace.prepare / workspace.export, auth-key
│                       + git-identity sync onto the work disk
└── *VMService        — VM lifecycle (create/start/stop/restart/kill/
                        delete/set), stats polling, ports query,
                        handle cache, per-VM lock set, create-op
                        registry, preflight validation

Each service owns its own state. Cross-service calls go through narrow consumer-defined seams:

  • WorkspaceService does not hold a *VMService pointer. It takes function-typed deps (vmResolver, aliveChecker, withVMLockByRef, imageResolver, imageWorkSeed) so it sees exactly the operations it needs and nothing more. Those deps are captured as closures so construction-order cycles don't recur.
  • VMService holds direct pointers to *HostNetwork, *ImageService, and *WorkspaceService. Orchestrating a VM start really does compose all three (bridge + tap + image resolution + work-disk sync), and declaring a function-typed interface for every call would balloon the surface for no win — services are unexported, so package-external code can never reach them.
  • Capability hooks do not take *Daemon. Each capability is a struct with explicit service-pointer fields (workDiskCapability{vm, ws, store, defaultImageName}, dnsCapability{net}, natCapability{vm, net, logger}) populated at wiring time. VMService invokes them through a capabilityHooks struct (function-typed bag) populated at construction; neither the service nor any capability has a *Daemon pointer.
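The function-typed dependency pattern from the first bullet might look roughly like the sketch below. The field names vmResolver and aliveChecker come from the text; their signatures, the vmRecord type, and the vmService stand-in are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

type vmRecord struct {
	ID      string
	Running bool
}

// workspaceService sees only the two operations it needs, as function
// values. The signatures here are assumptions.
type workspaceService struct {
	vmResolver   func(ref string) (vmRecord, error)
	aliveChecker func(vm vmRecord) bool
}

func (w *workspaceService) prepare(ref string) error {
	vm, err := w.vmResolver(ref)
	if err != nil {
		return err
	}
	if !w.aliveChecker(vm) {
		return fmt.Errorf("vm %s is not running", vm.ID)
	}
	return nil
}

// vmService is a hypothetical stand-in for the real VM service.
type vmService struct{ vms map[string]vmRecord }

func (s *vmService) resolve(ref string) (vmRecord, error) {
	vm, ok := s.vms[ref]
	if !ok {
		return vmRecord{}, errors.New("no such vm: " + ref)
	}
	return vm, nil
}

func main() {
	// vms is still nil when the closures are created. They read it at call
	// time, so the workspace service can be constructed first.
	var vms *vmService
	ws := &workspaceService{
		vmResolver:   func(ref string) (vmRecord, error) { return vms.resolve(ref) },
		aliveChecker: func(vm vmRecord) bool { return vm.Running },
	}
	vms = &vmService{vms: map[string]vmRecord{"dev": {ID: "dev", Running: true}}}

	fmt.Println(ws.prepare("dev"), ws.prepare("ghost"))
}
```

The closures defer the lookup of the VM service until the first call, which is one way to avoid a construction-order cycle between two services.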

Services + capabilities are built eagerly by wireServices(d), called once from Daemon.Open after the composition root's infrastructure is populated, and once per test that constructs a &Daemon{...} literal. Tests that want to stub a particular service or the capability list assign the field before calling wireServices — the helper is idempotent and skips anything already set.
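A minimal sketch of the "skip anything already set" behaviour, assuming a cut-down daemon struct with two hypothetical service types. The real wireServices wires four services plus capabilities; only the nil-check pattern is shown here.

```go
package main

import "fmt"

// hostNetwork and imageService are placeholders; the label field exists
// only so the example can tell a stub from a default instance.
type hostNetwork struct{ label string }
type imageService struct{ label string }

type daemon struct {
	net *hostNetwork
	img *imageService
}

// wireServices fills in every service that is still nil and leaves
// pre-assigned fields alone, so calling it twice changes nothing.
func wireServices(d *daemon) {
	if d.net == nil {
		d.net = &hostNetwork{label: "default"}
	}
	if d.img == nil {
		d.img = &imageService{label: "default"}
	}
}

func main() {
	// A test stubs the image service before wiring.
	d := &daemon{img: &imageService{label: "stub"}}
	wireServices(d)
	wireServices(d) // second call is a no-op
	fmt.Println(d.net.label, d.img.label)
}
```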

Service state

HostNetwork (host_network.go, nat.go, dns_routing.go, tap_pool.go, snapshot.go)

  • tapPool — TAP interface pool, owns its own lock.
  • vmDNS *vmdns.Server — in-process DNS server for .vm names.
  • privilegedOps — the host-kernel seam used for bridge/tap/NAT, resolver routing, dm snapshots, privileged ext4 mutation, and firecracker ownership/kill flows.
  • No direct VM-state access. Where an operation needs a VM's tap name (e.g. ensureNAT), the signature takes guestIP + tap string so the caller (VMService) resolves them first.

ImageService (image_service.go, images.go, images_pull.go, image_seed.go, kernels.go)

  • imageOpsMu sync.Mutex — the publication-window lock. Held only across the recheck-name + atomic-rename + UpsertImage commit atom. Slow work (network fetch, ext4 build, SSH-key seeding) runs unlocked.
  • Test seams pullAndFlatten, finalizePulledRootfs, bundleFetch are struct fields (not package globals), so tests inject per-instance fakes.
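The publication-window idea can be illustrated as follows. This is a sketch under assumptions: the images map stands in for the store, the build callback stands in for the fetch and ext4 build, and the atomic rename plus UpsertImage step is reduced to a map write.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type imageService struct {
	imageOpsMu sync.Mutex
	images     map[string]string // name -> artifact path; stands in for the store
}

// pull runs the slow build with no lock held, then takes imageOpsMu only
// for the recheck-and-commit step.
func (s *imageService) pull(name string, build func() (string, error)) error {
	staged, err := build() // slow work: unlocked
	if err != nil {
		return err
	}

	s.imageOpsMu.Lock()
	defer s.imageOpsMu.Unlock()
	if _, taken := s.images[name]; taken {
		return errors.New("image name already taken: " + name)
	}
	// The real code would do the atomic rename and UpsertImage here.
	s.images[name] = staged
	return nil
}

func main() {
	s := &imageService{images: map[string]string{}}
	errs := make([]error, 2)
	var wg sync.WaitGroup
	for i := range errs {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			errs[i] = s.pull("debian", func() (string, error) {
				return fmt.Sprintf("/staging/build-%d", i), nil
			})
		}(i)
	}
	wg.Wait()
	fmt.Println(errs) // one pull wins the name, the other is rejected
}
```

Two concurrent pulls of the same name both do their slow work in parallel; only the short commit step serialises, and the loser is rejected by the recheck.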

WorkspaceService (workspace_service.go, workspace.go, vm_authsync.go)

  • workspaceLocks vmLockSet — per-VM mutex scoped to workspace.prepare / workspace.export. These ops acquire vmLocks[id] (on VMService) only long enough to validate VM state and snapshot the fields they need, then release it and acquire workspaceLocks[id] for the slow guest I/O phase. That keeps vm stop / delete / restart from queueing behind a running tar import.
  • Test seams workspaceInspectRepo, workspaceImport are per-instance fields.
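The two-phase locking described for workspace.prepare could be sketched like this. The vmLockSet shape, lockFor, vmState, and prepareWorkspace are hypothetical; sync.Mutex.TryLock (Go 1.18+) is used only to demonstrate which lock is held during the guest I/O phase.

```go
package main

import (
	"fmt"
	"sync"
)

// vmLockSet hands out one mutex per VM ID (hypothetical shape).
type vmLockSet struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func (s *vmLockSet) lockFor(id string) *sync.Mutex {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.locks == nil {
		s.locks = map[string]*sync.Mutex{}
	}
	l, ok := s.locks[id]
	if !ok {
		l = &sync.Mutex{}
		s.locks[id] = l
	}
	return l
}

type vmState struct {
	Running bool
	GuestIP string
}

func prepareWorkspace(vmLocks, wsLocks *vmLockSet, id string,
	state map[string]vmState, guestIO func(ip string) error) error {
	// Phase 1: hold the VM lock just long enough to validate and snapshot.
	vl := vmLocks.lockFor(id)
	vl.Lock()
	st, ok := state[id] // st is a copy, safe to use after unlocking
	vl.Unlock()
	if !ok || !st.Running {
		return fmt.Errorf("vm %s is not running", id)
	}

	// Phase 2: slow guest I/O under the workspace lock only.
	wl := wsLocks.lockFor(id)
	wl.Lock()
	defer wl.Unlock()
	return guestIO(st.GuestIP)
}

func main() {
	vmLocks, wsLocks := &vmLockSet{}, &vmLockSet{}
	state := map[string]vmState{"vm1": {Running: true, GuestIP: "10.0.0.2"}}
	err := prepareWorkspace(vmLocks, wsLocks, "vm1", state, func(ip string) error {
		l := vmLocks.lockFor("vm1")
		free := l.TryLock() // a concurrent vm stop could take this lock now
		if free {
			l.Unlock()
		}
		fmt.Println("guest:", ip, "vm lock free during I/O:", free)
		return nil
	})
	fmt.Println("err:", err)
}
```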

VMService (vm_service.go, vm_lifecycle.go, vm_create.go, vm_create_ops.go, vm_stats.go, vm_set.go, vm_disk.go, vm_handles.go, vm_authsync.go (via WorkspaceService), preflight.go, ports.go, vm.go)

  • vmLocks vmLockSet — per-VM *sync.Mutex, one per VM ID. Held for the entire lifecycle op on that VM: start holds it across preflight, bridge setup, firecracker spawn, and post-boot wiring (seconds to tens of seconds). Two start/stop/delete/set calls against the same VM therefore serialise; calls against different VMs run independently.
  • createVMMu sync.Mutex — narrow reservation mutex. CreateVM resolves the image (possibly auto-pulling, which self-locks on imageOpsMu) and parses sizing flags outside this lock, then holds createVMMu only to re-check that the requested VM name is still free, allocate the next guest IP, and insert the initial "created" row. The subsequent boot flow runs under the per-VM lock only.
  • createOps opstate.Registry[*vmCreateOperationState] — in-flight async create operations; owns its own lock.
  • handles *handleCache — in-memory map of per-VM transient kernel/process handles (PID, tap device, loop devices, DM target). Each VM directory holds a small handles.json scratch file so the cache can be rebuilt at daemon startup.
  • vsockHostDevice — path to /dev/vhost-vsock that the preflight and doctor checks pass to RequireFile. Defaulted in wireServices; tests point it at a temp file so the check passes without the kernel module loaded.

Guest-SSH test seams live on *Daemon (d.guestWaitForSSH, d.guestDial), not on VMService. Workspace prepare is the only path that reaches guest SSH, and it gets there through closures that WorkspaceService captured at wiring time.

Subpackages

Stateless helpers with no need for a service pointer live in subpackages. Each takes explicit dependencies (typically a system.Runner-compatible interface) and holds no global state beyond small test seams.

| Subpackage | Purpose |
| --- | --- |
| internal/daemon/opstate | Generic Registry[T AsyncOp] for async-operation bookkeeping. |
| internal/daemon/dmsnap | Device-mapper COW snapshot create/cleanup/remove. |
| internal/daemon/fcproc | Firecracker process primitives (bridge, tap, binary, PID, kill, wait). |
| internal/daemon/imagemgr | Image subsystem pure helpers: validators, staging, build script gen. |
| internal/daemon/workspace | Workspace helpers: git inspection, copy prep, guest import script. |

All subpackages are leaves — no intra-daemon subpackage imports another.
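As an illustration of what a generic Registry[T AsyncOp] could look like, here is a small sketch. Only the type name and the AsyncOp constraint name come from the table; the constraint's methods (ID, Done) and the registry's methods (Insert, Get, Prune) are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// AsyncOp is an assumed constraint: an operation with an ID and a
// completion flag.
type AsyncOp interface {
	ID() string
	Done() bool
}

// Registry tracks in-flight operations and owns its own lock.
type Registry[T AsyncOp] struct {
	mu  sync.Mutex
	ops map[string]T
}

func (r *Registry[T]) Insert(op T) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.ops == nil {
		r.ops = map[string]T{}
	}
	r.ops[op.ID()] = op
}

func (r *Registry[T]) Get(id string) (T, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	op, ok := r.ops[id]
	return op, ok
}

// Prune drops finished operations and reports how many were removed.
func (r *Registry[T]) Prune() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	removed := 0
	for id, op := range r.ops {
		if op.Done() {
			delete(r.ops, id)
			removed++
		}
	}
	return removed
}

// createOp is a hypothetical operation type used to instantiate the registry.
type createOp struct {
	id   string
	done bool
}

func (c *createOp) ID() string { return c.id }
func (c *createOp) Done() bool { return c.done }

func main() {
	r := &Registry[*createOp]{}
	r.Insert(&createOp{id: "op-1", done: true})
	r.Insert(&createOp{id: "op-2"})
	fmt.Println("pruned:", r.Prune())
	_, ok := r.Get("op-2")
	fmt.Println("op-2 still tracked:", ok)
}
```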

Lock ordering

Acquire in this order, release in reverse. Never acquire in the opposite direction.

VMService.vmLocks[id]  →  WorkspaceService.workspaceLocks[id]
                      →  {VMService.createVMMu, ImageService.imageOpsMu}
                      →  subsystem-local locks

Although they are adjacent in the ordering above, vmLocks[id] and workspaceLocks[id] are NEVER held at the same time. workspace.prepare acquires vmLocks[id] just long enough to validate VM state, releases it, then acquires workspaceLocks[id] for the guest I/O phase. Regular lifecycle ops (start, stop, delete, set) do NOT do this split; they hold vmLocks[id] across the whole flow.

Subsystem-local locks (tapPool.mu, opstate.Registry mu, handleCache.mu) are leaves. They do not contend with each other.

Notes:

  • vmLocks[id] is the outer lock for any operation scoped to a single VM. Acquired via VMService.withVMLockByID / withVMLockByRef. The callback runs under the lock — treat the whole function body as critical section.
  • createVMMu is held only across the VM-name reservation + IP allocation + initial UpsertVM. Image resolution and the full boot flow happen outside it.
  • imageOpsMu is held only across the publication atom (recheck name + atomic rename + UpsertImage, or the equivalent for Register / Promote / Delete). Network fetch, ext4 build, and file copies run unlocked.
  • Holding a subsystem-local lock while calling into guest SSH is discouraged; copy needed state out under the lock and release before blocking I/O.

Reconcile and background work

Daemon.reconcile(ctx) is the orchestrator run at startup. It rehydrates the handle cache, reaps stale VMs, and republishes DNS records. Daemon.backgroundLoop() is the ticker fan-out — VMService.pollStats, VMService.stopStaleVMs, and VMService.pruneVMCreateOperations run on independent tickers. On the supported system path, any reconcile-time host cleanup that needs privilege goes through privilegedOps, not directly through the owner daemon process.

External API

Only internal/cli imports this package. The surface is:

  • daemon.Open(ctx) (*Daemon, error)
  • daemon.OpenSystem(ctx) (*Daemon, error)
  • (*Daemon).Serve(ctx) error
  • (*Daemon).Close() error
  • daemon.Doctor(...) — host diagnostics (no receiver).

All other methods live on the four services and are reached only through the RPC dispatch switch in daemon.go. They are free to move/rename during refactoring.