banger/internal/daemon
Thales Maciel 99d0811097
daemon: shrink createVMMu + imageOpsMu to reservation/publication windows
Before: createVMMu was held across the whole of CreateVM — including
image resolution (which could fire a full auto-pull) and startVMLocked
(boot of multiple seconds). imageOpsMu was held across the whole of
PullImage/RegisterImage/PromoteImage/DeleteImage, so any slow OCI pull,
bundle download, or file copy blocked every other image mutation and
every other VM create that needed to auto-pull. The async create API
bought nothing if all creates serialised on the same mutex.

CreateVM is now three phases:

 1. Validate + resolve image (possibly auto-pulling). No global lock.
 2. reserveVM: take createVMMu only long enough to re-check the name
    is free, allocate the next guest IP, and UpsertVM the "created"
    row. Milliseconds.
 3. startVMLocked: run the full boot flow under the per-VM lock only.

Parallel creates of different VMs now overlap on image resolution +
boot; they contend only during the brief reserveVM step.
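
A minimal sketch of the three phases, not the daemon's actual code. The
daemon and vm types, the create/reserve methods, the resolve/boot callbacks,
and the 10.0.0.x addresses are all hypothetical stand-ins; only the shape
(unlocked resolve, short global reservation, boot under a per-VM lock) is
taken from the description above.

```go
package main

import (
	"fmt"
	"sync"
)

// vm and daemon are hypothetical stand-ins, not banger's real types.
type vm struct {
	mu    sync.Mutex // per-VM lock, held for the duration of boot
	ip    string
	state string
}

type daemon struct {
	createMu sync.Mutex     // plays the role of createVMMu
	vms      map[string]*vm // stands in for the VM store
	nextHost int            // last host octet handed out (illustrative)
}

func newDaemon() *daemon {
	return &daemon{vms: make(map[string]*vm), nextHost: 1}
}

// reserve is the only step that runs under the global lock: re-check the
// name, allocate an address, and record the "created" row.
func (d *daemon) reserve(name string) (*vm, error) {
	d.createMu.Lock()
	defer d.createMu.Unlock()
	if _, taken := d.vms[name]; taken {
		return nil, fmt.Errorf("vm %q already exists", name)
	}
	d.nextHost++
	v := &vm{ip: fmt.Sprintf("10.0.0.%d", d.nextHost), state: "created"}
	d.vms[name] = v
	return v, nil
}

// create runs the three phases. resolve and boot are the slow parts and are
// passed in as callbacks so the sketch stays self-contained.
func (d *daemon) create(name string, resolve, boot func() error) (*vm, error) {
	// Phase 1: validate + resolve the image with no global lock held.
	if err := resolve(); err != nil {
		return nil, err
	}
	// Phase 2: short critical section under the global lock.
	v, err := d.reserve(name)
	if err != nil {
		return nil, err
	}
	// Phase 3: boot while holding only this VM's lock.
	v.mu.Lock()
	defer v.mu.Unlock()
	if err := boot(); err != nil {
		v.state = "error"
		return v, err
	}
	v.state = "running"
	return v, nil
}

func main() {
	d := newDaemon()
	noop := func() error { return nil }
	var wg sync.WaitGroup
	for _, name := range []string{"a", "b", "c"} {
		wg.Add(1)
		go func(n string) {
			defer wg.Done()
			if _, err := d.create(n, noop, noop); err != nil {
				fmt.Println("create failed:", err)
			}
		}(name)
	}
	wg.Wait()
	fmt.Println(len(d.vms), "VMs created")
}
```

In this sketch the three goroutines can sit in resolve or boot at the same
time; they queue only inside reserve.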

For the image surface a new publishImage helper isolates the commit
atom (recheck name free, atomic rename stagingDir→finalDir, UpsertImage)
under imageOpsMu. pullFromBundle + pullFromOCI do their network fetch
+ ext4 build + ownership fixup + agent injection outside the lock;
Register moves validation + kernel resolution outside; Promote moves
file copy + SSH-key seeding outside; Delete keeps a brief lock over
the lookup + reference check + store delete and does file cleanup
unlocked.

Two concurrency tests assert the new behaviour:
 - TestPullImageDoesNotSerialiseOnDifferentNames fails against the old
   code (second pull blocks on imageOpsMu and never reaches the body).
 - TestPullImageRejectsNameClashAtPublish confirms the publish-window
   recheck is what enforces name uniqueness now that the body runs
   unlocked — exactly one winner.

ARCHITECTURE.md updated to describe the new scope explicitly instead
of calling the locks "narrow".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 13:44:22 -03:00
dmsnap Extract opstate and dmsnap into subpackages 2026-04-15 16:02:43 -03:00
fcproc runtime sockets: close the local-user race window around control-plane creation 2026-04-20 12:53:47 -03:00
imagemgr Remove image build --from-image; doctor treats catalog images as OK 2026-04-18 15:54:29 -03:00
opstate coverage: medium batch — hostnat runner, store guest-sessions, daemon helpers 2026-04-18 18:03:37 -03:00
workspace remove vm session feature 2026-04-20 12:47:58 -03:00
ARCHITECTURE.md daemon: shrink createVMMu + imageOpsMu to reservation/publication windows 2026-04-20 13:44:22 -03:00
autopull_test.go vm create: auto-pull image and kernel from catalogs if missing 2026-04-18 15:10:26 -03:00
capabilities.go vm state: split transient kernel/process handles off the durable schema 2026-04-19 14:18:13 -03:00
capabilities_test.go Remove opencode package + vm acp command (dead code) 2026-04-18 16:54:37 -03:00
concurrency_test.go daemon: shrink createVMMu + imageOpsMu to reservation/publication windows 2026-04-20 13:44:22 -03:00
daemon.go remove vm session feature 2026-04-20 12:47:58 -03:00
daemon_test.go remove experimental web UI 2026-04-19 14:28:08 -03:00
dns_routing.go Route .vm DNS through systemd-resolved 2026-03-22 15:07:22 -03:00
dns_routing_test.go Route .vm DNS through systemd-resolved 2026-03-22 15:07:22 -03:00
doc.go daemon: correct ARCHITECTURE doc to match actual package shape + lock scope 2026-04-20 13:02:36 -03:00
doctor.go docs + doctor: be honest about amd64-only support 2026-04-20 13:03:50 -03:00
fake_firecracker_test.go remove vm session feature 2026-04-20 12:47:58 -03:00
fastpath_test.go Manage image artifacts and show VM create progress 2026-03-21 14:48:01 -03:00
guest_ssh.go remove vm session feature 2026-04-20 12:47:58 -03:00
image_seed.go guest sshd: drop DEBUG3 + StrictModes no; normalise /root perms 2026-04-19 13:40:40 -03:00
images.go daemon: shrink createVMMu + imageOpsMu to reservation/publication windows 2026-04-20 13:44:22 -03:00
images_helpers_test.go coverage: medium batch — hostnat runner, store guest-sessions, daemon helpers 2026-04-18 18:03:37 -03:00
images_pull.go daemon: shrink createVMMu + imageOpsMu to reservation/publication windows 2026-04-20 13:44:22 -03:00
images_pull_bundle_test.go image pull: dispatch to imagecat bundle path before OCI 2026-04-17 15:43:33 -03:00
images_pull_test.go Phase B-2: pre-inject banger guest agents into pulled rootfs 2026-04-16 18:08:56 -03:00
kernels.go Phase 4: remote catalog + banger kernel pull 2026-04-16 15:05:42 -03:00
kernels_test.go Phase 4: remote catalog + banger kernel pull 2026-04-16 15:05:42 -03:00
logger.go vm state: split transient kernel/process handles off the durable schema 2026-04-19 14:18:13 -03:00
logger_test.go Remove image build --from-image; doctor treats catalog images as OK 2026-04-18 15:54:29 -03:00
nat.go vm state: split transient kernel/process handles off the durable schema 2026-04-19 14:18:13 -03:00
nat_test.go vm state: split transient kernel/process handles off the durable schema 2026-04-19 14:18:13 -03:00
open_close_test.go remove vm session feature 2026-04-20 12:47:58 -03:00
ports.go vm state: split transient kernel/process handles off the durable schema 2026-04-19 14:18:13 -03:00
preflight.go Remove image build --from-image; doctor treats catalog images as OK 2026-04-18 15:54:29 -03:00
runtime_assets.go Remove runtime-bundle image dependencies 2026-03-21 18:34:53 -03:00
snapshot.go Extract opstate and dmsnap into subpackages 2026-04-15 16:02:43 -03:00
snapshot_test.go Harden VM stop cleanup for stale snapshots 2026-03-18 12:28:15 -03:00
ssh_client_config.go ssh: trust-on-first-use host key pinning everywhere 2026-04-19 16:46:03 -03:00
ssh_client_config_test.go ssh: trust-on-first-use host key pinning everywhere 2026-04-19 16:46:03 -03:00
sshd_config_test.go guest sshd: drop DEBUG3 + StrictModes no; normalise /root perms 2026-04-19 13:40:40 -03:00
tap_pool.go vm state: split transient kernel/process handles off the durable schema 2026-04-19 14:18:13 -03:00
vm.go vm state: split transient kernel/process handles off the durable schema 2026-04-19 14:18:13 -03:00
vm_authsync.go guest sshd: drop DEBUG3 + StrictModes no; normalise /root perms 2026-04-19 13:40:40 -03:00
vm_create.go daemon: shrink createVMMu + imageOpsMu to reservation/publication windows 2026-04-20 13:44:22 -03:00
vm_create_ops.go Add lint targets, fix gofmt drift, broaden Makefile build inputs 2026-04-16 16:49:17 -03:00
vm_disk.go vm state: split transient kernel/process handles off the durable schema 2026-04-19 14:18:13 -03:00
vm_handles.go vm state: split transient kernel/process handles off the durable schema 2026-04-19 14:18:13 -03:00
vm_handles_test.go vm state: split transient kernel/process handles off the durable schema 2026-04-19 14:18:13 -03:00
vm_lifecycle.go ssh: trust-on-first-use host key pinning everywhere 2026-04-19 16:46:03 -03:00
vm_locks.go Move subsystem state/locks off Daemon into owning types 2026-04-15 15:58:33 -03:00
vm_set.go vm state: split transient kernel/process handles off the durable schema 2026-04-19 14:18:13 -03:00
vm_stats.go vm state: split transient kernel/process handles off the durable schema 2026-04-19 14:18:13 -03:00
vm_test.go vm state: split transient kernel/process handles off the durable schema 2026-04-19 14:18:13 -03:00
workspace.go remove vm session feature 2026-04-20 12:47:58 -03:00
workspace_test.go cli + daemon: move test seams off package globals onto injected structs 2026-04-19 19:03:55 -03:00