banger/internal
Thales Maciel 72882e45d7
daemon: serialise concurrent image/kernel pulls + atomic-rename seed refresh
Three concurrency bugs surfaced by `make smoke JOBS=4` that all stem
from `vm.create` paths assuming single-caller semantics:

1. **Kernel auto-pull manifest race.** Parallel `vm.create` calls that
   each need to auto-pull the same kernel ref both run kernelcat.Fetch
   in parallel against the same /var/lib/banger/kernels/<name>/. Fetch
   writes manifest.json non-atomically (truncate + write); the peer
   reads it back mid-write and trips
   "parse manifest for X: unexpected end of JSON input".

   Fix: per-name `sync.Mutex` map on `ImageService` (kernelPullLock).
   `KernelPull` and `readOrAutoPullKernel` both acquire it and re-check
   `kernelcat.ReadLocal` after the lock so a peer who finished while we
   waited is treated as success — `readOrAutoPullKernel` does NOT call
   `s.KernelPull` because that path errors with "already pulled" on a
   peer-success, which would be wrong for auto-pull. Different kernels
   stay parallel.

2. **Image auto-pull race.** Same shape as the kernel race but on the
   image side: parallel `vm.create` calls both run pullFromBundle /
   pullFromOCI for the missing image (each ~minutes of OCI fetch +
   ext4 build). The publishImage atom under imageOpsMu only protects
   the rename + UpsertImage commit, so the loser does all the work
   only to fail at the recheck with "image already exists".

   Fix: per-name `sync.Mutex` map on `ImageService` (imagePullLock).
   `findOrAutoPullImage` acquires it, re-checks FindImage, and only
   then calls PullImage. Loser short-circuits with the
   freshly-published image instead of redoing minutes of work.
   PullImage's own publishImage recheck stays as defense-in-depth
   for callers that bypass the auto-pull path.

3. **Work-seed refresh race.** When the host's SSH key has rotated
   since an image was last refreshed, `ensureAuthorizedKeyOnWorkDisk`
   triggers `refreshManagedWorkSeedFingerprint`, which rewrote the
   shared work-seed.ext4 in place via e2rm + e2cp. Peer `vm.create`
   calls doing parallel `MaterializeWorkDisk` rdumps observed a torn
   ext4 image — "Superblock checksum does not match superblock".

   Fix: stage the rewrite on a sibling tmpfile (`<seed>.refresh.<pid>-<ns>.tmp`)
   and atomic-rename. Concurrent readers either have the file open
   (kernel keeps the pre-rename inode alive) or open after the rename
   (see the new inode) — never observe a partial state. Two parallel
   refreshes are idempotent (same daemon, same SSH key) so unique tmp
   names are enough; whichever rename lands last wins, with identical
   content. UpsertImage runs after the rename so the recorded
   fingerprint always matches what's on disk.

Plus one smoke harness fix: reclassify `vm_prune` from `pure` to
`global`. `vm prune -f` removes ALL stopped VMs system-wide, not just
the ones the scenario created — so a parallel peer scenario that
happens to have its VM in `created`/`stopped` momentarily gets wiped.
Moving prune to the post-pool serial phase keeps it from racing with
in-flight scenarios.

After all four fixes, `make smoke JOBS=4` passes 21/21 in 174s
(serial baseline 141s; the small overhead is the buffered-output and
`wait -n` semaphore cost — well worth the parallelism for fast-iter
work on a 32-core box).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 17:24:11 -03:00
..
api images: remove the docker field 2026-04-26 20:28:40 -03:00
buildinfo Stamp shared build metadata into banger binaries 2026-03-22 17:14:06 -03:00
cli feat(vm): add vm exec command with workspace dirty detection 2026-04-26 23:53:45 -03:00
config daemon: split owner daemon from root helper 2026-04-26 12:43:17 -03:00
daemon daemon: serialise concurrent image/kernel pulls + atomic-rename seed refresh 2026-04-27 17:24:11 -03:00
firecracker daemon: split owner daemon from root helper 2026-04-26 12:43:17 -03:00
guest ssh: trust-on-first-use host key pinning everywhere 2026-04-19 16:46:03 -03:00
guestconfig Refactor VM lifecycle around capabilities 2026-03-18 19:28:26 -03:00
guestnet Stop using kernel IP autoconfig for runtime VMs 2026-03-21 21:54:18 -03:00
hostnat coverage: medium batch — hostnat runner, store guest-sessions, daemon helpers 2026-04-18 18:03:37 -03:00
imagecat publish-golden-image: content-addressed tarball names 2026-04-18 15:26:57 -03:00
imagepull system: mkfs work disks with lazy_itable_init + lazy_journal_init 2026-04-26 21:32:57 -03:00
installmeta daemon: split owner daemon from root helper 2026-04-26 12:43:17 -03:00
kernelcat Prune legacy void/alpine + customize.sh flows 2026-04-18 15:39:53 -03:00
model feat(vm): add vm exec command with workspace dirty detection 2026-04-26 23:53:45 -03:00
namegen coverage: make targets + close zero-cov gaps (namegen, sessionstream) 2026-04-18 17:44:37 -03:00
paths daemon: split owner daemon from root helper 2026-04-26 12:43:17 -03:00
policy Add vsock-backed VM port inspection 2026-03-19 15:52:11 -03:00
roothelper daemon: thread per-RPC op_id end-to-end 2026-04-26 22:13:44 -03:00
rpc daemon: thread per-RPC op_id end-to-end 2026-04-26 22:13:44 -03:00
store feat(vm): add vm exec command with workspace dirty detection 2026-04-26 23:53:45 -03:00
system system: mkfs work disks with lazy_itable_init + lazy_journal_init 2026-04-26 21:32:57 -03:00
toolingplan coverage: easy-wins batch across cli, system, paths, vmdns, toolingplan 2026-04-18 17:57:05 -03:00
vmdns coverage: easy-wins batch across cli, system, paths, vmdns, toolingplan 2026-04-18 17:57:05 -03:00
vsockagent Add vsock-backed VM port inspection 2026-03-19 15:52:11 -03:00