Four targeted fixes from a race-condition audit of the daemon package.
None change behaviour on the happy path; each closes a window where a
concurrent or interrupted RPC could strand state on the host.
- KernelDelete now holds the same per-name lock as KernelPull /
readOrAutoPullKernel. Without it, a delete racing a concurrent
pull could remove files mid-write or land between the pull's
manifest write and its first use.
- cleanupRuntime no longer early-returns on an inner waitForExit
failure; DM snapshot, capability, and tap teardown always run and
every error is folded into the returned errors.Join. EBUSY against
a still-alive firecracker is benign and surfaces in the joined
error rather than stranding kernel state across daemon restarts.
- Per-name image / kernel pull locks switch from *sync.Mutex to a
1-buffered chan struct{}. Acquire is a select on ctx.Done(), so a
peer waiting behind a pull whose RPC was cancelled can bail out
instead of blocking forever on a pull nobody is consuming (see the
sketch after this list).
- setVMHandles writes the per-VM scratch file before updating the
in-memory cache. A daemon crash between the two now leaves disk
ahead of memory (recoverable: reconcile re-seeds the cache from
the file on next start) rather than memory ahead of disk (lost
handles → stranded DM/loops/tap).
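A minimal sketch of the third bullet's chan-based lock, assuming illustrative
names (nameLocks, acquire) rather than the daemon's actual identifiers:

```go
// Per-name lock as a 1-buffered channel: sending the single token
// acquires, receiving it back releases. The select lets a waiter give
// up as soon as its own RPC context is cancelled instead of parking
// forever behind a pull nobody is still consuming.
package daemon

import (
	"context"
	"fmt"
	"sync"
)

type nameLocks struct {
	mu    sync.Mutex
	locks map[string]chan struct{} // one 1-buffered channel per name
}

func (n *nameLocks) lockFor(name string) chan struct{} {
	n.mu.Lock()
	defer n.mu.Unlock()
	if n.locks == nil {
		n.locks = make(map[string]chan struct{})
	}
	l, ok := n.locks[name]
	if !ok {
		l = make(chan struct{}, 1)
		n.locks[name] = l
	}
	return l
}

// acquire blocks until the per-name lock is held or ctx is cancelled;
// on success the returned release func must be called exactly once.
func (n *nameLocks) acquire(ctx context.Context, name string) (func(), error) {
	l := n.lockFor(name)
	select {
	case l <- struct{}{}:
		return func() { <-l }, nil
	case <-ctx.Done():
		return nil, fmt.Errorf("waiting for pull lock on %s: %w", name, ctx.Err())
	}
}
```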
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three concurrency bugs surfaced by `make smoke JOBS=4` that all stem
from `vm.create` paths assuming single-caller semantics:
1. **Kernel auto-pull manifest race.** Parallel `vm.create` calls that
each need to auto-pull the same kernel ref both run kernelcat.Fetch
in parallel against the same /var/lib/banger/kernels/<name>/. Fetch
writes manifest.json non-atomically (truncate + write); the peer
reads it back mid-write and trips
"parse manifest for X: unexpected end of JSON input".
Fix: per-name `sync.Mutex` map on `ImageService` (kernelPullLock).
`KernelPull` and `readOrAutoPullKernel` both acquire it and re-check
`kernelcat.ReadLocal` after the lock, so a peer that finished while we
waited is treated as success — `readOrAutoPullKernel` deliberately does
NOT call `s.KernelPull`, because that path errors with "already pulled"
after a peer's success, which would be wrong for auto-pull. Different
kernels stay parallel (the lock-then-recheck shape is sketched after
this list).
2. **Image auto-pull race.** Same shape as the kernel race but on the
image side: parallel `vm.create` calls both run pullFromBundle /
pullFromOCI for the missing image (each ~minutes of OCI fetch +
ext4 build). The publishImage atom under imageOpsMu only protects
the rename + UpsertImage commit, so the loser does all the work
only to fail at the recheck with "image already exists".
Fix: per-name `sync.Mutex` map on `ImageService` (imagePullLock).
`findOrAutoPullImage` acquires it, re-checks FindImage, and only
then calls PullImage. Loser short-circuits with the
freshly-published image instead of redoing minutes of work.
PullImage's own publishImage recheck stays as defense-in-depth
for callers that bypass the auto-pull path.
3. **Work-seed refresh race.** When the host's SSH key has rotated
since an image was last refreshed, `ensureAuthorizedKeyOnWorkDisk`
triggers `refreshManagedWorkSeedFingerprint`, which used to rewrite
the shared work-seed.ext4 in place via e2rm + e2cp. Peer `vm.create`
calls doing parallel `MaterializeWorkDisk` rdumps observed a torn
ext4 image — "Superblock checksum does not match superblock".
Fix: stage the rewrite on a sibling tmpfile (`<seed>.refresh.<pid>-<ns>.tmp`)
and atomic-rename it into place (second sketch below). Concurrent
readers either already have the file open (the kernel keeps the
pre-rename inode alive) or open it after the rename (and see the new
inode) — they never observe a partial state. Two parallel refreshes
are idempotent (same daemon, same SSH key), so unique tmp names are
enough; whichever rename lands last wins, with identical content.
UpsertImage runs after the rename so the recorded fingerprint always
matches what's on disk.
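The lock-then-recheck shape shared by fixes 1 and 2, as a rough sketch —
readLocal / pull stand in for kernelcat.ReadLocal / the real fetch, and the
per-name mutex is passed in rather than looked up in kernelPullLock /
imagePullLock:

```go
package daemon

import "sync"

// readOrPull is the double-check pattern behind fixes 1 and 2: fast-path
// read, acquire the per-name lock, re-check, and only then do the work.
// A caller that finds the artifact after the lock treats the peer's
// finished pull as its own success instead of re-pulling or erroring.
func readOrPull(lock *sync.Mutex, name string,
	readLocal func(string) (string, error),
	pull func(string) error,
) (string, error) {
	if m, err := readLocal(name); err == nil {
		return m, nil // already pulled, no lock needed
	}
	lock.Lock()
	defer lock.Unlock()
	// Re-check under the lock: a peer may have finished while we waited.
	if m, err := readLocal(name); err == nil {
		return m, nil
	}
	if err := pull(name); err != nil {
		return "", err
	}
	return readLocal(name)
}
```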
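And fix 3's staged rewrite, sketched with the e2rm/e2cp editing step reduced
to a rewrite callback and a tmp-name format that only approximates the one
above:

```go
package daemon

import (
	"fmt"
	"io"
	"os"
	"time"
)

// refreshSeedAtomically stages the rewrite on a sibling tmp file and
// renames it over the original, so concurrent readers either keep the
// pre-rename inode or open the finished file — never a torn image.
func refreshSeedAtomically(seedPath string, rewrite func(tmpPath string) error) error {
	tmp := fmt.Sprintf("%s.refresh.%d-%d.tmp", seedPath, os.Getpid(), time.Now().UnixNano())
	if err := copyFile(seedPath, tmp); err != nil {
		return err
	}
	defer os.Remove(tmp) // harmless no-op once the rename has landed
	if err := rewrite(tmp); err != nil { // stands in for e2rm + e2cp
		return err
	}
	// Atomic within one filesystem; whichever of two idempotent
	// refreshes renames last wins, with identical content.
	return os.Rename(tmp, seedPath)
}

func copyFile(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return err
	}
	if err := out.Sync(); err != nil { // make sure the copy hit disk
		out.Close()
		return err
	}
	return out.Close()
}
```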
Plus one smoke harness fix: reclassify `vm_prune` from `pure` to
`global`. `vm prune -f` removes ALL stopped VMs system-wide, not just
the ones the scenario created — so a parallel peer scenario whose VM
is momentarily in `created`/`stopped` gets it wiped out from under it.
Moving prune to the post-pool serial phase keeps it from racing with
in-flight scenarios.
After all four fixes, `make smoke JOBS=4` passes 21/21 in 174s
(serial baseline 141s; the small overhead is the buffered-output and
`wait -n` semaphore cost — well worth the parallelism for fast-iter
work on a 32-core box).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Today there's no way to correlate a CLI failure with a daemon log
line: operationLog records relative timing but no id; two concurrent
vm.start calls log indistinguishably; and the async
vmCreateOperationState.ID is user-facing yet never reaches the
journal. The root helper logs plain text to stderr while bangerd
logs JSON, so a merged journalctl is hard to grep across the
trust-boundary split.
Mint a per-RPC op id at dispatch entry, store it on context, and
include it as an "op_id" attr on every operationLog record. The
id is stamped onto every error response (including the early
short-circuit paths bad_version and unknown_method). rpc.Call
forwards the context op id on requests so a daemon RPC and the
helper RPCs it triggers all share one id. The helper now logs
JSON to match bangerd, adopts the inbound id, and emits a single
"helper rpc completed" / "helper rpc failed" line per call so
operators can see at a glance how long each privileged op took.
vmCreateOperationState.ID is now the same id dispatch generated
for vm.create.begin — one identifier between client status polls,
daemon logs, and helper logs.
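A sketch of that plumbing, assuming illustrative names (opIDKey, withOpID,
logOp) rather than the daemon's actual identifiers:

```go
package daemon

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"log/slog"
)

type opIDKey struct{}

// newOpID mints a short per-RPC identifier (format illustrative only).
func newOpID() string {
	var b [3]byte
	_, _ = rand.Read(b[:])
	return "op-" + hex.EncodeToString(b[:])
}

// withOpID stamps a fresh id on ctx at dispatch entry, unless one was
// already forwarded (a daemon RPC calling into the root helper).
func withOpID(ctx context.Context, inbound string) context.Context {
	if inbound == "" {
		inbound = newOpID()
	}
	return context.WithValue(ctx, opIDKey{}, inbound)
}

func opIDFrom(ctx context.Context) string {
	id, _ := ctx.Value(opIDKey{}).(string)
	return id
}

// logOp attaches the id to every record, operationLog-style.
func logOp(ctx context.Context, log *slog.Logger, msg string, attrs ...any) {
	log.Info(msg, append([]any{slog.String("op_id", opIDFrom(ctx))}, attrs...)...)
}
```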
The wire format gains two optional fields: rpc.Request.OpID and
rpc.ErrorResponse.OpID, both omitempty, so older peers ignore the new
fields and messages from older peers simply omit them.
ErrorResponse.Error() now appends "(op-XXXXXX)" to its string form
when the id is set; existing callers that just print err.Error() get
the id for free.
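Roughly what the wire types look like with the new fields — the Code/Message
fields here are placeholders, not the package's full shapes:

```go
package rpc

import "fmt"

type Request struct {
	Method string `json:"method"`
	OpID   string `json:"op_id,omitempty"` // forwarded from the caller's context
}

type ErrorResponse struct {
	Code    string `json:"code"`
	Message string `json:"message"`
	OpID    string `json:"op_id,omitempty"` // stamped by dispatch
}

// Error appends the op id so callers that only print err.Error()
// still get something to grep the journal for.
func (e *ErrorResponse) Error() string {
	if e.OpID == "" {
		return fmt.Sprintf("%s: %s", e.Code, e.Message)
	}
	return fmt.Sprintf("%s: %s (%s)", e.Code, e.Message, e.OpID)
}
```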
Tests cover: dispatch stamps op_id on unknown_method, bad_version,
and handler-returned errors; rpc.Call exposes the typed
*ErrorResponse via errors.As so the CLI can read code/op_id; ctx
op_id is forwarded to the server in the request envelope.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before this change `banger image pull` (both OCI-direct and bundle
paths) shipped images with an empty WorkSeedPath — the BuildWorkSeedImage
helper existed only behind the hidden `banger internal work-seed` CLI.
Every pulled image hit ensureWorkDisk's no-seed branch, and the guest
booted with a bare /root (no .bashrc, no .profile, none of the distro
defaults).
Pull now calls BuildWorkSeedImage after the rootfs is finalised (OCI)
or fetched (bundle). The builder is behind a new `workSeedBuilder` test
seam so existing pull tests don't accidentally demand sudo mount. A
build failure is non-fatal: any error logs a warning and leaves
WorkSeedPath empty — images stay publishable even if the pulled rootfs
has no /root to extract.
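The seam plus the non-fatal handling, sketched against a pared-down
ImageService (field names illustrative):

```go
package daemon

import "log/slog"

type ImageService struct {
	log *slog.Logger
	// workSeedBuilder defaults to BuildWorkSeedImage in production;
	// pull tests install a stub so they never need sudo mount.
	workSeedBuilder func(rootfsPath, outPath string) error
}

// buildWorkSeed returns the seed path on success and "" on failure: a
// failed build is logged and the image stays publishable, the guest
// just boots with an empty /root.
func (s *ImageService) buildWorkSeed(rootfsPath, outPath string) string {
	if err := s.workSeedBuilder(rootfsPath, outPath); err != nil {
		s.log.Warn("work-seed build failed; leaving WorkSeedPath empty", "error", err)
		return ""
	}
	return outPath
}
```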
Verified end-to-end by wiping the cached smoke image and re-pulling:
work-seed.ext4 lands in the artifact dir next to rootfs.ext4, and all
21 smoke scenarios pass.
Also refreshes the "feature /root work disk" fallback tooling check —
the no-seed path no longer touches mount/umount/cp after commit
0e28504, so the doctor check now only requires truncate + mkfs.ext4.
The warn copy updates from "new VM creates will be slower" to "guest
/root will be empty", which matches the actual tradeoff post-refactor.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Factor the service + capability wiring out of Daemon.Open() into
wireServices(d), an idempotent helper that constructs HostNetwork,
ImageService, WorkspaceService, and VMService from whatever
infrastructure (runner, store, config, layout, logger, closing) is
already set on d. Open() calls it once after filling the composition
root; tests that build &Daemon{...} literals call it to get a working
service graph, preinstalling stubs on the fields they want to fake.
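The helper's shape, sketched with the service types pared down to empty
placeholders so it stands alone — the real constructors consume the
runner/store/config/layout/logger fields listed above:

```go
package daemon

// Placeholders so the sketch compiles; the real types carry the
// infrastructure handles named above.
type (
	HostNetwork      struct{}
	ImageService     struct{}
	WorkspaceService struct{}
	VMService        struct{ img *ImageService }
)

type Daemon struct {
	net *HostNetwork
	img *ImageService
	ws  *WorkspaceService
	vm  *VMService
}

// wireServices is idempotent: it only fills fields that are still nil,
// so a test can pre-install a stub on any field before calling it and
// the stub propagates into the services wired afterwards.
func wireServices(d *Daemon) {
	if d.net == nil {
		d.net = &HostNetwork{}
	}
	if d.img == nil {
		d.img = &ImageService{}
	}
	if d.ws == nil {
		d.ws = &WorkspaceService{}
	}
	if d.vm == nil {
		d.vm = &VMService{img: d.img}
	}
}
```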
Drops the four lazy-init getters on *Daemon — d.hostNet(),
d.imageSvc(), d.workspaceSvc(), d.vmSvc() — whose sole purpose was
keeping test literals working. Every production call site now reads
d.net / d.img / d.ws / d.vm directly; the services are guaranteed
non-nil once Open returns. No behavior change.
Mechanical: all existing `d.xxxSvc()` calls (production + tests)
rewritten to field access; each `d := &Daemon{...}` in tests gets a
trailing wireServices(d) so the literal + wiring are side-by-side.
Tests that override a pre-built service (e.g. d.img = &ImageService{
bundleFetch: stub}) now set the override before wireServices so the
replacement propagates into VMService's peer pointer.
Also nil-guards HostNetwork.stopVMDNS and d.store in Close() so
partially-initialised daemons (pre-reconcile open failure) still
tear down cleanly — same contract the old lazy getters provided.
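The Close() guards, sketched against placeholder types:

```go
package daemon

import "errors"

type store interface{ Close() error }

type HostNetwork struct{ stopVMDNS func() }

type Daemon struct {
	net   *HostNetwork
	store store
}

// Close tolerates a partially-initialised daemon (Open failed before
// reconcile): anything that was never wired is simply skipped.
func (d *Daemon) Close() error {
	var errs []error
	if d.net != nil && d.net.stopVMDNS != nil {
		d.net.stopVMDNS()
	}
	if d.store != nil {
		errs = append(errs, d.store.Close())
	}
	return errors.Join(errs...)
}
```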
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Second phase of splitting the daemon god-struct. ImageService now owns
all image + kernel registry operations: register/promote/delete/pull
for images (bundle + OCI paths), the six kernel commands, and the
shared SSH-key/work-seed injection helpers. imageOpsMu (the
publication-window lock) lives on the service; so do the three OCI
pull test seams pullAndFlatten / finalizePulledRootfs / bundleFetch.
The four files images.go, images_pull.go, image_seed.go, kernels.go
flipped their receivers from *Daemon to *ImageService.
FindImage moved with the service. Daemon keeps a thin FindImage
forwarder so callers reading the dispatch code see the obvious
facade and tests that pre-date the split still compile.
flattenNestedWorkHome — called from image_seed.go, vm_authsync.go,
and vm_disk.go across future service boundaries — became a
package-level helper taking a CommandRunner explicitly. Daemon keeps
a deprecated forwarder for now; the other services will use the
package form.
Lazy-init helper imageSvc() on Daemon mirrors hostNet() from
Phase 1, so test literals like &Daemon{store: db, runner: r, ...}
that don't spell out an ImageService still get a working one.
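Both facade pieces — the lazy getter and the FindImage forwarder — in one
pared-down sketch (types are placeholders, and the real getter may handle
concurrent first use differently):

```go
package daemon

type Image struct{ Name string }

type ImageService struct{ store any } // stands in for the real deps

func (s *ImageService) FindImage(name string) (Image, bool) {
	// real registry lookup elided
	return Image{}, false
}

type Daemon struct {
	store any
	img   *ImageService
}

// imageSvc lazily builds an ImageService from whatever the Daemon
// literal already carries, mirroring hostNet() from Phase 1. Not
// guarded against concurrent first use in this sketch.
func (d *Daemon) imageSvc() *ImageService {
	if d.img == nil {
		d.img = &ImageService{store: d.store}
	}
	return d.img
}

// FindImage stays on Daemon as a thin forwarder so dispatch reads as
// an obvious facade and pre-split tests keep compiling.
func (d *Daemon) FindImage(name string) (Image, bool) {
	return d.imageSvc().FindImage(name)
}
```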
Tests that override the image test seams (autopull_test,
concurrency_test, images_pull_test, images_pull_bundle_test) now
assign d.img = &ImageService{...seams...}; the two-statement pattern
matches what Phase 1 established for HostNetwork.
Dispatch in daemon.go is cleaner now: every image/kernel RPC handler
is a single-liner forwarding to d.imageSvc().*. Phase 5 will do the
same for VM lifecycle.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>