Four targeted fixes from a race-condition audit of the daemon package.
None change behaviour on the happy path; each closes a window where a
concurrent or interrupted RPC could strand state on the host.
- KernelDelete now holds the same per-name lock as KernelPull /
readOrAutoPullKernel. Without it, a delete racing a concurrent
pull could remove files mid-write or land between the pull's
manifest write and its first use.
- cleanupRuntime no longer early-returns on an inner waitForExit
failure; DM snapshot, capability, and tap teardown always run and
every error is folded into the returned errors.Join. EBUSY against
a still-alive firecracker is benign and surfaces in the joined
error rather than stranding kernel state across daemon restarts.
- Per-name image / kernel pull locks switch from *sync.Mutex to a
1-buffered chan struct{}. Acquire is a select on ctx.Done(), so a
peer waiting behind a pull whose RPC was cancelled can bail out
instead of blocking forever on a pull nobody is consuming (see the
sketch after this list).
- setVMHandles writes the per-VM scratch file before updating the
in-memory cache. A daemon crash between the two now leaves disk
ahead of memory (recoverable: reconcile re-seeds the cache from
the file on next start) rather than memory ahead of disk (lost
handles → stranded DM/loops/tap).
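A minimal sketch of the third bullet's chan-based lock, assuming illustrative
names (nameLocks, acquire) rather than the daemon's actual identifiers:

```go
// Per-name lock as a 1-buffered channel: sending the single token
// acquires, receiving it back releases. The select lets a waiter give
// up as soon as its own RPC context is cancelled instead of parking
// forever behind a pull nobody is still consuming.
package daemon

import (
	"context"
	"fmt"
	"sync"
)

type nameLocks struct {
	mu    sync.Mutex
	locks map[string]chan struct{} // one 1-buffered channel per name
}

func (n *nameLocks) lockFor(name string) chan struct{} {
	n.mu.Lock()
	defer n.mu.Unlock()
	if n.locks == nil {
		n.locks = make(map[string]chan struct{})
	}
	l, ok := n.locks[name]
	if !ok {
		l = make(chan struct{}, 1)
		n.locks[name] = l
	}
	return l
}

// acquire blocks until the per-name lock is held or ctx is cancelled;
// on success the returned release func must be called exactly once.
func (n *nameLocks) acquire(ctx context.Context, name string) (func(), error) {
	l := n.lockFor(name)
	select {
	case l <- struct{}{}:
		return func() { <-l }, nil
	case <-ctx.Done():
		return nil, fmt.Errorf("waiting for pull lock on %s: %w", name, ctx.Err())
	}
}
```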
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three concurrency bugs surfaced by `make smoke JOBS=4` that all stem
from `vm.create` paths assuming single-caller semantics:
1. **Kernel auto-pull manifest race.** Parallel `vm.create` calls that
each need to auto-pull the same kernel ref both run kernelcat.Fetch
in parallel against the same /var/lib/banger/kernels/<name>/. Fetch
writes manifest.json non-atomically (truncate + write); the peer
reads it back mid-write and trips
"parse manifest for X: unexpected end of JSON input".
Fix: per-name `sync.Mutex` map on `ImageService` (kernelPullLock).
`KernelPull` and `readOrAutoPullKernel` both acquire it and re-check
`kernelcat.ReadLocal` after the lock, so a peer that finished while we
waited is treated as success — `readOrAutoPullKernel` deliberately does
NOT call `s.KernelPull`, because that path errors with "already pulled"
after a peer's success, which would be wrong for auto-pull. Different
kernels stay parallel (the lock-then-recheck shape is sketched after
this list).
2. **Image auto-pull race.** Same shape as the kernel race but on the
image side: parallel `vm.create` calls both run pullFromBundle /
pullFromOCI for the missing image (each ~minutes of OCI fetch +
ext4 build). The publishImage atom under imageOpsMu only protects
the rename + UpsertImage commit, so the loser does all the work
only to fail at the recheck with "image already exists".
Fix: per-name `sync.Mutex` map on `ImageService` (imagePullLock).
`findOrAutoPullImage` acquires it, re-checks FindImage, and only
then calls PullImage. Loser short-circuits with the
freshly-published image instead of redoing minutes of work.
PullImage's own publishImage recheck stays as defense-in-depth
for callers that bypass the auto-pull path.
3. **Work-seed refresh race.** When the host's SSH key has rotated
since an image was last refreshed, `ensureAuthorizedKeyOnWorkDisk`
triggers `refreshManagedWorkSeedFingerprint`, which used to rewrite
the shared work-seed.ext4 in place via e2rm + e2cp. Peer `vm.create`
calls doing parallel `MaterializeWorkDisk` rdumps observed a torn
ext4 image — "Superblock checksum does not match superblock".
Fix: stage the rewrite on a sibling tmpfile (`<seed>.refresh.<pid>-<ns>.tmp`)
and atomic-rename it into place (second sketch below). Concurrent
readers either already have the file open (the kernel keeps the
pre-rename inode alive) or open it after the rename (and see the new
inode) — they never observe a partial state. Two parallel refreshes
are idempotent (same daemon, same SSH key), so unique tmp names are
enough; whichever rename lands last wins, with identical content.
UpsertImage runs after the rename so the recorded fingerprint always
matches what's on disk.
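The lock-then-recheck shape shared by fixes 1 and 2, as a rough sketch —
readLocal / pull stand in for kernelcat.ReadLocal / the real fetch, and the
per-name mutex is passed in rather than looked up in kernelPullLock /
imagePullLock:

```go
package daemon

import "sync"

// readOrPull is the double-check pattern behind fixes 1 and 2: fast-path
// read, acquire the per-name lock, re-check, and only then do the work.
// A caller that finds the artifact after the lock treats the peer's
// finished pull as its own success instead of re-pulling or erroring.
func readOrPull(lock *sync.Mutex, name string,
	readLocal func(string) (string, error),
	pull func(string) error,
) (string, error) {
	if m, err := readLocal(name); err == nil {
		return m, nil // already pulled, no lock needed
	}
	lock.Lock()
	defer lock.Unlock()
	// Re-check under the lock: a peer may have finished while we waited.
	if m, err := readLocal(name); err == nil {
		return m, nil
	}
	if err := pull(name); err != nil {
		return "", err
	}
	return readLocal(name)
}
```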
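And fix 3's staged rewrite, sketched with the e2rm/e2cp editing step reduced
to a rewrite callback and a tmp-name format that only approximates the one
above:

```go
package daemon

import (
	"fmt"
	"io"
	"os"
	"time"
)

// refreshSeedAtomically stages the rewrite on a sibling tmp file and
// renames it over the original, so concurrent readers either keep the
// pre-rename inode or open the finished file — never a torn image.
func refreshSeedAtomically(seedPath string, rewrite func(tmpPath string) error) error {
	tmp := fmt.Sprintf("%s.refresh.%d-%d.tmp", seedPath, os.Getpid(), time.Now().UnixNano())
	if err := copyFile(seedPath, tmp); err != nil {
		return err
	}
	defer os.Remove(tmp) // harmless no-op once the rename has landed
	if err := rewrite(tmp); err != nil { // stands in for e2rm + e2cp
		return err
	}
	// Atomic within one filesystem; whichever of two idempotent
	// refreshes renames last wins, with identical content.
	return os.Rename(tmp, seedPath)
}

func copyFile(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return err
	}
	if err := out.Sync(); err != nil { // make sure the copy hit disk
		out.Close()
		return err
	}
	return out.Close()
}
```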
Plus one smoke harness fix: reclassify `vm_prune` from `pure` to
`global`. `vm prune -f` removes ALL stopped VMs system-wide, not just
the ones the scenario created — so a parallel peer scenario whose VM
is momentarily in `created`/`stopped` gets it wiped out from under it.
Moving prune to the post-pool serial phase keeps it from racing with
in-flight scenarios.
After all four fixes, `make smoke JOBS=4` passes 21/21 in 174s
(serial baseline 141s; the small overhead is the buffered-output and
`wait -n` semaphore cost — well worth the parallelism for fast-iter
work on a 32-core box).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Today there's no way to correlate a CLI failure with a daemon log
line: operationLog records relative timing but no id; two concurrent
vm.start calls log indistinguishably; and the async
vmCreateOperationState.ID is user-facing yet never reaches the
journal. The root helper logs plain text to stderr while bangerd
logs JSON, so a merged journalctl is hard to grep across the
trust-boundary split.
Mint a per-RPC op id at dispatch entry, store it on context, and
include it as an "op_id" attr on every operationLog record. The
id is stamped onto every error response (including the early
short-circuit paths bad_version and unknown_method). rpc.Call
forwards the context op id on requests so a daemon RPC and the
helper RPCs it triggers all share one id. The helper now logs
JSON to match bangerd, adopts the inbound id, and emits a single
"helper rpc completed" / "helper rpc failed" line per call so
operators can see at a glance how long each privileged op took.
vmCreateOperationState.ID is now the same id dispatch generated
for vm.create.begin — one identifier between client status polls,
daemon logs, and helper logs.
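A sketch of that plumbing, assuming illustrative names (opIDKey, withOpID,
logOp) rather than the daemon's actual identifiers:

```go
package daemon

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"log/slog"
)

type opIDKey struct{}

// newOpID mints a short per-RPC identifier (format illustrative only).
func newOpID() string {
	var b [3]byte
	_, _ = rand.Read(b[:])
	return "op-" + hex.EncodeToString(b[:])
}

// withOpID stamps a fresh id on ctx at dispatch entry, unless one was
// already forwarded (a daemon RPC calling into the root helper).
func withOpID(ctx context.Context, inbound string) context.Context {
	if inbound == "" {
		inbound = newOpID()
	}
	return context.WithValue(ctx, opIDKey{}, inbound)
}

func opIDFrom(ctx context.Context) string {
	id, _ := ctx.Value(opIDKey{}).(string)
	return id
}

// logOp attaches the id to every record, operationLog-style.
func logOp(ctx context.Context, log *slog.Logger, msg string, attrs ...any) {
	log.Info(msg, append([]any{slog.String("op_id", opIDFrom(ctx))}, attrs...)...)
}
```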
The wire format gains two optional fields: rpc.Request.OpID and
rpc.ErrorResponse.OpID, both omitempty, so older peers ignore the new
fields and messages from older peers simply omit them.
ErrorResponse.Error() now appends "(op-XXXXXX)" to its string form
when the id is set; existing callers that just print err.Error() get
the id for free.
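Roughly what the wire types look like with the new fields — the Code/Message
fields here are placeholders, not the package's full shapes:

```go
package rpc

import "fmt"

type Request struct {
	Method string `json:"method"`
	OpID   string `json:"op_id,omitempty"` // forwarded from the caller's context
}

type ErrorResponse struct {
	Code    string `json:"code"`
	Message string `json:"message"`
	OpID    string `json:"op_id,omitempty"` // stamped by dispatch
}

// Error appends the op id so callers that only print err.Error()
// still get something to grep the journal for.
func (e *ErrorResponse) Error() string {
	if e.OpID == "" {
		return fmt.Sprintf("%s: %s", e.Code, e.Message)
	}
	return fmt.Sprintf("%s: %s (%s)", e.Code, e.Message, e.OpID)
}
```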
Tests cover: dispatch stamps op_id on unknown_method, bad_version,
and handler-returned errors; rpc.Call exposes the typed
*ErrorResponse via errors.As so the CLI can read code/op_id; ctx
op_id is forwarded to the server in the request envelope.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before this change `banger image pull` (both OCI-direct and bundle
paths) shipped images with an empty WorkSeedPath — the BuildWorkSeedImage
helper existed only behind the hidden `banger internal work-seed` CLI.
Every pulled image hit ensureWorkDisk's no-seed branch, and the guest
booted with a bare /root (no .bashrc, no .profile, none of the distro
defaults).
Pull now calls BuildWorkSeedImage after the rootfs is finalised (OCI)
or fetched (bundle). The builder is behind a new `workSeedBuilder` test
seam so existing pull tests don't accidentally demand sudo mount. A
build failure is non-fatal: any error logs a warning and leaves
WorkSeedPath empty — images stay publishable even if the pulled rootfs
has no /root to extract.
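The seam plus the non-fatal handling, sketched against a pared-down
ImageService (field names illustrative):

```go
package daemon

import "log/slog"

type ImageService struct {
	log *slog.Logger
	// workSeedBuilder defaults to BuildWorkSeedImage in production;
	// pull tests install a stub so they never need sudo mount.
	workSeedBuilder func(rootfsPath, outPath string) error
}

// buildWorkSeed returns the seed path on success and "" on failure: a
// failed build is logged and the image stays publishable, the guest
// just boots with an empty /root.
func (s *ImageService) buildWorkSeed(rootfsPath, outPath string) string {
	if err := s.workSeedBuilder(rootfsPath, outPath); err != nil {
		s.log.Warn("work-seed build failed; leaving WorkSeedPath empty", "error", err)
		return ""
	}
	return outPath
}
```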
Verified end-to-end by wiping the cached smoke image and re-pulling:
work-seed.ext4 lands in the artifact dir next to rootfs.ext4, and all
21 smoke scenarios pass.
Also refreshes the "feature /root work disk" fallback tooling check —
the no-seed path no longer touches mount/umount/cp after commit
0e28504, so the doctor check now only requires truncate + mkfs.ext4.
The warn copy updates from "new VM creates will be slower" to "guest
/root will be empty", which matches the actual tradeoff post-refactor.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Factor the service + capability wiring out of Daemon.Open() into
wireServices(d), an idempotent helper that constructs HostNetwork,
ImageService, WorkspaceService, and VMService from whatever
infrastructure (runner, store, config, layout, logger, closing) is
already set on d. Open() calls it once after filling the composition
root; tests that build &Daemon{...} literals call it to get a working
service graph, preinstalling stubs on the fields they want to fake.
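The helper's shape, sketched with the service types pared down to empty
placeholders so it stands alone — the real constructors consume the
runner/store/config/layout/logger fields listed above:

```go
package daemon

// Placeholders so the sketch compiles; the real types carry the
// infrastructure handles named above.
type (
	HostNetwork      struct{}
	ImageService     struct{}
	WorkspaceService struct{}
	VMService        struct{ img *ImageService }
)

type Daemon struct {
	net *HostNetwork
	img *ImageService
	ws  *WorkspaceService
	vm  *VMService
}

// wireServices is idempotent: it only fills fields that are still nil,
// so a test can pre-install a stub on any field before calling it and
// the stub propagates into the services wired afterwards.
func wireServices(d *Daemon) {
	if d.net == nil {
		d.net = &HostNetwork{}
	}
	if d.img == nil {
		d.img = &ImageService{}
	}
	if d.ws == nil {
		d.ws = &WorkspaceService{}
	}
	if d.vm == nil {
		d.vm = &VMService{img: d.img}
	}
}
```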
Drops the four lazy-init getters on *Daemon — d.hostNet(),
d.imageSvc(), d.workspaceSvc(), d.vmSvc() — whose sole purpose was
keeping test literals working. Every production call site now reads
d.net / d.img / d.ws / d.vm directly; the services are guaranteed
non-nil once Open returns. No behavior change.
Mechanical: all existing `d.xxxSvc()` calls (production + tests)
rewritten to field access; each `d := &Daemon{...}` in tests gets a
trailing wireServices(d) so the literal + wiring are side-by-side.
Tests that override a pre-built service (e.g. d.img = &ImageService{
bundleFetch: stub}) now set the override before wireServices so the
replacement propagates into VMService's peer pointer.
Also nil-guards HostNetwork.stopVMDNS and d.store in Close() so
partially-initialised daemons (pre-reconcile open failure) still
tear down cleanly — same contract the old lazy getters provided.
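The Close() guards, sketched against placeholder types:

```go
package daemon

import "errors"

type store interface{ Close() error }

type HostNetwork struct{ stopVMDNS func() }

type Daemon struct {
	net   *HostNetwork
	store store
}

// Close tolerates a partially-initialised daemon (Open failed before
// reconcile): anything that was never wired is simply skipped.
func (d *Daemon) Close() error {
	var errs []error
	if d.net != nil && d.net.stopVMDNS != nil {
		d.net.stopVMDNS()
	}
	if d.store != nil {
		errs = append(errs, d.store.Close())
	}
	return errors.Join(errs...)
}
```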
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Second phase of splitting the daemon god-struct. ImageService now owns
all image + kernel registry operations: register/promote/delete/pull
for images (bundle + OCI paths), the six kernel commands, and the
shared SSH-key/work-seed injection helpers. imageOpsMu (the
publication-window lock) lives on the service; so do the three OCI
pull test seams pullAndFlatten / finalizePulledRootfs / bundleFetch.
The four files images.go, images_pull.go, image_seed.go, kernels.go
flipped their receivers from *Daemon to *ImageService.
FindImage moved with the service. Daemon keeps a thin FindImage
forwarder so callers reading the dispatch code see the obvious
facade and tests that pre-date the split still compile.
flattenNestedWorkHome — called from image_seed.go, vm_authsync.go,
and vm_disk.go across future service boundaries — became a
package-level helper taking a CommandRunner explicitly. Daemon keeps
a deprecated forwarder for now; the other services will use the
package form.
Lazy-init helper imageSvc() on Daemon mirrors hostNet() from
Phase 1, so test literals like &Daemon{store: db, runner: r, ...}
that don't spell out an ImageService still get a working one.
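Both facade pieces — the lazy getter and the FindImage forwarder — in one
pared-down sketch (types are placeholders, and the real getter may handle
concurrent first use differently):

```go
package daemon

type Image struct{ Name string }

type ImageService struct{ store any } // stands in for the real deps

func (s *ImageService) FindImage(name string) (Image, bool) {
	// real registry lookup elided
	return Image{}, false
}

type Daemon struct {
	store any
	img   *ImageService
}

// imageSvc lazily builds an ImageService from whatever the Daemon
// literal already carries, mirroring hostNet() from Phase 1. Not
// guarded against concurrent first use in this sketch.
func (d *Daemon) imageSvc() *ImageService {
	if d.img == nil {
		d.img = &ImageService{store: d.store}
	}
	return d.img
}

// FindImage stays on Daemon as a thin forwarder so dispatch reads as
// an obvious facade and pre-split tests keep compiling.
func (d *Daemon) FindImage(name string) (Image, bool) {
	return d.imageSvc().FindImage(name)
}
```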
Tests that override the image test seams (autopull_test,
concurrency_test, images_pull_test, images_pull_bundle_test) now
assign d.img = &ImageService{...seams...}; the two-statement pattern
matches what Phase 1 established for HostNetwork.
Dispatch in daemon.go is cleaner now: every image/kernel RPC handler
is a single-liner forwarding to d.imageSvc().*. Phase 5 will do the
same for VM lifecycle.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>