banger

Author	SHA1	Message	Date
Thales Maciel	72882e45d7	daemon: serialise concurrent image/kernel pulls + atomic-rename seed refresh Three concurrency bugs surfaced by `make smoke JOBS=4` that all stem from `vm.create` paths assuming single-caller semantics: 1. Kernel auto-pull manifest race. Parallel `vm.create` calls that each need to auto-pull the same kernel ref both run kernelcat.Fetch in parallel against the same /var/lib/banger/kernels/<name>/. Fetch writes manifest.json non-atomically (truncate + write); the peer reads it back mid-write and trips "parse manifest for X: unexpected end of JSON input". Fix: per-name `sync.Mutex` map on `ImageService` (kernelPullLock). `KernelPull` and `readOrAutoPullKernel` both acquire it and re-check `kernelcat.ReadLocal` after the lock so a peer who finished while we waited is treated as success — `readOrAutoPullKernel` does NOT call `s.KernelPull` because that path errors with "already pulled" on a peer-success, which would be wrong for auto-pull. Different kernels stay parallel. 2. Image auto-pull race. Same shape as the kernel race but on the image side: parallel `vm.create` calls both run pullFromBundle / pullFromOCI for the missing image (each ~minutes of OCI fetch + ext4 build). The publishImage atom under imageOpsMu only protects the rename + UpsertImage commit, so the loser does all the work only to fail at the recheck with "image already exists". Fix: per-name `sync.Mutex` map on `ImageService` (imagePullLock). `findOrAutoPullImage` acquires it, re-checks FindImage, and only then calls PullImage. Loser short-circuits with the freshly-published image instead of redoing minutes of work. PullImage's own publishImage recheck stays as defense-in-depth for callers that bypass the auto-pull path. 3. Work-seed refresh race. When the host's SSH key has rotated since an image was last refreshed, `ensureAuthorizedKeyOnWorkDisk` triggers `refreshManagedWorkSeedFingerprint`, which rewrote the shared work-seed.ext4 in place via e2rm + e2cp. 
Peer `vm.create` calls doing parallel `MaterializeWorkDisk` rdumps observed a torn ext4 image — "Superblock checksum does not match superblock". Fix: stage the rewrite on a sibling tmpfile (`<seed>.refresh.<pid>-<ns>.tmp`) and atomic-rename. Concurrent readers either have the file open (kernel keeps the pre-rename inode alive) or open after the rename (see the new inode) — never observe a partial state. Two parallel refreshes are idempotent (same daemon, same SSH key) so unique tmp names are enough; whichever rename lands last wins, with identical content. UpsertImage runs after the rename so the recorded fingerprint always matches what's on disk. Plus one smoke harness fix: reclassify `vm_prune` from `pure` to `global`. `vm prune -f` removes ALL stopped VMs system-wide, not just the ones the scenario created — so a parallel peer scenario that happens to have its VM in `created`/`stopped` momentarily gets wiped. Moving prune to the post-pool serial phase keeps it from racing with in-flight scenarios. After all four fixes, `make smoke JOBS=4` passes 21/21 in 174s (serial baseline 141s; the small overhead is the buffered-output and `wait -n` semaphore cost — well worth the parallelism for fast-iter work on a 32-core box). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 17:24:11 -03:00
Thales Maciel	e47b8146dc	daemon: thread per-RPC op_id end-to-end Today there's no way to correlate a CLI failure with a daemon log line. operationLog records relative timing but no id, two concurrent vm.start calls log indistinguishably, and the async vmCreateOperationState.ID is user-facing yet never reaches the journal. The root helper logs plain text to stderr while bangerd logs JSON, so a merged journalctl is hard to grep across the trust-boundary split. Mint a per-RPC op id at dispatch entry, store it on context, and include it as an "op_id" attr on every operationLog record. The id is stamped onto every error response (including the early short-circuit paths bad_version and unknown_method). rpc.Call forwards the context op id on requests so a daemon RPC and the helper RPCs it triggers all share one id. The helper now logs JSON to match bangerd, adopts the inbound id, and emits a single "helper rpc completed" / "helper rpc failed" line per call so operators can see at a glance how long each privileged op took. vmCreateOperationState.ID is now the same id dispatch generated for vm.create.begin — one identifier between client status polls, daemon logs, and helper logs. The wire format gains two optional fields: rpc.Request.OpID and rpc.ErrorResponse.OpID, both omitempty so older peers (and the opposite direction) ignore them. ErrorResponse.Error() now appends "(op-XXXXXX)" to its string form when set; existing callers that just print err.Error() get the id for free. Tests cover: dispatch stamps op_id on unknown_method, bad_version, and handler-returned errors; rpc.Call exposes the typed *ErrorResponse via errors.As so the CLI can read code/op_id; ctx op_id is forwarded to the server in the request envelope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 22:13:44 -03:00
Thales Maciel	caa6a2b996	model: validate VM names as DNS labels at CLI + daemon A VM name flows into five places that all have narrower grammars than "arbitrary string": - the guest's /etc/hostname (vm_disk.patchRootOverlay) - the guest's /etc/hosts (same) - the <name>.vm DNS record (vmdns.RecordName) - the kernel command line (system.BuildBootArgs*) - VM-dir file-path fragments (layout.VMsDir/<id>, etc.) Nothing in the chain was validating the input. A name with whitespace, newline, dot, slash, colon, or = would produce broken hostnames, weird DNS labels, smuggled kernel cmdline tokens, or (in the worst case) surprising traversal through the on-disk layout. Not host shell injection — we already avoid shelling out with the raw name — but a real correctness and supportability bug. New: model.ValidateVMName. Rules: - 1..63 chars (DNS label max per RFC 1123; also a comfortable /etc/hostname cap) - lowercase ASCII letters, digits, '-' only - no leading or trailing '-' - no normalization — the name is the user-visible identifier (store key, `ssh <name>.vm`, `vm show`); silently rewriting "MyVM" → "myvm" would hand the user back something different than they typed Called from two places: - internal/cli/commands_vm.go vmCreateParamsFromFlags — rejects bad `--name` values before any RPC. Empty name still passes through so the daemon can generate one. - internal/daemon/vm_create.go reserveVM — defense in depth for any non-CLI RPC caller (SDK, direct JSON over the socket). Tests: - internal/model/vm_name_test.go — exhaustive character-class matrix (space, newline, tab, dot, slash, colon, equals, quote, control chars, unicode letters, uppercase, leading/trailing hyphen, over-length, max-length-exact, digits-only). - internal/cli TestVMCreateParamsFromFlagsRejectsInvalidName — CLI wire-through + empty-name passthrough. - internal/daemon TestReserveVMRejectsInvalidName — daemon defense-in-depth (including `box/../evil` path-traversal). 
- scripts/smoke.sh — end-to-end rejection + no-leaked-row assertion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 14:06:40 -03:00
Thales Maciel	466a7c30c4	daemon split (4/5): extract VMService service Phase 4 of the daemon god-struct refactor. VM lifecycle, create-op registry, handle cache, disk provisioning, stats polling, ports query, and the per-VM lock set all move off Daemon onto VMService. Daemon keeps thin forwarders only for FindVM / TouchVM (dispatch surface) and is otherwise out of VM lifecycle. Lazy-init via d.vmSvc() mirrors the earlier services so test literals like `&Daemon{store: db, runner: r}` still get a functional service without spelling one out. Three small cleanups along the way: (1) preflight helpers (validateStartPrereqs / addBaseStartPrereqs / addBaseStartCommandPrereqs / validateWorkDiskResizePrereqs) move with the VM methods that call them. (2) cleanupRuntime / rebuildDNS move to VMService, with HostNetwork primitives (findFirecrackerPID, cleanupDMSnapshot, killVMProcess, releaseTap, waitForExit, sendCtrlAltDel) reached through s.net instead of the hostNet() facade. (3) vsockAgentBinary becomes a package-level function so both Daemon (doctor) and VMService (preflight) call one entry point instead of each owning a forwarder method. WorkspaceService's peer deps switch from eager method values to closures: vmSvc() constructs VMService with WorkspaceService as a peer, so resolving d.vmSvc().FindVM at construction time recursed through workspaceSvc() → vmSvc(). Closures defer the lookup to call time. Pure code motion: build + unit tests green, lint clean. No RPC surface or lock-ordering changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 20:57:05 -03:00
Thales Maciel	d7614a3b2b	daemon split (2/5): extract ImageService service Second phase of splitting the daemon god-struct. ImageService now owns all image + kernel registry operations: register/promote/delete/pull for images (bundle + OCI paths), the six kernel commands, and the shared SSH-key/work-seed injection helpers. imageOpsMu (the publication-window lock) lives on the service; so do the three OCI pull test seams pullAndFlatten / finalizePulledRootfs / bundleFetch. The four files images.go, images_pull.go, image_seed.go, kernels.go flipped their receivers from Daemon to ImageService. FindImage moved with the service. Daemon keeps a thin FindImage forwarder so callers reading the dispatch code see the obvious facade and tests that pre-date the split still compile. flattenNestedWorkHome (called from image_seed.go, vm_authsync.go, and vm_disk.go across future service boundaries) became a package-level helper taking a CommandRunner explicitly. Daemon keeps a deprecated forwarder for now; the other services will use the package form. Lazy-init helper imageSvc() on Daemon mirrors hostNet() from Phase 1, so test literals like &Daemon{store: db, runner: r, ...} that don't spell out an ImageService still get a working one. Tests that override the image test seams (autopull_test, concurrency_test, images_pull_test, images_pull_bundle_test) now assign d.img = &ImageService{...seams...}; the two-statement pattern matches what Phase 1 established for HostNetwork. Dispatch in daemon.go is cleaner now: every image/kernel RPC handler is a single-liner forwarding to d.imageSvc(). Phase 5 will do the same for VM lifecycle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 20:30:32 -03:00
Thales Maciel	eba9a553bf	daemon: use exact-name lookup for VM-create uniqueness reserveVM's duplicate-name guard routed through Daemon.FindVM, which falls back to prefix-matching on both ids and names when no exact match is found. That turns the uniqueness check into a correctness bug: a brand-new VM name can be rejected because it happens to prefix an existing VM's id, or an existing VM's name. So `vm create --name beta` fails when `beta-sandbox` already exists. Swap in a dedicated store.GetVMByName that does a literal `WHERE name = ?` lookup, and use it from reserveVM. FindVM keeps its prefix-matching behaviour for user-facing lookup paths (`vm ssh <partial>`, `vm stop <partial>`) where "did you mean" semantics are the feature. Tests: - TestReserveVMAllowsNameThatPrefixesExistingVM — seeds a VM whose id + name both start with "longname", then reserves two new VMs named "longname" and "longname-sandbox". Both must succeed. Under the old FindVM-based check, both would fail. - TestReserveVMRejectsExactDuplicateName — actual collisions are still rejected after the swap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 14:00:33 -03:00
Thales Maciel	99d0811097	daemon: shrink createVMMu + imageOpsMu to reservation/publication windows Before: createVMMu was held across the whole of CreateVM — including image resolution (which could fire a full auto-pull) and startVMLocked (boot of multiple seconds). imageOpsMu was held across the whole of PullImage/RegisterImage/PromoteImage/DeleteImage, so any slow OCI pull, bundle download, or file copy blocked every other image mutation and every other VM create that needed to auto-pull. The async create API bought nothing if all creates serialised on the same mutex. CreateVM is now three phases: 1. Validate + resolve image (possibly auto-pulling). No global lock. 2. reserveVM: take createVMMu only long enough to re-check the name is free, allocate the next guest IP, and UpsertVM the "created" row. Milliseconds. 3. startVMLocked: run the full boot flow under the per-VM lock only. Parallel creates of different VMs now overlap on image resolution + boot; they contend only across the reservation claim. For the image surface a new publishImage helper isolates the commit atom (recheck name free, atomic rename stagingDir→finalDir, UpsertImage) under imageOpsMu. pullFromBundle + pullFromOCI do their network fetch + ext4 build + ownership fixup + agent injection outside the lock; Register moves validation + kernel resolution outside; Promote moves file copy + SSH-key seeding outside; Delete keeps a brief lock over the lookup + reference check + store delete and does file cleanup unlocked. Two concurrency tests assert the new behaviour: - TestPullImageDoesNotSerialiseOnDifferentNames fails the old code (second pull blocks on imageOpsMu and never reaches the body). - TestPullImageRejectsNameClashAtPublish confirms the publish-window recheck is what enforces name uniqueness now that the body runs unlocked — exactly one winner. ARCHITECTURE.md updated to describe the new scope explicitly instead of calling the locks "narrow". 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 13:44:22 -03:00
Thales Maciel	e0894376ea	vm create: auto-pull image and kernel from catalogs if missing One-command sandbox: `banger vm run` on a fresh host now Just Works. No prior `banger image pull` or `banger kernel pull` needed. Changes: - Default `default_image_name` flips from "default" to "debian-bookworm" so the golden image is the implicit target when `--image` is omitted. - `CreateVM` resolves the image via a new `findOrAutoPullImage`: try the local store first, and on miss fall back to the embedded imagecat catalog + auto-pull. Emits a vm-create progress stage so the user sees "pulling from image catalog" in the create output. - `resolveKernelInputs` gains context + the same pattern via `readOrAutoPullKernel`: try the local kernelcat, and on miss look up the embedded kernelcat and auto-pull. Fires whenever a bundle's manifest references a kernel the user hasn't pulled yet, not just during image pull; any CreateVM with an image that needs a kernel not yet local will resolve it. - `--image` help text updated on both `vm run` and `vm create`. Six tests cover local-hit-no-pull, auto-pull-on-miss, not-in-catalog error propagation, and that a non-ENOENT kernel read error does NOT trigger a misleading "not in catalog" claim. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 15:10:26 -03:00
Thales Maciel	59f2766139	Move subsystem state/locks off Daemon into owning types Daemon no longer owns a coarse mu shared across unrelated concerns. Each subsystem now carries its own state and lock: - tapPool: entries, next, and mu move onto a new tapPool struct. - sessionRegistry: sessionControllers + its mutex move off Daemon. - opRegistry[T asyncOp]: generic registry collapses the two ad-hoc vm-create and image-build operation maps (and their mutexes) into one shared type; the Begin/Status/Cancel/Prune methods simplify. - vmLockSet: the sync.Map of per-VM mutexes moves into its own type; lockVMID forwards. - Daemon.mu splits into imageOpsMu (image-registry mutations) and createVMMu (CreateVM serialisation) so image ops and VM creates no longer block each other. Lock ordering collapses to vmLocks[id] -> {createVMMu, imageOpsMu} -> subsystem-local leaves. doc.go and ARCHITECTURE.md updated. No behavior change; tests green. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 15:58:33 -03:00
Thales Maciel	ea0db1e17e	Split internal/daemon vm.go and guest_sessions.go by concern vm.go (1529 LOC) splits into vm_create, vm_lifecycle, vm_set, vm_stats, vm_disk, vm_authsync; firecracker/DNS/helpers stay in vm.go. guest_sessions.go (1266 LOC) splits into session_controller, session_lifecycle, session_attach, session_stream; scripts and helpers stay in guest_sessions.go. Mechanical move only. No behavior change. Adds doc.go and ARCHITECTURE.md capturing subsystem map and current lock ordering as the baseline for the upcoming subsystem extraction. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-15 15:47:08 -03:00
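
The name rules listed in commit caa6a2b996 (1..63 chars; lowercase ASCII letters, digits, and '-' only; no leading or trailing '-'; no normalization) can be sketched in Go as below. This is only an illustration of those stated rules, not the repository's code: `validateVMName` is a hypothetical stand-in for `model.ValidateVMName`, and the error wording is invented.

```go
package main

import (
	"errors"
	"fmt"
)

// validateVMName is a hypothetical stand-in for model.ValidateVMName.
// It checks the rules from the commit message and never rewrites the name.
// Callers that allow an empty name (the commit says the CLI lets the daemon
// generate one) would skip this check for "" rather than call it.
func validateVMName(name string) error {
	if len(name) < 1 || len(name) > 63 {
		return fmt.Errorf("vm name must be 1..63 chars, got %d", len(name))
	}
	if name[0] == '-' || name[len(name)-1] == '-' {
		return errors.New("vm name must not start or end with '-'")
	}
	// Walk bytes, not runes: any non-ASCII letter shows up as bytes >= 0x80
	// and falls through to the rejection branch.
	for i := 0; i < len(name); i++ {
		c := name[i]
		switch {
		case c >= 'a' && c <= 'z', c >= '0' && c <= '9', c == '-':
			// allowed
		default:
			return fmt.Errorf("vm name has invalid byte %q at index %d", c, i)
		}
	}
	return nil
}

func main() {
	for _, name := range []string{"beta-sandbox", "MyVM", "box/../evil", "-lead"} {
		fmt.Printf("%q -> %v\n", name, validateVMName(name))
	}
}
```

Rejecting rather than lowercasing keeps the sketch in line with the commit's "no normalization" rule: the caller always gets back exactly the name it typed, or an error.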

10 commits
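
The per-name lock plus re-check pattern described in commit 72882e45d7 (acquire a lock keyed by the image or kernel name, then look again before pulling so a waiting peer reuses the winner's result) might look roughly like the sketch below. All type and function names here (`keyedMutex`, `puller`, `findOrAutoPull`, `pullConcurrently`) are hypothetical, and the "pull" is simulated with a counter; the real `ImageService` is not shown in the log.

```go
package main

import (
	"fmt"
	"sync"
)

// keyedMutex hands out one mutex per name, so different names stay parallel.
type keyedMutex struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

// lock blocks until the caller holds the mutex for name and returns its unlock.
func (k *keyedMutex) lock(name string) (unlock func()) {
	k.mu.Lock()
	if k.locks == nil {
		k.locks = make(map[string]*sync.Mutex)
	}
	m, ok := k.locks[name]
	if !ok {
		m = &sync.Mutex{}
		k.locks[name] = m
	}
	k.mu.Unlock()
	m.Lock()
	return m.Unlock
}

// puller is a hypothetical stand-in for a service with a local store.
type puller struct {
	locks keyedMutex
	mu    sync.Mutex
	local map[string]bool // names already present locally
	pulls int             // how many expensive pulls actually ran
}

func (p *puller) has(name string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.local[name]
}

// findOrAutoPull pulls name at most once, even under concurrent callers.
func (p *puller) findOrAutoPull(name string) {
	if p.has(name) {
		return // local hit, no lock needed
	}
	unlock := p.locks.lock(name)
	defer unlock()
	if p.has(name) {
		return // a peer finished while we waited: treat as success
	}
	// Simulated expensive pull; the real work would run here.
	p.mu.Lock()
	p.pulls++
	p.local[name] = true
	p.mu.Unlock()
}

// pullConcurrently runs callers goroutines against one name and returns the
// total number of pulls performed by p so far.
func pullConcurrently(p *puller, name string, callers int) int {
	var wg sync.WaitGroup
	for i := 0; i < callers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p.findOrAutoPull(name)
		}()
	}
	wg.Wait()
	return p.pulls
}

func main() {
	p := &puller{local: map[string]bool{}}
	fmt.Println("pulls:", pullConcurrently(p, "debian-bookworm", 8))
}
```

One simplification in this sketch: the lock map only grows. That is acceptable for a small, bounded set of image and kernel names, but a long-lived process with unbounded keys would need some form of cleanup.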