Commit graph

7 commits

Author SHA1 Message Date
466a7c30c4
daemon split (4/5): extract *VMService service
Phase 4 of the daemon god-struct refactor. VM lifecycle, create-op
registry, handle cache, disk provisioning, stats polling, ports
query, and the per-VM lock set all move off *Daemon onto *VMService.

Daemon keeps thin forwarders only for FindVM / TouchVM (dispatch
surface) and is otherwise out of VM lifecycle. Lazy-init via
d.vmSvc() mirrors the earlier services so test literals like
\`&Daemon{store: db, runner: r}\` still get a functional service
without spelling one out.

Three small cleanups along the way:

  * preflight helpers (validateStartPrereqs / addBaseStartPrereqs
    / addBaseStartCommandPrereqs / validateWorkDiskResizePrereqs)
    move with the VM methods that call them.
  * cleanupRuntime / rebuildDNS move to *VMService, with
    HostNetwork primitives (findFirecrackerPID, cleanupDMSnapshot,
    killVMProcess, releaseTap, waitForExit, sendCtrlAltDel)
    reached through s.net instead of the hostNet() facade.
  * vsockAgentBinary becomes a package-level function so both
    *Daemon (doctor) and *VMService (preflight) call one entry
    point instead of each owning a forwarder method.

WorkspaceService's peer deps switch from eager method values to
closures — vmSvc() constructs VMService with WorkspaceService as a
peer, so resolving d.vmSvc().FindVM at construction time recursed
through workspaceSvc() → vmSvc(). Closures defer the lookup to call
time.

Pure code motion: build + unit tests green, lint clean. No RPC
surface or lock-ordering changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 20:57:05 -03:00
d7614a3b2b
daemon split (2/5): extract *ImageService service
Second phase of splitting the daemon god-struct. ImageService now owns
all image + kernel registry operations: register/promote/delete/pull
for images (bundle + OCI paths), the six kernel commands, and the
shared SSH-key/work-seed injection helpers. imageOpsMu (the
publication-window lock) lives on the service; so do the three OCI
pull test seams pullAndFlatten / finalizePulledRootfs / bundleFetch.
The four files images.go, images_pull.go, image_seed.go, kernels.go
flipped their receivers from *Daemon to *ImageService.

FindImage moved with the service. Daemon keeps a thin FindImage
forwarder so callers reading the dispatch code see the obvious
facade and tests that pre-date the split still compile.

flattenNestedWorkHome — called from image_seed.go, vm_authsync.go,
and vm_disk.go across future service boundaries — became a
package-level helper taking a CommandRunner explicitly. Daemon keeps
a deprecated forwarder for now; the other services will use the
package form.

Lazy-init helper imageSvc() on Daemon mirrors hostNet() from
Phase 1, so test literals like &Daemon{store: db, runner: r, ...}
that don't spell out an ImageService still get a working one.
Tests that override the image test seams (autopull_test,
concurrency_test, images_pull_test, images_pull_bundle_test) now
assign d.img = &ImageService{...seams...}; the two-statement pattern
matches what Phase 1 established for HostNetwork.

Dispatch in daemon.go is cleaner now: every image/kernel RPC handler
is a single-liner forwarding to d.imageSvc().*. Phase 5 will do the
same for VM lifecycle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 20:30:32 -03:00
eba9a553bf
daemon: use exact-name lookup for VM-create uniqueness
reserveVM's duplicate-name guard routed through Daemon.FindVM, which
falls back to prefix-matching on both ids and names when no exact
match is found. That turns the uniqueness check into a correctness
bug: a brand-new VM name can be rejected because it happens to
prefix an existing VM's id, or an existing VM's name. So `vm create
--name beta` fails when `beta-sandbox` already exists.

Swap in a dedicated store.GetVMByName that does a literal `WHERE
name = ?` lookup, and use it from reserveVM. FindVM keeps its
prefix-matching behaviour for user-facing lookup paths (`vm ssh
<partial>`, `vm stop <partial>`) where "did you mean" semantics
are the feature.

Tests:
 - TestReserveVMAllowsNameThatPrefixesExistingVM — seeds a VM whose
   id + name both start with "longname", then reserves two new VMs
   named "longname" and "longname-sandbox". Both must succeed.
   Under the old FindVM-based check, both would fail.
 - TestReserveVMRejectsExactDuplicateName — actual collisions are
   still rejected after the swap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:00:33 -03:00
99d0811097
daemon: shrink createVMMu + imageOpsMu to reservation/publication windows
Before: createVMMu was held across the whole of CreateVM — including
image resolution (which could fire a full auto-pull) and startVMLocked
(boot of multiple seconds). imageOpsMu was held across the whole of
PullImage/RegisterImage/PromoteImage/DeleteImage, so any slow OCI pull,
bundle download, or file copy blocked every other image mutation and
every other VM create that needed to auto-pull. The async create API
bought nothing if all creates serialised on the same mutex.

CreateVM is now three phases:

 1. Validate + resolve image (possibly auto-pulling). No global lock.
 2. reserveVM: take createVMMu only long enough to re-check the name
    is free, allocate the next guest IP, and UpsertVM the "created"
    row. Milliseconds.
 3. startVMLocked: run the full boot flow under the per-VM lock only.

Parallel creates of different VMs now overlap on image resolution +
boot; they contend only across the reservation claim.

For the image surface a new publishImage helper isolates the commit
atom (recheck name free, atomic rename stagingDir→finalDir, UpsertImage)
under imageOpsMu. pullFromBundle + pullFromOCI do their network fetch
+ ext4 build + ownership fixup + agent injection outside the lock;
Register moves validation + kernel resolution outside; Promote moves
file copy + SSH-key seeding outside; Delete keeps a brief lock over
the lookup + reference check + store delete and does file cleanup
unlocked.

Two concurrency tests assert the new behaviour:
 - TestPullImageDoesNotSerialiseOnDifferentNames fails the old code
   (second pull blocks on imageOpsMu and never reaches the body).
 - TestPullImageRejectsNameClashAtPublish confirms the publish-window
   recheck is what enforces name uniqueness now that the body runs
   unlocked — exactly one winner.

ARCHITECTURE.md updated to describe the new scope explicitly instead
of calling the locks "narrow".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 13:44:22 -03:00
e0894376ea
vm create: auto-pull image and kernel from catalogs if missing
One-command sandbox: `banger vm run` on a fresh host now Just Works.
No prior `banger image pull` or `banger kernel pull` needed.

Changes:

- Default `default_image_name` flips from "default" to "debian-bookworm"
  so the golden image is the implicit target when `--image` is omitted.
- `CreateVM` resolves the image via a new `findOrAutoPullImage`: try
  the local store first, and on miss fall back to the embedded imagecat
  catalog + auto-pull. Emits a vm-create progress stage so the user
  sees "pulling from image catalog" in the create output.
- `resolveKernelInputs` gains context + the same pattern via
  `readOrAutoPullKernel`: try the local kernelcat, and on miss look up
  the embedded kernelcat and auto-pull. Fires whenever a bundle's
  manifest references a kernel the user hasn't pulled yet, not just
  during image pull — any CreateVM with an image that needs a kernel
  not yet local will resolve it.
- `--image` help text updated on both `vm run` and `vm create`.

Six tests cover local-hit-no-pull, auto-pull-on-miss, not-in-catalog
error propagation, and a non-ENOENT kernel read error does NOT trigger
a misleading "not in catalog" claim.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 15:10:26 -03:00
59f2766139
Move subsystem state/locks off Daemon into owning types
Daemon no longer owns a coarse mu shared across unrelated concerns.
Each subsystem now carries its own state and lock:

- tapPool: entries, next, and mu move onto a new tapPool struct.
- sessionRegistry: sessionControllers + its mutex move off Daemon.
- opRegistry[T asyncOp]: generic registry collapses the two ad-hoc
  vm-create and image-build operation maps (and their mutexes) into one
  shared type; the Begin/Status/Cancel/Prune methods simplify.
- vmLockSet: the sync.Map of per-VM mutexes moves into its own type;
  lockVMID forwards.
- Daemon.mu splits into imageOpsMu (image-registry mutations) and
  createVMMu (CreateVM serialisation) so image ops and VM creates no
  longer block each other.

Lock ordering collapses to vmLocks[id] -> {createVMMu, imageOpsMu} ->
subsystem-local leaves. doc.go and ARCHITECTURE.md updated.

No behavior change; tests green.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 15:58:33 -03:00
ea0db1e17e
Split internal/daemon vm.go and guest_sessions.go by concern
vm.go (1529 LOC) splits into vm_create, vm_lifecycle, vm_set, vm_stats,
vm_disk, vm_authsync; firecracker/DNS/helpers stay in vm.go.

guest_sessions.go (1266 LOC) splits into session_controller,
session_lifecycle, session_attach, session_stream; scripts and helpers
stay in guest_sessions.go.

Mechanical move only. No behavior change. Adds doc.go and
ARCHITECTURE.md capturing subsystem map and current lock ordering as
the baseline for the upcoming subsystem extraction.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 15:47:08 -03:00