A VM name flows into five places that all have narrower grammars
than "arbitrary string":
- the guest's /etc/hostname (vm_disk.patchRootOverlay)
- the guest's /etc/hosts (same)
- the <name>.vm DNS record (vmdns.RecordName)
- the kernel command line (system.BuildBootArgs*)
- VM-dir file-path fragments (layout.VMsDir/<id>, etc.)
Nothing in the chain was validating the input. A name with
whitespace, newline, dot, slash, colon, or = would produce broken
hostnames, weird DNS labels, smuggled kernel cmdline tokens, or
(in the worst case) surprising traversal through the on-disk
layout. Not host shell injection — we already avoid shelling out
with the raw name — but a real correctness and supportability bug.
New: model.ValidateVMName. Rules:
- 1..63 chars (DNS label max per RFC 1123; also a comfortable
/etc/hostname cap)
- lowercase ASCII letters, digits, '-' only
- no leading or trailing '-'
- no normalization — the name is the user-visible identifier
(store key, `ssh <name>.vm`, `vm show`); silently rewriting
"MyVM" → "myvm" would hand the user back something different
than they typed
Called from two places:
- internal/cli/commands_vm.go vmCreateParamsFromFlags — rejects
bad `--name` values before any RPC. Empty name still passes
through so the daemon can generate one.
- internal/daemon/vm_create.go reserveVM — defense in depth for
any non-CLI RPC caller (SDK, direct JSON over the socket).
Tests:
- internal/model/vm_name_test.go — exhaustive character-class
matrix (space, newline, tab, dot, slash, colon, equals, quote,
control chars, unicode letters, uppercase, leading/trailing
hyphen, over-length, max-length-exact, digits-only).
- internal/cli TestVMCreateParamsFromFlagsRejectsInvalidName —
CLI wire-through + empty-name passthrough.
- internal/daemon TestReserveVMRejectsInvalidName — daemon
defense-in-depth (including `box/../evil` path-traversal).
- scripts/smoke.sh — end-to-end rejection + no-leaked-row
assertion.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 4 of the daemon god-struct refactor. VM lifecycle, create-op
registry, handle cache, disk provisioning, stats polling, ports
query, and the per-VM lock set all move off *Daemon onto *VMService.
Daemon keeps thin forwarders only for FindVM / TouchVM (dispatch
surface) and is otherwise out of VM lifecycle. Lazy-init via
d.vmSvc() mirrors the earlier services so test literals like
\`&Daemon{store: db, runner: r}\` still get a functional service
without spelling one out.
Three small cleanups along the way:
* preflight helpers (validateStartPrereqs / addBaseStartPrereqs
/ addBaseStartCommandPrereqs / validateWorkDiskResizePrereqs)
move with the VM methods that call them.
* cleanupRuntime / rebuildDNS move to *VMService, with
HostNetwork primitives (findFirecrackerPID, cleanupDMSnapshot,
killVMProcess, releaseTap, waitForExit, sendCtrlAltDel)
reached through s.net instead of the hostNet() facade.
* vsockAgentBinary becomes a package-level function so both
*Daemon (doctor) and *VMService (preflight) call one entry
point instead of each owning a forwarder method.
WorkspaceService's peer deps switch from eager method values to
closures — vmSvc() constructs VMService with WorkspaceService as a
peer, so resolving d.vmSvc().FindVM at construction time recursed
through workspaceSvc() → vmSvc(). Closures defer the lookup to call
time.
Pure code motion: build + unit tests green, lint clean. No RPC
surface or lock-ordering changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Second phase of splitting the daemon god-struct. ImageService now owns
all image + kernel registry operations: register/promote/delete/pull
for images (bundle + OCI paths), the six kernel commands, and the
shared SSH-key/work-seed injection helpers. imageOpsMu (the
publication-window lock) lives on the service; so do the three OCI
pull test seams pullAndFlatten / finalizePulledRootfs / bundleFetch.
The four files images.go, images_pull.go, image_seed.go, kernels.go
flipped their receivers from *Daemon to *ImageService.
FindImage moved with the service. Daemon keeps a thin FindImage
forwarder so callers reading the dispatch code see the obvious
facade and tests that pre-date the split still compile.
flattenNestedWorkHome — called from image_seed.go, vm_authsync.go,
and vm_disk.go across future service boundaries — became a
package-level helper taking a CommandRunner explicitly. Daemon keeps
a deprecated forwarder for now; the other services will use the
package form.
Lazy-init helper imageSvc() on Daemon mirrors hostNet() from
Phase 1, so test literals like &Daemon{store: db, runner: r, ...}
that don't spell out an ImageService still get a working one.
Tests that override the image test seams (autopull_test,
concurrency_test, images_pull_test, images_pull_bundle_test) now
assign d.img = &ImageService{...seams...}; the two-statement pattern
matches what Phase 1 established for HostNetwork.
Dispatch in daemon.go is cleaner now: every image/kernel RPC handler
is a single-liner forwarding to d.imageSvc().*. Phase 5 will do the
same for VM lifecycle.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
reserveVM's duplicate-name guard routed through Daemon.FindVM, which
falls back to prefix-matching on both ids and names when no exact
match is found. That turns the uniqueness check into a correctness
bug: a brand-new VM name can be rejected because it happens to
prefix an existing VM's id, or an existing VM's name. So `vm create
--name beta` fails when `beta-sandbox` already exists.
Swap in a dedicated store.GetVMByName that does a literal `WHERE
name = ?` lookup, and use it from reserveVM. FindVM keeps its
prefix-matching behaviour for user-facing lookup paths (`vm ssh
<partial>`, `vm stop <partial>`) where "did you mean" semantics
are the feature.
Tests:
- TestReserveVMAllowsNameThatPrefixesExistingVM — seeds a VM whose
id + name both start with "longname", then reserves two new VMs
named "longname" and "longname-sandbox". Both must succeed.
Under the old FindVM-based check, both would fail.
- TestReserveVMRejectsExactDuplicateName — actual collisions are
still rejected after the swap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before: createVMMu was held across the whole of CreateVM — including
image resolution (which could fire a full auto-pull) and startVMLocked
(boot of multiple seconds). imageOpsMu was held across the whole of
PullImage/RegisterImage/PromoteImage/DeleteImage, so any slow OCI pull,
bundle download, or file copy blocked every other image mutation and
every other VM create that needed to auto-pull. The async create API
bought nothing if all creates serialised on the same mutex.
CreateVM is now three phases:
1. Validate + resolve image (possibly auto-pulling). No global lock.
2. reserveVM: take createVMMu only long enough to re-check the name
is free, allocate the next guest IP, and UpsertVM the "created"
row. Milliseconds.
3. startVMLocked: run the full boot flow under the per-VM lock only.
Parallel creates of different VMs now overlap on image resolution +
boot; they contend only across the reservation claim.
For the image surface a new publishImage helper isolates the commit
atom (recheck name free, atomic rename stagingDir→finalDir, UpsertImage)
under imageOpsMu. pullFromBundle + pullFromOCI do their network fetch
+ ext4 build + ownership fixup + agent injection outside the lock;
Register moves validation + kernel resolution outside; Promote moves
file copy + SSH-key seeding outside; Delete keeps a brief lock over
the lookup + reference check + store delete and does file cleanup
unlocked.
Two concurrency tests assert the new behaviour:
- TestPullImageDoesNotSerialiseOnDifferentNames fails the old code
(second pull blocks on imageOpsMu and never reaches the body).
- TestPullImageRejectsNameClashAtPublish confirms the publish-window
recheck is what enforces name uniqueness now that the body runs
unlocked — exactly one winner.
ARCHITECTURE.md updated to describe the new scope explicitly instead
of calling the locks "narrow".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
One-command sandbox: `banger vm run` on a fresh host now Just Works.
No prior `banger image pull` or `banger kernel pull` needed.
Changes:
- Default `default_image_name` flips from "default" to "debian-bookworm"
so the golden image is the implicit target when `--image` is omitted.
- `CreateVM` resolves the image via a new `findOrAutoPullImage`: try
the local store first, and on miss fall back to the embedded imagecat
catalog + auto-pull. Emits a vm-create progress stage so the user
sees "pulling from image catalog" in the create output.
- `resolveKernelInputs` gains context + the same pattern via
`readOrAutoPullKernel`: try the local kernelcat, and on miss look up
the embedded kernelcat and auto-pull. Fires whenever a bundle's
manifest references a kernel the user hasn't pulled yet, not just
during image pull — any CreateVM with an image that needs a kernel
not yet local will resolve it.
- `--image` help text updated on both `vm run` and `vm create`.
Six tests cover local-hit-no-pull, auto-pull-on-miss, not-in-catalog
error propagation, and a non-ENOENT kernel read error does NOT trigger
a misleading "not in catalog" claim.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Daemon no longer owns a coarse mu shared across unrelated concerns.
Each subsystem now carries its own state and lock:
- tapPool: entries, next, and mu move onto a new tapPool struct.
- sessionRegistry: sessionControllers + its mutex move off Daemon.
- opRegistry[T asyncOp]: generic registry collapses the two ad-hoc
vm-create and image-build operation maps (and their mutexes) into one
shared type; the Begin/Status/Cancel/Prune methods simplify.
- vmLockSet: the sync.Map of per-VM mutexes moves into its own type;
lockVMID forwards.
- Daemon.mu splits into imageOpsMu (image-registry mutations) and
createVMMu (CreateVM serialisation) so image ops and VM creates no
longer block each other.
Lock ordering collapses to vmLocks[id] -> {createVMMu, imageOpsMu} ->
subsystem-local leaves. doc.go and ARCHITECTURE.md updated.
No behavior change; tests green.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
vm.go (1529 LOC) splits into vm_create, vm_lifecycle, vm_set, vm_stats,
vm_disk, vm_authsync; firecracker/DNS/helpers stay in vm.go.
guest_sessions.go (1266 LOC) splits into session_controller,
session_lifecycle, session_attach, session_stream; scripts and helpers
stay in guest_sessions.go.
Mechanical move only. No behavior change. Adds doc.go and
ARCHITECTURE.md capturing subsystem map and current lock ordering as
the baseline for the upcoming subsystem extraction.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>