Preserve cleanup after daemon restarts and harden OCI and tar imports
against filenames that debugfs cannot encode safely.
Mirror tap, loop, and dm teardown identity onto VM.Runtime, teach
cleanup and reconcile to fall back to those persisted fields when
handles.json is missing or corrupt, and clear the recovery state on
stop, error, and delete paths.
Reject debugfs-hostile entry names during flattening and in
ApplyOwnership itself, then add regression coverage for corrupt
handles.json recovery and unsafe import paths.
Verified with targeted go tests, make lint-go, make lint-shell, and
make build.
Closes commit 3 of the god-service decomposition. VMService still
owned 45+ methods after the startVMLocked extraction and RPC table
landed in commits 1 and 2. Stats / ports / health / vsock-ping sit
in a corner of that surface that doesn't share any state with
lifecycle orchestration — nothing about "what's this VM's CPU
doing" belongs in the same service as Create/Start/Stop/Delete/Set.
New StatsService owns:
- GetVMStats / getVMStatsLocked / collectStats (stats collection)
- HealthVM / PingVM (vsock-agent health probe)
- PortsVM + buildVMPorts + probeWebListener + probeHTTPScheme +
dedupeVMPorts (listening-port enumeration)
- pollStats (background ticker refresh)
- stopStaleVMs (auto-stop sweep past config.AutoStopStaleAfter)
The three VMService touch-points stats genuinely needs — vmAlive,
vmHandles, the per-VM lock helpers, plus cleanupRuntime for the
stale-sweep tear-down — come in as function-typed closures, not a
*VMService pointer. StatsService has no back-reference to its
sibling. Mirrors the dependency-struct pattern WorkspaceService
already uses.
Wiring: d.stats is populated in wireServices AFTER d.vm (closures
must see a non-nil d.vm at call time). Dispatch table's four
entries (vm.stats / vm.health / vm.ping / vm.ports) now resolve
through d.stats. Background loop's pollStats / stopStaleVMs
tickers do the same. Dispatch surface from the RPC client's
perspective is byte-identical.
After this commit:
- vm_stats.go and ports.go are deleted; their content (plus the
stats-specific fields) lives in stats_service.go.
- VMService loses 12 methods. It's still the biggest service
(~30 methods, all lifecycle-supporting: handle cache, disk
provisioning, preflight, create-ops registry, lock helpers,
the lifecycle verbs themselves) but it's finally one coherent
concern instead of five.
Tests:
- TestWireServicesInstantiatesStatsService — pins that the
wiring order puts d.stats non-nil + its five closures all
populated. Prevents a silent background-loop regression.
- All existing tests that called d.vm.HealthVM / d.vm.PingVM /
d.vm.PortsVM / d.vm.collectStats were re-pointed at d.stats.
Smoke: all 21 scenarios green, including vm ports (exercises the
new PortsVM entry end-to-end) and the long-running workspace
scenarios (exercise the background stats poller implicitly).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>