workspace prepare: release VM mutex before guest I/O

Previously withVMLockByRef held the per-VM mutex across InspectRepo,
waitForGuestSSH, dialGuest, ImportRepoToGuest (the tar stream!), and
the readonly chmod. A large repo could block `vm stop` / `vm delete`
/ `vm restart` on the same VM for however long the import took.

Split into two phases:

  1. VM mutex held briefly to validate state (running + PID alive)
     and snapshot the fields needed for SSH (guest IP, api sock).
  2. VM mutex released. Acquire workspaceLocks[id] — a separate
     per-VM mutex scoped to workspace.prepare / workspace.export —
     for the guest I/O phase.

Lifecycle ops (stop/delete/restart/set) only take vmLocks, so they
no longer queue behind a slow import. Two concurrent prepares on the
same VM still serialise via workspaceLocks so tar streams don't
interleave. ExportVMWorkspace also acquires workspaceLocks to avoid
snapshotting a half-streamed import.
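
The two-phase shape described above can be sketched roughly as follows. This is a minimal illustration, not the daemon's actual code: lockSet stands in for vmLockSet (whose definition is not shown here), and vmRecord, guestTarget, daemon, and prepareWorkspace are hypothetical names chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// lockSet hands out one mutex per VM ID. Hypothetical stand-in for vmLockSet.
type lockSet struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func (s *lockSet) forID(id string) *sync.Mutex {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.locks == nil {
		s.locks = make(map[string]*sync.Mutex)
	}
	l, ok := s.locks[id]
	if !ok {
		l = &sync.Mutex{}
		s.locks[id] = l
	}
	return l
}

// vmRecord and guestTarget are illustrative; the field names are assumptions.
type vmRecord struct {
	Running bool
	GuestIP string
}

type guestTarget struct{ GuestIP string }

type daemon struct {
	vmLocks        lockSet
	workspaceLocks lockSet
	vms            map[string]vmRecord
}

// prepareWorkspace validates and snapshots under the VM lock (phase 1),
// releases it, then runs guestIO under the workspace lock (phase 2).
func (d *daemon) prepareWorkspace(id string, guestIO func(guestTarget) error) error {
	vmMu := d.vmLocks.forID(id)
	vmMu.Lock()
	vm, ok := d.vms[id]
	vmMu.Unlock() // released before any slow guest I/O starts
	if !ok || !vm.Running {
		return errors.New("vm not running")
	}
	target := guestTarget{GuestIP: vm.GuestIP}

	wsMu := d.workspaceLocks.forID(id)
	wsMu.Lock()
	defer wsMu.Unlock()
	return guestIO(target)
}

func main() {
	d := &daemon{vms: map[string]vmRecord{"vm1": {Running: true, GuestIP: "10.0.0.2"}}}
	err := d.prepareWorkspace("vm1", func(t guestTarget) error {
		// During guest I/O a lifecycle op could still take the VM lock.
		vmMu := d.vmLocks.forID("vm1")
		free := vmMu.TryLock()
		if free {
			vmMu.Unlock()
		}
		fmt.Println("vm lock free during guest I/O:", free)
		return nil
	})
	fmt.Println("err:", err)
}
```

Working from a snapshot in phase 2 means the guest I/O never needs the VM lock again, so the two locks are never held at the same time in this sketch.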

Two regression tests (sequential — they swap package-level seams):

  ReleasesVMLockDuringGuestIO: stall the import fake, assert the VM
  mutex is acquirable from another goroutine during the stall.

  SerialisesConcurrentPreparesOnSameVM: 3 concurrent prepares, assert
  Import is only ever invoked 1-at-a-time per VM.

ARCHITECTURE.md documents the split and the updated lock ordering.
Thales Maciel 2026-04-19 13:32:42 -03:00
parent 99de42385f
commit 6cd52d12f4
GPG key ID: 33112E6833C34679
4 changed files with 265 additions and 22 deletions


@@ -30,15 +30,21 @@ import (
 )

 type Daemon struct {
-	layout paths.Layout
-	config model.DaemonConfig
-	store *store.Store
-	runner system.CommandRunner
-	logger *slog.Logger
-	imageOpsMu sync.Mutex
-	createVMMu sync.Mutex
-	createOps opstate.Registry[*vmCreateOperationState]
-	vmLocks vmLockSet
+	layout paths.Layout
+	config model.DaemonConfig
+	store *store.Store
+	runner system.CommandRunner
+	logger *slog.Logger
+	imageOpsMu sync.Mutex
+	createVMMu sync.Mutex
+	createOps opstate.Registry[*vmCreateOperationState]
+	vmLocks vmLockSet
+	// workspaceLocks serialises workspace.prepare / workspace.export
+	// calls on the same VM (two concurrent prepares would clobber each
+	// other's tar streams). It is a SEPARATE scope from vmLocks so
+	// slow guest I/O — SSH dial, tar upload, chmod — does not block
+	// vm stop/delete/restart. See ARCHITECTURE.md.
+	workspaceLocks vmLockSet
 	sessions sessionRegistry
 	tapPool tapPool
 	closing chan struct{}