Three concurrency bugs surfaced by `make smoke JOBS=4` that all stem from `vm.create` paths assuming single-caller semantics: 1. **Kernel auto-pull manifest race.** Parallel `vm.create` calls that each need to auto-pull the same kernel ref both run kernelcat.Fetch in parallel against the same /var/lib/banger/kernels/<name>/. Fetch writes manifest.json non-atomically (truncate + write); the peer reads it back mid-write and trips "parse manifest for X: unexpected end of JSON input". Fix: per-name `sync.Mutex` map on `ImageService` (kernelPullLock). `KernelPull` and `readOrAutoPullKernel` both acquire it and re-check `kernelcat.ReadLocal` after the lock so a peer who finished while we waited is treated as success — `readOrAutoPullKernel` does NOT call `s.KernelPull` because that path errors with "already pulled" on a peer-success, which would be wrong for auto-pull. Different kernels stay parallel. 2. **Image auto-pull race.** Same shape as the kernel race but on the image side: parallel `vm.create` calls both run pullFromBundle / pullFromOCI for the missing image (each ~minutes of OCI fetch + ext4 build). The publishImage atom under imageOpsMu only protects the rename + UpsertImage commit, so the loser does all the work only to fail at the recheck with "image already exists". Fix: per-name `sync.Mutex` map on `ImageService` (imagePullLock). `findOrAutoPullImage` acquires it, re-checks FindImage, and only then calls PullImage. Loser short-circuits with the freshly-published image instead of redoing minutes of work. PullImage's own publishImage recheck stays as defense-in-depth for callers that bypass the auto-pull path. 3. **Work-seed refresh race.** When the host's SSH key has rotated since an image was last refreshed, `ensureAuthorizedKeyOnWorkDisk` triggers `refreshManagedWorkSeedFingerprint`, which rewrote the shared work-seed.ext4 in place via e2rm + e2cp. Peer `vm.create` calls doing parallel `MaterializeWorkDisk` rdumps observed a torn ext4 image — "Superblock checksum does not match superblock". Fix: stage the rewrite on a sibling tmpfile (`<seed>.refresh.<pid>-<ns>.tmp`) and atomic-rename. Concurrent readers either have the file open (kernel keeps the pre-rename inode alive) or open after the rename (see the new inode) — never observe a partial state. Two parallel refreshes are idempotent (same daemon, same SSH key) so unique tmp names are enough; whichever rename lands last wins, with identical content. UpsertImage runs after the rename so the recorded fingerprint always matches what's on disk. Plus one smoke harness fix: reclassify `vm_prune` from `pure` to `global`. `vm prune -f` removes ALL stopped VMs system-wide, not just the ones the scenario created — so a parallel peer scenario that happens to have its VM in `created`/`stopped` momentarily gets wiped. Moving prune to the post-pool serial phase keeps it from racing with in-flight scenarios. After all four fixes, `make smoke JOBS=4` passes 21/21 in 174s (serial baseline 141s; the small overhead is the buffered-output and `wait -n` semaphore cost — well worth the parallelism for fast-iter work on a 32-core box). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
88 lines
3.1 KiB
Go
88 lines
3.1 KiB
Go
package daemon
|
|
|
|
import (
|
|
"context"
|
|
"fmt"
|
|
"os"
|
|
"strings"
|
|
"time"
|
|
|
|
"banger/internal/guest"
|
|
"banger/internal/model"
|
|
"banger/internal/system"
|
|
)
|
|
|
|
func (s *ImageService) seedAuthorizedKeyOnExt4Image(ctx context.Context, imagePath string) (string, error) {
|
|
if strings.TrimSpace(s.config.SSHKeyPath) == "" {
|
|
return "", nil
|
|
}
|
|
fingerprint, err := guest.AuthorizedPublicKeyFingerprint(s.config.SSHKeyPath)
|
|
if err != nil {
|
|
return "", fmt.Errorf("derive authorized ssh key fingerprint: %w", err)
|
|
}
|
|
publicKey, err := guest.AuthorizedPublicKey(s.config.SSHKeyPath)
|
|
if err != nil {
|
|
return "", fmt.Errorf("derive authorized ssh key: %w", err)
|
|
}
|
|
if err := provisionAuthorizedKey(ctx, s.runner, imagePath, publicKey); err != nil {
|
|
return "", err
|
|
}
|
|
return fingerprint, nil
|
|
}
|
|
|
|
// refreshManagedWorkSeedFingerprint re-seeds work-seed.ext4 with the
|
|
// daemon's current SSH key when a previously-stored fingerprint has
|
|
// gone stale (host key rotated, image rebuilt without a new seed).
|
|
//
|
|
// This path is reachable from concurrent vm.create RPCs: each one
|
|
// reads the same stale image.SeededSSHPublicKeyFingerprint from the
|
|
// store and races into here. Modifying the seed in place via
|
|
// e2rm/e2cp is not concurrent-read-safe — peer vm.create calls doing
|
|
// `MaterializeWorkDisk` in parallel `RdumpExt4Dir` the seed and
|
|
// observe a torn ext4 image ("Superblock checksum does not match").
|
|
//
|
|
// Fix: stage the rewrite on a sibling tmpfile and atomic-rename. A
|
|
// concurrent reader either has the file open (kernel keeps the
|
|
// pre-rename inode alive) or opens after the rename (sees the new
|
|
// inode) — never observes a partial state. Two concurrent refreshes
|
|
// are idempotent (same daemon, same SSH key) so unique tmp suffixes
|
|
// are enough; whichever rename lands last wins, with identical
|
|
// content. UpsertImage runs after the rename so the recorded
|
|
// fingerprint always matches what's actually on disk for any reader
|
|
// that picks up the image record after this point.
|
|
func (s *ImageService) refreshManagedWorkSeedFingerprint(ctx context.Context, image model.Image, fingerprint string) error {
|
|
if !image.Managed || strings.TrimSpace(image.WorkSeedPath) == "" || strings.TrimSpace(fingerprint) == "" {
|
|
return nil
|
|
}
|
|
|
|
// Unique sibling tmp path: same dir guarantees a same-FS rename.
|
|
// Two concurrent refreshes get distinct paths so they don't clobber
|
|
// each other's tmpfile mid-write.
|
|
tmpPath := fmt.Sprintf("%s.refresh.%d-%d.tmp", image.WorkSeedPath, os.Getpid(), time.Now().UnixNano())
|
|
if err := system.CopyFilePreferClone(image.WorkSeedPath, tmpPath); err != nil {
|
|
return fmt.Errorf("stage seed for refresh: %w", err)
|
|
}
|
|
committed := false
|
|
defer func() {
|
|
if !committed {
|
|
_ = os.Remove(tmpPath)
|
|
}
|
|
}()
|
|
|
|
seededFingerprint, err := s.seedAuthorizedKeyOnExt4Image(ctx, tmpPath)
|
|
if err != nil {
|
|
return err
|
|
}
|
|
if seededFingerprint == "" || seededFingerprint == image.SeededSSHPublicKeyFingerprint {
|
|
return nil
|
|
}
|
|
|
|
if err := os.Rename(tmpPath, image.WorkSeedPath); err != nil {
|
|
return fmt.Errorf("commit seed refresh: %w", err)
|
|
}
|
|
committed = true
|
|
|
|
image.SeededSSHPublicKeyFingerprint = seededFingerprint
|
|
image.UpdatedAt = model.Now()
|
|
return s.store.UpsertImage(ctx, image)
|
|
}
|