vm state: split transient kernel/process handles off the durable schema
Separates what a VM IS (durable intent + identity + deterministic
derived paths — `VMRuntime`) from what is CURRENTLY TRUE about it
(firecracker PID, tap device, loop devices, dm-snapshot target — new
`VMHandles`). The durable state lives in the SQLite `vms` row; the
transient state lives in an in-memory cache on the daemon plus a
per-VM `handles.json` scratch file inside VMDir, rebuilt at startup
from OS inspection. Nothing kernel-level rides the SQLite schema
anymore.
Why:
Persisting ephemeral process handles to SQLite forced reconcile to
treat "running with a stale PID" as a first-class case and mix it
with real state transitions. The schema described what we last
observed, not what the VM is. Every time the observation model
shifted (tap pool, DM naming, pgrep fallback) the reconcile logic
grew a new branch. Splitting lets each layer own what it's good at:
durable records describe intent, in-memory cache + scratch file
describe momentary reality.
Shape:
- `model.VMHandles` = PID, TapDevice, BaseLoop, COWLoop, DMName,
DMDev. Never in SQLite.
- `VMRuntime` keeps: State, GuestIP, APISockPath, VSockPath,
VSockCID, LogPath, MetricsPath, DNSName, VMDir, SystemOverlay,
WorkDiskPath, LastError. All durable or deterministic.
- `handleCache` on `*Daemon` — mutex-guarded map + scratch-file
plumbing (`writeHandlesFile` / `readHandlesFile` /
`rediscoverHandles`). See `internal/daemon/vm_handles.go`.
- `d.vmAlive(vm)` replaces the 20+ inline
  `vm.State==Running && ProcessRunning(vm.Runtime.PID, apiSock)`
  checks scattered across the daemon. Single source of truth for
  liveness.
- Startup reconcile: per running VM, load the scratch file, pgrep
the api sock, either keep (cache seeded from scratch) or demote
to stopped (scratch handles passed to cleanupRuntime first so DM
/ loops / tap actually get torn down).
Verification:
- `go test ./...` green.
- Live: `banger vm run --name handles-test -- cat /etc/hostname`
starts; `handles.json` appears in VMDir with the expected PID,
tap, loops, DM.
- `kill -9 $(pgrep bangerd)` while the VM is running, re-invoke the
CLI, daemon auto-starts, reconcile recognises the VM as alive,
`banger vm ssh` still connects, `banger vm delete` cleans up.
Tests added:
- vm_handles_test.go: scratch-file roundtrip, missing/corrupt file
behaviour, cache concurrency, rediscoverHandles prefers pgrep
over scratch, returns scratch contents even when process is
dead (so cleanup can tear down kernel state).
- vm_test.go: reconcile test rewritten to exercise the new flow
(write scratch → reconcile reads it → verifies process is gone →
issues dmsetup/losetup teardown).
ARCHITECTURE.md updated; `handles` added to Daemon field docs.
This commit is contained in:
parent 2e6e64bc04
commit 687fcf0b59
27 changed files with 688 additions and 152 deletions
```diff
@@ -107,11 +107,22 @@ type VMSpec struct {
 	NATEnabled bool `json:"nat_enabled"`
 }
 
+// VMRuntime holds the durable runtime state that the daemon needs
+// to reach a VM: identity, declared state, and deterministic derived
+// paths. Transient kernel/process handles (PID, tap, loop devices,
+// dm-snapshot names) live on VMHandles, NOT here — the daemon keeps
+// them in an in-memory cache backed by a per-VM handles.json scratch
+// file, so a daemon restart rebuilds them from OS state rather than
+// trusting whatever was last written into a SQLite column.
+//
+// Everything in VMRuntime is safe to persist: the paths are
+// deterministic from (VM ID, layout) and survive restart unchanged;
+// GuestIP and DNSName are assigned at create time and never move;
+// LastError carries the last failure message for debugging. State
+// mirrors VMRecord.State.
 type VMRuntime struct {
 	State VMState `json:"state"`
-	PID int `json:"pid,omitempty"`
 	GuestIP string `json:"guest_ip"`
-	TapDevice string `json:"tap_device,omitempty"`
 	APISockPath string `json:"api_sock_path,omitempty"`
 	VSockPath string `json:"vsock_path,omitempty"`
 	VSockCID uint32 `json:"vsock_cid,omitempty"`
@@ -121,10 +132,6 @@ type VMRuntime struct {
 	VMDir string `json:"vm_dir"`
 	SystemOverlay string `json:"system_overlay_path"`
 	WorkDiskPath string `json:"work_disk_path"`
-	BaseLoop string `json:"base_loop,omitempty"`
-	COWLoop string `json:"cow_loop,omitempty"`
-	DMName string `json:"dm_name,omitempty"`
-	DMDev string `json:"dm_dev,omitempty"`
 	LastError string `json:"last_error,omitempty"`
 }
 
```
internal/model/vm_handles.go (new file, 51 lines):

```go
package model

// VMHandles captures the transient, per-boot kernel/process handles
// that banger obtains while starting a VM and releases when stopping
// it. Unlike VMRuntime (durable spec + identity + derived paths),
// nothing in VMHandles survives a daemon restart in authoritative
// form: each value is either rediscovered from the OS (PID from the
// firecracker api socket, DM name deterministically from the VM ID)
// or read from a per-VM scratch file that the daemon rebuilds at
// every start.
//
// The daemon keeps an in-memory cache keyed by VM ID. Lifecycle
// transitions update the cache and a small `handles.json` scratch
// file in the VM's state directory; daemon startup reconciles
// by loading that file and verifying each handle against the live
// OS state. If anything is stale the VM is marked stopped and the
// cache entry is dropped.
//
// VMHandles never appears in the `vms` SQLite rows. Keeping it off
// the durable schema was the whole point of the split — persistent
// records describe what a VM SHOULD be; handles describe what is
// currently true about it.
type VMHandles struct {
	// PID is the firecracker process PID. Zero means "not running
	// (from our perspective)". Always verifiable via
	// /proc/<pid>/cmdline matching the api socket path.
	PID int `json:"pid,omitempty"`

	// TapDevice is the kernel tap interface name (e.g. "tap-fc-0001")
	// bound to the VM's virtio-net. Released on stop.
	TapDevice string `json:"tap_device,omitempty"`

	// BaseLoop and COWLoop are the two loop devices backing the
	// dm-snapshot layer (read-only base = rootfs; read-write overlay
	// = per-VM COW file). Released via losetup -d on stop.
	BaseLoop string `json:"base_loop,omitempty"`
	COWLoop  string `json:"cow_loop,omitempty"`

	// DMName is the device-mapper target name; deterministic from the
	// VM ID (see dmsnap.SnapshotName). DMDev is the corresponding
	// /dev/mapper/<name> path. Torn down by `dmsetup remove` on stop.
	DMName string `json:"dm_name,omitempty"`
	DMDev  string `json:"dm_dev,omitempty"`
}

// IsZero reports whether every handle field is unset. Useful as a
// cheap "this VM has no kernel/process resources held on our behalf"
// check.
func (h VMHandles) IsZero() bool {
	return h.PID == 0 && h.TapDevice == "" && h.BaseLoop == "" && h.COWLoop == "" && h.DMName == "" && h.DMDev == ""
}
```