Separates what a VM IS (durable intent + identity + deterministic
derived paths — `VMRuntime`) from what is CURRENTLY TRUE about it
(firecracker PID, tap device, loop devices, dm-snapshot target — new
`VMHandles`). The durable state lives in the SQLite `vms` row; the
transient state lives in an in-memory cache on the daemon plus a
per-VM `handles.json` scratch file inside VMDir, rebuilt at startup
from OS inspection. Nothing kernel-level rides the SQLite schema
anymore.
Why:
Persisting ephemeral process handles to SQLite forced reconcile to
treat "running with a stale PID" as a first-class case and mix it
with real state transitions. The schema described what we last
observed, not what the VM is. Every time the observation model
shifted (tap pool, DM naming, pgrep fallback) the reconcile logic
grew a new branch. Splitting lets each layer own what it's good at:
durable records describe intent, in-memory cache + scratch file
describe momentary reality.
Shape:
- `model.VMHandles` = PID, TapDevice, BaseLoop, COWLoop, DMName,
DMDev. Never in SQLite.
- `VMRuntime` keeps: State, GuestIP, APISockPath, VSockPath,
VSockCID, LogPath, MetricsPath, DNSName, VMDir, SystemOverlay,
WorkDiskPath, LastError. All durable or deterministic.
- `handleCache` on `*Daemon` — mutex-guarded map + scratch-file
plumbing (`writeHandlesFile` / `readHandlesFile` /
`rediscoverHandles`). See `internal/daemon/vm_handles.go`.
- `d.vmAlive(vm)` replaces the 20+ inline
  `vm.State==Running && ProcessRunning(vm.Runtime.PID, apiSock)`
  checks scattered across the codebase. Single source of truth for
  liveness.
- Startup reconcile: per running VM, load the scratch file, pgrep
the api sock, either keep (cache seeded from scratch) or demote
to stopped (scratch handles passed to cleanupRuntime first so DM
/ loops / tap actually get torn down).
Verification:
- `go test ./...` green.
- Live: `banger vm run --name handles-test -- cat /etc/hostname`
starts; `handles.json` appears in VMDir with the expected PID,
tap, loops, DM.
- `kill -9 $(pgrep bangerd)` while the VM is running, re-invoke the
CLI, daemon auto-starts, reconcile recognises the VM as alive,
`banger vm ssh` still connects, `banger vm delete` cleans up.
Tests added:
- vm_handles_test.go: scratch-file roundtrip, missing/corrupt file
behaviour, cache concurrency, rediscoverHandles prefers pgrep
over scratch, returns scratch contents even when process is
dead (so cleanup can tear down kernel state).
- vm_test.go: reconcile test rewritten to exercise the new flow
(write scratch → reconcile reads it → verifies process is gone →
issues dmsetup/losetup teardown).
ARCHITECTURE.md updated; `handles` added to Daemon field docs.
213 lines · 7.6 KiB · Go
package daemon

import (
	"context"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"

	"banger/internal/guestconfig"
	"banger/internal/guestnet"
	"banger/internal/model"
	"banger/internal/system"
)

type workDiskPreparation struct {
	ClonedFromSeed bool
}

func (d *Daemon) ensureSystemOverlay(ctx context.Context, vm *model.VMRecord) error {
	if exists(vm.Runtime.SystemOverlay) {
		return nil
	}
	_, err := d.runner.Run(ctx, "truncate", "-s", strconv.FormatInt(vm.Spec.SystemOverlaySizeByte, 10), vm.Runtime.SystemOverlay)
	return err
}

// patchRootOverlay writes the per-VM config files (resolv.conf,
// hostname, hosts, sshd drop-in, network bootstrap, fstab) into the
// rootfs overlay. Reads the DM device path from the handle cache,
// which the start flow populates before calling this.
func (d *Daemon) patchRootOverlay(ctx context.Context, vm model.VMRecord, image model.Image) error {
	dmDev := d.vmHandles(vm.ID).DMDev
	if dmDev == "" {
		return fmt.Errorf("vm %q: DM device not in handle cache — start flow out of order?", vm.ID)
	}
	resolv := []byte(fmt.Sprintf("nameserver %s\n", d.config.DefaultDNS))
	hostname := []byte(vm.Name + "\n")
	hosts := []byte(fmt.Sprintf("127.0.0.1 localhost\n127.0.1.1 %s\n", vm.Name))
	sshdConfig := []byte(sshdGuestConfig())
	fstab, err := system.ReadDebugFSText(ctx, d.runner, dmDev, "/etc/fstab")
	if err != nil {
		fstab = ""
	}
	builder := guestconfig.NewBuilder()
	builder.WriteFile("/etc/resolv.conf", resolv)
	builder.WriteFile("/etc/hostname", hostname)
	builder.WriteFile("/etc/hosts", hosts)
	builder.WriteFile(guestnet.ConfigPath, guestnet.ConfigFile(vm.Runtime.GuestIP, d.config.BridgeIP, d.config.DefaultDNS))
	builder.WriteFile(guestnet.GuestScriptPath, []byte(guestnet.BootstrapScript()))
	builder.WriteFile("/etc/ssh/sshd_config.d/99-banger.conf", sshdConfig)
	builder.DropMountTarget("/home")
	builder.DropMountTarget("/var")
	builder.AddMount(guestconfig.MountSpec{
		Source:  "tmpfs",
		Target:  "/run",
		FSType:  "tmpfs",
		Options: []string{"defaults", "nodev", "nosuid", "mode=0755"},
		Dump:    0,
		Pass:    0,
	})
	builder.AddMount(guestconfig.MountSpec{
		Source:  "tmpfs",
		Target:  "/tmp",
		FSType:  "tmpfs",
		Options: []string{"defaults", "nodev", "nosuid", "mode=1777"},
		Dump:    0,
		Pass:    0,
	})
	d.contributeGuestConfig(builder, vm, image)
	builder.WriteFile("/etc/fstab", []byte(builder.RenderFSTab(fstab)))
	files := builder.Files()
	for _, guestPath := range builder.FilePaths() {
		data := files[guestPath]
		if guestPath == guestnet.GuestScriptPath {
			if err := system.WriteExt4FileMode(ctx, d.runner, dmDev, guestPath, 0o755, data); err != nil {
				return err
			}
			continue
		}
		if err := system.WriteExt4File(ctx, d.runner, dmDev, guestPath, data); err != nil {
			return err
		}
	}
	return nil
}

func (d *Daemon) ensureWorkDisk(ctx context.Context, vm *model.VMRecord, image model.Image) (workDiskPreparation, error) {
	if exists(vm.Runtime.WorkDiskPath) {
		return workDiskPreparation{}, nil
	}
	if exists(image.WorkSeedPath) {
		vmCreateStage(ctx, "prepare_work_disk", "cloning work seed")
		if err := system.CopyFilePreferClone(image.WorkSeedPath, vm.Runtime.WorkDiskPath); err != nil {
			return workDiskPreparation{}, err
		}
		seedInfo, err := os.Stat(image.WorkSeedPath)
		if err != nil {
			return workDiskPreparation{}, err
		}
		if vm.Spec.WorkDiskSizeBytes < seedInfo.Size() {
			return workDiskPreparation{}, fmt.Errorf("requested work disk size %d is smaller than seed image %d", vm.Spec.WorkDiskSizeBytes, seedInfo.Size())
		}
		if vm.Spec.WorkDiskSizeBytes > seedInfo.Size() {
			vmCreateStage(ctx, "prepare_work_disk", "resizing work disk")
			if err := system.ResizeExt4Image(ctx, d.runner, vm.Runtime.WorkDiskPath, vm.Spec.WorkDiskSizeBytes); err != nil {
				return workDiskPreparation{}, err
			}
		}
		return workDiskPreparation{ClonedFromSeed: true}, nil
	}
	vmCreateStage(ctx, "prepare_work_disk", "creating empty work disk")
	if _, err := d.runner.Run(ctx, "truncate", "-s", strconv.FormatInt(vm.Spec.WorkDiskSizeBytes, 10), vm.Runtime.WorkDiskPath); err != nil {
		return workDiskPreparation{}, err
	}
	if _, err := d.runner.Run(ctx, "mkfs.ext4", "-F", vm.Runtime.WorkDiskPath); err != nil {
		return workDiskPreparation{}, err
	}
	dmDev := d.vmHandles(vm.ID).DMDev
	if dmDev == "" {
		return workDiskPreparation{}, fmt.Errorf("vm %q: DM device not in handle cache", vm.ID)
	}
	rootMount, cleanupRoot, err := system.MountTempDir(ctx, d.runner, dmDev, true)
	if err != nil {
		return workDiskPreparation{}, err
	}
	defer cleanupRoot()
	workMount, cleanupWork, err := system.MountTempDir(ctx, d.runner, vm.Runtime.WorkDiskPath, false)
	if err != nil {
		return workDiskPreparation{}, err
	}
	defer cleanupWork()
	vmCreateStage(ctx, "prepare_work_disk", "copying /root into work disk")
	if err := system.CopyDirContents(ctx, d.runner, filepath.Join(rootMount, "root"), workMount, true); err != nil {
		return workDiskPreparation{}, err
	}
	if err := d.flattenNestedWorkHome(ctx, workMount); err != nil {
		return workDiskPreparation{}, err
	}
	return workDiskPreparation{}, nil
}

// sshdGuestConfig is the banger-authored drop-in that lands at
// /etc/ssh/sshd_config.d/99-banger.conf inside every guest.
//
// Banger VMs are single-user root sandboxes reachable only through the
// host bridge (default 172.16.0.0/24). The drop-in sets the minimum
// needed to make that usable while keeping the posture tight enough
// that a misconfigured host bridge does not immediately hand over an
// unauthenticated root shell.
//
// Why each line is here:
//
//   - PermitRootLogin prohibit-password
//     The guest IS root — there's no other account. prohibit-password
//     allows pubkey login and blocks password auth at the source even
//     if some future config flips PasswordAuthentication on.
//
//   - PubkeyAuthentication yes
//     The only auth method we expect. Explicit in case a future
//     Debian default or distro package flips it off.
//
//   - PasswordAuthentication no
//
//   - KbdInteractiveAuthentication no
//     Belt-and-braces: every interactive auth path is off, not just
//     the PermitRootLogin path. These are already Debian defaults but
//     stating them here means the drop-in documents the intent.
//
//   - AuthorizedKeysFile /root/.ssh/authorized_keys
//     Pins the lookup path so the banger-written file always wins,
//     regardless of distro default ($HOME/.ssh/authorized_keys) and
//     regardless of any per-image weirdness.
//
// Previously this file also contained `LogLevel DEBUG3` and
// `StrictModes no`. DEBUG3 was a leftover from debugging the
// first-boot flow and flooded journald in normal use. StrictModes no
// was a workaround for perm drift on /root inside the work disk; the
// real fix — normalising /root permissions at provisioning time — is
// in ensureAuthorizedKeyOnWorkDisk / seedAuthorizedKeyOnExt4Image.
func sshdGuestConfig() string {
	return strings.Join([]string{
		"PermitRootLogin prohibit-password",
		"PubkeyAuthentication yes",
		"PasswordAuthentication no",
		"KbdInteractiveAuthentication no",
		"AuthorizedKeysFile /root/.ssh/authorized_keys",
		"",
	}, "\n")
}

func (d *Daemon) flattenNestedWorkHome(ctx context.Context, workMount string) error {
	nestedHome := filepath.Join(workMount, "root")
	if !exists(nestedHome) {
		return nil
	}
	if _, err := d.runner.RunSudo(ctx, "chmod", "755", nestedHome); err != nil {
		return err
	}
	entries, err := os.ReadDir(nestedHome)
	if err != nil {
		return err
	}
	for _, entry := range entries {
		sourcePath := filepath.Join(nestedHome, entry.Name())
		if _, err := d.runner.RunSudo(ctx, "cp", "-a", sourcePath, workMount+"/"); err != nil {
			return err
		}
	}
	_, err = d.runner.RunSudo(ctx, "rm", "-rf", nestedHome)
	return err
}