daemon: split owner daemon from root helper

Move the supported systemd path to two services: an owner-user bangerd for
orchestration and a narrow root helper for bridge/tap, NAT/resolver, dm/loop,
and Firecracker ownership. This removes repeated sudo from daily vm and image
flows without leaving the general daemon running as root.

Add install metadata, system install/status/restart/uninstall commands, and a
system-owned runtime layout. Keep user SSH/config material in the owner home,
lock file_sync to the owner home, and move daemon known_hosts handling out of
the old root-owned control path.

Route privileged lifecycle steps through typed privilegedOps calls, harden the
two systemd units, and rewrite the smoke suite and docs around the supported
service model.

Verified with make build, make test, make lint, and make smoke on the
supported systemd host path.
Thales Maciel 2026-04-26 12:43:17 -03:00
parent 3edd7c6de7
commit 59e48e830b
No known key found for this signature in database
GPG key ID: 33112E6833C34679
53 changed files with 3239 additions and 726 deletions


@ -2,16 +2,34 @@
This document describes the current daemon package layout: the `Daemon`
composition root, the four services it wires together, the subpackages
that own stateless helpers, and the lock ordering every caller must
that own stateless helpers, the privileged-ops seam used by the
supported system install, and the lock ordering every caller must
respect.
## Supported service topology
On the supported host path (`banger system install` on a `systemd`
host), banger runs as two cooperating services:
- `bangerd.service` runs as the configured owner user. It owns the
public RPC socket, store, image state, workspace prep, and the
lifecycle state machine.
- `bangerd-root.service` runs as root. It owns only the privileged
host-kernel operations: bridge/tap, NAT/resolver routing, dm/loop
snapshot plumbing, privileged ext4 mutation on dm devices, and
firecracker process/socket ownership.
The owner daemon talks to the root helper through the `privilegedOps`
seam. Non-system/dev paths still use the same seam, but it is backed
by an in-process adapter instead of the helper RPC client.
## Composition
`Daemon` is a thin composition root. It holds shared infrastructure
(store, runner, logger, layout, config, listener) plus pointers to
four focused services. RPC dispatch is a pure forwarder into those
services; no lifecycle / image / workspace / networking behaviour
lives on `*Daemon` itself.
(store, runner, logger, layout, config, listener, privileged-ops
adapter) plus pointers to four focused services. RPC dispatch is a
pure forwarder into those services; no lifecycle / image / workspace /
networking behaviour lives on `*Daemon` itself.
```
Daemon
@ -62,6 +80,9 @@ idempotent and skips anything already set.
- `tapPool` — TAP interface pool, owns its own lock.
- `vmDNS *vmdns.Server` — in-process DNS server for `.vm` names.
- `privilegedOps` — the host-kernel seam used for bridge/tap/NAT,
resolver routing, dm snapshots, privileged ext4 mutation, and
firecracker ownership/kill flows.
- No direct VM-state access. Where an operation needs a VM's tap name
(e.g. `ensureNAT`), the signature takes `guestIP` + `tap` string so
the caller (VMService) resolves them first.
@ -176,13 +197,17 @@ Notes:
rehydrates the handle cache, reaps stale VMs, and republishes DNS
records. `Daemon.backgroundLoop()` is the ticker fan-out —
`VMService.pollStats`, `VMService.stopStaleVMs`, and
`VMService.pruneVMCreateOperations` run on independent tickers.
`VMService.pruneVMCreateOperations` run on independent tickers. On the
supported system path, any reconcile-time host cleanup that needs
privilege is delegated through `privilegedOps` rather than executed
directly by the owner daemon process.
## External API
Only `internal/cli` imports this package. The surface is:
- `daemon.Open(ctx) (*Daemon, error)`
- `daemon.OpenSystem(ctx) (*Daemon, error)`
- `(*Daemon).Serve(ctx) error`
- `(*Daemon).Close() error`
- `daemon.Doctor(...)` — host diagnostics (no receiver).


@ -14,8 +14,10 @@ import (
"banger/internal/config"
ws "banger/internal/daemon/workspace"
"banger/internal/installmeta"
"banger/internal/model"
"banger/internal/paths"
"banger/internal/roothelper"
"banger/internal/rpc"
"banger/internal/store"
"banger/internal/system"
@ -28,11 +30,13 @@ import (
// loop forwards RPCs to them. No lifecycle / image / workspace /
// networking behavior lives on *Daemon itself — it's wiring.
type Daemon struct {
layout paths.Layout
config model.DaemonConfig
store *store.Store
runner system.CommandRunner
logger *slog.Logger
layout paths.Layout
userLayout paths.Layout
config model.DaemonConfig
store *store.Store
runner system.CommandRunner
logger *slog.Logger
priv privilegedOps
net *HostNetwork
img *ImageService
@ -48,6 +52,8 @@ type Daemon struct {
requestHandler func(context.Context, rpc.Request) rpc.Response
guestWaitForSSH func(context.Context, string, string, time.Duration) error
guestDial func(context.Context, string, string) (guestSSHClient, error)
clientUID int
clientGID int
}
func Open(ctx context.Context) (d *Daemon, err error) {
@ -62,6 +68,31 @@ func Open(ctx context.Context) (d *Daemon, err error) {
if err != nil {
return nil, err
}
return openWithConfig(ctx, layout, layout, cfg, os.Getuid(), os.Getgid(), true, nil)
}
func OpenSystem(ctx context.Context) (*Daemon, error) {
meta, err := installmeta.Load(installmeta.DefaultPath)
if err != nil {
return nil, err
}
layout := paths.ResolveSystem()
if err := paths.EnsureSystemOwned(layout); err != nil {
return nil, err
}
ownerLayout, err := paths.ResolveUserForHome(meta.OwnerHome)
if err != nil {
return nil, err
}
cfg, err := config.LoadDaemon(ownerLayout, meta.OwnerHome)
if err != nil {
return nil, err
}
helper := newHelperPrivilegedOps(roothelper.NewClient(installmeta.DefaultRootHelperSocketPath), cfg, layout)
return openWithConfig(ctx, layout, ownerLayout, cfg, -1, -1, false, helper)
}
func openWithConfig(ctx context.Context, layout, userLayout paths.Layout, cfg model.DaemonConfig, clientUID, clientGID int, syncSSHConfig bool, priv privilegedOps) (d *Daemon, err error) {
logger, normalizedLevel, err := newDaemonLogger(os.Stderr, cfg.LogLevel)
if err != nil {
return nil, err
@ -74,13 +105,17 @@ func Open(ctx context.Context) (d *Daemon, err error) {
closing := make(chan struct{})
runner := system.NewRunner()
d = &Daemon{
layout: layout,
config: cfg,
store: db,
runner: runner,
logger: logger,
closing: closing,
pid: os.Getpid(),
layout: layout,
userLayout: userLayout,
config: cfg,
store: db,
runner: runner,
logger: logger,
closing: closing,
pid: os.Getpid(),
clientUID: clientUID,
clientGID: clientGID,
priv: priv,
}
wireServices(d)
// From here on, every failure path must run Close() so the host
@ -95,7 +130,9 @@ func Open(ctx context.Context) (d *Daemon, err error) {
}
}()
d.ensureVMSSHClientConfig()
if syncSSHConfig {
d.ensureVMSSHClientConfig()
}
d.logger.Info("daemon opened", "socket", layout.SocketPath, "state_dir", layout.StateDir, "log_level", cfg.LogLevel)
if err = d.net.startVMDNS(vmdns.DefaultListenAddr); err != nil {
d.logger.Error("daemon open failed", "stage", "start_vm_dns", "error", err.Error())
@ -157,9 +194,28 @@ func (d *Daemon) Serve(ctx context.Context) error {
d.listener = listener
defer listener.Close()
defer os.Remove(d.layout.SocketPath)
serveDone := make(chan struct{})
defer close(serveDone)
go func() {
select {
case <-ctx.Done():
_ = listener.Close()
case <-d.closing:
case <-serveDone:
}
}()
// Tighten the socket mode while root still owns it, then hand it to
// the configured client uid/gid. In the hardened systemd unit we keep
// CAP_CHOWN but intentionally drop CAP_FOWNER, so the mode must be
// tightened before ownership is handed away.
if err := os.Chmod(d.layout.SocketPath, 0o600); err != nil {
return err
}
if d.clientUID >= 0 && d.clientGID >= 0 {
if err := os.Chown(d.layout.SocketPath, d.clientUID, d.clientGID); err != nil {
return err
}
}
if d.logger != nil {
d.logger.Info("daemon serving", "socket", d.layout.SocketPath, "pid", d.pid)
}
@ -366,6 +422,13 @@ func (d *Daemon) TouchVM(ctx context.Context, idOrName string) (model.VMRecord,
// the ws↔vm construction order doesn't recurse: the closures read d.vm
// at call time, by which point it is populated.
func wireServices(d *Daemon) {
if d.priv == nil {
clientUID, clientGID := d.clientUID, d.clientGID
if clientUID == 0 && clientGID == 0 {
clientUID, clientGID = -1, -1
}
d.priv = newLocalPrivilegedOps(d.runner, d.logger, d.config, d.layout, clientUID, clientGID)
}
if d.net == nil {
d.net = newHostNetwork(hostNetworkDeps{
runner: d.runner,
@ -373,6 +436,7 @@ func wireServices(d *Daemon) {
config: d.config,
layout: d.layout,
closing: d.closing,
priv: d.priv,
})
}
if d.img == nil {
@ -425,6 +489,7 @@ func wireServices(d *Daemon) {
net: d.net,
img: d.img,
ws: d.ws,
priv: d.priv,
capHooks: d.buildCapabilityHooks(),
beginOperation: d.beginOperation,
vsockHostDevice: defaultVsockHostDevice,


@ -3,10 +3,16 @@ package daemon
import (
"context"
"encoding/json"
"errors"
"io"
"log/slog"
"net"
"os"
"path/filepath"
"strings"
"syscall"
"testing"
"time"
"banger/internal/api"
"banger/internal/buildinfo"
@ -56,6 +62,75 @@ func TestDispatchPingIncludesBuildInfo(t *testing.T) {
}
}
func TestServeReturnsOnContextCancel(t *testing.T) {
dir := t.TempDir()
runtimeDir := filepath.Join(dir, "runtime")
if err := os.MkdirAll(runtimeDir, 0o755); err != nil {
t.Fatalf("MkdirAll runtime: %v", err)
}
socketPath := filepath.Join(runtimeDir, "bangerd.sock")
probe, err := net.Listen("unix", filepath.Join(runtimeDir, "probe.sock"))
if err != nil {
if errors.Is(err, syscall.EPERM) || strings.Contains(err.Error(), "operation not permitted") {
t.Skipf("unix socket listen blocked in this environment: %v", err)
}
t.Fatalf("probe listen: %v", err)
}
_ = probe.Close()
_ = os.Remove(filepath.Join(runtimeDir, "probe.sock"))
d := &Daemon{
layout: paths.Layout{
RuntimeDir: runtimeDir,
SocketPath: socketPath,
},
config: model.DaemonConfig{
StatsPollInterval: time.Hour,
},
store: openDaemonStore(t),
runner: system.NewRunner(),
logger: slog.New(slog.NewTextHandler(io.Discard, nil)),
closing: make(chan struct{}),
clientUID: -1,
clientGID: -1,
}
wireServices(d)
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
serveErr := make(chan error, 1)
go func() {
serveErr <- d.Serve(ctx)
}()
deadline := time.Now().Add(2 * time.Second)
for {
if _, err := os.Stat(socketPath); err == nil {
break
}
select {
case err := <-serveErr:
t.Fatalf("Serve() returned before socket was ready: %v", err)
default:
}
if time.Now().After(deadline) {
t.Fatalf("socket %s not created before deadline", socketPath)
}
time.Sleep(25 * time.Millisecond)
}
cancel()
select {
case err := <-serveErr:
if err != nil {
t.Fatalf("Serve() error = %v, want nil on context cancel", err)
}
case <-time.After(2 * time.Second):
t.Fatal("Serve() did not return after context cancel")
}
}
func TestPromoteImageCopiesBootArtifactsIntoArtifactDir(t *testing.T) {
dir := t.TempDir()
rootfs := filepath.Join(dir, "rootfs.ext4")


@ -24,14 +24,7 @@ func (n *HostNetwork) syncVMDNSResolverRouting(ctx context.Context) error {
if serverAddr == "" {
return nil
}
if _, err := n.runner.RunSudo(ctx, "resolvectl", "dns", n.config.BridgeName, serverAddr); err != nil {
return err
}
if _, err := n.runner.RunSudo(ctx, "resolvectl", "domain", n.config.BridgeName, vmResolverRouteDomain); err != nil {
return err
}
_, err := n.runner.RunSudo(ctx, "resolvectl", "default-route", n.config.BridgeName, "no")
return err
return n.privOps().SyncResolverRouting(ctx, serverAddr)
}
func (n *HostNetwork) clearVMDNSResolverRouting(ctx context.Context) error {
@ -44,8 +37,7 @@ func (n *HostNetwork) clearVMDNSResolverRouting(ctx context.Context) error {
if _, err := n.runner.Run(ctx, "ip", "link", "show", n.config.BridgeName); err != nil {
return nil
}
_, err := n.runner.RunSudo(ctx, "resolvectl", "revert", n.config.BridgeName)
return err
return n.privOps().ClearResolverRouting(ctx)
}
func (n *HostNetwork) ensureVMDNSResolverRouting(ctx context.Context) {


@ -1,9 +1,16 @@
// Package daemon hosts the Banger daemon process.
// Package daemon hosts the Banger owner-daemon process.
//
// The daemon exposes a JSON-RPC endpoint over a Unix socket. The
// *Daemon type is a thin composition root: it holds shared
// infrastructure (store, runner, logger, layout, config, listener)
// plus pointers to four focused services and forwards RPCs to them.
// infrastructure (store, runner, logger, layout, config, listener,
// privileged-ops adapter) plus pointers to four focused services and
// forwards RPCs to them.
//
// On the supported systemd install path, this package runs inside
// `bangerd.service` as the configured owner user and delegates
// privileged host-kernel operations to `bangerd-root.service` through
// the privileged-ops seam. Non-system/dev paths use the same seam with
// an in-process adapter instead.
//
// Services:
//


@ -16,14 +16,15 @@ import (
)
func Doctor(ctx context.Context) (system.Report, error) {
layout, err := paths.Resolve()
userLayout, err := paths.Resolve()
if err != nil {
return system.Report{}, err
}
cfg, err := config.Load(layout)
cfg, err := config.Load(userLayout)
if err != nil {
return system.Report{}, err
}
layout := paths.ResolveSystem()
// Doctor must be read-only: running it should never mutate the
// state DB (no migrations, no WAL checkpoint, no pragma writes).
// Skip OpenReadOnly entirely when the DB file doesn't exist —
@ -32,9 +33,10 @@ func Doctor(ctx context.Context) (system.Report, error) {
// "no DB yet" (pass) from "DB present but unreadable" (fail) in
// the report.
d := &Daemon{
layout: layout,
config: cfg,
runner: system.NewRunner(),
layout: layout,
userLayout: userLayout,
config: cfg,
runner: system.NewRunner(),
}
var storeErr error
storeMissing := false
@ -90,7 +92,7 @@ func (d *Daemon) doctorReport(ctx context.Context, storeErr error, storeMissing
// This is intentionally a warn, not a fail — the shortcut is opt-in
// convenience and `banger vm ssh` works either way.
func (d *Daemon) addSSHShortcutCheck(report *system.Report) {
bangerConfig := BangerSSHConfigPath(d.layout)
bangerConfig := BangerSSHConfigPath(d.userLayout)
if strings.TrimSpace(bangerConfig) == "" {
return
}


@ -73,19 +73,29 @@ func (m *Manager) EnsureBridge(ctx context.Context) error {
// vsock sockets all live inside, so it must be readable only by the
// invoking user.
func (m *Manager) EnsureSocketDir() error {
if err := os.MkdirAll(m.cfg.RuntimeDir, 0o700); err != nil {
mode := os.FileMode(0o700)
if os.Geteuid() == 0 {
mode = 0o711
}
if err := os.MkdirAll(m.cfg.RuntimeDir, mode); err != nil {
return err
}
return os.Chmod(m.cfg.RuntimeDir, 0o700)
return os.Chmod(m.cfg.RuntimeDir, mode)
}
// CreateTap (re)creates a TAP owned by the current uid/gid, attaches it to
// the bridge, and brings both up.
func (m *Manager) CreateTap(ctx context.Context, tap string) error {
return m.CreateTapOwned(ctx, tap, os.Getuid(), os.Getgid())
}
// CreateTapOwned (re)creates a TAP owned by uid:gid, attaches it to the
// bridge, and brings both up.
func (m *Manager) CreateTapOwned(ctx context.Context, tap string, uid, gid int) error {
if _, err := m.runner.Run(ctx, "ip", "link", "show", tap); err == nil {
_, _ = m.runner.RunSudo(ctx, "ip", "link", "del", tap)
}
if _, err := m.runner.RunSudo(ctx, "ip", "tuntap", "add", "dev", tap, "mode", "tap", "user", strconv.Itoa(os.Getuid()), "group", strconv.Itoa(os.Getgid())); err != nil {
if _, err := m.runner.RunSudo(ctx, "ip", "tuntap", "add", "dev", tap, "mode", "tap", "user", strconv.Itoa(uid), "group", strconv.Itoa(gid)); err != nil {
return err
}
if _, err := m.runner.RunSudo(ctx, "ip", "link", "set", tap, "master", m.cfg.BridgeName); err != nil {
@ -121,13 +131,26 @@ func (m *Manager) ResolveBinary() (string, error) {
// EnsureSocketAccess waits for the socket to appear, then chmods it to
// 0600 and chowns it to the current uid/gid, in that order.
func (m *Manager) EnsureSocketAccess(ctx context.Context, socketPath, label string) error {
return m.EnsureSocketAccessFor(ctx, socketPath, label, os.Getuid(), os.Getgid())
}
// EnsureSocketAccessFor waits for the socket to appear, chmods it to
// 0600 while the privileged side still owns it, then chowns it to uid:gid.
func (m *Manager) EnsureSocketAccessFor(ctx context.Context, socketPath, label string, uid, gid int) error {
if err := waitForPath(ctx, socketPath, 5*time.Second, label); err != nil {
return err
}
if _, err := m.runner.RunSudo(ctx, "chown", fmt.Sprintf("%d:%d", os.Getuid(), os.Getgid()), socketPath); err != nil {
if os.Geteuid() == 0 {
if _, err := m.runner.Run(ctx, "chmod", "600", socketPath); err != nil {
return err
}
_, err := m.runner.Run(ctx, "chown", fmt.Sprintf("%d:%d", uid, gid), socketPath)
return err
}
_, err := m.runner.RunSudo(ctx, "chmod", "600", socketPath)
if _, err := m.runner.RunSudo(ctx, "chmod", "600", socketPath); err != nil {
return err
}
_, err := m.runner.RunSudo(ctx, "chown", fmt.Sprintf("%d:%d", uid, gid), socketPath)
return err
}


@ -107,37 +107,10 @@ func TestWaitForPathRespectsContextCancellation(t *testing.T) {
}
}
// TestEnsureSocketAccessChownFailureBubbles verifies a sudo chown
// error surfaces untouched. The daemon's cleanup path relies on
// this — if chown fails, the socket is still root-owned and can't
// be used by the invoking user, so we absolutely must not pretend
// success.
func TestEnsureSocketAccessChownFailureBubbles(t *testing.T) {
socketPath := filepath.Join(t.TempDir(), "present.sock")
if err := os.WriteFile(socketPath, []byte{}, 0o600); err != nil {
t.Fatalf("WriteFile: %v", err)
}
chownErr := errors.New("sudo chown failed")
runner := &scriptedRunner{
t: t,
sudos: []scriptedCall{{err: chownErr}},
}
mgr := New(runner, Config{}, slog.Default())
err := mgr.EnsureSocketAccess(context.Background(), socketPath, "api socket")
if !errors.Is(err, chownErr) {
t.Fatalf("err = %v, want chown error", err)
}
// chmod must not have been attempted.
if len(runner.sudos) != 0 {
t.Fatalf("chmod was attempted after chown failed: %d sudo calls left", len(runner.sudos))
}
}
// TestEnsureSocketAccessChmodFailureBubbles verifies the chmod step
// (the belt-and-braces tighten to 0600 after chown) also surfaces
// errors cleanly.
// fails fast before any ownership handoff. Once chown runs, the
// bounded helper no longer owns the socket and can't tighten its mode
// without CAP_FOWNER, so the order matters.
func TestEnsureSocketAccessChmodFailureBubbles(t *testing.T) {
socketPath := filepath.Join(t.TempDir(), "present.sock")
if err := os.WriteFile(socketPath, []byte{}, 0o600); err != nil {
@ -146,11 +119,8 @@ func TestEnsureSocketAccessChmodFailureBubbles(t *testing.T) {
chmodErr := errors.New("sudo chmod failed")
runner := &scriptedRunner{
t: t,
sudos: []scriptedCall{
{}, // chown succeeds
{err: chmodErr}, // chmod fails
},
t: t,
sudos: []scriptedCall{{err: chmodErr}},
}
mgr := New(runner, Config{}, slog.Default())
@ -158,6 +128,34 @@ func TestEnsureSocketAccessChmodFailureBubbles(t *testing.T) {
if !errors.Is(err, chmodErr) {
t.Fatalf("err = %v, want chmod error", err)
}
// chown must not have been attempted.
if len(runner.sudos) != 0 {
t.Fatalf("chown was attempted after chmod failed: %d sudo calls left", len(runner.sudos))
}
}
// TestEnsureSocketAccessChownFailureBubbles verifies the ownership
// handoff still surfaces errors after chmod succeeds.
func TestEnsureSocketAccessChownFailureBubbles(t *testing.T) {
socketPath := filepath.Join(t.TempDir(), "present.sock")
if err := os.WriteFile(socketPath, []byte{}, 0o600); err != nil {
t.Fatalf("WriteFile: %v", err)
}
chownErr := errors.New("sudo chown failed")
runner := &scriptedRunner{
t: t,
sudos: []scriptedCall{
{}, // chmod succeeds
{err: chownErr}, // chown fails
},
}
mgr := New(runner, Config{}, slog.Default())
err := mgr.EnsureSocketAccess(context.Background(), socketPath, "api socket")
if !errors.Is(err, chownErr) {
t.Fatalf("err = %v, want chown error", err)
}
}
// TestEnsureSocketAccessTimesOutBeforeTouchingRunner pins the


@ -38,6 +38,7 @@ type HostNetwork struct {
config model.DaemonConfig
layout paths.Layout
closing chan struct{}
priv privilegedOps
tapPool tapPool
vmDNS *vmdns.Server
@ -58,6 +59,7 @@ type hostNetworkDeps struct {
config model.DaemonConfig
layout paths.Layout
closing chan struct{}
priv privilegedOps
}
func newHostNetwork(deps hostNetworkDeps) *HostNetwork {
@ -67,6 +69,7 @@ func newHostNetwork(deps hostNetworkDeps) *HostNetwork {
config: deps.config,
layout: deps.layout,
closing: deps.closing,
priv: deps.priv,
lookupExecutable: system.LookupExecutable,
vmDNSAddr: func(server *vmdns.Server) string { return server.Addr() },
}
@ -140,7 +143,7 @@ func (n *HostNetwork) fc() *fcproc.Manager {
}
func (n *HostNetwork) ensureBridge(ctx context.Context) error {
return n.fc().EnsureBridge(ctx)
return n.privOps().EnsureBridge(ctx)
}
func (n *HostNetwork) ensureSocketDir() error {
@ -148,19 +151,19 @@ func (n *HostNetwork) ensureSocketDir() error {
}
func (n *HostNetwork) createTap(ctx context.Context, tap string) error {
return n.fc().CreateTap(ctx, tap)
return n.privOps().CreateTap(ctx, tap)
}
func (n *HostNetwork) firecrackerBinary() (string, error) {
return n.fc().ResolveBinary()
func (n *HostNetwork) firecrackerBinary(ctx context.Context) (string, error) {
return n.privOps().ResolveFirecrackerBinary(ctx, n.config.FirecrackerBin)
}
func (n *HostNetwork) ensureSocketAccess(ctx context.Context, socketPath, label string) error {
return n.fc().EnsureSocketAccess(ctx, socketPath, label)
return n.privOps().EnsureSocketAccess(ctx, socketPath, label)
}
func (n *HostNetwork) findFirecrackerPID(ctx context.Context, apiSock string) (int, error) {
return n.fc().FindPID(ctx, apiSock)
return n.privOps().FindFirecrackerPID(ctx, apiSock)
}
func (n *HostNetwork) resolveFirecrackerPID(ctx context.Context, machine *firecracker.Machine, apiSock string) int {
@ -168,15 +171,35 @@ func (n *HostNetwork) resolveFirecrackerPID(ctx context.Context, machine *firecr
}
func (n *HostNetwork) sendCtrlAltDel(ctx context.Context, apiSockPath string) error {
return n.fc().SendCtrlAltDel(ctx, apiSockPath)
if err := n.ensureSocketAccess(ctx, apiSockPath, "firecracker api socket"); err != nil {
return err
}
return firecracker.New(apiSockPath, n.logger).SendCtrlAltDel(ctx)
}
func (n *HostNetwork) waitForExit(ctx context.Context, pid int, apiSock string, timeout time.Duration) error {
return n.fc().WaitForExit(ctx, pid, apiSock, timeout)
deadline := time.Now().Add(timeout)
for {
running, err := n.privOps().ProcessRunning(ctx, pid, apiSock)
if err != nil {
return err
}
if !running {
return nil
}
if time.Now().After(deadline) {
return errWaitForExitTimeout
}
select {
case <-ctx.Done():
return ctx.Err()
case <-time.After(100 * time.Millisecond):
}
}
}
func (n *HostNetwork) killVMProcess(ctx context.Context, pid int) error {
return n.fc().Kill(ctx, pid)
return n.privOps().KillProcess(ctx, pid)
}
// waitForGuestVSockAgent is a HostNetwork helper because it's


@ -15,7 +15,7 @@ type natRule = hostnat.Rule
// Callers (vm_lifecycle) resolve the tap device from the handle cache
// themselves and pass it in.
func (n *HostNetwork) ensureNAT(ctx context.Context, guestIP, tap string, enable bool) error {
return hostnat.Ensure(ctx, n.runner, guestIP, tap, enable)
return n.privOps().EnsureNAT(ctx, guestIP, tap, enable)
}
func (n *HostNetwork) validateNATPrereqs(ctx context.Context) (string, error) {


@ -45,6 +45,7 @@ func TestCloseOnPartiallyInitialisedDaemon(t *testing.T) {
build: func(t *testing.T) *Daemon {
server, err := vmdns.New("127.0.0.1:0", nil)
if err != nil {
skipIfSocketRestricted(t, err)
t.Fatalf("vmdns.New: %v", err)
}
return &Daemon{


@ -46,7 +46,7 @@ func (s *VMService) addBaseStartPrereqs(checks *system.Preflight, image model.Im
}
func (s *VMService) addBaseStartCommandPrereqs(checks *system.Preflight) {
for _, command := range []string{"sudo", "ip", "dmsetup", "losetup", "blockdev", "truncate", "pgrep", "chown", "chmod", "kill", "e2cp", "e2rm", "debugfs"} {
for _, command := range []string{"ip", "dmsetup", "losetup", "blockdev", "truncate", "pgrep", "chown", "chmod", "kill", "e2cp", "e2rm", "debugfs"} {
checks.RequireCommand(command, toolHint(command))
}
}
@ -69,8 +69,6 @@ func toolHint(command string) string {
return "install e2fsprogs"
case "e2cp", "e2rm":
return "install e2tools"
case "sudo":
return "install sudo"
default:
return ""
}


@ -0,0 +1,354 @@
package daemon
import (
"context"
"errors"
"log/slog"
"os"
"strconv"
"strings"
"syscall"
"banger/internal/daemon/dmsnap"
"banger/internal/daemon/fcproc"
"banger/internal/firecracker"
"banger/internal/hostnat"
"banger/internal/model"
"banger/internal/paths"
"banger/internal/roothelper"
"banger/internal/system"
)
type privilegedOps interface {
EnsureBridge(context.Context) error
CreateTap(context.Context, string) error
DeleteTap(context.Context, string) error
SyncResolverRouting(context.Context, string) error
ClearResolverRouting(context.Context) error
EnsureNAT(context.Context, string, string, bool) error
CreateDMSnapshot(context.Context, string, string, string) (dmSnapshotHandles, error)
CleanupDMSnapshot(context.Context, dmSnapshotHandles) error
RemoveDMSnapshot(context.Context, string) error
FsckSnapshot(context.Context, string) error
ReadExt4File(context.Context, string, string) ([]byte, error)
WriteExt4Files(context.Context, string, []roothelper.Ext4Write) error
ResolveFirecrackerBinary(context.Context, string) (string, error)
LaunchFirecracker(context.Context, roothelper.FirecrackerLaunchRequest) (int, error)
EnsureSocketAccess(context.Context, string, string) error
FindFirecrackerPID(context.Context, string) (int, error)
KillProcess(context.Context, int) error
SignalProcess(context.Context, int, string) error
ProcessRunning(context.Context, int, string) (bool, error)
}
type localPrivilegedOps struct {
runner system.CommandRunner
logger *slog.Logger
config model.DaemonConfig
layout paths.Layout
clientUID int
clientGID int
}
func (n *HostNetwork) privOps() privilegedOps {
if n.priv == nil {
n.priv = newLocalPrivilegedOps(n.runner, n.logger, n.config, n.layout, os.Getuid(), os.Getgid())
}
return n.priv
}
func (s *VMService) privOps() privilegedOps {
if s.priv == nil {
s.priv = newLocalPrivilegedOps(s.runner, s.logger, s.config, s.layout, os.Getuid(), os.Getgid())
}
return s.priv
}
func newLocalPrivilegedOps(runner system.CommandRunner, logger *slog.Logger, cfg model.DaemonConfig, layout paths.Layout, clientUID, clientGID int) privilegedOps {
if clientUID < 0 {
clientUID = os.Getuid()
}
if clientGID < 0 {
clientGID = os.Getgid()
}
return &localPrivilegedOps{
runner: runner,
logger: logger,
config: cfg,
layout: layout,
clientUID: clientUID,
clientGID: clientGID,
}
}
func (o *localPrivilegedOps) EnsureBridge(ctx context.Context) error {
return o.fc().EnsureBridge(ctx)
}
func (o *localPrivilegedOps) CreateTap(ctx context.Context, tapName string) error {
return o.fc().CreateTapOwned(ctx, tapName, o.clientUID, o.clientGID)
}
func (o *localPrivilegedOps) DeleteTap(ctx context.Context, tapName string) error {
_, err := o.runner.RunSudo(ctx, "ip", "link", "del", tapName)
return err
}
func (o *localPrivilegedOps) SyncResolverRouting(ctx context.Context, serverAddr string) error {
if strings.TrimSpace(o.config.BridgeName) == "" || strings.TrimSpace(serverAddr) == "" {
return nil
}
if _, err := system.LookupExecutable("resolvectl"); err != nil {
return nil
}
if _, err := o.runner.RunSudo(ctx, "resolvectl", "dns", o.config.BridgeName, serverAddr); err != nil {
return err
}
if _, err := o.runner.RunSudo(ctx, "resolvectl", "domain", o.config.BridgeName, vmResolverRouteDomain); err != nil {
return err
}
_, err := o.runner.RunSudo(ctx, "resolvectl", "default-route", o.config.BridgeName, "no")
return err
}
func (o *localPrivilegedOps) ClearResolverRouting(ctx context.Context) error {
if strings.TrimSpace(o.config.BridgeName) == "" {
return nil
}
if _, err := system.LookupExecutable("resolvectl"); err != nil {
return nil
}
_, err := o.runner.RunSudo(ctx, "resolvectl", "revert", o.config.BridgeName)
return err
}
func (o *localPrivilegedOps) EnsureNAT(ctx context.Context, guestIP, tap string, enable bool) error {
return hostnat.Ensure(ctx, o.runner, guestIP, tap, enable)
}
func (o *localPrivilegedOps) CreateDMSnapshot(ctx context.Context, rootfsPath, cowPath, dmName string) (dmSnapshotHandles, error) {
return dmsnap.Create(ctx, o.runner, rootfsPath, cowPath, dmName)
}
func (o *localPrivilegedOps) CleanupDMSnapshot(ctx context.Context, handles dmSnapshotHandles) error {
return dmsnap.Cleanup(ctx, o.runner, handles)
}
func (o *localPrivilegedOps) RemoveDMSnapshot(ctx context.Context, target string) error {
return dmsnap.Remove(ctx, o.runner, target)
}
func (o *localPrivilegedOps) FsckSnapshot(ctx context.Context, dmDev string) error {
if _, err := o.runner.RunSudo(ctx, "e2fsck", "-fy", dmDev); err != nil {
if code := system.ExitCode(err); code < 0 || code > 1 {
return err
}
}
return nil
}
func (o *localPrivilegedOps) ReadExt4File(ctx context.Context, imagePath, guestPath string) ([]byte, error) {
return system.ReadExt4File(ctx, o.runner, imagePath, guestPath)
}
func (o *localPrivilegedOps) WriteExt4Files(ctx context.Context, imagePath string, files []roothelper.Ext4Write) error {
for _, file := range files {
mode := os.FileMode(file.Mode)
if mode == 0 {
mode = 0o644
}
if err := system.WriteExt4FileOwned(ctx, o.runner, imagePath, file.GuestPath, mode, 0, 0, file.Data); err != nil {
return err
}
}
return nil
}
func (o *localPrivilegedOps) ResolveFirecrackerBinary(_ context.Context, requested string) (string, error) {
manager := fcproc.New(o.runner, fcproc.Config{FirecrackerBin: normalizeFirecrackerBinary(requested, o.config.FirecrackerBin)}, o.logger)
return manager.ResolveBinary()
}
func (o *localPrivilegedOps) LaunchFirecracker(ctx context.Context, req roothelper.FirecrackerLaunchRequest) (int, error) {
machine, err := firecracker.NewMachine(ctx, firecracker.MachineConfig{
BinaryPath: req.BinaryPath,
VMID: req.VMID,
SocketPath: req.SocketPath,
LogPath: req.LogPath,
MetricsPath: req.MetricsPath,
KernelImagePath: req.KernelImagePath,
InitrdPath: req.InitrdPath,
KernelArgs: req.KernelArgs,
Drives: req.Drives,
TapDevice: req.TapDevice,
VSockPath: req.VSockPath,
VSockCID: req.VSockCID,
VCPUCount: req.VCPUCount,
MemoryMiB: req.MemoryMiB,
Logger: o.logger,
})
if err != nil {
return 0, err
}
if err := machine.Start(ctx); err != nil {
if pid := o.fc().ResolvePID(context.Background(), machine, req.SocketPath); pid > 0 {
_ = o.KillProcess(context.Background(), pid)
}
return 0, err
}
if err := o.EnsureSocketAccess(ctx, req.SocketPath, "firecracker api socket"); err != nil {
return 0, err
}
if strings.TrimSpace(req.VSockPath) != "" {
if err := o.EnsureSocketAccess(ctx, req.VSockPath, "firecracker vsock socket"); err != nil {
return 0, err
}
}
pid := o.fc().ResolvePID(context.Background(), machine, req.SocketPath)
if pid <= 0 {
return 0, errors.New("firecracker started but pid could not be resolved")
}
return pid, nil
}
func (o *localPrivilegedOps) EnsureSocketAccess(ctx context.Context, socketPath, label string) error {
return o.fc().EnsureSocketAccessFor(ctx, socketPath, label, o.clientUID, o.clientGID)
}
func (o *localPrivilegedOps) FindFirecrackerPID(ctx context.Context, apiSock string) (int, error) {
return o.fc().FindPID(ctx, apiSock)
}
func (o *localPrivilegedOps) KillProcess(ctx context.Context, pid int) error {
return o.fc().Kill(ctx, pid)
}
func (o *localPrivilegedOps) SignalProcess(ctx context.Context, pid int, signal string) error {
if strings.TrimSpace(signal) == "" {
signal = "TERM"
}
_, err := o.runner.RunSudo(ctx, "kill", "-"+signal, strconv.Itoa(pid))
return err
}
func (o *localPrivilegedOps) ProcessRunning(_ context.Context, pid int, apiSock string) (bool, error) {
return system.ProcessRunning(pid, apiSock), nil
}
func (o *localPrivilegedOps) fc() *fcproc.Manager {
return fcproc.New(o.runner, fcproc.Config{
FirecrackerBin: normalizeFirecrackerBinary("", o.config.FirecrackerBin),
BridgeName: o.config.BridgeName,
BridgeIP: o.config.BridgeIP,
CIDR: o.config.CIDR,
RuntimeDir: o.layout.RuntimeDir,
}, o.logger)
}
type helperPrivilegedOps struct {
client *roothelper.Client
config model.DaemonConfig
layout paths.Layout
}
func newHelperPrivilegedOps(client *roothelper.Client, cfg model.DaemonConfig, layout paths.Layout) privilegedOps {
return &helperPrivilegedOps{client: client, config: cfg, layout: layout}
}
func (o *helperPrivilegedOps) EnsureBridge(ctx context.Context) error {
return o.client.EnsureBridge(ctx, o.networkConfig())
}
func (o *helperPrivilegedOps) CreateTap(ctx context.Context, tapName string) error {
return o.client.CreateTap(ctx, o.networkConfig(), tapName)
}
func (o *helperPrivilegedOps) DeleteTap(ctx context.Context, tapName string) error {
return o.client.DeleteTap(ctx, tapName)
}
func (o *helperPrivilegedOps) SyncResolverRouting(ctx context.Context, serverAddr string) error {
return o.client.SyncResolverRouting(ctx, o.config.BridgeName, serverAddr)
}
func (o *helperPrivilegedOps) ClearResolverRouting(ctx context.Context) error {
return o.client.ClearResolverRouting(ctx, o.config.BridgeName)
}
func (o *helperPrivilegedOps) EnsureNAT(ctx context.Context, guestIP, tap string, enable bool) error {
return o.client.EnsureNAT(ctx, guestIP, tap, enable)
}
func (o *helperPrivilegedOps) CreateDMSnapshot(ctx context.Context, rootfsPath, cowPath, dmName string) (dmSnapshotHandles, error) {
return o.client.CreateDMSnapshot(ctx, rootfsPath, cowPath, dmName)
}
func (o *helperPrivilegedOps) CleanupDMSnapshot(ctx context.Context, handles dmSnapshotHandles) error {
return o.client.CleanupDMSnapshot(ctx, handles)
}
func (o *helperPrivilegedOps) RemoveDMSnapshot(ctx context.Context, target string) error {
return o.client.RemoveDMSnapshot(ctx, target)
}
func (o *helperPrivilegedOps) FsckSnapshot(ctx context.Context, dmDev string) error {
return o.client.FsckSnapshot(ctx, dmDev)
}
func (o *helperPrivilegedOps) ReadExt4File(ctx context.Context, imagePath, guestPath string) ([]byte, error) {
return o.client.ReadExt4File(ctx, imagePath, guestPath)
}
func (o *helperPrivilegedOps) WriteExt4Files(ctx context.Context, imagePath string, files []roothelper.Ext4Write) error {
return o.client.WriteExt4Files(ctx, imagePath, files)
}
func (o *helperPrivilegedOps) ResolveFirecrackerBinary(ctx context.Context, requested string) (string, error) {
return o.client.ResolveFirecrackerBinary(ctx, normalizeFirecrackerBinary(requested, o.config.FirecrackerBin))
}
func (o *helperPrivilegedOps) LaunchFirecracker(ctx context.Context, req roothelper.FirecrackerLaunchRequest) (int, error) {
req.Network = o.networkConfig()
return o.client.LaunchFirecracker(ctx, req)
}
func (o *helperPrivilegedOps) EnsureSocketAccess(ctx context.Context, socketPath, label string) error {
if info, err := os.Stat(socketPath); err == nil {
if stat, ok := info.Sys().(*syscall.Stat_t); ok && int(stat.Uid) == os.Getuid() {
return os.Chmod(socketPath, 0o600)
}
}
return o.client.EnsureSocketAccess(ctx, socketPath, label)
}
func (o *helperPrivilegedOps) FindFirecrackerPID(ctx context.Context, apiSock string) (int, error) {
return o.client.FindFirecrackerPID(ctx, apiSock)
}
func (o *helperPrivilegedOps) KillProcess(ctx context.Context, pid int) error {
return o.client.KillProcess(ctx, pid)
}
func (o *helperPrivilegedOps) SignalProcess(ctx context.Context, pid int, signal string) error {
return o.client.SignalProcess(ctx, pid, signal)
}
func (o *helperPrivilegedOps) ProcessRunning(ctx context.Context, pid int, apiSock string) (bool, error) {
return o.client.ProcessRunning(ctx, pid, apiSock)
}
func (o *helperPrivilegedOps) networkConfig() roothelper.NetworkConfig {
return roothelper.NetworkConfig{
BridgeName: o.config.BridgeName,
BridgeIP: o.config.BridgeIP,
CIDR: o.config.CIDR,
}
}
func normalizeFirecrackerBinary(requested, configured string) string {
requested = strings.TrimSpace(requested)
if requested != "" {
return requested
}
return strings.TrimSpace(configured)
}


@@ -11,13 +11,13 @@ import (
type dmSnapshotHandles = dmsnap.Handles
func (n *HostNetwork) createDMSnapshot(ctx context.Context, rootfsPath, cowPath, dmName string) (dmSnapshotHandles, error) {
return dmsnap.Create(ctx, n.runner, rootfsPath, cowPath, dmName)
return n.privOps().CreateDMSnapshot(ctx, rootfsPath, cowPath, dmName)
}
func (n *HostNetwork) cleanupDMSnapshot(ctx context.Context, handles dmSnapshotHandles) error {
return dmsnap.Cleanup(ctx, n.runner, handles)
return n.privOps().CleanupDMSnapshot(ctx, handles)
}
func (n *HostNetwork) removeDMSnapshot(ctx context.Context, target string) error {
return dmsnap.Remove(ctx, n.runner, target)
return n.privOps().RemoveDMSnapshot(ctx, target)
}


@@ -57,7 +57,7 @@ func BangerSSHConfigPath(layout paths.Layout) string {
}
func (d *Daemon) ensureVMSSHClientConfig() {
if err := syncVMSSHClientConfig(d.layout, d.config.SSHKeyPath); err != nil && d.logger != nil {
if err := SyncVMSSHClientConfig(d.userLayout, d.config.SSHKeyPath); err != nil && d.logger != nil {
d.logger.Warn("vm ssh client config sync failed", "error", err.Error())
}
}
@@ -68,7 +68,7 @@ func (d *Daemon) ensureVMSSHClientConfig() {
//
// The file lives in the banger config dir so users who manage their
// SSH config declaratively can decide how (or whether) to pull it in.
func syncVMSSHClientConfig(layout paths.Layout, keyPath string) error {
func SyncVMSSHClientConfig(layout paths.Layout, keyPath string) error {
keyPath = strings.TrimSpace(keyPath)
if keyPath == "" {
return nil


@@ -22,8 +22,8 @@ func TestSyncVMSSHClientConfigWritesBangerFileOnly(t *testing.T) {
}
keyPath := filepath.Join(homeDir, ".config", "banger", "ssh", "id_ed25519")
if err := syncVMSSHClientConfig(layout, keyPath); err != nil {
t.Fatalf("syncVMSSHClientConfig: %v", err)
if err := SyncVMSSHClientConfig(layout, keyPath); err != nil {
t.Fatalf("SyncVMSSHClientConfig: %v", err)
}
// Banger's own ssh_config file has the `Host *.vm` stanza.


@@ -106,7 +106,7 @@ func (n *HostNetwork) releaseTap(ctx context.Context, tapName string) error {
}
n.tapPool.mu.Unlock()
}
_, err := n.runner.RunSudo(ctx, "ip", "link", "del", tapName)
err := n.privOps().DeleteTap(ctx, tapName)
if err == nil {
go n.ensureTapPool(context.Background())
}


@@ -10,6 +10,7 @@ import (
"strconv"
"strings"
"banger/internal/config"
"banger/internal/guest"
"banger/internal/model"
"banger/internal/system"
@@ -120,15 +121,22 @@ func (s *WorkspaceService) runFileSync(ctx context.Context, vm *model.VMRecord)
runner = system.NewRunner()
}
hostHome, err := os.UserHomeDir()
if err != nil {
return fmt.Errorf("resolve host user home: %w", err)
hostHome := strings.TrimSpace(s.config.HostHomeDir)
if hostHome == "" {
var err error
hostHome, err = os.UserHomeDir()
if err != nil {
return fmt.Errorf("resolve host user home: %w", err)
}
}
workDisk := vm.Runtime.WorkDiskPath
for _, entry := range s.config.FileSync {
hostPath := expandHostPath(entry.Host, hostHome)
hostPath, err := config.ResolveFileSyncHostPath(entry.Host, hostHome)
if err != nil {
return fmt.Errorf("file_sync: %w", err)
}
guestRel := guestPathRelativeToRoot(entry.Guest)
guestImagePath := "/" + guestRel
@@ -140,6 +148,10 @@ func (s *WorkspaceService) runFileSync(ctx context.Context, vm *model.VMRecord)
}
return fmt.Errorf("file_sync: stat %s: %w", hostPath, err)
}
hostPath, err = config.ResolveExistingFileSyncHostPath(entry.Host, hostHome)
if err != nil {
return fmt.Errorf("file_sync: %w", err)
}
vmCreateStage(ctx, "prepare_work_disk", "file sync: "+entry.Host+" → "+entry.Guest)
@@ -180,8 +192,8 @@ func (s *WorkspaceService) runFileSync(ctx context.Context, vm *model.VMRecord)
// inside ~/.aws that points at ~/secrets can't leak out of the tree
// the user named. Other special types (devices, FIFOs) are skipped
// silently. Top-level host paths go through os.Stat back in
// runFileSync and still follow, since the user explicitly named that
// path.
// runFileSync and may still follow, but only when the resolved target
// stays under the configured owner home.
func (s *WorkspaceService) copyHostDir(ctx context.Context, vm model.VMRecord, runner system.CommandRunner, imagePath, hostDir, guestTarget string) error {
if err := system.MkdirExt4(ctx, runner, imagePath, guestTarget, 0o755, 0, 0); err != nil {
return err
@@ -234,15 +246,6 @@ func parseFileSyncMode(raw string) (os.FileMode, error) {
}
// expandHostPath expands a leading "~/" against the host user's
// home. Already-absolute paths pass through unchanged.
func expandHostPath(raw, home string) string {
raw = strings.TrimSpace(raw)
if strings.HasPrefix(raw, "~/") {
return filepath.Join(home, strings.TrimPrefix(raw, "~/"))
}
return raw
}
// guestPathRelativeToRoot returns the guest path as a relative path
// under /root (banger's work disk is mounted at /root in the guest,
// so everything syncable lives there). "~/foo" and "/root/foo" both

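The owner-home containment that replaces the old `expandHostPath` follow-anything behavior (per this diff, `ResolveExistingFileSyncHostPath` rejects top-level symlinks whose real target escapes the configured owner home) reduces to a pure path check after symlink resolution. A hypothetical, stdlib-only sketch of that check — `isUnderHome` is a stand-in name, not banger's implementation:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isUnderHome reports whether resolved (an absolute, already
// symlink-resolved path) stays inside the owner home directory.
// filepath.Rel yields a path starting with ".." exactly when the
// target escapes the home tree.
func isUnderHome(home, resolved string) bool {
	rel, err := filepath.Rel(filepath.Clean(home), filepath.Clean(resolved))
	if err != nil {
		return false
	}
	return rel == "." ||
		(rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator)))
}

func main() {
	fmt.Println(isUnderHome("/home/u", "/home/u/.config/gh/hosts.yml")) // true
	fmt.Println(isUnderHome("/home/u", "/tmp/outside/secret.txt"))      // false
	fmt.Println(isUnderHome("/home/u", "/home/user2/file"))             // false
}
```

Note the `/home/user2` case: a plain prefix comparison on the strings would wrongly accept it, which is why the check goes through `filepath.Rel` rather than `strings.HasPrefix` on the raw paths.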

@@ -10,6 +10,7 @@ import (
"banger/internal/guestconfig"
"banger/internal/guestnet"
"banger/internal/model"
"banger/internal/roothelper"
"banger/internal/system"
)
@@ -27,18 +28,19 @@ func (s *VMService) ensureSystemOverlay(ctx context.Context, vm *model.VMRecord)
// patchRootOverlay writes the per-VM config files (resolv.conf,
// hostname, hosts, sshd drop-in, network bootstrap, fstab) into the
// rootfs overlay. Reads the DM device path from the handle cache,
// which the start flow populates before calling this.
func (s *VMService) patchRootOverlay(ctx context.Context, vm model.VMRecord, image model.Image) error {
dmDev := s.vmHandles(vm.ID).DMDev
if dmDev == "" {
return fmt.Errorf("vm %q: DM device not in handle cache — start flow out of order?", vm.ID)
// rootfs overlay. The start flow passes the DM device path explicitly so the
// owner daemon can hand the privileged ext4 work to the root helper without
// rereading mutable process state.
func (s *VMService) patchRootOverlay(ctx context.Context, vm model.VMRecord, image model.Image, dmDev string) error {
if strings.TrimSpace(dmDev) == "" {
return fmt.Errorf("vm %q: DM device is required", vm.ID)
}
resolv := []byte(fmt.Sprintf("nameserver %s\n", s.config.DefaultDNS))
hostname := []byte(vm.Name + "\n")
hosts := []byte(fmt.Sprintf("127.0.0.1 localhost\n127.0.1.1 %s\n", vm.Name))
sshdConfig := []byte(sshdGuestConfig())
fstab, err := system.ReadDebugFSText(ctx, s.runner, dmDev, "/etc/fstab")
fstabBytes, err := s.privOps().ReadExt4File(ctx, dmDev, "/etc/fstab")
fstab := string(fstabBytes)
if err != nil {
fstab = ""
}
@@ -70,19 +72,19 @@ func (s *VMService) patchRootOverlay(ctx context.Context, vm model.VMRecord, ima
s.capHooks.contributeGuest(builder, vm, image)
builder.WriteFile("/etc/fstab", []byte(builder.RenderFSTab(fstab)))
files := builder.Files()
writes := make([]roothelper.Ext4Write, 0, len(files))
for _, guestPath := range builder.FilePaths() {
data := files[guestPath]
mode := uint32(0o644)
if guestPath == guestnet.GuestScriptPath {
if err := system.WriteExt4FileMode(ctx, s.runner, dmDev, guestPath, 0o755, data); err != nil {
return err
}
continue
}
if err := system.WriteExt4File(ctx, s.runner, dmDev, guestPath, data); err != nil {
return err
mode = 0o755
}
writes = append(writes, roothelper.Ext4Write{
GuestPath: guestPath,
Data: files[guestPath],
Mode: mode,
})
}
return nil
return s.privOps().WriteExt4Files(ctx, dmDev, writes)
}
func (s *VMService) ensureWorkDisk(ctx context.Context, vm *model.VMRecord, image model.Image) (workDiskPreparation, error) {

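The rewrite above collects every overlay file into one `[]roothelper.Ext4Write` and issues a single privileged call, instead of one debugfs invocation per file. The mode-selection logic can be sketched in isolation — `ext4Write` here is a local stand-in mirroring the shape of `roothelper.Ext4Write`, and the paths are made up for illustration:

```go
package main

import "fmt"

// ext4Write mirrors the shape used by patchRootOverlay: one guest
// path, its contents, and a file mode.
type ext4Write struct {
	GuestPath string
	Data      []byte
	Mode      uint32
}

// batchWrites builds the batch in a stable order, marking only the
// network bootstrap script executable (0o755); everything else gets
// the default 0o644, as in the diff above.
func batchWrites(files map[string][]byte, order []string, scriptPath string) []ext4Write {
	writes := make([]ext4Write, 0, len(order))
	for _, p := range order {
		mode := uint32(0o644)
		if p == scriptPath {
			mode = 0o755
		}
		writes = append(writes, ext4Write{GuestPath: p, Data: files[p], Mode: mode})
	}
	return writes
}

func main() {
	files := map[string][]byte{
		"/etc/hostname":         []byte("vm1\n"),
		"/usr/local/bin/net.sh": []byte("#!/bin/sh\n"),
	}
	order := []string{"/etc/hostname", "/usr/local/bin/net.sh"}
	for _, w := range batchWrites(files, order, "/usr/local/bin/net.sh") {
		fmt.Printf("%s %o\n", w.GuestPath, w.Mode)
	}
}
```

Batching matters here because each write would otherwise be a separate round trip to the root helper; one typed request keeps the privileged surface to a single call per overlay patch.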

@@ -10,7 +10,6 @@ import (
"sync"
"banger/internal/model"
"banger/internal/system"
)
// handleCache is the daemon's in-memory map of per-VM transient
@@ -175,7 +174,8 @@ func (s *VMService) vmAlive(vm model.VMRecord) bool {
if h.PID <= 0 {
return false
}
return system.ProcessRunning(h.PID, vm.Runtime.APISockPath)
running, err := s.privOps().ProcessRunning(context.Background(), h.PID, vm.Runtime.APISockPath)
return err == nil && running
}
// rediscoverHandles loads what the last daemon start knew about a VM
@@ -207,8 +207,10 @@ func (s *VMService) rediscoverHandles(ctx context.Context, vm model.VMRecord) (m
saved.PID = pid
return saved, true, nil
}
if saved.PID > 0 && system.ProcessRunning(saved.PID, apiSock) {
return saved, true, nil
if saved.PID > 0 {
if running, runErr := s.privOps().ProcessRunning(ctx, saved.PID, apiSock); runErr == nil && running {
return saved, true, nil
}
}
return saved, false, nil
}


@@ -5,7 +5,6 @@ import (
"errors"
"os"
"path/filepath"
"strconv"
"strings"
"time"
@@ -184,7 +183,7 @@ func (s *VMService) killVMLocked(ctx context.Context, current model.VMRecord, si
}
pid := s.vmHandles(vm.ID).PID
op.stage("send_signal", "pid", pid, "signal", signal)
if _, err := s.runner.RunSudo(ctx, "kill", "-"+signal, strconv.Itoa(pid)); err != nil {
if err := s.privOps().SignalProcess(ctx, pid, signal); err != nil {
return model.VMRecord{}, err
}
op.stage("wait_for_exit", "pid", pid)


@@ -10,6 +10,7 @@ import (
"banger/internal/firecracker"
"banger/internal/imagepull"
"banger/internal/model"
"banger/internal/roothelper"
"banger/internal/system"
)
@@ -40,7 +41,6 @@ type startContext struct {
dmName string
tapName string
fcPath string
machine *firecracker.Machine
// systemOverlayCreated records whether the system_overlay step
// actually created the file (vs. the file existing from a crashed
@@ -243,12 +243,7 @@ func (s *VMService) buildStartSteps(op *operationLog, sc *startContext) []startS
// snapshot. Exit codes 0 + 1 are both "ok" here.
name: "fsck_snapshot",
run: func(ctx context.Context, sc *startContext) error {
if _, err := s.runner.RunSudo(ctx, "e2fsck", "-fy", sc.live.DMDev); err != nil {
if code := system.ExitCode(err); code < 0 || code > 1 {
return fmt.Errorf("fsck snapshot: %w", err)
}
}
return nil
return s.privOps().FsckSnapshot(ctx, sc.live.DMDev)
},
},
{
@@ -256,7 +251,7 @@ func (s *VMService) buildStartSteps(op *operationLog, sc *startContext) []startS
createStage: "prepare_rootfs",
createDetail: "writing guest configuration",
run: func(ctx context.Context, sc *startContext) error {
return s.patchRootOverlay(ctx, *sc.vm, sc.image)
return s.patchRootOverlay(ctx, *sc.vm, sc.image, sc.live.DMDev)
},
},
{
@@ -307,8 +302,8 @@ func (s *VMService) buildStartSteps(op *operationLog, sc *startContext) []startS
},
{
name: "firecracker_binary",
run: func(_ context.Context, sc *startContext) error {
fcPath, err := s.net.firecrackerBinary()
run: func(ctx context.Context, sc *startContext) error {
fcPath, err := s.net.firecrackerBinary(ctx)
if err != nil {
return err
}
@@ -323,7 +318,7 @@ func (s *VMService) buildStartSteps(op *operationLog, sc *startContext) []startS
createDetail: "starting firecracker",
run: func(ctx context.Context, sc *startContext) error {
kernelArgs := buildKernelArgs(*sc.vm, sc.image, s.config.BridgeIP, s.config.DefaultDNS)
machineConfig := firecracker.MachineConfig{
launchReq := roothelper.FirecrackerLaunchRequest{
BinaryPath: sc.fcPath,
VMID: sc.vm.ID,
SocketPath: sc.apiSock,
@@ -343,24 +338,15 @@ func (s *VMService) buildStartSteps(op *operationLog, sc *startContext) []startS
VSockCID: sc.vm.Runtime.VSockCID,
VCPUCount: sc.vm.Spec.VCPUCount,
MemoryMiB: sc.vm.Spec.MemoryMiB,
Logger: s.logger,
}
machineConfig := firecracker.MachineConfig{Drives: launchReq.Drives}
s.capHooks.contributeMachine(&machineConfig, *sc.vm, sc.image)
machine, err := firecracker.NewMachine(ctx, machineConfig)
launchReq.Drives = machineConfig.Drives
pid, err := s.privOps().LaunchFirecracker(ctx, launchReq)
if err != nil {
return err
}
sc.machine = machine
if err := machine.Start(ctx); err != nil {
// machine.Start can fail AFTER the firecracker process
// is already spawned (HTTP config phase). Record the
// PID so the undo can kill it; use a fresh ctx since
// the request ctx may be cancelled by now.
sc.live.PID = s.net.resolveFirecrackerPID(context.Background(), machine, sc.apiSock)
s.setVMHandles(sc.vm, *sc.live)
return err
}
sc.live.PID = s.net.resolveFirecrackerPID(context.Background(), machine, sc.apiSock)
sc.live.PID = pid
s.setVMHandles(sc.vm, *sc.live)
op.debugStage("firecracker_started", "pid", sc.live.PID)
return nil


@@ -58,9 +58,10 @@ type VMService struct {
// Peer services. VMService orchestrates across all three during
// start/stop/delete; pointer fields keep call sites direct without
// promoting the peer API to package-level interfaces.
net *HostNetwork
img *ImageService
ws *WorkspaceService
net *HostNetwork
img *ImageService
ws *WorkspaceService
priv privilegedOps
// vsockHostDevice is the path preflight + doctor expect to find for
// the vhost-vsock device. Defaults to defaultVsockHostDevice; tests
@@ -101,6 +102,7 @@ type vmServiceDeps struct {
net *HostNetwork
img *ImageService
ws *WorkspaceService
priv privilegedOps
capHooks capabilityHooks
beginOperation func(name string, attrs ...any) *operationLog
vsockHostDevice string
@@ -120,6 +122,7 @@ func newVMService(deps vmServiceDeps) *VMService {
net: deps.net,
img: deps.img,
ws: deps.ws,
priv: deps.priv,
capHooks: deps.capHooks,
beginOperation: deps.beginOperation,
vsockHostDevice: vsockPath,


@@ -427,8 +427,8 @@ func TestHealthVMReturnsHealthyForRunningGuest(t *testing.T) {
runner := &scriptedRunner{
t: t,
steps: []runnerStep{
sudoStep("", nil, "chown", fmt.Sprintf("%d:%d", os.Getuid(), os.Getgid()), vsockSock),
sudoStep("", nil, "chmod", "600", vsockSock),
sudoStep("", nil, "chown", fmt.Sprintf("%d:%d", os.Getuid(), os.Getgid()), vsockSock),
},
}
d := &Daemon{store: db, runner: runner}
@@ -491,8 +491,8 @@ func TestPingVMAliasReturnsAliveForHealthyVM(t *testing.T) {
runner := &scriptedRunner{
t: t,
steps: []runnerStep{
sudoStep("", nil, "chown", fmt.Sprintf("%d:%d", os.Getuid(), os.Getgid()), vsockSock),
sudoStep("", nil, "chmod", "600", vsockSock),
sudoStep("", nil, "chown", fmt.Sprintf("%d:%d", os.Getuid(), os.Getgid()), vsockSock),
},
}
d := &Daemon{store: db, runner: runner}
@@ -691,8 +691,8 @@ func TestPortsVMReturnsEnrichedPortsAndWebSchemes(t *testing.T) {
runner := &scriptedRunner{
t: t,
steps: []runnerStep{
sudoStep("", nil, "chown", fmt.Sprintf("%d:%d", os.Getuid(), os.Getgid()), vsockSock),
sudoStep("", nil, "chmod", "600", vsockSock),
sudoStep("", nil, "chown", fmt.Sprintf("%d:%d", os.Getuid(), os.Getgid()), vsockSock),
},
}
d := &Daemon{store: db, runner: runner}
@@ -1148,13 +1148,92 @@ func TestRunFileSyncCopiesDirectoryRecursively(t *testing.T) {
}
}
func TestRunFileSyncAllowsTopLevelSymlinkWithinHome(t *testing.T) {
homeDir := t.TempDir()
t.Setenv("HOME", homeDir)
targetDir := filepath.Join(homeDir, ".config", "gh")
if err := os.MkdirAll(targetDir, 0o755); err != nil {
t.Fatal(err)
}
targetPath := filepath.Join(targetDir, "hosts.yml")
if err := os.WriteFile(targetPath, []byte("github.com"), 0o600); err != nil {
t.Fatal(err)
}
linkPath := filepath.Join(homeDir, "gh-hosts.yml")
if err := os.Symlink(targetPath, linkPath); err != nil {
t.Skipf("symlink unsupported on this filesystem: %v", err)
}
workDisk := t.TempDir()
d := &Daemon{
runner: &filesystemRunner{t: t},
config: model.DaemonConfig{
HostHomeDir: homeDir,
FileSync: []model.FileSyncEntry{
{Host: "~/gh-hosts.yml", Guest: "~/.config/gh/hosts.yml"},
},
},
}
wireServices(d)
vm := testVM("sync-top-level-symlink-ok", "image", "172.16.0.77")
vm.Runtime.WorkDiskPath = workDisk
if err := d.ws.runFileSync(context.Background(), &vm); err != nil {
t.Fatalf("runFileSync: %v", err)
}
got, err := os.ReadFile(filepath.Join(workDisk, ".config", "gh", "hosts.yml"))
if err != nil {
t.Fatal(err)
}
if string(got) != "github.com" {
t.Fatalf("guest file = %q, want github.com", got)
}
}
func TestRunFileSyncRejectsTopLevelSymlinkOutsideHome(t *testing.T) {
homeDir := t.TempDir()
t.Setenv("HOME", homeDir)
outsideDir := t.TempDir()
targetPath := filepath.Join(outsideDir, "secret.txt")
if err := os.WriteFile(targetPath, []byte("must-stay-outside"), 0o600); err != nil {
t.Fatal(err)
}
linkPath := filepath.Join(homeDir, "secret-link")
if err := os.Symlink(targetPath, linkPath); err != nil {
t.Skipf("symlink unsupported on this filesystem: %v", err)
}
workDisk := t.TempDir()
d := &Daemon{
runner: &filesystemRunner{t: t},
config: model.DaemonConfig{
HostHomeDir: homeDir,
FileSync: []model.FileSyncEntry{
{Host: "~/secret-link", Guest: "~/secret.txt"},
},
},
}
wireServices(d)
vm := testVM("sync-top-level-symlink-reject", "image", "172.16.0.78")
vm.Runtime.WorkDiskPath = workDisk
err := d.ws.runFileSync(context.Background(), &vm)
if err == nil || !strings.Contains(err.Error(), "owner home") {
t.Fatalf("runFileSync error = %v, want owner-home rejection", err)
}
if _, statErr := os.Stat(filepath.Join(workDisk, "secret.txt")); !os.IsNotExist(statErr) {
t.Fatalf("guest file exists after rejected sync (stat err = %v)", statErr)
}
}
// TestRunFileSyncSkipsNestedSymlinks pins the anti-sprawl contract:
// a symlink INSIDE a synced directory is not followed, even if the
// target holds real files. Without this, a user syncing ~/.aws with
// a ~/.aws/session -> ~/other-creds symlink would copy the unrelated
// creds into the guest. Top-level entries (the path the user
// literally named) still follow, because they explicitly asked for
// that path.
// creds into the guest. Top-level entries are resolved separately:
// they may still follow, but only when the real target stays under
// the configured owner home.
func TestRunFileSyncSkipsNestedSymlinks(t *testing.T) {
homeDir := t.TempDir()
t.Setenv("HOME", homeDir)
@@ -1543,8 +1622,8 @@ func TestStopVMFallsBackToForcedCleanupAfterGracefulTimeout(t *testing.T) {
scriptedRunner: &scriptedRunner{
t: t,
steps: []runnerStep{
sudoStep("", nil, "chown", fmt.Sprintf("%d:%d", os.Getuid(), os.Getgid()), apiSock),
sudoStep("", nil, "chmod", "600", apiSock),
sudoStep("", nil, "chown", fmt.Sprintf("%d:%d", os.Getuid(), os.Getgid()), apiSock),
{call: runnerCall{name: "pgrep", args: []string{"-n", "-f", apiSock}}, out: []byte(strconv.Itoa(fake.Process.Pid) + "\n")},
sudoStep("", nil, "kill", "-KILL", strconv.Itoa(fake.Process.Pid)),
},