From 853249dec28c5fe36ccf6efd392b798ae9ffdbd3 Mon Sep 17 00:00:00 2001
From: Thales Maciel
Date: Tue, 28 Apr 2026 14:39:41 -0300
Subject: [PATCH] roothelper: tighten input validation across privileged RPCs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Defence-in-depth pass over every helper method that touches the host
as root. Each fix narrows what a compromised owner-uid daemon could
ask the helper to do; many close concrete file-ownership and DoS
primitives that the previous validators didn't reach.

Path / identifier validation:

* priv.fsck_snapshot now requires /dev/mapper/fc-rootfs-* (was "is
  the string non-empty"). e2fsck -fy on /dev/sda1 was the motivating
  exploit.
* priv.kill_process and priv.signal_process now read /proc//cmdline
  and require a "firecracker" substring before sending the signal.
  Killing arbitrary host PIDs (sshd, init, …) is no longer a one-RPC
  primitive.
* priv.read_ext4_file and priv.write_ext4_files now require the image
  path to live under StateDir or be /dev/mapper/fc-rootfs-*.
* priv.cleanup_dm_snapshot validates every non-empty Handles field:
  DM name fc-rootfs-*, DM device /dev/mapper/fc-rootfs-*, loops
  /dev/loopN.
* priv.remove_dm_snapshot accepts only fc-rootfs-* names or
  /dev/mapper/fc-rootfs-* paths.
* priv.ensure_nat now requires a parsable IPv4 address and a
  banger-prefixed tap.
* priv.sync_resolver_routing and priv.clear_resolver_routing now
  require a Linux iface-name-shaped bridge name (1–15 chars, no
  whitespace/'/'/':') and, for sync, a parsable resolver address.

Symlink defence:

* priv.ensure_socket_access now validates the socket path is under
  RuntimeDir and not a symlink. The fcproc layer's chown/chmod moves
  to unix.Open(O_PATH|O_NOFOLLOW) + Fchownat(AT_EMPTY_PATH) + Fchmodat
  via /proc/self/fd, so even a swap of the leaf into a symlink between
  validation and the syscall is refused. The local-priv (non-root)
  fallback uses `chown -h`.
* priv.cleanup_jailer_chroot rejects symlinks at both the leaf
  (os.Lstat) and intermediate path components (filepath.EvalSymlinks
  + clean-equality). The umount sweep was rewritten from shell
  `umount --recursive --lazy` to direct unix.Unmount(MNT_DETACH |
  UMOUNT_NOFOLLOW) per child mount, deepest-first; the findmnt guard
  remains as the rm-rf safety net. Local-priv mode falls back to
  `sudo umount --lazy`.

Binary validation:

* validateRootExecutable now opens with O_PATH|O_NOFOLLOW and Fstats
  through the resulting fd. Rejects path-level symlinks and narrows
  the TOCTOU window between validation and the SDK's exec to
  fork+exec time on a healthy host.

Daemon socket:

* The owner daemon now reads SO_PEERCRED on every accepted connection
  and refuses any UID that isn't 0 or the registered owner. Filesystem
  perms (0600 + ownerUID) already enforced this; the check is
  belt-and-braces in case the socket FD is ever leaked to a non-owner
  process.

Docs:

* docs/privileges.md walked end-to-end. Each helper RPC's Validation
  gate row reflects what the code actually enforces. New section
  "Running outside the system install" calls out the looser dev-mode
  trust model (NOPASSWD sudoers, helper hardening bypassed) so users
  don't deploy that path on shared hosts. Trust list updated to
  include every new validator.

Tests added: validators (DM-loop, DM-remove-target, DM-handles,
ext4-image-path, iface-name, IPv4, resolver-addr, not-symlink,
firecracker-PID, root-executable variants), the daemon's authorize
path (non-unix conn rejection + unix conn happy path), the umount2
ordering contract (deepest-first + --lazy on the sudo branch), and
positive/negative cases for the chown-no-follow fallback.

Verified end-to-end via `make smoke JOBS=4` on a KVM host.
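For reviewers, a minimal sketch of the path/identifier shape checks listed above. The function names (checkDMDevicePath, checkLoopDevicePath, checkIfaceName, checkIPv4) are hypothetical stand-ins, not the validate* functions in internal/roothelper/roothelper.go, whose bodies are not reproduced here; only the prefixes and limits are taken from this message.

```go
package main

import (
	"fmt"
	"net/netip"
	"path/filepath"
	"strings"
	"unicode"
)

// Hypothetical illustration only: prefixes and limits come from the
// commit message, the bodies are NOT the code in roothelper.go.

const dmDevicePrefix = "/dev/mapper/fc-rootfs-"

// checkDMDevicePath accepts /dev/mapper/fc-rootfs-<name> where <name>
// is non-empty and contains no further path separators, so inputs like
// "/dev/sda1" or "/dev/mapper/fc-rootfs-x/../../sda1" are refused.
func checkDMDevicePath(p string) error {
	if !strings.HasPrefix(p, dmDevicePrefix) || filepath.Clean(p) != p {
		return fmt.Errorf("%q is not a %s* device", p, dmDevicePrefix)
	}
	name := strings.TrimPrefix(p, dmDevicePrefix)
	if name == "" || strings.Contains(name, "/") {
		return fmt.Errorf("%q is not a %s* device", p, dmDevicePrefix)
	}
	return nil
}

// checkLoopDevicePath accepts /dev/loopN for a non-empty decimal N.
func checkLoopDevicePath(p string) error {
	if !strings.HasPrefix(p, "/dev/loop") {
		return fmt.Errorf("%q is not a /dev/loopN device", p)
	}
	n := strings.TrimPrefix(p, "/dev/loop")
	if n == "" {
		return fmt.Errorf("%q is not a /dev/loopN device", p)
	}
	for _, r := range n {
		if r < '0' || r > '9' {
			return fmt.Errorf("%q is not a /dev/loopN device", p)
		}
	}
	return nil
}

// checkIfaceName applies the interface-name rules quoted above:
// 1-15 bytes, not "." or "..", and no '/', ':' or whitespace.
func checkIfaceName(name string) error {
	if len(name) == 0 || len(name) > 15 {
		return fmt.Errorf("interface name %q must be 1-15 bytes", name)
	}
	if name == "." || name == ".." {
		return fmt.Errorf("interface name %q is reserved", name)
	}
	bad := strings.IndexFunc(name, func(r rune) bool {
		return r == '/' || r == ':' || unicode.IsSpace(r)
	})
	if bad >= 0 {
		return fmt.Errorf("interface name %q contains '/', ':' or whitespace", name)
	}
	return nil
}

// checkIPv4 accepts only IPv4 literals. netip.Addr.Is4 is false for
// IPv6 inputs, including IPv4-mapped forms such as "::ffff:10.0.0.2".
func checkIPv4(s string) error {
	addr, err := netip.ParseAddr(s)
	if err != nil || !addr.Is4() {
		return fmt.Errorf("%q is not an IPv4 address", s)
	}
	return nil
}
```

Each check returns nil or an error naming the rejected input, so a handler can refuse the RPC before it ever reaches dmsetup, losetup, resolvectl or iptables.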
Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/privileges.md | 98 +++++-- internal/daemon/daemon.go | 47 ++++ internal/daemon/daemon_test.go | 59 ++++ internal/daemon/fcproc/fcproc.go | 137 ++++++++-- internal/daemon/fcproc/fcproc_test.go | 229 ++++++++++++++++ internal/daemon/vm_test.go | 14 +- internal/roothelper/roothelper.go | 297 +++++++++++++++++++- internal/roothelper/roothelper_test.go | 359 +++++++++++++++++++++++++ 8 files changed, 1177 insertions(+), 63 deletions(-) diff --git a/docs/privileges.md b/docs/privileges.md index 89b31f1..b10e7ca 100644 --- a/docs/privileges.md +++ b/docs/privileges.md @@ -11,8 +11,8 @@ their eyes open. | Unit | User | Socket | Purpose | |---|---|---|---| -| `bangerd.service` | owner user (chosen at install) | `/run/banger/bangerd.sock` (0700, owner) | Orchestration: VM/image lifecycle, store, RPC to the CLI. | -| `bangerd-root.service` | `root` | `/run/banger-root/bangerd-root.sock` (0600, root) | Narrow root helper: bridge/tap, DM snapshots, NAT, Firecracker launch. | +| `bangerd.service` | owner user (chosen at install) | `/run/banger/bangerd.sock` (0600, owner) | Orchestration: VM/image lifecycle, store, RPC to the CLI. | +| `bangerd-root.service` | `root` | `/run/banger-root/bangerd-root.sock` (0600, owner; root-owned dir at 0711) | Narrow root helper: bridge/tap, DM snapshots, NAT, Firecracker launch. | The owner daemon does all the business logic. It never runs as root. The root helper runs as root but only accepts a fixed list of operations @@ -37,7 +37,8 @@ specific shape. The root helper: - Listens on a Unix socket at `/run/banger-root/bangerd-root.sock`, - mode 0600, owned by root, in a runtime dir at 0711 root. + mode 0600, owned by the registered owner UID, in a root-owned + runtime dir at 0711. - Reads `SO_PEERCRED` on every accepted connection and rejects any caller whose UID is not 0 or the owner UID recorded in `/etc/banger/install.toml`. The match is by UID, not username. 
@@ -46,8 +47,13 @@ The root helper: The owner daemon: -- Listens on `/run/banger/bangerd.sock`, mode 0700, owned by the +- Listens on `/run/banger/bangerd.sock`, mode 0600, owned by the install-time owner user. Other host users cannot connect. +- Reads `SO_PEERCRED` on every accepted connection and rejects any + caller whose UID is not 0 or the install-time owner UID. The + filesystem perms already gate access; the peer-cred read is + belt-and-braces in case the socket FD is ever leaked to a + non-owner process. - Resolves the helper socket path from the install metadata and retries with backoff if the helper hasn't started yet. @@ -56,29 +62,34 @@ socket on the local host. ## What the root helper will do, exactly -The helper exposes 17 RPC methods. Each is shaped so the owner daemon -can name a banger-managed object but cannot pass an arbitrary host -path or interface name. Code lives in -`internal/roothelper/roothelper.go`. +The helper exposes a fixed list of RPC methods (see +`internal/roothelper/roothelper.go` for the canonical set). Each is +shaped so the owner daemon can name a banger-managed object but +cannot pass an arbitrary host path or interface name. Every input +that names a path, device, PID, or interface is checked against a +validator before the helper touches the host. | Method | Effect | Validation gate | |---|---|---| | `priv.ensure_bridge` | Create the configured Linux bridge if missing; assign the bridge IP. | Bridge name and IP come from owner config; helper does not allow caller to pick `lo` etc. | | `priv.create_tap` | `ip link add tap NAME tuntap` and add to bridge, owned by the owner user. | Tap name must match `tap-fc-*` or `tap-pool-*`. | | `priv.delete_tap` | `ip link del NAME`. | Same prefix check. | -| `priv.sync_resolver_routing` | `resolvectl dns/domain/default-route` on the configured bridge. | No-op if `resolvectl` is missing. Bridge name comes from owner config. 
| -| `priv.clear_resolver_routing` | `resolvectl revert` on the bridge. | Same. | -| `priv.ensure_nat` | `iptables -t nat MASQUERADE` for `(guest_ip, tap)` plus matching FORWARD rules; `enable=false` removes them. | Tap and IP come from VM record; helper does not run arbitrary iptables. | +| `priv.sync_resolver_routing` | `resolvectl dns/domain/default-route` on the configured bridge. | Bridge name passes the kernel iface-name rules (1–15 chars, no `/`/`:`/whitespace, not `.`/`..`). Resolver address must parse via `net.ParseIP`. | +| `priv.clear_resolver_routing` | `resolvectl revert` on the bridge. | Same iface-name check. | +| `priv.ensure_nat` | `iptables -t nat MASQUERADE` for `(guest_ip, tap)` plus matching FORWARD rules; `enable=false` removes them. | Tap must be banger-prefixed. Guest IP must parse as IPv4. | | `priv.create_dm_snapshot` | Create a `dmsetup` device-mapper snapshot from `rootfs.ext4` with COW backing file. | Both paths must be inside `/var/lib/banger`; DM name must start with `fc-rootfs-`. | -| `priv.cleanup_dm_snapshot` | `dmsetup remove` for a snapshot the helper itself just created. | Acts on the typed `dmsnap.Handles` returned by create. | -| `priv.remove_dm_snapshot` | `dmsetup remove` by target name. | Name must start with `fc-rootfs-`. | -| `priv.fsck_snapshot` | `e2fsck -fy` against the DM device. | Tolerates exit 1 (filesystem cleaned). | -| `priv.read_ext4_file` | Read a file from inside an ext4 image via `debugfs cat`. | Path is inside the image; image path is not validated against the state dir today (the helper trusts the daemon for image paths because images can sit anywhere the owner registers). | -| `priv.write_ext4_files` | Batch write files into an ext4 image, root:root, mode-controlled. | Same. | -| `priv.resolve_firecracker_binary` | Stat and return the firecracker binary path. | Resolved path must be a regular file, executable, root-owned, not group/world-writable. 
| -| `priv.launch_firecracker` | Start the firecracker process for a VM. | Socket and vsock paths must be inside `/run/banger`. Log/metrics/kernel paths must be inside `/var/lib/banger`. Tap name must be banger-prefixed. Drives must be inside the state dir or be a `/dev/mapper/fc-rootfs-*` device. Binary must pass the same root-owned-executable check. | -| `priv.ensure_socket_access` | `chown` and `chmod 0660` on a firecracker API or vsock socket so the owner user can talk to it. | Helper does not chown arbitrary paths; this is invoked only after the helper itself just created the socket via firecracker. | -| `priv.find_firecracker_pid` / `priv.kill_process` / `priv.signal_process` / `priv.process_running` | Look up a firecracker PID by API socket path; signal or stat the resulting process. | Fixed-shape requests; path validation happens at launch time, and PID lookups are filtered to processes whose cmdline mentions the requested API socket. | +| `priv.cleanup_dm_snapshot` | `dmsetup remove` and `losetup -d` for a snapshot the helper itself just created. | Every non-empty `dmsnap.Handles` field is checked: DM name `fc-rootfs-*`, DM device `/dev/mapper/fc-rootfs-*`, loops `/dev/loopN`. | +| `priv.remove_dm_snapshot` | `dmsetup remove` by target. | Target must be either a `fc-rootfs-*` name or a `/dev/mapper/fc-rootfs-*` path. | +| `priv.fsck_snapshot` | `e2fsck -fy` against the DM device. | DM device path must match `/dev/mapper/fc-rootfs-*`. Exit 1 (filesystem cleaned) is tolerated. | +| `priv.read_ext4_file` | Read a file from inside an ext4 image via `debugfs cat`. | Image path must be inside `/var/lib/banger` or a managed DM device. Guest path is rejected if it contains debugfs-hostile chars (`"`/`\`/newline). | +| `priv.write_ext4_files` | Batch write files into an ext4 image, root:root, mode-controlled. | Same image-path validator. | +| `priv.resolve_firecracker_binary` | Stat and return the firecracker binary path. 
| Path is opened with `O_PATH \| O_NOFOLLOW` (refusing symlinks) and Fstat'd through the resulting fd: must be a regular file, executable, root-owned, not group/world-writable. | +| `priv.launch_firecracker` | Start the firecracker process for a VM (jailer-wrapped). | Socket and vsock paths must be inside `/run/banger`. Log/metrics/kernel/initrd paths must be inside `/var/lib/banger`. Tap name must be banger-prefixed. Drives must be inside the state dir or be a `/dev/mapper/fc-rootfs-*` device. Jailer chroot base must be inside the system state/runtime dirs; jailer UID/GID must equal the registered owner. Binary must pass the same root-owned-executable check. | +| `priv.ensure_socket_access` | `chown` and `chmod 0600` on a firecracker API or vsock socket so the owner user can talk to it. | Path must be inside `/run/banger` and not a symlink. The helper opens it with `O_PATH \| O_NOFOLLOW`, refuses anything that isn't a unix socket, and chmods/chowns through the resulting fd (no symlink-follow). The local-priv fallback uses `chown -h`. | +| `priv.cleanup_jailer_chroot` | Detach every mount under the per-VM jailer chroot via direct `umount2(MNT_DETACH \| UMOUNT_NOFOLLOW)` syscalls (deepest-first), then `rm -rf` the tree. | Path must be inside the system state/runtime dirs and not a symlink — including no symlinks at intermediate components (resolved with `EvalSymlinks` and re-checked). `UMOUNT_NOFOLLOW` makes the unmounts symlink-safe even if a path is swapped after validation. A `findmnt` guard refuses to `rm -rf` if any mount remains underneath. | +| `priv.find_firecracker_pid` | Resolve a firecracker PID by API socket path. | Filters to processes whose cmdline mentions the requested API socket. | +| `priv.kill_process` / `priv.signal_process` | Send SIGKILL or a named signal to a PID. | PID must refer to a running process whose `/proc//cmdline` mentions `firecracker`. | +| `priv.process_running` | Check whether a PID is alive (no host mutation).
| Read-only; same cmdline filter. | Anything outside this list returns `unknown_method` and is logged. The helper does not run a shell, does not exec helper scripts, and does @@ -186,6 +197,38 @@ What `uninstall` does NOT do automatically: - It does not remove the owner user, the owner's home, or anything the user wrote into a guest from inside the guest. +## Running outside the system install + +Everything above describes the supported deployment: `banger system +install` lays down both systemd units and the helper takes over every +privileged operation. + +It is also possible to run `bangerd` directly without installing the +helper — the binary still works as a per-user daemon and shells `sudo +-n` for each privileged operation it would otherwise hand off +(`iptables`, `ip`, `mount`, `mknod`, `dmsetup`, `e2fsck`, `kill`, +`chown` (including `chown -h`), `chmod`, `losetup`, `firecracker`). +This mode is intended for ad-hoc developer machines while iterating on +banger itself. + +It carries a different trust model: + +- It needs `NOPASSWD` sudoers entries for the developer (otherwise + every VM action prompts for a password). +- Once those entries exist, **any** process running as the developer + can invoke those commands with arbitrary arguments — banger's input + validators only constrain what banger itself sends. They are no + defence against a different program on the same account. +- The helper's `SO_PEERCRED` boundary, the systemd hardening + (`NoNewPrivileges`, `ProtectSystem=strict`, the narrow + `CapabilityBoundingSet`), and the helper's own input validators are + all bypassed. + +If you care about isolating banger's blast radius from anything else +running as your user, use the system install. If you only need +banger to work on your own dev box, the non-system mode is fine — +just don't run it on a shared or production host.
+ ## Hardening of the systemd units The two units ship with restrictive defaults; they are written by @@ -222,11 +265,16 @@ If you install banger as root, you are trusting: 1. The two binaries banger drops under `/usr/local/bin` and the companion agent under `/usr/local/lib/banger`. These should match the build artifacts you reviewed. -2. The path validators in - `internal/roothelper/roothelper.go:validateManagedPath`, - `validateTapName`, `validateDMName`, and `validateRootExecutable` - to be tight. If those are bypassed, the helper would carry out a - privileged op against an unmanaged path. They are unit-tested in +2. The path/identifier validators in + `internal/roothelper/roothelper.go` to be tight: `validateManagedPath`, + `validateTapName`, `validateDMName`, `validateDMDevicePath`, + `validateLoopDevicePath`, `validateDMRemoveTarget`, + `validateDMSnapshotHandles`, `validateRootExecutable`, + `validateNotSymlink`, `validateExt4ImagePath`, + `validateLinuxIfaceName`, `validateIPv4`, `validateResolverAddr`, + and `validateFirecrackerPID`. If any of these are bypassed, the + helper would carry out a privileged op against an unmanaged + target. They are unit-tested in `internal/roothelper/roothelper_test.go`. 3. The Firecracker binary banger executes. 
The helper refuses to launch anything that isn't a regular, executable, root-owned, not diff --git a/internal/daemon/daemon.go b/internal/daemon/daemon.go index ca6b7c8..9f727b6 100644 --- a/internal/daemon/daemon.go +++ b/internal/daemon/daemon.go @@ -14,6 +14,8 @@ import ( "sync" "time" + "golang.org/x/sys/unix" + "banger/internal/config" ws "banger/internal/daemon/workspace" "banger/internal/installmeta" @@ -259,6 +261,13 @@ func (d *Daemon) Serve(ctx context.Context) error { func (d *Daemon) handleConn(conn net.Conn) { defer conn.Close() + if err := d.authorizeConn(conn); err != nil { + if d.logger != nil { + d.logger.Warn("daemon connection rejected", "remote", conn.RemoteAddr().String(), "error", err.Error()) + } + _ = json.NewEncoder(conn).Encode(rpc.NewError("unauthorized", err.Error())) + return + } reader := bufio.NewReader(conn) var req rpc.Request if err := json.NewDecoder(reader).Decode(&req); err != nil { @@ -281,6 +290,44 @@ func (d *Daemon) handleConn(conn net.Conn) { } } +// authorizeConn enforces SO_PEERCRED on the daemon socket as a +// belt-and-braces check on top of filesystem perms (0600 + chowned to +// the owner). Filesystem perms already prevent other host users from +// connecting; the peer-cred read closes the door on any path that +// might leak the socket FD to a non-owner process. Mirrors the +// equivalent check in roothelper.authorizeConn. 
+func (d *Daemon) authorizeConn(conn net.Conn) error { + unixConn, ok := conn.(*net.UnixConn) + if !ok { + return errors.New("daemon requires unix connections") + } + rawConn, err := unixConn.SyscallConn() + if err != nil { + return err + } + var cred *unix.Ucred + var controlErr error + if err := rawConn.Control(func(fd uintptr) { + cred, controlErr = unix.GetsockoptUcred(int(fd), unix.SOL_SOCKET, unix.SO_PEERCRED) + }); err != nil { + return err + } + if controlErr != nil { + return controlErr + } + if cred == nil { + return errors.New("missing peer credentials") + } + expected := d.clientUID + if expected < 0 { + expected = os.Getuid() + } + if int(cred.Uid) == 0 || int(cred.Uid) == expected { + return nil + } + return fmt.Errorf("uid %d is not allowed to use the daemon", cred.Uid) +} + func (d *Daemon) watchRequestDisconnect(conn net.Conn, reader *bufio.Reader, method string, cancel context.CancelFunc) func() { if conn == nil || reader == nil { return func() {} diff --git a/internal/daemon/daemon_test.go b/internal/daemon/daemon_test.go index 6cd4545..7b19cb6 100644 --- a/internal/daemon/daemon_test.go +++ b/internal/daemon/daemon_test.go @@ -22,6 +22,65 @@ import ( "banger/internal/system" ) +// TestAuthorizeConnRejectsNonUnixConn pins the type guard at the top +// of authorizeConn: SO_PEERCRED only makes sense on a unix socket, so +// anything else must be refused outright. net.Pipe gives us a +// connection that satisfies net.Conn but isn't a *net.UnixConn, which +// is exactly the shape we need to exercise the early-return. 
+func TestAuthorizeConnRejectsNonUnixConn(t *testing.T) { + d := &Daemon{} + pipeA, pipeB := net.Pipe() + defer pipeA.Close() + defer pipeB.Close() + if err := d.authorizeConn(pipeA); err == nil { + t.Fatal("authorizeConn(pipe) succeeded, want error") + } +} + +// TestAuthorizeConnAcceptsOwnerUIDOverUnixSocket pins the happy path: +// when the test process connects to a freshly bound unix socket as +// itself, the daemon's peer-cred check matches d.clientUID and lets +// the connection through. +func TestAuthorizeConnAcceptsOwnerUIDOverUnixSocket(t *testing.T) { + dir := t.TempDir() + sockPath := filepath.Join(dir, "test.sock") + listener, err := net.Listen("unix", sockPath) + if err != nil { + t.Fatalf("listen: %v", err) + } + defer listener.Close() + + type result struct { + err error + } + got := make(chan result, 1) + go func() { + conn, err := listener.Accept() + if err != nil { + got <- result{err: err} + return + } + defer conn.Close() + d := &Daemon{clientUID: os.Getuid()} + got <- result{err: d.authorizeConn(conn)} + }() + + client, err := net.Dial("unix", sockPath) + if err != nil { + t.Fatalf("dial: %v", err) + } + defer client.Close() + + select { + case r := <-got: + if r.err != nil { + t.Fatalf("authorizeConn(unix self) = %v, want nil", r.err) + } + case <-time.After(2 * time.Second): + t.Fatal("authorizeConn never returned") + } +} + func TestRegisterImageRequiresKernel(t *testing.T) { rootfs := filepath.Join(t.TempDir(), "rootfs.ext4") if err := os.WriteFile(rootfs, []byte("rootfs"), 0o644); err != nil { diff --git a/internal/daemon/fcproc/fcproc.go b/internal/daemon/fcproc/fcproc.go index 7bd7990..1d3eaac 100644 --- a/internal/daemon/fcproc/fcproc.go +++ b/internal/daemon/fcproc/fcproc.go @@ -12,6 +12,7 @@ import ( "log/slog" "os" "path/filepath" + "sort" "strconv" "strings" "sync" @@ -202,18 +203,57 @@ func (m *Manager) ensureSocketAccessFor(ctx context.Context, socketPath, label s if err := pollPath(ctx, socketPath, timeout, interval, label); err 
!= nil { return err } - if os.Geteuid() == 0 { - if _, err := m.runner.Run(ctx, "chmod", "600", socketPath); err != nil { + return chownChmodNoFollow(ctx, m.runner, socketPath, uid, gid, 0o600) +} + +// chownChmodNoFollow sets owner/group/mode on path without following +// symlinks at the leaf. Required because the helper RPCs that drive +// socket access run as root: a follow-symlink chmod/chown becomes an +// arbitrary file-ownership primitive if the caller can plant a symlink +// at the target. +// +// Linux idiom: open with O_PATH|O_NOFOLLOW (errors out if the leaf is a +// symlink), Fstat the fd to confirm the file is a unix socket, then +// chown via Fchownat(AT_EMPTY_PATH) and chmod via /proc/self/fd/N +// (fchmod on an O_PATH fd returns EBADF, but the /proc path resolves +// straight back to the inode the fd already pins, so no leaf re-traversal +// happens). +// +// Falls back to `sudo chown -h` + `sudo chmod` for the local-priv mode +// where the daemon isn't root and can't issue the syscalls itself; the +// `-h` flag still avoids the symlink-follow on the chown side. +func chownChmodNoFollow(ctx context.Context, runner Runner, path string, uid, gid int, mode os.FileMode) error { + if os.Geteuid() != 0 { + // Mode-then-owner ordering preserves the pre-existing failure + // semantics of the legacy `chmod 600 / chown` shell-out path + // (chmod-failure tests expect chown to be skipped). `chown -h` + // keeps the symlink-no-follow guarantee on this branch. 
+ if _, err := runner.RunSudo(ctx, "chmod", fmt.Sprintf("%o", mode.Perm()), path); err != nil { return err } - _, err := m.runner.Run(ctx, "chown", fmt.Sprintf("%d:%d", uid, gid), socketPath) + _, err := runner.RunSudo(ctx, "chown", "-h", fmt.Sprintf("%d:%d", uid, gid), path) return err } - if _, err := m.runner.RunSudo(ctx, "chmod", "600", socketPath); err != nil { - return err + fd, err := unix.Open(path, unix.O_PATH|unix.O_NOFOLLOW|unix.O_CLOEXEC, 0) + if err != nil { + return fmt.Errorf("open %s: %w", path, err) } - _, err := m.runner.RunSudo(ctx, "chown", fmt.Sprintf("%d:%d", uid, gid), socketPath) - return err + defer unix.Close(fd) + var st unix.Stat_t + if err := unix.Fstat(fd, &st); err != nil { + return fmt.Errorf("fstat %s: %w", path, err) + } + if st.Mode&unix.S_IFMT != unix.S_IFSOCK { + return fmt.Errorf("%s is not a unix socket (mode %#o)", path, st.Mode&unix.S_IFMT) + } + procPath := "/proc/self/fd/" + strconv.Itoa(fd) + if err := unix.Fchmodat(unix.AT_FDCWD, procPath, uint32(mode.Perm()), 0); err != nil { + return fmt.Errorf("chmod %s: %w", path, err) + } + if err := unix.Fchownat(fd, "", uid, gid, unix.AT_EMPTY_PATH); err != nil { + return fmt.Errorf("chown %s: %w", path, err) + } + return nil } // FindPID returns the PID of the firecracker process listening on apiSock, @@ -447,23 +487,84 @@ func (m *Manager) CleanupJailerChroot(ctx context.Context, chrootRoot string) er if strings.TrimSpace(chrootRoot) == "" { return nil } - if _, err := os.Stat(chrootRoot); os.IsNotExist(err) { - return nil + // Lstat (not Stat): if chrootRoot is a symlink the umount/rm shell-outs + // below would chase it. The handler-side validateNotSymlink also catches + // this, but lifting the check inside fcproc closes the TOCTOU window + // between the handler check and our umount command. 
+ info, err := os.Lstat(chrootRoot) + if err != nil { + if os.IsNotExist(err) { + return nil + } + return fmt.Errorf("inspect chroot %s: %w", chrootRoot, err) } - // Best-effort umount: for chroots that were never bind-mounted (a - // stale install pre-bind-mount work, say) this fails — that's fine, - // the findmnt guard below is what enforces safety. - _ = m.sudoIgnore(ctx, "umount", "--recursive", "--lazy", chrootRoot) - if mounts, err := m.mountsUnder(ctx, chrootRoot); err != nil { + if info.Mode()&os.ModeSymlink != 0 { + return fmt.Errorf("refusing to clean up %q: path is a symlink", chrootRoot) + } + if !info.IsDir() { + return fmt.Errorf("refusing to clean up %q: not a directory", chrootRoot) + } + // Resolve any intermediate symlinks and require the result equals the + // input — that catches a planted `…/jail/firecracker/ → /` even + // though the leaf "/root" component is itself a real directory inside + // the redirected target. Equality + Lstat together cover both top and + // intermediate symlink shapes. + resolved, err := filepath.EvalSymlinks(chrootRoot) + if err != nil { + return fmt.Errorf("resolve chroot %s: %w", chrootRoot, err) + } + if filepath.Clean(resolved) != filepath.Clean(chrootRoot) { + return fmt.Errorf("refusing to clean up %q: resolves to %q via symlink", chrootRoot, resolved) + } + // Switch from `umount --recursive --lazy ` (shell-resolved, + // follows symlinks at exec time) to direct umount2() syscalls per child + // mount with UMOUNT_NOFOLLOW. That fully closes the residual TOCTOU + // between the EvalSymlinks check above and the unmount: even if a daemon- + // uid attacker swapped a child mount's path to a symlink in the gap, the + // kernel refuses to follow it. The findmnt guard below still catches any + // mount we couldn't detach. 
+ mounts, err := m.mountsUnder(ctx, chrootRoot) + if err != nil { return fmt.Errorf("inspect chroot mounts: %w", err) - } else if len(mounts) > 0 { - return fmt.Errorf("refusing to rm -rf %q: still has %d mount(s): %v", chrootRoot, len(mounts), mounts) + } + // Deepest-first so child mounts come off before parents; otherwise a + // parent unmount would EBUSY against in-use children. + sort.Slice(mounts, func(i, j int) bool { + return strings.Count(mounts[i], "/") > strings.Count(mounts[j], "/") + }) + for _, mt := range mounts { + if err := m.detachMount(ctx, mt); err != nil { + return fmt.Errorf("detach %q: %w", mt, err) + } + } + if remaining, err := m.mountsUnder(ctx, chrootRoot); err != nil { + return fmt.Errorf("re-inspect chroot mounts: %w", err) + } else if len(remaining) > 0 { + return fmt.Errorf("refusing to rm -rf %q: still has %d mount(s): %v", chrootRoot, len(remaining), remaining) } return m.sudo(ctx, "rm", "-rf", "--", chrootRoot) } -func (m *Manager) sudoIgnore(ctx context.Context, name string, args ...string) error { - err := m.sudo(ctx, name, args...) +// detachMount tears down a single mount target with MNT_DETACH (lazy) + +// UMOUNT_NOFOLLOW (refuse symlinks). Falls back to `sudo umount --lazy` +// when not running as root, since umount2() requires CAP_SYS_ADMIN. +// +// ENOENT and EINVAL on the syscall path are treated as "already gone" — +// findmnt's snapshot can race with parallel cleanups, and a missing +// mount is the desired end state. +func (m *Manager) detachMount(ctx context.Context, target string) error { + if os.Geteuid() == 0 { + err := unix.Unmount(target, unix.MNT_DETACH|unix.UMOUNT_NOFOLLOW) + if err == nil || errors.Is(err, unix.ENOENT) || errors.Is(err, unix.EINVAL) { + return nil + } + return err + } + // Local-priv fallback: shell `umount --lazy` resolves the path through + // the kernel without UMOUNT_NOFOLLOW, but the EvalSymlinks check earlier + // already constrained the chroot tree. 
The dev-mode caveat in + // docs/privileges.md covers this branch's looser guarantees. + _, err := m.runner.RunSudo(ctx, "umount", "--lazy", target) return err } diff --git a/internal/daemon/fcproc/fcproc_test.go b/internal/daemon/fcproc/fcproc_test.go index d013c7b..99ff665 100644 --- a/internal/daemon/fcproc/fcproc_test.go +++ b/internal/daemon/fcproc/fcproc_test.go @@ -6,6 +6,7 @@ import ( "log/slog" "os" "path/filepath" + "strings" "testing" "time" ) @@ -232,6 +233,234 @@ func TestEnsureSocketAccessForAsyncWaitsForSocketThenChowns(t *testing.T) { } } +// recordingRunner captures every Run/RunSudo invocation's full +// argv. Used to assert that ensureSocketAccessFor's fallback path +// passes `chown -h` rather than the symlink-following plain `chown`. +type recordingRunner struct { + sudos [][]string + runs [][]string +} + +func (r *recordingRunner) Run(_ context.Context, name string, args ...string) ([]byte, error) { + r.runs = append(r.runs, append([]string{name}, args...)) + return nil, nil +} + +func (r *recordingRunner) RunSudo(_ context.Context, args ...string) ([]byte, error) { + r.sudos = append(r.sudos, append([]string(nil), args...)) + return nil, nil +} + +// TestCleanupJailerChrootRejectsSymlink pins the TOCTOU-closing +// fcproc-side check: even if a daemon-uid attacker somehow bypasses +// the helper handler's validateNotSymlink (or races it), the cleanup +// itself refuses a symlinked path before any umount/rm shells. +func TestCleanupJailerChrootRejectsSymlink(t *testing.T) { + dir := t.TempDir() + target := filepath.Join(dir, "real") + if err := os.Mkdir(target, 0o700); err != nil { + t.Fatalf("mkdir target: %v", err) + } + link := filepath.Join(dir, "link") + if err := os.Symlink(target, link); err != nil { + t.Fatalf("symlink: %v", err) + } + + // scriptedRunner with no scripted calls — any shell invocation + // trips r.t.Fatalf, proving rejection happened before umount/rm. 
+ runner := &scriptedRunner{t: t} + mgr := New(runner, Config{}, slog.Default()) + if err := mgr.CleanupJailerChroot(context.Background(), link); err == nil { + t.Fatal("CleanupJailerChroot(symlink) succeeded, want error") + } +} + +// TestCleanupJailerChrootRejectsIntermediateSymlink covers the +// `/jail/firecracker/ → /` shape: the leaf "/root" component +// is a real directory inside the redirected target, but EvalSymlinks +// resolves to a different path so we still bail. +func TestCleanupJailerChrootRejectsIntermediateSymlink(t *testing.T) { + dir := t.TempDir() + realParent := filepath.Join(dir, "real-parent") + if err := os.MkdirAll(filepath.Join(realParent, "root"), 0o700); err != nil { + t.Fatalf("mkdir real: %v", err) + } + linkParent := filepath.Join(dir, "link-parent") + if err := os.Symlink(realParent, linkParent); err != nil { + t.Fatalf("symlink: %v", err) + } + chrootViaSymlink := filepath.Join(linkParent, "root") + + runner := &scriptedRunner{t: t} + mgr := New(runner, Config{}, slog.Default()) + if err := mgr.CleanupJailerChroot(context.Background(), chrootViaSymlink); err == nil { + t.Fatal("CleanupJailerChroot(symlinked-parent) succeeded, want error") + } +} + +// TestCleanupJailerChrootHappyPathWithoutMounts pins the no-leak case: +// when findmnt reports zero mounts under the chroot, the cleanup +// skips straight to `sudo rm -rf` without invoking umount2 / sudo +// umount at all. Regression guard for the umount2 rewrite — if the +// new logic leaks an extra runner call here, this test will fail. +func TestCleanupJailerChrootHappyPathWithoutMounts(t *testing.T) { + dir := t.TempDir() + chroot := filepath.Join(dir, "root") + if err := os.Mkdir(chroot, 0o700); err != nil { + t.Fatalf("mkdir chroot: %v", err) + } + runner := &scriptedRunner{ + t: t, + runs: []scriptedCall{ + // First mountsUnder() — pre-detach. Empty stdout = no mounts. + {matchName: "findmnt", out: nil}, + // Second mountsUnder() — post-detach guard. Same. 
+			{matchName: "findmnt", out: nil},
+		},
+		// sudo rm -rf -- chroot.
+		sudos: []scriptedCall{{}},
+	}
+	mgr := New(runner, Config{}, slog.Default())
+	if err := mgr.CleanupJailerChroot(context.Background(), chroot); err != nil {
+		t.Fatalf("CleanupJailerChroot: %v", err)
+	}
+	if len(runner.runs) != 0 {
+		t.Fatalf("findmnt scripted calls left over: %d", len(runner.runs))
+	}
+	if len(runner.sudos) != 0 {
+		t.Fatalf("sudo scripted calls left over: %d", len(runner.sudos))
+	}
+}
+
+// TestCleanupJailerChrootDetachesMountsDeepestFirst pins the ordering
+// contract for the umount2 rewrite: child mounts come off before
+// parents, otherwise the parent unmount would race against in-use
+// children. The non-root code path shells `sudo umount --lazy`, which
+// the recording runner captures so we can assert order + the --lazy
+// flag.
+func TestCleanupJailerChrootDetachesMountsDeepestFirst(t *testing.T) {
+	if os.Geteuid() == 0 {
+		t.Skip("euid 0 takes the umount2 syscall branch; this test exercises the sudo fallback")
+	}
+	dir := t.TempDir()
+	chroot := filepath.Join(dir, "root")
+	if err := os.Mkdir(chroot, 0o700); err != nil {
+		t.Fatalf("mkdir chroot: %v", err)
+	}
+	parent := chroot
+	child := filepath.Join(chroot, "lib")
+	deep := filepath.Join(child, "deep")
+	findmntOut := []byte(strings.Join([]string{parent, child, deep}, "\n"))
+	runner := &mountRecordingRunner{findmntOut: findmntOut}
+	mgr := New(runner, Config{}, slog.Default())
+	if err := mgr.CleanupJailerChroot(context.Background(), chroot); err != nil {
+		t.Fatalf("CleanupJailerChroot: %v", err)
+	}
+	// Three umount + final rm -rf. The umount targets must be deep,
+	// child, parent in that order.
+	wantTargets := []string{deep, child, parent}
+	if len(runner.umountTargets) != len(wantTargets) {
+		t.Fatalf("umount calls = %v, want %d", runner.umountTargets, len(wantTargets))
+	}
+	for i, want := range wantTargets {
+		if runner.umountTargets[i] != want {
+			t.Fatalf("umount[%d] = %q, want %q", i, runner.umountTargets[i], want)
+		}
+	}
+	if !runner.lazyFlagSeen {
+		t.Fatalf("expected umount --lazy on the sudo branch, args = %v", runner.umountArgs)
+	}
+	if !runner.rmCalled {
+		t.Fatal("rm -rf was never invoked after the umount sweep")
+	}
+}
+
+// mountRecordingRunner stubs out findmnt + sudo for the cleanup path:
+// the first findmnt call returns the canned mount list (pre-detach),
+// subsequent calls return empty to simulate the kernel having dropped
+// each mount as we asked. sudo umount/rm calls are captured and
+// answer success.
+type mountRecordingRunner struct {
+	findmntOut    []byte
+	findmntCalls  int
+	umountTargets []string
+	umountArgs    [][]string
+	lazyFlagSeen  bool
+	rmCalled      bool
+}
+
+func (r *mountRecordingRunner) Run(_ context.Context, name string, _ ...string) ([]byte, error) {
+	if name == "findmnt" {
+		r.findmntCalls++
+		if r.findmntCalls == 1 {
+			return r.findmntOut, nil
+		}
+		return nil, nil
+	}
+	return nil, nil
+}
+
+func (r *mountRecordingRunner) RunSudo(_ context.Context, args ...string) ([]byte, error) {
+	if len(args) == 0 {
+		return nil, nil
+	}
+	switch args[0] {
+	case "umount":
+		// Last arg is the target. Earlier args are flags.
+		if len(args) >= 2 {
+			r.umountTargets = append(r.umountTargets, args[len(args)-1])
+		}
+		r.umountArgs = append(r.umountArgs, append([]string(nil), args...))
+		for _, a := range args[1 : len(args)-1] {
+			if a == "--lazy" || a == "-l" {
+				r.lazyFlagSeen = true
+			}
+		}
+	case "rm":
+		r.rmCalled = true
+	}
+	return nil, nil
+}
+
+// TestEnsureSocketAccessSudoBranchUsesChownNoFollow pins the
+// symlink-defence on the local-priv (non-root) path: a follow-symlink
+// chown on a daemon-uid attacker-planted symlink is the same arbitrary
+// file-ownership primitive we close in the root branch via
+// O_PATH|O_NOFOLLOW. Test only runs as non-root (the syscall branch is
+// taken when euid == 0, which CI doesn't see).
+func TestEnsureSocketAccessSudoBranchUsesChownNoFollow(t *testing.T) {
+	if os.Geteuid() == 0 {
+		t.Skip("euid 0 takes the syscall branch; the sudo branch is only reachable as a regular user")
+	}
+	socketPath := filepath.Join(t.TempDir(), "present.sock")
+	if err := os.WriteFile(socketPath, []byte{}, 0o600); err != nil {
+		t.Fatalf("WriteFile: %v", err)
+	}
+	runner := &recordingRunner{}
+	mgr := New(runner, Config{}, slog.Default())
+
+	if err := mgr.EnsureSocketAccess(context.Background(), socketPath, "api socket"); err != nil {
+		t.Fatalf("EnsureSocketAccess: %v", err)
+	}
+	if len(runner.sudos) != 2 {
+		t.Fatalf("got %d sudo calls, want 2 (chmod, chown)", len(runner.sudos))
+	}
+	chown := runner.sudos[1]
+	if len(chown) < 2 || chown[0] != "chown" {
+		t.Fatalf("second sudo call = %v, want chown", chown)
+	}
+	hasNoFollow := false
+	for _, arg := range chown[1:] {
+		if arg == "-h" {
+			hasNoFollow = true
+			break
+		}
+	}
+	if !hasNoFollow {
+		t.Fatalf("chown args = %v, missing the -h symlink-no-follow flag", chown)
+	}
+}
+
 func contains(s, sub string) bool {
 	for i := 0; i+len(sub) <= len(s); i++ {
 		if s[i:i+len(sub)] == sub {
diff --git a/internal/daemon/vm_test.go b/internal/daemon/vm_test.go
index 868e5b0..131c55f 100644
--- a/internal/daemon/vm_test.go
+++ b/internal/daemon/vm_test.go
@@ -428,7 +428,7 @@ func TestHealthVMReturnsHealthyForRunningGuest(t *testing.T) {
 		t: t,
 		steps: []runnerStep{
 			sudoStep("", nil, "chmod", "600", vsockSock),
-			sudoStep("", nil, "chown", fmt.Sprintf("%d:%d", os.Getuid(), os.Getgid()), vsockSock),
+			sudoStep("", nil, "chown", "-h", fmt.Sprintf("%d:%d", os.Getuid(), os.Getgid()), vsockSock),
 		},
 	}
 	d := &Daemon{store: db, runner: runner}
@@ -492,7 +492,7 @@ func TestPingVMAliasReturnsAliveForHealthyVM(t *testing.T) {
 		t: t,
 		steps: []runnerStep{
 			sudoStep("", nil, "chmod", "600", vsockSock),
-			sudoStep("", nil, "chown", fmt.Sprintf("%d:%d", os.Getuid(), os.Getgid()), vsockSock),
+			sudoStep("", nil, "chown", "-h", fmt.Sprintf("%d:%d", os.Getuid(), os.Getgid()), vsockSock),
 		},
 	}
 	d := &Daemon{store: db, runner: runner}
@@ -692,7 +692,7 @@ func TestPortsVMReturnsEnrichedPortsAndWebSchemes(t *testing.T) {
 		t: t,
 		steps: []runnerStep{
 			sudoStep("", nil, "chmod", "600", vsockSock),
-			sudoStep("", nil, "chown", fmt.Sprintf("%d:%d", os.Getuid(), os.Getgid()), vsockSock),
+			sudoStep("", nil, "chown", "-h", fmt.Sprintf("%d:%d", os.Getuid(), os.Getgid()), vsockSock),
 		},
 	}
 	d := &Daemon{store: db, runner: runner}
@@ -1623,7 +1623,7 @@ func TestStopVMFallsBackToForcedCleanupAfterGracefulTimeout(t *testing.T) {
 		t: t,
 		steps: []runnerStep{
 			sudoStep("", nil, "chmod", "600", apiSock),
-			sudoStep("", nil, "chown", fmt.Sprintf("%d:%d", os.Getuid(), os.Getgid()), apiSock),
+			sudoStep("", nil, "chown", "-h", fmt.Sprintf("%d:%d", os.Getuid(), os.Getgid()), apiSock),
 			{call: runnerCall{name: "pgrep", args: []string{"-n", "-f", apiSock}}, out: []byte(strconv.Itoa(fake.Process.Pid) + "\n")},
 			sudoStep("", nil, "kill", "-KILL", strconv.Itoa(fake.Process.Pid)),
 		},
@@ -2068,14 +2068,16 @@ func (r *filesystemRunner) RunSudo(ctx context.Context, args ...string) ([]byte,
 		}
 		return nil, os.WriteFile(dst, data, os.FileMode(mode))
 	case "chown":
-		// Recognised forms, both no-op under test (we run as the test
+		// Recognised forms, all no-op under test (we run as the test
 		// user and os.Chown would need CAP_CHOWN):
 		//   chown OWNER TARGET
 		//   chown -R OWNER TARGET
+		//   chown -h OWNER TARGET  (symlink-no-follow; required by
+		//   fcproc.chownChmodNoFollow)
 		switch {
 		case len(args) == 3:
 			return nil, nil
-		case len(args) == 4 && args[1] == "-R":
+		case len(args) == 4 && (args[1] == "-R" || args[1] == "-h"):
 			return nil, nil
 		default:
 			return nil, fmt.Errorf("unexpected chown args: %v", args)
diff --git a/internal/roothelper/roothelper.go b/internal/roothelper/roothelper.go
index bad286c..4310aca 100644
--- a/internal/roothelper/roothelper.go
+++ b/internal/roothelper/roothelper.go
@@ -12,7 +12,6 @@ import (
 	"path/filepath"
 	"strconv"
 	"strings"
-	"syscall"
 	"time"
 
 	"golang.org/x/sys/unix"
@@ -463,6 +462,18 @@ func (s *Server) dispatch(ctx context.Context, req rpc.Request) rpc.Response {
 		if err != nil {
 			return rpc.NewError("bad_params", err.Error())
 		}
+		// syncResolverRouting short-circuits on empty input; only
+		// validate when actually doing something. This stops a
+		// compromised daemon from flapping arbitrary system-managed
+		// links via resolvectl.
+		if strings.TrimSpace(params.BridgeName) != "" || strings.TrimSpace(params.ServerAddr) != "" {
+			if err := validateLinuxIfaceName(params.BridgeName); err != nil {
+				return rpc.NewError("bad_params", err.Error())
+			}
+			if err := validateResolverAddr(params.ServerAddr); err != nil {
+				return rpc.NewError("bad_params", err.Error())
+			}
+		}
 		return marshalResultOrError(struct{}{}, s.syncResolverRouting(ctx, params.BridgeName, params.ServerAddr))
 	case methodClearResolverRouting:
 		params, err := rpc.DecodeParams[struct {
@@ -471,6 +482,11 @@ func (s *Server) dispatch(ctx context.Context, req rpc.Request) rpc.Response {
 		if err != nil {
 			return rpc.NewError("bad_params", err.Error())
 		}
+		if strings.TrimSpace(params.BridgeName) != "" {
+			if err := validateLinuxIfaceName(params.BridgeName); err != nil {
+				return rpc.NewError("bad_params", err.Error())
+			}
+		}
 		return marshalResultOrError(struct{}{}, s.clearResolverRouting(ctx, params.BridgeName))
 	case methodEnsureNAT:
 		params, err := rpc.DecodeParams[struct {
@@ -481,6 +497,16 @@ func (s *Server) dispatch(ctx context.Context, req rpc.Request) rpc.Response {
 		if err != nil {
 			return rpc.NewError("bad_params", err.Error())
 		}
+		// Without these the helper installs iptables rules with
+		// daemon-supplied identifiers; argv-style exec rules out
+		// command injection, but a compromised daemon could still
+		// install MASQUERADE rules tied to arbitrary IPs/interfaces.
+		if err := validateIPv4(params.GuestIP); err != nil {
+			return rpc.NewError("bad_params", err.Error())
+		}
+		if err := validateTapName(params.Tap); err != nil {
+			return rpc.NewError("bad_params", err.Error())
+		}
 		return marshalResultOrError(struct{}{}, hostnat.Ensure(ctx, s.runner, params.GuestIP, params.Tap, params.Enable))
 	case methodCreateDMSnapshot:
 		params, err := rpc.DecodeParams[struct {
@@ -507,6 +533,13 @@ func (s *Server) dispatch(ctx context.Context, req rpc.Request) rpc.Response {
 		if err != nil {
 			return rpc.NewError("bad_params", err.Error())
 		}
+		// Each Handles field flows into a `dmsetup remove` /
+		// `losetup -d` shell-out as root. Without these checks a
+		// compromised daemon could ask the helper to detach
+		// arbitrary loop devices or remove unrelated DM targets.
+		if err := validateDMSnapshotHandles(params); err != nil {
+			return rpc.NewError("bad_params", err.Error())
+		}
 		return marshalResultOrError(struct{}{}, dmsnap.Cleanup(ctx, s.runner, params))
 	case methodRemoveDMSnapshot:
 		params, err := rpc.DecodeParams[struct {
@@ -515,6 +548,9 @@ func (s *Server) dispatch(ctx context.Context, req rpc.Request) rpc.Response {
 		if err != nil {
 			return rpc.NewError("bad_params", err.Error())
 		}
+		if err := validateDMRemoveTarget(params.Target); err != nil {
+			return rpc.NewError("bad_params", err.Error())
+		}
 		return marshalResultOrError(struct{}{}, dmsnap.Remove(ctx, s.runner, params.Target))
 	case methodFsckSnapshot:
 		params, err := rpc.DecodeParams[struct {
@@ -532,6 +568,13 @@ func (s *Server) dispatch(ctx context.Context, req rpc.Request) rpc.Response {
 		if err != nil {
 			return rpc.NewError("bad_params", err.Error())
 		}
+		// Without this validation a compromised daemon can drive
+		// debugfs as root against any path on the host; it would have
+		// to be a real ext4 image to leak data, but the constraint is
+		// trivially expressed and adds no operational cost.
+		if err := s.validateExt4ImagePath(params.ImagePath); err != nil {
+			return rpc.NewError("bad_params", err.Error())
+		}
 		data, readErr := system.ReadExt4File(ctx, s.runner, params.ImagePath, params.GuestPath)
 		return marshalResultOrError(readExt4FileResult{Data: data}, readErr)
 	case methodWriteExt4Files:
@@ -542,6 +585,9 @@ func (s *Server) dispatch(ctx context.Context, req rpc.Request) rpc.Response {
 		if err != nil {
 			return rpc.NewError("bad_params", err.Error())
 		}
+		if err := s.validateExt4ImagePath(params.ImagePath); err != nil {
+			return rpc.NewError("bad_params", err.Error())
+		}
 		return marshalResultOrError(struct{}{}, s.writeExt4Files(ctx, params.ImagePath, params.Files))
 	case methodResolveFirecrackerBin:
 		params, err := rpc.DecodeParams[struct {
@@ -567,6 +613,20 @@ func (s *Server) dispatch(ctx context.Context, req rpc.Request) rpc.Response {
 		if err != nil {
 			return rpc.NewError("bad_params", err.Error())
 		}
+		// Without these checks the helper's chown/chmod becomes an
+		// arbitrary file-ownership primitive: a daemon-uid attacker
+		// could plant a symlink at any path under RuntimeDir (or just
+		// pass /etc/shadow) and have the helper transfer ownership to
+		// the daemon UID. The fcproc layer also chowns/chmods via
+		// O_PATH|O_NOFOLLOW so a symlink at the leaf is never followed at
+		// the time of the syscall — these checks are belt + braces and give a
+		// clear error before we even open the path.
+		if err := s.validateManagedPath(params.SocketPath, paths.ResolveSystem().RuntimeDir); err != nil {
+			return rpc.NewError("invalid_path", err.Error())
+		}
+		if err := validateNotSymlink(params.SocketPath); err != nil {
+			return rpc.NewError("invalid_path", err.Error())
+		}
 		return marshalResultOrError(struct{}{}, s.ensureSocketAccess(ctx, params.SocketPath, params.Label))
 	case methodFindFirecrackerPID:
 		params, err := rpc.DecodeParams[struct {
@@ -584,6 +644,9 @@ func (s *Server) dispatch(ctx context.Context, req rpc.Request) rpc.Response {
 		if err != nil {
 			return rpc.NewError("bad_params", err.Error())
 		}
+		if err := validateFirecrackerPID(params.PID); err != nil {
+			return rpc.NewError("invalid_pid", err.Error())
+		}
 		_, killErr := s.runner.Run(ctx, "kill", "-KILL", strconv.Itoa(params.PID))
 		return marshalResultOrError(struct{}{}, killErr)
 	case methodSignalProcess:
@@ -594,6 +657,9 @@ func (s *Server) dispatch(ctx context.Context, req rpc.Request) rpc.Response {
 		if err != nil {
 			return rpc.NewError("bad_params", err.Error())
 		}
+		if err := validateFirecrackerPID(params.PID); err != nil {
+			return rpc.NewError("invalid_pid", err.Error())
+		}
 		signal := strings.TrimSpace(params.Signal)
 		if signal == "" {
 			signal = "TERM"
@@ -620,6 +686,14 @@ func (s *Server) dispatch(ctx context.Context, req rpc.Request) rpc.Response {
 		if err := s.validateManagedPath(params.ChrootRoot, systemLayout.StateDir, systemLayout.RuntimeDir); err != nil {
 			return rpc.NewError("invalid_path", err.Error())
 		}
+		// validateManagedPath only does textual prefix matching. A
+		// symlink at e.g. /var/lib/banger/jail/x → / would pass the
+		// prefix check, and the subsequent `umount --recursive --lazy`
+		// would detach real host mounts. Reject leaf symlinks before
+		// we go anywhere near unmount/rm.
+		if err := validateNotSymlink(params.ChrootRoot); err != nil {
+			return rpc.NewError("invalid_path", err.Error())
+		}
 		err = fcproc.New(s.runner, fcproc.Config{}, s.logger).CleanupJailerChroot(ctx, params.ChrootRoot)
 		return marshalResultOrError(struct{}{}, err)
 	default:
@@ -683,8 +757,11 @@ func (s *Server) clearResolverRouting(ctx context.Context, bridgeName string) er
 }
 
 func (s *Server) fsckSnapshot(ctx context.Context, dmDev string) error {
-	if strings.TrimSpace(dmDev) == "" {
-		return errors.New("dm device is required")
+	// Helper runs as root with -fy (auto-yes); without the prefix check
+	// a compromised daemon could fsck arbitrary block devices like
+	// /dev/sda1 and corrupt the host filesystem.
+	if err := validateDMDevicePath(dmDev); err != nil {
+		return err
 	}
 	if _, err := s.runner.Run(ctx, "e2fsck", "-fy", dmDev); err != nil {
 		if code := system.ExitCode(err); code < 0 || code > 1 {
@@ -973,6 +1050,143 @@ func (s *Server) validateManagedPath(path string, roots ...string) error {
 	return fmt.Errorf("path %q is outside banger-managed directories", path)
 }
 
+// validateExt4ImagePath accepts a path that is either inside the
+// banger StateDir (regular ext4 image files we manage) or a managed
+// DM-snapshot device (/dev/mapper/fc-rootfs-*). Both shapes are
+// legitimate inputs for the helper's debugfs/e2cp/e2rm RPCs; anything
+// else would let a compromised daemon point those tools at arbitrary
+// host files.
+func (s *Server) validateExt4ImagePath(path string) error {
+	if err := s.validateManagedPath(path, paths.ResolveSystem().StateDir); err == nil {
+		return nil
+	}
+	if err := validateDMDevicePath(path); err == nil {
+		return nil
+	}
+	return fmt.Errorf("path %q is not a banger-managed ext4 image", path)
+}
+
+// validateLoopDevicePath confirms path is `/dev/loopN` for some N≥0.
+// dmsnap.Cleanup detaches loops via `losetup -d `; without this
+// a compromised daemon could ask the helper to detach an arbitrary
+// device node.
+func validateLoopDevicePath(path string) error {
+	path = strings.TrimSpace(path)
+	if path == "" {
+		return errors.New("loop device path is required")
+	}
+	const prefix = "/dev/loop"
+	if !strings.HasPrefix(path, prefix) {
+		return fmt.Errorf("loop device %q must live under /dev/loop", path)
+	}
+	suffix := path[len(prefix):]
+	if suffix == "" {
+		return fmt.Errorf("loop device %q is missing its index", path)
+	}
+	for _, r := range suffix {
+		if r < '0' || r > '9' {
+			return fmt.Errorf("loop device %q has non-numeric suffix", path)
+		}
+	}
+	return nil
+}
+
+// validateDMSnapshotHandles checks every non-empty field on a Handles
+// passed to priv.cleanup_dm_snapshot. Empty fields are tolerated (the
+// dmsnap layer treats them as "nothing to clean here") but anything
+// set must look like a banger-managed object.
+func validateDMSnapshotHandles(h dmsnap.Handles) error {
+	if h.DMName != "" {
+		if err := validateDMName(h.DMName); err != nil {
+			return err
+		}
+	}
+	if h.DMDev != "" {
+		if err := validateDMDevicePath(h.DMDev); err != nil {
+			return err
+		}
+	}
+	if h.BaseLoop != "" {
+		if err := validateLoopDevicePath(h.BaseLoop); err != nil {
+			return err
+		}
+	}
+	if h.COWLoop != "" {
+		if err := validateLoopDevicePath(h.COWLoop); err != nil {
+			return err
+		}
+	}
+	return nil
+}
+
+// validateDMRemoveTarget covers the union accepted by `dmsetup remove`:
+// either the bare DM name or the /dev/mapper/ path. Both shapes
+// are produced by dmsnap.Cleanup; nothing else should reach the helper.
+func validateDMRemoveTarget(target string) error {
+	target = strings.TrimSpace(target)
+	if target == "" {
+		return errors.New("dm target is required")
+	}
+	if strings.HasPrefix(target, "/dev/mapper/") {
+		return validateDMDevicePath(target)
+	}
+	return validateDMName(target)
+}
+
+// validateLinuxIfaceName mirrors the kernel's __dev_valid_name rules
+// in a permissive subset: 1-15 chars, no whitespace, no slash, no
+// colon, and not the special "." or "..". Used for bridge-name
+// arguments to resolvectl. argv-style exec already prevents shell
+// injection, but a compromised daemon could otherwise flap any
+// system-managed link by passing its name here.
+func validateLinuxIfaceName(name string) error {
+	name = strings.TrimSpace(name)
+	if name == "" {
+		return errors.New("interface name is required")
+	}
+	if len(name) > 15 {
+		return fmt.Errorf("interface %q exceeds 15 chars", name)
+	}
+	if name == "." || name == ".." {
+		return fmt.Errorf("interface name %q is reserved", name)
+	}
+	for _, r := range name {
+		if r <= ' ' || r == '/' || r == ':' || r == 0x7f {
+			return fmt.Errorf("interface %q contains invalid char %q", name, r)
+		}
+	}
+	return nil
+}
+
+// validateIPv4 confirms ip parses as an IPv4 address. The NAT helpers
+// build /32 iptables rules from this string; non-v4 input would
+// produce malformed rules at best and unexpected ones at worst.
+func validateIPv4(ip string) error {
+	ip = strings.TrimSpace(ip)
+	if ip == "" {
+		return errors.New("ipv4 address is required")
+	}
+	parsed := net.ParseIP(ip)
+	if parsed == nil || parsed.To4() == nil {
+		return fmt.Errorf("invalid ipv4 address %q", ip)
+	}
+	return nil
+}
+
+// validateResolverAddr confirms s parses as an IP address (v4 or v6).
+// resolvectl accepts either; reject anything that doesn't parse so a
+// compromised daemon can't wedge resolved with garbage input.
+func validateResolverAddr(s string) error {
+	s = strings.TrimSpace(s)
+	if s == "" {
+		return errors.New("resolver address is required")
+	}
+	if net.ParseIP(s) == nil {
+		return fmt.Errorf("invalid resolver address %q", s)
+	}
+	return nil
+}
+
 func validateTapName(tapName string) error {
 	tapName = strings.TrimSpace(tapName)
 	if strings.HasPrefix(tapName, vmTapPrefix) || strings.HasPrefix(tapName, tapPoolPrefix) {
@@ -1004,25 +1218,80 @@ func validateDMDevicePath(path string) error {
 	return validateDMName(filepath.Base(cleaned))
 }
 
-func validateRootExecutable(path string) error {
-	info, err := os.Stat(path)
+// validateNotSymlink rejects paths whose final component is a symlink.
+// validateManagedPath does textual prefix matching only; pairing it
+// with an Lstat check stops a daemon-uid attacker from planting a
+// symlink at a managed path and using helper RPCs that operate on
+// that path (chown/chmod sockets, umount/rm chroot trees) to reach
+// arbitrary host objects. There is a small TOCTOU window between
+// this check and the syscall that follows; for sockets the
+// fcproc-level O_PATH|O_NOFOLLOW open closes that window, and for
+// the chroot cleanup the umount step is bracketed by a findmnt
+// guard inside fcproc.CleanupJailerChroot.
+func validateNotSymlink(path string) error {
+	info, err := os.Lstat(path)
 	if err != nil {
-		return err
+		return fmt.Errorf("inspect %s: %w", path, err)
 	}
-	if !info.Mode().IsRegular() {
+	if info.Mode()&os.ModeSymlink != 0 {
+		return fmt.Errorf("path %q must not be a symlink", path)
+	}
+	return nil
+}
+
+// validateFirecrackerPID confirms pid refers to a running process whose
+// /proc//cmdline mentions "firecracker". Both jailer and direct
+// firecracker launches keep the binary name in cmdline, so substring
+// match catches both. PID reuse is theoretically racy but the kill
+// follows immediately, so the window is too narrow to weaponise.
+func validateFirecrackerPID(pid int) error {
+	if pid <= 0 {
+		return fmt.Errorf("pid %d is invalid", pid)
+	}
+	data, err := os.ReadFile(filepath.Join("/proc", strconv.Itoa(pid), "cmdline"))
+	if err != nil {
+		return fmt.Errorf("inspect pid %d: %w", pid, err)
+	}
+	cmdline := strings.ReplaceAll(string(data), "\x00", " ")
+	if !strings.Contains(cmdline, "firecracker") {
+		return fmt.Errorf("pid %d is not a banger-managed firecracker process", pid)
+	}
+	return nil
+}
+
+// validateRootExecutable opens the path with O_PATH|O_NOFOLLOW and re-checks
+// every constraint via Fstat on the resulting fd. Going through O_PATH (rather
+// than the previous os.Stat) gives two improvements:
+//
+//   - O_NOFOLLOW keeps a symlinked leaf from being followed: with O_PATH the
+//     open yields an fd for the link itself, whose S_IFLNK mode fails the
+//     S_IFREG check below instead of slipping through to the SDK.
+//   - Fstat reads metadata from the inode the kernel just resolved, narrowing
+//     the TOCTOU window between validation and exec to the time it takes the
+//     SDK to fork+exec — sub-millisecond on a healthy host. The window can't
+//     be fully closed without re-pointing the SDK at /proc/self/fd/N (the
+//     known-good idiom), which would require keeping the fd alive across
+//     fork+exec; we accept the tiny residual window for the simpler shape.
+func validateRootExecutable(path string) error {
+	fd, err := unix.Open(path, unix.O_PATH|unix.O_NOFOLLOW|unix.O_CLOEXEC, 0)
+	if err != nil {
+		return fmt.Errorf("open executable %q: %w", path, err)
+	}
+	defer unix.Close(fd)
+	var st unix.Stat_t
+	if err := unix.Fstat(fd, &st); err != nil {
+		return fmt.Errorf("fstat executable %q: %w", path, err)
+	}
+	if st.Mode&unix.S_IFMT != unix.S_IFREG {
 		return fmt.Errorf("firecracker binary %q is not a regular file", path)
 	}
-	if info.Mode().Perm()&0o111 == 0 {
+	if st.Mode&0o111 == 0 {
 		return fmt.Errorf("firecracker binary %q is not executable", path)
 	}
-	if info.Mode().Perm()&0o022 != 0 {
+	if st.Mode&0o022 != 0 {
 		return fmt.Errorf("firecracker binary %q must not be group/world writable", path)
 	}
-	stat, ok := info.Sys().(*syscall.Stat_t)
-	if !ok {
-		return fmt.Errorf("inspect owner for %q: unsupported file metadata", path)
-	}
-	if stat.Uid != 0 {
+	if st.Uid != 0 {
 		return fmt.Errorf("firecracker binary %q must be root-owned in system mode", path)
 	}
 	return nil
diff --git a/internal/roothelper/roothelper_test.go b/internal/roothelper/roothelper_test.go
index 0570cb0..a5ce078 100644
--- a/internal/roothelper/roothelper_test.go
+++ b/internal/roothelper/roothelper_test.go
@@ -1,9 +1,13 @@
 package roothelper
 
 import (
+	"os"
+	"path/filepath"
 	"testing"
 
+	"banger/internal/daemon/dmsnap"
 	"banger/internal/firecracker"
+	"banger/internal/paths"
 )
 
 func TestValidateDMDevicePath(t *testing.T) {
@@ -33,6 +37,361 @@ func TestValidateDMDevicePath(t *testing.T) {
 	}
 }
 
+func TestValidateFirecrackerPID(t *testing.T) {
+	t.Parallel()
+
+	if err := validateFirecrackerPID(0); err == nil {
+		t.Fatal("validateFirecrackerPID(0) succeeded, want error")
+	}
+	if err := validateFirecrackerPID(-1); err == nil {
+		t.Fatal("validateFirecrackerPID(-1) succeeded, want error")
+	}
+	// Self pid points at the go test binary, whose cmdline does not
+	// contain "firecracker" — rejection proves the helper would refuse
+	// to kill arbitrary host processes.
+	if err := validateFirecrackerPID(os.Getpid()); err == nil {
+		t.Fatal("validateFirecrackerPID(test pid) succeeded, want error")
+	}
+	// PID 1 is init/systemd on Linux — a juicy target for a compromised
+	// daemon, and definitely not firecracker. Make sure we'd refuse.
+	if err := validateFirecrackerPID(1); err == nil {
+		t.Fatal("validateFirecrackerPID(1) succeeded, want error")
+	}
+}
+
+// TestValidateRootExecutableRejectsSymlink pins the O_NOFOLLOW
+// guarantee: even if the path string passes a textual check, a symlink
+// at the leaf is refused before we ever stat the target.
+func TestValidateRootExecutableRejectsSymlink(t *testing.T) {
+	t.Parallel()
+	dir := t.TempDir()
+	regular := filepath.Join(dir, "real")
+	if err := os.WriteFile(regular, []byte{}, 0o755); err != nil {
+		t.Fatalf("write regular: %v", err)
+	}
+	link := filepath.Join(dir, "link")
+	if err := os.Symlink(regular, link); err != nil {
+		t.Fatalf("symlink: %v", err)
+	}
+	if err := validateRootExecutable(link); err == nil {
+		t.Fatal("validateRootExecutable(symlink) succeeded, want error")
+	}
+}
+
+// TestValidateRootExecutableRejectsNonRootOwned exercises the Fstat
+// uid check on a file the test user just created: it can't possibly
+// be uid 0, so the validator must refuse it. This is the regression
+// guard against the previous os.Stat code path drifting back in.
+func TestValidateRootExecutableRejectsNonRootOwned(t *testing.T) {
+	t.Parallel()
+	if os.Getuid() == 0 {
+		t.Skip("test runs as root; cannot construct a non-root-owned file in a tempdir we can write")
+	}
+	path := filepath.Join(t.TempDir(), "binary")
+	if err := os.WriteFile(path, []byte{}, 0o755); err != nil {
+		t.Fatalf("write: %v", err)
+	}
+	err := validateRootExecutable(path)
+	if err == nil {
+		t.Fatal("validateRootExecutable(user-owned) succeeded, want error")
+	}
+	if !contains(err.Error(), "root-owned") {
+		t.Fatalf("err = %v, want root-owned rejection", err)
+	}
+}
+
+func TestValidateRootExecutableRejectsGroupWritable(t *testing.T) {
+	t.Parallel()
+	if os.Getuid() == 0 {
+		t.Skip("test runs as root; can't construct a non-root-owned file")
+	}
+	path := filepath.Join(t.TempDir(), "binary")
+	if err := os.WriteFile(path, []byte{}, 0o775); err != nil {
+		t.Fatalf("write: %v", err)
+	}
+	err := validateRootExecutable(path)
+	if err == nil {
+		t.Fatal("validateRootExecutable(group-writable) succeeded, want error")
+	}
+}
+
+// contains is a local substring helper that mirrors strings.Contains
+// without pulling in the package — kept tiny so the test file's
+// dependency surface stays close to the thing being tested.
+func contains(s, sub string) bool {
+	for i := 0; i+len(sub) <= len(s); i++ {
+		if s[i:i+len(sub)] == sub {
+			return true
+		}
+	}
+	return false
+}
+
+func TestValidateLoopDevicePath(t *testing.T) {
+	t.Parallel()
+
+	for _, tc := range []struct {
+		name string
+		arg  string
+		ok   bool
+	}{
+		{name: "loop0", arg: "/dev/loop0", ok: true},
+		{name: "loop12", arg: "/dev/loop12", ok: true},
+		{name: "no_index", arg: "/dev/loop", ok: false},
+		{name: "non_numeric", arg: "/dev/loop-x", ok: false},
+		{name: "wrong_prefix", arg: "/dev/sda1", ok: false},
+		{name: "empty", arg: "", ok: false},
+	} {
+		tc := tc
+		t.Run(tc.name, func(t *testing.T) {
+			t.Parallel()
+			err := validateLoopDevicePath(tc.arg)
+			if tc.ok && err != nil {
+				t.Fatalf("validateLoopDevicePath(%q) = %v, want nil", tc.arg, err)
+			}
+			if !tc.ok && err == nil {
+				t.Fatalf("validateLoopDevicePath(%q) succeeded, want error", tc.arg)
+			}
+		})
+	}
+}
+
+func TestValidateDMRemoveTarget(t *testing.T) {
+	t.Parallel()
+
+	for _, tc := range []struct {
+		name string
+		arg  string
+		ok   bool
+	}{
+		{name: "dm_name", arg: "fc-rootfs-abc", ok: true},
+		{name: "dm_device_path", arg: "/dev/mapper/fc-rootfs-abc", ok: true},
+		{name: "wrong_prefix", arg: "not-banger", ok: false},
+		{name: "device_wrong_prefix", arg: "/dev/mapper/not-banger", ok: false},
+		{name: "empty", arg: "", ok: false},
+	} {
+		tc := tc
+		t.Run(tc.name, func(t *testing.T) {
+			t.Parallel()
+			err := validateDMRemoveTarget(tc.arg)
+			if tc.ok && err != nil {
+				t.Fatalf("validateDMRemoveTarget(%q) = %v, want nil", tc.arg, err)
+			}
+			if !tc.ok && err == nil {
+				t.Fatalf("validateDMRemoveTarget(%q) succeeded, want error", tc.arg)
+			}
+		})
+	}
+}
+
+func TestValidateDMSnapshotHandles(t *testing.T) {
+	t.Parallel()
+
+	// Empty handles are tolerated — the dmsnap layer treats every
+	// missing field as a no-op for that step.
+	if err := validateDMSnapshotHandles(dmsnap.Handles{}); err != nil {
+		t.Fatalf("validateDMSnapshotHandles(empty) = %v, want nil", err)
+	}
+	good := dmsnap.Handles{
+		BaseLoop: "/dev/loop0",
+		COWLoop:  "/dev/loop1",
+		DMName:   "fc-rootfs-abc",
+		DMDev:    "/dev/mapper/fc-rootfs-abc",
+	}
+	if err := validateDMSnapshotHandles(good); err != nil {
+		t.Fatalf("validateDMSnapshotHandles(good) = %v, want nil", err)
+	}
+	for _, tc := range []struct {
+		name    string
+		mutate  func(dmsnap.Handles) dmsnap.Handles
+		wantErr bool
+	}{
+		{name: "bad_dm_name", mutate: func(h dmsnap.Handles) dmsnap.Handles {
+			h.DMName = "rogue"
+			return h
+		}, wantErr: true},
+		{name: "bad_dm_device", mutate: func(h dmsnap.Handles) dmsnap.Handles {
+			h.DMDev = "/dev/sda1"
+			return h
+		}, wantErr: true},
+		{name: "bad_base_loop", mutate: func(h dmsnap.Handles) dmsnap.Handles {
+			h.BaseLoop = "/dev/sda1"
+			return h
+		}, wantErr: true},
+		{name: "bad_cow_loop", mutate: func(h dmsnap.Handles) dmsnap.Handles {
+			h.COWLoop = "/etc/shadow"
+			return h
+		}, wantErr: true},
+	} {
+		tc := tc
+		t.Run(tc.name, func(t *testing.T) {
+			t.Parallel()
+			err := validateDMSnapshotHandles(tc.mutate(good))
+			if tc.wantErr && err == nil {
+				t.Fatalf("validateDMSnapshotHandles(%s) succeeded, want error", tc.name)
+			}
+			if !tc.wantErr && err != nil {
+				t.Fatalf("validateDMSnapshotHandles(%s) = %v, want nil", tc.name, err)
+			}
+		})
+	}
+}
+
+func TestValidateLinuxIfaceName(t *testing.T) {
+	t.Parallel()
+
+	for _, tc := range []struct {
+		name string
+		arg  string
+		ok   bool
+	}{
+		{name: "typical_bridge", arg: "br-banger", ok: true},
+		{name: "uplink", arg: "enp5s0", ok: true},
+		{name: "max_len", arg: "a234567890abcde", ok: true}, // 15 chars
+		{name: "empty", arg: "", ok: false},
+		{name: "too_long", arg: "a234567890abcdef", ok: false},
+		{name: "with_slash", arg: "br/0", ok: false},
+		{name: "with_space", arg: "br 0", ok: false},
+		{name: "with_colon", arg: "br:0", ok: false},
+		{name: "dot", arg: ".", ok: false},
+		{name: "dotdot", arg: "..", ok: false},
+		{name: "control_char", arg: "br\x01", ok: false},
+	} {
+		tc := tc
+		t.Run(tc.name, func(t *testing.T) {
+			t.Parallel()
+			err := validateLinuxIfaceName(tc.arg)
+			if tc.ok && err != nil {
+				t.Fatalf("validateLinuxIfaceName(%q) = %v, want nil", tc.arg, err)
+			}
+			if !tc.ok && err == nil {
+				t.Fatalf("validateLinuxIfaceName(%q) succeeded, want error", tc.arg)
+			}
+		})
+	}
+}
+
+func TestValidateIPv4(t *testing.T) {
+	t.Parallel()
+
+	for _, tc := range []struct {
+		name string
+		arg  string
+		ok   bool
+	}{
+		{name: "valid", arg: "172.16.0.2", ok: true},
+		{name: "with_whitespace", arg: " 10.0.0.1 ", ok: true},
+		{name: "empty", arg: "", ok: false},
+		{name: "ipv6", arg: "::1", ok: false},
+		{name: "garbage", arg: "not-an-ip", ok: false},
+		{name: "with_cidr", arg: "10.0.0.1/24", ok: false},
+	} {
+		tc := tc
+		t.Run(tc.name, func(t *testing.T) {
+			t.Parallel()
+			err := validateIPv4(tc.arg)
+			if tc.ok && err != nil {
+				t.Fatalf("validateIPv4(%q) = %v, want nil", tc.arg, err)
+			}
+			if !tc.ok && err == nil {
+				t.Fatalf("validateIPv4(%q) succeeded, want error", tc.arg)
+			}
+		})
+	}
+}
+
+func TestValidateResolverAddr(t *testing.T) {
+	t.Parallel()
+
+	for _, tc := range []struct {
+		name string
+		arg  string
+		ok   bool
+	}{
+		{name: "ipv4", arg: "192.168.1.1", ok: true},
+		{name: "ipv6", arg: "fe80::1", ok: true},
+		{name: "empty", arg: "", ok: false},
+		{name: "garbage", arg: "resolver.example", ok: false},
+	} {
+		tc := tc
+		t.Run(tc.name, func(t *testing.T) {
+			t.Parallel()
+			err := validateResolverAddr(tc.arg)
+			if tc.ok && err != nil {
+				t.Fatalf("validateResolverAddr(%q) = %v, want nil", tc.arg, err)
+			}
+			if !tc.ok && err == nil {
+				t.Fatalf("validateResolverAddr(%q) succeeded, want error", tc.arg)
+			}
+		})
+	}
+}
+
+func TestValidateExt4ImagePath(t *testing.T) {
+	t.Parallel()
+
+	srv := &Server{}
+	stateDir := paths.ResolveSystem().StateDir
+	for _, tc := range []struct {
+		name string
+		arg  string
+		ok   bool
+	}{
+		{name: "managed_image", arg: filepath.Join(stateDir, "vms", "abc", "rootfs.ext4"), ok: true},
+		{name: "managed_dm_device", arg: "/dev/mapper/fc-rootfs-test", ok: true},
+		{name: "outside_state", arg: "/etc/shadow", ok: false},
+		{name: "wrong_dm", arg: "/dev/mapper/not-banger", ok: false},
+		{name: "relative", arg: "rootfs.ext4", ok: false},
+		{name: "empty", arg: "", ok: false},
+	} {
+		tc := tc
+		t.Run(tc.name, func(t *testing.T) {
+			t.Parallel()
+			err := srv.validateExt4ImagePath(tc.arg)
+			if tc.ok && err != nil {
+				t.Fatalf("validateExt4ImagePath(%q) = %v, want nil", tc.arg, err)
+			}
+			if !tc.ok && err == nil {
+				t.Fatalf("validateExt4ImagePath(%q) succeeded, want error", tc.arg)
+			}
+		})
+	}
+}
+
+func TestValidateNotSymlink(t *testing.T) {
+	t.Parallel()
+
+	dir := t.TempDir()
+	regular := filepath.Join(dir, "real")
+	if err := os.WriteFile(regular, []byte("ok"), 0o600); err != nil {
+		t.Fatalf("write regular: %v", err)
+	}
+	link := filepath.Join(dir, "link")
+	if err := os.Symlink(regular, link); err != nil {
+		t.Fatalf("symlink: %v", err)
+	}
+
+	if err := validateNotSymlink(regular); err != nil {
+		t.Fatalf("validateNotSymlink(real) = %v, want nil", err)
+	}
+	if err := validateNotSymlink(link); err == nil {
+		t.Fatal("validateNotSymlink(symlink) succeeded, want error")
+	}
+	if err := validateNotSymlink(filepath.Join(dir, "missing")); err == nil {
+		t.Fatal("validateNotSymlink(missing) succeeded, want error")
+	}
+	// Symlink pointing into the system tree is the threat we care about.
+	// A daemon-uid attacker plants this kind of link and hopes the helper
+	// follows it; this test pins the rejection.
+ hostileLink := filepath.Join(dir, "hostile") + if err := os.Symlink("/etc/shadow", hostileLink); err != nil { + t.Fatalf("symlink: %v", err) + } + if err := validateNotSymlink(hostileLink); err == nil { + t.Fatal("validateNotSymlink(symlink-to-/etc/shadow) succeeded, want error") + } +} + func TestValidateLaunchDrivePathAllowsManagedRootDMDevice(t *testing.T) { t.Parallel()