roothelper: tighten input validation across privileged RPCs

Defence-in-depth pass over every helper method that touches the host
as root. Each fix narrows what a compromised owner-uid daemon could
ask the helper to do; many close concrete file-ownership and DoS
primitives that the previous validators didn't reach.

Path / identifier validation:
  * priv.fsck_snapshot now requires /dev/mapper/fc-rootfs-* (was
    "is the string non-empty"). e2fsck -fy on /dev/sda1 was the
    motivating exploit.
  * priv.kill_process and priv.signal_process now read
    /proc/<pid>/cmdline and require a "firecracker" substring before
    sending the signal. Killing arbitrary host PIDs (sshd, init, …)
    is no longer a one-RPC primitive.
  * priv.read_ext4_file and priv.write_ext4_files now require the
    image path to live under StateDir or be /dev/mapper/fc-rootfs-*.
  * priv.cleanup_dm_snapshot validates every non-empty Handles field:
    DM name fc-rootfs-*, DM device /dev/mapper/fc-rootfs-*, loops
    /dev/loopN.
  * priv.remove_dm_snapshot accepts only fc-rootfs-* names or
    /dev/mapper/fc-rootfs-* paths.
  * priv.ensure_nat now requires a parsable IPv4 address and a
    banger-prefixed tap.
  * priv.sync_resolver_routing and priv.clear_resolver_routing now
    require a Linux iface-name-shaped bridge name (1–15 chars, no
    whitespace/'/'/':') and, for sync, a parsable resolver address.

Symlink defence:
  * priv.ensure_socket_access now validates the socket path is under
    RuntimeDir and not a symlink. The fcproc layer's chown/chmod
    moves to unix.Open(O_PATH|O_NOFOLLOW) + Fchownat(AT_EMPTY_PATH)
    + Fchmodat via /proc/self/fd, so even a swap of the leaf into a
    symlink between validation and the syscall is refused. The
    local-priv (non-root) fallback uses `chown -h`.
  * priv.cleanup_jailer_chroot rejects symlinks at both the leaf
    (os.Lstat) and intermediate path components (filepath.EvalSymlinks
    + clean-equality). The umount sweep was rewritten from shell
    `umount --recursive --lazy` to direct unix.Unmount(MNT_DETACH |
    UMOUNT_NOFOLLOW) per child mount, deepest-first; the findmnt
    guard remains as the rm-rf safety net. Local-priv mode falls
    back to `sudo umount --lazy`.

Binary validation:
  * validateRootExecutable now opens with O_PATH|O_NOFOLLOW and
    Fstats through the resulting fd. Rejects path-level symlinks and
    narrows the TOCTOU window between validation and the SDK's exec
    to fork+exec time on a healthy host.

Daemon socket:
  * The owner daemon now reads SO_PEERCRED on every accepted
    connection and refuses any UID that isn't 0 or the registered
    owner. Filesystem perms (0600 + ownerUID) already enforced this;
    the check is belt-and-braces in case the socket FD is ever
    leaked to a non-owner process.

Docs:
  * docs/privileges.md walked end-to-end. Each helper RPC's
    Validation gate row reflects what the code actually enforces.
    New section "Running outside the system install" calls out the
    looser dev-mode trust model (NOPASSWD sudoers, helper hardening
    bypassed) so users don't deploy that path on shared hosts.
    Trust list updated to include every new validator.

Tests added: validators (DM-loop, DM-remove-target, DM-handles,
ext4-image-path, iface-name, IPv4, resolver-addr, not-symlink,
firecracker-PID, root-executable variants), the daemon's authorize
path (non-unix conn rejection + unix conn happy path), the umount2
ordering contract (deepest-first + --lazy on the sudo branch), and
positive/negative cases for the chown-no-follow fallback.

Verified end-to-end via `make smoke JOBS=4` on a KVM host.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Thales Maciel 2026-04-28 14:39:41 -03:00
parent 6b543cb17f
commit 853249dec2
No known key found for this signature in database
GPG key ID: 33112E6833C34679
8 changed files with 1177 additions and 63 deletions

View file

@ -11,8 +11,8 @@ their eyes open.
| Unit | User | Socket | Purpose |
|---|---|---|---|
| `bangerd.service` | owner user (chosen at install) | `/run/banger/bangerd.sock` (0700, owner) | Orchestration: VM/image lifecycle, store, RPC to the CLI. |
| `bangerd-root.service` | `root` | `/run/banger-root/bangerd-root.sock` (0600, root) | Narrow root helper: bridge/tap, DM snapshots, NAT, Firecracker launch. |
| `bangerd.service` | owner user (chosen at install) | `/run/banger/bangerd.sock` (0600, owner) | Orchestration: VM/image lifecycle, store, RPC to the CLI. |
| `bangerd-root.service` | `root` | `/run/banger-root/bangerd-root.sock` (0600, owner; root-owned dir at 0711) | Narrow root helper: bridge/tap, DM snapshots, NAT, Firecracker launch. |
The owner daemon does all the business logic. It never runs as root.
The root helper runs as root but only accepts a fixed list of operations
@ -37,7 +37,8 @@ specific shape.
The root helper:
- Listens on a Unix socket at `/run/banger-root/bangerd-root.sock`,
mode 0600, owned by root, in a runtime dir at 0711 root.
mode 0600, owned by the registered owner UID, in a root-owned
runtime dir at 0711.
- Reads `SO_PEERCRED` on every accepted connection and rejects any
caller whose UID is not 0 or the owner UID recorded in
`/etc/banger/install.toml`. The match is by UID, not username.
@ -46,8 +47,13 @@ The root helper:
The owner daemon:
- Listens on `/run/banger/bangerd.sock`, mode 0700, owned by the
- Listens on `/run/banger/bangerd.sock`, mode 0600, owned by the
install-time owner user. Other host users cannot connect.
- Reads `SO_PEERCRED` on every accepted connection and rejects any
caller whose UID is not 0 or the install-time owner UID. The
filesystem perms already gate access; the peer-cred read is
belt-and-braces in case the socket FD is ever leaked to a
non-owner process.
- Resolves the helper socket path from the install metadata and
retries with backoff if the helper hasn't started yet.
@ -56,29 +62,34 @@ socket on the local host.
## What the root helper will do, exactly
The helper exposes 17 RPC methods. Each is shaped so the owner daemon
can name a banger-managed object but cannot pass an arbitrary host
path or interface name. Code lives in
`internal/roothelper/roothelper.go`.
The helper exposes a fixed list of RPC methods (see
`internal/roothelper/roothelper.go` for the canonical set). Each is
shaped so the owner daemon can name a banger-managed object but
cannot pass an arbitrary host path or interface name. Every input
that names a path, device, PID, or interface is checked against a
validator before the helper touches the host.
| Method | Effect | Validation gate |
|---|---|---|
| `priv.ensure_bridge` | Create the configured Linux bridge if missing; assign the bridge IP. | Bridge name and IP come from owner config; helper does not allow caller to pick `lo` etc. |
| `priv.create_tap` | `ip link add tap NAME tuntap` and add to bridge, owned by the owner user. | Tap name must match `tap-fc-*` or `tap-pool-*`. |
| `priv.delete_tap` | `ip link del NAME`. | Same prefix check. |
| `priv.sync_resolver_routing` | `resolvectl dns/domain/default-route` on the configured bridge. | No-op if `resolvectl` is missing. Bridge name comes from owner config. |
| `priv.clear_resolver_routing` | `resolvectl revert` on the bridge. | Same. |
| `priv.ensure_nat` | `iptables -t nat MASQUERADE` for `(guest_ip, tap)` plus matching FORWARD rules; `enable=false` removes them. | Tap and IP come from VM record; helper does not run arbitrary iptables. |
| `priv.sync_resolver_routing` | `resolvectl dns/domain/default-route` on the configured bridge. | Bridge name passes the kernel iface-name rules (115 chars, no `/`/`:`/whitespace, not `.`/`..`). Resolver address must parse via `net.ParseIP`. |
| `priv.clear_resolver_routing` | `resolvectl revert` on the bridge. | Same iface-name check. |
| `priv.ensure_nat` | `iptables -t nat MASQUERADE` for `(guest_ip, tap)` plus matching FORWARD rules; `enable=false` removes them. | Tap must be banger-prefixed. Guest IP must parse as IPv4. |
| `priv.create_dm_snapshot` | Create a `dmsetup` device-mapper snapshot from `rootfs.ext4` with COW backing file. | Both paths must be inside `/var/lib/banger`; DM name must start with `fc-rootfs-`. |
| `priv.cleanup_dm_snapshot` | `dmsetup remove` for a snapshot the helper itself just created. | Acts on the typed `dmsnap.Handles` returned by create. |
| `priv.remove_dm_snapshot` | `dmsetup remove` by target name. | Name must start with `fc-rootfs-`. |
| `priv.fsck_snapshot` | `e2fsck -fy` against the DM device. | Tolerates exit 1 (filesystem cleaned). |
| `priv.read_ext4_file` | Read a file from inside an ext4 image via `debugfs cat`. | Path is inside the image; image path is not validated against the state dir today (the helper trusts the daemon for image paths because images can sit anywhere the owner registers). |
| `priv.write_ext4_files` | Batch write files into an ext4 image, root:root, mode-controlled. | Same. |
| `priv.resolve_firecracker_binary` | Stat and return the firecracker binary path. | Resolved path must be a regular file, executable, root-owned, not group/world-writable. |
| `priv.launch_firecracker` | Start the firecracker process for a VM. | Socket and vsock paths must be inside `/run/banger`. Log/metrics/kernel paths must be inside `/var/lib/banger`. Tap name must be banger-prefixed. Drives must be inside the state dir or be a `/dev/mapper/fc-rootfs-*` device. Binary must pass the same root-owned-executable check. |
| `priv.ensure_socket_access` | `chown` and `chmod 0660` on a firecracker API or vsock socket so the owner user can talk to it. | Helper does not chown arbitrary paths; this is invoked only after the helper itself just created the socket via firecracker. |
| `priv.find_firecracker_pid` / `priv.kill_process` / `priv.signal_process` / `priv.process_running` | Look up a firecracker PID by API socket path; signal or stat the resulting process. | Fixed-shape requests; path validation happens at launch time, and PID lookups are filtered to processes whose cmdline mentions the requested API socket. |
| `priv.cleanup_dm_snapshot` | `dmsetup remove` and `losetup -d` for a snapshot the helper itself just created. | Every non-empty `dmsnap.Handles` field is checked: DM name `fc-rootfs-*`, DM device `/dev/mapper/fc-rootfs-*`, loops `/dev/loopN`. |
| `priv.remove_dm_snapshot` | `dmsetup remove` by target. | Target must be either a `fc-rootfs-*` name or a `/dev/mapper/fc-rootfs-*` path. |
| `priv.fsck_snapshot` | `e2fsck -fy` against the DM device. | DM device path must match `/dev/mapper/fc-rootfs-*`. Exit 1 (filesystem cleaned) is tolerated. |
| `priv.read_ext4_file` | Read a file from inside an ext4 image via `debugfs cat`. | Image path must be inside `/var/lib/banger` or a managed DM device. Guest path is rejected if it contains debugfs-hostile chars (`"`/`\`/newline). |
| `priv.write_ext4_files` | Batch write files into an ext4 image, root:root, mode-controlled. | Same image-path validator. |
| `priv.resolve_firecracker_binary` | Stat and return the firecracker binary path. | Path is opened with `O_PATH \| O_NOFOLLOW` (refusing symlinks) and Fstat'd through the resulting fd: must be a regular file, executable, root-owned, not group/world-writable. |
| `priv.launch_firecracker` | Start the firecracker process for a VM (jailer-wrapped). | Socket and vsock paths must be inside `/run/banger`. Log/metrics/kernel/initrd paths must be inside `/var/lib/banger`. Tap name must be banger-prefixed. Drives must be inside the state dir or be a `/dev/mapper/fc-rootfs-*` device. Jailer chroot base must be inside the system state/runtime dirs; jailer UID/GID must equal the registered owner. Binary must pass the same root-owned-executable check. |
| `priv.ensure_socket_access` | `chown` and `chmod 0600` on a firecracker API or vsock socket so the owner user can talk to it. | Path must be inside `/run/banger` and not a symlink. The helper opens it with `O_PATH \| O_NOFOLLOW`, refuses anything that isn't a unix socket, and chmod/chown via the resulting fd (no symlink-follow). The local-priv fallback uses `chown -h`. |
| `priv.cleanup_jailer_chroot` | Detach every mount under the per-VM jailer chroot via direct `umount2(MNT_DETACH \| UMOUNT_NOFOLLOW)` syscalls (deepest-first), then `rm -rf` the tree. | Path must be inside the system state/runtime dirs and not a symlink — including no symlinks at intermediate components (resolved with `EvalSymlinks` and re-checked). `UMOUNT_NOFOLLOW` makes the unmounts symlink-safe even if a path is swapped after validation. A `findmnt` guard refuses to `rm -rf` if any mount remains underneath. |
| `priv.find_firecracker_pid` | Resolve a firecracker PID by API socket path. | Filters to processes whose cmdline mentions the requested API socket. |
| `priv.kill_process` / `priv.signal_process` | Send SIGKILL or a named signal to a PID. | PID must refer to a running process whose `/proc/<pid>/cmdline` mentions `firecracker`. |
| `priv.process_running` | Check whether a PID is alive (no host mutation). | Read-only; same cmdline filter. |
Anything outside this list returns `unknown_method` and is logged. The
helper does not run a shell, does not exec helper scripts, and does
@ -186,6 +197,38 @@ What `uninstall` does NOT do automatically:
- It does not remove the owner user, the owner's home, or anything
the user wrote into a guest from inside the guest.
## Running outside the system install
Everything above describes the supported deployment: `banger system
install` lays down both systemd units and the helper takes over every
privileged operation.
It is also possible to run `bangerd` directly without installing the
helper — the binary still works as a per-user daemon and shells `sudo
-n` for each privileged operation it would otherwise hand off
(`iptables`, `ip`, `mount`, `mknod`, `dmsetup`, `e2fsck`, `kill`,
`chown -h`, `chmod`, `losetup`, `chown`, `chmod`, `firecracker`).
This mode is intended for ad-hoc developer machines while iterating on
banger itself.
It carries a different trust model:
- It needs `NOPASSWD` sudoers entries for the developer (otherwise
every VM action prompts for a password).
- Once those entries exist, **any** process running as the developer
can invoke those commands with arbitrary arguments — banger's input
validators only constrain what banger itself sends. They are no
defence against a different program on the same account.
- The helper's `SO_PEERCRED` boundary, the systemd hardening
(`NoNewPrivileges`, `ProtectSystem=strict`, the narrow
`CapabilityBoundingSet`), and the helper's own input validators are
all bypassed.
If you care about isolating banger's blast radius from anything else
running as your user, use the system install. If you only need
banger to work on your own dev box, the non-system mode is fine —
just don't run it on a shared or production host.
## Hardening of the systemd units
The two units ship with restrictive defaults; they are written by
@ -222,11 +265,16 @@ If you install banger as root, you are trusting:
1. The two binaries banger drops under `/usr/local/bin` and the
companion agent under `/usr/local/lib/banger`. These should match
the build artifacts you reviewed.
2. The path validators in
`internal/roothelper/roothelper.go:validateManagedPath`,
`validateTapName`, `validateDMName`, and `validateRootExecutable`
to be tight. If those are bypassed, the helper would carry out a
privileged op against an unmanaged path. They are unit-tested in
2. The path/identifier validators in
`internal/roothelper/roothelper.go` to be tight: `validateManagedPath`,
`validateTapName`, `validateDMName`, `validateDMDevicePath`,
`validateLoopDevicePath`, `validateDMRemoveTarget`,
`validateDMSnapshotHandles`, `validateRootExecutable`,
`validateNotSymlink`, `validateExt4ImagePath`,
`validateLinuxIfaceName`, `validateIPv4`, `validateResolverAddr`,
and `validateFirecrackerPID`. If any of these are bypassed, the
helper would carry out a privileged op against an unmanaged
target. They are unit-tested in
`internal/roothelper/roothelper_test.go`.
3. The Firecracker binary banger executes. The helper refuses to launch
anything that isn't a regular, executable, root-owned, not