roothelper: tighten input validation across privileged RPCs

Defence-in-depth pass over every helper method that touches the host
as root. Each fix narrows what a compromised owner-uid daemon could
ask the helper to do; many close concrete file-ownership and DoS
primitives that the previous validators didn't reach.

Path / identifier validation:
* priv.fsck_snapshot now requires /dev/mapper/fc-rootfs-* (was
  "is the string non-empty"). e2fsck -fy on /dev/sda1 was the
  motivating exploit.
* priv.kill_process and priv.signal_process now read
  /proc/<pid>/cmdline and require a "firecracker" substring before
  sending the signal. Killing arbitrary host PIDs (sshd, init, …)
  is no longer a one-RPC primitive.
* priv.read_ext4_file and priv.write_ext4_files now require the
  image path to live under StateDir or be /dev/mapper/fc-rootfs-*.
* priv.cleanup_dm_snapshot validates every non-empty Handles field:
  DM name fc-rootfs-*, DM device /dev/mapper/fc-rootfs-*, loops
  /dev/loopN.
* priv.remove_dm_snapshot accepts only fc-rootfs-* names or
  /dev/mapper/fc-rootfs-* paths.
* priv.ensure_nat now requires a parsable IPv4 address and a
  banger-prefixed tap.
* priv.sync_resolver_routing and priv.clear_resolver_routing now
  require a Linux iface-name-shaped bridge name (1–15 chars, no
  whitespace/'/'/':') and, for sync, a parsable resolver address.
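
The shape of these identifier checks can be sketched roughly as follows. This is an illustration under stated assumptions, not the helper's actual code: the function names `validIfaceName`, `validDMDevicePath`, and `validIPv4` are hypothetical, and the rule that the `fc-rootfs-` suffix may not contain a further `/` is an assumption added for the sketch.

```go
package main

import (
	"fmt"
	"net"
	"strings"
	"unicode"
)

// dmDevicePrefix is the managed device-mapper prefix named in the commit message.
const dmDevicePrefix = "/dev/mapper/fc-rootfs-"

// validIfaceName (hypothetical name) applies the iface-name shape rules:
// 1-15 bytes, no whitespace, no '/' or ':', and not "." or "..".
func validIfaceName(name string) bool {
	if len(name) == 0 || len(name) > 15 || name == "." || name == ".." {
		return false
	}
	for _, r := range name {
		if r == '/' || r == ':' || unicode.IsSpace(r) {
			return false
		}
	}
	return true
}

// validDMDevicePath (hypothetical name) accepts only /dev/mapper/fc-rootfs-<suffix>.
// Assumption: the suffix must be non-empty and must not contain another '/'.
func validDMDevicePath(p string) bool {
	if !strings.HasPrefix(p, dmDevicePrefix) {
		return false
	}
	suffix := strings.TrimPrefix(p, dmDevicePrefix)
	return suffix != "" && !strings.Contains(suffix, "/")
}

// validIPv4 (hypothetical name) accepts dotted-quad IPv4 only. The colon check
// rejects IPv4-mapped IPv6 forms, which net.ParseIP would otherwise accept.
func validIPv4(s string) bool {
	ip := net.ParseIP(s)
	return ip != nil && ip.To4() != nil && !strings.Contains(s, ":")
}

func main() {
	for _, dev := range []string{"/dev/mapper/fc-rootfs-vm1", "/dev/sda1"} {
		fmt.Printf("device %q ok=%v\n", dev, validDMDevicePath(dev))
	}
	for _, name := range []string{"br-banger0", "a/b", ".."} {
		fmt.Printf("iface %q ok=%v\n", name, validIfaceName(name))
	}
	for _, addr := range []string{"172.16.0.2", "::1"} {
		fmt.Printf("ipv4 %q ok=%v\n", addr, validIPv4(addr))
	}
}
```

The point of the prefix check is that a request such as `e2fsck -fy` on `/dev/sda1` fails validation before any command is built.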

Symlink defence:
* priv.ensure_socket_access now validates the socket path is under
  RuntimeDir and not a symlink. The fcproc layer's chown/chmod
  moves to unix.Open(O_PATH|O_NOFOLLOW) + Fchownat(AT_EMPTY_PATH)
  + Fchmodat via /proc/self/fd, so even a swap of the leaf into a
  symlink between validation and the syscall is refused. The
  local-priv (non-root) fallback uses `chown -h`.
* priv.cleanup_jailer_chroot rejects symlinks at both the leaf
  (os.Lstat) and intermediate path components (filepath.EvalSymlinks
  + clean-equality). The umount sweep was rewritten from shell
  `umount --recursive --lazy` to direct unix.Unmount(MNT_DETACH |
  UMOUNT_NOFOLLOW) per child mount, deepest-first; the findmnt
  guard remains as the rm-rf safety net. Local-priv mode falls
  back to `sudo umount --lazy`.

Binary validation:
* validateRootExecutable now opens with O_PATH|O_NOFOLLOW and
  Fstats through the resulting fd. Rejects path-level symlinks and
  narrows the TOCTOU window between validation and the SDK's exec
  to fork+exec time on a healthy host.

Daemon socket:
* The owner daemon now reads SO_PEERCRED on every accepted
  connection and refuses any UID that isn't 0 or the registered
  owner. Filesystem perms (0600 + ownerUID) already enforced this;
  the check is belt-and-braces in case the socket FD is ever
  leaked to a non-owner process.

Docs:
* docs/privileges.md walked end-to-end. Each helper RPC's
  Validation gate row reflects what the code actually enforces.
  New section "Running outside the system install" calls out the
  looser dev-mode trust model (NOPASSWD sudoers, helper hardening
  bypassed) so users don't deploy that path on shared hosts.
  Trust list updated to include every new validator.

Tests added: validators (DM-loop, DM-remove-target, DM-handles,
ext4-image-path, iface-name, IPv4, resolver-addr, not-symlink,
firecracker-PID, root-executable variants), the daemon's authorize
path (non-unix conn rejection + unix conn happy path), the umount2
ordering contract (deepest-first + --lazy on the sudo branch), and
positive/negative cases for the chown-no-follow fallback.

Verified end-to-end via `make smoke JOBS=4` on a KVM host.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Parent: 6b543cb17f
Commit: 853249dec2
8 changed files with 1177 additions and 63 deletions
@@ -11,8 +11,8 @@ their eyes open.

 | Unit | User | Socket | Purpose |
 |---|---|---|---|
-| `bangerd.service` | owner user (chosen at install) | `/run/banger/bangerd.sock` (0700, owner) | Orchestration: VM/image lifecycle, store, RPC to the CLI. |
-| `bangerd-root.service` | `root` | `/run/banger-root/bangerd-root.sock` (0600, root) | Narrow root helper: bridge/tap, DM snapshots, NAT, Firecracker launch. |
+| `bangerd.service` | owner user (chosen at install) | `/run/banger/bangerd.sock` (0600, owner) | Orchestration: VM/image lifecycle, store, RPC to the CLI. |
+| `bangerd-root.service` | `root` | `/run/banger-root/bangerd-root.sock` (0600, owner; root-owned dir at 0711) | Narrow root helper: bridge/tap, DM snapshots, NAT, Firecracker launch. |

 The owner daemon does all the business logic. It never runs as root.
 The root helper runs as root but only accepts a fixed list of operations
@@ -37,7 +37,8 @@ specific shape.
 The root helper:

 - Listens on a Unix socket at `/run/banger-root/bangerd-root.sock`,
-mode 0600, owned by root, in a runtime dir at 0711 root.
+mode 0600, owned by the registered owner UID, in a root-owned
+runtime dir at 0711.
 - Reads `SO_PEERCRED` on every accepted connection and rejects any
 caller whose UID is not 0 or the owner UID recorded in
 `/etc/banger/install.toml`. The match is by UID, not username.
@@ -46,8 +47,13 @@ The root helper:

 The owner daemon:

-- Listens on `/run/banger/bangerd.sock`, mode 0700, owned by the
+- Listens on `/run/banger/bangerd.sock`, mode 0600, owned by the
 install-time owner user. Other host users cannot connect.
+- Reads `SO_PEERCRED` on every accepted connection and rejects any
+caller whose UID is not 0 or the install-time owner UID. The
+filesystem perms already gate access; the peer-cred read is
+belt-and-braces in case the socket FD is ever leaked to a
+non-owner process.
 - Resolves the helper socket path from the install metadata and
 retries with backoff if the helper hasn't started yet.

@@ -56,29 +62,34 @@ socket on the local host.

 ## What the root helper will do, exactly

-The helper exposes 17 RPC methods. Each is shaped so the owner daemon
-can name a banger-managed object but cannot pass an arbitrary host
-path or interface name. Code lives in
-`internal/roothelper/roothelper.go`.
+The helper exposes a fixed list of RPC methods (see
+`internal/roothelper/roothelper.go` for the canonical set). Each is
+shaped so the owner daemon can name a banger-managed object but
+cannot pass an arbitrary host path or interface name. Every input
+that names a path, device, PID, or interface is checked against a
+validator before the helper touches the host.

 | Method | Effect | Validation gate |
 |---|---|---|
 | `priv.ensure_bridge` | Create the configured Linux bridge if missing; assign the bridge IP. | Bridge name and IP come from owner config; helper does not allow caller to pick `lo` etc. |
 | `priv.create_tap` | `ip link add tap NAME tuntap` and add to bridge, owned by the owner user. | Tap name must match `tap-fc-*` or `tap-pool-*`. |
 | `priv.delete_tap` | `ip link del NAME`. | Same prefix check. |
-| `priv.sync_resolver_routing` | `resolvectl dns/domain/default-route` on the configured bridge. | No-op if `resolvectl` is missing. Bridge name comes from owner config. |
-| `priv.clear_resolver_routing` | `resolvectl revert` on the bridge. | Same. |
-| `priv.ensure_nat` | `iptables -t nat MASQUERADE` for `(guest_ip, tap)` plus matching FORWARD rules; `enable=false` removes them. | Tap and IP come from VM record; helper does not run arbitrary iptables. |
+| `priv.sync_resolver_routing` | `resolvectl dns/domain/default-route` on the configured bridge. | Bridge name passes the kernel iface-name rules (1–15 chars, no `/`/`:`/whitespace, not `.`/`..`). Resolver address must parse via `net.ParseIP`. |
+| `priv.clear_resolver_routing` | `resolvectl revert` on the bridge. | Same iface-name check. |
+| `priv.ensure_nat` | `iptables -t nat MASQUERADE` for `(guest_ip, tap)` plus matching FORWARD rules; `enable=false` removes them. | Tap must be banger-prefixed. Guest IP must parse as IPv4. |
 | `priv.create_dm_snapshot` | Create a `dmsetup` device-mapper snapshot from `rootfs.ext4` with COW backing file. | Both paths must be inside `/var/lib/banger`; DM name must start with `fc-rootfs-`. |
-| `priv.cleanup_dm_snapshot` | `dmsetup remove` for a snapshot the helper itself just created. | Acts on the typed `dmsnap.Handles` returned by create. |
-| `priv.remove_dm_snapshot` | `dmsetup remove` by target name. | Name must start with `fc-rootfs-`. |
-| `priv.fsck_snapshot` | `e2fsck -fy` against the DM device. | Tolerates exit 1 (filesystem cleaned). |
-| `priv.read_ext4_file` | Read a file from inside an ext4 image via `debugfs cat`. | Path is inside the image; image path is not validated against the state dir today (the helper trusts the daemon for image paths because images can sit anywhere the owner registers). |
-| `priv.write_ext4_files` | Batch write files into an ext4 image, root:root, mode-controlled. | Same. |
-| `priv.resolve_firecracker_binary` | Stat and return the firecracker binary path. | Resolved path must be a regular file, executable, root-owned, not group/world-writable. |
-| `priv.launch_firecracker` | Start the firecracker process for a VM. | Socket and vsock paths must be inside `/run/banger`. Log/metrics/kernel paths must be inside `/var/lib/banger`. Tap name must be banger-prefixed. Drives must be inside the state dir or be a `/dev/mapper/fc-rootfs-*` device. Binary must pass the same root-owned-executable check. |
-| `priv.ensure_socket_access` | `chown` and `chmod 0660` on a firecracker API or vsock socket so the owner user can talk to it. | Helper does not chown arbitrary paths; this is invoked only after the helper itself just created the socket via firecracker. |
-| `priv.find_firecracker_pid` / `priv.kill_process` / `priv.signal_process` / `priv.process_running` | Look up a firecracker PID by API socket path; signal or stat the resulting process. | Fixed-shape requests; path validation happens at launch time, and PID lookups are filtered to processes whose cmdline mentions the requested API socket. |
+| `priv.cleanup_dm_snapshot` | `dmsetup remove` and `losetup -d` for a snapshot the helper itself just created. | Every non-empty `dmsnap.Handles` field is checked: DM name `fc-rootfs-*`, DM device `/dev/mapper/fc-rootfs-*`, loops `/dev/loopN`. |
+| `priv.remove_dm_snapshot` | `dmsetup remove` by target. | Target must be either a `fc-rootfs-*` name or a `/dev/mapper/fc-rootfs-*` path. |
+| `priv.fsck_snapshot` | `e2fsck -fy` against the DM device. | DM device path must match `/dev/mapper/fc-rootfs-*`. Exit 1 (filesystem cleaned) is tolerated. |
+| `priv.read_ext4_file` | Read a file from inside an ext4 image via `debugfs cat`. | Image path must be inside `/var/lib/banger` or a managed DM device. Guest path is rejected if it contains debugfs-hostile chars (`"`/`\`/newline). |
+| `priv.write_ext4_files` | Batch write files into an ext4 image, root:root, mode-controlled. | Same image-path validator. |
+| `priv.resolve_firecracker_binary` | Stat and return the firecracker binary path. | Path is opened with `O_PATH \| O_NOFOLLOW` (refusing symlinks) and Fstat'd through the resulting fd: must be a regular file, executable, root-owned, not group/world-writable. |
+| `priv.launch_firecracker` | Start the firecracker process for a VM (jailer-wrapped). | Socket and vsock paths must be inside `/run/banger`. Log/metrics/kernel/initrd paths must be inside `/var/lib/banger`. Tap name must be banger-prefixed. Drives must be inside the state dir or be a `/dev/mapper/fc-rootfs-*` device. Jailer chroot base must be inside the system state/runtime dirs; jailer UID/GID must equal the registered owner. Binary must pass the same root-owned-executable check. |
+| `priv.ensure_socket_access` | `chown` and `chmod 0600` on a firecracker API or vsock socket so the owner user can talk to it. | Path must be inside `/run/banger` and not a symlink. The helper opens it with `O_PATH \| O_NOFOLLOW`, refuses anything that isn't a unix socket, and chmod/chown via the resulting fd (no symlink-follow). The local-priv fallback uses `chown -h`. |
+| `priv.cleanup_jailer_chroot` | Detach every mount under the per-VM jailer chroot via direct `umount2(MNT_DETACH \| UMOUNT_NOFOLLOW)` syscalls (deepest-first), then `rm -rf` the tree. | Path must be inside the system state/runtime dirs and not a symlink — including no symlinks at intermediate components (resolved with `EvalSymlinks` and re-checked). `UMOUNT_NOFOLLOW` makes the unmounts symlink-safe even if a path is swapped after validation. A `findmnt` guard refuses to `rm -rf` if any mount remains underneath. |
+| `priv.find_firecracker_pid` | Resolve a firecracker PID by API socket path. | Filters to processes whose cmdline mentions the requested API socket. |
+| `priv.kill_process` / `priv.signal_process` | Send SIGKILL or a named signal to a PID. | PID must refer to a running process whose `/proc/<pid>/cmdline` mentions `firecracker`. |
+| `priv.process_running` | Check whether a PID is alive (no host mutation). | Read-only; same cmdline filter. |

 Anything outside this list returns `unknown_method` and is logged. The
 helper does not run a shell, does not exec helper scripts, and does
@@ -186,6 +197,38 @@ What `uninstall` does NOT do automatically:
 - It does not remove the owner user, the owner's home, or anything
 the user wrote into a guest from inside the guest.

+## Running outside the system install
+
+Everything above describes the supported deployment: `banger system
+install` lays down both systemd units and the helper takes over every
+privileged operation.
+
+It is also possible to run `bangerd` directly without installing the
+helper — the binary still works as a per-user daemon and shells `sudo
+-n` for each privileged operation it would otherwise hand off
+(`iptables`, `ip`, `mount`, `mknod`, `dmsetup`, `e2fsck`, `kill`,
+`chown -h`, `chmod`, `losetup`, `chown`, `chmod`, `firecracker`).
+This mode is intended for ad-hoc developer machines while iterating on
+banger itself.
+
+It carries a different trust model:
+
+- It needs `NOPASSWD` sudoers entries for the developer (otherwise
+every VM action prompts for a password).
+- Once those entries exist, **any** process running as the developer
+can invoke those commands with arbitrary arguments — banger's input
+validators only constrain what banger itself sends. They are no
+defence against a different program on the same account.
+- The helper's `SO_PEERCRED` boundary, the systemd hardening
+(`NoNewPrivileges`, `ProtectSystem=strict`, the narrow
+`CapabilityBoundingSet`), and the helper's own input validators are
+all bypassed.
+
+If you care about isolating banger's blast radius from anything else
+running as your user, use the system install. If you only need
+banger to work on your own dev box, the non-system mode is fine —
+just don't run it on a shared or production host.
+
 ## Hardening of the systemd units

 The two units ship with restrictive defaults; they are written by
@@ -222,11 +265,16 @@ If you install banger as root, you are trusting:
 1. The two binaries banger drops under `/usr/local/bin` and the
 companion agent under `/usr/local/lib/banger`. These should match
 the build artifacts you reviewed.
-2. The path validators in
-`internal/roothelper/roothelper.go:validateManagedPath`,
-`validateTapName`, `validateDMName`, and `validateRootExecutable`
-to be tight. If those are bypassed, the helper would carry out a
-privileged op against an unmanaged path. They are unit-tested in
+2. The path/identifier validators in
+`internal/roothelper/roothelper.go` to be tight: `validateManagedPath`,
+`validateTapName`, `validateDMName`, `validateDMDevicePath`,
+`validateLoopDevicePath`, `validateDMRemoveTarget`,
+`validateDMSnapshotHandles`, `validateRootExecutable`,
+`validateNotSymlink`, `validateExt4ImagePath`,
+`validateLinuxIfaceName`, `validateIPv4`, `validateResolverAddr`,
+and `validateFirecrackerPID`. If any of these are bypassed, the
+helper would carry out a privileged op against an unmanaged
+target. They are unit-tested in
 `internal/roothelper/roothelper_test.go`.
 3. The Firecracker binary banger executes. The helper refuses to launch
 anything that isn't a regular, executable, root-owned, not