banger/internal
Thales Maciel fba30f26d4
firecracker: chown API + vsock sockets inside the sudo shell
Bug: Firecracker creates its API and vsock sockets as root:root 0700
(a consequence of the intentional umask 077 in buildProcessRunner).
The daemon, running as the invoking user, then can't connect(2) to
either: AF_UNIX connect needs write permission on the socket file,
and a root-owned 0700 socket leaves the invoking user (thales in this
repro) without any.
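
The permission reasoning above can be sketched as a small Go model.
This is a simplified illustration, not banger or kernel code:
canConnect is a hypothetical helper, uid/gid 1000 stands in for the
invoking user, and ACLs, capabilities, and directory search
permission are deliberately ignored.

```go
package main

import (
	"fmt"
	"os"
)

// canConnect models the write-permission check that connect(2) applies to
// an AF_UNIX socket file. Hypothetical helper: it ignores ACLs, capabilities
// other than uid 0, and search permission on the parent directories.
func canConnect(mode os.FileMode, ownerUID, ownerGID, callerUID uint32, callerGIDs []uint32) bool {
	if callerUID == 0 {
		return true // root bypasses the mode bits
	}
	perm := mode.Perm()
	if callerUID == ownerUID {
		return perm&0o200 != 0 // owner write bit
	}
	for _, g := range callerGIDs {
		if g == ownerGID {
			return perm&0o020 != 0 // group write bit
		}
	}
	return perm&0o002 != 0 // other write bit
}

func main() {
	// Before the fix: socket is root:root 0700, caller is uid 1000.
	fmt.Println(canConnect(0o700, 0, 0, 1000, []uint32{1000}))
	// After chown to the invoking user: same mode, different owner.
	fmt.Println(canConnect(0o700, 1000, 1000, 1000, []uint32{1000}))
}
```

The mode never changes between the two calls; only ownership does,
which is why a chown alone is enough to unblock the connect.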

firecracker-go-sdk's Machine.Start() blocks on waitForSocket, which
probes the socket with both os.Stat (succeeds — parent dir is the
user's XDG_RUNTIME_DIR) and an HTTP GET over the socket (fails —
EACCES on connect). The SDK loops for 3 seconds then fails with
"Firecracker did not create API socket ... context deadline exceeded".

The daemon's EnsureSocketAccess chown was meant to fix permissions,
but it runs *after* Machine.Start returns, and Start cannot return
successfully because it is stuck in the SDK's probe until the timeout
fires. Chicken-and-egg.

Fix: inside the sudo'd shell that launches firecracker, spawn a
background subshell that polls for each expected socket (API + vsock,
when configured) and chowns it to $SUDO_UID:$SUDO_GID as soon as it
appears. The background polling is bounded at 1s (20 × 50ms) so a
broken firecracker invocation doesn't leak a waiting shell.

Post-fix: socket appears root-owned 0700 briefly, is chowned to the
invoking user within ~50ms, SDK's HTTP probe succeeds, Machine.Start
returns normally. EnsureSocketAccess's later chmod 600 remains the
belt-and-braces guarantee on final mode.

Verified: manual repro of the shell script produces a socket owned
by thales:thales that a non-root python socket.connect() accepts.
Without the fix the same setup gives "PermissionError: [Errno 13]
Permission denied".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 16:09:02 -03:00
Name         Last commit  Date
api          vm run: ship tracked files only by default; add --include-untracked + --dry-run  2026-04-21 19:53:17 -03:00
buildinfo    Stamp shared build metadata into banger binaries  2026-03-22 17:14:06 -03:00
cli          noteUntrackedSkipped: fix subdir underreport + be best-effort everywhere  2026-04-22 12:42:33 -03:00
config       config: harden resolveSSHKeyPath against relative paths + drop stray test keys  2026-04-22 14:31:11 -03:00
daemon       docs: resync package docs, AGENTS, and kernel-catalog with current code  2026-04-22 13:01:11 -03:00
firecracker  firecracker: chown API + vsock sockets inside the sudo shell  2026-04-22 16:09:02 -03:00
guest        ssh: trust-on-first-use host key pinning everywhere  2026-04-19 16:46:03 -03:00
guestconfig  Refactor VM lifecycle around capabilities  2026-03-18 19:28:26 -03:00
guestnet     Stop using kernel IP autoconfig for runtime VMs  2026-03-21 21:54:18 -03:00
hostnat      coverage: medium batch — hostnat runner, store guest-sessions, daemon helpers  2026-04-18 18:03:37 -03:00
imagecat     publish-golden-image: content-addressed tarball names  2026-04-18 15:26:57 -03:00
imagepull    imagepull/BuildExt4: omit positional fs-size; rely on file truncation  2026-04-18 14:58:42 -03:00
kernelcat    Prune legacy void/alpine + customize.sh flows  2026-04-18 15:39:53 -03:00
model        config + store: remove dead knobs and stale schema  2026-04-22 10:54:01 -03:00
namegen      coverage: make targets + close zero-cov gaps (namegen, sessionstream)  2026-04-18 17:44:37 -03:00
paths        runtime sockets: close the local-user race window around control-plane creation  2026-04-20 12:53:47 -03:00
policy       Add vsock-backed VM port inspection  2026-03-19 15:52:11 -03:00
rpc          Propagate RPC cancellation to daemon requests  2026-03-16 18:28:33 -03:00
store        doctor: open the state DB read-only so inspection never mutates it  2026-04-22 11:05:23 -03:00
system       coverage: easy-wins batch across cli, system, paths, vmdns, toolingplan  2026-04-18 17:57:05 -03:00
toolingplan  coverage: easy-wins batch across cli, system, paths, vmdns, toolingplan  2026-04-18 17:57:05 -03:00
vmdns        coverage: easy-wins batch across cli, system, paths, vmdns, toolingplan  2026-04-18 17:57:05 -03:00
vsockagent   Add vsock-backed VM port inspection  2026-03-19 15:52:11 -03:00