firecracker: chown API + vsock sockets inside the sudo shell

Bug: Firecracker, launched via sudo, creates its API and vsock sockets
as root:root 0700 (a result of the intentional umask 077 in
buildProcessRunner). The daemon, running as the invoking user, then
can't connect(2) to either: AF_UNIX connect needs write permission on
the socket file, and a root-owned 0700 socket gives thales none.
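
The permission arithmetic behind this, as a minimal Go sketch assuming Linux semantics: bind(2) creates a pathname AF_UNIX socket with mode 0777 masked by the umask, and connect(2) needs the write bit of whichever class (owner, group, other) the caller falls into. socketMode and canConnect are hypothetical helpers, and canConnect is a deliberately simplified model (root always passes; supplementary groups, capabilities and ACLs are ignored), not code from this repo.

```go
package main

import (
	"fmt"
	"os"
)

// socketMode returns the permission bits Linux gives a freshly bound
// AF_UNIX socket file: 0777 with the process umask cleared out.
func socketMode(umask os.FileMode) os.FileMode {
	return 0o777 &^ umask
}

// canConnect is a simplified model of the permission check connect(2)
// makes on a socket file: the caller needs the write bit of the one
// class (owner, group or other) that applies to it. Root always
// passes; supplementary groups, capabilities and ACLs are ignored.
func canConnect(mode os.FileMode, ownerUID, ownerGID, uid, gid int) bool {
	switch {
	case uid == 0:
		return true
	case uid == ownerUID:
		return mode&0o200 != 0
	case gid == ownerGID:
		return mode&0o020 != 0
	default:
		return mode&0o002 != 0
	}
}

func main() {
	mode := socketMode(0o077)
	fmt.Printf("mode under umask 077: %#o\n", mode) // 0700

	// Daemon running as uid/gid 1000, socket still root:root.
	fmt.Println("before chown:", canConnect(mode, 0, 0, 1000, 1000)) // false
	// Same mode after chown to $SUDO_UID:$SUDO_GID (1000:1000 here).
	fmt.Println("after chown:", canConnect(mode, 1000, 1000, 1000, 1000)) // true
}
```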

firecracker-go-sdk's Machine.Start() blocks on waitForSocket, which
probes the socket with both os.Stat (succeeds, since the parent dir is
the user's XDG_RUNTIME_DIR) and an HTTP GET over the socket (fails with
EACCES on connect). The SDK keeps retrying for 3 seconds, then fails
with "Firecracker did not create API socket ... context deadline
exceeded".

The daemon's EnsureSocketAccess chown was meant to fix permissions,
but it runs *after* Machine.Start returns, and Start never returns
successfully because it's stuck in the SDK's probe loop until the
timeout. Chicken-and-egg.

Fix: inside the sudo'd shell that launches firecracker, spawn a
background subshell that polls for each expected socket (API, plus
vsock when configured) and chowns it to $SUDO_UID:$SUDO_GID as soon as
it appears. Each socket's poll is bounded at 1s (20 × 50ms), so a
broken firecracker invocation doesn't leak a waiting shell.

Post-fix: the socket briefly appears root-owned 0700, is chowned to
the invoking user within ~50ms, the SDK's HTTP probe succeeds, and
Machine.Start returns normally. EnsureSocketAccess's later chmod 600
remains the belt-and-braces guarantee on the final mode.

Verified: a manual repro of the shell script produces a socket owned
by thales:thales that a non-root Python socket.connect() accepts.
Without the fix, the same setup gives "PermissionError: [Errno 13]
Permission denied".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thales Maciel 2026-04-22 16:09:02 -03:00
parent 60f90eb8be
commit fba30f26d4
No known key found for this signature in database
GPG key ID: 33112E6833C34679
2 changed files with 74 additions and 14 deletions


@@ -184,17 +184,47 @@ func defaultDriveID(drive DriveConfig, fallback string) string {
 }
 
 func buildProcessRunner(cfg MachineConfig, logFile *os.File) *exec.Cmd {
-	// umask 077 so the API + vsock sockets firecracker creates are
-	// mode 0600 from birth (owned by root since we invoke via sudo).
-	// A follow-up chown in fcproc.EnsureSocketAccess transfers
-	// ownership to the invoking user. Without this, the sockets
-	// would briefly exist world-readable/writable between firecracker
-	// creating them and the daemon tightening the mode — a real
-	// window for a local attacker to hit the control plane.
-	script := "umask 077 && exec " + shellQuote(cfg.BinaryPath) +
+	// Two moving parts, run inside a single sudo'd shell:
+	//
+	// 1. umask 077 + exec firecracker → the API and vsock sockets
+	//    firecracker creates are born 0700 owned by root, not 0755.
+	//    Without the umask, a local attacker has a window to hit the
+	//    control plane. Use `;`, not `&&`: `umask 077 && (...) &`
+	//    would apply the umask only inside the backgrounded list.
+	//
+	// 2. A background subshell polls for each expected socket and
+	//    chowns it to $SUDO_UID:$SUDO_GID as soon as it appears.
+	//
+	// The chown must land *before* the firecracker-go-sdk's
+	// waitForSocket returns from Machine.Start: the SDK does an os.Stat
+	// and an HTTP GET over the socket, and AF_UNIX connect(2) needs
+	// write permission on the socket file. At 0700 root:root, the
+	// daemon (running as the invoking user) gets EACCES and the SDK
+	// loops until its 3s timeout; the post-Start EnsureSocketAccess
+	// chown never gets a chance to run because Start doesn't succeed.
+	//
+	// Racing the chown inside sudo's shell closes the gap: by the
+	// time the SDK's HTTP probe fires, the socket is already owned by
+	// the invoking user.
+	chownWatcher := func(path string) string {
+		// Bounded poll: 20 × 50ms = 1s per socket, well inside the
+		// SDK's 3s wait budget. Bails quietly if firecracker never
+		// creates the socket (e.g. bad args); the error surfaces
+		// through firecracker's non-zero exit.
+		return `for _ in $(seq 1 20); do [ -S ` + shellQuote(path) + ` ] && break; sleep 0.05; done; ` +
+			`[ -S ` + shellQuote(path) + ` ] && chown "$SUDO_UID:$SUDO_GID" ` + shellQuote(path) + ` || true`
+	}
+	watchers := chownWatcher(cfg.SocketPath)
+	if strings.TrimSpace(cfg.VSockPath) != "" {
+		watchers += "; " + chownWatcher(cfg.VSockPath)
+	}
+	script := "umask 077; (" + watchers + ") & exec " + shellQuote(cfg.BinaryPath) +
 		" --api-sock " + shellQuote(cfg.SocketPath) +
 		" --id " + shellQuote(cfg.VMID)
-	cmd := exec.Command("sudo", "-n", "sh", "-c", script)
+	// sudo sets SUDO_UID / SUDO_GID for the command on its own, so the
+	// background subshell gets them with or without -E; -E is kept
+	// because it is already the convention in this codebase.
+	cmd := exec.Command("sudo", "-n", "-E", "sh", "-c", script)
 	cmd.Stdin = nil
 	if logFile != nil {
 		cmd.Stdout = logFile