* model.ParseSize / FormatSizeBytes: pinned with table tests in
internal/model/types_test.go (TestParseSize 22 cases,
TestFormatSizeBytes 11 cases, TestParseSizeFormatRoundTrip 7
boundaries). Fixed the long-suffix regression: "4GiB", "512MiB",
"4KiB" now parse correctly (parser strips trailing IB before
inspecting the unit byte). Pinned current behaviour for
no-suffix input ("1024" treated as MiB) and FormatSizeBytes(0).
commands_image.go --size flag-help updated to show 4GiB now
that the parser accepts it.
* vm ports --json: matches the JSON-vs-table inconsistency between
vm stats (always JSON) and vm ports (always table). --json on
vm ports flips to the same printJSON path as vm stats. Default
table output unchanged. Other vm subcommands (show, stats,
logs, health, ping) didn't fit the identical pattern; left
alone.
* docs/oci-import.md architecture section moved to a new
docs/oci-import-internals.md (precedent: internal/daemon/
ARCHITECTURE.md). User-facing oci-import.md keeps a one-line
pointer for advanced reading.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brings in commit 003b048 from the agent-2 worktree:
- CLI help text + completers (image pull / kernel pull / vm stats /
vm set / --from flag).
- README golden-image definition + Requirements block above
Quick Start.
- Code hygiene: drop emptyDash; consolidate formatBytes into
humanSize.
- Logger downgrades for per-RPC INFO chatter.
A pre-release audit collected ~12 trivial-effort UX and code-hygiene
items. Rolling them up here so the v0.1.0 commit log isn't littered
with one-line tweaks.
CLI help / completion:
* commands_image.go: drop dangling reference to a `banger image
catalog` subcommand that doesn't exist; replace with a pointer
to `banger image list`.
* commands_image.go: --size flag example was "4GiB" but the parser
rejects that suffix. Change example to "4G". (Parser-side fix
is in a separate concern.)
* commands_image.go + completion.go: image pull now wires a
catalog completer (falls back to local image names since there's
no image-catalog RPC yet); image show / delete / promote already
completed local names.
* commands_kernel.go + completion.go: kernel pull now wires a new
completeKernelCatalogNameOnlyAtPos0 backed by the kernel.catalog
RPC, so tab-complete suggests pullable kernels.
* commands_vm.go: vm stats and vm set now have Long + Example
blocks (peers all do); --from flag description updated to spell
out the relationship to --branch.
README:
* Define "golden image" inline at first use.
* Add a one-line Requirements block above Quick Start so users
hit the firecracker / KVM dependency before `make build`.
Code hygiene:
* dashIfEmpty / emptyDash were the same function. Deleted
emptyDash, retargeted three call sites.
* formatBytes (introduced today in image cache prune) duplicated
humanSize. Consolidated to humanSize, now with a space ("1.2
GiB" not "1.2GiB"). formatters_test.go expectations updated.
Logging chattiness:
* "operation started" (logger.go), "daemon request canceled"
(daemon.go), and "helper rpc completed" (roothelper.go) all
fired at INFO per RPC. Downgraded to DEBUG so routine shell
completions don't spam syslog.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A pre-release audit caught three places where the docs misrepresent
the trust model. Each is a claim users would read while auditing
banger and reach the wrong conclusion.
* docs/privileges.md:140, 194 — bridge default was documented as
"banger0" but the code default (model.DefaultBridgeName) is
"br-fc". A user following the manual-removal recipe would `ip
link del banger0` against a non-existent interface.
* docs/privileges.md:192 — uninstall recipe said "stop your VMs
first via `banger vm stop --all`". That flag doesn't exist; vm
stop is a per-name action. Replaced with the actual options:
`banger vm prune` (bulk) or per-VM `banger vm stop <name>`.
* docs/privileges.md:255 and README.md:78-79 — helper unit's
CapabilityBoundingSet was listed as 5 caps; the actual set in
commands_system.go:370 is 11 (we added FOWNER/KILL/MKNOD/SETGID/
SETUID/SYS_CHROOT during Phase B and never updated the docs).
Updated both lists; the "what's NOT included" rationale stays
accurate against the new positive list.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OCI layer blobs accumulate forever — every pull writes layers to
~/.cache/banger/oci/blobs/sha256/<hex> via go-containerregistry's
filesystem cache, and nothing ever evicts them. The cache is purely
a re-pull-avoidance (every flattened image is independent of the
blobs that sourced it), so it's a perfect candidate for an opt-in
operator-driven prune.
New surface:
* api: ImageCachePruneParams{DryRun}, ImageCachePruneResult
{BytesFreed, BlobsFreed, DryRun, CacheDir}.
* daemon: ImageService.PruneOCICache walks layout.OCICacheDir for
a (bytes, blobs) tally, then — outside dry-run — atomically
renames the cache aside, recreates it empty, and rm -rf's the
aside dir. The rename-then-rm avoids leaving the cache in a
half-removed state if a pull starts mid-prune (the in-flight
pull's open files survive the rename via standard Linux
semantics; it just sees a fresh empty cache afterwards). Missing
cache dir is treated as zero — fresh installs that have never
pulled an OCI image don't error.
* dispatch: image.cache.prune RPC (paramHandler-wrapped, mirroring
every other image RPC). Documented-methods test list updated.
* cli: `banger image cache` group with a `prune` subcommand
(--dry-run flag). Output is a single line: "freed 1.2 GiB
across 47 blob(s) in /var/cache/banger/oci" or "would free …".
formatBytes helper for the size pretty-print.
docs/oci-import.md: replaced the "Tech debt: cache eviction" bullet
with a "Cache lifecycle" section describing the new command and
the in-flight-pull caveat.
Tests: PruneOCICache covers the happy path (real prune empties the
cache, recreates an empty dir, doesn't leak the .pruning- aside),
the dry-run path (returns size, leaves blobs intact), and the
fresh-install path (cache dir absent → zero result, no error).
Smoke at JOBS=4 still green; live exercise against an empty cache
on a system install prints the expected zero summary.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
priv.ensure_bridge / priv.create_tap accepted the daemon's network
config triple (BridgeName, BridgeIP, CIDR) and forwarded it straight
to `ip link` / `ip addr` / `ip link set master`. Argv-style exec
ruled out shell injection, but the kernel happily honours those
commands against any iface a compromised owner-uid daemon names —
including eth0/docker0/lo. Concretely:
* priv.ensure_bridge could `ip link set <iface> up` against any
host interface and `ip addr add` arbitrary IP/CIDR to it.
* priv.create_tap could `ip link set <new-tap> master <iface>`,
bridging the per-VM tap into the host's primary LAN so the
guest sees host-local broadcast traffic.
* priv.sync_resolver_routing / priv.clear_resolver_routing only
enforced "name shaped like a Linux iface" — no banger constraint.
New validators (single chokepoint via validateNetworkConfig):
* validateBangerBridgeName: name must equal "br-fc" or start with
"br-fc-". Stops a compromised daemon from naming any host iface
in these RPCs. Users with a custom bridge keep the prefix.
* validateCIDRPrefix: numeric in [8, 32]. Wider prefixes would
silently widen the bridge subnet beyond what the daemon intends.
* validateNetworkConfig bundles bridge-name + validateIPv4 +
validateCIDRPrefix so every helper RPC that takes the triple
stays in lockstep.
Wired into methodEnsureBridge, methodCreateTap, and the resolver-
routing pair (replacing the older validateLinuxIfaceName-only check
with the stricter banger-bridge check).
docs/privileges.md updated: the helper-RPC table rows now spell out
the banger-managed bridge constraint, and the trust list includes
the new validators.
Tests: TestValidateBangerBridgeName (default + suffixed accepted,
host ifaces / wrong prefix / oversized rejected), TestValidate
CIDRPrefix (boundary + non-numeric + IPv6-style 64 rejected),
TestValidateNetworkConfig (happy path + each-field-bad cases).
Smoke at JOBS=4 still green — banger's defaults sail through the
new gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both Fetch flows previously streamed resp.Body straight into
zstd → tar → on-disk extractor with the SHA256 check tacked on at
the END. A bad mirror or an attacker that's compromised the catalog
host could ship a multi-gigabyte tarball, watch banger expand it to
disk, and only THEN see the helpful "sha256 mismatch" message —
having already filled the host filesystem.
Reorder the operations: stage the compressed tarball to a temp file
under the destination directory through an io.LimitReader (cap +1
bytes), hash on the way in, refuse to decompress if either the cap
trips or the SHA mismatches. Worst-case disk use is bounded by the
cap, not by the source.
Cap is exposed as a package var (MaxFetchedBundleBytes,
MaxFetchedKernelBytes) so callers can tune per-deployment and tests
can squeeze it down to provoke the rejection. Default 8 GiB —
generous enough for a 4 GiB rootfs (which compresses to ~1-2 GiB),
tight enough to make a "fill the host disk" attack expensive.
The temp file lives in the destination dir so extraction stays on
the same filesystem and we don't pay for cross-FS rename. defer
os.Remove cleans up; the existing per-package cleanup() handler
still removes any partial extraction on hash mismatch / extraction
failure.
Tests: each package gets a TestFetchRejectsOversizedTarballBefore
Extraction that sets the cap to 64 bytes, points Fetch at a multi-KB
tarball, and asserts (a) error mentions "cap", (b) destination dir
is left clean (no leaked rootfs / manifest / kernel tree). All
existing tests still pass — happy path, hash mismatch, missing
files, path traversal, HTTP error, etc.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
validateFirecrackerPID was a substring check on /proc/<pid>/cmdline:
"contains 'firecracker'". Good enough to refuse init/sshd/the test
binary, but on a shared host where multiple users run firecracker
the helper would happily SIGKILL someone else's VM. The owner-UID
daemon could weaponise the helper as an arbitrary "kill any
firecracker on this box" primitive.
Replace the substring gate with two stronger acceptance modes:
* Cgroup match (the supported path): /proc/<pid>/cgroup contains
bangerd-root.service. systemd assigns every direct child of the
helper unit into that cgroup at fork; the kernel keeps it there
for the process's lifetime, so no daemon-UID code can forge it.
Other users' firecracker processes live in different cgroups
(user@<uid>.service, foreign service slices) and fail this
check. Also robust across helper restarts: KillMode=control-group
on the unit kills children when the service goes down, so an
"orphan banger firecracker in some other cgroup" is rare by
construction.
* --api-sock fallback: cmdline carries `--api-sock <path>` with
the path under banger's RuntimeDir. Covers the legacy direct
(no-jailer) launch path, and gives daemon reconcile a way to
clean up the rare orphan that lands outside the service cgroup
after a hard helper crash.
Tried /proc/<pid>/root first — pivot_root semantics make jailer'd
firecracker read its root as "/" from any namespace, so the symlink
is useless as a banger-managed fingerprint. Cgroup is the right
signal.
Also added a signal allowlist: priv.signal_process now rejects
anything outside {TERM, KILL, INT, HUP, QUIT, USR1, USR2, ABRT}
(case-insensitive, with or without SIG prefix). STOP/CONT, real-time
signals, and numeric forms are refused — the helper running as root
must not be a generic "send arbitrary signal to my pid" primitive.
priv.kill_process is unaffected (it always sends KILL).
Tests: validateSignalName covers allowlist + numeric/STOP/RTMIN
rejection; extractFirecrackerAPISock pins the three flag forms
(--api-sock VAL, --api-sock=VAL, -a VAL); pathIsUnder gets a small
table; existing TestValidateFirecrackerPID still rejects PID 0,
PID 1, and the test process itself. Doctor's non-system-mode test
gained a t.TempDir-backed install path so it stops being
environment-dependent on machines that happen to have
/etc/banger/install.toml.
Smoke at JOBS=4 still green — every banger-launched firecracker
sails through the cgroup match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
validateManagedPath was textual-only: filepath.Clean + dest-prefix
match. That stopped `..` escapes but not the symlink-bypass attack
that motivated this fix — a daemon-UID attacker can write into
StateDir/RuntimeDir (it's their UID), so they can plant
`<StateDir>/redirect -> /etc` and any helper RPC that then operates
on `<StateDir>/redirect/...` resolves through the symlink at the
kernel and lands at /etc/... on the host.
Concretely the leaks this closed:
* priv.create_dm_snapshot: rootfs/cow paths fed to losetup —
losetup follows the symlink and attaches a host block device.
* priv.launch_firecracker: kernel/initrd paths hard-linked into
the chroot via `ln -f` — link(2) on Linux follows source
symlinks, hard-linking host files into the jail.
* priv.read_ext4_file / priv.write_ext4_files: image paths fed
to debugfs / e2cp as root.
* validateLaunchDrivePath: drive paths mknod'd or hard-linked.
* validateJailerOpts: chroot base.
Fix: after the existing prefix match, walk every component below
the matched root with Lstat. Any existing symlink — leaf or
intermediate — fails the validator. ENOENT is tolerated because
several callers pass paths firecracker/the helper materialise
later (sockets, log files, kernel hard-link targets); whoever
materialises them goes through the same validation when the
helper-side primitive runs.
Subsumes most of validateNotSymlink's coverage but the explicit
call sites (methodEnsureSocketAccess, methodCleanupJailerChroot)
keep their belt-and-braces check — those paths must EXIST and
not be symlinks, which validateNotSymlink enforces strictly while
the broadened validateManagedPath tolerates ENOENT.
Race-free in practice: helper RPCs are short and the validator
fires on the same kernel state the next syscall sees. The helper
loop processes RPCs serially per-connection, and the validator
plus the syscall both run as root within microseconds of each
other.
Four new tests cover symlink leaf, symlink intermediate, missing
leaf (must pass), and the plain happy path. Smoke at JOBS=4 still
green — every legitimate daemon-supplied path passes the walk.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
safeJoin previously did textual cleaning + dest-prefix check only.
That's enough to catch `../escape`, but not the symlink-ancestor
attack: a malicious OCI layer plants `etc -> /tmp/probe`, a later
layer writes/deletes/hardlinks against `etc/anything`, and the kernel
silently dereferences the symlink so the operation lands at
`/tmp/probe/anything` on the host.
The daemon runs flatten as the owner UID, so anywhere that UID can
write becomes a write target; anywhere it can delete (e.g. its own
home) becomes a delete target. Whiteouts and hardlinks make this
worse — a whiteout for `etc/.wh.victim` would `RemoveAll` the host
file `/tmp/probe/victim`, and a TypeLink would expose host files
inside the extracted rootfs.
safeJoin now Lstat-walks every intermediate component of the joined
path against the already-extracted tree, refusing if any ancestor is
a symlink. Walking is race-free against the extraction loop because
we process tar entries serially. Leaf components stay caller-owned
(TypeSymlink writes legitimately want a symlink leaf; TypeReg
RemoveAll's any prior leaf before opening; etc.).
Three new tests pin the protection: write through a symlinked
ancestor, whiteout through a symlinked ancestor, and hardlink target
through a symlinked ancestor — each must fail and leave the host
probe path untouched.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both packages had zero tests before this change. The helpers in them
are pure (imagemgr) or scripted-runner-friendly (dmsnap), so they're
cheap to pin and worth catching regressions on.
imagemgr/paths_test.go:
* DebianBasePackages returns a defensive copy (mutating the result
can't poison subsequent calls — important because hashPackages
digests this list).
* BuildMetadataPackages stays in lockstep with DebianBasePackages.
* hashPackages is order-sensitive and includes a trailing newline
in its canonical join (regression guard for any future "sort the
list before hashing" temptation that would invalidate every
on-disk hash).
* StageOptionalArtifactPath returns "" for empty/whitespace input
and joins by name otherwise.
* WritePackagesMetadata writes <rootfs>.packages.sha256 with the
expected hash, no-ops on empty rootfs path or empty package list.
* DebianBasePackages contains the small critical-package floor
(ca-certificates, curl, git) so a future apt-list trim can't
silently drop them.
dmsnap/dmsnap_test.go:
* Create runs losetup base, losetup cow, blockdev getsz, dmsetup
create in that order, with a snapshot table referencing the loops
in (base, cow) order — a swap would corrupt every VM.
* Create's failure path unwinds with losetup -d on cow then base.
* Cleanup tears down dmsetup before losetup (otherwise dmsetup sees
EBUSY against vanished backing devices).
* Cleanup falls back to DMDev when DMName is empty.
* Cleanup tolerates "No such device" on losetup -d (idempotent
re-run after a partial cleanup).
* Cleanup surfaces non-missing losetup errors (the tolerance is
narrow on purpose).
* Remove returns nil on a missing target and surfaces non-retryable
errors immediately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
README previously punted on the config schema with a "full key list in
internal/config/config.go" pointer. New docs/config.md walks every
TOML key the daemon reads — top-level, [vm_defaults], [[file_sync]] —
with type, default, and a one-sentence description per row, plus a
copy-pasteable example at the bottom.
Sourced 1:1 from internal/config/config.go's fileConfig (and the
defaults in load() + internal/model/types.go), so it stays accurate
as long as those structs are the schema source of truth.
README's existing config section now points at docs/config.md, and
the "Further reading" list gets it as the first bullet.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The filesystem-mutations table referred to `~/.config/banger/banger.toml`,
but the daemon reads `~/.config/banger/config.toml` (per
internal/config/config.go and README.md). Bring privileges.md in line.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`.githooks/pre-commit` runs `make lint test build` on every commit,
catching unformatted Go (`gofmt -l`), `go vet` regressions, shellcheck
errors on scripts/, broken unit tests, and broken builds before they
reach the index. Activate per-clone with `make install-hooks`, which
points `core.hooksPath` at `.githooks/`. Bypass for in-flight WIP
commits with `git commit --no-verify`.
The hook directory is tracked in git (unlike .git/hooks/) so a clone
+ `make install-hooks` is enough to opt in; no per-machine
hand-installation. .PHONY and the help line both list the new
target.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stats and Workspace fields landed in 6b543cb with column alignment
that gofmt wants to pull tighter; rerun gofmt so the new pre-commit
hook's `gofmt -l` gate passes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`docs/privileges.md` now documents what the install promises (helper +
daemon services active, sockets at 0600 ownerUID, units carrying the
hardening directives, firecracker root-owned + non-writable). Doctor
verifies the running install matches: drift between the doc and the
filesystem would silently weaken the trust model otherwise.
In system mode (install.toml present):
* helper service / owner daemon service: `systemctl is-active`.
* helper socket / daemon socket: stat-and-compare mode + uid against
the registered owner.
* helper unit hardening / daemon unit hardening: scan the rendered
unit for NoNewPrivileges, ProtectSystem=strict, ProtectHome
(=yes for the helper, =read-only for the daemon), RestrictSUIDSGID,
LockPersonality, and the helper's CapabilityBoundingSet line. The
daemon unit also pins User=<registered owner>.
* firecracker binary ownership: regular file, not a symlink, mode
not group/world writable, executable, owned by uid 0 — same
constraints validateRootExecutable enforces at launch, surfaced
once at doctor time so a misconfigured binary fails fast with a
clearer error than the helper's open-time rejection.
In non-system mode (no /etc/banger/install.toml) doctor emits a single
WARN row pointing at docs/privileges.md > 'Running outside the system
install'. A PASS would imply guarantees the install isn't actually
providing.
Tests cover both branches: the non-system warn pins its message
substrings; system-mode pins that every check name shows up; and the
helpers (socket-perms, unit-hardening, executable-ownership) have
direct table-style negative tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Defence-in-depth pass over every helper method that touches the host
as root. Each fix narrows what a compromised owner-uid daemon could
ask the helper to do; many close concrete file-ownership and DoS
primitives that the previous validators didn't reach.
Path / identifier validation:
* priv.fsck_snapshot now requires /dev/mapper/fc-rootfs-* (was
"is the string non-empty"). e2fsck -fy on /dev/sda1 was the
motivating exploit.
* priv.kill_process and priv.signal_process now read
/proc/<pid>/cmdline and require a "firecracker" substring before
sending the signal. Killing arbitrary host PIDs (sshd, init, …)
is no longer a one-RPC primitive.
* priv.read_ext4_file and priv.write_ext4_files now require the
image path to live under StateDir or be /dev/mapper/fc-rootfs-*.
* priv.cleanup_dm_snapshot validates every non-empty Handles field:
DM name fc-rootfs-*, DM device /dev/mapper/fc-rootfs-*, loops
/dev/loopN.
* priv.remove_dm_snapshot accepts only fc-rootfs-* names or
/dev/mapper/fc-rootfs-* paths.
* priv.ensure_nat now requires a parsable IPv4 address and a
banger-prefixed tap.
* priv.sync_resolver_routing and priv.clear_resolver_routing now
require a Linux iface-name-shaped bridge name (1–15 chars, no
whitespace/'/'/':') and, for sync, a parsable resolver address.
Symlink defence:
* priv.ensure_socket_access now validates the socket path is under
RuntimeDir and not a symlink. The fcproc layer's chown/chmod
moves to unix.Open(O_PATH|O_NOFOLLOW) + Fchownat(AT_EMPTY_PATH)
+ Fchmodat via /proc/self/fd, so even a swap of the leaf into a
symlink between validation and the syscall is refused. The
local-priv (non-root) fallback uses `chown -h`.
* priv.cleanup_jailer_chroot rejects symlinks at both the leaf
(os.Lstat) and intermediate path components (filepath.EvalSymlinks
+ clean-equality). The umount sweep was rewritten from shell
`umount --recursive --lazy` to direct unix.Unmount(MNT_DETACH |
UMOUNT_NOFOLLOW) per child mount, deepest-first; the findmnt
guard remains as the rm-rf safety net. Local-priv mode falls
back to `sudo umount --lazy`.
Binary validation:
* validateRootExecutable now opens with O_PATH|O_NOFOLLOW and
Fstats through the resulting fd. Rejects path-level symlinks and
narrows the TOCTOU window between validation and the SDK's exec
to fork+exec time on a healthy host.
Daemon socket:
* The owner daemon now reads SO_PEERCRED on every accepted
connection and refuses any UID that isn't 0 or the registered
owner. Filesystem perms (0600 + ownerUID) already enforced this;
the check is belt-and-braces in case the socket FD is ever
leaked to a non-owner process.
Docs:
* docs/privileges.md walked end-to-end. Each helper RPC's
Validation gate row reflects what the code actually enforces.
New section "Running outside the system install" calls out the
looser dev-mode trust model (NOPASSWD sudoers, helper hardening
bypassed) so users don't deploy that path on shared hosts.
Trust list updated to include every new validator.
Tests added: validators (DM-loop, DM-remove-target, DM-handles,
ext4-image-path, iface-name, IPv4, resolver-addr, not-symlink,
firecracker-PID, root-executable variants), the daemon's authorize
path (non-unix conn rejection + unix conn happy path), the umount2
ordering contract (deepest-first + --lazy on the sudo branch), and
positive/negative cases for the chown-no-follow fallback.
Verified end-to-end via `make smoke JOBS=4` on a KVM host.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each VM's firecracker now runs inside a per-VM chroot dropped to the
registered owner UID via firecracker-jailer. Closes the broad ambient-
sudo escalation surface that survived Phase A: the helper still needs
caps for tap/bridge/dm/loop/iptables, but the VMM itself no longer
runs as root in the host root filesystem.
The host helper stages each chroot up front: hard-links the kernel
and (optional) initrd, mknods block-device drives + /dev/vhost-vsock,
copies in the firecracker binary (jailer opens it O_RDWR so a ro bind
fails with EROFS), and bind-mounts /usr/lib + /lib trees read-only so
the dynamic linker can resolve. Self-binds the chroot first so the
findmnt-guarded cleanup can recurse safely.
AF_UNIX sun_path is 108 bytes; the chroot path easily blows past that.
Daemon-side launch pre-symlinks the short request socket path to the
long chroot socket before Machine.Start so the SDK's poll/connect
sees the short path while the kernel resolves to the chroot socket.
--new-pid-ns is intentionally disabled — jailer's PID-namespace fork
makes the SDK see the parent exit and tear the API socket down too
early.
CapabilityBoundingSet for the helper expands to add CAP_FOWNER,
CAP_KILL, CAP_MKNOD, CAP_SETGID, CAP_SETUID, CAP_SYS_CHROOT alongside
the existing CAP_CHOWN/CAP_DAC_OVERRIDE/CAP_NET_ADMIN/CAP_NET_RAW/
CAP_SYS_ADMIN.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the shell-string launcher in buildProcessRunner with a direct
exec.Command. The previous sh -c wrapper relied on shellQuote escaping
for every MachineConfig field that flowed into the launch script; any
future field that ever carried an attacker-controlled value would have
become RCE-as-root. The new path passes binary path and flags as
separate argv entries, so there is no shell to interpret anything.
The wrapper also did two things the shell can no longer do for us:
1. umask 077 — moved to syscall.Umask in cmd/bangerd/main.go so every
firecracker child (and any other file the daemon creates) inherits
0600 by default. Single-user dev sandbox state should be private.
2. chown_watcher — the SDK's HTTP probe inside Machine.Start connects
to the API socket the moment it appears. Under sudo the socket is
created root-owned and the daemon's connect(2) gets EACCES, so the
post-Start EnsureSocketAccess never runs. The shell papered over
this with a backgrounded chown loop. Replaced by
fcproc.EnsureSocketAccessForAsync: same race-window guarantee, in
pure Go, kicked off in LaunchFirecracker right before Start and
awaited right after.
Tests updated: shell-substring assertions replaced with cmd-arg
assertions, plus a new fcproc test pinning the async chown sequence.
Smoke (full systemd two-service install + KVM scenarios) passes.
Four targeted fixes from a race-condition audit of the daemon package.
None change behaviour on the happy path; each closes a window where a
concurrent or interrupted RPC could strand state on the host.
- KernelDelete now holds the same per-name lock as KernelPull /
readOrAutoPullKernel. Without it, a delete racing a concurrent
pull could remove files mid-write or land between the pull's
manifest write and its first use.
- cleanupRuntime no longer early-returns on an inner waitForExit
failure; DM snapshot, capability, and tap teardown always run and
every error is folded into the returned errors.Join. EBUSY against
a still-alive firecracker is benign and surfaces in the joined
error rather than stranding kernel state across daemon restarts.
- Per-name image / kernel pull locks switch from *sync.Mutex to a
1-buffered chan struct{}. Acquire is a select on ctx.Done(), so a
peer waiting behind a pull whose RPC was cancelled can bail out
instead of blocking forever on a pull nobody is consuming.
- setVMHandles writes the per-VM scratch file before updating the
in-memory cache. A daemon crash between the two now leaves disk
ahead of memory (recoverable: reconcile re-seeds the cache from
the file on next start) rather than memory ahead of disk (lost
handles → stranded DM/loops/tap).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three quality-of-life improvements now that the daemon-side races
that gated parallel mode are fixed:
1. **Smol VMs by default.** Smoke installs a tuned config.toml at
/etc/banger/config.toml between `system install` and `system
restart` so the respawned daemon picks up:
vcpu = 2
memory_mib = 1024
disk_size = "2G"
system_overlay_size = "2G"
Smoke scenarios assert behavior, not capacity — they don't need
4 vCPU / 8 GiB / 8 GiB / 8 GiB. Per-VM RAM cost drops from 8 GiB
to 1 GiB; nominal disk drops from 16 GiB to 4 GiB (sparse, so
actual use is small either way, but the new ceiling is gentler
on hosts that can't overcommit). Scenarios that test
reconfiguration (vm_set's --vcpu 2 → 4) still pass --vcpu
explicitly, so this default doesn't perturb their assertions.
2. **JOBS defaults to nproc.** The Makefile resolves JOBS to
`$(shell nproc)` if unset; the smoke script's existing cap of 8
keeps the parallel pool sane on bigger hosts. The script always
passes --jobs N now, so behavior is consistent. Override with
`make smoke JOBS=1` for a fully serial run.
3. **Help text catches up.** --help no longer flags parallelism as
experimental (the underlying daemon races are fixed) and now
describes the small-VM default. `make help` mentions the new
default and how to override.
Verified: `make smoke` (no JOBS) on a 32-core box auto-runs with
JOBS=8, smol VMs, 21/21 PASS in 172s.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three concurrency bugs surfaced by `make smoke JOBS=4` that all stem
from `vm.create` paths assuming single-caller semantics:
1. **Kernel auto-pull manifest race.** Parallel `vm.create` calls that
each need to auto-pull the same kernel ref both run kernelcat.Fetch
in parallel against the same /var/lib/banger/kernels/<name>/. Fetch
writes manifest.json non-atomically (truncate + write); the peer
reads it back mid-write and trips
"parse manifest for X: unexpected end of JSON input".
Fix: per-name `sync.Mutex` map on `ImageService` (kernelPullLock).
`KernelPull` and `readOrAutoPullKernel` both acquire it and re-check
`kernelcat.ReadLocal` after the lock so a peer who finished while we
waited is treated as success — `readOrAutoPullKernel` does NOT call
`s.KernelPull` because that path errors with "already pulled" on a
peer-success, which would be wrong for auto-pull. Different kernels
stay parallel.
2. **Image auto-pull race.** Same shape as the kernel race but on the
image side: parallel `vm.create` calls both run pullFromBundle /
pullFromOCI for the missing image (each ~minutes of OCI fetch +
ext4 build). The publishImage atom under imageOpsMu only protects
the rename + UpsertImage commit, so the loser does all the work
only to fail at the recheck with "image already exists".
Fix: per-name `sync.Mutex` map on `ImageService` (imagePullLock).
`findOrAutoPullImage` acquires it, re-checks FindImage, and only
then calls PullImage. Loser short-circuits with the
freshly-published image instead of redoing minutes of work.
PullImage's own publishImage recheck stays as defense-in-depth
for callers that bypass the auto-pull path.
3. **Work-seed refresh race.** When the host's SSH key has rotated
since an image was last refreshed, `ensureAuthorizedKeyOnWorkDisk`
triggers `refreshManagedWorkSeedFingerprint`, which rewrote the
shared work-seed.ext4 in place via e2rm + e2cp. Peer `vm.create`
calls doing parallel `MaterializeWorkDisk` rdumps observed a torn
ext4 image — "Superblock checksum does not match superblock".
Fix: stage the rewrite on a sibling tmpfile (`<seed>.refresh.<pid>-<ns>.tmp`)
and atomic-rename. Concurrent readers either have the file open
(kernel keeps the pre-rename inode alive) or open after the rename
(see the new inode) — never observe a partial state. Two parallel
refreshes are idempotent (same daemon, same SSH key) so unique tmp
names are enough; whichever rename lands last wins, with identical
content. UpsertImage runs after the rename so the recorded
fingerprint always matches what's on disk.
Plus one smoke harness fix: reclassify `vm_prune` from `pure` to
`global`. `vm prune -f` removes ALL stopped VMs system-wide, not just
the ones the scenario created — so a parallel peer scenario that
happens to have its VM in `created`/`stopped` momentarily gets wiped.
Moving prune to the post-pool serial phase keeps it from racing with
in-flight scenarios.
After all four fixes, `make smoke JOBS=4` passes 21/21 in 174s
(serial baseline 141s; the small overhead is the buffered-output and
`wait -n` semaphore cost — well worth the parallelism for fast-iter
work on a 32-core box).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`scripts/smoke.sh` was a 600-line linear script: no way to see what it
covers without reading the whole thing, and no way to run a single
scenario when iterating. Every iteration paid the full ~5-10 min suite,
which made fast feedback loops painful enough to avoid the suite.
Refactor into a registry + per-scenario functions:
- Top-of-file SMOKE_SCENARIOS (ordered) + SMOKE_DESCS (one-line desc per
scenario) + SMOKE_CLASS (pure / repodir / global) drive both listing
and dispatch. The 21 existing scenario blocks become scenario_<name>
functions. Bodies are the inline blocks verbatim, modulo the workspace
fixture move described below.
- New CLI: --list (cheap discovery, no install / no env-vars),
--scenario NAME (or NAME,NAME,...), --jobs N (parallel dispatch),
-h / --help.
- New setup_fixtures runs once after the install/doctor/restart preamble
and produces the throwaway git repo at $repodir that 'repodir'-class
scenarios consume. Lifted out of scenario_workspace_run so single-
scenario invocations (e.g. --scenario workspace_dryrun) get the
fixture even when the scenario that historically built it isn't
selected.
- Wipe ~/.local/state/banger/ssh/known_hosts in the install preamble.
`system uninstall --purge` clears /var/lib/banger but the user-side
known_hosts persists by design — and smoke creates VMs that reuse
guest IPs (172.16.0.2 etc.) with fresh host keys every run, so a
leftover entry trips StrictHostKeyChecking and the daemon's wait-
for-ssh sees only timeouts. This was the real cause of the "guest
ssh did not come up" flakes that surface across smoke iterations.
Parallel dispatch:
- --jobs N opts into a slot-limited pool: 'pure' scenarios fan out as
individual jobs; 'repodir' scenarios fuse into a single serial chain
(since they mutate $repodir in registry order); 'global' scenarios
run serially after the pool, one at a time.
- Cap is min(N, 8) — each parallel slot runs an 8 GiB VM, so RAM is
the binding constraint.
- Parallel-mode stdout/stderr per scenario buffer to per-scenario
logs and emit one PASS/FAIL line on completion; on FAIL the buffer
is dumped. Serial mode (--jobs 1, the default) keeps stdout
unbuffered exactly as before.
- Parallelism is documented as experimental in --help: it surfaces
real daemon-side concurrency bugs (image auto-pull manifest race,
work-seed-refresh race on the shared work-seed.ext4) that don't
appear in serial mode and that need their own fix in the daemon.
Serial (--jobs 1) is the reliable path; --jobs N is for fast-
iteration dev work where occasional re-runs are acceptable.
Exit codes: 0 ok, 1 assertion failed, 2 usage error (unknown
scenario, missing SCENARIO=), 77 explicit selection skipped (NAT
when sudo iptables is unavailable AND nat is the only selected
scenario; soft-skip otherwise).
Makefile additions:
- `make smoke-list` — cheap discovery, no smoke-build dep, no env vars.
- `make smoke-one SCENARIO=name` — single-scenario run, full preamble.
MAKECMDGOALS guard catches missing SCENARIO= before any rebuild.
- `make smoke JOBS=N` — passes through to the script's --jobs N.
- Help text covers all three.
Verified: serial full suite passes 21/21 in ~140s on this host;
make smoke-one SCENARIO=workspace_restart runs the recently-added
regression test alone in ~50s.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VM stop has been quietly losing data freshly written via
`vm workspace prepare`: stop+start of a workspace-prepared VM would
come back with /root/repo wiped on the work disk.
Root cause is firecracker + Debian's systemd defaults. FC's
SendCtrlAltDel (the only "graceful shutdown" action FC exposes) just
delivers the keystroke; what the guest does with it is its choice.
Debian routes ctrl-alt-del.target -> reboot.target, so the guest
reboots, FC stays alive, the daemon's 10s wait_for_exit window
expires, and the SIGKILL fallback drops anything still in FC's
userspace I/O path. For an idle VM that's invisible. For one that
just took 100s of small writes through a workspace prepare, it's
data loss.
Fix is to dial the guest over SSH inside StopVM and run
`sync; systemctl --no-block poweroff || /sbin/poweroff -f &` before
the existing SendCtrlAltDel path. The synchronous `sync` is the
load-bearing piece — it blocks until every dirty page hits virtio-blk
and lands in the on-host root.ext4. Whether poweroff completes
before SIGKILL fires is incidental; sync has already run. SSH
unreachable falls back to the old SendCtrlAltDel behaviour so a
broken-network guest can't make stop hang.
Bounded by a 5s SSH-dial timeout so a half-broken guest can't extend
the overall stop window past gracefulShutdownWait.
Also adds two smoke scenarios:
- `workspace + stop/start`: prepare -> stop -> start -> assert
marker survives. This is the regression that caught the bug.
- `vm exec`: end-to-end coverage for d59425a — auto-cd into the
prepared workspace, exit-code propagation, dirty-host warning,
--auto-prepare resync, refusal on stopped VM.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduces three interconnected features for persistent VM workflows:
1. `banger vm exec <vm> -- <cmd>`: runs a command in the prepared
workspace, automatically cd-ing into the guest path and wrapping
via `mise exec --` so mise-managed tools are on PATH. Falls back
to a plain exec when mise isn't available. Exit code propagates
verbatim.
2. Workspace persistence: workspace.prepare now stores the guest path,
host source path, and HEAD commit into a new `workspace_json` column
on the vms table (migration 3). This state survives daemon restarts
and informs both dirty-checking and auto-prepare.
3. Dirty detection: `vm exec` compares the stored HEAD commit against
the current host repo HEAD. When stale it warns and, with
--auto-prepare, re-syncs the workspace before running.
Also:
- WORKSPACE column added to `banger ps` / `vm list`
- `banger vm` quick reference updated with `vm exec` entry
vm run ./repo (and the explicit vm workspace prepare) imports the
host user's own checkout. Any .mise.toml that lands in the guest
would otherwise prompt on the first guest command — 'mise trust:
hash mismatch, run "mise trust"' — and stall what should be a
zero-friction sandbox launch. The repo just came from the host,
the guest is single-tenant root@<vm>.vm, the user already trusts
this checkout: auto-trust is the right default here.
After workspaceImportHook succeeds, run
if command -v mise >/dev/null 2>&1; then
mise trust --quiet --all <guest_path> || true
fi
inside the guest. Best effort: a missing mise binary, a non-zero
exit, or a no-op trust all log at debug only and never fail
prepare. The path is shell-quoted via ws.ShellQuote so guest
paths with spaces or quotes don't break the argument.
Tests pin the script shape (command -v guard + --quiet --all flag
+ trailing `|| true`) and assert the script actually fires after
a successful import. A path with an apostrophe round-trips via
ws.ShellQuote without truncation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three recovery-path errors were silently dropped:
- vm_lifecycle.go startVMLocked persisted the VMStateError record
with `_ = s.store.UpsertVM(...)`. If the persist failed the user
saw the original start error but operators had no way to find
out the store had also drifted out of sync.
- vm_lifecycle.go deleteVMLocked killed the firecracker process
with `_ = s.net.killVMProcess(...)`. cleanupRuntime tears it
down regardless, so the explicit kill is best-effort, but a
permission-denied / EPERM was still worth logging.
- capabilities.go cleanupPreparedCapabilities collected per-cap
errors with errors.Join. Callers get the aggregated value but
couldn't tell which capability failed when more than one did.
All three now log Warn before the original behaviour continues.
The aggregate return value, control flow, and user-visible error
strings are unchanged — this is purely a "less silence in the
journal" pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds three small but high-leverage presentation tweaks for v0.1:
1. internal/cli/style is a new ~70 LOC package with Pass/Fail/Warn/
Dim/Bold helpers. Each is TTY-gated and obeys NO_COLOR. No
external dep. Wired into the doctor PASS/FAIL/WARN status, the
"banger:" error prefix on stderr, and the dim 'ready in <elapsed>'
line.
2. internal/cli/errors translates rpc.ErrorResponse into user-facing
text. operation_failed becomes invisible (the message wins);
not_found, already_exists, bad_request, bad_version, unauthorized,
unknown_method get short labels; unknown codes pass through. The
daemon-attached op_id lands in dim parens — paste into
journalctl --grep to find the daemon log line that produced the
failure.
3. Tabwriter config converges on (0, 8, 2, ' ', 0) across every
list/table command. The vm prune confirmation table picked up the
right config; system install + system status switched from bare
"key: value\n" lines to tabular form. printVMSpecLine drops its
Unicode middle dot for an ASCII '|' so terminals without UTF-8
render cleanly.
Tests cover translateRPCError for every code, style helpers no-op
on non-TTY and under NO_COLOR. Smoke status greps switch from
"key: value" to "key value" to match the new format.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Today there's no way to correlate a CLI failure with a daemon log
line. operationLog records relative timing but no id, two concurrent
vm.start calls log indistinguishably, and the async
vmCreateOperationState.ID is user-facing yet never reaches the
journal. The root helper logs plain text to stderr while bangerd
logs JSON, so a merged journalctl is hard to grep across the
trust-boundary split.
Mint a per-RPC op id at dispatch entry, store it on context, and
include it as an "op_id" attr on every operationLog record. The
id is stamped onto every error response (including the early
short-circuit paths bad_version and unknown_method). rpc.Call
forwards the context op id on requests so a daemon RPC and the
helper RPCs it triggers all share one id. The helper now logs
JSON to match bangerd, adopts the inbound id, and emits a single
"helper rpc completed" / "helper rpc failed" line per call so
operators can see at a glance how long each privileged op took.
vmCreateOperationState.ID is now the same id dispatch generated
for vm.create.begin — one identifier between client status polls,
daemon logs, and helper logs.
The wire format gains two optional fields: rpc.Request.OpID and
rpc.ErrorResponse.OpID, both omitempty so older peers (and the
opposite direction) ignore them. ErrorResponse.Error() now appends
"(op-XXXXXX)" to its string form when set; existing callers that
just print err.Error() get the id for free.
Tests cover: dispatch stamps op_id on unknown_method, bad_version,
and handler-returned errors; rpc.Call exposes the typed
*ErrorResponse via errors.As so the CLI can read code/op_id; ctx
op_id is forwarded to the server in the request envelope.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The fsck_snapshot lifecycle step exists to repair stale bitmaps in
a COW file reused from a prior aborted start — without it, the
later e2cp/e2rm calls in patch_root_overlay refuse to touch the
snapshot. On a freshly-created COW there are no stale bitmaps to
repair, so e2fsck -fy is pure overhead.
system_overlay already tracks whether it created the file this run
(sc.systemOverlayCreated, used to drive the rollback path). Reuse
that flag to skip e2fsck entirely on the create-fresh path. The
reused-COW path keeps the fsck for safety. Saves a few hundred ms
per VM create — small absolute win on top of the lazy-mkfs change,
but free.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mkfs.ext4 zeroes the entire inode table and journal at format time
unless told otherwise. On an 8 GiB work disk that's roughly 500-700ms
of host CPU/IO per 'banger vm create', for a one-time small per-write
penalty inside the guest the first time it touches an unwritten
inode that nobody can perceive.
Centralise the canonical mkfs -E option list as
system.MkfsExtraOptions and use it everywhere banger calls mkfs.ext4
on a VM-internal image: the no-seed work disk, MaterializeWorkDisk,
BuildWorkSeedImage, and the imagepull rootfs builder. The work-disk
paths feed vm create directly; the others are one-off but still
benefit from the faster format.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
systemd's Type=simple reports a unit "active" the moment its
ExecStart binary is exec()'d, which for bangerd happens well before
the daemon has read its config and bound /run/banger/bangerd.sock.
'banger system install' and 'banger system restart' both returned
inside that window, so the very next 'banger ...' command would hit
ensureDaemon, miss on a single ping, and exit with "service not
reachable; run sudo banger system restart" — the same restart that
had just succeeded. Smoke tripped over this on every run.
Add waitForDaemonReady: poll daemonPing for up to 15s after the
restart returns. Both the system install and restart paths now
block until the daemon is genuinely accepting RPCs, so the next
CLI invocation can talk to it without retrying.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Print '[vm create] ready in <elapsed>' to stderr once the create
operation completes successfully. Surfaces how long the full
create-to-ready cycle took (image resolve + work disk + boot +
guest agents + capability post-start), which the per-stage
progress lines don't add up to in any visible way.
Format adapts to scale: sub-second prints as 'NNNms', sub-minute
keeps one decimal ('4.7s'), longer prints as 'MmSSs'. Always
emitted (not gated on TTY) so logged and CI output carry the
number too.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Old flow on every 'banger vm run' that hit the seeded path:
CopyFilePreferClone the seed file (FICLONE attempt + io.Copy + fsync
fallback), then e2fsck -fp + resize2fs to grow the FS to the spec
size. On filesystems without reflink support that meant pushing
512+ MiB through the kernel followed by a full filesystem check
and resize, even though the seed only carries a few KB of dotfiles
— minWorkSeedBytes is 512 MiB but the actual payload is tiny.
That is the minute-long stall on the 'cloning work seed' stage
users see today.
Replace the copy with a sized fresh ext4: truncate to
WorkDiskSizeBytes, mkfs.ext4 -F -E root_owner=0:0, debugfs rdump
to extract the seed's contents, then ingest each file via the
sudoless ext4 toolkit (MkdirExt4 / WriteExt4FileOwned, root:root,
mode preserved). Sub-second regardless of seed size or requested
work-disk size; no fsck or resize needed because the FS is created
at its final size from the start.
Also drop the now-implementation-pinned
TestEnsureWorkDiskClonesSeedImageAndResizes — its premise (a
scripted e2fsck/resize2fs sequence) no longer reflects the code,
and smoke covers the new flow end to end. Stage label changed
from 'cloning work seed' to 'applying work seed' to match what
actually happens.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 'docker' bit on model.Image was unused at runtime — every code
path that branched on it had been removed earlier, leaving only the
field, the SQL column, the --docker flag, and the
#feature:docker sentinel that BuildMetadataPackages emitted into a
hash file. None of those have callers anymore.
Strip the field from the model, the API params, the SQLite column,
the CLI flag, and BuildMetadataPackages's signature. Add migration
2 (drop_images_docker) so existing installs lose the column on next
daemon start. ALTER TABLE ... DROP COLUMN is fine: SQLite has
supported it since 3.35 (2021).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BuildWorkSeedImage used to mount the source rootfs and the new seed
image — both via sudo. After the privilege split (59e48e8) the owner
daemon runs without sudo and those mounts fail silently inside the
image-pull pipeline (runBuildWorkSeed swallows errors), so every
freshly pulled image landed in the store with an empty WorkSeedPath
and 'banger doctor' kept warning that /root would be empty.
Rewrite the builder around the existing sudoless toolkit:
1. RdumpExt4Dir extracts /root from the source rootfs into a host
tempdir (debugfs, no mount).
2. truncate + mkfs.ext4 -F -E root_owner=0:0 produces an empty
user-owned ext4 file.
3. A Go walk over the staged tree calls MkdirExt4 /
WriteExt4FileOwned for every dir + regular file, forcing
root:root and preserving mode bits.
Symlinks and special files in /root are skipped — extremely rare on
a stock distro and not part of what makes a useful seed.
Fix won't retroactively populate already-pulled images: re-pull the
default image (e.g. 'banger image delete debian-bookworm && banger
image pull debian-bookworm') to get a seeded work-seed.ext4.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous check tried to bind 127.0.0.1:42069 and warned on
'address already in use' — which is exactly the state when the
banger daemon is running, the case the user ran 'doctor' to
confirm. The warning was actively misleading.
Now, on 'address already in use', probe the listener with a
*.vm DNS query that only banger's vmdns server answers
authoritatively (NXDOMAIN with Authoritative=true). If the shape
matches we pass; if the port is held by something else we still
warn. Tests cover both branches: a real vmdns server is accepted,
and a silent UDP listener on the same port is rejected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Frontier models tend to discover a CLI by running --help, scanning
the Long description, and inferring the dominant workflow from the
examples. Today's banger help reads like a man page index — every
verb has a one-line Short and nothing else. This rewrites the
groups (banger, vm, vm workspace, image, kernel, system,
ssh-config) so each landing page answers "what is this for, what's
the 80% command, what comes next" in three to ten lines, with
runnable examples.
Also disambiguates the near-twin lifecycle commands so a model
reading the subcommand index can tell stop/kill/delete apart at a
glance:
start Start a stopped VM
stop Stop a running VM gracefully
restart Stop then start a VM
kill Force-kill a VM (use when 'vm stop' hangs)
delete Stop a VM and remove its disks (irreversible)
vm create / vm ssh / vm logs / vm show pick up Long descriptions
and examples for the same reason. No behaviour changes; help text
only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
go 1.25.0 matches go.mod's toolchain. shellcheck is the only non-go
tool make lint hard-requires.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Explain what runs as the owner user vs root, every helper RPC method
and its validation gate, the on-disk paths banger writes, network
mutations, and how install/uninstall work end to end. The aim is to
give a reader enough information to grant or refuse the privileges
banger asks for during system install with their eyes open.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the per-subdir entries with a single /build/ to cover any
new outputs Make or scripts add later (build/manual exists today;
future docs/coverage variants would otherwise need new lines).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move the supported systemd path to two services: an owner-user bangerd for
orchestration and a narrow root helper for bridge/tap, NAT/resolver, dm/loop,
and Firecracker ownership. This removes repeated sudo from daily vm and image
flows without leaving the general daemon running as root.
Add install metadata, system install/status/restart/uninstall commands, and a
system-owned runtime layout. Keep user SSH/config material in the owner home,
lock file_sync to the owner home, and move daemon known_hosts handling out of
the old root-owned control path.
Route privileged lifecycle steps through typed privilegedOps calls, harden the
two systemd units, and rewrite smoke plus docs around the supported service
model.
Verified with make build, make test, make lint, and make smoke on the
supported systemd host path.
Before this change `banger image pull` (both OCI-direct and bundle
paths) shipped images with an empty WorkSeedPath — the BuildWorkSeedImage
helper existed only behind the hidden `banger internal work-seed` CLI.
Every pulled image hit ensureWorkDisk's no-seed branch, and the guest
booted with a bare /root (no .bashrc, no .profile, none of the distro
defaults).
Pull now calls BuildWorkSeedImage after the rootfs is finalised (OCI)
or fetched (bundle). The builder is behind a new `workSeedBuilder` test
seam so existing pull tests don't accidentally demand sudo mount. The
build failure is non-fatal: any error logs a warning and leaves
WorkSeedPath empty — images stay publishable even if the pulled rootfs
has no /root to extract.
Verified end-to-end by wiping the cached smoke image and re-pulling:
work-seed.ext4 lands in the artifact dir next to rootfs.ext4, and all
21 smoke scenarios pass.
Also refreshes the "feature /root work disk" fallback tooling check —
the no-seed path no longer touches mount/umount/cp after commit
0e28504, so the doctor check now only requires truncate + mkfs.ext4.
The warn copy updates from "new VM creates will be slower" to "guest
/root will be empty", which matches the actual tradeoff post-refactor.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both helpers are stranded: commit f068536 dropped their last callers
from ensureAuthorizedKeyOnWorkDisk and seedAuthorizedKeyOnExt4Image,
and commit 6ab1a2b dropped the ensureGitIdentity / runFileSync calls
that still held them up. Every on-disk-patch code path now drives the
ext4 image directly via MkdirExt4 / WriteExt4FileOwned /
EnsureExt4RootPerms.
Also drops TestFlattenNestedWorkHomeCopiesEntriesIndividually —
premise gone with the function. The sshd_config_test comment
referencing normaliseHomeDirPerms now points at EnsureExt4RootPerms.
Net sudo reduction across the five-commit series: work-disk creation,
authsync, image seeding, git identity sync, and file_sync all drop
sudo entirely against user-owned ext4 files. Remaining sudo in
internal/daemon is confined to firecracker process launch, tap/dm
device setup, iptables/NAT, and dmsnap/fcproc — things that
legitimately need CAP_SYS_ADMIN or CAP_NET_ADMIN. MountTempDir stays
on exclusively as an image-build helper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ensureGitIdentityOnWorkDisk, writeGitIdentity, runFileSync, and
copyHostDir all dropped their mount + sudo install/mkdir/chmod/chown
scaffolding. Every write now goes through MkdirExt4,
WriteExt4FileOwned, ReadExt4File, and the new MkdirAllExt4 helper —
all sudoless against user-owned ext4 images.
Net effect with the prior two commits: ensureWorkDisk, authsync, image
seeding, git identity sync, and file_sync no longer mount the work
disk or spawn sudo mkdir/chmod/chown/cat/install. Only the
image-build path (which legitimately produces root-owned artifacts)
still touches MountTempDir.
The filesystemRunner test harness grew a small debugfs/e2cp/e2rm
emulator so the WorkspaceService tests keep exercising their real
code paths without a live ext4 image. The mock is deliberately
dumb — it only implements the subset runFileSync and writeGitIdentity
drive.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ensureAuthorizedKeyOnWorkDisk and seedAuthorizedKeyOnExt4Image both
drove mount + sudo mkdir/chmod/chown/cat/install to patch
/.ssh/authorized_keys into a work disk or work-seed. Both now delegate
to a shared provisionAuthorizedKey helper that uses the ext4 toolkit
introduced in 7704396 — EnsureExt4RootPerms + MkdirExt4 +
Ext4PathExists/ReadExt4File + WriteExt4FileOwned. No mount, no sudo,
no host-path staging.
Drops ~10 sudo call sites from the VM create and image pull flows
and deletes the TestEnsureAuthorizedKeyOnWorkDiskRepairsNestedRootLayout
premise (flattenNestedWorkHome will disappear entirely in the next
commit — the no-seed path no longer copies /root, and the work-seed
path produces flat seeds).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The no-seed branch used to mount the base rootfs read-only, mount
the freshly mkfs'd work disk read-write, sudo-cp /root from one to
the other, then flatten any accidental /root/root/ nesting. Five
sudo call sites packed into a fallback that the common image path
doesn't even exercise.
Replace with: `mkfs.ext4 -F -E root_owner=0:0` and nothing else.
mkfs already stamps inode 2 as root:root:0755 — sshd's StrictModes
walks that dir's ownership when the work disk mounts at /root in
the guest, so getting it right from mkfs means authsync can just
write authorized_keys without any repair pass.
Tradeoff: no-seed VMs lose the base rootfs's default /root dotfiles
(.bashrc, .profile). The no-seed path is explicitly the degraded
fallback — `banger doctor` already warns about it — and users who
want those back have two documented knobs: rebuild the image with
a work-seed, or land them via [[file_sync]].
Sudo call sites removed: 5 (MountTempDir × 2, sudo cp -a,
flattenNestedWorkHome's chmod/cp/rm). flattenNestedWorkHome itself
stays alive for now — authsync + image_seed still call it — and
gets deleted in commit 5 once its last caller goes away.
While here: fix the freshly-added EnsureExt4RootPerms helper.
`set_inode_field <2> mode N` overwrites the full i_mode word
instead of preserving the type nibble, so the initial
implementation that passed just the permission bits (0755) would
reset the fs root to regular-file shape and break the next kernel
mount with "Structure needs cleaning." The corrected call OR's in
S_IFDIR (0o040000) explicitly. Test updated to match.
Smoke: 21/21 scenarios green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The daemon mounts every VM's work disk on the host via sudo, copies
files in as root, chmods+chowns them, and unmounts. That's ~18 of
banger's runtime RunSudo calls. The ext4 image is a regular file the
daemon user owns; e2cp / debugfs can write to it directly and bake
uid/gid/mode into the filesystem metadata without the caller being
root. `imagepull.ApplyOwnership` already proves this works in
production (OCI layer flattening writes 0/0/root-owned inodes from
an unprivileged daemon).
This commit adds the toolkit layer. Callers land in the next four
commits:
- MkdirExt4 — idempotent directory create + metadata reset, single
debugfs batch
- WriteExt4FileOwned — e2cp + debugfs-driven uid/gid/mode, auto-
cleans the host tempfile
- SetExt4Ownership — sif + set_inode_field batch for existing
inodes (no mkdir implied)
- EnsureExt4RootPerms — fixes inode <2> (the fs root, which is
`/root` once the work disk is mounted inside the guest), the
thing sshd's StrictModes walks
- Ext4PathExists — yes/no probe via `debugfs -R "stat ..."` with
"File not found" detection
- ReadExt4File — bytes-returning wrapper around the existing
ReadDebugFSText with the same path rejection
Design notes:
- extfsRun auto-switches Run ↔ RunSudo on imagePath's type: regular
files get the unprivileged path, block devices (dm-snapshot,
loops) get sudo. The same helper works for both patchRootOverlay
(dm device) and work-disk writes (user-owned file). No caller
flag needed — os.Stat tells us.
- debugfsScript batches set_inode_field + sif + mkdir lines into
one `debugfs -w -f -` stdin invocation on any Runner that
implements StdinRunner (production's system.Runner does). Matches
imagepull.ApplyOwnership's existing pattern; dramatically cheaper
than per-call subprocesses.
- Paths are escaped for debugfs on the way in: spaces get double-
quoted, double-quote/backslash/newline are rejected outright
(debugfs's hand-rolled parser doesn't reliably escape those and
we'd rather fail fast than silently scribble over the wrong
inode).
Tests: seven behaviour assertions via scripted + stdin-scripted
runners — existence probe (found + missing + rejection), read
passthrough, mkdir batch contents (new vs. pre-existing path), write
tempfile cleanup + mode line shape, root-inode addressing, and the
full rejectDebugfsUnsafePath matrix.
No production wiring change in this commit — the helpers land
unused. `make smoke` stays green (21/21) because nothing else
shifted.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Preserve cleanup after daemon restarts and harden OCI and tar imports
against filenames that debugfs cannot encode safely.
Mirror tap, loop, and dm teardown identity onto VM.Runtime, teach
cleanup and reconcile to fall back to those persisted fields when
handles.json is missing or corrupt, and clear the recovery state on
stop, error, and delete paths.
Reject debugfs-hostile entry names during flattening and in
ApplyOwnership itself, then add regression coverage for corrupt
handles.json recovery and unsafe import paths.
Verified with targeted go tests, make lint-go, make lint-shell, and
make build.
Closes commit 3 of the god-service decomposition. VMService still
owned 45+ methods after the startVMLocked extraction and RPC table
landed in commits 1 and 2. Stats / ports / health / vsock-ping sit
in a corner of that surface that doesn't share any state with
lifecycle orchestration — nothing about "what's this VM's CPU
doing" belongs in the same service as Create/Start/Stop/Delete/Set.
New StatsService owns:
- GetVMStats / getVMStatsLocked / collectStats (stats collection)
- HealthVM / PingVM (vsock-agent health probe)
- PortsVM + buildVMPorts + probeWebListener + probeHTTPScheme +
dedupeVMPorts (listening-port enumeration)
- pollStats (background ticker refresh)
- stopStaleVMs (auto-stop sweep past config.AutoStopStaleAfter)
The three VMService touch-points stats genuinely needs — vmAlive,
vmHandles, the per-VM lock helpers, plus cleanupRuntime for the
stale-sweep tear-down — come in as function-typed closures, not a
*VMService pointer. StatsService has no back-reference to its
sibling. Mirrors the dependency-struct pattern WorkspaceService
already uses.
Wiring: d.stats is populated in wireServices AFTER d.vm (closures
must see a non-nil d.vm at call time). Dispatch table's four
entries (vm.stats / vm.health / vm.ping / vm.ports) now resolve
through d.stats. Background loop's pollStats / stopStaleVMs
tickers do the same. Dispatch surface from the RPC client's
perspective is byte-identical.
After this commit:
- vm_stats.go and ports.go are deleted; their content (plus the
stats-specific fields) lives in stats_service.go.
- VMService loses 12 methods. It's still the biggest service
(~30 methods, all lifecycle-supporting: handle cache, disk
provisioning, preflight, create-ops registry, lock helpers,
the lifecycle verbs themselves) but it's finally one coherent
concern instead of five.
Tests:
- TestWireServicesInstantiatesStatsService — pins that the
wiring order puts d.stats non-nil + its five closures all
populated. Prevents a silent background-loop regression.
- All existing tests that called d.vm.HealthVM / d.vm.PingVM /
d.vm.PortsVM / d.vm.collectStats were re-pointed at d.stats.
Smoke: all 21 scenarios green, including vm ports (exercises the
new PortsVM entry end-to-end) and the long-running workspace
scenarios (exercise the background stats poller implicitly).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>