Commit graph

6 commits

Author SHA1 Message Date
cec7291184
Survive banger update with running VMs
Two coupled fixes that together make the daemon-restart path of
`banger update` non-destructive for running guests:

1. Unit templates set `KillMode=process` on bangerd.service and
   bangerd-root.service. The default control-group behaviour sent
   SIGKILL to every process in the cgroup on stop/restart — including
   jailer-spawned firecracker children, since fork/exec doesn't
   escape a systemd cgroup. With process mode only the unit's main
   PID is signalled; FC children stay alive in the (unowned)
   cgroup until the new helper instance starts up and re-claims them.

2. `fcproc.FindPID` falls back to the jailer-written pidfile at
   `<chroot>/firecracker.pid` (sibling of the api-sock target) when
   `pgrep -n -f <api-sock>` doesn't find a match. pgrep can't see
   jailer'd FCs because their cmdline only carries the chroot-relative
   `--api-sock /firecracker.socket`, not the host-side path. The
   pidfile is jailer's actual record of the post-exec FC PID, so
   reconcile can verify the surviving process is the right one
   (comm == "firecracker") and re-seed handles.json without tearing
   down the VM's dm-snapshot.

Verified live on the dev host: started a VM, restarted the helper
unit, restarted the daemon unit, and confirmed the FC PID was
unchanged, vm list still showed the guest as running, and
`banger vm ssh` returned the same boot_id pre and post restart.
The systemd journal now reports "firecracker remains running after
unit stopped" and "Found left-over process X (firecracker) in
control group while starting unit. Ignoring." — exactly the shape
`KillMode=process` is supposed to produce.

Tests cover both the parser (parseVersionOutput from the v0.1.2
fix) and the new pidfile lookup: happy path, missing pidfile,
stale pid, wrong comm, garbage content, non-symlink api-sock,
whitespace tolerance.

CHANGELOG corrects v0.1.0's misleading "daemon restarts do not
interrupt running guests" line and documents the unit-refresh
caveat: existing v0.1.0–v0.1.3 installs need a one-time
`sudo banger system install` after updating to v0.1.4 to pick up
the new KillMode directive (`banger update` swaps binaries, not
unit files).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 17:09:15 -03:00
6b543cb17f
firecracker: adopt firecracker-jailer for VM launch (Phase B)
Each VM's firecracker now runs inside a per-VM chroot dropped to the
registered owner UID via firecracker-jailer. Closes the broad ambient-
sudo escalation surface that survived Phase A: the helper still needs
caps for tap/bridge/dm/loop/iptables, but the VMM itself no longer
runs as root in the host root filesystem.

The host helper stages each chroot up front: hard-links the kernel
and (optional) initrd, mknods block-device drives + /dev/vhost-vsock,
copies in the firecracker binary (jailer opens it O_RDWR so a ro bind
fails with EROFS), and bind-mounts /usr/lib + /lib trees read-only so
the dynamic linker can resolve. Self-binds the chroot first so the
findmnt-guarded cleanup can recurse safely.

AF_UNIX sun_path is 108 bytes; the chroot path easily blows past that.
Daemon-side launch pre-symlinks the short request socket path to the
long chroot socket before Machine.Start so the SDK's poll/connect
sees the short path while the kernel resolves to the chroot socket.
--new-pid-ns is intentionally disabled — jailer's PID-namespace fork
makes the SDK see the parent exit and tear the API socket down too
early.

CapabilityBoundingSet for the helper expands to add CAP_FOWNER,
CAP_KILL, CAP_MKNOD, CAP_SETGID, CAP_SETUID, CAP_SYS_CHROOT alongside
the existing CAP_CHOWN/CAP_DAC_OVERRIDE/CAP_NET_ADMIN/CAP_NET_RAW/
CAP_SYS_ADMIN.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 14:38:07 -03:00
71a332a6a1
cli: maturity polish — color, error translation, tabwriter consistency
Adds three small but high-leverage presentation tweaks for v0.1:

1. internal/cli/style is a new ~70 LOC package with Pass/Fail/Warn/
   Dim/Bold helpers. Each is TTY-gated and obeys NO_COLOR. No
   external dep. Wired into the doctor PASS/FAIL/WARN status, the
   "banger:" error prefix on stderr, and the dim 'ready in <elapsed>'
   line.
2. internal/cli/errors translates rpc.ErrorResponse into user-facing
   text. operation_failed becomes invisible (the message wins);
   not_found, already_exists, bad_request, bad_version, unauthorized,
   unknown_method get short labels; unknown codes pass through. The
   daemon-attached op_id lands in dim parens — paste into
   journalctl --grep to find the daemon log line that produced the
   failure.
3. Tabwriter config converges on (0, 8, 2, ' ', 0) across every
   list/table command. The vm prune confirmation table picked up the
   right config; system install + system status switched from bare
   "key: value\n" lines to tabular form. printVMSpecLine drops its
   Unicode middle dot for an ASCII '|' so terminals without UTF-8
   render cleanly.

Tests cover translateRPCError for every code, style helpers no-op
on non-TTY and under NO_COLOR. Smoke status greps switch from
"key: value" to "key   value" to match the new format.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 22:27:07 -03:00
74e5a7cedb
cli: wait for the daemon socket to answer ping after install/restart
systemd's Type=simple reports a unit "active" the moment its
ExecStart binary is exec()'d, which for bangerd happens well before
the daemon has read its config and bound /run/banger/bangerd.sock.
'banger system install' and 'banger system restart' both returned
inside that window, so the very next 'banger ...' command would hit
ensureDaemon, miss on a single ping, and exit with "service not
reachable; run sudo banger system restart" — the same restart that
had just succeeded. Smoke tripped over this on every run.

Add waitForDaemonReady: poll daemonPing for up to 15s after the
restart returns. Both the system install and restart paths now
block until the daemon is genuinely accepting RPCs, so the next
CLI invocation can talk to it without retrying.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 21:22:31 -03:00
35bfac3f13
cli: rewrite help text for AI-driven discovery
Frontier models tend to discover a CLI by running --help, scanning
the Long description, and inferring the dominant workflow from the
examples. Today's banger help reads like a man page index — every
verb has a one-line Short and nothing else. This rewrites the
groups (banger, vm, vm workspace, image, kernel, system,
ssh-config) so each landing page answers "what is this for, what's
the 80% command, what comes next" in three to ten lines, with
runnable examples.

Also disambiguates the near-twin lifecycle commands so a model
reading the subcommand index can tell stop/kill/delete apart at a
glance:

  start    Start a stopped VM
  stop     Stop a running VM gracefully
  restart  Stop then start a VM
  kill     Force-kill a VM (use when 'vm stop' hangs)
  delete   Stop a VM and remove its disks (irreversible)

vm create / vm ssh / vm logs / vm show pick up Long descriptions
and examples for the same reason. No behaviour changes; help text
only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 15:02:08 -03:00
59e48e830b
daemon: split owner daemon from root helper
Move the supported systemd path to two services: an owner-user bangerd for
orchestration and a narrow root helper for bridge/tap, NAT/resolver, dm/loop,
and Firecracker ownership. This removes repeated sudo from daily vm and image
flows without leaving the general daemon running as root.

Add install metadata, system install/status/restart/uninstall commands, and a
system-owned runtime layout. Keep user SSH/config material in the owner home,
lock file_sync to the owner home, and move daemon known_hosts handling out of
the old root-owned control path.

Route privileged lifecycle steps through typed privilegedOps calls, harden the
two systemd units, and rewrite smoke plus docs around the supported service
model.

Verified with make build, make test, make lint, and make smoke on the
supported systemd host path.
2026-04-26 12:43:17 -03:00