banger

Author	SHA1	Message	Date
Thales Maciel	a59958d4f5	daemon: roll back host state on any Open() failure Open() touched several pieces of host state before hitting the step that returned the error: * SQLite handle (store.Open) * managed SSH client config block (ensureVMSSHClientConfig) * vm-DNS UDP listener goroutine (startVMDNS) * systemd-resolved per-interface routing (ensureVMDNSResolverRouting) The only deferred cleanup guarded stopVMDNS. A reconcile() or initializeTapPool() failure therefore left the listener running, the resolver wiring in place, and the SQLite handle open. A subsequent startup attempt ran into "port 42069 already in use" or silently published stale state. Fix: once `d` exists, defer `d.Close()` on `err != nil`. Close is idempotent (sync.Once) and every teardown step (listener close, DNS listener close, resolver revert, session registry close, store close) is nil-guarded, so calling it on a daemon that never got past the first startup step is safe. Tests (internal/daemon/open_close_test.go): - TestCloseOnPartiallyInitialisedDaemon: Close survives a daemon with only store + closing channel, and with a vmDNS listener but nothing else. Catches regressions where a teardown step forgets to nil-check. - TestCloseIdempotentUnderConcurrency: 5 goroutines racing on Close() never panic (sync.Once + close(d.closing) survive). - TestOpenFailureRunsCloseCleanup: structural check that the `defer cleanup() if err != nil` pattern actually fires. Live: `banger daemon stop` cleanly, `banger vm ls` restarts daemon without a residual listener on port 42069.	2026-04-19 16:36:29 -03:00
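Illustrative sketch for a59958d4f5 (not the actual daemon code): the rollback pattern described above, with a `sync.Once`-guarded, nil-checked `Close` and a deferred cleanup that fires only when `Open` returns an error. `fakeListener` and the `failReconcile` parameter are hypothetical stand-ins for the real listener and startup steps.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// fakeListener is a hypothetical stand-in for the vm-DNS UDP listener.
type fakeListener struct{ closed bool }

func (l *fakeListener) Close() { l.closed = true }

// Daemon holds only the pieces needed to show the pattern.
type Daemon struct {
	closeOnce sync.Once
	closing   chan struct{}
	dns       *fakeListener // nil until startup reaches the DNS step
}

// Close is idempotent (sync.Once) and every teardown step is nil-guarded,
// so it is safe on a daemon that never finished starting.
func (d *Daemon) Close() error {
	d.closeOnce.Do(func() {
		close(d.closing)
		if d.dns != nil {
			d.dns.Close()
		}
	})
	return nil
}

// Open touches host state step by step. The named err result lets the
// deferred function see the final error and roll everything back.
func Open(dns *fakeListener, failReconcile bool) (_ *Daemon, err error) {
	d := &Daemon{closing: make(chan struct{})}
	defer func() {
		if err != nil {
			_ = d.Close()
		}
	}()
	d.dns = dns // "start" the listener
	if failReconcile {
		return nil, errors.New("reconcile failed")
	}
	return d, nil
}

func main() {
	l := &fakeListener{}
	_, err := Open(l, true)
	fmt.Println(err, l.closed) // prints "reconcile failed true"
}
```

The deferred closure captures the local `d` rather than a named return value, because `return nil, err` would otherwise nil out the pointer before the cleanup ran.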
Thales Maciel	d1b9a8c102	remove experimental web UI The web UI shipped as "experimental" and was never finished — no nav off the dashboard, no live updates, no settled design, never a supported surface. It was opt-in by default already; leaving the code in the tree for v0.1.0 only invited "does this work?" questions and kept HostSummary/BangerSummary/SudoStatus types on the public RPC surface that nothing else uses. Removed: internal/webui/ (all Go + templates + assets) internal/daemon/web.go (server start / Layout / Config / ListVMs / ListImages) internal/daemon/dashboard.go (DashboardSummary aggregator) Simplified: internal/api/types.go drop WebURL on PingResult, drop HostSummary / SudoStatus / BangerSummary / DashboardSummary / DashboardSummaryResult internal/model/types.go drop DaemonConfig.WebListenAddr internal/config/config.go drop web_listen_addr from fileConfig + Load internal/daemon/daemon.go drop webListener / webServer / webURL fields + startWebServer() call + ping WebURL population internal/cli/banger.go `daemon status` output no longer branches on web internal/daemon/{doc.go,ARCHITECTURE.md} drop web UI sections README.md drop web_listen_addr config bullet + security paragraph Tests updated to reflect the new shape. Coverage 57.3 -> 58.9% (the webui package was largely untested; its removal lifts the ratio without moving the numerator). `banger daemon status` output and --help are web-free. Lint + full suite green.	2026-04-19 14:28:08 -03:00
Thales Maciel	687fcf0b59	vm state: split transient kernel/process handles off the durable schema Separates what a VM IS (durable intent + identity + deterministic derived paths — `VMRuntime`) from what is CURRENTLY TRUE about it (firecracker PID, tap device, loop devices, dm-snapshot target — new `VMHandles`). The durable state lives in the SQLite `vms` row; the transient state lives in an in-memory cache on the daemon plus a per-VM `handles.json` scratch file inside VMDir, rebuilt at startup from OS inspection. Nothing kernel-level rides the SQLite schema anymore. Why: Persisting ephemeral process handles to SQLite forced reconcile to treat "running with a stale PID" as a first-class case and mix it with real state transitions. The schema described what we last observed, not what the VM is. Every time the observation model shifted (tap pool, DM naming, pgrep fallback) the reconcile logic grew a new branch. Splitting lets each layer own what it's good at: durable records describe intent, in-memory cache + scratch file describe momentary reality. Shape: - `model.VMHandles` = PID, TapDevice, BaseLoop, COWLoop, DMName, DMDev. Never in SQLite. - `VMRuntime` keeps: State, GuestIP, APISockPath, VSockPath, VSockCID, LogPath, MetricsPath, DNSName, VMDir, SystemOverlay, WorkDiskPath, LastError. All durable or deterministic. - `handleCache` on `*Daemon` — mutex-guarded map + scratch-file plumbing (`writeHandlesFile` / `readHandlesFile` / `rediscoverHandles`). See `internal/daemon/vm_handles.go`. - `d.vmAlive(vm)` replaces the 20+ inline `vm.State==Running && ProcessRunning(vm.Runtime.PID, apiSock)` spreads. Single source of truth for liveness. - Startup reconcile: per running VM, load the scratch file, pgrep the api sock, either keep (cache seeded from scratch) or demote to stopped (scratch handles passed to cleanupRuntime first so DM / loops / tap actually get torn down). Verification: - `go test ./...` green. 
- Live: `banger vm run --name handles-test -- cat /etc/hostname` starts; `handles.json` appears in VMDir with the expected PID, tap, loops, DM. - `kill -9 $(pgrep bangerd)` while the VM is running, re-invoke the CLI, daemon auto-starts, reconcile recognises the VM as alive, `banger vm ssh` still connects, `banger vm delete` cleans up. Tests added: - vm_handles_test.go: scratch-file roundtrip, missing/corrupt file behaviour, cache concurrency, rediscoverHandles prefers pgrep over scratch, returns scratch contents even when process is dead (so cleanup can tear down kernel state). - vm_test.go: reconcile test rewritten to exercise the new flow (write scratch → reconcile reads it → verifies process is gone → issues dmsetup/losetup teardown). ARCHITECTURE.md updated; `handles` added to Daemon field docs.	2026-04-19 14:18:13 -03:00
Thales Maciel	2e6e64bc04	guest sshd: drop DEBUG3 + StrictModes no; normalise /root perms Previously /etc/ssh/sshd_config.d/99-banger.conf landed with: LogLevel DEBUG3 PermitRootLogin yes PubkeyAuthentication yes AuthorizedKeysFile /root/.ssh/authorized_keys StrictModes no DEBUG3 was debug leftover that floods journald in normal use. StrictModes no was a workaround for /root perm drift on the work disk — the real fix is to make those perms correct at provisioning time. New drop-in: PermitRootLogin prohibit-password PubkeyAuthentication yes PasswordAuthentication no KbdInteractiveAuthentication no AuthorizedKeysFile /root/.ssh/authorized_keys prohibit-password blocks password root login even if PasswordAuth gets flipped on elsewhere; KbdInteractiveAuth no closes the last interactive fallback; StrictModes is now on (sshd's default). normaliseHomeDirPerms chown/chmods /root to 0755 root:root at every work-disk mount (ensureAuthorizedKeyOnWorkDisk, seedAuthorizedKeyOnExt4Image); the .ssh dir also explicitly chown'd root:root. Verified end-to-end against a real VM: `sshd -T` reports strictmodes yes and all five directives match. Regression test (sshd_config_test.go) pins the allow-list and the deny-list (DEBUG3, StrictModes no, bare `PermitRootLogin yes`) so the next accidental reintroduction fails fast. README's Security section updated to reflect the new posture.	2026-04-19 13:40:40 -03:00
Thales Maciel	6cd52d12f4	workspace prepare: release VM mutex before guest I/O Previously withVMLockByRef held the per-VM mutex across InspectRepo, waitForGuestSSH, dialGuest, ImportRepoToGuest (the tar stream!), and the readonly chmod. A large repo could block `vm stop` / `vm delete` / `vm restart` on the same VM for however long the import took. Split into two phases: 1. VM mutex held briefly to validate state (running + PID alive) and snapshot the fields needed for SSH (guest IP, api sock). 2. VM mutex released. Acquire workspaceLocks[id] — a separate per-VM mutex scoped to workspace.prepare / workspace.export — for the guest I/O phase. Lifecycle ops (stop/delete/restart/set) only take vmLocks, so they no longer queue behind a slow import. Two concurrent prepares on the same VM still serialise via workspaceLocks so tar streams don't interleave. ExportVMWorkspace also acquires workspaceLocks to avoid snapshotting a half-streamed import. Two regression tests (sequential — they swap package-level seams): ReleasesVMLockDuringGuestIO: stall the import fake, assert the VM mutex is acquirable from another goroutine during the stall. SerialisesConcurrentPreparesOnSameVM: 3 concurrent prepares, assert Import is only ever invoked 1-at-a-time per VM. ARCHITECTURE.md documents the split + updated lock ordering.	2026-04-19 13:32:42 -03:00
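Illustrative sketch for 6cd52d12f4: the two-phase locking split. A single `vmLock` and `workspaceLock` stand in for the per-VM entries of `vmLocks` and `workspaceLocks`. The `daemon` and `vm` types and the `importRepo` callback are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type vm struct {
	running bool
	guestIP string
}

type daemon struct {
	vmLock        sync.Mutex // stands in for vmLocks[id]
	workspaceLock sync.Mutex // stands in for workspaceLocks[id]
	vm            vm
}

// prepareWorkspace holds the VM mutex only long enough to validate state and
// snapshot what the guest I/O needs, then switches to the workspace mutex.
func (d *daemon) prepareWorkspace(importRepo func(guestIP string) error) error {
	// Phase 1: brief critical section under the VM mutex.
	d.vmLock.Lock()
	if !d.vm.running {
		d.vmLock.Unlock()
		return errors.New("vm is not running")
	}
	ip := d.vm.guestIP
	d.vmLock.Unlock()

	// Phase 2: slow guest I/O under the workspace mutex only, so lifecycle
	// operations that take vmLock are not queued behind the import.
	d.workspaceLock.Lock()
	defer d.workspaceLock.Unlock()
	return importRepo(ip)
}

func main() {
	d := &daemon{vm: vm{running: true, guestIP: "10.0.0.2"}}
	err := d.prepareWorkspace(func(ip string) error {
		// During the import the VM mutex is free for stop/delete/restart.
		free := d.vmLock.TryLock()
		if free {
			d.vmLock.Unlock()
		}
		fmt.Println("importing to", ip, "vm mutex free:", free)
		return nil
	})
	fmt.Println("err:", err)
}
```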
Thales Maciel	99de42385f	workspace export: stop mutating the guest repo index Previously `banger vm workspace export` ran `git add -A` against the guest's real `.git/index`, so the observation step left staged changes behind that users never asked for. Reconnecting later (ssh, another export) surfaced them and looked like phantom work. Route `git add -A` through a throwaway index file instead: tmp_idx=$(mktemp ...); trap 'rm -f "$tmp_idx"' EXIT; git read-tree <ref> --index-output="$tmp_idx"; GIT_INDEX_FILE="$tmp_idx" git add -A; GIT_INDEX_FILE="$tmp_idx" git diff --cached <ref> --binary|--name-only The real .git/index, working tree, and refs stay exactly as the user left them. Same diff content: commits past <ref>, uncommitted edits, and untracked files (minus .gitignore) all captured. Regression test locks the invariant: every export script must route add -A through GIT_INDEX_FILE and clean the temp index on exit. CLI help text updated to say "non-mutating".	2026-04-19 13:20:56 -03:00
Thales Maciel	21b74639d8	vm defaults: host-aware sizing + spec line on spawn + doctor check Replaces the static model.Default* constants that drove --vcpu / --memory / --disk-size with a three-layer resolver: 1. [vm_defaults] in ~/.config/banger/config.toml (if set) 2. host-derived heuristics (cpus/4 capped at 4; ram/8 capped at 8 GiB) 3. baked-in constants (floor) Visibility: - Every `vm run` / `vm create` prints a `spec:` line before progress begins: `spec: 4 vcpu · 8192 MiB · 8G disk`. Matches the VM that actually gets created because the CLI is now the single source of truth: it resolves, populates the flag defaults, and forwards the explicit values to the daemon. - `banger doctor` adds a "vm defaults" check showing per-field provenance (config|auto|builtin) and the config file path for overrides. - `--help` shows the resolved defaults (e.g. `--vcpu int (default 4)` on an 8-core host). No `banger config init` command, no first-run side effects, no writes to the user's filesystem behind their back. Users who want explicit control set the keys; everyone else gets sensible numbers that track their hardware.	2026-04-19 13:06:51 -03:00
Thales Maciel	78ff482bfa	release prep: opt-in web UI, make uninstall, fix stale kernel-catalog docs - WebListenAddr default is now "" (empty). The experimental web UI was running on 127.0.0.1:7777 by default, which surprises users who never opted in. Users who want it set `web_listen_addr = "127.0.0.1:7777"` in config.toml. - `make uninstall` stops the daemon (if any) and removes the installed binaries. Preserves user data on disk but prints the paths so `rm -rf` can follow for a full purge. Documented in README next to install. - docs/kernel-catalog.md: replace the `void-6.12` and `alpine-3.23` examples (never published) with `generic-6.12` (the only cataloged kernel today). Updates the versioning-convention example too.	2026-04-19 12:43:58 -03:00
Thales Maciel	221fb03d68	cli QoL: vm prune, list→ls aliases, delete→rm aliases - `banger vm prune` sweeps every non-running VM (stopped, created, error) with an interactive confirmation; -f/--force skips the prompt. Partial failures report which VM failed and exit non-zero. - list commands gain `ls` alias: vm list already had it; added to image list, kernel list, and vm session list. - delete commands gain `rm` alias: vm delete and image delete. kernel rm already aliased delete/remove. Uses new test seams (vmListFunc) plus the existing vmDeleteFunc so prune unit-tests without touching the daemon socket.	2026-04-19 12:17:46 -03:00
Thales Maciel	e3eaa0c797	cli: shell completion via cobra + dynamic resource name lookups Re-enable cobra's default `completion` subcommand (`banger completion bash|zsh|fish|powershell`). Plus live resource-name suggestions that hit the running daemon via the same RPC the real commands use: vm start/stop/restart/delete/kill/set → completeVMNames (variadic); vm ssh/show/logs/stats/ports/... → completeVMNameOnlyAtPos0; vm session list/start → completeVMNameOnlyAtPos0; vm session show/logs/stop/kill/attach/send → completeSessionNames (vm + session); image show/delete/promote → completeImageNameOnlyAtPos0; kernel show/rm → completeKernelNameOnlyAtPos0; vm run/create --image, image pull/register --kernel-ref → flag-value completion. Design notes in internal/cli/completion.go: completers never auto-start the daemon (ping-check, bail with NoFileComp on miss), so tab-completion stays a zero-cost probe. Variadic completers exclude already-entered args to avoid duplicate suggestions. README: install recipes for bash / zsh / fish.	2026-04-19 12:12:40 -03:00
Thales Maciel	346eaba673	coverage: medium batch — hostnat runner, store guest-sessions, daemon helpers Reuses existing fixtures (CommandRunner fakes, SQLite tempfile store, pure-Go seams). No new infra needed. hostnat 50% -> 98% (iptables orchestration via fake runner) store 78% -> 91% (guest_sessions CRUD roundtrip) daemon/session 57% -> 95% (script gen, state parse, snapshot apply) daemon/opstate 67% -> 100% (Registry Insert/Get/Prune) daemon (firstNonEmpty) slight bump Total 54.0% -> 56.5%.	2026-04-18 18:03:37 -03:00
Thales Maciel	f8979de58a	coverage: easy-wins batch across cli, system, paths, vmdns, toolingplan Pure-Go tests for formatters, layout resolution, and validators — no fixtures, no external processes. Targets previously-zero functions the triage scan flagged as low-hanging fruit. cli 55% -> 65% paths 64% -> 91% system 65% -> 75% vmdns 72% -> 86% toolingplan 73% -> 78% Total 52.6% -> 54.0%.	2026-04-18 17:57:05 -03:00
Thales Maciel	a3cc296523	guest: tests for fingerprint, shellQuote, tar-entries edge cases, nil receivers Pure-Go additions (no SSH server fixture): AuthorizedPublicKeyFingerprint, shellQuote escaping, writeTarEntriesArchive error paths (.., ., missing, duplicates, blank entries) and symlink handling, StreamSession/Client nil-receiver safety, WaitForSSH context cancellation. internal/guest coverage 17.8% -> 47.6%. Total 52.1% -> 52.6%. The remaining uncovered paths need a real in-process SSH server; skip.	2026-04-18 17:47:24 -03:00
Thales Maciel	18bf89eae9	coverage: make targets + close zero-cov gaps (namegen, sessionstream) Adds `make coverage` (per-package + total via -coverpkg=./...), `make coverage-html`, and `make coverage-total` (CI-friendly). Wires coverage.out/coverage.html through `make clean` and .gitignore. Closes the two easy zero-coverage packages: namegen (77.8%) and sessionstream (93.5%). Total statement coverage 51.7% -> 52.1%.	2026-04-18 17:44:37 -03:00
Thales Maciel	2584f94828	image/kernel pull: heartbeat dots so slow pulls look alive Bundle downloads can take 20–60s on a typical connection and the CLI was going silent between "resolving daemon" and the final image summary. Users wondered whether banger had wedged. New `withHeartbeat` helper wraps an RPC call with a dot-every-2s ticker on stderr. No-op when stderr isn't a terminal, so piped or scripted invocations stay quiet. Wired into `image pull` and `kernel pull`, the two commands that actually download bytes. Example: $ banger image pull debian-bookworm [image pull] .......... id name managed ... Tests cover the non-TTY short-circuit and error propagation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 17:08:30 -03:00
Thales Maciel	b5c13e3938	Remove opencode package + vm acp command (dead code) The `internal/opencode` package and the `opencodeCapability` that consumed it were hard-wired to wait for opencode on guest port 4096 when an image shipped an initrd. After the prune commits (void / alpine / customize.sh / image build all removed), nothing banger produces today carries an initrd, so the capability's wait path was unreachable: every startup short-circuited to the "direct-boot, skip opencode" branch. Same logic for `banger vm acp`: it SSHes to `opencode acp --cwd <path>`, a binary the golden image no longer ships. Users who run their own image with opencode can still invoke `ssh vm -- opencode acp --cwd /root/repo` directly — no banger scaffolding required. Removed: - internal/opencode/ (whole package, 255 LOC incl. tests) - internal/daemon/opencode.go (opencodeCapability) - cli `vm acp` command + its helpers (runVMACP, sshACPCommandArgs, vmACPRemoteCommand) + their tests - The opencodeCapability{} entry in registeredCapabilities() plus the test that pinned its presence - `wait_opencode` progress-stage label from the vm-create renderer - Stale mentions in daemon/doc.go, README, and webui test fixtures ~480 lines gone, 12 added. `banger/internal` is now 25 packages instead of 26. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 16:54:37 -03:00
Thales Maciel	0933deaeb1	file_sync: config-driven replacement for hardcoded auth sync Replace the three hardcoded host→guest credential syncs (opencode, claude, pi) with a generic `[[file_sync]]` config list. Default is empty — users opt in to exactly what they want synced, with no surprise about which tools banger "supports". ```toml [[file_sync]] host = "~/.local/share/opencode/auth.json" guest = "~/.local/share/opencode/auth.json" [[file_sync]] host = "~/.aws" # directories are copied recursively guest = "~/.aws" [[file_sync]] host = "~/bin/my-script" guest = "~/bin/my-script" mode = "0755" # optional; default 0600 for files ``` Semantics: - Host `~/...` expands against the host user's $HOME. Absolute host paths are used as-is. - Guest must live under `~/` or `/root/...` — banger's work disk is mounted at /root in the guest, so that's the syncable namespace. Anything outside is rejected at config load. - Validation at config load: reject empty paths, relative paths, `..` traversal, `~user/...`, malformed mode strings. Errors name the offending entry index. - Missing host paths are a soft skip with a warn log (existing behaviour). Other errors (read, mkdir, install) abort VM create. - File entries: `install -o 0 -g 0 -m <mode>` (default 0600). - Directory entries: walked in Go; each source file is installed with its own source permissions preserved. The entry's `mode` is ignored for directories. Removed (all dead after this): - `ensureOpencodeAuthOnWorkDisk`, `ensureClaudeAuthOnWorkDisk`, `ensurePiAuthOnWorkDisk`, the shared `ensureAuthFileOnWorkDisk`, their `warn*Skipped` helpers, `resolveHost{Opencode,Claude,Pi}AuthPath`, and the work-disk relative-path + default display-path constants. - The capability hook registering the three syncs now calls the generic `runFileSync` once. 
Seven tests exercising the old codepath deleted; six new tests cover the new runFileSync (no-op on empty config, file copy, custom mode, missing-host-skip, overwrite, recursive directory). Config-layer test adds happy-path parsing and a case-per-shape table of invalid entries (empty, relative host, guest outside /root, '..' traversal, `~user`, bad mode). README updated: replaces the "Credential sync" section with a "File sync" section showing the new config shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 16:40:11 -03:00
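Illustrative sketch for 0933deaeb1: guest-path validation following the rules listed in the commit message (non-empty, no `~user/...`, no `..` traversal, must live under `~/` or `/root/...`). The function name `validateGuestPath` and the error wording are hypothetical. Real config loading also validates host paths and mode strings, which this sketch leaves out.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateGuestPath applies the guest-side rules from the commit message.
func validateGuestPath(p string) error {
	if p == "" {
		return errors.New("guest path is empty")
	}
	if strings.HasPrefix(p, "~") && !strings.HasPrefix(p, "~/") {
		return errors.New("only ~/... is accepted (no bare ~ or ~user/...)")
	}
	for _, seg := range strings.Split(p, "/") {
		if seg == ".." {
			return errors.New("'..' traversal is not allowed")
		}
	}
	if strings.HasPrefix(p, "~/") || strings.HasPrefix(p, "/root/") {
		return nil
	}
	if !strings.HasPrefix(p, "/") {
		return errors.New("relative paths are not allowed")
	}
	return errors.New("guest path must live under ~/ or /root/")
}

func main() {
	for _, p := range []string{"~/.aws", "/root/bin/my-script", "/etc/passwd", "~bob/.aws", "~/../etc"} {
		fmt.Printf("%-22q %v\n", p, validateGuestPath(p))
	}
}
```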
Thales Maciel	843314be5e	vm_authsync: s/repairing/provisioning/ in SSH work-disk stage The "repairing SSH access on work disk" stage detail sounded remedial, like something had gone wrong. It's just writing banger's SSH key to /root/.ssh/authorized_keys on the work disk for the first time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 16:29:18 -03:00
Thales Maciel	cdd857b288	vm run --rm: suppress the still-running reminder The deferred --rm delete fires AFTER runSSHSession returns, but runSSHSession prints "vm X is still running (stop with ...)" before returning. Net effect: the user sees the reminder, then the VM gets deleted behind it — misleading. Thread a skipReminder bool into runSSHSession. `vm run` passes the same value as removeOnExit; other callers (`vm ssh`) pass false. Reinforced by a new assertion in the --rm happy-path test that the reminder string never appears in stderr. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 16:10:29 -03:00
Thales Maciel	b33f24865c	vm run --rm: ephemeral sandboxes New `--rm` flag deletes the VM once the ssh session or `-- cmd` exits, making `vm run` one-shot. Exit code from command mode still propagates correctly. Semantics: - Create fails → no VM to delete, nothing to do. - SSH-wait timeout → VM intentionally kept alive so `vm logs <name>` shows why; the timeout error already pointed users at that. Even with --rm, this path skips delete — a wedged sshd is exactly when you want post-mortem access. - Session/command ends (any exit code, any reason) → VM is deleted via `vm.delete` RPC. Uses a fresh 10s context so Ctrl-C during the session doesn't abort the cleanup. New vmDeleteFunc seam at the top of banger.go alongside the other RPC seams. Two tests cover the happy path (session ends cleanly → delete fires with correct ref) and the skip-on-timeout path (ssh wait errors → delete does NOT fire). README updated with an ephemeral example and a note about the timeout-skip behaviour. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 16:06:46 -03:00
Thales Maciel	3aa64a63c1	vm run: bound the ssh wait and give a useful error on timeout Before: `guestWaitForSSHFunc` loops forever bounded only by context cancellation, so if sshd fails to start in the guest `vm run` hangs indefinitely — which burned a long debugging session during the golden-image bring-up. After: the ssh wait gets its own 90s deadline. On guest-side timeout the error names the VM, explains sshd is the likely suspect, points at `banger vm logs <name>` for the console output, and notes the VM is still alive for inspection (or `vm delete` to clean up). Parent context cancellation (Ctrl-C, caller timeout) still surfaces as-is without the hint. `vmRunSSHTimeout` is a var rather than a const so tests can shrink it; the new TestRunVMRunSSHTimeoutReturnsActionableError sets it to 50ms and asserts the error message contains the actionable bits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 15:59:27 -03:00
Thales Maciel	ac7974f5b9	Remove image build --from-image; doctor treats catalog images as OK The `image build` flow spun up a transient Firecracker VM, SSHed in, and ran a large bash provisioning script to derive a new managed image from an existing one. It overlapped heavily with the golden- image Dockerfile flow (same mise/docker/tmux/opencode install logic duplicated in Go as `imagemgr.BuildProvisionScript`) and had far more machinery: async op state, RPC begin/status/cancel, webui form + operation page, preflight checks, API types, tests. For custom images, writing a Dockerfile is simpler and more reproducible. Removed end-to-end: - CLI `image build` subcommand + `absolutizeImageBuildPaths`. - Daemon: BuildImage method, imagebuild.go (transient-VM orchestration), image_build_ops.go (async begin/status/cancel), imagemgr/build.go (the 247-line provisioning script generator and all its append* helpers), validateImageBuildPrereqs + addImageBuildPrereqs. - RPC dispatches for image.build / .begin / .status / .cancel. - opstate registry `imageBuildOps`, daemon seam `imageBuild`, background pruner call. - API types: ImageBuildParams, ImageBuildOperation, ImageBuildBeginResult, ImageBuildStatusParams, ImageBuildStatusResult; model type ImageBuildRequest. - Web UI: Backend interface methods, handlers, form, routes, template branches (images.html build form, operation.html build branch, dashboard.html Build button). - Tests that directly exercised BuildImage. Doctor polish (task C): - Drop the "image build" preflight section entirely (its raison d'être is gone). - Default-image check now accepts "not local but in imagecat" as OK: vm create auto-pulls on first use. Only flag when the image is neither locally registered nor in the catalog. Net: 24 files touched, 1,373 lines deleted, 25 added. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 15:54:29 -03:00
Thales Maciel	6083e2dde5	Prune legacy void/alpine + customize.sh flows The golden-image Dockerfile + catalog pipeline replaces the entire manual rootfs-build stack. With that shipped, the per-distro shell flows are dead code. Removed: - scripts/customize.sh, scripts/interactive.sh, scripts/verify.sh - scripts/make-rootfs{,-void,-alpine}.sh - scripts/register-{void,alpine}-image.sh - scripts/make-{void,alpine}-kernel.sh - internal/imagepreset/ (only consumer was `banger internal packages`, which fed customize.sh) - examples/{void,alpine}.config.toml - Makefile targets: rootfs, rootfs-void, rootfs-alpine, void-kernel, alpine-kernel, void-register, alpine-register, void-vm, alpine-vm, verify-void, verify-alpine, plus the ALPINE_RELEASE / _IMAGE_NAME / _VM_NAME variables The void-6.12 kernel catalog entry is also gone — golden image pairs with generic-6.12 and nothing else in the catalog depended on it. Consolidated: imagemgr now holds the small DebianBasePackages list + package-hash helper inline, so the `image build --from-image` flow (still supported) no longer pulls from a separate imagepreset package. Net: 3,815 lines deleted, 59 added. No runtime functionality removed beyond the `banger internal packages` subcommand (hidden, used only by the deleted customize.sh). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 15:39:53 -03:00
Thales Maciel	75baf2e415	publish-golden-image: content-addressed tarball names Embed the sha256 prefix in the uploaded filename so every rebuild lives at a unique URL. Cloudflare's edge cache (and any similar CDN in front of R2) can never serve stale bytes for the URL the catalog points at. The R2 console offers no per-URL purge for this bucket layout, so making the URL itself content-addressed is the only durable fix. Also republishes the debian-bookworm catalog entry with the new filename. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 15:26:57 -03:00
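Illustrative sketch for 75baf2e415: deriving a content-addressed filename from a sha256 prefix. The commit message does not say how many hex characters are embedded or where they sit in the name, so the 12-character prefix and the `<base>-<prefix>.tar.zst` layout are assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentAddressedName embeds a sha256 prefix of data in the filename, so any
// change to the bytes yields a new URL and a CDN cannot serve a stale copy.
func contentAddressedName(base string, data []byte) string {
	sum := sha256.Sum256(data)
	prefix := hex.EncodeToString(sum[:])[:12] // prefix length is assumed
	return fmt.Sprintf("%s-%s.tar.zst", base, prefix)
}

func main() {
	fmt.Println(contentAddressedName("debian-bookworm-x86_64", []byte("abc")))
	// prints debian-bookworm-x86_64-ba7816bf8f01.tar.zst
}
```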
Thales Maciel	e0894376ea	vm create: auto-pull image and kernel from catalogs if missing One-command sandbox: `banger vm run` on a fresh host now Just Works. No prior `banger image pull` or `banger kernel pull` needed. Changes: - Default `default_image_name` flips from "default" to "debian-bookworm" so the golden image is the implicit target when `--image` is omitted. - `CreateVM` resolves the image via a new `findOrAutoPullImage`: try the local store first, and on miss fall back to the embedded imagecat catalog + auto-pull. Emits a vm-create progress stage so the user sees "pulling from image catalog" in the create output. - `resolveKernelInputs` gains context + the same pattern via `readOrAutoPullKernel`: try the local kernelcat, and on miss look up the embedded kernelcat and auto-pull. Fires whenever a bundle's manifest references a kernel the user hasn't pulled yet, not just during image pull — any CreateVM with an image that needs a kernel not yet local will resolve it. - `--image` help text updated on both `vm run` and `vm create`. Six tests cover local-hit-no-pull, auto-pull-on-miss, not-in-catalog error propagation, and a non-ENOENT kernel read error does NOT trigger a misleading "not in catalog" claim. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 15:10:26 -03:00
Thales Maciel	81a27d6648	imagecat: publish debian-bookworm bundle with boot fixes End-to-end verified: banger image pull debian-bookworm banger vm run --image debian-bookworm --name goldenvm boots through multi-user.target, sshd starts, and vm run drops into an interactive ssh session. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 14:59:01 -03:00
Thales Maciel	66838bb135	make-bundle: strip /.dockerenv so systemd doesn't misdetect virt `docker create` drops /.dockerenv into the container's writable layer, and `docker export` includes it in the tar. When systemd later boots that rootfs it finds /.dockerenv and flags virtualization=docker, which disables a bunch of udev device-unit behaviour (device units never become active, mount units waiting on them hang forever). Strip /.dockerenv (and /run/.containerenv for podman symmetry) from the staging tree after FlattenTar and before BuildExt4 so systemd correctly detects virtualization=kvm. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 14:58:42 -03:00
Thales Maciel	ed4117d926	imagepull/BuildExt4: omit positional fs-size; rely on file truncation mkfs.ext4's positional fs-size is documented in 1 KiB units (not the filesystem's 4 KiB block size), so passing sizeBytes/4096 made filesystems 1/4 the intended size. A 4 GiB request became a 1 GiB ext4 in a 4 GiB file, packed to 0 free blocks — VM create then failed with 'Could not allocate block' when patchRootOverlay tried to write guest config. The file is truncated to the target size before mkfs runs; without the positional arg, mkfs uses the whole device. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 14:58:42 -03:00
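Worked arithmetic for ed4117d926, as a small sketch: dividing the requested size by 4096 and passing the result as a bare positional fs-size, which mkfs.ext4 reads in 1 KiB units, yields a filesystem one quarter of the intended size. The helper name `effectiveBytes` is hypothetical.

```go
package main

import "fmt"

const (
	kib int64 = 1024
	gib int64 = 1 << 30
)

// effectiveBytes is the filesystem size that results when fsSizeArg is passed
// as a bare positional fs-size and interpreted as a count of 1 KiB units.
func effectiveBytes(fsSizeArg int64) int64 { return fsSizeArg * kib }

func main() {
	sizeBytes := 4 * gib
	buggyArg := sizeBytes / 4096 // the old code: a count of 4 KiB blocks

	fmt.Println("requested GiB:", sizeBytes/gib)                // prints 4
	fmt.Println("created GiB:  ", effectiveBytes(buggyArg)/gib) // prints 1
}
```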
Thales Maciel	b2dcdf9757	vm_lifecycle: drop systemd.mask=dev-{ttyS0,vdb}.device Both masks were added when the direct-boot path first landed for container rootfses that didn't have anything mounted on /dev/vdb. The golden image (and any pulled OCI image running under banger's patchRootOverlay) has an /etc/fstab entry mounting /dev/vdb at /root — masking dev-vdb.device makes systemd wait forever for a unit that can never become active, and the work-disk mount never completes. dev-ttyS0 is a real serial console the image needs too. Drop both. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 14:58:42 -03:00
Thales Maciel	ab5627aec2	imagecat: publish debian-bookworm golden image First entry in the image catalog. Verified end-to-end: - https://images.thaloco.com/debian-bookworm-x86_64.tar.zst reachable - sha256 071495e6... matches - bundle unpacks to rootfs.ext4 (4 GiB) + manifest.json with the expected name/distro/arch/kernel_ref. publish-golden-image.sh tweaks: - default RCLONE_REMOTE from 'r2' to 'banger-images' (matches the rclone config actually in use here). - rclone copyto now passes --s3-no-check-bucket and --no-check-dest so scoped R2 tokens without HeadBucket/HeadObject permission still upload cleanly. To use: restart bangerd so it picks up the new embedded catalog, then `banger image pull debian-bookworm`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 13:25:42 -03:00
Thales Maciel	5bdc9985c2	image pull: dispatch to imagecat bundle path before OCI PullImage now checks the embedded imagecat catalog first. If the ref matches a catalog entry, it takes the bundle path: 1. Fetch the .tar.zst bundle into a staging dir (rootfs.ext4 + manifest.json). 2. Strip manifest.json (staging-only metadata). 3. Stage kernel/initrd/modules alongside rootfs.ext4. 4. Publish the staging dir and upsert the image row. Bundle rootfs is already flattened + ownership-fixed + agent-injected at build time, so the daemon-side work is strictly I/O — no flatten, no mkfs, no debugfs. Kernel resolution in the bundle path: --kernel-ref > entry.kernel_ref > --kernel/--initrd/--modules. If the ref doesn't match a catalog entry, PullImage falls through to the existing OCI path unchanged (extracted into pullFromOCI). New test seam: d.bundleFetch. Six unit tests cover happy path, --kernel-ref override, existing-name rejection, kernel-required error, fetch-failure cleanup, and the catalog → OCI fallthrough. CLI help updated: image pull now documents both forms and takes <name-or-oci-ref> instead of requiring an OCI ref. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 15:43:33 -03:00
Thales Maciel	d22d05555c	scripts: bundle-based golden image pipeline Replaces the OCI-push flow with a bundle-based one that mirrors the kernel catalog (publish-kernel.sh / kernelcat). - scripts/make-golden-bundle.sh: docker build → docker create → docker export | banger internal make-bundle → .tar.zst. Defaults target debian-bookworm / generic-6.12 / x86_64; pinned --size 4G to leave headroom for first-boot installs and in-VM apt use. - scripts/publish-golden-image.sh: rewritten to call make-golden-bundle, rclone upload to R2 (banger-images bucket, images.thaloco.com), and jq-patch internal/imagecat/catalog.json with URL / sha256 / size. --skip-upload stops after bundle build and copies to dist/. make-bundle default ext4 sizing also bumped from +25% to +50% headroom (mkfs.ext4 needs room for inode tables, block-group metadata, journal, and the default 5% reserved-blocks margin). The old 25% was too tight for the ~950 MB golden rootfs and aborted with "Could not allocate block". End-to-end smoke (local): golden Dockerfile → 286 MB tar.zst bundle with correct manifest, valid ext4, and all banger units + vsock agent present. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 15:38:04 -03:00
Thales Maciel	a7d1a49aca	cli: restrict ExitCodeError unwrap to the CLI's own type main.go previously unwrapped any error implementing `ExitCode() int` into the process exit status, which matched *exec.ExitError too. So whenever a CLI command ran a subprocess (mkfs.ext4, debugfs, ssh to a daemon preflight, etc.) and that subprocess failed, the CLI would silently exit with the subprocess's code — no error message printed. Surfaced while bringing up `banger internal make-bundle`: mkfs.ext4 was failing on an undersized ext4 and the user saw only `EXIT=1`. Fix: export the type as `cli.ExitCodeError` and unwrap against the concrete type in main.go. The `ExitCode()` method is gone — only the explicit wrap at the `vm run` command-mode call site produces this error now. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 15:37:47 -03:00
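The unwrap rule described in a7d1a49aca can be sketched roughly as below. This is a minimal illustration under stated assumptions, not banger's code: the `Code` field, the `exitStatus` helper, and the `subprocessError` stand-in (which mimics a subprocess error that also exposes `ExitCode() int`) are hypothetical. Only the idea of matching the CLI's own concrete type, rather than any error with an `ExitCode()` method, comes from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// ExitCodeError is a hypothetical stand-in for cli.ExitCodeError.
// The Code field name is an assumption.
type ExitCodeError struct{ Code int }

func (e ExitCodeError) Error() string { return fmt.Sprintf("exit status %d", e.Code) }

// subprocessError mimics an error such as *exec.ExitError: it exposes
// ExitCode() int but is NOT the CLI's own type.
type subprocessError struct{ code int }

func (e subprocessError) Error() string { return fmt.Sprintf("subprocess exited %d", e.code) }
func (e subprocessError) ExitCode() int { return e.code }

// exitStatus maps an error to a process exit code and reports whether the
// error message should still be printed. Only the concrete ExitCodeError is
// unwrapped; every other failure prints its message and exits with 1.
func exitStatus(err error) (code int, printMessage bool) {
	if err == nil {
		return 0, false
	}
	var ece ExitCodeError
	if errors.As(err, &ece) {
		return ece.Code, false
	}
	return 1, true
}

func main() {
	code, show := exitStatus(fmt.Errorf("vm run: %w", ExitCodeError{Code: 42}))
	fmt.Println(code, show)
	code, show = exitStatus(subprocessError{code: 3})
	fmt.Println(code, show)
}
```

Matching on the concrete type keeps a failing helper process from silently becoming the CLI's exit status; its error is reported like any other.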
Thales Maciel	bb95a0a273	banger internal make-bundle: build image bundles from flat rootfs tars New hidden subcommand that turns a `docker export`-style rootfs tar into a banger bundle (`rootfs.ext4` + `manifest.json`, tar+zstd): 1. FlattenTar (new in imagepull) extracts the stream into a staging dir while capturing per-file uid/gid/mode into a Metadata record. 2. imagepull.BuildExt4 produces the ext4 via `mkfs.ext4 -d`. 3. imagepull.ApplyOwnership re-applies the captured metadata with `debugfs sif` so setuid/root-owned files keep their identity. 4. imagepull.InjectGuestAgents drops the vsock agent + network bootstrap + first-boot service into the ext4. 5. manifest.json is written with name/distro/arch/kernel_ref. 6. Both files are packaged as .tar.zst with max compression. Flags: --rootfs-tar (file or '-' for stdin), --name, --distro, --arch, --kernel-ref, --description, --size, --out. Stdout prints bundle path, sha256, and size so callers can patch the catalog. Unit tests cover flag registration, required-arg validation, the bundle tar round-trip, sha256HexFile, and dirSize. An end-to-end test runs the full pipeline against a synthesized tiny rootfs tar; skips gracefully when mkfs.ext4 / debugfs / companion binaries are missing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 15:17:50 -03:00
Thales Maciel	3d9ae624b1	imagecat: catalog + fetch for banger image bundles New package mirroring `kernelcat`: catalog + SHA256-verified HTTP fetch of `.tar.zst` bundles that contain rootfs.ext4 + manifest.json. Mounted empty (version:1, entries:[]) so nothing is pullable via the bundle path yet; wiring into `banger image pull` lands in a later phase. - catalog.go: Catalog/CatEntry, LoadEmbedded, ParseCatalog, Lookup, ValidateName. - fetch.go: Fetch(ctx, client, destDir, entry) downloads the bundle, verifies sha256, extracts exactly rootfs.ext4 and manifest.json into destDir, returns the parsed manifest. Rejects unexpected tar entries, unsafe paths, non-regular files, and cleans up partial writes on failure. - Thirteen unit tests (happy path + every failure mode). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 15:11:52 -03:00
Thales Maciel	feb679a301	vm run redesign: one command, three modes `vm run` now covers bare sandbox (no args), workspace sandbox (path), and workspace+command (path -- cmd) in a single entry point. Replaces the old print-next-steps-and-exit behaviour: bare and workspace modes drop into interactive ssh, command mode execs via ssh and propagates the remote exit code through banger's own exit status. - path argument is optional; --branch / --from still require a path. - workspace prep and mise tooling bootstrap only run when a path is given; command mode skips the bootstrap. - remote command exit status is wrapped as exitCodeError so main() can propagate it instead of collapsing every failure to 1. - README: promote vm run with three-mode examples; demote vm create to a scripting primitive. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 14:00:45 -03:00
Thales Maciel	8f4be112c2	Generic kernel + init= boot path for OCI-pulled images Closes the full arc: banger kernel pull + image pull + vm create + vm ssh now works end-to-end against docker.io/library/debian:bookworm with zero manual image building. Generic kernel: - New scripts/make-generic-kernel.sh builds vmlinux from upstream kernel.org sources using Firecracker's official minimal config (configs/firecracker-x86_64-6.1.config). All critical drivers (virtio_blk, virtio_net, ext4, vsock) compiled in — no modules, no initramfs needed. - Published as generic-6.12 in the catalog (kernels.thaloco.com). - catalog.json updated with the new entry. Direct-boot init= override (vm_lifecycle.go): - For images without an initrd (direct-boot / OCI-pulled), banger now passes init=/usr/local/libexec/banger-first-boot on the kernel cmdline. The script runs as PID 1, mounts /proc /sys /dev /run, checks for systemd — if present execs it immediately; if not (container images), installs systemd-sysv + openssh-server via the guest's package manager, then execs systemd. - Also passes kernel-level ip= parameter via BuildBootArgsWithKernelIP so the kernel configures the network interface before init runs (container images don't ship iproute2, so the userspace bootstrap script can't call ip(8)). - Masks dev-ttyS0.device and dev-vdb.device systemd units that otherwise wait 90s for udev events that never fire in Firecracker guests started from container rootfses. first-boot.sh rewritten as universal init wrapper: - Works as PID 1 (mounts essential filesystems) OR as a systemd oneshot (existing behavior). - Installs both systemd-sysv AND openssh-server (container images have neither). - Dispatch updated: debian, alpine, fedora, arch, opensuse families + ID_LIKE fallback. All tests updated. 
Opencode capability skip for direct-boot images: - The opencode readiness check (WaitReady on vsock port 4096) now returns nil for images without an initrd, since pulled container images don't ship the opencode service. Without this, the VM would be marked as error for lacking an opinionated add-on. Docs: README and kernel-catalog.md updated to recommend generic-6.12 as the default kernel for OCI-pulled images. AGENTS.md notes the new build script. Verified live: - banger kernel pull generic-6.12 - banger image pull docker.io/library/debian:bookworm --kernel-ref generic-6.12 - banger vm create --image debian-bookworm --name testbox --nat - banger vm ssh testbox -- "id; uname -r; systemctl is-active banger-vsock-agent" → uid=0(root), kernel 6.12.8, Debian bookworm, vsock-agent active, sshd running, SSH working. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 20:12:56 -03:00
Thales Maciel	bddfa75feb	imagepull.Pull: don't eager-open layer readers The eager "fetch once to surface network errors" loop in Pull was opening each layer's Compressed() stream and immediately closing it without draining. The go-containerregistry filesystem cache populates lazily via tee-on-read — opening and closing without reading wrote ZERO-BYTE blobs into the cache. Every subsequent pull of the same digest then served those corrupted blobs, producing a 1 GiB ext4 containing nothing but banger's injected files. Symptom caught during B-4 live verification: real debian:bookworm pulls had 43 used inodes (out of 65536) and /usr contained only /usr/local — the debian content was silently missing. Fix: remove the eager-fetch loop entirely. Flatten naturally drains layers when it reads them, and the cache populates correctly on that path. Network errors now surface from Flatten instead of Pull, which is fine — they surface at the same place they always had to. Test TestPullCachesLayersAndReturnsImage → TestPullResolvesImageAndFlattenPopulatesCache, reworded to assert the new contract: Pull resolves the image; Flatten is what populates the cache with non-empty blobs. Users with a corrupted cache from a pre-fix pull must clear it: rm -rf ~/.cache/banger/oci Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 19:03:52 -03:00
Thales Maciel	c3fb4ccc3e	Phase B-3: first-boot sshd install New internal/imagepull/assets/first-boot.sh: POSIX-sh oneshot that detects the guest distro from /etc/os-release (ID + ID_LIKE fallback), installs openssh-server via the native package manager, and enables/starts sshd. Covers debian/ubuntu/kali/raspbian/pop, alpine, fedora/rhel/centos/rocky/almalinux, arch/manjaro, and opensuse/suse. Unknown distros fail clearly with a pointer at editing the script to add a branch. Marker-driven: the service has ConditionPathExists=/var/lib/banger/first-boot-pending, and the script removes the marker on success. Subsequent boots no-op. Testability seams in the script: RUN_PLAN=1 skips the sshd-already-present short-circuit and makes the dispatch echo the planned command instead of executing it. OS_RELEASE_FILE and BANGER_FIRST_BOOT_MARKER env vars override paths so the Go tests exercise the real dispatch logic in a tempdir without touching /etc or /var/lib on the host. Embedding: internal/imagepull/firstboot.go go:embeds both the script and the systemd unit; exposes FirstBootScript() and FirstBootUnit() plus the FirstBootScriptPath / FirstBootMarkerPath / FirstBootUnitName constants. Injection: InjectGuestAgents now drops /usr/local/libexec/banger-first-boot (0755), /etc/systemd/system/banger-first-boot.service (0644), the empty /var/lib/banger/first-boot-pending marker (0644), and the multi-user.target.wants enable symlink. All uid=0, gid=0. Tests: eight-case dispatch-by-distro (debian, ubuntu, alpine, fedora, arch, opensuse, plus ID_LIKE fallbacks for weird derivatives). Script syntax check via `sh -n`. Unit-contains-expected-fields check. Existing inject round-trip test extended to assert the first-boot bits land in the ext4. Deferred: per-image FirstBootPending flag + extended SSH wait timeout at VM start. Will add if live verification (B-4) shows the naive retry UX is unacceptable. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 18:20:33 -03:00
Thales Maciel	491c8e1ebb	Phase B-2: pre-inject banger guest agents into pulled rootfs New imagepull.InjectGuestAgents writes banger's guest-side assets straight into the pulled ext4 so systemd will start them at first boot: /usr/local/bin/banger-vsock-agent (binary, 0755) /usr/local/libexec/banger-network-bootstrap (script, 0755) /etc/systemd/system/banger-network.service (unit, 0644) /etc/systemd/system/banger-vsock-agent.service (unit, 0644) /etc/modules-load.d/banger-vsock.conf (modules, 0644) plus enable-at-boot symlinks under /etc/systemd/system/multi-user.target.wants/ All writes + ownership + symlinks go through one `debugfs -w -f -` invocation. No sudo required because the caller owns the ext4 file. Script is deterministic: shallow-first mkdir, then write, then sif, then symlink. "File exists" errors from mkdir on already-present dirs are tolerated (debugfs keeps going past them with -f, and we filter them out of the output scan). Asset content reuses the existing guestnet.BootstrapScript / SystemdServiceUnit / ConfigPath and vsockagent.ServiceUnit / ModulesLoadConfig / GuestInstallPath — one source of truth, no duplicated systemd unit strings. Daemon wiring: new d.finalizePulledRootfs seam runs both ApplyOwnership (B-1) and InjectGuestAgents as one phase between BuildExt4 and StageBootArtifacts. The companion vsock-agent binary is resolved via paths.CompanionBinaryPath. Existing daemon tests stub the seam with a no-op to avoid needing a real companion binary + debugfs in the test harness. Tests: real-ext4 round-trip that builds a minimal ext4, runs InjectGuestAgents, then verifies every expected path is present via `debugfs stat`, plus uid=0 and mode 0755 on the vsock-agent binary. Also: missing-binary rejection, ancestor-collection order test. debugfs/mkfs.ext4 tests skip on hosts without the binaries. 
After B-1+B-2, any OCI image that already ships sshd boots with banger-network and banger-vsock-agent running; image pull is one step from "useful rootfs primitive". B-3 (first-boot sshd install) unlocks images that don't ship sshd. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 18:08:56 -03:00
Thales Maciel	43982a4ae3	Phase B-1: ownership fixup via debugfs pass imagepull.Flatten now captures per-file uid/gid/mode/type from the tar headers as it walks layers, returning a Metadata map alongside the extracted tree. Whiteouts correctly drop the victim's metadata. The returned Metadata feeds the new imagepull.ApplyOwnership, which pipes a batched `set_inode_field` script to `debugfs -w -f -`. Why: mkfs.ext4 -d copies the runner's on-disk uids verbatim, so without this pass setuid binaries become setuid-nonroot and sshd refuses to start on the resulting image. With the pass, a pulled debian:bookworm has /usr/bin/sudo with uid=0 + setuid bit surviving intact. imagepull.BuildExt4 signature unchanged; ownership is applied as a separate step by the daemon orchestrator between BuildExt4 and StageBootArtifacts, keeping each helper focused. The seam (d.pullAndFlatten) now returns (Metadata, error) for test stubs to feed synthetic metadata. StdinRunner is a new duck-typed extension next to CommandRunner; the real system.Runner implements RunStdin, test mocks don't need to unless they exercise stdin. Prevents every existing mock from growing a new method. Tests: - TestFlattenCapturesHeaderMetadata: setuid bit + mode survive the tar-header walk - TestApplyOwnershipRewritesUidGidMode: real debugfs round-trip — create ext4 with runner's uid, apply synthetic metadata setting uid=0 + setuid mode, verify via `debugfs -R stat` that the inode now has uid=0 and mode 04755 - TestBuildOwnershipScriptDeterministic: sorted, well-formed sif script output Debugfs and mkfs.ext4 tests skip if the binaries aren't on PATH. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 18:04:22 -03:00
Thales Maciel	fdaf7cce0f	imagepull + kernelcat: allow absolute symlink targets Container (and kernel) layers routinely ship symlinks with absolute targets — /usr/bin/mawk, /lib/modules/<ver>/build, etc. Those are interpreted relative to the rootfs at runtime (`/` inside the VM), not against the host filesystem, so they are rooted inside dest by construction and need no escape check at write time. The previous logic resolved absolute Linknames literally (against the host root), compared to the staging dir, and rejected everything that didn't happen to live under it. That made `banger image pull docker.io/library/debian:bookworm` fail on the very first symlink ("etc/alternatives/awk -> /usr/bin/mawk"). Relative targets still get the traversal check — a relative Linkname with ../s can genuinely escape dest at write time even if in-VM resolution would be safe — so the defense against malicious relative chains is intact. Tests: - TestFlattenAcceptsAbsoluteSymlink replaces the old overly-strict test, using the exact etc/alternatives/awk -> /usr/bin/mawk case that broke debian:bookworm. - TestFlattenRejectsRelativeSymlinkEscape confirms relative-with-traversal is still rejected with the same "unsafe symlink" error. Same fix applied in internal/kernelcat/fetch.go for consistency; future kernel bundles with absolute symlinks in the modules tree would otherwise hit the same wall. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 17:33:16 -03:00
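The symlink policy from fdaf7cce0f might look something like the sketch below. The function name `symlinkTargetSafe` and its signature are hypothetical, and the sketch assumes the entry name itself has already been validated as a relative path inside the destination; the real check in imagepull and kernelcat may be structured differently.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// symlinkTargetSafe reports whether a tar symlink entry may be written.
// name is the entry's path relative to the extraction root (assumed already
// validated); linkname is the symlink target from the tar header.
func symlinkTargetSafe(name, linkname string) bool {
	// Absolute targets are resolved against the guest's root at runtime,
	// so they cannot escape the staging directory at write time.
	if filepath.IsAbs(linkname) {
		return true
	}
	// Relative targets are resolved against the entry's own directory and
	// must not climb above the extraction root.
	resolved := filepath.Join(filepath.Dir(name), linkname)
	if resolved == ".." || strings.HasPrefix(resolved, ".."+string(filepath.Separator)) {
		return false
	}
	return true
}

func main() {
	fmt.Println(symlinkTargetSafe("etc/alternatives/awk", "/usr/bin/mawk"))
	fmt.Println(symlinkTargetSafe("etc/alternatives/awk", "../../usr/bin/mawk"))
	fmt.Println(symlinkTargetSafe("etc/evil", "../../outside"))
}
```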
Thales Maciel	d5f72dfad9	Phase 3: CLI banger image pull newImagePullCommand mirrors newImageRegisterCommand with a positional <oci-ref> arg, the same kernel-ref / direct-paths flag set + mutual exclusion, plus --size that parses human-friendly values via model.ParseSize before crossing the RPC boundary. Calls "image.pull" RPC, prints the resulting image summary on success. Long help warns about the Phase A bootability gap (ownership not preserved; suitable as `image build` base, not yet directly bootable). CLI test confirms image pull is registered with the expected flags. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 17:29:06 -03:00
Thales Maciel	a8c9983542	Phase 2: daemon PullImage orchestration (d *Daemon).PullImage downloads an OCI image, flattens it into an ext4 rootfs, and registers the result as a managed banger image. Flow (internal/daemon/images_pull.go): 1. Parse + validate the OCI ref via go-containerregistry/name. 2. Derive a friendly default name from the ref ("debian-bookworm") when --name is omitted. 3. Reject if an image with that name already exists. 4. Resolve kernel info via the new shared resolveKernelInputs helper (refactored out of RegisterImage); ValidateKernelPaths checks the kernel triple alone. 5. Acquire imageOpsMu, generate a fresh image id, and stage at <ImagesDir>/<id>.staging. 6. imagepull.Pull → cache layers under OCICacheDir; imagepull.Flatten → temp rootfs tree under os.TempDir (so the state filesystem doesn't temporarily double in size). 7. Default size: max(treeSize × 1.25, 1 GiB); --size override accepted. 8. imagepull.BuildExt4 produces the rootfs.ext4 in the staging dir. 9. imagemgr.StageBootArtifacts stages the kernel/initrd/modules into the same dir (reused unchanged). 10. Atomic os.Rename(staging, finalDir) publishes the artifact dir. 11. Persist model.Image with Managed=true. Failure at any step removes the staging dir; failure post-rename removes finalDir. The pullAndFlatten field on Daemon is the test seam: tests stub it to write a fixture tree into destDir and skip the real registry. Refactor: extracted the "kernel-ref vs direct paths" resolution out of RegisterImage into d.resolveKernelInputs so PullImage and RegisterImage share one source of truth for that policy. Split ValidateRegisterPaths into a kernel-only ValidateKernelPaths so PullImage (which produces the rootfs itself) can validate just the kernel triple without the rootfs check. API: ImagePullParams { Ref, Name, KernelPath, InitrdPath, ModulesDir, KernelRef, SizeBytes }. RPC dispatch case image.pull mirrors image.register. 
Tests cover: happy-path producing a managed image with all four artifacts present + staging cleaned up, name-collision rejection, missing-kernel rejection, and staging cleanup on a failed pull. defaultImageNameFromRef handles tag/digest/no-suffix cases. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 17:27:32 -03:00
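Step 7 of the PullImage flow (default size of max(treeSize × 1.25, 1 GiB), with a --size override) can be illustrated with a short sketch. The helper name `defaultImageSize` and the convention that a zero override means "not set" are assumptions made for the example, not details from the commit.

```go
package main

import "fmt"

const gib = int64(1) << 30

// defaultImageSize returns the ext4 size for a flattened tree of treeSize
// bytes. A non-zero override (hypothetically, the parsed --size value) wins;
// otherwise the tree size plus 25% headroom is used, with a 1 GiB floor.
func defaultImageSize(treeSize, override int64) int64 {
	if override > 0 {
		return override
	}
	size := treeSize + treeSize/4
	if size < gib {
		return gib
	}
	return size
}

func main() {
	fmt.Println(defaultImageSize(100<<20, 0)) // small tree: floor applies
	fmt.Println(defaultImageSize(4*gib, 0))   // large tree: 25% headroom
	fmt.Println(defaultImageSize(4*gib, 8*gib))
}
```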
Thales Maciel	78376ba6ec	Phase 1: imagepull package — pull, flatten, ext4 New internal/imagepull/ subpackage. Three concerns, each independently testable: Pull (imagepull.go): - github.com/google/go-containerregistry's remote.Image with the linux/amd64 platform pinned. Anonymous pulls only for v1. - Layer blobs cached on disk via cache.NewFilesystemCache under <cacheDir>/blobs/sha256/<hex> — OCI-standard layout so skopeo/crane could co-exist later. - Eagerly touches every layer once so network errors surface at Pull time, not deep in Flatten. Flatten (flatten.go): - Replays layers oldest-first into destDir. - Whiteout-aware: .wh.<name> deletes the named entry, .wh..wh..opq wipes the parent directory's contents from prior layers. - Path-traversal hardening mirrored from kernelcat extractTar: reject .., absolute paths, and symlinks/hardlinks whose resolved target escapes destDir. - Handles tar.TypeReg, TypeDir, TypeSymlink, TypeLink. Skips device/fifo nodes silently (need privilege; udev/devtmpfs handles them in the guest). BuildExt4 (ext4.go): - Truncates outFile to sizeBytes, then runs `mkfs.ext4 -F -d <srcDir> -E root_owner=0:0`. No mount, no sudo, no loopback. - 64 MiB floor; callers handle real sizing with content-aware headroom. - File ownership in the resulting ext4 reflects srcDir's on-disk ownership — runner's uid/gid since extraction was unprivileged. Documented in package doc as a Phase A v1 limitation; Phase B will add a debugfs- or tar2ext4-based ownership fixup. paths.Layout gains OCICacheDir at $XDG_CACHE_HOME/banger/oci/, ensured at startup alongside the other dirs. Tests use go-containerregistry's in-process registry to push and pull synthetic multi-layer images. Cover: layer caching round-trip, whiteout + opaque-marker handling, path-traversal rejection, unsafe symlink rejection, real mkfs.ext4 round-trip (skipped if mkfs.ext4 absent), and tiny-size rejection. 
go-containerregistry v0.21.5 added as a direct dep, plus its transitive closure (containerd/stargz, opencontainers/go-digest, docker/cli config helpers, etc). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 17:22:13 -03:00
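The whiteout rules that Flatten follows (`.wh.<name>` deletes the named entry, `.wh..wh..opq` wipes the parent directory's earlier contents) can be sketched as a small classifier. The function `classifyWhiteout` and its string return values are hypothetical and chosen for illustration; the marker names are the ones quoted in the commit message.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

const (
	whiteoutPrefix = ".wh."
	opaqueMarker   = ".wh..wh..opq"
)

// classifyWhiteout inspects a tar entry path and returns:
//   - ("opaque", dir): wipe everything earlier layers put under dir
//   - ("whiteout", victim): delete the single path victim
//   - ("", entry): an ordinary entry, to be extracted as-is
func classifyWhiteout(entry string) (kind, target string) {
	dir, base := path.Dir(entry), path.Base(entry)
	switch {
	case base == opaqueMarker:
		// Must be checked first: the opaque marker also starts with ".wh.".
		return "opaque", dir
	case strings.HasPrefix(base, whiteoutPrefix):
		return "whiteout", path.Join(dir, strings.TrimPrefix(base, whiteoutPrefix))
	default:
		return "", entry
	}
}

func main() {
	for _, e := range []string{"etc/.wh.motd", "var/cache/.wh..wh..opq", "usr/bin/awk"} {
		kind, target := classifyWhiteout(e)
		fmt.Printf("%q -> kind=%q target=%q\n", e, kind, target)
	}
}
```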
Thales Maciel	da4a6bf45b	Add lint targets, fix gofmt drift, broaden Makefile build inputs Three small operational improvements. 1. Makefile build dependencies now cover everything under cmd/ and internal/, not just .go. The previous GO_SOURCES find pattern missed embedded assets (catalog.json today, anything else added later), so editing a JSON manifest didn't trigger a rebuild and left the binary stale. New BUILD_INPUTS covers all files; go's own build cache absorbs any redundant invocations. GO_SOURCES is kept for fmt/lint targets which still want only Go files. 2. New `make lint` (default + lint-go + lint-shell): - lint-go: gofmt -l (fail if any output) and go vet ./... - lint-shell: shellcheck --severity=error on scripts/*.sh The shell floor is set at error-level for now; the legacy make-rootfs-*.sh / make-*-kernel.sh / customize.sh scripts have warning-level findings (sudo-cat redirects, heredoc quoting) that would block landing this if we tightened immediately. Documented as tech debt in docs/kernel-catalog.md alongside a note about eventually replacing the per-distro bash with a uniform Go tool. 3. gofmt drift fixed in internal/daemon/imagemgr/build.go, session/session.go, and vm_create_ops.go (trailing newline + gofmt's preferred function-definition wrapping). Now `make lint` passes cleanly; future drift will fail CI/local lint instead of accumulating. AGENTS.md gains a one-line note on make lint. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 16:49:17 -03:00
Thales Maciel	f0c1dc924c	kernel catalog: add void-6.12	2026-04-16 16:28:45 -03:00
Thales Maciel	f0668ee598	Phase 4: remote catalog + banger kernel pull Introduces the headline feature of the kernel catalog: pulling a kernel bundle over HTTP without any local build step. Catalog format (internal/kernelcat/catalog.go): - Catalog { Version, Entries } + CatEntry { Name, Distro, Arch, KernelVersion, TarballURL, TarballSHA256, SizeBytes, Description }. - catalog.json is embedded via go:embed and ships with each banger binary. It starts empty (Phase 5's CI pipeline will populate it). - Lookup(name) returns the matching entry or os.ErrNotExist. Fetch (internal/kernelcat/fetch.go): - HTTP GET with streaming SHA256 over the response body. - zstd-decode (github.com/klauspost/compress/zstd) -> tar extract into <kernelsDir>/<name>/. - Hardens against path-traversal tarball entries (members whose normalised path escapes the target dir, and unsafe symlink targets) and sha256-mismatch downloads; any failure removes the partially-populated target dir. - Regular files, directories, and safe symlinks are supported; other tar types (hardlinks, devices, fifos) are silently skipped. - After extraction, recomputes sha256 over the on-disk vmlinux and writes the manifest with Source="pull:<url>". Daemon methods (internal/daemon/kernels.go): - KernelPull(ctx, {Name, Force}) - lookup in embedded catalog, refuse overwrite unless Force, delegate to kernelcat.Fetch. - KernelCatalog(ctx) - return the embedded catalog annotated per-entry with whether it has been pulled locally. RPC: kernel.pull, kernel.catalog dispatch cases. CLI: - `banger kernel pull <name> [--force]`. - `banger kernel list --available` prints the catalog with a pulled/available STATE column and a human-readable size. Tests: fetch round-trip (extract + manifest + sha256), sha256 mismatch rejection with cleanup, missing-vmlinux rejection, path-traversal rejection, HTTP error propagation, catalog parsing, lookup, pulled-status reconciliation. All 20 packages green. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 15:05:42 -03:00
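The "streaming SHA256 over the response body" step of kernelcat.Fetch can be approximated with `io.TeeReader`, as in the sketch below. The helper names `copyVerified` and `verifyBytes` are hypothetical, and the sketch leaves out the zstd decode, tar extraction, and target-dir cleanup that the commit describes; the caller is assumed to remove partial output when an error comes back.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
)

// copyVerified streams src into dst while hashing every byte, then compares
// the digest against wantHex. The hash is only known once the stream ends,
// so on a mismatch the caller must discard whatever dst received.
func copyVerified(dst io.Writer, src io.Reader, wantHex string) (int64, error) {
	h := sha256.New()
	n, err := io.Copy(dst, io.TeeReader(src, h))
	if err != nil {
		return n, err
	}
	if got := hex.EncodeToString(h.Sum(nil)); got != wantHex {
		return n, fmt.Errorf("sha256 mismatch: got %s, want %s", got, wantHex)
	}
	return n, nil
}

// verifyBytes is a convenience wrapper for in-memory data.
func verifyBytes(data []byte, wantHex string) error {
	_, err := copyVerified(io.Discard, bytes.NewReader(data), wantHex)
	return err
}

func main() {
	payload := []byte("hello")
	sum := sha256.Sum256(payload)
	fmt.Println(verifyBytes(payload, hex.EncodeToString(sum[:])) == nil)
	fmt.Println(verifyBytes(payload, "deadbeef") != nil)
}
```

In a real fetch, `src` would be the HTTP response body and `dst` the decoder or a staging file.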
Thales Maciel	7192ba24ae	Phase 3: banger kernel import bridges make-*-kernel.sh output `banger kernel import <name> --from <dir>` copies a staged kernel bundle into the local catalog. <dir> is the output of `make void-kernel` or `make alpine-kernel` (build/manual/void-kernel/ or build/manual/alpine-kernel/). kernelcat.DiscoverPaths locates artifacts under <dir>: 1. Prefers metadata.json (written by make-void-kernel.sh). 2. Falls back to globbing: boot/vmlinux-* or vmlinuz-* (Alpine fallback), boot/initramfs-*, lib/modules/<latest>. The daemon's KernelImport copies kernel + optional initrd via system.CopyFilePreferClone and modules via system.CopyDirContents (no-sudo mode — catalog lives under ~/.local/state), computes SHA256 over the kernel, and writes the manifest via kernelcat.WriteLocal. While wiring this up, fixed a latent bug in system.CopyDirContents: filepath.Join(sourceDir, ".") silently drops the trailing dot, so `cp -a source source/contents target/` was copying the whole source directory (including its basename) instead of just its contents. Replaced the join with a manual "/." suffix. imagemgr.StageBootArtifacts (the only existing caller) silently benefits. scripts/register-void-image.sh and scripts/register-alpine-image.sh are rewritten to use `banger kernel import … && banger image register --kernel-ref …` instead of the find-and-pass-paths dance. Preserves the same user-facing commands and env vars. Tests cover: metadata.json preference, glob fallback, Alpine vmlinuz fallback, kernel-missing error, round-trip copy into the catalog, and the --from required flag. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 14:53:49 -03:00
Thales Maciel	48e3a938cf	Phase 2: image register --kernel-ref resolves through the catalog `banger image register --kernel-ref <name>` now substitutes for the --kernel/--initrd/--modules triple. The daemon looks the name up via kernelcat.ReadLocal under d.layout.KernelsDir, populates the three paths from the resolved entry, then continues through the existing validate/persist flow unchanged. Passing both --kernel-ref and any of --kernel/--initrd/--modules is rejected — at the CLI layer (before starting the daemon) and defensively at the RPC layer. A missing catalog entry produces a clear "run 'banger kernel list'" message. Once registered, the image stores the resolved absolute paths, so deleting the catalog entry later does not invalidate already-registered images — managed image build still copies the kernel into its artifact dir per imagemgr.StageBootArtifacts. Tests cover: resolution success (absolute KernelPath populated from catalog), mutual-exclusion rejection, and missing-entry error. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-16 14:25:50 -03:00

124 commits