The eager "fetch once to surface network errors" loop in Pull was
opening each layer's Compressed() stream and immediately closing it
without draining. The go-containerregistry filesystem cache populates
lazily via tee-on-read — opening and closing without reading wrote
ZERO-BYTE blobs into the cache. Every subsequent pull of the same
digest then served those corrupted blobs, producing a 1 GiB ext4
containing nothing but banger's injected files.
Symptom caught during B-4 live verification: real debian:bookworm
pulls had 43 used inodes (out of 65536) and /usr contained only
/usr/local — the debian content was silently missing.
Fix: remove the eager-fetch loop entirely. Flatten naturally drains
layers when it reads them, and the cache populates correctly on that
path. Network errors now surface from Flatten instead of Pull, which
is fine: Flatten is where the layer bytes are actually read, so
transfer failures had to be handled there anyway.
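The failure mode can be reproduced in miniature. The sketch below is not go-containerregistry's code; teeCache, openAndClose, and drainAndClose are hypothetical names that only model a cache entry filled as a side effect of Read and committed on Close.

```go
package main

import (
	"bytes"
	"io"
	"strings"
)

// teeCache models a lazily populated cache entry: bytes reach the cache
// only as the caller reads them, and Close commits whatever has arrived.
type teeCache struct {
	src       io.Reader    // upstream blob (stands in for the registry stream)
	buf       bytes.Buffer // bytes observed so far
	committed []byte       // what the "cache" holds after Close
}

func (t *teeCache) Read(p []byte) (int, error) {
	n, err := t.src.Read(p)
	t.buf.Write(p[:n])
	return n, err
}

func (t *teeCache) Close() error {
	t.committed = append([]byte(nil), t.buf.Bytes()...)
	return nil
}

// openAndClose mirrors the removed eager-fetch loop: open, never read, close.
func openAndClose(blob string) int {
	c := &teeCache{src: strings.NewReader(blob)}
	_ = c.Close()
	return len(c.committed)
}

// drainAndClose mirrors what Flatten does: read to EOF, then close.
func drainAndClose(blob string) int {
	c := &teeCache{src: strings.NewReader(blob)}
	_, _ = io.Copy(io.Discard, c)
	_ = c.Close()
	return len(c.committed)
}
```

In this model openAndClose commits an empty entry no matter how large the blob is, while drainAndClose commits every byte.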
Test renamed: TestPullCachesLayersAndReturnsImage →
TestPullResolvesImageAndFlattenPopulatesCache, reworded to assert the
new contract: Pull resolves the image; Flatten is what populates the
cache with non-empty blobs.
Users with a corrupted cache from a pre-fix pull must clear it:
rm -rf ~/.cache/banger/oci
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New internal/imagepull/assets/first-boot.sh: POSIX-sh oneshot that
detects the guest distro from /etc/os-release (ID + ID_LIKE
fallback), installs openssh-server via the native package manager,
and enables/starts sshd. Covers debian/ubuntu/kali/raspbian/pop,
alpine, fedora/rhel/centos/rocky/almalinux, arch/manjaro, and
opensuse/suse. Unknown distros fail clearly with a pointer at
editing the script to add a branch.
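The ID-then-ID_LIKE lookup can be sketched as follows. The real dispatch lives in the POSIX-sh script; this Go rendering, the detectFamily name, and the family labels are assumptions made for illustration, and only the distro IDs come from the list above.

```go
package main

import (
	"bufio"
	"strings"
)

// families maps os-release IDs to a package-manager family.
// The family labels are illustrative, not taken from first-boot.sh.
var families = map[string]string{
	"debian": "debian", "ubuntu": "debian", "kali": "debian",
	"raspbian": "debian", "pop": "debian",
	"alpine": "alpine",
	"fedora": "fedora", "rhel": "fedora", "centos": "fedora",
	"rocky": "fedora", "almalinux": "fedora",
	"arch": "arch", "manjaro": "arch",
	"opensuse": "suse", "suse": "suse",
}

// detectFamily parses os-release content and tries ID first, then each
// space-separated ID_LIKE token in order. ok is false for unknown distros.
func detectFamily(osRelease string) (family string, ok bool) {
	fields := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(osRelease))
	for sc.Scan() {
		key, val, found := strings.Cut(strings.TrimSpace(sc.Text()), "=")
		if found {
			fields[key] = strings.Trim(val, `"'`)
		}
	}
	candidates := append([]string{fields["ID"]}, strings.Fields(fields["ID_LIKE"])...)
	for _, id := range candidates {
		if fam, known := families[id]; known {
			return fam, true
		}
	}
	return "", false
}
```

A derivative whose ID is not in the table still resolves as long as one of its ID_LIKE tokens is, which is the fallback case the tests exercise.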
Marker-driven: the service has
ConditionPathExists=/var/lib/banger/first-boot-pending, and the
script removes the marker on success. Subsequent boots no-op.
Testability seams in the script: RUN_PLAN=1 skips the
sshd-already-present short-circuit and makes the dispatch echo the
planned command instead of executing it. OS_RELEASE_FILE and
BANGER_FIRST_BOOT_MARKER env vars override paths so the Go tests
exercise the real dispatch logic in a tempdir without touching
/etc or /var/lib on the host.
Embedding: internal/imagepull/firstboot.go go:embeds both the
script and the systemd unit; exposes FirstBootScript() and
FirstBootUnit() plus the FirstBootScriptPath /
FirstBootMarkerPath / FirstBootUnitName constants.
Injection: InjectGuestAgents now drops
/usr/local/libexec/banger-first-boot (0755),
/etc/systemd/system/banger-first-boot.service (0644), the empty
/var/lib/banger/first-boot-pending marker (0644), and the
multi-user.target.wants enable symlink. All uid=0, gid=0.
Tests: eight-case dispatch-by-distro (debian, ubuntu, alpine,
fedora, arch, opensuse, plus ID_LIKE fallbacks for weird
derivatives). Script syntax check via `sh -n`. Check that the unit
contains the expected fields. Existing inject round-trip test
extended to assert the first-boot bits land in the ext4.
Deferred: per-image FirstBootPending flag + extended SSH wait
timeout at VM start. Will add if live verification (B-4) shows
the naive retry UX is unacceptable.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New imagepull.InjectGuestAgents writes banger's guest-side assets
straight into the pulled ext4 so systemd will start them at first boot:
/usr/local/bin/banger-vsock-agent (binary, 0755)
/usr/local/libexec/banger-network-bootstrap (script, 0755)
/etc/systemd/system/banger-network.service (unit, 0644)
/etc/systemd/system/banger-vsock-agent.service (unit, 0644)
/etc/modules-load.d/banger-vsock.conf (modules, 0644)
plus enable-at-boot symlinks under
/etc/systemd/system/multi-user.target.wants/
All writes + ownership + symlinks go through one `debugfs -w -f -`
invocation. No sudo required because the caller owns the ext4 file.
Script is deterministic: shallow-first mkdir, then write, then sif
(set_inode_field), then symlink. "File exists" errors from mkdir on
already-present dirs are tolerated (debugfs keeps going past them
with -f, and we filter them out of the output scan).
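One way to get a deterministic, shallow-first mkdir order is sketched below. collectAncestors is a hypothetical helper, not the function in the package; it assumes slash-separated guest paths and breaks ties lexically.

```go
package main

import (
	"path"
	"sort"
	"strings"
)

// collectAncestors returns every parent directory of the given paths,
// deduplicated and ordered shallow-first, so each mkdir is issued only
// after its own parent has been created.
func collectAncestors(paths []string) []string {
	seen := map[string]bool{}
	for _, p := range paths {
		for dir := path.Dir(path.Clean(p)); dir != "/" && dir != "."; dir = path.Dir(dir) {
			seen[dir] = true
		}
	}
	out := make([]string, 0, len(seen))
	for dir := range seen {
		out = append(out, dir)
	}
	sort.Slice(out, func(i, j int) bool {
		di, dj := strings.Count(out[i], "/"), strings.Count(out[j], "/")
		if di != dj {
			return di < dj // fewer separators means shallower
		}
		return out[i] < out[j] // lexical tie-break keeps output stable
	})
	return out
}
```

Sorting by depth before name means the output does not depend on map iteration order, which is what makes the generated script reproducible.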
Asset content reuses the existing guestnet.BootstrapScript /
SystemdServiceUnit / ConfigPath and vsockagent.ServiceUnit /
ModulesLoadConfig / GuestInstallPath — one source of truth, no
duplicated systemd unit strings.
Daemon wiring: new d.finalizePulledRootfs seam runs both
ApplyOwnership (B-1) and InjectGuestAgents as one phase between
BuildExt4 and StageBootArtifacts. The companion vsock-agent binary
is resolved via paths.CompanionBinaryPath. Existing daemon tests
stub the seam with a no-op to avoid needing a real companion
binary + debugfs in the test harness.
Tests: real-ext4 round-trip that builds a minimal ext4, runs
InjectGuestAgents, then verifies every expected path is present
via `debugfs stat`, plus uid=0 and mode 0755 on the vsock-agent
binary. Also: missing-binary rejection, ancestor-collection order
test. debugfs/mkfs.ext4 tests skip on hosts without the binaries.
After B-1+B-2, any OCI image that already ships sshd boots with
banger-network and banger-vsock-agent running; image pull is
one step from "useful rootfs primitive". B-3 (first-boot sshd
install) unlocks images that don't ship sshd.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
imagepull.Flatten now captures per-file uid/gid/mode/type from the
tar headers as it walks layers, returning a Metadata map alongside
the extracted tree. Whiteouts correctly drop the victim's metadata.
The returned Metadata feeds the new imagepull.ApplyOwnership, which
pipes a batched `set_inode_field` script to `debugfs -w -f -`.
Why: mkfs.ext4 -d copies the runner's on-disk uids verbatim, so
without this pass setuid binaries become setuid-nonroot and sshd
refuses to start on the resulting image. With the pass, a pulled
debian:bookworm has /usr/bin/sudo with uid=0 + setuid bit surviving
intact.
imagepull.BuildExt4 signature unchanged; ownership is applied as a
separate step by the daemon orchestrator between BuildExt4 and
StageBootArtifacts, keeping each helper focused. The seam
(d.pullAndFlatten) now returns (Metadata, error) for test stubs to
feed synthetic metadata.
StdinRunner is a new duck-typed extension next to CommandRunner;
the real system.Runner implements RunStdin, test mocks don't need
to unless they exercise stdin. Prevents every existing mock from
growing a new method.
Tests:
- TestFlattenCapturesHeaderMetadata: setuid bit + mode survive the
tar-header walk
- TestApplyOwnershipRewritesUidGidMode: real debugfs round-trip —
create ext4 with runner's uid, apply synthetic metadata setting
uid=0 + setuid mode, verify via `debugfs -R stat` that the
inode now has uid=0 and mode 04755
- TestBuildOwnershipScriptDeterministic: sorted, well-formed
sif script output
Debugfs and mkfs.ext4 tests skip if the binaries aren't on PATH.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New docs/oci-import.md covers the full Phase A story:
- end-user flow (kernel pull + image pull + image list)
- what works now (layer replay + whiteouts, path-traversal
hardening, content-aware sizing, layer caching, composition
with image build)
- what does not work yet (direct boot due to ownership
caveat, private registries, non-amd64 platforms)
- architecture of internal/imagepull + the daemon orchestrator
- path layout (OCI cache, staging, published)
- tech debt: the three plausible ownership-fixup approaches
(debugfs, hcsshim/tar2ext4, user namespaces) with honest
trade-offs for Phase B to choose from later
- trust model (digest chain covers transport; signature
verification out of scope)
README.md gains an image pull example alongside image register
+ --kernel-ref, with a pointer to the docs and an honest "pulled
images are a base for image build, not yet directly bootable"
warning.
AGENTS.md gets a one-line note pointing at the new doc.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Container (and kernel) layers routinely ship symlinks with absolute
targets — /usr/bin/mawk, /lib/modules/<ver>/build, etc. Those are
interpreted relative to the rootfs at runtime (`/` inside the VM),
not against the host filesystem, so they are rooted inside dest by
construction and need no escape check at write time.
The previous logic resolved absolute Linknames literally (against
the host root), compared to the staging dir, and rejected everything
that didn't happen to live under it. That made `banger image pull
docker.io/library/debian:bookworm` fail on the very first symlink
("etc/alternatives/awk -> /usr/bin/mawk").
Relative targets still get the traversal check — a relative
Linkname with ../s can genuinely escape dest at write time even if
in-VM resolution would be safe — so the defense against malicious
relative chains is intact.
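The resulting rule can be sketched as below. symlinkAllowed is a hypothetical name and the real check in flatten.go may differ in detail; the sketch assumes Unix-style paths and an entry name already validated to sit under dest.

```go
package main

import (
	"path/filepath"
	"strings"
)

// symlinkAllowed reports whether a symlink tar entry may be written under
// dest. entryName is the member path inside the archive; linkname is the
// symlink target recorded in the tar header.
func symlinkAllowed(dest, entryName, linkname string) bool {
	if filepath.IsAbs(linkname) {
		// Absolute targets resolve against the guest's "/" at runtime,
		// so they are rooted inside dest by construction.
		return true
	}
	// Relative targets are resolved from the link's own directory.
	resolved := filepath.Join(dest, filepath.Dir(entryName), linkname)
	rel, err := filepath.Rel(dest, resolved)
	if err != nil {
		return false
	}
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}
```

With this rule etc/alternatives/awk -> /usr/bin/mawk is accepted, while a relative target such as ../../outside from etc/evil is rejected.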
Tests:
- TestFlattenAcceptsAbsoluteSymlink replaces the old overly-strict
test, using the exact etc/alternatives/awk -> /usr/bin/mawk case
that broke debian:bookworm.
- TestFlattenRejectsRelativeSymlinkEscape confirms that a relative
target with traversal is still rejected with the same "unsafe
symlink" error.
Same fix applied in internal/kernelcat/fetch.go for consistency;
future kernel bundles with absolute symlinks in the modules tree
would otherwise hit the same wall.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
newImagePullCommand mirrors newImageRegisterCommand with a positional
<oci-ref> arg, the same kernel-ref / direct-paths flag set + mutual
exclusion, plus --size that parses human-friendly values via
model.ParseSize before crossing the RPC boundary.
Calls "image.pull" RPC, prints the resulting image summary on success.
Long help warns about the Phase A bootability gap (ownership not
preserved; suitable as `image build` base, not yet directly bootable).
CLI test confirms image pull is registered with the expected flags.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
(d *Daemon).PullImage downloads an OCI image, flattens it into an
ext4 rootfs, and registers the result as a managed banger image.
Flow (internal/daemon/images_pull.go):
1. Parse + validate the OCI ref via go-containerregistry/name.
2. Derive a friendly default name from the ref ("debian-bookworm")
when --name is omitted.
3. Reject if an image with that name already exists.
4. Resolve kernel info via the new shared resolveKernelInputs
helper (refactored out of RegisterImage); ValidateKernelPaths
checks the kernel triple alone.
5. Acquire imageOpsMu, generate a fresh image id, and stage at
<ImagesDir>/<id>.staging.
6. imagepull.Pull → cache layers under OCICacheDir;
imagepull.Flatten → temp rootfs tree under os.TempDir (so the
state filesystem doesn't temporarily double in size).
7. Default size: max(treeSize × 1.25, 1 GiB); --size override
accepted.
8. imagepull.BuildExt4 produces the rootfs.ext4 in the staging dir.
9. imagemgr.StageBootArtifacts stages the kernel/initrd/modules
into the same dir (reused unchanged).
10. Atomic os.Rename(staging, finalDir) publishes the artifact dir.
11. Persist model.Image with Managed=true. Failure at any step
removes the staging dir; failure post-rename removes finalDir.
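The default sizing rule from step 7 is small enough to show directly. defaultRootfsSize is a hypothetical name, and using integer arithmetic for the 1.25 factor is an assumption of this sketch.

```go
package main

const gib = int64(1) << 30

// defaultRootfsSize returns max(treeSize * 1.25, 1 GiB) in bytes.
func defaultRootfsSize(treeSize int64) int64 {
	withHeadroom := treeSize + treeSize/4 // integer form of x1.25
	if withHeadroom < gib {
		return gib
	}
	return withHeadroom
}
```

A 100 MiB tree therefore lands on the 1 GiB floor, while a 4 GiB tree gets a 5 GiB filesystem.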
The pullAndFlatten field on Daemon is the test seam: tests stub it
to write a fixture tree into destDir and skip the real registry.
Refactor: extracted the "kernel-ref vs direct paths" resolution
out of RegisterImage into d.resolveKernelInputs so PullImage and
RegisterImage share one source of truth for that policy. Split
ValidateRegisterPaths into a kernel-only ValidateKernelPaths so
PullImage (which produces the rootfs itself) can validate just
the kernel triple without the rootfs check.
API: ImagePullParams { Ref, Name, KernelPath, InitrdPath,
ModulesDir, KernelRef, SizeBytes }. RPC dispatch case image.pull
mirrors image.register.
Tests cover: happy-path producing a managed image with all four
artifacts present + staging cleaned up, name-collision rejection,
missing-kernel rejection, and staging cleanup on a failed pull.
defaultImageNameFromRef handles tag/digest/no-suffix cases.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New internal/imagepull/ subpackage. Three concerns, each
independently testable:
Pull (imagepull.go):
- github.com/google/go-containerregistry's remote.Image with the
linux/amd64 platform pinned. Anonymous pulls only for v1.
- Layer blobs cached on disk via cache.NewFilesystemCache under
<cacheDir>/blobs/sha256/<hex> — OCI-standard layout so
skopeo/crane could co-exist later.
- Eagerly touches every layer once so network errors surface at
Pull time, not deep in Flatten.
Flatten (flatten.go):
- Replays layers oldest-first into destDir.
- Whiteout-aware: .wh.<name> deletes the named entry,
.wh..wh..opq wipes the parent directory's contents from prior
layers.
- Path-traversal hardening mirrored from kernelcat extractTar:
reject .., absolute paths, and symlinks/hardlinks whose
resolved target escapes destDir.
- Handles tar.TypeReg, TypeDir, TypeSymlink, TypeLink. Skips
device/fifo nodes silently (need privilege; udev/devtmpfs
handles them in the guest).
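The two whiteout forms can be told apart from the entry's base name alone. The classifier below is a sketch with hypothetical names (classifyWhiteout, whiteoutKind), not the code in flatten.go.

```go
package main

import (
	"path"
	"strings"
)

type whiteoutKind int

const (
	notWhiteout whiteoutKind = iota
	removeEntry              // .wh.<name>: delete <name> from lower layers
	opaqueDir                // .wh..wh..opq: wipe the parent dir's prior contents
)

// classifyWhiteout inspects a tar entry path and returns the kind of
// whiteout it represents plus the path the whiteout applies to.
func classifyWhiteout(entry string) (whiteoutKind, string) {
	dir, base := path.Split(path.Clean(entry))
	switch {
	case base == ".wh..wh..opq":
		// Must be tested first: the opaque marker also starts with ".wh.".
		return opaqueDir, path.Clean(dir)
	case strings.HasPrefix(base, ".wh."):
		return removeEntry, path.Join(dir, strings.TrimPrefix(base, ".wh."))
	default:
		return notWhiteout, path.Clean(entry)
	}
}
```

The returned target still has to pass the same path-traversal checks as any other entry before anything is deleted.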
BuildExt4 (ext4.go):
- Truncates outFile to sizeBytes, then runs `mkfs.ext4 -F -d
<srcDir> -E root_owner=0:0`. No mount, no sudo, no loopback.
- 64 MiB floor; callers handle real sizing with content-aware
headroom.
- File ownership in the resulting ext4 reflects srcDir's on-disk
ownership — runner's uid/gid since extraction was unprivileged.
Documented in package doc as a Phase A v1 limitation; Phase B
will add a debugfs- or tar2ext4-based ownership fixup.
paths.Layout gains OCICacheDir at $XDG_CACHE_HOME/banger/oci/,
ensured at startup alongside the other dirs.
Tests use go-containerregistry's in-process registry to push and
pull synthetic multi-layer images. Cover: layer caching round-trip,
whiteout + opaque-marker handling, path-traversal rejection, unsafe
symlink rejection, real mkfs.ext4 round-trip (skipped if mkfs.ext4
absent), and tiny-size rejection.
go-containerregistry v0.21.5 added as a direct dep, plus its
transitive closure (containerd/stargz, opencontainers/go-digest,
docker/cli config helpers, etc).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three small operational improvements.
1. Makefile build dependencies now cover everything under cmd/ and
internal/, not just *.go. The previous GO_SOURCES find pattern
missed embedded assets (catalog.json today, anything else added
later), so editing a JSON manifest didn't trigger a rebuild and
left the binary stale. New BUILD_INPUTS covers all files; go's own
build cache absorbs any redundant invocations. GO_SOURCES is kept
for fmt/lint targets which still want only Go files.
2. New `make lint` (default + lint-go + lint-shell):
- lint-go: gofmt -l (fail if any output) and go vet ./...
- lint-shell: shellcheck --severity=error on scripts/*.sh
The shell floor is set at error-level for now; the legacy
make-rootfs-*.sh / make-*-kernel.sh / customize.sh scripts have
warning-level findings (sudo-cat redirects, heredoc quoting) that
would block landing this if we tightened immediately. Documented
as tech debt in docs/kernel-catalog.md alongside a note about
eventually replacing the per-distro bash with a uniform Go tool.
3. gofmt drift fixed in internal/daemon/imagemgr/build.go,
session/session.go, and vm_create_ops.go (trailing newline +
gofmt's preferred function-definition wrapping). Now
`make lint` passes cleanly; future drift will fail CI/local lint
instead of accumulating.
AGENTS.md gains a one-line note on make lint.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Manual publish flow for the kernel catalog, designed for the current
no-CI, private-repo state of banger.
scripts/publish-kernel.sh <name>:
- Reads $BANGER_KERNELS_DIR/<name>/ (the canonical layout produced by
`banger kernel import`).
- Pulls distro / arch / kernel_version from the local manifest.
- Packages vmlinux + optional initrd.img + optional modules/ as
<name>-<arch>.tar.zst with zstd -19.
- Computes sha256 + size.
- rclone copyto -> r2:banger-kernels/<file>.
- HEAD-checks https://kernels.thaloco.com/<file> to catch
public-access misconfig before declaring success.
- jq-patches internal/kernelcat/catalog.json: replaces any prior
entry with the same name, then sorts entries by name.
- Prints next-step git+make commands; does not commit or rebuild
automatically.
Environment overrides RCLONE_REMOTE / RCLONE_BUCKET / BASE_URL /
BANGER_KERNELS_DIR for non-default setups.
docs/kernel-catalog.md covers the architecture (embedded JSON +
external tarballs), end-user flow, the add/update/remove playbook,
naming and tarball-layout conventions, the trust model (sha256 in
embedded catalog catches transport/swap; no signing yet), and where
the bucket lives.
README.md gains a kernel-catalog example next to the existing image
register example. AGENTS.md points at publish-kernel.sh and the docs.
.gitignore now excludes .env so that R2 credentials dropped there by
accident don't end up in commits.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces the headline feature of the kernel catalog: pulling a kernel
bundle over HTTP without any local build step.
Catalog format (internal/kernelcat/catalog.go):
- Catalog { Version, Entries } + CatEntry { Name, Distro, Arch,
KernelVersion, TarballURL, TarballSHA256, SizeBytes, Description }.
- catalog.json is embedded via go:embed and ships with each banger
binary. It starts empty (Phase 5's CI pipeline will populate it).
- Lookup(name) returns the matching entry or os.ErrNotExist.
Fetch (internal/kernelcat/fetch.go):
- HTTP GET with streaming SHA256 over the response body.
- zstd-decode (github.com/klauspost/compress/zstd) -> tar extract into
<kernelsDir>/<name>/.
- Hardens against path-traversal tarball entries (members whose
normalised path escapes the target dir, and unsafe symlink
targets) and sha256-mismatch downloads; any failure removes the
partially-populated target dir.
- Regular files, directories, and safe symlinks are supported; other
tar types (hardlinks, devices, fifos) are silently skipped.
- After extraction, recomputes sha256 over the on-disk vmlinux and
writes the manifest with Source="pull:<url>".
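Streaming verification can be sketched with io.TeeReader, which hashes the bytes as they pass through to the consumer. copyVerified and verifyString are hypothetical names; in the real fetch path the destination would be the zstd decoder feeding the tar extractor rather than a plain writer.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"io"
	"strings"
)

// copyVerified streams src into dst while hashing every byte, then
// compares the digest with wantHex. The mismatch is only known once the
// stream ends, so the caller must discard what dst received on error.
func copyVerified(dst io.Writer, src io.Reader, wantHex string) error {
	h := sha256.New()
	if _, err := io.Copy(dst, io.TeeReader(src, h)); err != nil {
		return err
	}
	if got := hex.EncodeToString(h.Sum(nil)); got != wantHex {
		return errors.New("sha256 mismatch: got " + got)
	}
	return nil
}

// verifyString exercises copyVerified without a server.
func verifyString(body, wantHex string) error {
	return copyVerified(io.Discard, strings.NewReader(body), wantHex)
}
```

Because the digest is checked after extraction has already written files, removing the partially populated target dir on any failure is what keeps a bad download from leaving state behind.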
Daemon methods (internal/daemon/kernels.go):
- KernelPull(ctx, {Name, Force}) - lookup in embedded catalog, refuse
overwrite unless Force, delegate to kernelcat.Fetch.
- KernelCatalog(ctx) - return the embedded catalog annotated per-entry
with whether it has been pulled locally.
RPC: kernel.pull, kernel.catalog dispatch cases.
CLI:
- `banger kernel pull <name> [--force]`.
- `banger kernel list --available` prints the catalog with a
pulled/available STATE column and a human-readable size.
Tests: fetch round-trip (extract + manifest + sha256), sha256 mismatch
rejection with cleanup, missing-vmlinux rejection, path-traversal
rejection, HTTP error propagation, catalog parsing, lookup,
pulled-status reconciliation. All 20 packages green.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`banger kernel import <name> --from <dir>` copies a staged kernel
bundle into the local catalog. <dir> is the output of
`make void-kernel` or `make alpine-kernel` (build/manual/void-kernel/
or build/manual/alpine-kernel/).
kernelcat.DiscoverPaths locates artifacts under <dir>:
1. Prefers metadata.json (written by make-void-kernel.sh).
2. Falls back to globbing: boot/vmlinux-* or vmlinuz-* (Alpine
fallback), boot/initramfs-*, lib/modules/<latest>.
The daemon's KernelImport copies kernel + optional initrd via
system.CopyFilePreferClone and modules via system.CopyDirContents
(no-sudo mode — catalog lives under ~/.local/state), computes SHA256
over the kernel, and writes the manifest via kernelcat.WriteLocal.
While wiring this up, fixed a latent bug in system.CopyDirContents:
filepath.Join(sourceDir, ".") cleans its result and silently drops the
trailing dot, so `cp -a` received the bare source directory and copied
the whole directory (including its basename) into the target instead
of just its contents. Replaced the join with a manual "/." suffix.
imagemgr.StageBootArtifacts (the only existing caller) silently benefits.
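The pitfall is easy to demonstrate in isolation. joinedSource and contentsSource are hypothetical helpers, not the code in system.CopyDirContents, and the sketch assumes Unix path separators.

```go
package main

import "path/filepath"

// joinedSource shows the buggy form: Join cleans its result, so the
// trailing "." element disappears.
func joinedSource(sourceDir string) string {
	return filepath.Join(sourceDir, ".")
}

// contentsSource appends "/." by hand so a later `cp -a <src>/. <dst>/`
// copies the directory's contents rather than the directory itself.
func contentsSource(sourceDir string) string {
	return filepath.Clean(sourceDir) + string(filepath.Separator) + "."
}
```

For /tmp/src the first helper yields /tmp/src and the second yields /tmp/src/. which is the form cp needs.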
scripts/register-void-image.sh and scripts/register-alpine-image.sh
are rewritten to use `banger kernel import … && banger image register
--kernel-ref …` instead of the find-and-pass-paths dance. Preserves
the same user-facing commands and env vars.
Tests cover: metadata.json preference, glob fallback, Alpine vmlinuz
fallback, kernel-missing error, round-trip copy into the catalog, and
the --from required flag.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`banger image register --kernel-ref <name>` now substitutes for the
--kernel/--initrd/--modules triple. The daemon looks the name up via
kernelcat.ReadLocal under d.layout.KernelsDir, populates the three
paths from the resolved entry, then continues through the existing
validate/persist flow unchanged.
Passing both --kernel-ref and any of --kernel/--initrd/--modules is
rejected — at the CLI layer (before starting the daemon) and
defensively at the RPC layer. A missing catalog entry produces a clear
"run 'banger kernel list'" message.
Once registered, the image stores the resolved absolute paths, so
deleting the catalog entry later does not invalidate already-registered
images — managed image build still copies the kernel into its artifact
dir per imagemgr.StageBootArtifacts.
Tests cover: resolution success (absolute KernelPath populated from
catalog), mutual-exclusion rejection, and missing-entry error.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces a read/write kernel catalog on disk without any network
dependency, so later phases (image register --kernel-ref, import, pull)
can build on a working foundation.
Layout: adds KernelsDir to paths.Layout, ensured under
~/.local/state/banger/kernels/. Each cataloged kernel lives at
<KernelsDir>/<name>/ with a manifest.json alongside vmlinux and optional
initrd.img / modules/.
New internal/kernelcat package owns the disk format:
- Entry (Name, Distro, Arch, KernelVersion, SHA256, Source, ImportedAt)
- ValidateName (alphanumeric + dots/hyphens/underscores, no traversal)
- ReadLocal / ListLocal / WriteLocal / DeleteLocal
- SumFile helper
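A name check along these lines could look as follows. This is a sketch, not kernelcat.ValidateName itself: requiring an alphanumeric first character is an assumption, chosen here because it also rules out "." and "..".

```go
package main

import (
	"errors"
	"regexp"
)

// namePattern allows letters, digits, dots, hyphens, and underscores.
// The leading-alphanumeric requirement is an assumption of this sketch.
var namePattern = regexp.MustCompile(`^[A-Za-z0-9][A-Za-z0-9._-]*$`)

// validateName rejects empty names, path separators, and traversal
// components before the name is joined onto the kernels directory.
func validateName(name string) error {
	if !namePattern.MatchString(name) {
		return errors.New("invalid kernel name: " + name)
	}
	return nil
}
```

Because no separator is ever accepted, a validated name can only refer to a direct child of the kernels directory.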
The daemon exposes three RPC methods dispatched in daemon.go:
kernel.list, kernel.show, kernel.delete. Implementations live in a new
internal/daemon/kernels.go and are thin wrappers over kernelcat using
d.layout.KernelsDir.
CLI: new top-level `banger kernel` with list / show / rm subcommands
mirroring the image-command pattern (ensureDaemon, RPC call, table or
JSON output). No sudo required — kernel ops are user-space only.
Users can now manually populate ~/.local/state/banger/kernels/<name>/
and see it via `banger kernel list`. Phase 2 wires --kernel-ref into
image register; Phase 3 adds `banger kernel import`; Phase 4 adds
remote pulls.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
internal/daemon/doc.go and ARCHITECTURE.md were written before the
subpackage extractions and still referenced old structure (in-progress
phrasing, missing opstate/dmsnap/fcproc/imagemgr/session/workspace,
mentions of opRegistry by its old name). Both now describe the current
shape: composition root + six leaf subpackages, lock ordering rooted
at vmLocks[id], and the one intra-package dependency (workspace →
session for ShellQuote + FormatStepError).
README.md and AGENTS.md mark the local web UI as experimental. It is
still enabled by default at 127.0.0.1:7777, but the docs now state
plainly that its surface is not stable or hardened and not intended for
anything beyond single-user localhost use. AGENTS.md also points at
ARCHITECTURE.md for the subpackage layout.
No code changes; tests still green.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Moves the stateless parts of the workspace subsystem into
internal/daemon/workspace:
- RepoSpec struct + InspectRepo for host-side git inspection
- ImportRepoToGuest (taking a minimal GuestClient interface) with the
full-copy and metadata-only / shallow-overlay paths
- FinalizeScript, PrepareRepoCopy, ResolveSourcePath
- ListSubmodules, ListOverlayPaths, ParsePrepareMode
- Git helpers (GitOutput, GitTrimmedOutput, GitResolvedConfigValue,
ParseNullSeparatedOutput, RunHostCommand, GitFileURL) and the
HostCommandOutputFunc test seam
- ShallowFetchDepth const
The subpackage imports internal/daemon/session for ShellQuote and
FormatStepError, so the pure helpers for both workspace and session live
in their own subpackages with a single one-way dependency (workspace
imports session).
daemon/workspace.go shrinks from 481 → 156 LOC, keeping just the three
orchestrator methods (Export, Prepare, prepareLocked) that still touch
d.store, d.FindVM, d.dialGuest, d.waitForGuestSSH, and the VM lock set.
guestSessionHostCommandOutputFunc is removed from guest_sessions.go (its
only caller was workspace.go; the new package has its own copy).
All tests green.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Moves the stateless parts of the guest-session subsystem into
internal/daemon/session:
- consts (BackendSSH, attach/transport kinds, StateRoot, LogTailLineDefault)
- StateSnapshot plus ParseState / InspectStateFromDir / ApplyStateSnapshot / StateChanged
- 10 on-guest path helpers (StateDir, StdoutLogPath, StdinPipePath, …)
- 3 bash script generators (Script, InspectScript, SignalScript)
- small utilities (ShellQuote, ExitCode, CloneStringMap, TailFileContent,
ProcessAlive + syscallKill test seam, FormatStepError)
- launch helpers (DefaultName, DefaultCWD, FailLaunch,
NormalizeRequiredCommands, CWDPreflightScript, CommandPreflightScript,
AttachInputCommand, AttachTailCommand, EnvLines)
Callers inside the daemon package import the new package under the
alias "sess" to avoid colliding with the local `session model.GuestSession`
variables threaded through the orchestrator code. guest_sessions.go
shrinks from 616 → 156 LOC; session_stream.go, session_attach.go,
session_lifecycle.go, workspace.go, and guest_sessions_test.go rewire to
the exported names.
The orchestrator methods (StartGuestSession, BeginGuestSessionAttach,
SendToGuestSession, GuestSessionLogs, refresh/inspect, sessionRegistry,
guestSessionController) stay on *Daemon. Full Manager-style extraction
would need prerequisite phases (operation protocol, workdisk helpers),
mirroring Phase 4a's trade-off.
All tests green.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Moves the stateless helpers of the image subsystem into
internal/daemon/imagemgr:
paths.go — path validators (ValidateRegisterPaths,
ValidatePromotePaths), artifact staging (StageBootArtifacts,
StageOptionalArtifactPath), metadata (BuildMetadataPackages,
WritePackagesMetadata).
build.go — ResizeRootfs, WriteBuildLog, and the full guest
provisioning script generator (BuildProvisionScript, BuildModulesCommand
and all private script-append helpers) along with the mise/tmux/opencode
version constants.
The orchestrator methods (BuildImage, RegisterImage, PromoteImage,
DeleteImage, runImageBuildNative) stay on *Daemon: they still touch
d.store, d.imageOpsMu, d.beginOperation, capability hooks, and
fcproc-wrapped Daemon helpers — extracting them needs prerequisite
phases (operation protocol, workdisk helpers, tap pool). This commit is
strictly the pure-helper extraction that can land cleanly today.
imagebuild.go shrinks from 453 -> 225 LOC (half gone). images.go shrinks
from 450 -> 374 LOC. imagebuild_test.go updated to call the exported
imagemgr.BuildProvisionScript. Zero behavior change; all tests green.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Moves the host-side firecracker primitives — bridge setup, socket dir,
binary resolution, tap creation, socket chown, PID lookup, resolve,
ctrl-alt-del, wait-for-exit, SIGKILL — plus the shared
ErrWaitForExitTimeout sentinel and a small waitForPath helper into
internal/daemon/fcproc.
Manager is stateless beyond its runner + config + logger. The daemon
package keeps thin forwarders (d.ensureBridge, d.createTap, etc.) so no
call site or test changes. A d.fc() helper builds a Manager on demand
from Daemon state, which lets tests keep constructing &Daemon{...}
literals without wiring fcproc explicitly.
This unblocks Phase 4 (imagemgr extraction): imagebuild.go's dependence
on d.createTap/d.firecrackerBinary/etc. can now be satisfied by
importing fcproc instead of reaching back to *Daemon.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two leaves of the daemon package that carry no back-references to Daemon
move out:
- internal/daemon/opstate: generic Registry[T AsyncOp]. The AsyncOp
interface methods are capitalised (ID, IsDone, UpdatedAt, Cancel);
vmCreateOperationState and imageBuildOperationState implement it.
- internal/daemon/dmsnap: Create, Cleanup, Remove plus the Handles type
for device-mapper snapshot lifecycle. Takes an explicit Runner
interface. The daemon-package snapshot.go keeps thin forwarders and a
type alias so existing call sites and tests are untouched.
Skipped on purpose: tap_pool has too many Daemon-scoped dependencies
(config, store, closing, createTap) for a clean extraction at this
stage; nat.go is already a thin facade over internal/hostnat;
dns_routing.go tests tightly couple to package internals, so extraction
would be more churn than payoff. Each can be revisited when a
subsystem-level refactor forces the boundary.
All tests green.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Daemon no longer owns a coarse mu shared across unrelated concerns.
Each subsystem now carries its own state and lock:
- tapPool: entries, next, and mu move onto a new tapPool struct.
- sessionRegistry: sessionControllers + its mutex move off Daemon.
- opRegistry[T asyncOp]: generic registry collapses the two ad-hoc
vm-create and image-build operation maps (and their mutexes) into one
shared type; the Begin/Status/Cancel/Prune methods simplify.
- vmLockSet: the sync.Map of per-VM mutexes moves into its own type;
lockVMID forwards.
- Daemon.mu splits into imageOpsMu (image-registry mutations) and
createVMMu (CreateVM serialisation) so image ops and VM creates no
longer block each other.
Lock ordering collapses to vmLocks[id] -> {createVMMu, imageOpsMu} ->
subsystem-local leaves. doc.go and ARCHITECTURE.md updated.
No behavior change; tests green.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
vm.go (1529 LOC) splits into vm_create, vm_lifecycle, vm_set, vm_stats,
vm_disk, vm_authsync; firecracker/DNS/helpers stay in vm.go.
guest_sessions.go (1266 LOC) splits into session_controller,
session_lifecycle, session_attach, session_stream; scripts and helpers
stay in guest_sessions.go.
Mechanical move only. No behavior change. Adds doc.go and
ARCHITECTURE.md capturing subsystem map and current lock ordering as
the baseline for the upcoming subsystem extraction.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Use context.Background() for resolveFirecrackerPID so that a cancelled
request context (client disconnect) cannot prevent the daemon from
tracking the spawned Firecracker process, which would otherwise be left
orphaned at cleanup.
Drop ne.Temporary() check in accept loop; deprecated since Go 1.18
and unreliable. Retry on any net.Error instead.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
vm.go: Firecracker was launched with context.Background() instead of
the incoming request ctx. A cancelled or timed-out VM creation request
could not stop mid-flight Firecracker process spawning, leaving an
orphaned process and leaked resources. Replace the four firecrackerCtx
uses with ctx directly; the local variable is removed.
guest_sessions.go / daemon.go: sessionControllers map was lazily
initialized with a nil-check inside every mutating method. With d.mu
held this isn't a data race, but the pattern is fragile — any new
method that writes to the map without copying the guard can panic.
Initialize the map once in Open() alongside the other daemon maps and
channels, and remove the redundant nil-checks from setGuestSessionController
and claimGuestSessionController.
The old pattern held vmLocksMu to get/create a *sync.Mutex, then
released vmLocksMu before calling lock.Lock(). In the gap between
the two operations a concurrent goroutine could observe the entry,
and any future cleanup path that deleted map entries could let a
third goroutine create a fresh *sync.Mutex for the same ID — leaving
two callers holding independent locks with no mutual exclusion.
Fix: replace the manual map + vmLocksMu pair with sync.Map and
LoadOrStore. LoadOrStore is atomic at the map level: exactly one
*sync.Mutex wins for each VM ID, with no release-then-reacquire
gap between the lookup and the insert. vmLocksMu is removed.
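The LoadOrStore approach can be sketched as follows. The lockFor helper name
and the package-level vmLocks variable are assumptions for illustration; the
real daemon keeps this state on its own struct:

```go
package main

import (
	"fmt"
	"sync"
)

// vmLocks maps a VM ID (string) to its *sync.Mutex.
var vmLocks sync.Map

// lockFor returns the one mutex associated with id, creating it on first use.
// LoadOrStore either returns the value already stored for the key or stores
// the candidate and returns it, as a single atomic map operation.
func lockFor(id string) *sync.Mutex {
	actual, _ := vmLocks.LoadOrStore(id, &sync.Mutex{})
	return actual.(*sync.Mutex)
}

func main() {
	var wg sync.WaitGroup
	counter := 0
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu := lockFor("vm-1")
			mu.Lock()
			counter++ // protected: every goroutine gets the same mutex
			mu.Unlock()
		}()
	}
	wg.Wait()
	fmt.Println(counter) // prints 100
}
```

The losing candidate mutex allocated by a racing caller is simply discarded,
so the cost of the race is one short-lived allocation.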
- MIT LICENSE (2026 Thales Maciel)
- .gitignore: replace broad /build/ with explicit /build/bin/ and
build/manual/ so large manual rootfs/kernel artifacts are clearly
excluded; add *.pem, *.key, id_rsa
- README: add Security section documenting intentional
PermitRootLogin yes / StrictModes no in guest sshd and the
network boundary that makes it acceptable
Without base_commit, export diffs against the current guest HEAD.
If the worker ran git commit inside the VM, HEAD advanced and the
diff came back empty — committed work was silently lost.
With base_commit set to the head_commit from workspace.prepare,
the diff uses that fixed point instead. After git add -A the index
holds the full working state, so git diff --cached <base_commit>
captures everything: committed deltas (HEAD moved past base) and
any uncommitted changes on top, in one patch, applied with the
same git apply flow.
- WorkspaceExportParams gains base_commit
- WorkspaceExportResult echoes back the ref actually used
- CLI gains --base-commit flag
- Tests assert scripts use the caller-supplied ref and that
omitting it falls back to HEAD
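A rough sketch of how the export script could select its diff base. The
exportDiffScript and shellQuote helpers are hypothetical names; only the git
commands and the HEAD fallback come from the description above:

```go
package main

import (
	"fmt"
	"strings"
)

// shellQuote wraps s in single quotes for POSIX sh, escaping embedded quotes.
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// exportDiffScript builds the guest-side export command and returns the ref
// actually used, so the caller can echo it back in the result.
func exportDiffScript(baseCommit string) (script, ref string) {
	ref = strings.TrimSpace(baseCommit)
	if ref == "" {
		ref = "HEAD" // no base_commit supplied: previous behavior
	}
	script = "git add -A && git diff --cached " + shellQuote(ref) + " --binary"
	return script, ref
}

func main() {
	for _, base := range []string{"", "abc123"} {
		script, ref := exportDiffScript(base)
		fmt.Printf("ref=%s script=%s\n", ref, script)
	}
}
```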
guest.session.send: write to a pipe-mode session's stdin without
holding the exclusive attach. The daemon dials a fresh SSH connection,
uploads the payload to a temp file, and cats it into the session's
named FIFO. Linux guarantees atomicity for pipe writes ≤ PIPE_BUF, which
covers all pi RPC JSONL lines. Attach exclusivity is unchanged.
vm.workspace.export — pull changes from guest back to host. Runs
`git add -A && git diff --cached HEAD --binary` inside the guest via a
new RunScriptOutput helper on guest.Client (stdout-only capture,
distinct from RunScript which merges stderr). Returns a binary-safe
patch and a list of changed files. CLI writes the patch to stdout for
`| git apply` or to a file via --output.
RunScriptOutput is implemented as a direct SSH session (same pattern as
runSession) rather than going through StartCommand/StreamSession, so that
it does not close the underlying Client. The connection must stay open
because ExportVMWorkspace calls RunScriptOutput twice on it.
New files: internal/daemon/workspace_test.go
Provisioning was still installing `claude` and `pi` through a separate
npm-global prefix even after the guest images had switched to `mise` for
Node and opencode. That left two competing install paths and made the
runtime layout harder to reason about.
Switch the Debian and Void image setup flows to install `claude` and `pi`
as `mise` npm tools, assert their shims exist after `mise reshim`, and
symlink `node`, `npm`, `opencode`, `claude`, and `pi` directly from the
mise shim directory into `/usr/local/bin`.
Update the imagebuild test expectations and bump the Void rootfs default
size to 4G so the larger default toolset still fits reliably.
Guest session cwd and command preflight helpers were emitting literal
`\\n` separators, so the guest shell saw malformed one-line scripts and
could fail `preflight_cwd` even when `/root/repo` already existed.
Replace those builders with real newlines, and fix the nearby attach
helper commands that were making the same mistake.
Add a small daemon guest-SSH seam so workspace preparation and session
start can share a fake backend in tests, then cover the regression with
an end-to-end daemon test for `PrepareVMWorkspace` followed by
`StartGuestSession` on `/root/repo`.
Validation: `GOCACHE=/tmp/banger-gocache go test ./internal/daemon` and
`GOCACHE=/tmp/banger-gocache go test ./...`.
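To illustrate the class of bug fixed here, a small sketch contrasting an
escaped `\\n` separator with a real newline. The function names and the script
contents are assumptions, not the actual preflight helpers:

```go
package main

import (
	"fmt"
	"strings"
)

// shellQuote wraps s in single quotes for POSIX sh, escaping embedded quotes.
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// brokenScript reproduces the mistake: "\\n" is a backslash followed by 'n',
// so the shell receives one malformed line.
func brokenScript(cwd string) string {
	return "set -e\\ncd " + shellQuote(cwd) + "\\n"
}

// preflightScript joins the statements with real newline characters.
func preflightScript(cwd string) string {
	lines := []string{
		"set -e",
		"cd " + shellQuote(cwd),
	}
	return strings.Join(lines, "\n") + "\n"
}

func main() {
	// %q makes the difference between the two separators visible.
	fmt.Printf("%q\n", brokenScript("/root/repo"))
	fmt.Printf("%q\n", preflightScript("/root/repo"))
}
```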
Add daemon-backed workspace and guest-session primitives so host
orchestrators can prepare /root/repo, launch long-lived guest commands,
and attach to pipe-mode sessions over the local stdio mux bridge.
Persist richer session metadata and launch diagnostics, preflight guest
cwd/command requirements, make pipe-mode attach rehydratable from guest
state after daemon restart, and allow submodules when workspace prepare
runs in full_copy mode.
At the same time, stop vm run from auto-attaching opencode, make it
print next-step commands instead, and make glibc guest images more
agent-ready by installing node, opencode, claude, and pi while syncing
opencode/claude/pi auth files into work disks on VM start.
Validation:
- GOCACHE=/tmp/banger-gocache go test ./...
- make build
- banger vm workspace prepare --help
- banger vm session --help
- banger vm session start --help
- banger vm session attach --help
Replace the old `void-exp` repository defaults with `void` so the Make targets,
registration helper, example config, verification messaging, and sample test
fixtures all line up with the new managed image name.
Keep the scope to repo-facing naming only: config overrides, helper output, and
test fixtures now expect `void`, while runtime compatibility for existing local
`void-exp` VMs remains an operational concern outside this commit.
Validation: go test ./..., make build, and a local `banger vm create --image void`
smoke boot with ssh and opencode ports up.
Replace the stale `RUNTIME_DIR` mkdir in the experimental Void kernel helper with
creation of the parent directory for `OUT_DIR`, which is the current
BANGER_MANUAL_DIR/custom --out-dir flow used by the Make target.
This restores `make void-kernel` without requiring an extra environment override.
Validation: make void-kernel ARGS='--out-dir /tmp/banger-void-kernel-verify-$$'.
Normalize repo-backed guest checkouts to /root/repo so vm run, attach, and
follow-on guest tooling stop depending on the source repository name.
Add `banger vm acp [--cwd] <vm>` as an SSH stdio bridge to guest `opencode acp`,
defaulting to /root/repo when that checkout exists and falling back to /root.
Update the README and CLI coverage around the fixed guest path and ACP command.
Validation: go test ./internal/cli, go test ./..., make build.
Make `banger ps` a true alias of `banger vm list` and add `banger vm ls`
and `banger vm ps` so the common listing entrypoints all share one path.
Default the shared list command to running VMs only, add `--all` to include
stopped entries, `--latest` to keep only the newest matching VM, and `--quiet`
to print full VM IDs without the table renderer.
Cover the alias wiring plus the running/latest/quiet helpers in CLI tests.
Validation: go test ./internal/cli; GOCACHE=/tmp/banger-gocache go test ./...;
make build; ./build/bin/banger ps --help; ./build/bin/banger vm ls --help.
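The shared selection logic could look roughly like the sketch below. The VM
and listOpts types, the CreatedUnix field, and selectVMs are hypothetical;
only the flag semantics are taken from the description above:

```go
package main

import "fmt"

// VM carries only the fields this sketch needs.
type VM struct {
	ID          string
	Running     bool
	CreatedUnix int64 // creation time as Unix seconds (assumed representation)
}

// listOpts mirrors the --all, --latest, and --quiet flags.
type listOpts struct {
	All, Latest, Quiet bool
}

// selectVMs keeps running VMs by default, every VM with All, and only the
// newest of the matching VMs with Latest.
func selectVMs(vms []VM, o listOpts) []VM {
	var out []VM
	for _, vm := range vms {
		if o.All || vm.Running {
			out = append(out, vm)
		}
	}
	if o.Latest && len(out) > 1 {
		newest := out[0]
		for _, vm := range out[1:] {
			if vm.CreatedUnix > newest.CreatedUnix {
				newest = vm
			}
		}
		out = []VM{newest}
	}
	return out
}

func main() {
	vms := []VM{
		{ID: "aaa111", Running: true, CreatedUnix: 100},
		{ID: "bbb222", Running: false, CreatedUnix: 300},
		{ID: "ccc333", Running: true, CreatedUnix: 200},
	}
	// Quiet-style output: full IDs, one per line, no table.
	for _, vm := range selectVMs(vms, listOpts{Latest: true, Quiet: true}) {
		fmt.Println(vm.ID)
	}
}
```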
Capture the repository preference that shell-facing tools should consume
files when they support them instead of large inline strings.
Add explicit guidance for prompt files, temporary files, and git commit
message files so future automation avoids quoting bugs and stays aligned
with the vm run harness and commit workflows.
Bring the vm run documentation back in line with the current behavior.
Explain that vm run now starts a best-effort guest tooling harness,
prefers a host-side opencode attach session when the local client
supports it, and falls back to guest opencode over SSH otherwise.
Also note that the harness runs asynchronously and logs inside the guest.
Speed up first use of repo-backed VMs by bootstrapping obvious tools before
the best-effort LLM harness runs.
Add a host-side tooling plan for pinned Go, Node, Python, and Rust versions,
summarize that plan in the uploaded prompt, and run repo mise install plus
guest global mise use -g --pin steps before the bounded opencode inspection.
Keep the harness non-fatal, prefer host opencode attach when the client
supports it, fall back to guest opencode over SSH for older clients, and
cover the new flow with CLI plus planner tests.
Validation:
- go test ./internal/cli ./internal/toolingplan
- GOCACHE=/tmp/banger-gocache go test ./...
- make build
Replace the post-boot full-history git bundle path with a shallow repo copy so vm run no longer spends its quiet time shipping and cloning every object in the source repository.
Stage a depth-10 no-checkout clone from the host repo, fetch the requested checkout commit only when it is outside the shallow window, rewrite origin back to the host repo's origin URL, and keep the existing guest checkout plus working-tree overlay behavior.
Add explicit [vm run] progress lines after [vm create] ready so the user can see the SSH wait, shallow repo prep, guest copy, overlay, and opencode attach phases instead of a silent pause.
Validated with GOCACHE=/tmp/banger-gocache go test ./..., make build, and a local payload comparison showing the banger repo dropping from a ~400 MB full bundle to a ~294 KB shallow metadata copy.
Populate guest /root/.gitconfig from host git config --global during work-disk preparation so plain VM shells can commit.
Resolve user.name and user.email from the source repo for vm run and write them only into the imported checkout, preserving repo-specific identity overrides.
Update mounted guest .gitconfig through a host temp file plus sudo install instead of direct git config --file writes, since the mounted root-owned work disk blocks Git lockfile creation.
Validated with GOCACHE=/tmp/banger-gocache go test ./..., make build, and a live alpine vm create smoke check for guest git config.
Treat `banger`, `bangerd`, and `banger-vsock-agent` as one release by
stamping the same version, commit SHA, and build timestamp into every
binary through a shared ldflag-backed `internal/buildinfo` package.
Add `banger version`, extend daemon ping/status to report the running
daemon's build tuple, and keep the guest helper linked to the same build
metadata without adding a new public version surface for it.
Validate with `GOCACHE=/tmp/banger-gocache go test ./...`, `make build`,
`./build/bin/banger version`, and `./build/bin/banger daemon status`
after the daemon restarts onto the new binary.
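A minimal sketch of ldflag-backed build metadata. It uses package main so it
runs standalone; the variable names, defaults, and Summary format are
assumptions and may not match `internal/buildinfo`:

```go
package main

import "fmt"

// These defaults apply to plain `go build`. A release build can override
// them at link time, for example:
//
//	go build -ldflags "-X main.Version=v0.1.0 -X main.Commit=abc1234 -X main.BuiltAt=2026-01-01T00:00:00Z"
//
// -X only works on package-level string variables.
var (
	Version = "dev"
	Commit  = "unknown"
	BuiltAt = "unknown"
)

// Summary renders the build tuple in one line.
func Summary() string {
	return fmt.Sprintf("%s (%s, built %s)", Version, Commit, BuiltAt)
}

func main() {
	fmt.Println(Summary())
}
```

Passing the same -X values to every binary's link step is what keeps the
three binaries on one build tuple.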
Make daemon startup sync a managed `Host *.vm` block into `~/.ssh/config` so plain `ssh root@<vm>.vm` uses banger's managed key and the same publickey-only options as `banger vm ssh`.
Write the block directly instead of relying on a separate include file so it still applies when a user's SSH config ends inside another `Host` stanza, and remove the legacy managed include path. Add daemon tests that cover fresh config creation and managed-block replacement while preserving user entries.
Validate with `go test ./...`, `make build`, `ssh -G alp.vm`, and `ssh alp.vm true`.
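One way to write and replace such a managed block idempotently is sketched
below. The marker strings and the upsertManagedBlock helper are hypothetical,
as are the SSH options in the example body:

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical markers delimiting the daemon-owned region of the file.
const (
	beginMarker = "# BEGIN banger managed block (hypothetical marker)"
	endMarker   = "# END banger managed block (hypothetical marker)"
)

// upsertManagedBlock replaces an existing managed block in config, or appends
// one, leaving all user-written content untouched.
func upsertManagedBlock(config, body string) string {
	block := beginMarker + "\n" + strings.TrimRight(body, "\n") + "\n" + endMarker + "\n"
	if start := strings.Index(config, beginMarker); start >= 0 {
		if rel := strings.Index(config[start:], endMarker); rel >= 0 {
			end := start + rel + len(endMarker)
			if end < len(config) && config[end] == '\n' {
				end++ // swallow the newline that terminated the old block
			}
			return config[:start] + block + config[end:]
		}
	}
	if config != "" && !strings.HasSuffix(config, "\n") {
		config += "\n"
	}
	// The body starts with its own Host line, so it opens a new stanza even
	// when the user's file ends inside another Host stanza.
	return config + block
}

func main() {
	user := "Host example\n  User me\n"
	body := "Host *.vm\n  User root\n  IdentitiesOnly yes\n"
	once := upsertManagedBlock(user, body)
	twice := upsertManagedBlock(once, body)
	fmt.Print(twice)
	fmt.Println(once == twice)
}
```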
Banger was already serving VM records on 127.0.0.1:42069, but hosts using systemd-resolved were not routing .vm queries there. That made direct lookups against the local server work while normal host resolution and commands like opencode attach <vm>.vm:4096 failed.
Sync resolvectl dns/domain/default-route settings onto the banger bridge when the daemon opens and whenever VM DNS records are published, and revert that bridge-scoped configuration on daemon shutdown. This uses sudo resolvectl because unprivileged resolved reconfiguration on this host requires interactive authentication.
Validation: GOCACHE=/tmp/banger-gocache go test ./..., make build, daemon restart, resolvectl dns/domain br-fc, resolvectl query vrum.vm, and curl http://vrum.vm:4096.
Extract the host worktree overlay with tar -o so the guest repo stays owned by root instead of inheriting host UID/GID metadata. That avoids Git's dubious ownership check on /root/<repo> after vm run.
Also register the guest checkout as a safe.directory during repo setup so opencode and manual git commands can read branch state reliably after attach.
Validation: GOCACHE=/tmp/banger-gocache go test ./... and make build.
Create a CLI-only banger vm run [path] flow that resolves the enclosing git repository, creates a VM, imports a guest checkout, and launches opencode attach automatically from the host.
Build the guest checkout by bundling git history plus the resolved base and head commits, cloning that bundle in the guest, and overlaying tracked plus untracked non-ignored files over SSH so local working-tree changes carry over. Support guest-only branch creation with --branch and --from, reject bare repos and submodules, and add selective tar helpers plus CLI seams to keep the workflow testable.
Validate with go test ./..., make build, banger vm run --help, and the expected --from requires --branch error path.
Refresh guest opencode auth from the host at VM start so guest opencode can reuse the local login without baking secrets into managed images.
Reuse the existing work-disk preparation path to copy ~/.local/share/opencode/auth.json into /root/.local/share/opencode/auth.json with mode 0600, and warn and skip when the host file is missing or unreadable so any existing guest auth stays in place.
Add daemon coverage for copy, replacement, and warn-and-skip cases, document the restart behavior in the README, and validate with go test ./... plus make build. Existing VMs pick the new auth up on their next restart.
Make `banger vm list` easier to scan by resolving each VM image ID back to the registered image name when that mapping is available, while still falling back to a short ID for unknown images.
Raise the shared default VM memory from 1024 MiB to 2048 MiB so new VMs, CLI help, and daemon-side defaults all align on a 2 GiB baseline.
Add CLI coverage for the image-name rendering path and validate the change with go test ./..., make build, `banger vm list`, and `banger vm create --help`.
Replace the noisy rootfs path column in `banger image list` with the current rootfs file size so the table is easier to scan.
Render a ROOTFS SIZE column from the on-disk image size, fall back to `-` when the artifact cannot be statted, and keep the existing image summary output unchanged.
Add CLI coverage for both the formatted size case and the missing-file fallback, then rebuild and check the live command output.
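The size cell with its `-` fallback could be computed along these lines. The
helper names and the binary-unit format are assumptions; the actual CLI
formatting may differ:

```go
package main

import (
	"fmt"
	"os"
)

// formatSize renders a byte count using binary units with one decimal place.
func formatSize(n int64) string {
	const unit = 1024
	if n < unit {
		return fmt.Sprintf("%d B", n)
	}
	div, exp := int64(unit), 0
	for v := n / unit; v >= unit; v /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.1f %ciB", float64(n)/float64(div), "KMGTPE"[exp])
}

// rootfsSizeCell returns the formatted on-disk size of the rootfs artifact,
// or "-" when the file cannot be statted.
func rootfsSizeCell(path string) string {
	info, err := os.Stat(path)
	if err != nil {
		return "-"
	}
	return formatSize(info.Size())
}

func main() {
	fmt.Println(formatSize(1 << 30))
	fmt.Println(rootfsSizeCell("/nonexistent/rootfs.ext4"))
}
```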