banger/internal/daemon/workspace_service.go
Thales Maciel e47b8146dc
daemon: thread per-RPC op_id end-to-end
Today there's no way to correlate a CLI failure with a daemon log
line: operationLog records relative timing but no id; two concurrent
vm.start calls log indistinguishably; and the async
vmCreateOperationState.ID is user-facing yet never reaches the
journal. The root helper logs plain text to stderr while bangerd
logs JSON, so a merged journalctl view is hard to grep across the
trust-boundary split.

Mint a per-RPC op id at dispatch entry, store it on context, and
include it as an "op_id" attr on every operationLog record. The
id is stamped onto every error response (including the early
short-circuit paths bad_version and unknown_method). rpc.Call
forwards the context op id on requests so a daemon RPC and the
helper RPCs it triggers all share one id. The helper now logs
JSON to match bangerd, adopts the inbound id, and emits a single
"helper rpc completed" / "helper rpc failed" line per call so
operators can see at a glance how long each privileged op took.
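
As a sketch of the plumbing this describes (opIDKey, NewOpID,
WithOpID, and OpIDFrom are illustrative names, not the actual
identifiers in this change):

    // Sketch only: a context key plus mint/fetch helpers.
    package rpc // package placement is an assumption

    import (
        "context"
        "crypto/rand"
        "encoding/hex"
    )

    type opIDKey struct{}

    // NewOpID mints the per-RPC id at dispatch entry. Three random
    // bytes hex-encode to the six chars in the "(op-XXXXXX)" suffix.
    func NewOpID() string {
        var b [3]byte
        rand.Read(b[:]) // crypto/rand; never fails on supported platforms
        return "op-" + hex.EncodeToString(b[:])
    }

    func WithOpID(ctx context.Context, id string) context.Context {
        return context.WithValue(ctx, opIDKey{}, id)
    }

    func OpIDFrom(ctx context.Context) string {
        id, _ := ctx.Value(opIDKey{}).(string)
        return id
    }

Dispatch can then pass slog.String("op_id", OpIDFrom(ctx)) as one of
the attrs on every operationLog record and every helper log line.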

vmCreateOperationState.ID is now the same id dispatch generated
for vm.create.begin: one identifier shared across client status
polls, daemon logs, and helper logs.

The wire format gains two optional fields: rpc.Request.OpID and
rpc.ErrorResponse.OpID, both omitempty so an older peer ignores
them on receipt and simply omits them when sending. Error
Response.Error() now appends "(op-XXXXXX)" to its string form when
set; existing callers that just print err.Error() get the id for
free.
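
A sketch of the envelope after this change; Method, Params, Code,
Message, and the "op_id" JSON key are assumed names, only the OpID
fields and the Error() suffix come from this commit:

    package rpc

    import (
        "encoding/json"
        "fmt"
    )

    type Request struct {
        Method string          `json:"method"`
        Params json.RawMessage `json:"params,omitempty"`
        OpID   string          `json:"op_id,omitempty"` // new; older peers ignore it
    }

    type ErrorResponse struct {
        Code    string `json:"code"`
        Message string `json:"message"`
        OpID    string `json:"op_id,omitempty"` // new
    }

    // Error appends "(op-XXXXXX)" when the id is set, so callers
    // that only print err.Error() still surface it.
    func (e *ErrorResponse) Error() string {
        if e.OpID == "" {
            return e.Message
        }
        return fmt.Sprintf("%s (%s)", e.Message, e.OpID)
    }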

Tests cover: dispatch stamps op_id on unknown_method, bad_version,
and handler-returned errors; rpc.Call exposes the typed
*ErrorResponse via errors.As so the CLI can read code/op_id; ctx
op_id is forwarded to the server in the request envelope.
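
On the CLI side that reads roughly like the following (rpc.Call's
exact signature and the import path are assumptions):

    import (
        "context"
        "errors"
        "fmt"
        "net"
        "os"

        "banger/internal/rpc" // assumed import path
    )

    func startVM(ctx context.Context, conn net.Conn, params any) error {
        _, err := rpc.Call(ctx, conn, "vm.start", params) // signature assumed
        var rpcErr *rpc.ErrorResponse
        if errors.As(err, &rpcErr) {
            // Machine-readable code plus the journal-greppable op id.
            fmt.Fprintf(os.Stderr, "vm.start failed: code=%s op_id=%s\n",
                rpcErr.Code, rpcErr.OpID)
        }
        return err
    }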

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 22:13:44 -03:00


package daemon

import (
	"context"
	"log/slog"
	"time"

	ws "banger/internal/daemon/workspace"
	"banger/internal/model"
	"banger/internal/paths"
	"banger/internal/store"
	"banger/internal/system"
)

// WorkspaceService owns workspace.prepare / workspace.export plus the
// ssh-key + git-identity sync that runs as part of VM start's
// prepare_work_disk capability hook. The workspaceLocks set lives here
// so its scope (serialise concurrent tar imports on the same VM) is
// obvious at the field definition.
//
// The inspect/import test seams are per-service fields so tests inject
// fakes without mutating package-level state.
type WorkspaceService struct {
	runner system.CommandRunner
	logger *slog.Logger
	config model.DaemonConfig
	layout paths.Layout
	store  *store.Store

	// workspaceLocks serialises concurrent workspace.prepare /
	// workspace.export on the same VM. Separate from vmLocks so slow
	// guest I/O doesn't block lifecycle ops.
	workspaceLocks vmLockSet

	// Peer-service access via narrow function-typed dependencies.
	// WorkspaceService doesn't hold pointers to the full VMService or
	// HostNetwork; it only sees the exact operations it needs.
	vmResolver      func(ctx context.Context, idOrName string) (model.VMRecord, error)
	aliveChecker    func(vm model.VMRecord) bool
	waitGuestSSH    func(ctx context.Context, address string, interval time.Duration) error
	dialGuest       func(ctx context.Context, address string) (guestSSHClient, error)
	imageResolver   func(ctx context.Context, idOrName string) (model.Image, error)
	imageWorkSeed   func(ctx context.Context, image model.Image, fingerprint string) error
	withVMLockByRef func(ctx context.Context, idOrName string, fn func(model.VMRecord) (model.VMRecord, error)) (model.VMRecord, error)
	beginOperation  func(ctx context.Context, name string, attrs ...any) *operationLog

	// repoInspector is the Inspector used by the real InspectRepo /
	// ImportRepoToGuest fallbacks when the test seams below aren't
	// set. wireServices installs the production one; tests that want
	// to intercept only the host-command surface (not the whole
	// inspect/import hook) can assign a stub-runner Inspector here.
	repoInspector *ws.Inspector

	// Test seams.
	workspaceInspectRepo func(ctx context.Context, sourcePath, branchName, fromRef string, includeUntracked bool) (ws.RepoSpec, error)
	workspaceImport      func(ctx context.Context, client ws.GuestClient, spec ws.RepoSpec, guestPath string, mode model.WorkspacePrepareMode) error
}

type workspaceServiceDeps struct {
	runner          system.CommandRunner
	logger          *slog.Logger
	config          model.DaemonConfig
	layout          paths.Layout
	store           *store.Store
	repoInspector   *ws.Inspector
	vmResolver      func(ctx context.Context, idOrName string) (model.VMRecord, error)
	aliveChecker    func(vm model.VMRecord) bool
	waitGuestSSH    func(ctx context.Context, address string, interval time.Duration) error
	dialGuest       func(ctx context.Context, address string) (guestSSHClient, error)
	imageResolver   func(ctx context.Context, idOrName string) (model.Image, error)
	imageWorkSeed   func(ctx context.Context, image model.Image, fingerprint string) error
	withVMLockByRef func(ctx context.Context, idOrName string, fn func(model.VMRecord) (model.VMRecord, error)) (model.VMRecord, error)
	beginOperation  func(ctx context.Context, name string, attrs ...any) *operationLog
}

func newWorkspaceService(deps workspaceServiceDeps) *WorkspaceService {
	return &WorkspaceService{
		runner:          deps.runner,
		logger:          deps.logger,
		config:          deps.config,
		layout:          deps.layout,
		store:           deps.store,
		repoInspector:   deps.repoInspector,
		vmResolver:      deps.vmResolver,
		aliveChecker:    deps.aliveChecker,
		waitGuestSSH:    deps.waitGuestSSH,
		dialGuest:       deps.dialGuest,
		imageResolver:   deps.imageResolver,
		imageWorkSeed:   deps.imageWorkSeed,
		withVMLockByRef: deps.withVMLockByRef,
		beginOperation:  deps.beginOperation,
	}
}
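
As a hedged illustration of the seam design above, a test might wire
the service like this (the fixture values and the VMRecord.Name field
are hypothetical):

    svc := newWorkspaceService(workspaceServiceDeps{
        logger: slog.New(slog.NewTextHandler(io.Discard, nil)),
        vmResolver: func(ctx context.Context, idOrName string) (model.VMRecord, error) {
            return model.VMRecord{Name: idOrName}, nil // canned record; Name field assumed
        },
        aliveChecker: func(model.VMRecord) bool { return true },
    })
    // Override the seams so no git or guest I/O happens.
    svc.workspaceInspectRepo = func(ctx context.Context, sourcePath, branchName, fromRef string, includeUntracked bool) (ws.RepoSpec, error) {
        return ws.RepoSpec{}, nil
    }

Because the seams are per-service fields, two parallel tests can
install different fakes without racing on package-level state.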