Commit graph

3 commits

Author SHA1 Message Date
7b7f7e676c
Harden VM stop cleanup for stale snapshots
Stop and delete could fail with device-mapper busy errors when the persisted Firecracker PID was stale or the kernel needed longer to release the root snapshot.

Rediscover a live Firecracker process by API socket during cleanup, kill and wait on that PID instead of trusting only the stored runtime PID, and extend dm snapshot removal retries for transient busy handles.

Add daemon regressions for stale-runtime reconcile, rediscovered process cleanup, and repeated busy dm removal. Validate with go test ./..., make build, and a live ./banger vm stop debug-ssh run that now exits cleanly.
2026-03-18 12:28:15 -03:00
60294e8c90
Fix VM lifecycle issues behind verify.sh
Make the Firecracker and bangerd processes outlive short-lived CLI request contexts so vm create no longer kills the VMM or daemon as soon as the RPC returns.

Fix fresh-VM SSH by flattening the seeded /root work disk when the copied home tree lands under a nested root/ directory, and write a guest sshd override to keep root pubkey auth explicit while debugging.

Harden teardown and smoke diagnostics: verify.sh now reports early Firecracker exit and delete failures directly, while dm snapshot cleanup tolerates already-gone handles and retries busy mapper removal long enough for Firecracker to release the device.

Validation: go test ./..., make build, bash -n verify.sh, direct SSH against a fresh VM, and a live ./verify.sh run that now completes with [verify] ok.
2026-03-17 14:43:09 -03:00
375900cf65
Rollback partial dm snapshot startup
Prevent partial VM startup failures from leaking loop devices and dm state on the host.

Move root snapshot setup into a rollback-safe helper that records loop and mapper handles incrementally, tears them down in reverse order on failure, and reuses the same dm/loop cleanup path during normal runtime teardown. Also switch the daemon runner field to a small command-runner interface so the snapshot path can be tested with injected failures.

Add failure-injection coverage for losetup, blockdev, dmsetup, partial teardown, and joined rollback errors. Validated with go test ./... and make build.
2026-03-16 14:06:17 -03:00