Align use-case smokes with canonical workspace recipes

The 3.10.0 milestone was about making the advertised smoke pack trustworthy enough to act like a real release gate. The main drift was in the repro-plus-fix scenario: the recipe docs were SDK-first, but the smoke still shelled out to CLI patch apply and asserted a human summary string.

Switch the smoke runner to use the structured SDK patch flow directly, remove the harness-only CLI dependency, and tighten the fake smoke tests so they prove the same structured path the docs recommend. This keeps smoke failures tied to real user-facing regressions instead of human-output formatting drift.

Promote make smoke-use-cases as the trustworthy guest-backed verification path in the top-level docs, bump the release surface to 3.10.0, and mark the roadmap milestone done.

Validation:
- uv lock
- UV_CACHE_DIR=.uv-cache uv run pytest --no-cov tests/test_workspace_use_case_smokes.py
- UV_CACHE_DIR=.uv-cache make check
- UV_CACHE_DIR=.uv-cache make dist-check
- USE_CASE_ENVIRONMENT=debian:12 UV_CACHE_DIR=.uv-cache make smoke-use-cases
Thales Maciel 2026-03-13 13:30:52 -03:00
parent cc5f566bcc
commit 79a7d71d3b
12 changed files with 59 additions and 74 deletions
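
The commit message argues that asserting on a human summary string ties smoke failures to formatting drift rather than real regressions. A minimal sketch of that difference follows. The result dictionary shape, the `ws-1` identifier, and the reformatted summary line are all hypothetical illustrations, not the actual SDK payload or CLI output; only the field names `total`, `added`, `modified`, and `deleted` come from the summary line shown in the diff below.

```python
# Hypothetical structured result; the real SDK payload may be shaped differently.
PATCH_RESULT = {"total": 1, "added": 0, "modified": 1, "deleted": 0}

# Human-mode summary in the style shown in the README diff (workspace id is made up).
OLD_SUMMARY = (
    "[workspace-patch] workspace_id=ws-1 total=1 added=0 modified=1 "
    "deleted=0 execution_mode=guest_vsock"
)

# A hypothetical cosmetic reformatting of the same information.
REFORMATTED_SUMMARY = (
    "[workspace-patch] workspace_id=ws-1 changed=1 "
    "(added=0, modified=1, deleted=0) execution_mode=guest_vsock"
)


def brittle_check(summary: str) -> bool:
    """Pass only if the human summary contains one exact substring."""
    return "total=1 added=0 modified=1 deleted=0" in summary


def structured_check(result: dict[str, int]) -> bool:
    """Pass if the structured counts say exactly one file was modified."""
    return (
        result["modified"] == 1
        and result["added"] == 0
        and result["deleted"] == 0
    )
```

Under these assumptions, `brittle_check` starts failing as soon as the summary wording changes even though the patch behaved identically, while `structured_check` depends only on the counts themselves.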


@ -22,7 +22,7 @@ Networking: tun=yes ip_forward=yes
```bash
$ uvx --from pyro-mcp pyro env list
-Catalog version: 3.9.0
+Catalog version: 3.10.0
debian:12 [installed|not installed] Debian 12 environment with Git preinstalled for common agent workflows.
debian:12-base [installed|not installed] Minimal Debian 12 environment for shell and core Unix tooling.
debian:12-build [installed|not installed] Debian 12 environment with Git and common build tools preinstalled.
@ -126,7 +126,8 @@ snapshots, secrets, network policy, or disk tools.
Once that stable workspace flow works, continue with the five recipe docs in
[use-cases/README.md](use-cases/README.md) or run the real guest-backed smoke packs directly with
-`make smoke-use-cases`.
+`make smoke-use-cases`. Treat that smoke pack as the trustworthy guest-backed
+verification path for the advertised workspace workflows.
When you need repeated commands in one sandbox, switch to `pyro workspace ...`:
@ -153,8 +154,7 @@ $ uvx --from pyro-mcp pyro workspace file read WORKSPACE_ID src/note.txt
hello from synced workspace
[workspace-file-read] workspace_id=... path=/workspace/src/note.txt size_bytes=... truncated=False execution_mode=guest_vsock
-$ uvx --from pyro-mcp pyro workspace patch apply WORKSPACE_ID --patch "$(cat fix.patch)"
-[workspace-patch] workspace_id=... total=... added=... modified=... deleted=... execution_mode=guest_vsock
+$ uvx --from pyro-mcp pyro workspace patch apply WORKSPACE_ID --patch-file fix.patch
$ uvx --from pyro-mcp pyro workspace exec WORKSPACE_ID -- cat src/note.txt
hello from synced workspace
@ -259,7 +259,7 @@ State: started
Use `--seed-path` when the workspace should start from a host directory or a local
`.tar` / `.tar.gz` / `.tgz` archive instead of an empty `/workspace`. Use
`pyro workspace sync push` when you need to import later host-side changes into a started
-workspace. Sync is non-atomic in `3.9.0`; if it fails partway through, prefer `pyro workspace reset`
+workspace. Sync is non-atomic in `3.10.0`; if it fails partway through, prefer `pyro workspace reset`
to recover from `baseline` or one named snapshot. Use `pyro workspace diff` to compare the current
`/workspace` tree to its immutable create-time baseline, `pyro workspace snapshot *` to create
named checkpoints, and `pyro workspace export` to copy one changed file or directory back to the


@ -85,7 +85,7 @@ uvx --from pyro-mcp pyro env list
Expected output:
```bash
-Catalog version: 3.9.0
+Catalog version: 3.10.0
debian:12 [installed|not installed] Debian 12 environment with Git preinstalled for common agent workflows.
debian:12-base [installed|not installed] Minimal Debian 12 environment for shell and core Unix tooling.
debian:12-build [installed|not installed] Debian 12 environment with Git and common build tools preinstalled.
@ -171,6 +171,8 @@ When that stable workspace path is working, continue with the recipe index at
[use-cases/README.md](use-cases/README.md). It groups the five core workspace stories and the
real smoke targets behind them, starting with `make smoke-use-cases` or one of the per-scenario
targets such as `make smoke-repro-fix-loop`.
+Treat `make smoke-use-cases` as the trustworthy guest-backed verification path for the advertised
+workspace workflows.
## 6. Optional demo proof point
@ -294,7 +296,7 @@ the identifier programmatically, use `--id-only` for only the identifier or `--j
workspace payload. Use `--seed-path`
when the workspace should start from a host directory or a local `.tar` / `.tar.gz` / `.tgz`
archive. Use `pyro workspace sync push` for later host-side changes to a started workspace. Sync
-is non-atomic in `3.9.0`; if it fails partway through, prefer `pyro workspace reset` to recover
+is non-atomic in `3.10.0`; if it fails partway through, prefer `pyro workspace reset` to recover
from `baseline` or one named snapshot. Use `pyro workspace diff` to compare the current workspace
tree to its immutable create-time baseline, `pyro workspace snapshot *` to capture named
checkpoints, and `pyro workspace export` to copy one changed file or directory back to the host. Use


@ -6,7 +6,7 @@ goal:
make the core agent-workspace use cases feel trivial from a chat-driven LLM
interface.
-Current baseline is `3.9.0`:
+Current baseline is `3.10.0`:
- the stable workspace contract exists across CLI, SDK, and MCP
- one-shot `pyro run` still exists as the narrow entrypoint
@ -35,12 +35,8 @@ More concretely, the model should not need to:
The remaining UX friction for a technically strong new user is now narrower:
-- the recommended chat-host onramp is now explicit, but human-mode file reads
-still need final transcript polish for copy-paste and chat logs
-- the five use-case smokes now exist, but the advertised smoke pack is only as
-trustworthy as its weakest scenario and exact recipe fidelity
-- generic MCP guidance is strong, but Codex and OpenCode still ask the user to
-translate the generic config into host-specific setup steps
+- the generic MCP guidance is strong, but Codex and OpenCode still ask the user
+to translate the generic config into host-specific setup steps
- `workspace-core` is clearly the recommended profile, but `pyro mcp serve` and
`create_server()` still default to `workspace-full` for `3.x` compatibility
@ -66,7 +62,7 @@ The remaining UX friction for a technically strong new user is now narrower:
6. [`3.7.0` Handoff Shortcuts And File Input Sources](llm-chat-ergonomics/3.7.0-handoff-shortcuts-and-file-input-sources.md) - Done
7. [`3.8.0` Chat-Host Onramp And Recommended Defaults](llm-chat-ergonomics/3.8.0-chat-host-onramp-and-recommended-defaults.md) - Done
8. [`3.9.0` Content-Only Reads And Human Output Polish](llm-chat-ergonomics/3.9.0-content-only-reads-and-human-output-polish.md) - Done
-9. [`3.10.0` Use-Case Smoke Trust And Recipe Fidelity](llm-chat-ergonomics/3.10.0-use-case-smoke-trust-and-recipe-fidelity.md)
+9. [`3.10.0` Use-Case Smoke Trust And Recipe Fidelity](llm-chat-ergonomics/3.10.0-use-case-smoke-trust-and-recipe-fidelity.md) - Done
10. [`3.11.0` Host-Specific MCP Onramps](llm-chat-ergonomics/3.11.0-host-specific-mcp-onramps.md)
11. [`4.0.0` Workspace-Core Default Profile](llm-chat-ergonomics/4.0.0-workspace-core-default-profile.md)
@ -92,13 +88,11 @@ Completed so far:
docs pass while keeping `workspace-full` as the 3.x compatibility default.
- `3.9.0` added content-only workspace file and disk reads plus cleaner default human-mode
transcript separation for files that do not end with a trailing newline.
+- `3.10.0` aligned the five guest-backed use-case smokes with their recipe docs and promoted
+`make smoke-use-cases` as the trustworthy verification path for the advertised workspace flows.
Planned next:
-- `3.10.0` makes the use-case recipe set fully trustworthy by requiring
-`make smoke-use-cases` to pass cleanly, aligning recipe docs with what the
-smoke harness actually proves, and removing brittle assertions against
-human-mode output when structured results are already available.
- `3.11.0` adds exact host-specific onramps for Claude, Codex, and OpenCode so
a new chat-host user can copy one known-good config or command instead of
translating the generic MCP example by hand.


@ -1,6 +1,6 @@
# `3.10.0` Use-Case Smoke Trust And Recipe Fidelity
-Status: Planned
+Status: Done
## Goal


@ -28,4 +28,5 @@ uv run python scripts/workspace_use_case_smoke.py --scenario all --environment d
That runner generates its own host fixtures, creates real guest-backed workspaces,
verifies the intended flow, exports one concrete result when relevant, and cleans
-up on both success and failure.
+up on both success and failure. Treat `make smoke-use-cases` as the trustworthy
+guest-backed verification path for the advertised workspace workflows.
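
The final hunk describes a runner that creates a workspace, verifies the flow, and cleans up on both success and failure. One common way to get that guarantee is a `try`/`finally` block, sketched below. This is only an illustration of the pattern, not the code in `scripts/workspace_use_case_smoke.py`; `run_scenario` and its three callables are hypothetical names.

```python
from typing import Callable


def run_scenario(
    create: Callable[[], str],
    verify: Callable[[str], None],
    delete: Callable[[str], None],
) -> None:
    """Run one smoke scenario and always delete the workspace afterwards.

    All three callables are hypothetical stand-ins for whatever the real
    runner uses to create, check, and remove a guest-backed workspace.
    """
    workspace_id = create()
    try:
        # Any exception raised here still reaches the caller, so a failed
        # check is reported as a failed smoke.
        verify(workspace_id)
    finally:
        # Runs whether verify() returned normally or raised.
        delete(workspace_id)
```

Because `create()` sits outside the `try`, a workspace that was never created is never deleted, and a verification failure propagates only after cleanup has run.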