Turn the stable workspace surface into five documented, runnable stories with a shared guest-backed smoke runner, new docs/use-cases recipes, and Make targets for cold-start validation, repro/fix loops, parallel workspaces, untrusted inspection, and review/eval workflows. Bump the package and catalog surface to 3.6.0, update the main docs to point users from the stable workspace walkthrough into the recipe index and smoke packs, and mark the 3.6.0 roadmap milestone done. Fix a regression uncovered by the real parallel-workspaces smoke: workspace_file_read must not bump last_activity_at. Verified with uv lock, UV_CACHE_DIR=.uv-cache make check, UV_CACHE_DIR=.uv-cache make dist-check, and USE_CASE_ENVIRONMENT=debian:12 UV_CACHE_DIR=.uv-cache make smoke-use-cases.
# Review And Evaluation Workflows

Recommended profile: `workspace-full`

Smoke target:

```bash
make smoke-review-eval
```

Use this flow when an agent needs to read a checklist interactively, run an
evaluation script, checkpoint or reset its changes, and export the final report.

Canonical SDK flow:

```python
from pyro_mcp import Pyro

pyro = Pyro()
created = pyro.create_workspace(environment="debian:12", seed_path="./review-fixture")
workspace_id = str(created["workspace_id"])

# Checkpoint the seeded state before the review touches anything.
pyro.create_snapshot(workspace_id, "pre-review")

# Read the checklist interactively through a shell.
shell = pyro.open_shell(workspace_id)
pyro.write_shell(workspace_id, shell["shell_id"], input="cat CHECKLIST.md")
pyro.read_shell(
    workspace_id,
    shell["shell_id"],
    plain=True,
    wait_for_idle_ms=300,
)
pyro.close_shell(workspace_id, shell["shell_id"])

# Run the evaluation script and export the final report.
pyro.exec_workspace(workspace_id, command="sh review.sh")
pyro.export_workspace(workspace_id, "review-report.txt", output_path="./review-report.txt")

# Roll back to the checkpoint, then clean up.
pyro.reset_workspace(workspace_id, snapshot="pre-review")
pyro.delete_workspace(workspace_id)
```
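
The canonical flow reaches `delete_workspace` only if every earlier call
succeeds. One way to guarantee cleanup is sketched here, using only the
workspace calls already shown. The `review_workspace` context manager is a
hypothetical helper, not part of `pyro_mcp`; it accepts any object exposing the
same methods, so it can be exercised with a stand-in client.

```python
from contextlib import contextmanager


@contextmanager
def review_workspace(pyro, environment, seed_path, snapshot="pre-review"):
    """Hypothetical helper: create a workspace, checkpoint it, always delete it."""
    created = pyro.create_workspace(environment=environment, seed_path=seed_path)
    workspace_id = str(created["workspace_id"])
    try:
        # Checkpoint the seeded state so later steps can reset to it.
        pyro.create_snapshot(workspace_id, snapshot)
        yield workspace_id
    finally:
        # Runs even if a review step raises, so no workspace is left behind.
        pyro.delete_workspace(workspace_id)
```

With a real client, the shell, exec, and export steps would sit inside
`with review_workspace(Pyro(), "debian:12", "./review-fixture") as workspace_id:`.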

This is the stable shell-facing story: readable PTY output for chat loops,
checkpointed evaluation, explicit export, and reset when a review branch goes
sideways.
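
The "reset when a review branch goes sideways" step can be sketched as a small
wrapper around `exec_workspace` and `reset_workspace`. This is a sketch under
assumptions: `attempt_review` and the `is_ok` predicate are hypothetical names,
and because the shape of the value returned by `exec_workspace` is not
described here, the caller decides what counts as success. It also assumes the
named snapshot was created earlier.

```python
def attempt_review(pyro, workspace_id, command, is_ok, snapshot="pre-review"):
    """Hypothetical helper: run one review command, roll back if it fails.

    Returns (ok, result). `is_ok` is a caller-supplied predicate, so no
    particular result shape is assumed for exec_workspace.
    """
    result = pyro.exec_workspace(workspace_id, command=command)
    ok = bool(is_ok(result))
    if not ok:
        # Discard whatever the failed run changed by returning to the checkpoint.
        pyro.reset_workspace(workspace_id, snapshot=snapshot)
    return ok, result
```

On success the caller would go on to `export_workspace`; on failure the
workspace is back at the snapshot and another attempt can start from a clean
state.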