Introduce explicit repro-fix, inspect, cold-start, and review-eval modes across the MCP server, CLI, and host helpers, with canonical mode-to-tool mappings, narrowed schemas, and mode-specific tool descriptions on top of the existing workspace runtime.

Reposition the docs, host onramps, and use-case recipes so named modes are the primary user-facing startup story while the generic no-mode workspace-core path remains the escape hatch, and update the shared smoke runner to validate repro-fix and cold-start through mode-backed servers.

Validation:
- UV_OFFLINE=1 UV_CACHE_DIR=.uv-cache uv run pytest --no-cov tests/test_api.py tests/test_server.py tests/test_host_helpers.py tests/test_public_contract.py tests/test_cli.py tests/test_workspace_use_case_smokes.py
- UV_OFFLINE=1 UV_CACHE_DIR=.uv-cache make check
- UV_OFFLINE=1 UV_CACHE_DIR=.uv-cache make dist-check
- real guest-backed make smoke-repro-fix-loop smoke-cold-start-validation outside the sandbox
Review And Evaluation Workflows
Recommended mode: review-eval
Recommended startup:
pyro host connect claude-code --mode review-eval
Smoke target:
make smoke-review-eval
Use this flow when an agent needs to read a checklist interactively, run an evaluation script, checkpoint or reset its changes, and export the final report.
Chat-host recipe:
- Create a named snapshot before the review starts.
- Open a readable PTY shell and inspect the checklist interactively.
- Run the review or evaluation script in the same workspace.
- Capture workspace summary to review what changed and what to export.
- Export the final report.
- Reset back to the snapshot if the review branch goes sideways.
- Delete the workspace when the evaluation is done.
This is the stable shell-facing story: readable PTY output for chat loops, checkpointed evaluation, explicit export, and reset when a review branch goes sideways.
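The recipe's ordering can be illustrated with a small in-memory model. This is only a sketch of the checkpoint, summary, export, and reset semantics described above; it does not call pyro. `ToyWorkspace`, `run_review`, the `pre-review` snapshot name, and the `CHECKLIST.md` and `report.md` paths are all hypothetical names chosen for illustration.

```python
"""Toy in-memory model of the review-eval recipe. Not the pyro API."""
from dataclasses import dataclass, field


@dataclass
class ToyWorkspace:
    """Hypothetical stand-in for a workspace: a mapping of path -> content."""

    files: dict[str, str] = field(default_factory=dict)
    snapshots: dict[str, dict[str, str]] = field(default_factory=dict)

    def snapshot(self, name: str) -> None:
        # Store a copy so later edits do not mutate the checkpoint.
        self.snapshots[name] = dict(self.files)

    def summary(self, name: str) -> dict[str, list[str]]:
        # Report paths added, changed, or removed since the named snapshot.
        base = self.snapshots[name]
        shared = self.files.keys() & base.keys()
        return {
            "added": sorted(set(self.files) - set(base)),
            "changed": sorted(p for p in shared if self.files[p] != base[p]),
            "removed": sorted(set(base) - set(self.files)),
        }

    def export(self, path: str) -> str:
        # Hand the file content back to the caller (the "host" side).
        return self.files[path]

    def reset(self, name: str) -> None:
        # Restore the workspace to the state captured by the snapshot.
        self.files = dict(self.snapshots[name])


def run_review(ws: ToyWorkspace, reset_after: bool = True):
    ws.snapshot("pre-review")                  # 1. checkpoint before the review
    checklist = ws.files["CHECKLIST.md"]       # 2. read the checklist (stands in for the PTY shell)
    items = len(checklist.splitlines())
    ws.files["report.md"] = f"reviewed {items} items"  # 3. run the evaluation
    summary = ws.summary("pre-review")         # 4. see what changed
    report = ws.export("report.md")            # 5. export the final report
    if reset_after:
        ws.reset("pre-review")                 # 6. roll back if the review went sideways
    return summary, report                     # 7. the caller discards ws when done
```

The point of the ordering is that the export happens before the reset, so the report survives on the caller's side even after the workspace is rolled back to the snapshot.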