Add workspace summary across the CLI, SDK, and MCP, and include it in the workspace-core profile so chat hosts can review one concise view of the current session. Persist lightweight review events for syncs, file edits, patch applies, exports, service lifecycle, and snapshot activity, then synthesize them with command history, current services, snapshot state, and current diff data since the last reset. Update the walkthroughs, use-case docs, public contract, changelog, and roadmap for 4.3.0, and make dist-check invoke the CLI module directly so local package reinstall quirks do not break the packaging gate. Validation: uv lock; ./.venv/bin/pytest --no-cov tests/test_vm_manager.py tests/test_cli.py tests/test_api.py tests/test_server.py tests/test_public_contract.py tests/test_workspace_use_case_smokes.py; UV_OFFLINE=1 UV_CACHE_DIR=.uv-cache make check; UV_OFFLINE=1 UV_CACHE_DIR=.uv-cache make dist-check; real guest-backed workspace create -> patch apply -> workspace summary --json -> delete smoke.
8.9 KiB
8.9 KiB
LLM Chat Ergonomics Roadmap
This roadmap picks up after the completed workspace GA plan and focuses on one goal:
make the core agent-workspace use cases feel trivial from a chat-driven LLM interface.
Current baseline is 4.3.0:
pyro mcp serveis now the default product entrypointworkspace-coreis now the default MCP profile- one-shot
pyro runstill exists as the terminal companion path - workspaces already support seeding, sync push, exec, export, diff, snapshots, reset, services, PTY shells, secrets, network policy, and published ports
- host-specific onramps exist for Claude Code, Codex, and OpenCode
- the five documented use cases are now recipe-backed and smoke-tested
- stopped-workspace disk tools now exist, but remain explicitly secondary
What "Trivial In Chat" Means
The roadmap is done only when a chat-driven LLM can cover the main use cases without awkward shell choreography or hidden host-side glue:
- cold-start repo validation
- repro plus fix loops
- parallel isolated workspaces for multiple issues or PRs
- unsafe or untrusted code inspection
- review and evaluation workflows
More concretely, the model should not need to:
- patch files through shell-escaped
printfor heredoc tricks - rely on opaque workspace IDs without a discovery surface
- consume raw terminal control sequences as normal shell output
- choose from an unnecessarily large tool surface when a smaller profile would work
The next gaps for the narrowed persona are now product-shaping rather than raw capability gaps:
- starting from the current repo still needs to feel native from the first chat
- host setup and repair still lean on manual commands and config copying
- reviewing what the agent actually did still requires digging through several surfaces
- the five use cases exist as docs and smokes, not yet as explicit product modes
- daily reset and retry loops can still feel heavier than they should
Locked Decisions
- keep the workspace product identity central; do not drift toward CI, queue, or runner abstractions
- keep disk tools secondary and do not make them the main chat-facing surface
- prefer narrow tool profiles and structured outputs over more raw shell calls
- optimize the MCP/chat-host path first and keep the CLI companion path good enough to validate and debug it
- lower-level SDK and repo substrate work can continue, but they should not drive milestone scope or naming
- CLI-only ergonomics are allowed when the SDK and MCP surfaces already have the structured behavior natively
- prioritize repo-aware startup, trust, and daily-loop speed before adding more low-level workspace surface area
- breaking changes are acceptable while there are still no users and the chat-host product is still being shaped
- every milestone below must also update docs, help text, runnable examples, and at least one real smoke scenario
Milestones
3.2.0Model-Native Workspace File Ops - Done3.3.0Workspace Naming And Discovery - Done3.4.0Tool Profiles And Canonical Chat Flows - Done3.5.0Chat-Friendly Shell Output - Done3.6.0Use-Case Recipes And Smoke Packs - Done3.7.0Handoff Shortcuts And File Input Sources - Done3.8.0Chat-Host Onramp And Recommended Defaults - Done3.9.0Content-Only Reads And Human Output Polish - Done3.10.0Use-Case Smoke Trust And Recipe Fidelity - Done3.11.0Host-Specific MCP Onramps - Done4.0.0Workspace-Core Default Profile - Done4.1.0Project-Aware Chat Startup - Done4.2.0Host Bootstrap And Repair - Done4.3.0Reviewable Agent Output - Done4.4.0Opinionated Use-Case Modes - Planned4.5.0Faster Daily Loops - Planned
Completed so far:
3.2.0added model-nativeworkspace file *andworkspace patch applyso chat-driven agents can inspect and edit/workspacewithout shell-escaped file mutation flows.3.3.0added workspace names, key/value labels,workspace list,workspace update, andlast_activity_attracking so humans and chat-driven agents can rediscover and resume the right workspace without external notes.3.4.0added stable MCP/server tool profiles withvm-run,workspace-core, andworkspace-full, plus canonical profile-based OpenAI and MCP examples so chat hosts can start narrow and widen only when needed.3.5.0added chat-friendly shell reads with plain-text rendering and idle batching so PTY sessions are readable enough to feed directly back into a chat model.3.6.0added recipe docs and real guest-backed smoke packs for the five core workspace use cases so the stable product is now demonstrated as repeatable end-to-end stories instead of only isolated feature surfaces.3.7.0removed the remaining shell glue from canonical CLI workspace flows with--id-only,--text-file, and--patch-file, so the shortest handoff path no longer depends onpython -cextraction or$(cat ...)expansion.3.8.0madeworkspace-corethe obvious first MCP/chat-host profile from the first help and docs pass while keepingworkspace-fullas the 3.x compatibility default.3.9.0added content-only workspace file and disk reads plus cleaner default human-mode transcript separation for files that do not end with a trailing newline.3.10.0aligned the five guest-backed use-case smokes with their recipe docs and promotedmake smoke-use-casesas the trustworthy verification path for the advertised workspace flows.3.11.0added exact host-specific MCP onramps for Claude Code, Codex, and OpenCode so new chat-host users can copy one known-good setup example instead of translating the generic MCP config manually.4.0.0flipped the default MCP/server profile toworkspace-core, so the bare entrypoint now matches the recommended narrow chat-host profile across CLI, SDK, and package-level factories.4.1.0made repo-root startup native for chat hosts, so barepyro mcp servecan auto-detect the current Git checkout and let the firstworkspace_createomitseed_path, with explicit--project-pathand--repo-urlfallbacks when cwd is not the source of truth.4.2.0adds first-class host bootstrap and repair helpers so Claude Code, Codex, and OpenCode users can connect or repair the supported chat-host path without manually composing raw MCP commands or config edits.4.3.0adds a concise workspace review surface so users can inspect what the agent changed and ran since the last reset without reconstructing the session from several lower-level views by hand.
Planned next:
Expected Outcome
After this roadmap, the product should still look like an agent workspace, not like a CI runner with more isolation.
The intended model-facing shape is:
- one-shot work starts with
vm_run - persistent work moves to a small workspace-first contract
- file edits are structured and model-native
- workspace discovery is human and model-friendly
- shells are readable in chat
- CLI handoff paths do not depend on ad hoc shell parsing
- the recommended chat-host profile is obvious from the first MCP example
- the documented smoke pack is trustworthy enough to use as a release gate
- major chat hosts have copy-pasteable MCP setup examples instead of only a generic config template
- human-mode content reads are copy-paste safe
- the default bare MCP server entrypoint matches the recommended narrow profile
- the five core use cases are documented and smoke-tested end to end
- starting from the current repo feels native from the first chat-host setup
- supported hosts can be connected or repaired without manual config spelunking
- users can review one concise summary of what the agent changed and ran
- the main workflows feel like named modes instead of one giant reference
- reset and retry loops are fast enough to encourage daily use