pyro-mcp/docs/roadmap/llm-chat-ergonomics.md
Thales Maciel 79a7d71d3b Align use-case smokes with canonical workspace recipes
The 3.10.0 milestone was about making the advertised smoke pack trustworthy enough to act like a real release gate. The main drift was in the repro-plus-fix scenario: the recipe docs were SDK-first, but the smoke still shelled out to CLI patch apply and asserted a human summary string.

Switch the smoke runner to use the structured SDK patch flow directly, remove the harness-only CLI dependency, and tighten the fake smoke tests so they prove the same structured path the docs recommend. This keeps smoke failures tied to real user-facing regressions instead of human-output formatting drift.

Promote make smoke-use-cases as the trustworthy guest-backed verification path in the top-level docs, bump the release surface to 3.10.0, and mark the roadmap milestone done.

Validation:
- uv lock
- UV_CACHE_DIR=.uv-cache uv run pytest --no-cov tests/test_workspace_use_case_smokes.py
- UV_CACHE_DIR=.uv-cache make check
- UV_CACHE_DIR=.uv-cache make dist-check
- USE_CASE_ENVIRONMENT=debian:12 UV_CACHE_DIR=.uv-cache make smoke-use-cases
2026-03-13 13:30:52 -03:00

LLM Chat Ergonomics Roadmap

This roadmap picks up after the completed workspace GA plan and focuses on one goal:

make the core agent-workspace use cases feel trivial from a chat-driven LLM interface.

The current baseline is 3.10.0:

  • the stable workspace contract exists across CLI, SDK, and MCP
  • one-shot pyro run still exists as the narrow entrypoint
  • workspaces already support seeding, sync push, exec, export, diff, snapshots, reset, services, PTY shells, secrets, network policy, and published ports
  • stopped-workspace disk tools now exist, but remain explicitly secondary

What "Trivial In Chat" Means

The roadmap is done only when a chat-driven LLM can cover the main use cases without awkward shell choreography or hidden host-side glue:

  • cold-start repo validation
  • repro plus fix loops
  • parallel isolated workspaces for multiple issues or PRs
  • unsafe or untrusted code inspection
  • review and evaluation workflows

More concretely, the model should not need to:

  • patch files through shell-escaped printf or heredoc tricks
  • rely on opaque workspace IDs without a discovery surface
  • consume raw terminal control sequences as normal shell output
  • choose from an unnecessarily large tool surface when a smaller profile would work
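
To make the terminal-output point concrete, here is a minimal Python sketch of the kind of plain-text rendering that a chat-friendly shell read implies. It is not pyro's implementation: the function name render_plain and the regular expression are assumptions for illustration, and real PTY output has more cases (cursor movement, alternate screens) than this handles.

```python
import re

# Matches CSI sequences (colors, cursor moves) and OSC sequences
# (window-title updates). Illustrative only, not exhaustive.
_ANSI_RE = re.compile(
    r"\x1b\[[0-?]*[ -/]*[@-~]"               # CSI: parameters, intermediates, final byte
    r"|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"    # OSC: terminated by BEL or ST
)


def render_plain(raw: str) -> str:
    """Return shell output with escape sequences and carriage returns removed."""
    text = _ANSI_RE.sub("", raw)
    # Normalize CRLF first, then drop lone carriage returns used for redraws.
    return text.replace("\r\n", "\n").replace("\r", "")


if __name__ == "__main__":
    sample = "\x1b[31mFAILED\x1b[0m tests/test_example.py\r\n"
    print(render_plain(sample), end="")
```

The design choice here is to discard formatting rather than interpret it, so the model sees only the text a human would read.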

The remaining UX friction for a technically strong new user is now narrower:

  • the generic MCP guidance is strong, but Codex and OpenCode still ask the user to translate the generic config into host-specific setup steps
  • workspace-core is clearly the recommended profile, but pyro mcp serve and create_server() still default to workspace-full for 3.x compatibility

Locked Decisions

  • keep the workspace product identity central; do not drift toward CI, queue, or runner abstractions
  • keep disk tools secondary and do not make them the main chat-facing surface
  • prefer narrow tool profiles and structured outputs over more raw shell calls
  • capability milestones should update CLI, SDK, and MCP together
  • CLI-only ergonomics are allowed when the SDK and MCP surfaces already have the structured behavior natively
  • every milestone below must also update docs, help text, runnable examples, and at least one real smoke scenario

Milestones

  1. 3.2.0 Model-Native Workspace File Ops - Done
  2. 3.3.0 Workspace Naming And Discovery - Done
  3. 3.4.0 Tool Profiles And Canonical Chat Flows - Done
  4. 3.5.0 Chat-Friendly Shell Output - Done
  5. 3.6.0 Use-Case Recipes And Smoke Packs - Done
  6. 3.7.0 Handoff Shortcuts And File Input Sources - Done
  7. 3.8.0 Chat-Host Onramp And Recommended Defaults - Done
  8. 3.9.0 Content-Only Reads And Human Output Polish - Done
  9. 3.10.0 Use-Case Smoke Trust And Recipe Fidelity - Done
  10. 3.11.0 Host-Specific MCP Onramps
  11. 4.0.0 Workspace-Core Default Profile

Completed so far:

  • 3.2.0 added model-native workspace file * and workspace patch apply so chat-driven agents can inspect and edit /workspace without shell-escaped file mutation flows.
  • 3.3.0 added workspace names, key/value labels, workspace list, workspace update, and last_activity_at tracking so humans and chat-driven agents can rediscover and resume the right workspace without external notes.
  • 3.4.0 added stable MCP/server tool profiles with vm-run, workspace-core, and workspace-full, plus canonical profile-based OpenAI and MCP examples so chat hosts can start narrow and widen only when needed.
  • 3.5.0 added chat-friendly shell reads with plain-text rendering and idle batching so PTY sessions are readable enough to feed directly back into a chat model.
  • 3.6.0 added recipe docs and real guest-backed smoke packs for the five core workspace use cases so the stable product is now demonstrated as repeatable end-to-end stories instead of only isolated feature surfaces.
  • 3.7.0 removed the remaining shell glue from canonical CLI workspace flows with --id-only, --text-file, and --patch-file, so the shortest handoff path no longer depends on python -c extraction or $(cat ...) expansion.
  • 3.8.0 made workspace-core the obvious first MCP/chat-host profile on a first read of the help text and docs, while keeping workspace-full as the 3.x compatibility default.
  • 3.9.0 added content-only workspace file and disk reads plus cleaner default human-mode transcript separation for files that do not end with a trailing newline.
  • 3.10.0 aligned the five guest-backed use-case smokes with their recipe docs and promoted make smoke-use-cases as the trustworthy verification path for the advertised workspace flows.
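
As an illustration of the 3.3.0 discovery idea, the following Python sketch shows how names, labels, and last_activity_at are enough to rediscover a workspace without external notes. The WorkspaceRecord shape and the find_workspace helper are hypothetical and are not pyro's SDK; only the field name last_activity_at is taken from this roadmap.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class WorkspaceRecord:
    # Hypothetical record shape for illustration only.
    workspace_id: str
    name: str
    last_activity_at: datetime
    labels: dict[str, str] = field(default_factory=dict)


def find_workspace(records, *, name=None, labels=None):
    """Return the most recently active record matching the name and all labels."""
    wanted = labels or {}
    matches = [
        r for r in records
        if (name is None or r.name == name)
        and all(r.labels.get(k) == v for k, v in wanted.items())
    ]
    # default=None means "no match" instead of raising on an empty list.
    return max(matches, key=lambda r: r.last_activity_at, default=None)


if __name__ == "__main__":
    records = [
        WorkspaceRecord("ws-a", "issue-1", datetime(2026, 3, 1, tzinfo=timezone.utc), {"repo": "api"}),
        WorkspaceRecord("ws-b", "issue-2", datetime(2026, 3, 5, tzinfo=timezone.utc), {"repo": "api"}),
    ]
    latest = find_workspace(records, labels={"repo": "api"})
    print(latest.workspace_id if latest else "no match")
```

Selecting by label and most recent activity is one plausible resume policy; a chat host could equally present the whole filtered list and let the model choose.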

Planned next:

  • 3.11.0 adds exact host-specific onramps for Claude, Codex, and OpenCode so a new chat-host user can copy one known-good config or command instead of translating the generic MCP example by hand.
  • 4.0.0 flips the default MCP profile from workspace-full to workspace-core so the no-flag server entrypoint finally matches the recommended docs path, while keeping explicit opt-in access to the full advanced surface.
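
The planned 4.0.0 default flip can be sketched at the profile-resolution level as follows. The three profile names and vm_run come from this roadmap; resolve_profile, the major_version parameter, and the example_* tool names are hypothetical placeholders, not pyro's actual API or tool list.

```python
# Profile names are from the roadmap; tool names other than vm_run are placeholders.
_CORE = frozenset({"vm_run", "example_core_tool"})
PROFILES = {
    "vm-run": frozenset({"vm_run"}),
    "workspace-core": _CORE,
    "workspace-full": _CORE | {"example_advanced_tool"},
}


def resolve_profile(requested=None, *, major_version=4):
    """Pick the requested profile, or the default for the given major version."""
    # 3.x keeps workspace-full for compatibility; 4.x defaults to workspace-core.
    default = "workspace-core" if major_version >= 4 else "workspace-full"
    name = requested or default
    if name not in PROFILES:
        raise ValueError(f"unknown profile: {name!r}")
    return name, PROFILES[name]


if __name__ == "__main__":
    print(resolve_profile(major_version=3)[0])   # 3.x default
    print(resolve_profile()[0])                  # 4.x default
    print(resolve_profile("workspace-full")[0])  # explicit opt-in
```

Under this sketch, an explicit profile request behaves the same before and after the flip, so only callers relying on the bare default would see a narrower tool surface.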

Expected Outcome

After this roadmap, the product should still look like an agent workspace, not like a CI runner with more isolation.

The intended model-facing shape is:

  • one-shot work starts with vm_run
  • persistent work moves to a small workspace-first contract
  • file edits are structured and model-native
  • workspace discovery is human- and model-friendly
  • shells are readable in chat
  • CLI handoff paths do not depend on ad hoc shell parsing
  • the recommended chat-host profile is obvious from the first MCP example
  • the documented smoke pack is trustworthy enough to use as a release gate
  • major chat hosts have copy-pasteable MCP setup examples instead of only a generic config template
  • human-mode content reads are copy-paste safe
  • the default bare MCP server entrypoint matches the recommended narrow profile
  • the five core use cases are documented and smoke-tested end to end