pyro-mcp/docs/roadmap/llm-chat-ergonomics.md
Thales Maciel 535efc6919 Add project-aware chat startup defaults
Make repo-root chat startup native by letting MCP servers carry a default project source for workspace creation. When a chat host starts from a Git checkout, workspace_create can now omit seed_path and inherit the server startup source; explicit --project-path and clean-clone --repo-url/--repo-ref paths are supported as fallbacks.

Add project startup resolution and materialization, surface origin_kind/origin_ref in workspace_seed, update chat-host docs and the repro/fix smoke to use project-aware workspace creation, and switch dist-check to uv run pyro so verification stays stable after uv reinstalls.

Validated with uv lock, focused startup/server/CLI pytest coverage, UV_CACHE_DIR=.uv-cache make check, UV_CACHE_DIR=.uv-cache make dist-check, and real guest-backed smokes for both explicit project_path and bare repo-root auto-detection.
2026-03-13 15:51:47 -03:00

LLM Chat Ergonomics Roadmap

This roadmap picks up after the completed workspace GA plan and focuses on one goal:

make the core agent-workspace use cases feel trivial from a chat-driven LLM interface.

Current baseline is 4.1.0:

  • pyro mcp serve is now the default product entrypoint
  • workspace-core is now the default MCP profile
  • one-shot pyro run still exists as the terminal companion path
  • workspaces already support seeding, sync push, exec, export, diff, snapshots, reset, services, PTY shells, secrets, network policy, and published ports
  • host-specific onramps exist for Claude Code, Codex, and OpenCode
  • the five documented use cases are now recipe-backed and smoke-tested
  • stopped-workspace disk tools now exist, but remain explicitly secondary

What "Trivial In Chat" Means

The roadmap is done only when a chat-driven LLM can cover the main use cases without awkward shell choreography or hidden host-side glue:

  • cold-start repo validation
  • repro plus fix loops
  • parallel isolated workspaces for multiple issues or PRs
  • unsafe or untrusted code inspection
  • review and evaluation workflows

More concretely, the model should not need to:

  • patch files through shell-escaped printf or heredoc tricks
  • rely on opaque workspace IDs without a discovery surface
  • consume raw terminal control sequences as normal shell output
  • choose from an unnecessarily large tool surface when a smaller profile would work
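To make the third point concrete, the sketch below shows one way raw PTY output could be reduced to plain text before it is handed back to a chat model. It is an illustration only, not pyro's actual rendering code: the function name `to_plain_text` and the regular expression are assumptions, and real terminal emulation (cursor movement, alternate screens) is more involved than this.

```python
import re

# Hypothetical helper, not part of pyro: strips the most common escape
# sequences so shell output reads as ordinary text.
_ANSI_RE = re.compile(
    r"\x1b\[[0-?]*[ -/]*[@-~]"             # CSI sequences (colors, cursor moves)
    r"|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"  # OSC sequences (e.g. window titles)
    r"|\x1b[@-Z\\-_]"                      # remaining two-character escapes
)


def to_plain_text(raw: str) -> str:
    """Return `raw` without escape sequences or carriage-return overwrites."""
    text = _ANSI_RE.sub("", raw)
    text = text.replace("\r\n", "\n")
    # A bare "\r" usually means "redraw this line" (progress bars), so keep
    # only what was written after the last one on each line.
    lines = [line.rsplit("\r", 1)[-1] for line in text.split("\n")]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_plain_text("\x1b[31mFAIL\x1b[0m test_x\r\n10%\r50%\r100%\n"))
```

A progress bar that redraws itself with carriage returns collapses to its final state, which is usually what a model needs to see.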

The next gaps for the narrowed persona are now product-shaping rather than raw capability gaps:

  • starting from the current repo still needs to feel native from the first chat
  • host setup and repair still lean on manual commands and config copying
  • reviewing what the agent actually did still requires digging through several surfaces
  • the five use cases exist as docs and smokes, not yet as explicit product modes
  • daily reset and retry loops can still feel heavier than they should

Locked Decisions

  • keep the workspace product identity central; do not drift toward CI, queue, or runner abstractions
  • keep disk tools secondary and do not make them the main chat-facing surface
  • prefer narrow tool profiles and structured outputs over more raw shell calls
  • optimize the MCP/chat-host path first and keep the CLI companion path good enough to validate and debug it
  • lower-level SDK and repo substrate work can continue, but it should not drive milestone scope or naming
  • CLI-only ergonomics are allowed when the SDK and MCP surfaces already have the structured behavior natively
  • prioritize repo-aware startup, trust, and daily-loop speed before adding more low-level workspace surface area
  • breaking changes are acceptable while there are still no users and the chat-host product is still being shaped
  • every milestone below must also update docs, help text, runnable examples, and at least one real smoke scenario

Milestones

  1. 3.2.0 Model-Native Workspace File Ops - Done
  2. 3.3.0 Workspace Naming And Discovery - Done
  3. 3.4.0 Tool Profiles And Canonical Chat Flows - Done
  4. 3.5.0 Chat-Friendly Shell Output - Done
  5. 3.6.0 Use-Case Recipes And Smoke Packs - Done
  6. 3.7.0 Handoff Shortcuts And File Input Sources - Done
  7. 3.8.0 Chat-Host Onramp And Recommended Defaults - Done
  8. 3.9.0 Content-Only Reads And Human Output Polish - Done
  9. 3.10.0 Use-Case Smoke Trust And Recipe Fidelity - Done
  10. 3.11.0 Host-Specific MCP Onramps - Done
  11. 4.0.0 Workspace-Core Default Profile - Done
  12. 4.1.0 Project-Aware Chat Startup - Done
  13. 4.2.0 Host Bootstrap And Repair - Planned
  14. 4.3.0 Reviewable Agent Output - Planned
  15. 4.4.0 Opinionated Use-Case Modes - Planned
  16. 4.5.0 Faster Daily Loops - Planned

Completed so far:

  • 3.2.0 added model-native workspace file * and workspace patch apply so chat-driven agents can inspect and edit /workspace without shell-escaped file mutation flows.
  • 3.3.0 added workspace names, key/value labels, workspace list, workspace update, and last_activity_at tracking so humans and chat-driven agents can rediscover and resume the right workspace without external notes.
  • 3.4.0 added stable MCP/server tool profiles with vm-run, workspace-core, and workspace-full, plus canonical profile-based OpenAI and MCP examples so chat hosts can start narrow and widen only when needed.
  • 3.5.0 added chat-friendly shell reads with plain-text rendering and idle batching so PTY sessions are readable enough to feed directly back into a chat model.
  • 3.6.0 added recipe docs and real guest-backed smoke packs for the five core workspace use cases so the stable product is now demonstrated as repeatable end-to-end stories instead of only isolated feature surfaces.
  • 3.7.0 removed the remaining shell glue from canonical CLI workspace flows with --id-only, --text-file, and --patch-file, so the shortest handoff path no longer depends on python -c extraction or $(cat ...) expansion.
  • 3.8.0 made workspace-core the obvious first MCP/chat-host profile from the first help and docs pass while keeping workspace-full as the 3.x compatibility default.
  • 3.9.0 added content-only workspace file and disk reads plus cleaner default human-mode transcript separation for files that do not end with a trailing newline.
  • 3.10.0 aligned the five guest-backed use-case smokes with their recipe docs and promoted make smoke-use-cases as the trustworthy verification path for the advertised workspace flows.
  • 3.11.0 added exact host-specific MCP onramps for Claude Code, Codex, and OpenCode so new chat-host users can copy one known-good setup example instead of translating the generic MCP config manually.
  • 4.0.0 flipped the default MCP/server profile to workspace-core, so the bare entrypoint now matches the recommended narrow chat-host profile across CLI, SDK, and package-level factories.
  • 4.1.0 made repo-root startup native for chat hosts, so bare pyro mcp serve can auto-detect the current Git checkout and let the first workspace_create omit seed_path, with explicit --project-path and --repo-url fallbacks when cwd is not the source of truth.
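The 4.1.0 startup behavior can be pictured as a small resolution step that runs when the server starts. The sketch below is a guess at that shape, not the actual pyro implementation: `ProjectSource`, `find_git_root`, `resolve_project_source`, the `origin_kind` values, and the precedence between the explicit project path and the repo URL are all assumptions made for illustration. Only the option names and the `origin_kind`/`origin_ref` field names come from the release notes.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ProjectSource:
    """Hypothetical record of where a default workspace seed comes from."""
    origin_kind: str  # assumed values: "project_path", "repo_url", "git_checkout"
    origin_ref: str


def find_git_root(start: Path) -> Path | None:
    """Walk upward from `start` until a directory containing .git is found."""
    for candidate in (start, *start.parents):
        if (candidate / ".git").exists():
            return candidate
    return None


def resolve_project_source(
    cwd: str | Path,
    project_path: str | None = None,
    repo_url: str | None = None,
    repo_ref: str | None = None,
) -> ProjectSource | None:
    """Pick a default source; explicit options win over auto-detection.

    The ordering (project path, then repo URL, then cwd) is an assumption.
    """
    if project_path is not None:
        return ProjectSource("project_path", str(Path(project_path).resolve()))
    if repo_url is not None:
        ref = f"{repo_url}@{repo_ref}" if repo_ref else repo_url
        return ProjectSource("repo_url", ref)
    root = find_git_root(Path(cwd).resolve())
    if root is not None:
        return ProjectSource("git_checkout", str(root))
    # No source found: the caller would still need an explicit seed_path.
    return None


if __name__ == "__main__":
    print(resolve_project_source(Path.cwd()))
```

Under this reading, a `workspace_create` call without `seed_path` would fall back to whatever source was resolved at startup, and fail clearly if none was found.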

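Planned next:

  • 4.2.0 Host Bootstrap And Repair
  • 4.3.0 Reviewable Agent Output
  • 4.4.0 Opinionated Use-Case Modes
  • 4.5.0 Faster Daily Loops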

Expected Outcome

After this roadmap, the product should still look like an agent workspace, not like a CI runner with more isolation.

The intended model-facing shape is:

  • one-shot work starts with vm_run
  • persistent work moves to a small workspace-first contract
  • file edits are structured and model-native
  • workspace discovery is human- and model-friendly
  • shells are readable in chat
  • CLI handoff paths do not depend on ad hoc shell parsing
  • the recommended chat-host profile is obvious from the first MCP example
  • the documented smoke pack is trustworthy enough to use as a release gate
  • major chat hosts have copy-pasteable MCP setup examples instead of only a generic config template
  • human-mode content reads are copy-paste safe
  • the default bare MCP server entrypoint matches the recommended narrow profile
  • the five core use cases are documented and smoke-tested end to end
  • starting from the current repo feels native from the first chat-host setup
  • supported hosts can be connected or repaired without manual config spelunking
  • users can review one concise summary of what the agent changed and ran
  • the main workflows feel like named modes instead of one giant reference
  • reset and retry loops are fast enough to encourage daily use