Make the local chat-host loop explicit and cheap so users can warm the machine once instead of rediscovering environment and guest setup on every session. Add cache-backed daily-loop manifests plus the new `pyro prepare` flow, extend `pyro doctor --environment` with warm/cold/stale readiness reporting, and add `make smoke-daily-loop` to prove the warmed repro-fix reset path end to end. Also fix `python -m pyro_mcp.cli` to invoke `main()` so the new smoke and `dist-check` actually exercise the CLI module, and update the docs/roadmap to present `doctor -> prepare -> connect host -> reset` as the recommended daily path. Validation: `uv lock`, `UV_OFFLINE=1 UV_CACHE_DIR=.uv-cache make check`, `UV_OFFLINE=1 UV_CACHE_DIR=.uv-cache make dist-check`, and `UV_OFFLINE=1 UV_CACHE_DIR=.uv-cache make smoke-daily-loop`.
# LLM Chat Ergonomics Roadmap
This roadmap picks up after the completed workspace GA plan and focuses on one
goal: make the core agent-workspace use cases feel trivial from a chat-driven
LLM interface.

Current baseline is `4.5.0`:

- `pyro mcp serve` is now the default product entrypoint
- `workspace-core` is now the default MCP profile
- one-shot `pyro run` still exists as the terminal companion path
- workspaces already support seeding, sync push, exec, export, diff, snapshots,
  reset, services, PTY shells, secrets, network policy, and published ports
- host-specific onramps exist for Claude Code, Codex, and OpenCode
- the five documented use cases are now recipe-backed and smoke-tested
- stopped-workspace disk tools now exist, but remain explicitly secondary
- `pyro prepare` warms the local machine once, and `pyro doctor --environment`
  reports whether the daily loop is warm, cold, or stale

## What "Trivial In Chat" Means

The roadmap is done only when a chat-driven LLM can cover the main use cases
without awkward shell choreography or hidden host-side glue:

- cold-start repo validation
- repro plus fix loops
- parallel isolated workspaces for multiple issues or PRs
- unsafe or untrusted code inspection
- review and evaluation workflows

More concretely, the model should not need to:

- patch files through shell-escaped `printf` or heredoc tricks
- rely on opaque workspace IDs without a discovery surface
- consume raw terminal control sequences as normal shell output
- choose from an unnecessarily large tool surface when a smaller profile would
  work

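The first bullet is easiest to see side by side. The Python sketch below contrasts a shell-escaped `printf` write with a structured tool call. The `workspace_file_write` tool name and its `path`/`text` arguments are hypothetical placeholders, not the real MCP tool schema; the point is only that a structured payload carries file content verbatim, while a shell command must first survive a quoting layer.

```python
import json
import shlex


def shell_write_command(path: str, text: str) -> str:
    """Build a `printf` command that writes `text` to `path` via the shell.

    Quotes, `%`, and backslashes in `text` must all be neutralized first,
    which is exactly the choreography a model should not have to get right.
    """
    # printf treats '%' as a directive and '\' as an escape, so the content is
    # passed as an argument to a fixed '%s' format, quoted for the shell.
    return f"printf '%s' {shlex.quote(text)} > {shlex.quote(path)}"


def structured_write_call(path: str, text: str) -> str:
    """Build a hypothetical structured tool call; content travels as plain data."""
    return json.dumps(
        {"tool": "workspace_file_write", "arguments": {"path": path, "text": text}}
    )


content = "print('100% done')\n"
print(shell_write_command("/workspace/app.py", content))
print(structured_write_call("/workspace/app.py", content))
```

Even this careful version depends on the model reproducing `'"'"'` splices correctly for every embedded quote, whereas the JSON payload needs no shell-level escaping at all.
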
The gaps that shaped the `4.1.0` through `4.5.0` milestones for the narrowed
persona were product-shaping rather than raw capability gaps:

- starting from the current repo did not yet feel native from the first chat
- host setup and repair leaned on manual commands and config copying
- reviewing what the agent actually did required digging through several
  surfaces
- the five use cases existed as docs and smokes, not yet as explicit product
  modes
- daily reset and retry loops felt heavier than they should

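For the first of these gaps, much of what makes a repo-root start feel native is locating the enclosing Git checkout from the current directory. The Python sketch below shows that general technique under stated assumptions: it is not pyro's actual detection logic, which may instead ask Git directly (for example via `git rev-parse --show-toplevel`), and the function name `find_repo_root` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path) -> Optional[Path]:
    """Return the nearest ancestor of `start` (inclusive) that contains `.git`.

    `.git` is checked with `exists()` rather than `is_dir()` because linked
    worktrees and submodules use a `.git` *file* that points elsewhere.
    Returns None when `start` is not inside a Git checkout.
    """
    current = start.resolve()
    for candidate in (current, *current.parents):
        if (candidate / ".git").exists():
            return candidate
    return None


if __name__ == "__main__":
    root = find_repo_root(Path.cwd())
    # Fall back to the explicit flags when cwd is not the source of truth.
    print(root if root else "not in a Git checkout; use --project-path or --repo-url")
```
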
## Locked Decisions

- keep the workspace product identity central; do not drift toward CI, queue,
  or runner abstractions
- keep disk tools secondary and do not make them the main chat-facing surface
- prefer narrow tool profiles and structured outputs over more raw shell calls
- optimize the MCP/chat-host path first and keep the CLI companion path good
  enough to validate and debug it
- lower-level SDK and repo substrate work can continue, but it should not
  drive milestone scope or naming
- CLI-only ergonomics are allowed when the SDK and MCP surfaces already have the
  structured behavior natively
- prioritize repo-aware startup, trust, and daily-loop speed before adding more
  low-level workspace surface area
- breaking changes are acceptable while there are still no users and the
  chat-host product is still being shaped
- every milestone below must also update docs, help text, runnable examples,
  and at least one real smoke scenario

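The "prefer narrow tool profiles" decision can be pictured as nested tool sets that a host widens only on demand. In the Python sketch below, the profile names `vm-run`, `workspace-core`, and `workspace-full` come from this roadmap, but the strict nesting and every tool name other than `vm_run` and `workspace_create` are illustrative assumptions, not the real MCP tool lists.

```python
from __future__ import annotations

# Hypothetical tool sets: only the profile names and the `vm_run` /
# `workspace_create` tools appear in the roadmap; the rest are placeholders.
VM_RUN = frozenset({"vm_run"})
WORKSPACE_CORE = VM_RUN | {"workspace_create", "workspace_exec", "workspace_file_write"}
WORKSPACE_FULL = WORKSPACE_CORE | {"workspace_disk_read", "workspace_service_start"}

PROFILES = {
    "vm-run": VM_RUN,
    "workspace-core": WORKSPACE_CORE,
    "workspace-full": WORKSPACE_FULL,
}
# Narrowest first, so the first match is the smallest sufficient profile.
ORDER = ["vm-run", "workspace-core", "workspace-full"]


def narrowest_profile(needed_tools: set[str]) -> str:
    """Return the smallest profile that exposes every tool the task needs."""
    for name in ORDER:
        if needed_tools <= PROFILES[name]:
            return name
    missing = sorted(needed_tools - WORKSPACE_FULL)
    raise ValueError(f"no profile exposes: {missing}")
```

Under this assumption, a host would start from the result of `narrowest_profile` and move to a wider profile only when a task demonstrably needs a tool the current one lacks.
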
## Milestones

1. [`3.2.0` Model-Native Workspace File Ops](llm-chat-ergonomics/3.2.0-model-native-workspace-file-ops.md) - Done
2. [`3.3.0` Workspace Naming And Discovery](llm-chat-ergonomics/3.3.0-workspace-naming-and-discovery.md) - Done
3. [`3.4.0` Tool Profiles And Canonical Chat Flows](llm-chat-ergonomics/3.4.0-tool-profiles-and-canonical-chat-flows.md) - Done
4. [`3.5.0` Chat-Friendly Shell Output](llm-chat-ergonomics/3.5.0-chat-friendly-shell-output.md) - Done
5. [`3.6.0` Use-Case Recipes And Smoke Packs](llm-chat-ergonomics/3.6.0-use-case-recipes-and-smoke-packs.md) - Done
6. [`3.7.0` Handoff Shortcuts And File Input Sources](llm-chat-ergonomics/3.7.0-handoff-shortcuts-and-file-input-sources.md) - Done
7. [`3.8.0` Chat-Host Onramp And Recommended Defaults](llm-chat-ergonomics/3.8.0-chat-host-onramp-and-recommended-defaults.md) - Done
8. [`3.9.0` Content-Only Reads And Human Output Polish](llm-chat-ergonomics/3.9.0-content-only-reads-and-human-output-polish.md) - Done
9. [`3.10.0` Use-Case Smoke Trust And Recipe Fidelity](llm-chat-ergonomics/3.10.0-use-case-smoke-trust-and-recipe-fidelity.md) - Done
10. [`3.11.0` Host-Specific MCP Onramps](llm-chat-ergonomics/3.11.0-host-specific-mcp-onramps.md) - Done
11. [`4.0.0` Workspace-Core Default Profile](llm-chat-ergonomics/4.0.0-workspace-core-default-profile.md) - Done
12. [`4.1.0` Project-Aware Chat Startup](llm-chat-ergonomics/4.1.0-project-aware-chat-startup.md) - Done
13. [`4.2.0` Host Bootstrap And Repair](llm-chat-ergonomics/4.2.0-host-bootstrap-and-repair.md) - Done
14. [`4.3.0` Reviewable Agent Output](llm-chat-ergonomics/4.3.0-reviewable-agent-output.md) - Done
15. [`4.4.0` Opinionated Use-Case Modes](llm-chat-ergonomics/4.4.0-opinionated-use-case-modes.md) - Done
16. [`4.5.0` Faster Daily Loops](llm-chat-ergonomics/4.5.0-faster-daily-loops.md) - Done

Completed so far:

- `3.2.0` added model-native `workspace file *` and `workspace patch apply` so
  chat-driven agents can inspect and edit `/workspace` without shell-escaped
  file mutation flows.
- `3.3.0` added workspace names, key/value labels, `workspace list`,
  `workspace update`, and `last_activity_at` tracking so humans and chat-driven
  agents can rediscover and resume the right workspace without external notes.
- `3.4.0` added stable MCP/server tool profiles with `vm-run`, `workspace-core`,
  and `workspace-full`, plus canonical profile-based OpenAI and MCP examples so
  chat hosts can start narrow and widen only when needed.
- `3.5.0` added chat-friendly shell reads with plain-text rendering and idle
  batching so PTY sessions are readable enough to feed directly back into a
  chat model.
- `3.6.0` added recipe docs and real guest-backed smoke packs for the five core
  workspace use cases, so the stable product is now demonstrated as repeatable
  end-to-end stories instead of only isolated feature surfaces.
- `3.7.0` removed the remaining shell glue from canonical CLI workspace flows
  with `--id-only`, `--text-file`, and `--patch-file`, so the shortest handoff
  path no longer depends on `python -c` extraction or `$(cat ...)` expansion.
- `3.8.0` made `workspace-core` the obvious first MCP/chat-host profile from the
  first help and docs pass while keeping `workspace-full` as the 3.x
  compatibility default.
- `3.9.0` added content-only workspace file and disk reads plus cleaner default
  human-mode transcript separation for files that lack a trailing newline.
- `3.10.0` aligned the five guest-backed use-case smokes with their recipe docs
  and promoted `make smoke-use-cases` as the trustworthy verification path for
  the advertised workspace flows.
- `3.11.0` added exact host-specific MCP onramps for Claude Code, Codex, and
  OpenCode so new chat-host users can copy one known-good setup example instead
  of translating the generic MCP config manually.
- `4.0.0` flipped the default MCP/server profile to `workspace-core`, so the
  bare entrypoint now matches the recommended narrow chat-host profile across
  CLI, SDK, and package-level factories.
- `4.1.0` made repo-root startup native for chat hosts: bare `pyro mcp serve`
  can auto-detect the current Git checkout and let the first `workspace_create`
  omit `seed_path`, with explicit `--project-path` and `--repo-url` fallbacks
  when the current working directory is not the source of truth.
- `4.2.0` added first-class host bootstrap and repair helpers so Claude Code,
  Codex, and OpenCode users can connect or repair the supported chat-host path
  without manually composing raw MCP commands or config edits.
- `4.3.0` added a concise workspace review surface so users can inspect what the
  agent changed and ran since the last reset without reconstructing the session
  by hand from several lower-level views.
- `4.4.0` added named use-case modes so chat hosts can start from `repro-fix`,
  `inspect`, `cold-start`, or `review-eval` instead of first choosing from the
  full generic workspace surface.
- `4.5.0` added `pyro prepare`, warm/cold/stale daily-loop readiness reporting
  in `pyro doctor --environment`, and a real `make smoke-daily-loop`
  verification path, so warming the local machine is an explicit step before
  the chat host connects and the recommended daily path is
  `doctor -> prepare -> connect host -> reset`.

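The 4.5.0 readiness states can be read as a small cache check. The Python sketch below assumes a hypothetical JSON manifest, written when the machine is prepared, that records a fingerprint of the environment it warmed; the manifest path, its `fingerprint` field, and the `readiness` function are illustrative, not pyro's actual on-disk format. A missing manifest is cold, a manifest that no longer matches is stale, and a matching one is warm.

```python
import json
from pathlib import Path


def readiness(manifest_path: Path, current_fingerprint: str) -> str:
    """Classify daily-loop readiness as "warm", "cold", or "stale".

    Assumes a hypothetical manifest of the form {"fingerprint": "..."}.
    """
    if not manifest_path.is_file():
        return "cold"  # nothing has been prepared yet
    try:
        manifest = json.loads(manifest_path.read_text())
    except json.JSONDecodeError:
        return "stale"  # an unreadable manifest cannot vouch for the cache
    if not isinstance(manifest, dict) or manifest.get("fingerprint") != current_fingerprint:
        return "stale"  # prepared once, but for a different environment
    return "warm"
```

In this framing, a doctor-style check only reports the state, and a prepare-style step is what turns cold or stale back into warm, which keeps the check cheap enough to run before every session.
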
## Expected Outcome

After this roadmap, the product should still look like an agent workspace, not
like a CI runner with more isolation.

The intended model-facing shape is:

- one-shot work starts with `vm_run`
- persistent work moves to a small workspace-first contract
- file edits are structured and model-native
- workspace discovery is human- and model-friendly
- shells are readable in chat
- CLI handoff paths do not depend on ad hoc shell parsing
- the recommended chat-host profile is obvious from the first MCP example
- the documented smoke pack is trustworthy enough to use as a release gate
- major chat hosts have copy-pasteable MCP setup examples instead of only a
  generic config template
- human-mode content reads are copy-paste safe
- the default bare MCP server entrypoint matches the recommended narrow profile
- the five core use cases are documented and smoke-tested end to end
- starting from the current repo feels native from the first chat-host setup
- supported hosts can be connected or repaired without manual config spelunking
- users can review one concise summary of what the agent changed and ran
- the main workflows feel like named modes instead of one giant reference
- reset and retry loops are fast enough to encourage daily use

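Readable shells in chat largely means not handing the model raw terminal control sequences. As a minimal Python sketch of the general technique (not pyro's actual 3.5.0 renderer, which also batches idle output), the hypothetical `plain_text` function below strips common ANSI CSI and OSC escape sequences and collapses carriage-return redraws before text goes back to the model.

```python
import re

ANSI_ESCAPE = re.compile(
    r"\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"  # OSC, e.g. window-title updates
    r"|\x1b\[[0-?]*[ -/]*[@-~]"  # CSI, e.g. colors, cursor moves, line erases
    r"|\x1b[@-Z\\-_]"  # remaining two-byte escapes
)


def plain_text(raw: str) -> str:
    """Convert raw PTY output into plain text suitable for a chat model."""
    text = ANSI_ESCAPE.sub("", raw).replace("\r\n", "\n")
    # A lone carriage return redraws the current line (progress bars, spinners).
    # Keep only the last redraw; a real terminal would also show leftover
    # characters when a redraw is shorter, which this sketch ignores.
    return "\n".join(line.rsplit("\r", 1)[-1] for line in text.split("\n"))


print(plain_text("\x1b[32mok\x1b[0m\r\n10%\r100%\n"))
```
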