Thales Maciel 79a7d71d3b Align use-case smokes with canonical workspace recipes

The 3.10.0 milestone was about making the advertised smoke pack trustworthy enough to act like a real release gate. The main drift was in the repro-plus-fix scenario: the recipe docs were SDK-first, but the smoke still shelled out to CLI patch apply and asserted a human summary string.

Switch the smoke runner to use the structured SDK patch flow directly, remove the harness-only CLI dependency, and tighten the fake smoke tests so they prove the same structured path the docs recommend. This keeps smoke failures tied to real user-facing regressions instead of human-output formatting drift.

Promote make smoke-use-cases as the trustworthy guest-backed verification path in the top-level docs, bump the release surface to 3.10.0, and mark the roadmap milestone done.

Validation:
- uv lock
- UV_CACHE_DIR=.uv-cache uv run pytest --no-cov tests/test_workspace_use_case_smokes.py
- UV_CACHE_DIR=.uv-cache make check
- UV_CACHE_DIR=.uv-cache make dist-check
- USE_CASE_ENVIRONMENT=debian:12 UV_CACHE_DIR=.uv-cache make smoke-use-cases

2026-03-13 13:30:52 -03:00

6.5 KiB


LLM Chat Ergonomics Roadmap

This roadmap picks up after the completed workspace GA plan and focuses on one goal:

make the core agent-workspace use cases feel trivial from a chat-driven LLM interface.

The current baseline is 3.10.0:

the stable workspace contract exists across CLI, SDK, and MCP
one-shot pyro run still exists as the narrow entrypoint
workspaces already support seeding, sync push, exec, export, diff, snapshots, reset, services, PTY shells, secrets, network policy, and published ports
stopped-workspace disk tools now exist, but remain explicitly secondary

What "Trivial In Chat" Means

The roadmap is done only when a chat-driven LLM can cover the main use cases without awkward shell choreography or hidden host-side glue:

cold-start repo validation
repro plus fix loops
parallel isolated workspaces for multiple issues or PRs
unsafe or untrusted code inspection
review and evaluation workflows

More concretely, the model should not need to:

patch files through shell-escaped printf or heredoc tricks
rely on opaque workspace IDs without a discovery surface
consume raw terminal control sequences as normal shell output
choose from an unnecessarily large tool surface when a smaller profile would work
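
To make the terminal-output point concrete: raw PTY reads carry escape sequences that a chat model would otherwise see as noise. The sketch below is not the project's renderer. The function name `render_plain` and the regular expression are assumptions made for this example, and the pattern covers only CSI sequences (colors, cursor movement) and OSC sequences (window titles).

```python
import re

# Illustrative pattern only (an assumption for this example):
#   - CSI: ESC [ parameters intermediates final-byte
#   - OSC: ESC ] payload terminated by BEL or by ESC \
_ANSI_RE = re.compile(
    r"\x1b\[[0-?]*[ -/]*[@-~]"
    r"|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"
)


def render_plain(raw: str) -> str:
    """Strip CSI/OSC escape sequences and convert CRLF line endings to LF."""
    text = _ANSI_RE.sub("", raw)
    return text.replace("\r\n", "\n")


if __name__ == "__main__":
    # A colored "ok" followed by a CRLF, as a PTY might emit it.
    print(repr(render_plain("\x1b[32mok\x1b[0m\r\n")))
```

A real renderer would also need to decide what to do with lone carriage returns, backspaces, and cursor repositioning, which this sketch leaves untouched.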

The remaining UX friction for a technically strong new user is now narrower:

the generic MCP guidance is strong, but Codex and OpenCode still ask the user to translate the generic config into host-specific setup steps
workspace-core is clearly the recommended profile, but pyro mcp serve and create_server() still default to workspace-full for 3.x compatibility

Locked Decisions

keep the workspace product identity central; do not drift toward CI, queue, or runner abstractions
keep disk tools secondary and do not make them the main chat-facing surface
prefer narrow tool profiles and structured outputs over more raw shell calls
capability milestones should update CLI, SDK, and MCP together
CLI-only ergonomics are allowed when the SDK and MCP surfaces already have the structured behavior natively
every milestone below must also update docs, help text, runnable examples, and at least one real smoke scenario

Milestones

3.2.0 Model-Native Workspace File Ops - Done
3.3.0 Workspace Naming And Discovery - Done
3.4.0 Tool Profiles And Canonical Chat Flows - Done
3.5.0 Chat-Friendly Shell Output - Done
3.6.0 Use-Case Recipes And Smoke Packs - Done
3.7.0 Handoff Shortcuts And File Input Sources - Done
3.8.0 Chat-Host Onramp And Recommended Defaults - Done
3.9.0 Content-Only Reads And Human Output Polish - Done
3.10.0 Use-Case Smoke Trust And Recipe Fidelity - Done
3.11.0 Host-Specific MCP Onramps
4.0.0 Workspace-Core Default Profile

Completed so far:

3.2.0 added model-native workspace file * and workspace patch apply so chat-driven agents can inspect and edit /workspace without shell-escaped file mutation flows.
3.3.0 added workspace names, key/value labels, workspace list, workspace update, and last_activity_at tracking so humans and chat-driven agents can rediscover and resume the right workspace without external notes.
3.4.0 added stable MCP/server tool profiles with vm-run, workspace-core, and workspace-full, plus canonical profile-based OpenAI and MCP examples so chat hosts can start narrow and widen only when needed.
3.5.0 added chat-friendly shell reads with plain-text rendering and idle batching so PTY sessions are readable enough to feed directly back into a chat model.
3.6.0 added recipe docs and real guest-backed smoke packs for the five core workspace use cases so the stable product is now demonstrated as repeatable end-to-end stories instead of only isolated feature surfaces.
3.7.0 removed the remaining shell glue from canonical CLI workspace flows with --id-only, --text-file, and --patch-file, so the shortest handoff path no longer depends on python -c extraction or $(cat ...) expansion.
3.8.0 made workspace-core the obvious first MCP/chat-host profile from the first help and docs pass while keeping workspace-full as the 3.x compatibility default.
3.9.0 added content-only workspace file and disk reads plus cleaner default human-mode transcript separation for files that do not end with a trailing newline.
3.10.0 aligned the five guest-backed use-case smokes with their recipe docs and promoted make smoke-use-cases as the trustworthy verification path for the advertised workspace flows.
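
The reasoning behind the 3.10.0 change can be shown with a small sketch: a check against structured fields survives a rewording of the human summary, while a check against the summary string does not. Everything named here (`PatchResult`, its fields, and both summary formats) is hypothetical and invented for illustration; it does not reflect the actual SDK return shape or CLI output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchResult:
    # Hypothetical structured result; field names are assumptions.
    changed: bool
    files_modified: int


def summary_v1(result: PatchResult) -> str:
    # Hypothetical human-mode summary line.
    return f"patch applied: {result.files_modified} file(s) modified"


def summary_v2(result: PatchResult) -> str:
    # The same information after a purely cosmetic rewording.
    return f"applied patch ({result.files_modified} modified)"


def check_structured(result: PatchResult) -> bool:
    # Asserts on fields, so output formatting cannot break it.
    return result.changed and result.files_modified == 1


def check_summary(summary: str) -> bool:
    # Asserts on wording, so any formatting change breaks it.
    return "1 file(s) modified" in summary


if __name__ == "__main__":
    result = PatchResult(changed=True, files_modified=1)
    print(check_structured(result))            # unaffected by wording
    print(check_summary(summary_v1(result)))   # passes with the old wording
    print(check_summary(summary_v2(result)))   # fails after the rewording
```

The string-based check fails here even though the patch behaved identically, which is the kind of formatting drift a release gate should not report as a regression.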

Planned next:

3.11.0 adds exact host-specific onramps for Claude, Codex, and OpenCode so a new chat-host user can copy one known-good config or command instead of translating the generic MCP example by hand.
4.0.0 flips the default MCP profile from workspace-full to workspace-core so the no-flag server entrypoint finally matches the recommended docs path, while keeping explicit opt-in access to the full advanced surface.
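
The planned default flip can be sketched as a small resolution rule. The profile names come from this roadmap; the function `resolve_profile`, its parameters, and the idea of keying the default on the major version are assumptions made for this example, not the project's implementation.

```python
from typing import Optional

# Profile names as listed in the 3.4.0 milestone.
KNOWN_PROFILES = ("vm-run", "workspace-core", "workspace-full")


def resolve_profile(requested: Optional[str], major_version: int) -> str:
    """Return the profile to serve (hypothetical helper).

    An explicit request always wins. With no request, 3.x keeps the
    compatibility default and 4.x and later use the recommended narrow profile.
    """
    if requested is not None:
        if requested not in KNOWN_PROFILES:
            raise ValueError(f"unknown profile: {requested!r}")
        return requested
    return "workspace-full" if major_version < 4 else "workspace-core"


if __name__ == "__main__":
    print(resolve_profile(None, 3))              # compatibility default
    print(resolve_profile(None, 4))              # recommended default
    print(resolve_profile("workspace-full", 4))  # explicit opt-in still works
```

Under this rule, a user who already passes an explicit profile sees no change at 4.0.0; only the no-flag entrypoint moves.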

Expected Outcome

After this roadmap, the product should still look like an agent workspace, not like a CI runner with more isolation.

The intended model-facing shape is:

one-shot work starts with vm_run
persistent work moves to a small workspace-first contract
file edits are structured and model-native
workspace discovery is human- and model-friendly
shells are readable in chat
CLI handoff paths do not depend on ad hoc shell parsing
the recommended chat-host profile is obvious from the first MCP example
the documented smoke pack is trustworthy enough to use as a release gate
major chat hosts have copy-pasteable MCP setup examples instead of only a generic config template
human-mode content reads are copy-paste safe
the default bare MCP server entrypoint matches the recommended narrow profile
the five core use cases are documented and smoke-tested end to end
