pyro-mcp/docs/vision.md
Thales Maciel dbb71a3174
Add chat-first workspace roadmap
Document the post-3.1 milestones needed to make the stable workspace product feel natural in chat-driven LLM interfaces.

Add a follow-on roadmap for model-native file ops, workspace naming and discovery, tool profiles, shell output cleanup, and use-case recipes with smoke coverage. Link it from the README, vision doc, and completed workspace GA roadmap so the next phase is explicit.

Keep the sequence anchored to the workspace-first vision and continue to treat disk tools as secondary rather than the main chat-facing surface.
2026-03-12 21:06:14 -03:00

# Vision

`pyro-mcp` should become the disposable sandbox where an agent can do real
development work safely, repeatedly, and reproducibly.

That is a different product from a generic VM wrapper, a secure CI runner, or a
task queue with better isolation.

## Core Thesis
The goal is not just to run one command in a microVM.
The goal is to give an LLM or coding agent a bounded workspace where it can:
- inspect a repo
- install dependencies
- edit files
- run tests
- start and inspect services
- reset and retry
- export patches and artifacts
- destroy the sandbox when the task is done
The sandbox is the execution boundary for agentic software work.
## What This Is Not
`pyro-mcp` should not drift into:
- a YAML pipeline system
- a build farm
- a generic CI job runner
- a scheduler or queueing platform
- a broad VM orchestration product
Those products optimize for queued work, throughput, retries, matrix builds, and
shared infrastructure.
`pyro-mcp` should optimize for agent loops:
- explore
- edit
- test
- observe
- reset
- export
## Why This Can Look Like CI
Any sandbox product starts to look like CI if the main abstraction is:
- submit a command
- wait
- collect logs
- fetch artifacts
That shape is useful, but it is not the center of the vision.
To stay aligned, the primary abstraction should be a workspace the agent
inhabits, not a job the agent submits.
## Product Principles
### Workspace-First
The default mental model should be "open a disposable workspace" rather than
"enqueue a task".
### Stateful Interaction
The product should support repeated interaction in one sandbox. One-shot command
execution matters, but it is the entry point, not the destination.
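
To make the difference between one-shot execution and repeated interaction concrete, here is a minimal sketch of a session that carries state between writes. `ToyShell` and its methods are hypothetical and purely in-memory; they are not `pyro-mcp`'s API, and the toy only understands `cd` and `pwd`.

```python
import posixpath
from dataclasses import dataclass, field


@dataclass
class ToyShell:
    """Hypothetical in-memory shell session; illustrative only."""

    cwd: str = "/workspace"
    closed: bool = False
    _output: list = field(default_factory=list)

    def write(self, line: str) -> None:
        # State (here: the working directory) persists between calls.
        if self.closed:
            raise RuntimeError("shell is closed")
        parts = line.split()
        if len(parts) == 2 and parts[0] == "cd":
            self.cwd = posixpath.normpath(posixpath.join(self.cwd, parts[1]))
        elif parts == ["pwd"]:
            self._output.append(self.cwd)
        else:
            self._output.append(f"unsupported: {line}")

    def read(self) -> list:
        # Drain whatever output has accumulated since the last read.
        out, self._output = self._output, []
        return out

    def close(self) -> None:
        self.closed = True


shell = ToyShell()
shell.write("cd src")
shell.write("pwd")
first = shell.read()

# A later command sees the effect of the earlier one.
shell.write("cd ../tests")
shell.write("pwd")
second = shell.read()
shell.close()
```

With one-shot execution, each `pwd` would start from the default directory again; in the session model the second `pwd` reflects both earlier `cd` calls.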
### Explicit Host Crossing
Anything that crosses the host boundary should be intentional and visible:
- seeding a workspace
- syncing changes in
- exporting artifacts out
- granting secrets or network access
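
One way to picture "intentional and visible" is a boundary object that denies by default and records every attempt. The sketch below is an assumption about how such a policy could look; `Crossing`, `HostBoundary`, and the example details are hypothetical names, not part of `pyro-mcp`.

```python
from dataclasses import dataclass, field
from enum import Enum


class Crossing(Enum):
    """Hypothetical categories mirroring the list above."""

    SEED = "seed"
    SYNC_IN = "sync_in"
    EXPORT = "export"
    SECRET = "secret"
    NETWORK = "network"


@dataclass
class HostBoundary:
    """Deny-by-default policy with an audit trail (illustrative only)."""

    allowed: set = field(default_factory=set)
    audit_log: list = field(default_factory=list)

    def cross(self, kind: Crossing, detail: str) -> bool:
        # Every attempt is logged, whether or not it is permitted.
        permitted = kind in self.allowed
        self.audit_log.append((kind, detail, permitted))
        return permitted


boundary = HostBoundary(allowed={Crossing.SEED, Crossing.EXPORT})
seeded = boundary.cross(Crossing.SEED, "./repo -> /workspace")
network = boundary.cross(Crossing.NETWORK, "example.invalid:443")

for kind, detail, permitted in boundary.audit_log:
    print(f"{kind.value:8} {'allow' if permitted else 'deny':5} {detail}")
```

The design choice worth noting is that denial is the default and the log captures denied attempts too, so the operator can see what the agent tried to do as well as what it did.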
### Reset Over Repair
Agents should be able to checkpoint, reset, and retry cheaply. Disposable state
is a feature, not a limitation.
### Same Contract Across Surfaces
CLI, Python, and MCP should expose the same underlying workspace model so the
product feels coherent no matter how it is consumed.
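
As a rough illustration of this principle, the sketch below routes three entry points through one core function so they cannot drift apart. Everything here is hypothetical: `CreateRequest`, `create_workspace`, the `toy` CLI, and the dict-shaped tool call are stand-ins, not the real `pyro-mcp` CLI, Python API, or MCP tool schema.

```python
import argparse
from dataclasses import dataclass


@dataclass(frozen=True)
class CreateRequest:
    """Hypothetical request shape shared by every surface."""

    name: str
    network: bool = False


def create_workspace(req: CreateRequest) -> dict:
    # Single source of truth: all surfaces end up here.
    return {"workspace": req.name, "network": req.network, "state": "ready"}


def from_cli(argv: list) -> dict:
    # Thin CLI adapter: parse arguments, then delegate.
    parser = argparse.ArgumentParser(prog="toy")
    parser.add_argument("name")
    parser.add_argument("--network", action="store_true")
    args = parser.parse_args(argv)
    return create_workspace(CreateRequest(args.name, args.network))


def from_tool_call(payload: dict) -> dict:
    # Thin adapter for a dict-shaped tool call (assumed shape).
    return create_workspace(
        CreateRequest(payload["name"], payload.get("network", False))
    )


via_python = create_workspace(CreateRequest("demo", network=True))
via_cli = from_cli(["demo", "--network"])
via_tool = from_tool_call({"name": "demo", "network": True})
```

Because the adapters only translate input and never add behavior, the three results are identical by construction.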
### Agent-Native Observability
The sandbox should expose the things an agent actually needs to reason about:
- command output
- file diffs
- service status
- logs
- readiness
- exported results
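
A structured observation is easier for an agent to reason about than a raw log stream. The sketch below shows one possible shape, using the standard library's `difflib` to produce a file diff. The `Observation` dataclass, its field names, and the sample values are assumptions for illustration, not a committed result format.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class Observation:
    """Hypothetical structured result returned after a step."""

    exit_code: int
    stdout: str
    diffs: dict = field(default_factory=dict)
    services_ready: dict = field(default_factory=dict)


def file_diff(before: str, after: str, path: str) -> str:
    # Produce a unified diff between two versions of one file.
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(lines)


obs = Observation(
    exit_code=1,
    stdout="1 failed, 3 passed",
    diffs={"app.py": file_diff("x = 1\n", "x = 2\n", "app.py")},
    services_ready={"web": False},
)

# An agent can branch on fields instead of parsing free-form text.
not_ready = [name for name, ready in obs.services_ready.items() if not ready]
print(obs.diffs["app.py"])
```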
## The Shape Of An LLM-First Sandbox
The strongest future direction is a small, agent-native contract built around
workspaces, shells, files, services, and reset.
Representative primitives:
- `workspace.create`
- `workspace.status`
- `workspace.delete`
- `workspace.sync_push`
- `workspace.export`
- `workspace.diff`
- `workspace.snapshot`
- `workspace.reset`
- `shell.open`
- `shell.read`
- `shell.write`
- `shell.signal`
- `shell.close`
- `workspace.exec`
- `service.start`
- `service.status`
- `service.logs`
- `service.stop`
These names are illustrative, not a committed public API.
The important point is the interaction model:
- a shell session is interactive state inside the sandbox
- a workspace is durable for the life of the task
- services are first-class, not accidental background jobs
- reset is a core workflow primitive
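
The interaction model can be sketched as a small contract plus a loop that depends only on that contract. The method names below loosely echo the illustrative primitives above, but the `Workspace` protocol, `FakeWorkspace`, `write_file`, and the toy "test" rule are all hypothetical and exist only to make the example runnable.

```python
from typing import Callable, Optional, Protocol


class Workspace(Protocol):
    """Hypothetical subset of a workspace contract."""

    def write_file(self, path: str, content: str) -> None: ...
    def exec(self, command: str) -> int: ...
    def snapshot(self, name: str) -> None: ...
    def reset(self, name: str) -> None: ...
    def export(self) -> dict: ...


class FakeWorkspace:
    """In-memory stand-in; a real sandbox would run inside a microVM."""

    def __init__(self, files: dict) -> None:
        self.files = dict(files)
        self._snapshots: dict = {}

    def write_file(self, path: str, content: str) -> None:
        self.files[path] = content

    def exec(self, command: str) -> int:
        # Toy rule: "test" passes only once app.py contains "fixed".
        if command == "test":
            return 0 if "fixed" in self.files.get("app.py", "") else 1
        return 0

    def snapshot(self, name: str) -> None:
        self._snapshots[name] = dict(self.files)

    def reset(self, name: str) -> None:
        self.files = dict(self._snapshots[name])

    def export(self) -> dict:
        return dict(self.files)


def agent_loop(
    ws: Workspace, attempts: list
) -> Optional[dict]:
    ws.snapshot("start")
    for attempt in attempts:
        attempt(ws)                    # edit
        if ws.exec("test") == 0:       # test and observe
            return ws.export()         # export
        ws.reset("start")              # reset instead of repairing
    return None


def bad_attempt(ws: Workspace) -> None:
    ws.write_file("app.py", "still broken\n")
    ws.write_file("debug.log", "noise\n")


def good_attempt(ws: Workspace) -> None:
    ws.write_file("app.py", "fixed\n")


ws = FakeWorkspace({"app.py": "bug\n"})
result = agent_loop(ws, [bad_attempt, good_attempt])
print(result)
```

Because the failed attempt is rolled back before the retry, the stray `debug.log` never reaches the exported result.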
## Interactive Shells And Disk Operations
Interactive shells are aligned with the vision because they make the agent feel
present inside the sandbox rather than reduced to one-shot job submission.
That does not mean `pyro-mcp` should become a raw SSH replacement. The shell
should sit inside a higher-level workspace model with structured file, service,
diff, and reset operations around it.
Disk-level operations are also useful, but they should remain supporting tools.
They are good for:
- fast workspace seeding
- snapshotting
- offline inspection
- diffing
- export/import without a full boot
They should not become the primary product identity. If the center of the
product becomes "operate on VM disks", it will read as image tooling rather
than an agent workspace.
## What To Build Next
Features should be prioritized in this order:
1. Repeated commands in one persistent workspace
2. Interactive shell sessions with PTY semantics
3. Structured workspace sync and export
4. Service lifecycle and readiness checks
5. Snapshot and reset workflows
6. Explicit secrets and network policy
7. Secondary disk-level import/export and inspection tools
The completed workspace GA roadmap lives in
[roadmap/task-workspace-ga.md](roadmap/task-workspace-ga.md).
The next implementation milestones that make those workflows feel natural from
chat-driven LLM interfaces live in
[roadmap/llm-chat-ergonomics.md](roadmap/llm-chat-ergonomics.md).
## Naming Guidance
Prefer language that reinforces the workspace model:
- `workspace`
- `sandbox`
- `shell`
- `service`
- `snapshot`
- `reset`
Avoid centering language that makes the product feel like CI infrastructure:
- `job`
- `runner`
- `pipeline`
- `worker`
- `queue`
- `build matrix`
## Litmus Test

When evaluating a new feature, ask:

"Does this help an agent inhabit a safe disposable workspace and do real
software work inside it?"

If the better description is "it helps submit, schedule, and report jobs", the
feature is probably pushing the product in the wrong direction.