# Vision

`pyro-mcp` should become the disposable sandbox where an agent can do real
development work safely, repeatedly, and reproducibly.

That is a different product from a generic VM wrapper, a secure CI runner, or a
task queue with better isolation.

## Core Thesis

The goal is not just to run one command in a microVM.

The goal is to give an LLM or coding agent a bounded workspace where it can:

- inspect a repo
- install dependencies
- edit files
- run tests
- start and inspect services
- reset and retry
- export patches and artifacts
- destroy the sandbox when the task is done

The sandbox is the execution boundary for agentic software work.

## What This Is Not

`pyro-mcp` should not drift into:

- a YAML pipeline system
- a build farm
- a generic CI job runner
- a scheduler or queueing platform
- a broad VM orchestration product

Those products optimize for queued work, throughput, retries, matrix builds, and
shared infrastructure.

`pyro-mcp` should optimize for agent loops:

- explore
- edit
- test
- observe
- reset
- export

## Why This Can Look Like CI

Any sandbox product starts to look like CI if the main abstraction is:

- submit a command
- wait
- collect logs
- fetch artifacts

That shape is useful, but it is not the center of the vision.

To stay aligned, the primary abstraction should be a workspace the agent
inhabits, not a job the agent submits.

## Product Principles

### Workspace-First

The default mental model should be "open a disposable workspace" rather than
"enqueue a task".

### Stateful Interaction

The product should support repeated interaction in one sandbox. One-shot command
execution matters, but it is the entry point, not the destination.

### Explicit Host Crossing

Anything that crosses the host boundary should be intentional and visible:

- seeding a workspace
- syncing changes in
- exporting artifacts out
- granting secrets or network access

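
One way to picture "intentional and visible" is a per-workspace record of every host crossing, checked against what was granted up front. The sketch below is purely illustrative: `Crossing`, `CrossingLog`, and the paths in the example are hypothetical names invented here, not part of `pyro-mcp`.

```python
from dataclasses import dataclass, field
from enum import Enum


class Crossing(Enum):
    """Hypothetical categories of host-boundary crossings."""

    SEED = "seed"
    SYNC_IN = "sync_in"
    EXPORT = "export"
    GRANT_SECRET = "grant_secret"
    GRANT_NETWORK = "grant_network"


@dataclass
class CrossingLog:
    """Hypothetical record: crossings must be granted, and each one is logged."""

    allowed: set[Crossing]
    events: list[tuple[Crossing, str]] = field(default_factory=list)

    def record(self, kind: Crossing, detail: str) -> None:
        # Refuse anything that was not explicitly granted for this workspace.
        if kind not in self.allowed:
            raise PermissionError(f"{kind.value} was not granted for this workspace")
        # Keep a visible trail of what crossed the boundary and why.
        self.events.append((kind, detail))


log = CrossingLog(allowed={Crossing.SEED, Crossing.EXPORT})
log.record(Crossing.SEED, "./repo -> /workspace")
log.record(Crossing.EXPORT, "/workspace/out.patch -> ./out.patch")

try:
    log.record(Crossing.GRANT_NETWORK, "egress to package index")
    denied = False
except PermissionError:
    denied = True
```

In this toy model nothing crosses the boundary implicitly: an ungranted crossing fails loudly, and every granted one leaves an entry that an agent or operator can inspect later.
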
### Reset Over Repair

Agents should be able to checkpoint, reset, and retry cheaply. Disposable state
is a feature, not a limitation.

### Same Contract Across Surfaces

CLI, Python, and MCP should expose the same underlying workspace model so the
product feels coherent no matter how it is consumed.

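
A minimal sketch of what "same contract" could mean in practice: one core operation, with each surface acting as a thin adapter that returns the same payload. Everything here is an assumption made for illustration, including the `workspace_status` function, the `state` field, and the request shape passed to `tool_call`; none of it is taken from the real `pyro-mcp` interfaces.

```python
import argparse
import json


def workspace_status(workspace_id: str) -> dict:
    """Hypothetical core operation; every surface below returns this payload."""
    return {"workspace_id": workspace_id, "state": "started"}


def cli(argv: list[str]) -> str:
    # CLI surface: parse arguments, call the core, print-ready JSON out.
    parser = argparse.ArgumentParser(prog="toy")
    parser.add_argument("workspace_id")
    args = parser.parse_args(argv)
    return json.dumps(workspace_status(args.workspace_id), sort_keys=True)


def tool_call(request: dict) -> dict:
    # Tool-style surface: a name plus arguments, loosely modelled on a tool call.
    if request["name"] != "workspace_status":
        raise ValueError(f"unknown tool: {request['name']}")
    return workspace_status(**request["arguments"])


direct = workspace_status("ws-1")
via_cli = json.loads(cli(["ws-1"]))
via_tool = tool_call({"name": "workspace_status", "arguments": {"workspace_id": "ws-1"}})
```

Because the adapters hold no logic of their own, `direct`, `via_cli`, and `via_tool` carry identical data, which is the property this principle asks for.
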
### Agent-Native Observability

The sandbox should expose the things an agent actually needs to reason about:

- command output
- file diffs
- service status
- logs
- readiness
- exported results

## The Shape Of An LLM-First Sandbox

The strongest future direction is a small, agent-native contract built around
workspaces, shells, files, services, and reset.

Representative primitives:

- `workspace.create`
- `workspace.status`
- `workspace.delete`
- `workspace.sync_push`
- `workspace.export`
- `workspace.diff`
- `workspace.snapshot`
- `workspace.reset`
- `shell.open`
- `shell.read`
- `shell.write`
- `shell.signal`
- `shell.close`
- `workspace.exec`
- `service.start`
- `service.status`
- `service.logs`
- `service.stop`

These names are illustrative, not a committed public API.

The important point is the interaction model:

- a shell session is interactive state inside the sandbox
- a workspace is durable for the life of the task
- services are first-class, not accidental background jobs
- reset is a core workflow primitive

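
To make the snapshot, diff, and reset part of that interaction model concrete, here is a toy in-memory model. It is a sketch under heavy assumptions: `ToyWorkspace` and its methods are hypothetical, files are plain strings in a dict, and nothing here reflects how `pyro-mcp` actually stores or restores workspace state.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWorkspace:
    """Hypothetical in-memory stand-in for a sandbox workspace."""

    files: dict[str, str] = field(default_factory=dict)
    _snapshot: dict[str, str] = field(default_factory=dict)

    def write(self, path: str, text: str) -> None:
        self.files[path] = text

    def snapshot(self) -> None:
        # Record the current file state as the point reset() returns to.
        self._snapshot = dict(self.files)

    def diff(self) -> dict[str, str]:
        # Classify every path that differs from the snapshot.
        changes: dict[str, str] = {}
        for path in sorted(self.files.keys() | self._snapshot.keys()):
            if path not in self._snapshot:
                changes[path] = "added"
            elif path not in self.files:
                changes[path] = "deleted"
            elif self.files[path] != self._snapshot[path]:
                changes[path] = "modified"
        return changes

    def reset(self) -> None:
        # Discard everything since the snapshot instead of repairing it.
        self.files = dict(self._snapshot)


ws = ToyWorkspace()
ws.write("app.py", "print('v1')\n")
ws.snapshot()
ws.write("app.py", "print('v2')\n")
ws.write("notes.txt", "scratch\n")
changes = ws.diff()
ws.reset()
```

The loop an agent runs is visible in the last lines: take a snapshot, experiment, inspect the diff, and reset cheaply if the attempt went wrong.
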
## Interactive Shells And Disk Operations

Interactive shells are aligned with the vision because they make the agent feel
present inside the sandbox rather than reduced to one-shot job submission.

That does not mean `pyro-mcp` should become a raw SSH replacement. The shell
should sit inside a higher-level workspace model with structured file, service,
diff, and reset operations around it.

Disk-level operations are also useful, but they should remain supporting tools.
They are good for:

- fast workspace seeding
- snapshotting
- offline inspection
- diffing
- export/import without a full boot

They should not become the primary product identity. If the center of the
product becomes "operate on VM disks", it will read as image tooling rather
than an agent workspace.

## What To Build Next

Features should be prioritized in this order:

1. Repeated commands in one persistent workspace
2. Interactive shell sessions with PTY semantics
3. Structured workspace sync and export
4. Service lifecycle and readiness checks
5. Snapshot and reset workflows
6. Explicit secrets and network policy
7. Secondary disk-level import/export and inspection tools

The completed workspace GA roadmap lives in
[roadmap/task-workspace-ga.md](roadmap/task-workspace-ga.md).

The next implementation milestones that make those workflows feel natural from
chat-driven LLM interfaces live in
[roadmap/llm-chat-ergonomics.md](roadmap/llm-chat-ergonomics.md).

## Naming Guidance

Prefer language that reinforces the workspace model:

- `workspace`
- `sandbox`
- `shell`
- `service`
- `snapshot`
- `reset`

Avoid centering language that makes the product feel like CI infrastructure:

- `job`
- `runner`
- `pipeline`
- `worker`
- `queue`
- `build matrix`

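
The two vocabularies above could be applied mechanically when reviewing docs or help text. The following is only a sketch of that idea: the `ci_flavored_terms` helper is hypothetical, and its simple plural-tolerant word matching is an assumption, not an existing check in this project.

```python
import re

# Terms copied from the "avoid" list above.
AVOID = {"job", "runner", "pipeline", "worker", "queue", "build matrix"}


def ci_flavored_terms(text: str) -> list[str]:
    """Return the avoided terms (singular or simple plural) found in text."""
    lowered = text.lower()
    return sorted(
        term
        for term in AVOID
        if re.search(rf"\b{re.escape(term)}s?\b", lowered)
    )


found = ci_flavored_terms("Submit a job to the runner queue")
clean = ci_flavored_terms("Open a workspace and reset it from a snapshot")
```

A hit is a prompt to reconsider the wording rather than a hard failure, since terms such as `job` can still appear legitimately when describing what the product is not.
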
## Litmus Test

When evaluating a new feature, ask:

"Does this help an agent inhabit a safe disposable workspace and do real
software work inside it?"

If the better description is "it helps submit, schedule, and report jobs", the
feature is probably pushing the product in the wrong direction.