diff --git a/README.md b/README.md
index 2057f42..44c02a4 100644
--- a/README.md
+++ b/README.md
@@ -15,6 +15,7 @@ It exposes the same runtime in three public forms:
 ## Start Here
 
 - Install: [docs/install.md](docs/install.md)
+- Vision: [docs/vision.md](docs/vision.md)
 - First run transcript: [docs/first-run.md](docs/first-run.md)
 - Terminal walkthrough GIF: [docs/assets/first-run.gif](docs/assets/first-run.gif)
 - PyPI package: [pypi.org/project/pyro-mcp](https://pypi.org/project/pyro-mcp/)
@@ -198,6 +199,11 @@ The walkthrough GIF above was rendered from [docs/assets/first-run.tape](docs/as
 Use `pyro run` for one-shot commands. Use `pyro task ...` when you need repeated
 commands in one workspace without recreating the sandbox every time.
 
+The project direction is an agent workspace, not a CI job runner. Persistent
+tasks are meant to let an agent stay inside one bounded sandbox across multiple
+steps. See [docs/vision.md](docs/vision.md) for the product thesis and the
+longer-term interaction model.
+
 ```bash
 pyro task create debian:12 --source-path ./repo
 pyro task sync push TASK_ID ./changes --dest src
diff --git a/docs/vision.md b/docs/vision.md
new file mode 100644
index 0000000..93a4d33
--- /dev/null
+++ b/docs/vision.md
@@ -0,0 +1,201 @@
+# Vision
+
+`pyro-mcp` should become the disposable sandbox where an agent can do real
+development work safely, repeatedly, and reproducibly.
+
+That is a different product from a generic VM wrapper, a secure CI runner, or a
+task queue with better isolation.
+
+## Core Thesis
+
+The goal is not just to run one command in a microVM.
+
+The goal is to give an LLM or coding agent a bounded workspace where it can:
+
+- inspect a repo
+- install dependencies
+- edit files
+- run tests
+- start and inspect services
+- reset and retry
+- export patches and artifacts
+- destroy the sandbox when the task is done
+
+The sandbox is the execution boundary for agentic software work.
+
+## What This Is Not
+
+`pyro-mcp` should not drift into:
+
+- a YAML pipeline system
+- a build farm
+- a generic CI job runner
+- a scheduler or queueing platform
+- a broad VM orchestration product
+
+Those products optimize for queued work, throughput, retries, matrix builds, and
+shared infrastructure.
+
+`pyro-mcp` should optimize for agent loops:
+
+- explore
+- edit
+- test
+- observe
+- reset
+- export
+
+## Why This Can Look Like CI
+
+Any sandbox product starts to look like CI if the main abstraction is:
+
+- submit a command
+- wait
+- collect logs
+- fetch artifacts
+
+That shape is useful, but it is not the center of the vision.
+
+To stay aligned, the primary abstraction should be a workspace the agent
+inhabits, not a job the agent submits.
+
+## Product Principles
+
+### Workspace-First
+
+The default mental model should be "open a disposable workspace" rather than
+"enqueue a task".
+
+### Stateful Interaction
+
+The product should support repeated interaction in one sandbox. One-shot command
+execution matters, but it is the entry point, not the destination.
+
+### Explicit Host Crossing
+
+Anything that crosses the host boundary should be intentional and visible:
+
+- seeding a workspace
+- syncing changes in
+- exporting artifacts out
+- granting secrets or network access
+
+### Reset Over Repair
+
+Agents should be able to checkpoint, reset, and retry cheaply. Disposable state
+is a feature, not a limitation.
+
+### Same Contract Across Surfaces
+
+CLI, Python, and MCP should expose the same underlying workspace model so the
+product feels coherent no matter how it is consumed.
+
+### Agent-Native Observability
+
+The sandbox should expose the things an agent actually needs to reason about:
+
+- command output
+- file diffs
+- service status
+- logs
+- readiness
+- exported results
+
+## The Shape Of An LLM-First Sandbox
+
+The strongest future direction is a small, agent-native contract built around
+workspaces, shells, files, services, and reset.
+
+Representative primitives:
+
+- `workspace.create`
+- `workspace.status`
+- `workspace.delete`
+- `workspace.sync_push`
+- `workspace.export`
+- `workspace.diff`
+- `workspace.snapshot`
+- `workspace.reset`
+- `shell.open`
+- `shell.read`
+- `shell.write`
+- `shell.signal`
+- `shell.close`
+- `workspace.exec`
+- `service.start`
+- `service.status`
+- `service.logs`
+- `service.stop`
+
+These names are illustrative, not a committed public API.
+
+The important point is the interaction model:
+
+- a shell session is interactive state inside the sandbox
+- a workspace is durable for the life of the task
+- services are first-class, not accidental background jobs
+- reset is a core workflow primitive
+
+## Interactive Shells And Disk Operations
+
+Interactive shells are aligned with the vision because they make the agent feel
+present inside the sandbox rather than reduced to one-shot job submission.
+
+That does not mean `pyro-mcp` should become a raw SSH replacement. The shell
+should sit inside a higher-level workspace model with structured file, service,
+diff, and reset operations around it.
+
+Disk-level operations are also useful, but they should remain supporting tools.
+They are good for:
+
+- fast workspace seeding
+- snapshotting
+- offline inspection
+- diffing
+- export/import without a full boot
+
+They should not become the primary product identity. If the center of the
+product becomes "operate on VM disks", it will read as image tooling rather
+than an agent workspace.
+
+## What To Build Next
+
+Features should be prioritized in this order:
+
+1. Repeated commands in one persistent workspace
+2. Interactive shell sessions with PTY semantics
+3. Structured workspace sync and export
+4. Service lifecycle and readiness checks
+5. Snapshot and reset workflows
+6. Explicit secrets and network policy
+7. Secondary disk-level import/export and inspection tools
+
+## Naming Guidance
+
+Prefer language that reinforces the workspace model:
+
+- `workspace`
+- `sandbox`
+- `shell`
+- `service`
+- `snapshot`
+- `reset`
+
+Avoid centering language that makes the product feel like CI infrastructure:
+
+- `job`
+- `runner`
+- `pipeline`
+- `worker`
+- `queue`
+- `build matrix`
+
+## Litmus Test
+
+When evaluating a new feature, ask:
+
+"Does this help an agent inhabit a safe disposable workspace and do real
+software work inside it?"
+
+If the better description is "it helps submit, schedule, and report jobs", the
+feature is probably pushing the product in the wrong direction.