diff --git a/README.md b/README.md
index 2057f42..44c02a4 100644
--- a/README.md
+++ b/README.md
@@ -15,6 +15,7 @@ It exposes the same runtime in three public forms:
 ## Start Here
 
 - Install: [docs/install.md](docs/install.md)
+- Vision: [docs/vision.md](docs/vision.md)
 - First run transcript: [docs/first-run.md](docs/first-run.md)
 - Terminal walkthrough GIF: [docs/assets/first-run.gif](docs/assets/first-run.gif)
 - PyPI package: [pypi.org/project/pyro-mcp](https://pypi.org/project/pyro-mcp/)
@@ -198,6 +199,11 @@ The walkthrough GIF above was rendered from [docs/assets/first-run.tape](docs/as
 Use `pyro run` for one-shot commands. Use `pyro task ...` when you need repeated commands in one
 workspace without recreating the sandbox every time.
 
+The project aims to be an agent workspace, not a CI job runner. Persistent
+tasks let an agent stay inside one bounded sandbox across multiple
+steps. See [docs/vision.md](docs/vision.md) for the product thesis and the
+longer-term interaction model.
+
 ```bash
 pyro task create debian:12 --source-path ./repo
 pyro task sync push TASK_ID ./changes --dest src
diff --git a/docs/vision.md b/docs/vision.md
new file mode 100644
index 0000000..93a4d33
--- /dev/null
+++ b/docs/vision.md
@@ -0,0 +1,201 @@
+# Vision
+
+`pyro-mcp` should become the disposable sandbox where an agent can do real
+development work safely, repeatedly, and reproducibly.
+
+That is a different product from a generic VM wrapper, a secure CI runner, or a
+task queue with better isolation.
+
+## Core Thesis
+
+The goal is not just to run one command in a microVM.
+
+The goal is to give an LLM or coding agent a bounded workspace where it can:
+
+- inspect a repo
+- install dependencies
+- edit files
+- run tests
+- start and inspect services
+- reset and retry
+- export patches and artifacts
+- destroy the sandbox when the task is done
+
+The sandbox is the execution boundary for agentic software work.
+
+## What This Is Not
+
+`pyro-mcp` should not drift into:
+
+- a YAML pipeline system
+- a build farm
+- a generic CI job runner
+- a scheduler or queueing platform
+- a broad VM orchestration product
+
+Those products optimize for queued work, throughput, retries, matrix builds, and
+shared infrastructure.
+
+`pyro-mcp` should optimize for agent loops:
+
+- explore
+- edit
+- test
+- observe
+- reset
+- export
+
+## Why This Can Look Like CI
+
+Any sandbox product starts to look like CI if the main abstraction is:
+
+- submit a command
+- wait
+- collect logs
+- fetch artifacts
+
+That shape is useful, but it is not the center of the vision.
+
+To stay aligned, the primary abstraction should be a workspace the agent
+inhabits, not a job the agent submits.
+
+## Product Principles
+
+### Workspace-First
+
+The default mental model should be "open a disposable workspace" rather than
+"enqueue a task".
+
+### Stateful Interaction
+
+The product should support repeated interaction in one sandbox. One-shot command
+execution matters, but it is the entry point, not the destination.
+
+### Explicit Host Crossing
+
+Anything that crosses the host boundary should be intentional and visible:
+
+- seeding a workspace
+- syncing changes in
+- exporting artifacts out
+- granting secrets or network access
+
+### Reset Over Repair
+
+Agents should be able to checkpoint, reset, and retry cheaply. Disposable state
+is a feature, not a limitation.
+
+### Same Contract Across Surfaces
+
+CLI, Python, and MCP should expose the same underlying workspace model so the
+product feels coherent no matter how it is consumed.
+
+### Agent-Native Observability
+
+The sandbox should expose the things an agent actually needs to reason about:
+
+- command output
+- file diffs
+- service status
+- logs
+- readiness
+- exported results
+
+## The Shape Of An LLM-First Sandbox
+
+The strongest future direction is a small, agent-native contract built around
+workspaces, shells, files, services, and reset.
+
+Representative primitives:
+
+- `workspace.create`
+- `workspace.status`
+- `workspace.delete`
+- `workspace.sync_push`
+- `workspace.export`
+- `workspace.diff`
+- `workspace.snapshot`
+- `workspace.reset`
+- `shell.open`
+- `shell.read`
+- `shell.write`
+- `shell.signal`
+- `shell.close`
+- `workspace.exec`
+- `service.start`
+- `service.status`
+- `service.logs`
+- `service.stop`
+
+These names are illustrative, not a committed public API.
+
+The important point is the interaction model:
+
+- a shell session is interactive state inside the sandbox
+- a workspace is durable for the life of the task
+- services are first-class, not accidental background jobs
+- reset is a core workflow primitive
+
+## Interactive Shells And Disk Operations
+
+Interactive shells are aligned with the vision because they make the agent feel
+present inside the sandbox rather than reduced to one-shot job submission.
+
+That does not mean `pyro-mcp` should become a raw SSH replacement. The shell
+should sit inside a higher-level workspace model with structured file, service,
+diff, and reset operations around it.
+
+Disk-level operations are also useful, but they should remain supporting tools.
+They are good for:
+
+- fast workspace seeding
+- snapshotting
+- offline inspection
+- diffing
+- export/import without a full boot
+
+They should not become the primary product identity. If the center of the
+product becomes "operate on VM disks", it will read as image tooling rather
+than an agent workspace.
+
+## What To Build Next
+
+Features should be prioritized in this order:
+
+1. Repeated commands in one persistent workspace
+2. Interactive shell sessions with PTY semantics
+3. Structured workspace sync and export
+4. Service lifecycle and readiness checks
+5. Snapshot and reset workflows
+6. Explicit secrets and network policy
+7. Secondary disk-level import/export and inspection tools
+
+## Naming Guidance
+
+Prefer language that reinforces the workspace model:
+
+- `workspace`
+- `sandbox`
+- `shell`
+- `service`
+- `snapshot`
+- `reset`
+
+Avoid language that makes the product feel like CI infrastructure:
+
+- `job`
+- `runner`
+- `pipeline`
+- `worker`
+- `queue`
+- `build matrix`
+
+## Litmus Test
+
+When evaluating a new feature, ask:
+
+"Does this help an agent inhabit a safe disposable workspace and do real
+software work inside it?"
+
+If the better description is "it helps submit, schedule, and report jobs", the
+feature is probably pushing the product in the wrong direction.