code.execute.ready — Seed Plan
What It Is
Root capability gate: "can this agent execute code and observe results?"
Not "can this agent write code" (that's the model's job). This is about runtime execution — does the agent have access to a code execution environment, and can it use it safely?
The Landscape (2026)
Execution Environments Agents Actually Use
1. Cloud Sandboxes (purpose-built for agents)
- E2B — Firecracker microVMs, Python/JS SDKs, ~150ms cold start, 24h session cap
- Modal — gVisor isolation, Python-first, sub-second cold starts, scales to 20k concurrent
- Daytona — 90ms creation, Git/LSP built-in, unlimited sessions
- CodeSandbox — snapshot/forking model, backed by Together AI
- Northflank — Kata/Firecracker/gVisor, BYOC, unlimited sessions
- Vercel Sandbox — beta, Firecracker, 45min cap
- Blaxel — 25ms resume from standby
- Cloudflare Sandbox — edge execution, sub-50ms cold starts
2. Local Shell / REPL
- Direct shell access (bash, zsh, PowerShell)
- Language REPLs (python, node, irb, ghci)
- Agent frameworks with exec tools (OpenClaw, LangChain, etc.)
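The local-shell round-trip in category 2 can be sketched with the standard library alone. A minimal illustration of "agent sends code, runtime executes, agent observes output" — not a hardened sandbox, and the helper name is ours:

```python
import subprocess
import sys

def run_snippet(code: str, timeout: float = 10.0) -> tuple[str, str, int]:
    """Execute a Python snippet in a fresh interpreter and capture the round-trip."""
    proc = subprocess.run(
        [sys.executable, "-c", code],  # a new process: the simplest "runtime"
        capture_output=True,           # observe stdout and stderr
        text=True,
        timeout=timeout,               # wall-clock cap against runaway execution
    )
    return proc.stdout, proc.stderr, proc.returncode

stdout, stderr, rc = run_snippet("print(1 + 1)")
assert stdout.strip() == "2" and rc == 0
```

The same shape applies to any runtime in the list: swap the command for `node -e`, `bash -c`, or an MCP tool call.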
3. MCP Code Runners
- Pydantic Python Code Execution (336k est. visitors — most popular)
- Runno (WASM sandbox — Python, JS, C, C++, Ruby, PHP)
- Cloudflare Sandbox Container (official MCP server)
- E2B MCP servers (official)
- Code Runner MCP by Jun Han (30+ languages, 222k visitors)
- Docker-based sandboxes (numerous community servers)
- Deno-based runners (permission-controlled JS/TS)
4. Browser-based / WASM
- Pyodide (Python in WebAssembly)
- Deno (permission-gated JS/TS)
- Browser isolates (V8-based, Cloudflare Workers style)
Language Priorities
Tier 1 — Must verify:
- Python — dominant agent language, data analysis, scripting, ML
- JavaScript/TypeScript — API integrations, web automation, Node.js ecosystem
Tier 2 — Nice to verify but not gate-blocking:
- Bash/shell — system automation
- SQL — data queries (different execution context)
Rationale: ~95% of agent code execution is Python or JS/TS. The seed should verify the execution capability, not catalog every supported language. An agent that can run Python and get output has code execution ready.
Security Constraints That Matter
- Isolation level — Is code sandboxed or running on host? (microVM > gVisor > Docker > bare shell)
- Network egress — Can executed code make outbound requests? Is it filtered?
- Filesystem scope — Read/write access boundaries. Ephemeral vs persistent.
- Resource limits — CPU/memory/time caps to prevent runaway execution
- No secrets exposure — Environment variables, API keys not leaked to sandbox
- Human approval gates — For destructive or external-facing actions
The seed should NOT prescribe a specific isolation level. It should verify that:
- Code runs in some execution environment
- The agent can observe output (stdout, stderr, return values)
- The agent understands the security boundary it's operating within
Minimal Verification (Contract)
The contract needs to assert a testable world-state. Candidates:
Option A (tight): "a code snippet has executed in an available runtime and produced observable output"
- Pro: Concrete, testable, minimal
- Con: Doesn't assert security awareness
Option B (with security awareness): "a code snippet has executed in an identified runtime, produced observable output, and the execution environment's constraints are known"
- Pro: Forces the agent to reason about what it can/can't do
- Con: "constraints are known" is fuzzy
Recommendation: Option A. Keep it atomic. Security awareness is a separate concern (could be a child seed: code.sandbox.constraints or similar).
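Option A has the virtue of being mechanically checkable. A sketch of what a contract verifier might assert — `contract_satisfied` and `runtime_cmd` are hypothetical names, not part of any existing harness:

```python
import subprocess
import sys

def contract_satisfied(runtime_cmd: list[str], snippet: str) -> bool:
    """Option A: a code snippet has executed in an available runtime
    and produced observable output."""
    try:
        proc = subprocess.run(
            runtime_cmd + ["-c", snippet],
            capture_output=True, text=True, timeout=10,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False  # runtime unavailable or hung: world-state not reached
    # "Observable output" = exit cleanly AND something landed on stdout
    return proc.returncode == 0 and proc.stdout != ""

assert contract_satisfied([sys.executable], "print(1 + 1)")
```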
What the Prompt Should Cover
- Discover the execution method — What tool/API/shell does the agent have access to? (Don't prescribe — discover.)
- Run a minimal test — Execute a trivial snippet (e.g., `print(1+1)` or `console.log(1+1)`) and observe output
- Confirm round-trip — Agent sent code → runtime executed → agent received output
- Identify the runtime — What language(s) are available? What's the execution surface? (local shell, cloud sandbox, MCP tool, etc.)
The prompt should be environment-agnostic. It shouldn't say "use E2B" or "open a terminal." It should guide the agent to discover what execution capability exists and verify it works.
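One discovery pattern, sketched in Python: probe a few common interpreters on PATH and confirm a full round-trip for each. The candidate table is an assumption for illustration — a real agent would enumerate whatever tool surface it actually has (shell, MCP tools, sandbox API) rather than hardcode binaries:

```python
import shutil
import subprocess
import sys

# Hypothetical probe table: (name, candidate binaries, eval flag, probe snippet)
CANDIDATES = [
    ("python", [sys.executable, "python3", "python"], "-c", 'print("ok")'),
    ("node",   ["node"],                              "-e", 'console.log("ok")'),
    ("bash",   ["bash"],                              "-c", "echo ok"),
]

def discover_runtimes() -> dict[str, str]:
    """Probe for common interpreters and verify a send->execute->observe loop."""
    found = {}
    for name, binaries, flag, probe in CANDIDATES:
        for binary in binaries:
            path = shutil.which(binary)
            if path is None:
                continue
            try:
                proc = subprocess.run([path, flag, probe],
                                      capture_output=True, text=True, timeout=10)
            except OSError:
                continue
            if proc.returncode == 0 and proc.stdout.strip() == "ok":
                found[name] = path  # round-trip confirmed: code in, output out
                break
    return found
```

Discovery first, then verification: the dict this returns is exactly the evidence the prompt asks for in steps 1 and 4 above.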
Slug Analysis
- `code.execute.ready` ← recommended (3 independent axes, most general)
- `code.runtime.ready` ← slightly more specific to runtime vs just "execute"
- `code.sandbox.ready` ← implies sandboxing, too narrow (not all agents run sandboxed)
Parent/Child Potential
As a root seed, it has no parent. Potential children:
- `code.sandbox.audit` — verify isolation level and constraints
- `code.data.analysis` — use execution for data analysis tasks
- `code.test.suite` — execute and verify test suites
- `code.install.dependency` — install packages in the runtime
- `code.output.parse` — extract structured data from execution output
Open Questions
- Multi-language gate? Should the seed force both Python AND JS verification, or is one language sufficient to prove capability?
- Security in root or child? Should security constraints be baked into this root or split into a child seed like `code.sandbox.audit`?
- Discovery vs prescription? How environment-agnostic should the prompt be? "Discover your tools" vs "try these common patterns"?