code.execute.ready — Seed Plan
What It Is
Root capability gate: "can this agent execute code and observe results?"
Not "can this agent write code" (that's the model's job). This is about runtime execution — does the agent have access to a code execution environment, and can it use it safely?
The Landscape (2026)
Execution Environments Agents Actually Use
1. Cloud Sandboxes (purpose-built for agents)
- E2B — Firecracker microVMs, Python/JS SDKs, ~150ms cold start, 24h session cap
- Modal — gVisor isolation, Python-first, sub-second cold starts, scales to 20k concurrent
- Daytona — 90ms creation, Git/LSP built-in, unlimited sessions
- CodeSandbox — snapshot/forking model, backed by Together AI
- Northflank — Kata/Firecracker/gVisor, BYOC, unlimited sessions
- Vercel Sandbox — beta, Firecracker, 45min cap
- Blaxel — 25ms resume from standby
- Cloudflare Sandbox — edge execution, sub-50ms cold starts
2. Local Shell / REPL
- Direct shell access (bash, zsh, PowerShell)
- Language REPLs (python, node, irb, ghci)
- Agent frameworks with exec tools (OpenClaw, LangChain, etc.)
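The local-shell round-trip in category 2 can be sketched with the standard library alone. A minimal illustration of "agent sends code, runtime executes, agent observes output" — not a hardened sandbox, and the helper name is ours:

```python
import subprocess
import sys

def run_snippet(code: str, timeout: float = 10.0) -> tuple[str, str, int]:
    """Execute a Python snippet in a fresh interpreter and capture the round-trip."""
    proc = subprocess.run(
        [sys.executable, "-c", code],  # a new process: the simplest "runtime"
        capture_output=True,           # observe stdout and stderr
        text=True,
        timeout=timeout,               # wall-clock cap against runaway execution
    )
    return proc.stdout, proc.stderr, proc.returncode

stdout, stderr, rc = run_snippet("print(1 + 1)")
assert stdout.strip() == "2" and rc == 0
```

The same shape applies to any runtime in the list: swap the command for `node -e`, `bash -c`, or an MCP tool call.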
3. MCP Code Runners
- Pydantic Python Code Execution (336k est. visitors — most popular)
- Runno (WASM sandbox — Python, JS, C, C++, Ruby, PHP)
- Cloudflare Sandbox Container (official MCP server)
- E2B MCP servers (official)
- Code Runner MCP by Jun Han (30+ languages, 222k visitors)
- Docker-based sandboxes (numerous community servers)
- Deno-based runners (permission-controlled JS/TS)
4. Browser-based / WASM
- Pyodide (Python in WebAssembly)
- Deno (permission-gated JS/TS)
- Browser isolates (V8-based, Cloudflare Workers style)
Language Priorities
Tier 1 — Must verify:
- Python — dominant agent language, data analysis, scripting, ML
- JavaScript/TypeScript — API integrations, web automation, Node.js ecosystem
Tier 2 — Nice to verify but not gate-blocking:
- Bash/shell — system automation
- SQL — data queries (different execution context)
Rationale: ~95% of agent code execution is Python or JS/TS. The seed should verify the execution capability, not catalog every supported language. An agent that can run Python and get output has code execution ready.
Security Constraints That Matter
- Isolation level — Is code sandboxed or running on host? (microVM > gVisor > Docker > bare shell)
- Network egress — Can executed code make outbound requests? Is it filtered?
- Filesystem scope — Read/write access boundaries. Ephemeral vs persistent.
- Resource limits — CPU/memory/time caps to prevent runaway execution
- No secrets exposure — Environment variables, API keys not leaked to sandbox
- Human approval gates — For destructive or external-facing actions
The seed should NOT prescribe a specific isolation level. It should verify that:
- Code runs in some execution environment
- The agent can observe output (stdout, stderr, return values)
- The agent understands the security boundary it's operating within
Minimal Verification (Contract)
The contract needs to assert a testable world-state. Candidates:
Option A (tight): "a code snippet has executed in an available runtime and produced observable output"
- Pro: Concrete, testable, minimal
- Con: Doesn't assert security awareness
Option B (with security awareness): "a code snippet has executed in an identified runtime, produced observable output, and the execution environment's constraints are known"
- Pro: Forces the agent to reason about what it can/can't do
- Con: "constraints are known" is fuzzy
Recommendation: Option A. Keep it atomic. Security awareness is a separate concern (could be a child seed: code.sandbox.constraints or similar).
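Option A has the virtue of being mechanically checkable. A sketch of what a contract verifier might assert — `contract_satisfied` and `runtime_cmd` are hypothetical names, not part of any existing harness:

```python
import subprocess
import sys

def contract_satisfied(runtime_cmd: list[str], snippet: str) -> bool:
    """Option A: a code snippet has executed in an available runtime
    and produced observable output."""
    try:
        proc = subprocess.run(
            runtime_cmd + ["-c", snippet],
            capture_output=True, text=True, timeout=10,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False  # runtime unavailable or hung: world-state not reached
    # "Observable output" = exit cleanly AND something landed on stdout
    return proc.returncode == 0 and proc.stdout != ""

assert contract_satisfied([sys.executable], "print(1 + 1)")
```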
What the Prompt Should Cover
- Discover the execution method — What tool/API/shell does the agent have access to? (Don't prescribe — discover.)
- Run a minimal test — Execute a trivial snippet (e.g., `print(1+1)` or `console.log(1+1)`) and observe output
- Confirm round-trip — Agent sent code → runtime executed → agent received output
- Identify the runtime — What language(s) are available? What's the execution surface? (local shell, cloud sandbox, MCP tool, etc.)
The prompt should be environment-agnostic. It shouldn't say "use E2B" or "open a terminal." It should guide the agent to discover what execution capability exists and verify it works.
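One discovery pattern, sketched in Python: probe a few common interpreters on PATH and confirm a full round-trip for each. The candidate table is an assumption for illustration — a real agent would enumerate whatever tool surface it actually has (shell, MCP tools, sandbox API) rather than hardcode binaries:

```python
import shutil
import subprocess
import sys

# Hypothetical probe table: (name, candidate binaries, eval flag, probe snippet)
CANDIDATES = [
    ("python", [sys.executable, "python3", "python"], "-c", 'print("ok")'),
    ("node",   ["node"],                              "-e", 'console.log("ok")'),
    ("bash",   ["bash"],                              "-c", "echo ok"),
]

def discover_runtimes() -> dict[str, str]:
    """Probe for common interpreters and verify a send->execute->observe loop."""
    found = {}
    for name, binaries, flag, probe in CANDIDATES:
        for binary in binaries:
            path = shutil.which(binary)
            if path is None:
                continue
            try:
                proc = subprocess.run([path, flag, probe],
                                      capture_output=True, text=True, timeout=10)
            except OSError:
                continue
            if proc.returncode == 0 and proc.stdout.strip() == "ok":
                found[name] = path  # round-trip confirmed: code in, output out
                break
    return found
```

Discovery first, then verification: the dict this returns is exactly the evidence the prompt asks for in steps 1 and 4 above.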
Slug Analysis
- `code.execute.ready` ← recommended (3 independent axes, most general)
- `code.runtime.ready` ← slightly more specific to runtime vs just "execute"
- `code.sandbox.ready` ← implies sandboxing, too narrow (not all agents run sandboxed)
Parent/Child Potential
As a root seed, it has no parent. Potential children:
- `code.sandbox.audit` — verify isolation level and constraints
- `code.data.analysis` — use execution for data analysis tasks
- `code.test.suite` — execute and verify test suites
- `code.install.dependency` — install packages in the runtime
- `code.output.parse` — extract structured data from execution output
Open Questions
- Multi-language gate? Should the seed force both Python AND JS verification, or is one language sufficient to prove capability?
- Security in root or child? Should security constraints be baked into this root or split into a child seed like `code.sandbox.audit`?
- Discovery vs prescription? How environment-agnostic should the prompt be? "Discover your tools" vs "try these common patterns"?