code.execute.ready — Seed Plan

What It Is

Root capability gate: "can this agent execute code and observe results?"

Not "can this agent write code" (that's the model's job). This is about runtime execution — does the agent have access to a code execution environment, and can it use it safely?

The Landscape (2026)

Execution Environments Agents Actually Use

1. Cloud Sandboxes (purpose-built for agents)

2. Local Shell / REPL

3. MCP Code Runners

4. Browser-based / WASM

Language Priorities

Tier 1 — Must verify: Python, JavaScript/TypeScript

Tier 2 — Nice to verify but not gate-blocking:

Rationale: ~95% of agent code execution is Python or JS/TS. The seed should verify the execution capability, not catalog every supported language. An agent that can run Python and get output has code execution ready.

Security Constraints That Matter

  1. Isolation level — Is code sandboxed or running on host? (microVM > gVisor > Docker > bare shell)
  2. Network egress — Can executed code make outbound requests? Is it filtered?
  3. Filesystem scope — Read/write access boundaries. Ephemeral vs persistent.
  4. Resource limits — CPU/memory/time caps to prevent runaway execution
  5. No secrets exposure — Environment variables, API keys not leaked to sandbox
  6. Human approval gates — For destructive or external-facing actions
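Constraints 4 and 5 can be enforced even at the thinnest layer of the stack. A minimal sketch, assuming a host-side wrapper around a child interpreter (the name `run_constrained` is illustrative, not from any library) — note this gives a time cap and secret scrubbing but NOT isolation in the sense of constraint 1:

```python
import subprocess
import sys

def run_constrained(code: str, timeout_s: float = 5.0) -> subprocess.CompletedProcess:
    # Resource limit (constraint 4): subprocess.run kills the child and raises
    # TimeoutExpired if it exceeds timeout_s.
    # No secrets exposure (constraint 5): env={} means the child inherits no
    # API keys or tokens from the host process.
    # This is NOT isolation (constraint 1): the child still shares the host's
    # filesystem and network. A microVM / gVisor / Docker layer sits beneath this.
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout_s,
        env={},
    )

result = run_constrained("print(1 + 1)")
print(result.stdout.strip())
```

A real sandbox replaces the `subprocess` call with its own execution API; the constraint surface stays the same.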

The seed should NOT prescribe a specific isolation level. It should verify that execution works at all and that the agent can identify which environment it is running in.

Minimal Verification (Contract)

The contract needs to assert a testable world-state. Candidates:

Option A (tight): "a code snippet has executed in an available runtime and produced observable output"

Option B (with security awareness): "a code snippet has executed in an identified runtime, produced observable output, and the execution environment's constraints are known"

Recommendation: Option A. Keep it atomic. Security awareness is a separate concern (could be a child seed: code.sandbox.constraints or similar).
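Option A's world-state can be written down as a predicate over evidence the agent collects. A sketch, with a hypothetical `ExecutionEvidence` shape (field names are assumptions, not part of the contract wording):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExecutionEvidence:
    runtime: str           # e.g. "python3 via local shell" or "MCP code runner"
    snippet: str           # the code that was sent
    observed_output: str   # what came back

def contract_satisfied(evidence: Optional[ExecutionEvidence]) -> bool:
    # Option A as a predicate: a snippet executed in an available runtime
    # and produced observable output. Deliberately says nothing about
    # security constraints; that belongs to Option B or a child seed.
    return evidence is not None and evidence.observed_output.strip() != ""
```

Keeping the predicate this small is what "atomic" buys: the gate passes or fails on one observable fact.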

What the Prompt Should Cover

  1. Discover the execution method — What tool/API/shell does the agent have access to? (Don't prescribe — discover.)
  2. Run a minimal test — Execute a trivial snippet (e.g., print(1+1) or console.log(1+1)) and observe output
  3. Confirm round-trip — Agent sent code → runtime executed → agent received output
  4. Identify the runtime — What language(s) are available? What's the execution surface? (local shell, cloud sandbox, MCP tool, etc.)

The prompt should be environment-agnostic. It shouldn't say "use E2B" or "open a terminal." It should guide the agent to discover what execution capability exists and verify it works.
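The discover-then-verify loop above can be sketched as a probe table. This assumes shelling out to local binaries purely for illustration — a real agent would enumerate its tool/MCP surface instead, and the `PROBES` table is hypothetical:

```python
import shutil
import subprocess
import sys

# Trivial round-trip snippets per common runtime (step 2 of the prompt).
PROBES = {
    "python": ([sys.executable, "-c", "print(1 + 1)"], "2"),
    "node": ([shutil.which("node"), "-e", "console.log(1 + 1)"], "2"),
}

def discover_runtimes() -> dict:
    verified = {}
    for name, (cmd, expected) in PROBES.items():
        if cmd[0] is None:
            verified[name] = False  # runtime binary not found (step 1 fails)
            continue
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
        # Round-trip confirmed (step 3): code sent, runtime executed,
        # output observed and matched.
        verified[name] = proc.stdout.strip() == expected
    return verified

print(discover_runtimes())
```

The keys of the returned dict double as the runtime identification in step 4: the gate needs one `True`, not a full catalog.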

Slug Analysis

Parent/Child Potential

As a root seed, it has no parent. Potential children: a sandbox/security seed (code.sandbox.constraints or code.sandbox.audit, as discussed above) and, depending on how the multi-language question resolves, per-language gates.

Open Questions

  1. Multi-language gate? Should the seed force both Python AND JS verification, or is one language sufficient to prove capability?
  2. Security in root or child? Should security constraints be baked into this root or split into a child seed like code.sandbox.audit?
  3. Discovery vs prescription? How environment-agnostic should the prompt be? "Discover your tools" vs "try these common patterns"?