Sep 29, 2025
9 min read

Context Optimization in AI Agents: From Sub-Agents to TypeScript Interfaces

Exploring how Cloudflare's Code Mode revolutionizes MCP usage by leveraging TypeScript interfaces for deterministic execution, and comparing it to context management patterns in multi-agent systems

The Hidden Cost of Context in AI Agent Systems

As we’ve integrated more Model Context Protocol (MCP) servers into our AI agents, a critical challenge has emerged: context memory consumption. Every MCP server loaded into an LLM consumes precious context window space, forcing developers to make difficult trade-offs between functionality and performance. This constraint has driven the evolution of sophisticated context management patterns, from sub-agent architectures to, most recently, Cloudflare’s groundbreaking “Code Mode” approach.

In a fascinating article titled “Code Mode: the better way to use MCP”, Kenton Varda and Sunil Pai from Cloudflare present a paradigm shift in how we should think about AI agents and tool usage. Their insight? LLMs are better at writing code to call APIs than at calling them directly through tool functions.

The Sub-Agent Pattern: Non-Deterministic to Non-Deterministic

Before diving into Cloudflare’s innovation, let’s examine how we’ve traditionally handled context optimization through sub-agent architectures. In supervisor-and-task-agent systems, we’ve employed a hierarchical approach where:

  1. The Supervisor Agent maintains minimal context, holding only high-level orchestration logic
  2. Specialized Task Agents are spawned with focused contexts for specific tasks
  3. Context is ephemeral, existing only for the duration of each task agent’s execution

This pattern works because it compartmentalizes context consumption. Instead of loading every possible MCP server and tool into a single agent’s context, we dynamically instantiate task agents with only the tools they need for their specific job.

graph TD
    Start([User Query]) --> Supervisor{"Supervisor Agent<br/>~1000 tokens"}
    Supervisor -->|"Task: Search"| SearchAgent["Search Agent<br/>Context: 1300 tokens<br/>Tools: web-search, vector-db"]
    Supervisor -->|"Task: Analysis"| AnalysisAgent["Analysis Agent<br/>Context: 1300 tokens<br/>Tools: data-processor"]
    Supervisor -->|"Task: Code"| CodeAgent["Code Agent<br/>Context: 1300 tokens<br/>Tools: linter, compiler"]
    SearchAgent --> Cleanup1[Release Context]
    AnalysisAgent --> Cleanup2[Release Context]
    CodeAgent --> Cleanup3[Release Context]
    Cleanup1 --> Result([Return Result])
    Cleanup2 --> Result
    Cleanup3 --> Result
    style Supervisor fill:#2196F3,color:#fff
    style SearchAgent fill:#FF9800,color:#fff
    style AnalysisAgent fill:#9C27B0,color:#fff
    style CodeAgent fill:#4CAF50,color:#fff
    style Cleanup1 fill:#F44336,color:#fff
    style Cleanup2 fill:#F44336,color:#fff
    style Cleanup3 fill:#F44336,color:#fff

This diagram illustrates the supervisor-task agent pattern where:

  • The Supervisor Agent maintains minimal context (~1000 tokens)
  • Task Agents are spawned with only necessary tools and context (~1300 tokens each)
  • Context is released after each task completes, freeing memory
  • Only one task agent is active at a time, minimizing total context usage
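To make the routing-and-release flow concrete, here is a minimal TypeScript sketch of the pattern. The Tool shape, TaskAgent interface, and spawnTaskAgent factory are illustrative assumptions, not part of any specific framework:

// Hypothetical shapes, for illustration only.
interface Tool {
  name: string;
  definition: string; // the tokens this tool adds to an agent's context
}

interface TaskAgent {
  run(task: string): Promise<string>;
}

// Assumed factory: builds an agent whose context holds only its prompt
// and the handful of tools it needs for one job.
declare function spawnTaskAgent(opts: { prompt: string; tools: Tool[] }): TaskAgent;

async function supervise(
  task: "search" | "analysis" | "code",
  query: string,
  toolRegistry: Record<"search" | "analysis" | "code", Tool[]>
): Promise<string> {
  // The supervisor only routes; it never loads the full tool catalog.
  const agent = spawnTaskAgent({
    prompt: `You handle exactly one ${task} task.`,
    tools: toolRegistry[task], // 2-3 relevant tools instead of every MCP server
  });

  const result = await agent.run(query);
  // Context is ephemeral: the task agent and its tokens are discarded here.
  return result;
}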

However, this approach has a fundamental characteristic: it’s non-deterministic all the way down. Both the supervisor agent and its task agents operate in the probabilistic realm of LLM reasoning, making decisions based on learned patterns rather than explicit logic.

Code Mode: The Deterministic Bridge

Cloudflare’s Code Mode represents a fundamentally different approach to context optimization. Instead of exposing MCP tools directly to the LLM as function calls, they:

  1. Convert MCP tools into TypeScript APIs with full type definitions and documentation
  2. Ask the LLM to write TypeScript code that calls these APIs
  3. Execute the generated code in a secure sandbox (V8 isolates)

This is revolutionary because it leverages what LLMs are genuinely excellent at: writing code based on the millions of real-world examples in their training data.
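As a rough sketch of step 1, an MCP tool's JSON Schema can be translated mechanically into a typed, documented declaration that the LLM reads like any other API surface. The McpTool shape and the toTypeScriptDeclaration helper below are illustrative assumptions, not Cloudflare's actual implementation:

// Illustrative subset of an MCP tool description (not the full spec).
interface McpTool {
  name: string;
  description: string;
  inputSchema: {
    properties: Record<string, { type: string; description?: string }>;
    required?: string[];
  };
}

// Hypothetical converter: turns one tool into a typed, documented declaration.
function toTypeScriptDeclaration(tool: McpTool): string {
  const fields = Object.entries(tool.inputSchema.properties)
    .map(([key, prop]) => {
      const optional = tool.inputSchema.required?.includes(key) ? "" : "?";
      const tsType =
        prop.type === "number" ? "number" : prop.type === "boolean" ? "boolean" : "string";
      return `  /** ${prop.description ?? ""} */\n  ${key}${optional}: ${tsType};`;
    })
    .join("\n");

  return [
    `interface ${tool.name}Input {`,
    fields,
    `}`,
    `/** ${tool.description} */`,
    `declare function ${tool.name}(input: ${tool.name}Input): Promise<unknown>;`,
  ].join("\n");
}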

Why This Works Better

As Varda and Pai eloquently put it:

“Making an LLM perform tasks with tool calling is like putting Shakespeare through a month-long class in Mandarin and then asking him to write a play in it. It’s just not going to be his best work.”

LLMs have seen countless examples of TypeScript code calling APIs, handling responses, chaining operations, and implementing complex logic. They’ve seen far fewer examples of the synthetic “tool call” format that most agent frameworks use.

The Uniqueness of TypeScript Interfaces for LLMs

What makes Cloudflare’s approach particularly compelling is their choice of TypeScript as the target language. This isn’t arbitrary – TypeScript offers several advantages for LLM-generated code:

1. Rich Training Data

TypeScript has exploded in popularity over the past decade, meaning LLMs have been trained on vast quantities of high-quality TypeScript code from open-source projects.

2. Type Safety as Guardrails

The type system provides natural constraints that guide the LLM toward correct implementations:

// The LLM sees this interface
interface WeatherAPI {
  getCurrentWeather(input: {
    location: string;
    units?: 'fahrenheit' | 'celsius';
  }): Promise<WeatherResult>;
}

// And naturally writes correct code
const weather = await api.getCurrentWeather({
  location: "Austin, TX",
  units: "fahrenheit"
});

3. Deterministic Execution

Once the TypeScript code is generated, its execution is completely deterministic. This creates a fascinating hybrid:

  • Non-deterministic LLM generates the code
  • Deterministic runtime executes it
  • Predictable API calls interact with MCP servers
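Conceptually, the only non-deterministic step is producing the source text; everything after that is ordinary program execution. In the sketch below, generateCode and runInIsolate are hypothetical placeholders for an LLM call and a sandbox runtime:

// Hypothetical boundary between the probabilistic and deterministic halves.
declare function generateCode(prompt: string, apiTypes: string): Promise<string>; // LLM call
declare function runInIsolate(
  source: string,
  bindings: Record<string, unknown> // e.g. MCP-backed APIs exposed to the sandbox
): Promise<unknown>;

async function answer(
  question: string,
  apiTypes: string,
  mcpBindings: Record<string, unknown>
): Promise<unknown> {
  // Non-deterministic: the model writes TypeScript against the typed API.
  const source = await generateCode(question, apiTypes);

  // Deterministic: the same source always produces the same API calls.
  return runInIsolate(source, mcpBindings);
}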

Context Efficiency: A Comparative Analysis

Let’s compare context consumption across these approaches:

Traditional Tool Calling

Context includes:
- System prompt (~500 tokens)
- Tool definitions for 10 MCP servers (~5000 tokens)
- Conversation history (~2000 tokens)
- Tool call/response pairs (~3000 tokens per interaction)
Total: ~10,500+ tokens

Sub-Agent Pattern

Supervisor Agent Context:
- Orchestration prompt (~300 tokens)
- Task routing logic (~200 tokens)
- High-level state (~500 tokens)
Total: ~1,000 tokens

Per Task Agent:
- Specialized prompt (~300 tokens)
- 2-3 relevant tools (~500 tokens)
- Focused context (~500 tokens)
Total: ~1,300 tokens

Code Mode

Context includes:
- System prompt with coding instructions (~400 tokens)
- TypeScript API definitions (~2000 tokens)
- Generated code (~500 tokens)
- Execution results only (~200 tokens)
Total: ~3,100 tokens

The efficiency gains are dramatic, especially when chaining multiple operations. In traditional tool calling, each intermediate result must pass through the LLM’s context. With Code Mode, only the final output needs to return.
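As a hedged illustration of that chaining, here is what generated code might look like against the WeatherAPI interface shown earlier, assuming a WeatherResult that exposes location and temperature fields (an assumption, since the post does not define it). The intermediate readings live entirely inside the sandbox; only the final summary string re-enters the model's context:

// Illustrative only: several chained calls, zero intermediate tokens for the LLM.
const cities = ["Austin, TX", "Denver, CO", "Seattle, WA"];

const readings = await Promise.all(
  cities.map((location) => api.getCurrentWeather({ location, units: "fahrenheit" }))
);

// Assumes WeatherResult has { location, temperature } fields.
const hottest = readings.reduce((a, b) => (a.temperature > b.temperature ? a : b));

// Only this summary returns to the model's context.
console.log(`Hottest city right now: ${hottest.location} at ${hottest.temperature}°F`);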

The Deterministic Advantage

Perhaps the most profound insight from Cloudflare’s approach is how it bridges the deterministic and non-deterministic worlds. Traditional sub-agent patterns are “non-deterministic to non-deterministic” – one LLM coordinating other LLMs. Code Mode is “non-deterministic to deterministic” – an LLM generating code that executes predictably.

This has several advantages:

  1. Debugging: Generated TypeScript code can be inspected, logged, and debugged
  2. Reliability: Deterministic execution means consistent behavior
  3. Performance: No token processing for intermediate results
  4. Security: Code runs in isolated sandboxes with explicit permissions
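A small, assumed sketch of what advantage 1 buys you in practice: because the model's output is ordinary source text, it can be logged and audited like any other artifact before and after it runs. The runInIsolate function is the same hypothetical sandbox runner as in the earlier sketch:

// Same hypothetical sandbox runner as above.
declare function runInIsolate(source: string, bindings: Record<string, unknown>): Promise<unknown>;

// Illustrative: capture the generated code for inspection and timing.
async function runWithAudit(source: string, bindings: Record<string, unknown>) {
  console.log("Generated code:\n", source); // inspect before execution
  const startedAt = Date.now();
  try {
    return await runInIsolate(source, bindings); // deterministic execution
  } finally {
    console.log(`Execution finished in ${Date.now() - startedAt} ms`);
  }
}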

Implementation Patterns

Here’s how Code Mode works in practice, using the actual implementation from Cloudflare’s article:

import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";
import { codemode } from "agents/codemode/ai";

// Original approach with direct tool calling
const directStream = streamText({
  model: openai("gpt-4"),
  system: "You are a helpful assistant",
  messages: [
    { role: "user", content: "What's the weather in Austin, TX?" }
  ],
  tools: {
    // tool definitions exposed directly to LLM
  }
});

// New approach with Code Mode - wrap tools and system prompt
const { system, tools } = codemode({
  system: "You are a helpful assistant",
  tools: {
    // tool definitions converted to TypeScript API
  },
  // ...config
});

// Now the LLM writes TypeScript code instead of tool calls
const stream = streamText({
  model: openai("gpt-4"),
  system,
  tools,
  messages: [
    { role: "user", content: "What's the weather in Austin, TX?" }
  ]
});

The generated TypeScript API looks like this (from the actual MCP server):

interface FetchAgentsDocumentationInput {
  [k: string]: unknown;
}

interface SearchAgentsCodeInput {
  /**
   * The search query to find relevant code files
   */
  query: string;
  /**
   * Page number to retrieve (starting from 1). Each page contains 30
   * results.
   */
  page?: number;
}

declare const codemode: {
  /**
   * Fetch entire documentation file from GitHub repository:
   * cloudflare/agents. Useful for general questions. Always call
   * this tool first if asked about cloudflare/agents.
   */
  fetch_agents_documentation: (
    input: FetchAgentsDocumentationInput
  ) => Promise<FetchAgentsDocumentationOutput>;

  /**
   * Search for code within the GitHub repository: "cloudflare/agents"
   * using the GitHub Search API (exact match). Returns matching files
   * for you to query further if relevant.
   */
  search_agents_code: (
    input: SearchAgentsCodeInput
  ) => Promise<SearchAgentsCodeOutput>;
};

And the LLM generates code like:

// Generated by the LLM to answer a question about the agents SDK
const docs = await codemode.fetch_agents_documentation({});
const searchResults = await codemode.search_agents_code({
  query: "codemode implementation",
  page: 1
});
console.log("Documentation:", docs);
console.log("Code examples found:", searchResults);

The Future of Agent Architecture

Cloudflare’s Code Mode suggests a future where AI agents are less about teaching LLMs new tricks (tool calling) and more about leveraging what they already excel at (code generation). This approach could fundamentally reshape how we build agent systems:

  1. Hybrid Architectures: Combining sub-agent patterns with Code Mode for maximum flexibility
  2. Language-Specific Optimization: Different programming languages for different tasks
  3. Composable Sandboxes: Chaining isolated execution environments
  4. Deterministic Workflows: Predictable, auditable agent behaviors

Conclusion

The evolution from direct tool calling to sub-agent patterns to Code Mode represents a maturing understanding of how to effectively leverage LLMs. By recognizing that LLMs are fundamentally better at writing code than at learning new interaction patterns, Cloudflare has opened a new frontier in agent development.

The key insight isn’t just about context optimization – it’s about playing to the strengths of our tools. LLMs have been trained on vast corpora of code. By asking them to write code rather than perform unfamiliar tool-calling patterns, we’re not teaching Shakespeare Mandarin; we’re asking him to write in the language he knows best.

As we continue building increasingly sophisticated agent systems, patterns like Code Mode will become essential for managing complexity while maintaining performance. The future of AI agents may well be less about artificial intelligence and more about intelligent architecture – systems that cleverly combine the non-deterministic creativity of LLMs with the deterministic reliability of traditional programming.


For more details on implementing Code Mode, check out Cloudflare’s documentation and their Worker Loader API.
