Claude Agent SDK vs OpenAI Agents SDK: Building AI Tools in TypeScript
Wednesday 01/04/2026
11 min read

You're building an AI agent in TypeScript and you've got two serious options on the table: Anthropic's Claude Agent SDK and OpenAI's Agents SDK. Both promise autonomous tool-calling loops, both ship TypeScript-first, and both launched in early 2026. But the docs for each read like marketing material, and you can't tell which one will actually feel right in your codebase until you've built something real.
I built the same project — a file-analyzing coding assistant — in both SDKs. Here's what I found, with all the code so you can judge for yourself.
What we're building
A CLI coding assistant that can:
- Read files from disk
- Search for patterns in a codebase
- Suggest fixes and explain code
Same tools, same behavior, two different SDKs. This gives us an apples-to-apples comparison of the developer experience, agent loop, and debugging story.
Setting up the project
mkdir agent-comparison && cd agent-comparison
pnpm init
pnpm add @anthropic-ai/claude-agent-sdk openai-agents zod typescript tsx
pnpm add -D @types/node
Both SDKs need API keys:
export ANTHROPIC_API_KEY="your-key-here"
export OPENAI_API_KEY="your-key-here"
Defining tools: where the differences start
Tools are the core of any agent. Let's define two simple ones — reading a file and searching for a pattern — and see how each SDK handles them.
Claude Agent SDK tools
// src/claude-agent.ts
import { Agent, Tool } from '@anthropic-ai/claude-agent-sdk'
import { z } from 'zod'
import { readFileSync, existsSync } from 'fs'
import { execSync } from 'child_process'
const readFileTool = new Tool({
name: 'read_file',
description: 'Read the contents of a file at the given path',
parameters: z.object({
path: z.string().describe('Absolute or relative file path'),
}),
async execute({ path }) {
if (!existsSync(path)) {
return { error: `File not found: ${path}` }
}
try {
const content = readFileSync(path, 'utf-8')
return { content, lines: content.split('\n').length }
} catch (err) {
return { error: `Failed to read file: ${(err as Error).message}` }
}
},
})
const searchCodeTool = new Tool({
name: 'search_code',
description: 'Search for a regex pattern in files under a directory',
parameters: z.object({
pattern: z.string().describe('Regex pattern to search for'),
directory: z.string().default('.').describe('Directory to search in'),
}),
async execute({ pattern, directory }) {
try {
// -l lists matching file names only, so -n (line numbers) would be ignored.
// Quote-escaping alone doesn't fully sanitize shell input; prefer execFile
// with an args array in production.
const result = execSync(
`grep -rl "${pattern.replace(/"/g, '\\"')}" "${directory}" --include="*.ts" --include="*.tsx" --include="*.js"`,
{ encoding: 'utf-8', timeout: 5000 }
)
return { matches: result.trim().split('\n').filter(Boolean) }
} catch {
return { matches: [] }
}
},
})
The Claude Agent SDK uses Zod schemas directly for parameter validation. The execute function receives the parsed, typed parameters. If your tool returns an object, the SDK serializes it to JSON automatically and sends it back to the model.
OpenAI Agents SDK tools
// src/openai-agent.ts
import { Agent, tool } from 'openai-agents'
import { z } from 'zod'
import { readFileSync, existsSync } from 'fs'
import { execSync } from 'child_process'
const readFileTool = tool({
name: 'read_file',
description: 'Read the contents of a file at the given path',
parameters: z.object({
path: z.string().describe('Absolute or relative file path'),
}),
async run({ path }) {
if (!existsSync(path)) {
return JSON.stringify({ error: `File not found: ${path}` })
}
try {
const content = readFileSync(path, 'utf-8')
return JSON.stringify({ content, lines: content.split('\n').length })
} catch (err) {
return JSON.stringify({
error: `Failed to read file: ${(err as Error).message}`,
})
}
},
})
const searchCodeTool = tool({
name: 'search_code',
description: 'Search for a regex pattern in files under a directory',
parameters: z.object({
pattern: z.string().describe('Regex pattern to search for'),
directory: z.string().default('.').describe('Directory to search in'),
}),
async run({ pattern, directory }) {
try {
// Same grep fix as the Claude version: -l lists file names only, so drop -n.
const result = execSync(
`grep -rl "${pattern.replace(/"/g, '\\"')}" "${directory}" --include="*.ts" --include="*.tsx" --include="*.js"`,
{ encoding: 'utf-8', timeout: 5000 }
)
return JSON.stringify({
matches: result.trim().split('\n').filter(Boolean),
})
} catch {
return JSON.stringify({ matches: [] })
}
},
})
Notice the key difference: OpenAI's SDK requires you to return a string. You have to JSON.stringify everything yourself. The Claude Agent SDK accepts objects and handles serialization. It's a small thing, but across a dozen tools it adds up.
Both use Zod for schema definition, which is great — you get runtime validation and type inference in one place.
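If you end up writing many OpenAI-style tools, a thin adapter can remove the stringify boilerplate. The helper below is hypothetical and not part of either SDK; it just wraps an object-returning executor so the result always satisfies a string-returning contract:

```typescript
// Hypothetical adapter, not part of either SDK. It wraps an executor
// that returns an object so the result is always a string, matching
// the OpenAI-style tool contract.
type ObjectExecutor<A> = (args: A) => Promise<unknown> | unknown

function asStringTool<A>(execute: ObjectExecutor<A>) {
  return async (args: A): Promise<string> => {
    const result = await execute(args)
    // Pass strings through untouched; serialize everything else.
    return typeof result === 'string' ? result : JSON.stringify(result)
  }
}

// Usage: wrap the object-returning executor once, reuse everywhere.
const run = asStringTool(({ path }: { path: string }) => ({
  content: `stub for ${path}`,
  lines: 1,
}))
```

Wrap each executor at definition time and the rest of your tool code can keep returning plain objects, whichever SDK it ends up in.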
Building the agent loop
This is where the SDKs diverge the most.
Claude Agent SDK: explicit loop control
// src/claude-agent.ts (continued)
const agent = new Agent({
model: 'claude-sonnet-4-6',
system: `You are a coding assistant. You can read files and search codebases to help developers understand and improve their code. Be concise and specific.`,
tools: [readFileTool, searchCodeTool],
maxTurns: 10,
})
async function run(userMessage: string): Promise<void> {
const result = await agent.run(userMessage)
for await (const event of result) {
switch (event.type) {
case 'text':
process.stdout.write(event.text)
break
case 'tool_use':
console.log(`\n[Tool: ${event.name}(${JSON.stringify(event.input)})]`)
break
case 'tool_result':
console.log(`[Result: ${event.output.substring(0, 100)}...]`)
break
case 'error':
console.error(`\nAgent error: ${event.message}`)
break
}
}
console.log()
}
run(process.argv[2] || 'Read package.json and tell me about this project')
The Claude Agent SDK gives you a streaming event loop. You get granular events — text chunks, tool calls, tool results, errors — and you decide what to do with each one. The maxTurns parameter caps the agent loop so you don't burn money on runaway agents.
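Because the result is an async iterable, collecting the final answer into a string is a short loop you can reuse. The sketch below assumes only the event shape shown above ({ type: 'text', text }) and stands in a local mock stream for agent.run()'s result:

```typescript
// Assumes the event shapes from the switch statement above.
type AgentEvent =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; name: string; input: unknown }
  | { type: 'tool_result'; output: string }
  | { type: 'error'; message: string }

// Drain the stream, keeping only the assistant's text chunks.
async function collectText(events: AsyncIterable<AgentEvent>): Promise<string> {
  let output = ''
  for await (const event of events) {
    if (event.type === 'text') output += event.text
  }
  return output
}

// Local mock standing in for agent.run()'s result, for illustration.
async function* mockEvents(): AsyncGenerator<AgentEvent> {
  yield { type: 'tool_use', name: 'read_file', input: { path: 'package.json' } }
  yield { type: 'tool_result', output: '{"name":"demo"}' }
  yield { type: 'text', text: 'This project ' }
  yield { type: 'text', text: 'is called demo.' }
}
```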
OpenAI Agents SDK: runner pattern
// src/openai-agent.ts (continued)
import { Runner } from 'openai-agents'
const agent = new Agent({
name: 'coding-assistant',
model: 'gpt-4o',
instructions: `You are a coding assistant. You can read files and search codebases to help developers understand and improve their code. Be concise and specific.`,
tools: [readFileTool, searchCodeTool],
})
async function run(userMessage: string): Promise<void> {
const runner = new Runner()
const result = await runner.run(agent, userMessage, {
maxTurns: 10,
})
// Streaming via event callbacks
result.on('agent_text', (text: string) => {
process.stdout.write(text)
})
result.on('tool_call', (name: string, args: string) => {
console.log(`\n[Tool: ${name}(${args})]`)
})
result.on('tool_output', (output: string) => {
console.log(`[Result: ${output.substring(0, 100)}...]`)
})
await result.completed()
console.log()
}
run(process.argv[2] || 'Read package.json and tell me about this project')
OpenAI separates the Agent (configuration) from the Runner (execution). This is a deliberate design choice — the same agent definition can be run with different runner configurations. The streaming is event-emitter based rather than async-iterator based.
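The payoff of that separation is that one agent definition can be executed under different runner configurations. The sketch below uses local stub types, not the real SDK classes, purely to illustrate the shape of the split:

```typescript
// Local stubs illustrating the agent/runner split. These are
// hypothetical shapes, not the real openai-agents types.
interface AgentConfig { name: string; model: string; instructions: string }
interface RunnerOptions { maxTurns: number }

class StubRunner {
  constructor(private options: RunnerOptions) {}
  run(agent: AgentConfig, message: string): string {
    // A real runner would drive the model loop; here we just
    // record which config executed which agent.
    return `${agent.name}@${agent.model} (maxTurns=${this.options.maxTurns}): ${message}`
  }
}

const assistant: AgentConfig = {
  name: 'coding-assistant',
  model: 'gpt-4o',
  instructions: 'Be concise.',
}

// The same agent definition, executed under two runner configurations.
const quick = new StubRunner({ maxTurns: 2 }).run(assistant, 'hi')
const thorough = new StubRunner({ maxTurns: 20 }).run(assistant, 'hi')
```

This is handy in tests: a cheap, low-turn runner for CI and a generous one in production, with the agent itself untouched.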
Agent handoffs: the killer feature gap
This is where the comparison gets interesting. Both SDKs support agent handoffs — one agent delegating to another — but the implementations reflect different philosophies.
Claude Agent SDK handoffs
// src/claude-handoff.ts
import { Agent, Tool } from '@anthropic-ai/claude-agent-sdk'
import { z } from 'zod'
const reviewerAgent = new Agent({
model: 'claude-sonnet-4-6',
system: 'You review code for bugs, security issues, and style problems.',
tools: [readFileTool],
})
const mainAgent = new Agent({
model: 'claude-sonnet-4-6',
system: 'You are a coding assistant. Delegate code review tasks to the reviewer.',
tools: [readFileTool, searchCodeTool],
handoffs: [
{
agent: reviewerAgent,
description: 'Hand off to the code reviewer for detailed code review',
filter: z.object({
filePath: z.string().describe('The file to review'),
focus: z.string().optional().describe('What to focus the review on'),
}),
},
],
})
Claude's SDK treats handoffs as typed, schema-validated transitions. The filter parameter lets you control exactly what context gets passed to the child agent. This is important — without it, you're dumping the entire conversation into the next agent and paying for all those tokens again.
OpenAI Agents SDK handoffs
// src/openai-handoff.ts
import { Agent, handoff } from 'openai-agents'
const reviewerAgent = new Agent({
name: 'code-reviewer',
model: 'gpt-4o',
instructions: 'You review code for bugs, security issues, and style problems.',
tools: [readFileTool],
})
const mainAgent = new Agent({
name: 'coding-assistant',
model: 'gpt-4o',
instructions: `You are a coding assistant. Delegate code review tasks to the reviewer.`,
tools: [readFileTool, searchCodeTool],
handoffs: [
handoff(reviewerAgent, {
description: 'Hand off to the code reviewer for detailed code review',
}),
],
})
OpenAI's approach is simpler — handoff() is a function call, not a schema definition. There's no built-in context filtering; the full conversation transfers. If you need to filter context, you do it manually in a wrapper.
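Since the full conversation transfers, trimming history is on you. A minimal sketch of that manual filter, using a hypothetical message shape (adjust to whatever your SDK actually exposes):

```typescript
// Hypothetical message shape; adjust to your SDK's actual types.
interface Message {
  role: 'system' | 'user' | 'assistant' | 'tool'
  content: string
}

// Keep the system prompt plus only the last N non-system messages,
// so the child agent doesn't re-pay for the whole transcript.
function filterContext(history: Message[], keepLast: number): Message[] {
  const system = history.filter((m) => m.role === 'system')
  const rest = history.filter((m) => m.role !== 'system')
  return [...system, ...rest.slice(-keepLast)]
}

// Example: a four-message history trimmed to system + last two turns.
const history: Message[] = [
  { role: 'system', content: 'You are a coding assistant.' },
  { role: 'user', content: 'q1' },
  { role: 'assistant', content: 'a1' },
  { role: 'user', content: 'q2' },
]
const trimmed = filterContext(history, 2)
```

Call this in a wrapper before invoking the child agent and you get roughly what Claude's filter gives you declaratively.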
My take: Claude's typed handoffs are better for production systems where you want strict control over agent boundaries. OpenAI's are faster to prototype with. Pick based on where you are in the build cycle.
Debugging and tracing
When your agent does something unexpected (and it will), tracing is everything.
Claude Agent SDK
// src/claude-traced.ts
const result = await agent.run(userMessage, {
trace: true,
})
// After completion, inspect the full trace
const trace = result.getTrace()
console.log(JSON.stringify(trace, null, 2))
// {
// turns: [
// { role: 'user', content: '...' },
// { role: 'assistant', content: '...', tool_calls: [...] },
// { role: 'tool', results: [...] },
// ...
// ],
// tokenUsage: { input: 1523, output: 487 },
// duration: 3200
// }
The Claude SDK gives you a structured trace object with token counts and timing per turn. You can serialize the whole thing to JSON and send it to your logging pipeline.
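Given a trace shaped like the sample above, a small helper can turn it into a one-line log entry. The field names below follow that sample, so treat them as assumptions and verify against the SDK's actual trace type:

```typescript
// Field names follow the sample trace output above; verify them
// against the SDK's actual trace type before relying on this.
interface AgentTrace {
  turns: Array<{ role: string }>
  tokenUsage: { input: number; output: number }
  duration: number // milliseconds
}

// Collapse a trace into a single log-friendly summary line.
function summarizeTrace(trace: AgentTrace): string {
  const { input, output } = trace.tokenUsage
  return (
    `${trace.turns.length} turns, ` +
    `${input + output} tokens (${input} in / ${output} out), ` +
    `${(trace.duration / 1000).toFixed(1)}s`
  )
}

// The sample numbers from the trace above, as test data.
const sample: AgentTrace = {
  turns: [{ role: 'user' }, { role: 'assistant' }, { role: 'tool' }],
  tokenUsage: { input: 1523, output: 487 },
  duration: 3200,
}
```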
OpenAI Agents SDK
// src/openai-traced.ts
const result = await runner.run(agent, userMessage, {
maxTurns: 10,
tracing: true,
})
await result.completed()
// Traces go to OpenAI's dashboard by default
// For local access:
const steps = result.getSteps()
for (const step of steps) {
console.log(`${step.type}: ${step.model} (${step.tokens.total} tokens)`)
}
OpenAI's tracing integrates with their hosted dashboard out of the box, which is nice if you're already in their ecosystem. For local debugging, the getSteps() API gives you similar data.
Gotcha: OpenAI's tracing sends data to their servers by default. If you're working with sensitive codebases, explicitly disable remote tracing with tracing: { remote: false }.
Error handling patterns
Both SDKs need you to think about three failure modes: API errors, tool execution errors, and agent loops that never terminate.
// src/error-handling.ts
// Same guardrail pattern applies to either SDK; the loop below uses the
// Claude-style async iterator, so swap in event callbacks for OpenAI
async function runWithGuardrails(userMessage: string): Promise<string> {
const controller = new AbortController()
const timeout = setTimeout(() => controller.abort(), 30_000)
try {
const result = await agent.run(userMessage, {
maxTurns: 10,
signal: controller.signal,
})
let output = ''
for await (const event of result) {
if (event.type === 'text') {
output += event.text
}
}
return output
} catch (err) {
if ((err as Error).name === 'AbortError') {
return 'Agent timed out after 30 seconds'
}
if ((err as Error).message.includes('rate_limit')) {
// Both SDKs throw on 429s — add your retry logic here
// (this naive recursion never caps attempts; bound it in production)
await new Promise((r) => setTimeout(r, 5000))
return runWithGuardrails(userMessage)
}
throw err
} finally {
clearTimeout(timeout)
}
}
Both SDKs support AbortController for cancellation. Both throw on rate limits rather than retrying automatically. If you need retry logic, you're building it yourself (or check out my post on handling AI API rate limits in production).
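A provider-agnostic sketch of that retry logic, with exponential backoff. Detecting rate limits by message substring matches the snippet above, but it's an assumption; real SDKs usually expose a typed error class or status code, which you should prefer:

```typescript
// Generic retry-with-backoff wrapper. Matching 'rate_limit' in the
// error message mirrors the guardrail snippet above; prefer your SDK's
// typed error class or status code when available.
async function withRetry<T>(
  fn: () => Promise<T>,
  { attempts = 3, baseDelayMs = 1000 } = {}
): Promise<T> {
  let lastError: unknown
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn()
    } catch (err) {
      lastError = err
      const message = err instanceof Error ? err.message : String(err)
      if (!message.includes('rate_limit')) throw err // only retry 429s
      // Exponential backoff: 1s, 2s, 4s, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i))
    }
  }
  throw lastError
}
```

Usage: `await withRetry(() => agent.run(userMessage))` with either SDK, tuning attempts and base delay to your rate limits.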
Head-to-head comparison
Here's what I found after building the same assistant in both:
| Feature | Claude Agent SDK | OpenAI Agents SDK |
|---|---|---|
| Tool definitions | Zod schemas, return objects | Zod schemas, return strings |
| Agent loop | Async iterator (streaming events) | Event emitter pattern |
| Handoffs | Typed with context filtering | Simple function call, full context |
| Tracing | Local-first structured traces | Dashboard-first, local optional |
| Cancellation | AbortController | AbortController |
| Max turns | Agent-level config | Runner-level config |
| Model lock-in | Claude models only | OpenAI models only |
| TypeScript types | Excellent, inferred from Zod | Good, some manual typing needed |
| Bundle size | ~45KB | ~38KB |
When to use which
Pick Claude Agent SDK if:
- You're already using Claude models (obviously)
- You need typed handoffs with context filtering for multi-agent systems
- You want local-first tracing without data leaving your infrastructure
- Your tools return complex objects and you don't want to stringify everything
Pick OpenAI Agents SDK if:
- You're in the OpenAI ecosystem and want dashboard integration
- You prefer the runner/agent separation for testing different configurations
- You're prototyping and want the simplest possible handoff pattern
- You need GPT-4o's specific strengths (vision tasks, for example)
Pick neither if:
- You need to switch between providers. Both SDKs lock you into their respective model families. If provider flexibility matters, use the Vercel AI SDK instead — it wraps both providers with a unified tool-calling interface.
The honest take
Both SDKs are good. Both are under active development and APIs will change. The differences are real but not dramatic — if you already know which model provider you prefer, just use their SDK.
The biggest gap I see: neither SDK has a great story for testing. Mocking the agent loop for unit tests requires significant setup in both cases. If you're building production agents, plan to invest time in your test harness (see my post on testing AI features).
If I were starting a new agent project today and didn't have a provider preference, I'd go with the Claude Agent SDK. The typed handoffs and object returns feel more production-ready. But I wouldn't fight anyone who picked OpenAI's — especially if they want that hosted tracing dashboard.
What's next
If you want to build something more ambitious with these SDKs, check out the next post in the series: Build an Agentic RAG Pipeline That Retries and Reformulates Queries — where we use the agent loop pattern to build a retrieval system that's smarter than naive vector search.