Build a Multi-Step AI Agent with Tool Use in TypeScript

Friday 20/02/2026


You want your AI to do more than answer questions. You want it to look things up, call APIs, run calculations, and chain those results together to solve real problems. But Claude doesn't have access to your database, your APIs, or the current weather — unless you give it tools. The concept sounds simple, but building a reliable agent loop that handles multiple tool calls, errors, and conversation state is where most tutorials fall apart.

Here's how to build an AI agent with tool use in TypeScript that actually works — a complete agent loop that lets Claude call your functions, process the results, and decide what to do next, step by step.

How Claude's tool use works

Before writing code, here's the mental model. When you send Claude a message with a list of available tools, it doesn't call functions directly. Instead, it responds with a tool_use content block — basically saying "I'd like to call this function with these arguments." Your code runs the function, sends the result back, and Claude continues reasoning. This back-and-forth loop is the agent.

The flow looks like this:

1. You send a user message + tool definitions
2. Claude responds with either text or a tool_use request
3. Your code executes the tool and returns the result
4. Claude sees the result and either responds with text or requests another tool
5. Repeat until Claude responds with just text (no more tool calls)
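
To make the flow concrete, here is roughly what the conversation history looks like after one tool round-trip. This is a sketch: the types are simplified local stand-ins rather than the SDK's own, and the ID and weather values are made up for illustration. The helper findUnansweredToolUses is hypothetical and only shows how each tool_result pairs with a tool_use through its ID.

```typescript
// Simplified stand-ins for the SDK's content block types (illustration only).
type SketchBlock =
    | { type: 'text'; text: string }
    | { type: 'tool_use'; id: string; name: string; input: Record<string, unknown> }
    | { type: 'tool_result'; tool_use_id: string; content: string; is_error?: boolean }

interface SketchMessage {
    role: 'user' | 'assistant'
    content: string | SketchBlock[]
}

// Hypothetical history after one round-trip. The ID and values are made up.
const exampleHistory: SketchMessage[] = [
    // The user asks a question
    { role: 'user', content: 'What is the weather in Tokyo?' },
    // Claude replies with a tool_use block instead of a final answer
    {
        role: 'assistant',
        content: [
            { type: 'text', text: 'Let me check the current weather.' },
            {
                type: 'tool_use',
                id: 'toolu_example_1',
                name: 'get_weather',
                input: { location: 'Tokyo' },
            },
        ],
    },
    // Your code runs the tool and answers in a user message
    {
        role: 'user',
        content: [
            {
                type: 'tool_result',
                tool_use_id: 'toolu_example_1',
                content: 'Tokyo: 23°C, Sunny, humidity 60%',
            },
        ],
    },
]

// Hypothetical helper: list tool_use IDs that have no matching tool_result yet.
function findUnansweredToolUses(history: SketchMessage[]): string[] {
    const requested: string[] = []
    const answered: string[] = []
    for (const message of history) {
        if (typeof message.content === 'string') continue
        for (const block of message.content) {
            if (block.type === 'tool_use') requested.push(block.id)
            if (block.type === 'tool_result') answered.push(block.tool_use_id)
        }
    }
    return requested.filter((id) => answered.indexOf(id) === -1)
}
```

If you cut the history off after the assistant turn, the helper reports toolu_example_1 as pending, which is exactly the state in which your code owes Claude a result.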

Defining tools

Tools are JSON Schema descriptions of functions Claude can call. Let's build an agent that can search the web, get the current weather, and do math — enough to show multi-step reasoning.

// src/agent/tools.ts
import Anthropic from '@anthropic-ai/sdk'

export const tools: Anthropic.Tool[] = [
    {
        name: 'search_web',
        description:
            'Search the web for current information. Use this when you need up-to-date facts, news, or data.',
        input_schema: {
            type: 'object' as const,
            properties: {
                query: {
                    type: 'string',
                    description: 'The search query',
                },
            },
            required: ['query'],
        },
    },
    {
        name: 'get_weather',
        description:
            'Get the current weather for a location. Returns temperature in Celsius and conditions.',
        input_schema: {
            type: 'object' as const,
            properties: {
                location: {
                    type: 'string',
                    description: 'City name, e.g. "London" or "San Francisco, CA"',
                },
            },
            required: ['location'],
        },
    },
    {
        name: 'calculate',
        description:
            'Evaluate a mathematical expression. Use this for any calculations instead of doing math yourself.',
        input_schema: {
            type: 'object' as const,
            properties: {
                expression: {
                    type: 'string',
                    description:
                        'A mathematical expression to evaluate, e.g. "15 * 24 + 300"',
                },
            },
            required: ['expression'],
        },
    },
]

Gotcha: The description field matters more than you'd think. Claude uses it to decide when to call a tool. Vague descriptions like "does stuff with data" lead to wrong tool selections. Be specific about what the tool does and when to use it.
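
One way to catch weak definitions early is a small lint pass at startup. The sketch below is an assumption of mine, not part of the SDK: lintToolDefinitions is a hypothetical helper, ToolDefSketch is a local type that mirrors the shape used above, and the 40-character minimum is an arbitrary threshold you would tune yourself.

```typescript
// Minimal local shape mirroring the tool definitions above (not the SDK type).
interface ToolDefSketch {
    name: string
    description: string
    input_schema: {
        type: 'object'
        properties: Record<string, { type: string; description?: string }>
        required?: string[]
    }
}

// Hypothetical helper: returns human-readable warnings for suspicious definitions.
// The default minimum length is an arbitrary assumption, not an API rule.
function lintToolDefinitions(
    defs: ToolDefSketch[],
    minDescriptionLength = 40
): string[] {
    const warnings: string[] = []
    const seenNames: string[] = []

    for (const def of defs) {
        // Two tools with the same name make dispatch ambiguous
        if (seenNames.indexOf(def.name) !== -1) {
            warnings.push(`${def.name}: duplicate tool name`)
        }
        seenNames.push(def.name)

        // Very short descriptions rarely say when the tool should be used
        if (def.description.trim().length < minDescriptionLength) {
            warnings.push(`${def.name}: description is too short to be specific`)
        }

        // Every required key should be declared under properties
        for (const key of def.input_schema.required || []) {
            const declared = Object.prototype.hasOwnProperty.call(
                def.input_schema.properties,
                key
            )
            if (!declared) {
                warnings.push(`${def.name}: required key "${key}" is not in properties`)
            }
        }
    }
    return warnings
}
```

A length check cannot judge whether a description is actually good. It only flags the obviously thin ones, like "does stuff with data", before they reach production.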

Implementing tool handlers

Each tool needs an actual implementation. In a real app, search_web would hit a search API (like SerpAPI or Brave Search), and get_weather would call a weather API. Here's a working setup:

// src/agent/handlers.ts
interface ToolResult {
    result: string
    isError: boolean
}

type ToolInput = Record<string, string>

async function searchWeb(query: string): Promise<string> {
    // Replace with your preferred search API
    const response = await fetch(
        `https://api.search1api.com/search?query=${encodeURIComponent(query)}&max_results=3`,
        {
            headers: {
                Authorization: `Bearer ${process.env.SEARCH_API_KEY}`,
            },
        }
    )

    if (!response.ok) {
        throw new Error(`Search API error: ${response.status}`)
    }

    const data = (await response.json()) as {
        results: { title: string; description: string; url: string }[]
    }
    return data.results
        .map((r) => `${r.title}: ${r.description} (${r.url})`)
        .join('\n\n')
}

async function getWeather(location: string): Promise<string> {
    const response = await fetch(
        `https://wttr.in/${encodeURIComponent(location)}?format=j1`
    )

    if (!response.ok) {
        throw new Error(`Weather API error: ${response.status}`)
    }

    const data = (await response.json()) as {
        current_condition: {
            temp_C: string
            weatherDesc: { value: string }[]
            humidity: string
        }[]
    }
    const current = data.current_condition[0]
    return `${location}: ${current.temp_C}°C, ${current.weatherDesc[0].value}, humidity ${current.humidity}%`
}

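function calculate(expression: string): string {
    // Only digits, arithmetic operators, parentheses, dots, and whitespace are
    // allowed, so no identifiers can reach the Function constructor.
    // In production, use a proper math parser like mathjs
    const trimmed = expression.trim()
    if (trimmed === '' || /[^0-9+\-*/().%\s]/.test(trimmed)) {
        throw new Error(`Invalid characters in expression: ${expression}`)
    }
    const result = new Function(`return (${trimmed})`)() as number
    // Reject NaN and Infinity (e.g. division by zero) instead of returning them
    if (!Number.isFinite(result)) {
        throw new Error(`Expression did not produce a finite number: ${expression}`)
    }
    return String(result)
}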

export async function executeTool(
    name: string,
    input: ToolInput
): Promise<ToolResult> {
    try {
        switch (name) {
            case 'search_web': {
                const result = await searchWeb(input.query)
                return { result, isError: false }
            }
            case 'get_weather': {
                const result = await getWeather(input.location)
                return { result, isError: false }
            }
            case 'calculate': {
                const result = calculate(input.expression)
                return { result, isError: false }
            }
            default:
                return {
                    result: `Unknown tool: ${name}`,
                    isError: true,
                }
        }
    } catch (error) {
        return {
            result: error instanceof Error ? error.message : 'Tool execution failed',
            isError: true,
        }
    }
}

Notice the executeTool wrapper catches errors and returns them as ToolResult objects instead of throwing. This is important — you want Claude to see tool errors and reason about them ("the search failed, let me try a different query") rather than crashing your entire agent loop.
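
One gap worth noting: the handlers trust that Claude sent every required field as a string, because the input is only cast to Record<string, string>. A small runtime check closes that gap. The helper below is a hypothetical sketch (requireString is my own name, not an SDK function) and assumes all tool arguments are non-empty strings, as they are for the three tools in this post.

```typescript
// Hypothetical helper: read a required, non-empty string field from
// untrusted tool input. Throws a descriptive error that Claude can act on.
function requireString(input: unknown, key: string): string {
    if (typeof input !== 'object' || input === null) {
        const actual = input === null ? 'null' : typeof input
        throw new Error(`Tool input must be an object, got ${actual}`)
    }
    const value = (input as Record<string, unknown>)[key]
    if (typeof value !== 'string' || value.trim() === '') {
        throw new Error(`Missing or invalid "${key}": expected a non-empty string`)
    }
    return value
}
```

Inside executeTool you could then write searchWeb(requireString(input, 'query')). Because the call sits inside the existing try/catch, a missing argument comes back to Claude as an error result rather than crashing the loop.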

The agent loop

This is the core of the whole thing. The loop sends messages to Claude, checks if it wants to use tools, executes them, feeds results back, and repeats until Claude gives a final text response.

// src/agent/index.ts
import Anthropic from '@anthropic-ai/sdk'
import { tools } from './tools'
import { executeTool } from './handlers'

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY })

const MAX_ITERATIONS = 10

interface AgentResponse {
    text: string
    toolCalls: { name: string; input: Record<string, string> }[]
}

export async function runAgent(userMessage: string): Promise<AgentResponse> {
    const messages: Anthropic.MessageParam[] = [
        { role: 'user', content: userMessage },
    ]

    const allToolCalls: AgentResponse['toolCalls'] = []

    for (let i = 0; i < MAX_ITERATIONS; i++) {
        const response = await client.messages.create({
            model: 'claude-sonnet-4-5-20250929',
            max_tokens: 4096,
            system: 'You are a helpful assistant with access to tools. Use them when needed to answer the user\'s question accurately. Think step by step about which tools to use.',
            tools,
            messages,
        })

        // Check if Claude wants to use any tools
        const toolUseBlocks = response.content.filter(
            (block): block is Anthropic.ToolUseBlock =>
                block.type === 'tool_use'
        )

        // If no tool calls, we're done — extract the final text
        if (toolUseBlocks.length === 0) {
            const textBlocks = response.content.filter(
                (block): block is Anthropic.TextBlock =>
                    block.type === 'text'
            )
            return {
                text: textBlocks.map((b) => b.text).join('\n'),
                toolCalls: allToolCalls,
            }
        }

        // Claude wants to use tools — add its response to the conversation
        messages.push({ role: 'assistant', content: response.content })

        // Execute each tool and collect results
        const toolResults: Anthropic.ToolResultBlockParam[] = []

        for (const toolUse of toolUseBlocks) {
            const input = toolUse.input as Record<string, string>
            allToolCalls.push({ name: toolUse.name, input })

            console.log(`→ Calling tool: ${toolUse.name}`, input)
            const result = await executeTool(toolUse.name, input)
            console.log(`← Result: ${result.result.substring(0, 100)}...`)

            toolResults.push({
                type: 'tool_result',
                tool_use_id: toolUse.id,
                content: result.result,
                is_error: result.isError,
            })
        }

        // Send tool results back to Claude
        messages.push({ role: 'user', content: toolResults })
    }

    // If we hit the iteration limit, return any text from the last assistant
    // turn, or a fallback message when that turn contained only tool calls
    const lastAssistant = messages
        .filter((m) => m.role === 'assistant')
        .pop()

    const lastText =
        lastAssistant && Array.isArray(lastAssistant.content)
            ? (lastAssistant.content as Anthropic.ContentBlock[])
                  .filter(
                      (b): b is Anthropic.TextBlock => b.type === 'text'
                  )
                  .map((b) => b.text)
                  .join('\n')
            : ''

    return {
        text: lastText || 'Agent reached maximum iterations without a final response.',
        toolCalls: allToolCalls,
    }
}

Key design decisions in this loop:

MAX_ITERATIONS = 10 prevents runaway agents. Without this, a confused model could loop forever calling tools that don't help. Ten iterations is enough for complex multi-step tasks while keeping costs bounded.
Tool results go in a user message with tool_result content blocks. This is how Claude's API expects them — each result references the tool_use_id from the assistant's request.
The is_error flag tells Claude a tool failed. Claude handles this gracefully — it'll often try a different approach or explain to the user what happened.
We track all tool calls in allToolCalls so the caller can log or display what the agent did. Observability matters when your AI is making decisions.
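
One more decision you may want to make explicit: the loop treats "no tool_use blocks" as "done", but a response can also end because it ran out of max_tokens. The API reports why a response ended in the stop_reason field. The sketch below is a hypothetical helper (decideNextStep and the NextStep labels are my own names) that maps the stop reasons I'm confident about to an action, and treats anything else as a case to handle separately.

```typescript
// Labels for what the loop should do next (hypothetical names).
type NextStep = 'run_tools' | 'return_answer' | 'handle_truncation' | 'handle_other'

// Map a response's stop_reason to the next action in the agent loop.
// The API may return other values than the ones listed here, so the
// default branch keeps unknown reasons from being treated as success.
function decideNextStep(stopReason: string | null): NextStep {
    switch (stopReason) {
        case 'tool_use':
            // Claude is waiting for tool results
            return 'run_tools'
        case 'end_turn':
        case 'stop_sequence':
            // Claude finished its answer
            return 'return_answer'
        case 'max_tokens':
            // The response was cut off and may be incomplete
            return 'handle_truncation'
        default:
            return 'handle_other'
    }
}
```

In the loop, you would call decideNextStep(response.stop_reason) before returning the final text, and decide for yourself whether a truncated answer should be returned as-is, retried with a higher max_tokens, or reported as an error.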

Running it

Here's a simple script that demonstrates the agent handling a multi-step task:

// src/agent/demo.ts
import { runAgent } from './index'

async function main() {
    const result = await runAgent(
        'What\'s the weather like in Tokyo right now? If it\'s above 20°C, ' +
        'calculate how many degrees above 20 it is and search for ' +
        '"things to do in Tokyo in warm weather".'
    )

    console.log('\n=== Agent Response ===')
    console.log(result.text)
    console.log('\n=== Tool Calls Made ===')
    result.toolCalls.forEach((call, i) => {
        console.log(`${i + 1}. ${call.name}(${JSON.stringify(call.input)})`)
    })
}

main().catch(console.error)

This prompt requires the agent to:

1. Call get_weather for Tokyo
2. Check the temperature and make a decision
3. Conditionally call calculate and search_web

Claude handles the conditional logic itself — you don't need if/else in your code. The agent might make 1 tool call or 3, depending on the weather data it gets back.

Parallel tool calls

Claude can request multiple tools in a single response. When it returns two tool_use blocks at once, it wants both results before continuing. The agent loop above already handles this — it iterates over all toolUseBlocks and sends all results back together.

If your tools are independent (like searching and getting weather at the same time), you can speed things up by running them in parallel:

// src/agent/parallel-execution.ts
import { executeTool } from './handlers'
import Anthropic from '@anthropic-ai/sdk'

export async function executeToolsInParallel(
    toolUseBlocks: Anthropic.ToolUseBlock[]
): Promise<Anthropic.ToolResultBlockParam[]> {
    const results = await Promise.allSettled(
        toolUseBlocks.map(async (toolUse) => {
            const input = toolUse.input as Record<string, string>
            const result = await executeTool(toolUse.name, input)
            return {
                type: 'tool_result' as const,
                tool_use_id: toolUse.id,
                content: result.result,
                is_error: result.isError,
            }
        })
    )

    return results.map((result, index) => {
        if (result.status === 'fulfilled') {
            return result.value
        }
        return {
            type: 'tool_result' as const,
            tool_use_id: toolUseBlocks[index].id,
            content: `Tool execution failed: ${result.reason}`,
            is_error: true,
        }
    })
}

Promise.allSettled is the safer choice here. executeTool already catches its own errors, so rejections should be rare, but if one ever slips through, Promise.all would reject and discard the results of every other tool. Claude needs a tool_result block for every tool_use block it sent. If you skip one, the API will reject your request.
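
To see the difference between the two in isolation, here is a small sketch with fake tools. Nothing in it touches the API: fakeTool and compareAllVsAllSettled are hypothetical names, and the labels and delays are made up.

```typescript
// A fake tool run that resolves or rejects after a short delay.
function fakeTool(label: string, shouldFail: boolean): Promise<string> {
    return new Promise<string>((resolve, reject) => {
        setTimeout(() => {
            if (shouldFail) reject(new Error(`${label} failed`))
            else resolve(`${label} ok`)
        }, 10)
    })
}

// Run the same pair of tools through Promise.all and Promise.allSettled.
async function compareAllVsAllSettled(): Promise<{
    allOutcome: string
    settledStatuses: string[]
}> {
    let allOutcome = ''
    try {
        // Promise.all rejects as soon as any input rejects,
        // so the successful search result is lost
        const values = await Promise.all([
            fakeTool('search', false),
            fakeTool('weather', true),
        ])
        allOutcome = `resolved with ${values.length} results`
    } catch (error) {
        allOutcome = `rejected: ${error instanceof Error ? error.message : String(error)}`
    }

    // Promise.allSettled always resolves with one entry per input, in order
    const settled = await Promise.allSettled([
        fakeTool('search', false),
        fakeTool('weather', true),
    ])

    return { allOutcome, settledStatuses: settled.map((s) => s.status) }
}
```

With Promise.all you end up in the catch branch with only the weather error. With Promise.allSettled you get one entry per tool, in the same order as the input, which is what lets the code above map each outcome back to its tool_use_id by index.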

Error handling in the agent loop

Real agents hit errors. APIs go down, tools return garbage, Claude asks for a tool that doesn't exist. Here's what to watch for.

Tool execution errors are already handled — our executeTool wrapper catches them and returns isError: true. Claude sees the error message and adapts.

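Errors from the Anthropic API itself (rate limits, overloaded responses, dropped connections) need retry logic. The SDK already retries some of these a limited number of times (see its maxRetries client option), but a wrapper gives you control over backoff and logging. If you've read my post on handling AI API rate limits and errors in production, you already have the withRetry utility. Wrap the client.messages.create call with it: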

// Replace the bare API call with retry-wrapped version
import { withRetry } from '../lib/retry'

const response = await withRetry(() =>
    client.messages.create({
        model: 'claude-sonnet-4-5-20250929',
        max_tokens: 4096,
        system: 'You are a helpful assistant with access to tools.',
        tools,
        messages,
    })
)

Infinite loops are the sneaky one. Sometimes Claude calls the same tool with the same arguments over and over. The MAX_ITERATIONS cap helps, but you can also detect repeated calls:

// src/agent/loop-detection.ts
export function detectLoop(
    toolCalls: { name: string; input: Record<string, string> }[]
): boolean {
    if (toolCalls.length < 3) return false

    const recent = toolCalls.slice(-3)
    const keys = recent.map(
        (c) => `${c.name}:${JSON.stringify(c.input)}`
    )
    return keys[0] === keys[1] && keys[1] === keys[2]
}

If detectLoop returns true, inject a message telling Claude to try a different approach or give a final answer with what it has.
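
Here is one way that injection could look. This is a sketch under assumptions: buildUserTurn is a hypothetical helper, the types are local stand-ins for the SDK's block types, and the wording of the nudge is only an example. The one rule it relies on is that tool_result blocks must come first in the user message, with any extra text after them.

```typescript
// Local stand-ins for the SDK's block types (illustration only).
interface ToolResultSketch {
    type: 'tool_result'
    tool_use_id: string
    content: string
    is_error?: boolean
}

type UserTurnBlock = ToolResultSketch | { type: 'text'; text: string }

// Hypothetical helper: build the content of the next user message.
// When a loop was detected, append a text block AFTER the tool results.
function buildUserTurn(
    toolResults: ToolResultSketch[],
    loopDetected: boolean
): UserTurnBlock[] {
    if (!loopDetected) return toolResults

    const nudge: UserTurnBlock = {
        type: 'text',
        text:
            'You have called the same tool with the same arguments several times. ' +
            'Try a different approach, or give your best final answer with what you have.',
    }
    return [...toolResults, nudge]
}
```

In the agent loop, the final push would become messages.push({ role: 'user', content: buildUserTurn(toolResults, detectLoop(allToolCalls)) }). The tool results still go back, so the request stays valid, and Claude sees the nudge in the same turn.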

Production considerations

A few things to think about before shipping an agent to users:

Cost control. Each iteration of the agent loop is a separate API call. A 5-step agent with claude-sonnet-4-5-20250929 might cost $0.05-0.15 per run depending on context size. Multiply that by 1,000 users and you need the token budgeting from the rate limits post. Set a max_tokens that makes sense for your use case — 4096 is generous for most tool-use scenarios.
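
To put a number on a run, you can add up the usage object that each response carries (input_tokens and output_tokens) and multiply by your model's per-token prices. The sketch below is an estimate only: estimateRunCostUsd is a hypothetical helper, the types are local stand-ins, and prices are passed in as parameters because you should take them from the current pricing page rather than from this post.

```typescript
// Local stand-in for the usage object on each API response.
interface UsageSketch {
    input_tokens: number
    output_tokens: number
}

// Prices in USD per million tokens. Supply the real values for your model.
interface PricePerMillion {
    inputUsd: number
    outputUsd: number
}

// Hypothetical helper: estimate the cost of one agent run from the
// usage reported by every iteration of the loop.
function estimateRunCostUsd(usages: UsageSketch[], price: PricePerMillion): number {
    let inputTokens = 0
    let outputTokens = 0
    for (const usage of usages) {
        inputTokens += usage.input_tokens
        outputTokens += usage.output_tokens
    }
    return (inputTokens / 1e6) * price.inputUsd + (outputTokens / 1e6) * price.outputUsd
}
```

Keep in mind that every iteration resends the whole conversation, so input tokens grow with each step. A run with many tool calls costs more than the per-call average suggests, which is another reason to keep tool results short.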

Streaming. The agent loop above waits for the full response before checking for tool calls. If you want to show partial text to users while the agent thinks, you'll need to use Claude's streaming API and buffer tool_use blocks as they arrive. I covered the streaming setup in how to stream Claude API responses in Next.js.

Logging. Log every tool call, every result, and every Claude response. When an agent does something weird in production, you need the full conversation history to debug it. The allToolCalls array in our implementation is a start, but consider sending structured logs to your monitoring system.

Timeouts. A multi-step agent can take 10-30 seconds for complex queries. Set appropriate timeouts on your HTTP endpoints, and show users a progress indicator. Nobody will wait for a blank screen.
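
The same idea applies to individual tools: one slow API should not stall the whole run. A minimal sketch, assuming a plain Promise.race is good enough for your case, is below. withTimeout is a hypothetical helper, and note that it only stops waiting. It does not cancel the underlying request, which would need an AbortController.

```typescript
// Hypothetical helper: reject if `work` does not settle within `ms` milliseconds.
// The underlying work keeps running; this only stops the caller from waiting.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
    let timer: ReturnType<typeof setTimeout> | undefined

    const timeout = new Promise<never>((_resolve, reject) => {
        timer = setTimeout(() => {
            reject(new Error(`${label} timed out after ${ms}ms`))
        }, ms)
    })

    // Whichever settles first wins; always clear the timer afterwards
    return Promise.race([work, timeout]).finally(() => {
        if (timer !== undefined) clearTimeout(timer)
    })
}
```

Wrapping a handler call, for example withTimeout(getWeather(input.location), 5000, 'get_weather'), inside executeTool's try block means a timeout becomes an ordinary error result that Claude can react to. The 5000 ms value is only an example.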

What's next

This agent pattern is the foundation for more complex AI features. Once you have the loop working, you can add tools for anything — database queries, file operations, API integrations, even other AI models.

Next up, I'll cover how to build an AI-powered form that extracts data from PDFs — a practical use case where Claude's vision capabilities and tool use come together to turn unstructured documents into structured data.

If you're building agents that use external data, check out building a RAG chatbot in 100 lines of TypeScript — you can turn the RAG pipeline into a tool that your agent calls when it needs to search your documents.

Vadim Alakhverdov
