How to Stream Claude API Responses in a Next.js App (With Full Code)
Monday 09/02/2026 · 8 min read

You've built a chatbot UI, hooked it up to Claude, and hit send — only to stare at a blank screen for 10 seconds while the model generates its entire response before anything shows up. Your users think the app is broken. You know the fix is streaming, but every tutorial you find is either for OpenAI or glosses over the Next.js-specific parts.
Here's how to stream Claude API responses token-by-token in a Next.js app, from the API route to the React component. Everything below is working code you can copy into your project.
What you'll need
Before we start, install the Anthropic TypeScript SDK:
pnpm add @anthropic-ai/sdk
You'll also need an API key from console.anthropic.com. Add it to your .env.local:
# .env.local
ANTHROPIC_API_KEY=sk-ant-...
The API route: streaming with Server-Sent Events
The core idea is simple: your Next.js API route calls Claude with streaming enabled, then forwards each text chunk to the browser as a Server-Sent Event. The browser reads these chunks and renders them incrementally.
Here's the API route:
// src/pages/api/chat.ts
import type { NextApiRequest, NextApiResponse } from 'next'
import Anthropic from '@anthropic-ai/sdk'
const client = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY,
})
export default async function handler(req: NextApiRequest, res: NextApiResponse) {
if (req.method !== 'POST') {
return res.status(405).json({ error: 'Method not allowed' })
}
const { message } = req.body
if (!message || typeof message !== 'string') {
return res.status(400).json({ error: 'Missing or invalid message' })
}
res.writeHead(200, {
'Content-Type': 'text/event-stream',
'Cache-Control': 'no-cache, no-transform',
Connection: 'keep-alive',
})
try {
const stream = client.messages.stream({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 1024,
messages: [{ role: 'user', content: message }],
})
for await (const event of stream) {
if (
event.type === 'content_block_delta' &&
event.delta.type === 'text_delta'
) {
res.write(`data: ${JSON.stringify({ text: event.delta.text })}\n\n`)
}
}
res.write('data: [DONE]\n\n')
res.end()
} catch (error) {
const errorMessage =
error instanceof Anthropic.APIError
? `Claude API error: ${error.status} - ${error.message}`
: 'An unexpected error occurred'
res.write(`data: ${JSON.stringify({ error: errorMessage })}\n\n`)
res.end()
}
}
A few things to note:
- client.messages.stream() is the high-level streaming method from the Anthropic SDK. It returns an async iterable that yields typed events — you don't need to parse raw SSE yourself on the server side.
- content_block_delta with text_delta is the event type that carries the actual generated text. There are other event types (like message_start and message_stop) but we only care about the text chunks.
- Error handling matters. The Anthropic.APIError class gives you structured error info — status codes, rate limit details, etc. Don't just catch (e) and call it a day.
- The [DONE] sentinel tells the client the stream is finished. This is a convention borrowed from OpenAI's SSE format, and it works well.
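If you'd rather not match on event types yourself, the SDK also exposes event-emitter helpers on the same stream object. Here's a sketch of the equivalent handler body using stream.on('text', ...); the for await loop above does the same job, so this is purely a matter of taste (message and res come from the handler above):

// Equivalent to the for await loop, using the SDK's event helpers
const stream = client.messages.stream({
  model: 'claude-sonnet-4-5-20250929',
  max_tokens: 1024,
  messages: [{ role: 'user', content: message }],
})

// Fires once per generated text chunk
stream.on('text', (textDelta) => {
  res.write(`data: ${JSON.stringify({ text: textDelta })}\n\n`)
})

// Resolves once the full message has been received (rejects on error or abort)
await stream.finalMessage()
res.write('data: [DONE]\n\n')
res.end()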
Gotcha: don't use Edge Runtime for this
You might be tempted to use Next.js Edge Runtime (export const config = { runtime: 'edge' }) for lower latency. It works, but there's a catch — the Edge Runtime doesn't support the full Node.js stream module. The Anthropic SDK works fine in Edge, but if you later need to do anything with Node streams (logging, piping to a file, etc.), you'll hit walls. Stick with the default Node.js runtime unless you have a specific reason not to.
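If you do end up needing the Edge Runtime (or the App Router, which expects the same Web-standard Request/Response shape), the route has to be written against the Web streams API instead of Node's res. A rough sketch, assuming an App Router file layout; the path and structure here are illustrative, so adapt the error handling to your setup:

// src/app/api/chat/route.ts — hypothetical Web-streams variant
import Anthropic from '@anthropic-ai/sdk'

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY })

export async function POST(req: Request) {
  const { message } = await req.json()
  const encoder = new TextEncoder()

  const claudeStream = client.messages.stream({
    model: 'claude-sonnet-4-5-20250929',
    max_tokens: 1024,
    messages: [{ role: 'user', content: message }],
  })

  // Forward each text chunk in the same data: {...}\n\n shape the client expects
  const body = new ReadableStream<Uint8Array>({
    async start(controller) {
      for await (const event of claudeStream) {
        if (
          event.type === 'content_block_delta' &&
          event.delta.type === 'text_delta'
        ) {
          controller.enqueue(
            encoder.encode(`data: ${JSON.stringify({ text: event.delta.text })}\n\n`)
          )
        }
      }
      controller.enqueue(encoder.encode('data: [DONE]\n\n'))
      controller.close()
    },
  })

  return new Response(body, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache, no-transform',
    },
  })
}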
The React component: reading the stream
On the client side, we use the browser's fetch API to make a POST request and read the response as a stream using getReader(). No extra libraries needed.
// src/components/ChatStream.tsx
import { useState, useCallback } from 'react'
interface ChatMessage {
role: 'user' | 'assistant'
content: string
}
export function ChatStream() {
const [messages, setMessages] = useState<ChatMessage[]>([])
const [input, setInput] = useState('')
const [isStreaming, setIsStreaming] = useState(false)
const sendMessage = useCallback(async () => {
if (!input.trim() || isStreaming) return
const userMessage: ChatMessage = { role: 'user', content: input }
setMessages((prev) => [...prev, userMessage])
setInput('')
setIsStreaming(true)
// Add an empty assistant message that we'll fill as chunks arrive
setMessages((prev) => [...prev, { role: 'assistant', content: '' }])
try {
const response = await fetch('/api/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ message: input }),
})
if (!response.ok) {
throw new Error(`HTTP ${response.status}`)
}
const reader = response.body?.getReader()
const decoder = new TextDecoder()
if (!reader) {
throw new Error('No response body')
}
let buffer = ''
while (true) {
const { done, value } = await reader.read()
if (done) break
buffer += decoder.decode(value, { stream: true })
const lines = buffer.split('\n\n')
buffer = lines.pop() || ''
for (const line of lines) {
  const data = line.replace(/^data: /, '')
  if (data === '[DONE]') break
  // Parse defensively: skip malformed chunks, but let a server-sent
  // error event propagate to the outer catch instead of swallowing it
  let parsed: { text: string } | { error: string } | undefined
  try {
    parsed = JSON.parse(data)
  } catch {
    continue
  }
  if (!parsed) continue
  if ('error' in parsed) {
    throw new Error(parsed.error)
  }
  const { text } = parsed
  // Append to the last assistant message without mutating state in place
  setMessages((prev) => {
    const last = prev[prev.length - 1]
    if (!last || last.role !== 'assistant') return prev
    return [...prev.slice(0, -1), { ...last, content: last.content + text }]
  })
}
}
} catch (error) {
setMessages((prev) => {
  const last = prev[prev.length - 1]
  if (!last || last.role !== 'assistant') return prev
  const content =
    error instanceof Error
      ? `Error: ${error.message}`
      : 'Something went wrong'
  // Replace the placeholder assistant message with the error text
  return [...prev.slice(0, -1), { ...last, content }]
})
} finally {
setIsStreaming(false)
}
}, [input, isStreaming])
return (
<div className="max-w-2xl mx-auto p-4">
<div className="space-y-4 mb-4">
{messages.map((msg, i) => (
<div
key={i}
className={`p-3 rounded-lg ${
msg.role === 'user'
? 'bg-blue-100 ml-12'
: 'bg-gray-100 mr-12'
}`}
>
<p className="text-sm font-medium mb-1">
{msg.role === 'user' ? 'You' : 'Claude'}
</p>
<p className="whitespace-pre-wrap">{msg.content}</p>
</div>
))}
</div>
<div className="flex gap-2">
<input
type="text"
value={input}
onChange={(e) => setInput(e.target.value)}
onKeyDown={(e) => e.key === 'Enter' && sendMessage()}
placeholder="Ask Claude something..."
className="flex-1 border rounded-lg px-4 py-2"
disabled={isStreaming}
/>
<button
onClick={sendMessage}
disabled={isStreaming}
className="bg-blue-500 text-white px-6 py-2 rounded-lg disabled:opacity-50"
>
{isStreaming ? '...' : 'Send'}
</button>
</div>
</div>
)
}
The important part is the buffer management. SSE chunks don't always arrive as complete data: ...\n\n lines — they can be split across multiple reads. The buffer variable accumulates partial data and only processes complete lines. Skip this step and you'll get random JSON parse errors in production.
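If you want to see the buffering behavior in isolation, here's a tiny standalone version of the same logic (the helper name is mine, not something the component above uses):

// Hypothetical helper: feed it raw decoded chunks, get back complete SSE frames
function createSSEBuffer() {
  let buffer = ''
  return (chunk: string): string[] => {
    buffer += chunk
    const frames = buffer.split('\n\n')
    buffer = frames.pop() || '' // keep the trailing partial frame for next time
    return frames
  }
}

// A frame split across two reads still comes out whole
const push = createSSEBuffer()
console.log(push('data: {"text":"Hel')) // []
console.log(push('lo"}\n\ndata: {"text":"!"}\n\n')) // ['data: {"text":"Hello"}', 'data: {"text":"!"}']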
Adding a system prompt and conversation history
The example above sends a single message, but a real chat app needs conversation history. Here's how to extend the API route:
// src/pages/api/chat.ts — updated body parsing
interface ChatRequestBody {
messages: Array<{ role: 'user' | 'assistant'; content: string }>
system?: string
}
// Inside the handler, replace the single message with:
const { messages: chatMessages, system } = req.body as ChatRequestBody
const stream = client.messages.stream({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 1024,
system: system || 'You are a helpful assistant.',
messages: chatMessages,
})
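One thing the cast above skips is the validation the single-message version had. A minimal guard, placed before res.writeHead just like the earlier check, might look like this:

// Reject bad payloads before the SSE headers go out
if (
  !Array.isArray(chatMessages) ||
  chatMessages.length === 0 ||
  chatMessages.some((m) => typeof m?.content !== 'string')
) {
  return res.status(400).json({ error: 'Missing or invalid messages' })
}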
On the client side, send the full messages array instead of a single string:
// In ChatStream.tsx, update the fetch call
const response = await fetch('/api/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
messages: [...messages, userMessage].map(({ role, content }) => ({
role,
content,
})),
system: 'You are a helpful coding assistant.',
}),
})
Gotcha: token limits with long conversations
Claude has a context window (200K tokens for Sonnet), but you're paying for every token in every request. A conversation with 50 back-and-forth messages means you're sending all 50 messages each time. Two options:
- Truncate old messages — keep the last N messages and summarize older ones
- Use prompt caching — Anthropic's prompt caching feature lets you cache the system prompt and early messages so you only pay for them once. Check the docs for how to set cache breakpoints.
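Here's a rough sketch of both ideas combined. The 20-message cutoff is an arbitrary number I picked, and the cache_control block follows Anthropic's prompt caching format (see their docs for where cache breakpoints are allowed):

// Rough sketch: keep recent turns only, and mark the system prompt as cacheable
const MAX_MESSAGES = 20 // arbitrary cutoff; tune it for your app
// Note: make sure the truncated slice still starts with a user message
const recentMessages = chatMessages.slice(-MAX_MESSAGES)

const stream = client.messages.stream({
  model: 'claude-sonnet-4-5-20250929',
  max_tokens: 1024,
  // Passing the system prompt as a content block lets you attach a cache breakpoint
  system: [
    {
      type: 'text',
      text: system || 'You are a helpful assistant.',
      cache_control: { type: 'ephemeral' },
    },
  ],
  messages: recentMessages,
})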
Handling cancellation
Users will click away or hit stop mid-stream. If you don't handle this, you'll keep streaming tokens you're paying for but nobody's reading.
On the client side, use an AbortController:
// src/hooks/useStreamAbort.ts
import { useRef, useCallback } from 'react'
export function useStreamAbort() {
const controllerRef = useRef<AbortController | null>(null)
const startStream = useCallback((url: string, body: unknown) => {
controllerRef.current?.abort()
controllerRef.current = new AbortController()
return fetch(url, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(body),
signal: controllerRef.current.signal,
})
}, [])
const stopStream = useCallback(() => {
controllerRef.current?.abort()
controllerRef.current = null
}, [])
return { startStream, stopStream }
}
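Wiring the hook into ChatStream is mostly a matter of swapping the raw fetch for startStream and exposing a Stop button. A sketch of just the changed pieces:

// Inside ChatStream, assuming the hook above lives at src/hooks/useStreamAbort.ts
const { startStream, stopStream } = useStreamAbort()

// Replace the fetch('/api/chat', ...) call with:
const response = await startStream('/api/chat', { message: input })

// And render a Stop button next to Send:
<button onClick={stopStream} disabled={!isStreaming}>
  Stop
</button>

Keep in mind that aborting makes the in-flight fetch (or the pending reader.read()) reject with an AbortError, which the existing catch block will render as an error message, so you'll probably want to special-case it so a user-initiated stop doesn't look like a failure.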
On the API side, listen for the client disconnect:
// Inside the handler: after creating the stream, before the for await loop.
// Listen on the response; its 'close' event fires if the connection drops mid-stream
res.on('close', () => {
  if (!res.writableEnded) stream.controller.abort()
})
The stream.controller.abort() call tells the Anthropic SDK to cancel the underlying request, which stops further token generation. You still pay for the tokens generated up to that point, but not for the rest of the response. Listening on res rather than req matters here: on recent Node versions the request's own 'close' event can fire as soon as the body has been read, well before the response is finished.
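One related detail: when the abort fires mid-stream, the for await loop in the route rejects (the SDK surfaces the cancellation as an APIUserAbortError), and the generic catch block will then try to write an error event to a connection that's already gone. If you want to handle that quietly, a sketch that slots into the top of the existing catch block:

// A user-cancelled stream isn't an error worth reporting back
if (error instanceof Anthropic.APIUserAbortError) {
  res.end() // the client has disconnected; just close out
  return
}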
The full picture
Here's what the data flow looks like:
- User types a message and hits Enter
- React component sends a POST request to /api/chat
- API route creates a streaming request to Claude via the Anthropic SDK
- As Claude generates tokens, the SDK yields content_block_delta events
- The API route writes each text chunk as an SSE event
- The browser reads chunks via getReader() and updates the UI
- When Claude finishes (or the user cancels), the stream closes
No WebSockets, no third-party streaming libraries, no complicated infrastructure. Just HTTP, SSE, and the Anthropic SDK.
What's next
Streaming text is just the beginning. In an upcoming post, I'll cover how to build a multi-step AI agent with tool use in TypeScript — where Claude doesn't just generate text, but calls functions, queries databases, and takes actions based on the results. The streaming patterns from this post will be the foundation for that.