How to Stream Claude API Responses in a Next.js App (With Full Code)
Monday 09/02/2026 · 8 min read

You've built a chatbot UI, hooked it up to Claude, and hit send — only to stare at a blank screen for 10 seconds while the model generates its entire response before anything shows up. Your users think the app is broken. You know the fix is streaming, but every tutorial you find is either for OpenAI or glosses over the Next.js-specific parts.
Here's how to stream Claude API responses token-by-token in a Next.js app, from the API route to the React component. Everything below is working code you can copy into your project.
What you'll need
Before we start, install the Anthropic TypeScript SDK:
pnpm add @anthropic-ai/sdk
You'll also need an API key from console.anthropic.com. Add it to your .env.local:
# .env.local
ANTHROPIC_API_KEY=sk-ant-...
The API route: streaming with Server-Sent Events
The core idea is simple: your Next.js API route calls Claude with streaming enabled, then forwards each text chunk to the browser as a Server-Sent Event. The browser reads these chunks and renders them incrementally.
Here's the API route:
// src/pages/api/chat.ts
import type { NextApiRequest, NextApiResponse } from 'next'
import Anthropic from '@anthropic-ai/sdk'
const client = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY,
})
export default async function handler(req: NextApiRequest, res: NextApiResponse) {
if (req.method !== 'POST') {
return res.status(405).json({ error: 'Method not allowed' })
}
const { message } = req.body
if (!message || typeof message !== 'string') {
return res.status(400).json({ error: 'Missing or invalid message' })
}
res.writeHead(200, {
'Content-Type': 'text/event-stream',
'Cache-Control': 'no-cache, no-transform',
Connection: 'keep-alive',
})
try {
const stream = client.messages.stream({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 1024,
messages: [{ role: 'user', content: message }],
})
for await (const event of stream) {
if (
event.type === 'content_block_delta' &&
event.delta.type === 'text_delta'
) {
res.write(`data: ${JSON.stringify({ text: event.delta.text })}\n\n`)
}
}
res.write('data: [DONE]\n\n')
res.end()
} catch (error) {
const errorMessage =
error instanceof Anthropic.APIError
? `Claude API error: ${error.status} - ${error.message}`
: 'An unexpected error occurred'
res.write(`data: ${JSON.stringify({ error: errorMessage })}\n\n`)
res.end()
}
}
A few things to note:
- client.messages.stream() is the high-level streaming method from the Anthropic SDK. It returns an async iterable that yields typed events — you don't need to parse raw SSE yourself on the server side.
- content_block_delta with text_delta is the event type that carries the actual generated text. There are other event types (like message_start and message_stop) but we only care about the text chunks.
- Error handling matters. The Anthropic.APIError class gives you structured error info — status codes, rate limit details, etc. Don't just catch (e) and call it a day.
- The [DONE] sentinel tells the client the stream is finished. This is a convention borrowed from OpenAI's SSE format, and it works well.
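If you'd rather not match on event types yourself, the SDK also exposes event-emitter helpers on the same stream object. Here's a sketch of the equivalent handler body using stream.on('text', ...); the for await loop above does the same job, so this is purely a matter of taste (message and res come from the handler above):

// Equivalent to the for await loop, using the SDK's event helpers
const stream = client.messages.stream({
  model: 'claude-sonnet-4-5-20250929',
  max_tokens: 1024,
  messages: [{ role: 'user', content: message }],
})

// Fires once per generated text chunk
stream.on('text', (textDelta) => {
  res.write(`data: ${JSON.stringify({ text: textDelta })}\n\n`)
})

// Resolves once the full message has been received (rejects on error or abort)
await stream.finalMessage()
res.write('data: [DONE]\n\n')
res.end()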
Gotcha: don't use Edge Runtime for this
You might be tempted to use Next.js Edge Runtime (export const config = { runtime: 'edge' }) for lower latency. It works, but there's a catch — the Edge Runtime doesn't support the full Node.js stream module. The Anthropic SDK works fine in Edge, but if you later need to do anything with Node streams (logging, piping to a file, etc.), you'll hit walls. Stick with the default Node.js runtime unless you have a specific reason not to.
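If you do end up needing the Edge Runtime (or the App Router, which expects the same Web-standard Request/Response shape), the route has to be written against the Web streams API instead of Node's res. A rough sketch, assuming an App Router file layout; the path and structure here are illustrative, so adapt the error handling to your setup:

// src/app/api/chat/route.ts — hypothetical Web-streams variant
import Anthropic from '@anthropic-ai/sdk'

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY })

export async function POST(req: Request) {
  const { message } = await req.json()
  const encoder = new TextEncoder()

  const claudeStream = client.messages.stream({
    model: 'claude-sonnet-4-5-20250929',
    max_tokens: 1024,
    messages: [{ role: 'user', content: message }],
  })

  // Forward each text chunk in the same data: {...}\n\n shape the client expects
  const body = new ReadableStream<Uint8Array>({
    async start(controller) {
      for await (const event of claudeStream) {
        if (
          event.type === 'content_block_delta' &&
          event.delta.type === 'text_delta'
        ) {
          controller.enqueue(
            encoder.encode(`data: ${JSON.stringify({ text: event.delta.text })}\n\n`)
          )
        }
      }
      controller.enqueue(encoder.encode('data: [DONE]\n\n'))
      controller.close()
    },
  })

  return new Response(body, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache, no-transform',
    },
  })
}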
The React component: reading the stream
On the client side, we use the browser's fetch API to make a POST request and read the response as a stream using getReader(). No extra libraries needed.
// src/components/ChatStream.tsx
import { useState, useCallback } from 'react'
interface ChatMessage {
role: 'user' | 'assistant'
content: string
}
export function ChatStream() {
const [messages, setMessages] = useState<ChatMessage[]>([])
const [input, setInput] = useState('')
const [isStreaming, setIsStreaming] = useState(false)
const sendMessage = useCallback(async () => {
if (!input.trim() || isStreaming) return
const userMessage: ChatMessage = { role: 'user', content: input }
setMessages((prev) => [...prev, userMessage])
setInput('')
setIsStreaming(true)
// Add an empty assistant message that we'll fill as chunks arrive
setMessages((prev) => [...prev, { role: 'assistant', content: '' }])
try {
const response = await fetch('/api/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ message: input }),
})
if (!response.ok) {
throw new Error(`HTTP ${response.status}`)
}
const reader = response.body?.getReader()
const decoder = new TextDecoder()
if (!reader) {
throw new Error('No response body')
}
let buffer = ''
while (true) {
const { done, value } = await reader.read()
if (done) break
buffer += decoder.decode(value, { stream: true })
const lines = buffer.split('\n\n')
buffer = lines.pop() || ''
for (const line of lines) {
  const data = line.replace(/^data: /, '')
  if (data === '[DONE]') break
  // Parse defensively: skip malformed chunks, but let a server-sent
  // error event propagate to the outer catch instead of swallowing it
  let parsed: { text: string } | { error: string } | undefined
  try {
    parsed = JSON.parse(data)
  } catch {
    continue
  }
  if (!parsed) continue
  if ('error' in parsed) {
    throw new Error(parsed.error)
  }
  const { text } = parsed
  // Append to the last assistant message without mutating state in place
  setMessages((prev) => {
    const last = prev[prev.length - 1]
    if (!last || last.role !== 'assistant') return prev
    return [...prev.slice(0, -1), { ...last, content: last.content + text }]
  })
}
}
} catch (error) {
setMessages((prev) => {
  const last = prev[prev.length - 1]
  if (!last || last.role !== 'assistant') return prev
  const content =
    error instanceof Error
      ? `Error: ${error.message}`
      : 'Something went wrong'
  // Replace the placeholder assistant message with the error text
  return [...prev.slice(0, -1), { ...last, content }]
})
} finally {
setIsStreaming(false)
}
}, [input, isStreaming])
return (
<div className="max-w-2xl mx-auto p-4">
<div className="space-y-4 mb-4">
{messages.map((msg, i) => (
<div
key={i}
className={`p-3 rounded-lg ${
msg.role === 'user'
? 'bg-blue-100 ml-12'
: 'bg-gray-100 mr-12'
}`}
>
<p className="text-sm font-medium mb-1">
{msg.role === 'user' ? 'You' : 'Claude'}
</p>
<p className="whitespace-pre-wrap">{msg.content}</p>
</div>
))}
</div>
<div className="flex gap-2">
<input
type="text"
value={input}
onChange={(e) => setInput(e.target.value)}
onKeyDown={(e) => e.key === 'Enter' && sendMessage()}
placeholder="Ask Claude something..."
className="flex-1 border rounded-lg px-4 py-2"
disabled={isStreaming}
/>
<button
onClick={sendMessage}
disabled={isStreaming}
className="bg-blue-500 text-white px-6 py-2 rounded-lg disabled:opacity-50"
>
{isStreaming ? '...' : 'Send'}
</button>
</div>
</div>
)
}
The important part is the buffer management. SSE chunks don't always arrive as complete data: ...\n\n lines — they can be split across multiple reads. The buffer variable accumulates partial data and only processes complete lines. Skip this step and you'll get random JSON parse errors in production.
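If you want to see the buffering behavior in isolation, here's a tiny standalone version of the same logic (the helper name is mine, not something the component above uses):

// Hypothetical helper: feed it raw decoded chunks, get back complete SSE frames
function createSSEBuffer() {
  let buffer = ''
  return (chunk: string): string[] => {
    buffer += chunk
    const frames = buffer.split('\n\n')
    buffer = frames.pop() || '' // keep the trailing partial frame for next time
    return frames
  }
}

// A frame split across two reads still comes out whole
const push = createSSEBuffer()
console.log(push('data: {"text":"Hel')) // []
console.log(push('lo"}\n\ndata: {"text":"!"}\n\n')) // ['data: {"text":"Hello"}', 'data: {"text":"!"}']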
Adding a system prompt and conversation history
The example above sends a single message, but a real chat app needs conversation history. Here's how to extend the API route:
// src/pages/api/chat.ts — updated body parsing
interface ChatRequestBody {
messages: Array<{ role: 'user' | 'assistant'; content: string }>
system?: string
}
// Inside the handler, replace the single message with:
const { messages: chatMessages, system } = req.body as ChatRequestBody
const stream = client.messages.stream({
model: 'claude-sonnet-4-5-20250929',
max_tokens: 1024,
system: system || 'You are a helpful assistant.',
messages: chatMessages,
})
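One thing the cast above skips is the validation the single-message version had. A minimal guard, placed before res.writeHead just like the earlier check, might look like this:

// Reject bad payloads before the SSE headers go out
if (
  !Array.isArray(chatMessages) ||
  chatMessages.length === 0 ||
  chatMessages.some((m) => typeof m?.content !== 'string')
) {
  return res.status(400).json({ error: 'Missing or invalid messages' })
}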
On the client side, send the full messages array instead of a single string:
// In ChatStream.tsx, update the fetch call
const response = await fetch('/api/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
messages: [...messages, userMessage].map(({ role, content }) => ({
role,
content,
})),
system: 'You are a helpful coding assistant.',
}),
})
Gotcha: token limits with long conversations
Claude has a context window (200K tokens for Sonnet), but you're paying for every token in every request. A conversation with 50 back-and-forth messages means you're sending all 50 messages each time. Two options:
- Truncate old messages — keep the last N messages and summarize older ones
- Use prompt caching — Anthropic's prompt caching feature lets you cache the system prompt and early messages so you only pay for them once. Check the docs for how to set cache breakpoints.
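Here's a rough sketch of both ideas combined. The 20-message cutoff is an arbitrary number I picked, and the cache_control block follows Anthropic's prompt caching format (see their docs for where cache breakpoints are allowed):

// Rough sketch: keep recent turns only, and mark the system prompt as cacheable
const MAX_MESSAGES = 20 // arbitrary cutoff; tune it for your app
// Note: make sure the truncated slice still starts with a user message
const recentMessages = chatMessages.slice(-MAX_MESSAGES)

const stream = client.messages.stream({
  model: 'claude-sonnet-4-5-20250929',
  max_tokens: 1024,
  // Passing the system prompt as a content block lets you attach a cache breakpoint
  system: [
    {
      type: 'text',
      text: system || 'You are a helpful assistant.',
      cache_control: { type: 'ephemeral' },
    },
  ],
  messages: recentMessages,
})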
Handling cancellation
Users will click away or hit stop mid-stream. If you don't handle this, you'll keep streaming tokens you're paying for but nobody's reading.
On the client side, use an AbortController:
// src/hooks/useStreamAbort.ts
import { useRef, useCallback } from 'react'
export function useStreamAbort() {
const controllerRef = useRef<AbortController | null>(null)
const startStream = useCallback((url: string, body: unknown) => {
controllerRef.current?.abort()
controllerRef.current = new AbortController()
return fetch(url, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(body),
signal: controllerRef.current.signal,
})
}, [])
const stopStream = useCallback(() => {
controllerRef.current?.abort()
controllerRef.current = null
}, [])
return { startStream, stopStream }
}
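Wiring the hook into ChatStream is mostly a matter of swapping the raw fetch for startStream and exposing a Stop button. A sketch of just the changed pieces:

// Inside ChatStream, assuming the hook above lives at src/hooks/useStreamAbort.ts
const { startStream, stopStream } = useStreamAbort()

// Replace the fetch('/api/chat', ...) call with:
const response = await startStream('/api/chat', { message: input })

// And render a Stop button next to Send:
<button onClick={stopStream} disabled={!isStreaming}>
  Stop
</button>

Keep in mind that aborting makes the in-flight fetch (or the pending reader.read()) reject with an AbortError, which the existing catch block will render as an error message, so you'll probably want to special-case it so a user-initiated stop doesn't look like a failure.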
On the API side, listen for the client disconnect:
// Inside the handler: after creating the stream, before the for await loop.
// Listen on the response; its 'close' event fires if the connection drops mid-stream
res.on('close', () => {
  if (!res.writableEnded) stream.controller.abort()
})
The stream.controller.abort() call tells the Anthropic SDK to cancel the underlying request, which stops further token generation. You still pay for the tokens generated up to that point, but not for the rest of the response. Listening on res rather than req matters here: on recent Node versions the request's own 'close' event can fire as soon as the body has been read, well before the response is finished.
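One related detail: when the abort fires mid-stream, the for await loop in the route rejects (the SDK surfaces the cancellation as an APIUserAbortError), and the generic catch block will then try to write an error event to a connection that's already gone. If you want to handle that quietly, a sketch that slots into the top of the existing catch block:

// A user-cancelled stream isn't an error worth reporting back
if (error instanceof Anthropic.APIUserAbortError) {
  res.end() // the client has disconnected; just close out
  return
}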
The full picture
Here's what the data flow looks like:
- User types a message and hits Enter
- React component sends a POST request to /api/chat
- API route creates a streaming request to Claude via the Anthropic SDK
- As Claude generates tokens, the SDK yields content_block_delta events
- The API route writes each text chunk as an SSE event
- The browser reads chunks via getReader() and updates the UI
- When Claude finishes (or the user cancels), the stream closes
No WebSockets, no third-party streaming libraries, no complicated infrastructure. Just HTTP, SSE, and the Anthropic SDK.
What's next
Streaming text is just the beginning. In an upcoming post, I'll cover how to build a multi-step AI agent with tool use in TypeScript — where Claude doesn't just generate text, but calls functions, queries databases, and takes actions based on the results. The streaming patterns from this post will be the foundation for that.