Streaming AI UX in React: Handle Partial Markdown, Citations, and Error States
Friday 01/05/2026
Your chat UI works on the demo. It breaks in production. The first token streams in, the markdown half-parses into a mangled <h1>, then re-flows three times as more tokens arrive. A code block opens but never closes, so the whole rest of the message renders as <pre>. Citations appear as [1] floating in plain text. Then the network blips at token 200 and the user is staring at a half-finished sentence with no way to retry.
This post is about fixing all of that. Streaming AI UI in React looks easy until you actually ship it — partial markdown, mid-stream citations, dropped connections, and scroll behavior are where the real work hides. Here are the patterns I use, with copy-pasteable components that work with any streaming source (Anthropic, OpenAI, Vercel AI SDK, raw SSE — doesn't matter).
The streaming source: a provider-agnostic hook
Most AI SDKs ship their own React hooks. They're fine, but they couple your UI to one provider, and the rendering problems below are identical regardless of source. Start with a tiny hook that takes a stream of text chunks and exposes the accumulated text, status, and an abort handle.
pnpm add react react-markdown remark-gfm
// src/hooks/useChatStream.ts
import { useCallback, useRef, useState } from 'react'
export type StreamStatus = 'idle' | 'streaming' | 'done' | 'error'
export type ChatStreamState = {
text: string
status: StreamStatus
error: Error | null
}
export type StartArgs = {
url: string
body: unknown
}
export function useChatStream() {
const [state, setState] = useState<ChatStreamState>({
text: '',
status: 'idle',
error: null,
})
const controllerRef = useRef<AbortController | null>(null)
const start = useCallback(async ({ url, body }: StartArgs) => {
controllerRef.current?.abort()
const controller = new AbortController()
controllerRef.current = controller
setState({ text: '', status: 'streaming', error: null })
try {
const res = await fetch(url, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify(body),
signal: controller.signal,
})
if (!res.ok || !res.body) {
throw new Error(`HTTP ${res.status}`)
}
const reader = res.body.getReader()
const decoder = new TextDecoder()
while (true) {
const { value, done } = await reader.read()
if (done) break
// Decode once per chunk; { stream: true } keeps multi-byte characters
// that span chunk boundaries intact.
const chunk = decoder.decode(value, { stream: true })
setState((s) => ({ ...s, text: s.text + chunk }))
}
setState((s) => ({ ...s, status: 'done' }))
} catch (err) {
if ((err as Error).name === 'AbortError') return
setState((s) => ({
...s,
status: 'error',
error: err as Error,
}))
}
}, [])
const stop = useCallback(() => {
controllerRef.current?.abort()
setState((s) => ({ ...s, status: 'idle' }))
}, [])
return { ...state, start, stop }
}
The hook is provider-agnostic on purpose. Your /api/chat endpoint can stream from Anthropic, OpenAI, or anything else — the UI doesn't care. If you need a primer on the server side, my earlier post on streaming Claude responses in Next.js walks through the route handler.
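Here's roughly how it's consumed — a minimal sketch, assuming a /api/chat endpoint and a body shape of your choosing:
// src/components/Chat.tsx (usage sketch; endpoint and body are placeholders)
import { useChatStream } from '@/src/hooks/useChatStream'
export function Chat({ prompt }: { prompt: string }) {
const { text, status, error, start, stop } = useChatStream()
return (
<div>
<button onClick={() => start({ url: '/api/chat', body: { prompt } })}>
Send
</button>
{status === 'streaming' && <button onClick={stop}>Stop</button>}
<div>{text}</div>
{status === 'error' && <div role="alert">{error?.message}</div>}
</div>
)
}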
Rendering partial markdown without layout jank
Here's the problem: react-markdown parses the whole string on every render. While streaming, you're feeding it incomplete markdown. A half-arrived **bold becomes literal asterisks. A code fence opened with ```ts but not yet closed turns the rest of the document into a code block. When the closing fence arrives, the entire DOM shifts.
The fix is not "wait for the full message" — that defeats streaming. It's a tiny preprocessor that closes obviously-unfinished structures for rendering only, then re-parses cleanly when the stream completes.
// src/lib/closeOpenMarkdown.ts
const FENCE = /```/g
export function closeOpenMarkdown(text: string): string {
let out = text
// Close unterminated code fences. Count fences; if odd, append one.
const fenceCount = (out.match(FENCE) || []).length
if (fenceCount % 2 === 1) {
out += '\n```'
}
// Close unterminated inline code (single backticks on the same line).
const lines = out.split('\n')
let inFence = false
const fixed = lines.map((line) => {
// Track fence state so backticks inside a code block are left alone.
if (line.trimStart().startsWith('```')) {
inFence = !inFence
return line
}
if (inFence) return line
const ticks = (line.match(/`/g) || []).length
return ticks % 2 === 1 ? line + '`' : line
})
out = fixed.join('\n')
// Close unterminated bold/italic at end of string.
const trailingStars = out.match(/\*+$/)?.[0] ?? ''
if (trailingStars.length === 1) out += '*'
if (trailingStars.length === 2) out += '**'
return out
}
This is a heuristic, not a parser. It will misfire on contrived inputs (an asterisk inside a code block, etc.) but in practice it eliminates 95% of the visible jank during streaming and you throw it away the moment the stream finishes.
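A few illustrative calls make the reach — and the limits — concrete:
closeOpenMarkdown('Some `partial')      // => 'Some `partial`'
closeOpenMarkdown('```ts\nconst x = 1') // => '```ts\nconst x = 1\n```'
closeOpenMarkdown('**bold tex')         // unchanged — only trailing markers get closed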
Now the renderer:
// src/components/StreamingMarkdown.tsx
import ReactMarkdown from 'react-markdown'
import remarkGfm from 'remark-gfm'
import { useMemo } from 'react'
import { closeOpenMarkdown } from '@/src/lib/closeOpenMarkdown'
type Props = {
text: string
streaming: boolean
}
export function StreamingMarkdown({ text, streaming }: Props) {
const safeText = useMemo(
() => (streaming ? closeOpenMarkdown(text) : text),
[text, streaming]
)
return (
<div className="prose prose-invert max-w-none">
<ReactMarkdown
remarkPlugins={[remarkGfm]}
components={{
// react-markdown (v9+) no longer passes an `inline` flag to the
// `code` component, and block code is always wrapped in a `pre`
// for us — so style the wrapper instead of re-wrapping.
pre({ children }) {
return (
<pre className="overflow-x-auto rounded bg-zinc-900 p-3">
{children}
</pre>
)
},
}}
>
{safeText}
</ReactMarkdown>
{streaming && <span className="inline-block h-4 w-2 animate-pulse bg-current align-middle ml-1" />}
</div>
)
}
Two details that matter. First, useMemo on the closed text: without it, the regex passes re-run over the entire accumulated string on every re-render, not just when a new chunk arrives, and the stream feels sluggish on long messages. Second, the blinking cursor is rendered as a separate inline span. If you shove it inside the markdown source you'll fight the parser forever.
Citations that appear mid-stream
Most providers stream citations as separate events, not inline tokens. With Claude's Citations API you get citation deltas alongside text deltas. The naive approach — appending [1] into the text and listing sources at the bottom — looks ugly and breaks during streaming because the citation might arrive before the sentence it refers to.
Better: track citations as structured data, and render them as superscript markers using a custom component that resolves to a tooltip or footnote on hover.
// src/types/chat.ts
export type Citation = {
id: string
title: string
url: string
quote: string
startIndex: number
endIndex: number
}
export type StreamEvent =
| { type: 'text'; delta: string }
| { type: 'citation'; citation: Citation }
| { type: 'error'; message: string }
| { type: 'done' }
Extend the hook to emit structured events. Server-side, encode them as newline-delimited JSON — one object per line. (If you're on SSE proper, strip the data: prefix before parsing.) A minimal sketch of the server end, assuming a Next.js-style route handler; the payloads are stand-ins for whatever your provider streams:
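// app/api/chat/route.ts — illustrative only, not a full provider integration
import type { StreamEvent } from '@/src/types/chat'
export async function POST() {
const encoder = new TextEncoder()
const stream = new ReadableStream<Uint8Array>({
start(controller) {
// One JSON object per line (NDJSON).
const send = (evt: StreamEvent) =>
controller.enqueue(encoder.encode(JSON.stringify(evt) + '\n'))
send({ type: 'text', delta: 'Hello from ' })
send({ type: 'text', delta: 'the stream.' })
send({ type: 'done' })
controller.close()
},
})
return new Response(stream, {
headers: { 'content-type': 'application/x-ndjson' },
})
}
Client-side, parse and dispatch: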
// src/hooks/useStructuredStream.ts
import { useCallback, useRef, useState } from 'react'
import type { Citation, StreamEvent } from '@/src/types/chat'
export type StructuredState = {
text: string
citations: Citation[]
status: 'idle' | 'streaming' | 'done' | 'error'
error: string | null
}
export function useStructuredStream() {
const [state, setState] = useState<StructuredState>({
text: '',
citations: [],
status: 'idle',
error: null,
})
const controllerRef = useRef<AbortController | null>(null)
const start = useCallback(async (url: string, body: unknown) => {
// Abort any in-flight stream before starting a new one.
controllerRef.current?.abort()
const controller = new AbortController()
controllerRef.current = controller
setState({ text: '', citations: [], status: 'streaming', error: null })
try {
const res = await fetch(url, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify(body),
signal: controller.signal,
})
if (!res.ok || !res.body) throw new Error(`HTTP ${res.status}`)
const reader = res.body.getReader()
const decoder = new TextDecoder()
let buf = ''
while (true) {
const { value, done } = await reader.read()
if (done) break
buf += decoder.decode(value, { stream: true })
const lines = buf.split('\n')
buf = lines.pop() ?? ''
for (const line of lines) {
if (!line.trim()) continue
const evt = JSON.parse(line) as StreamEvent
if (evt.type === 'text') {
setState((s) => ({ ...s, text: s.text + evt.delta }))
} else if (evt.type === 'citation') {
setState((s) => ({
...s,
citations: [...s.citations, evt.citation],
}))
} else if (evt.type === 'error') {
setState((s) => ({ ...s, status: 'error', error: evt.message }))
}
// 'done' events need no handling; the read loop ends when the stream closes.
}
}
// Don't clobber an error status that arrived as a stream event.
setState((s) => (s.status === 'error' ? s : { ...s, status: 'done' }))
} catch (err) {
if ((err as Error).name === 'AbortError') return
setState((s) => ({ ...s, status: 'error', error: (err as Error).message }))
}
}, [])
return { ...state, start, stop: () => controllerRef.current?.abort() }
}
Now the citation marker. Render [^id] placeholders in the markdown stream and replace them with clickable superscripts via a custom node:
// src/components/CitationMarker.tsx
import type { Citation } from '@/src/types/chat'
type Props = {
citation: Citation
}
export function CitationMarker({ citation }: Props) {
return (
<a
href={citation.url}
target="_blank"
rel="noopener noreferrer"
title={citation.quote}
className="inline-block align-super text-xs text-blue-400 hover:underline ml-0.5"
>
[{citation.title.slice(0, 24)}]
</a>
)
}
Wire it into the markdown renderer by intercepting text nodes and replacing [^id] patterns. The trick: do it as a remark transformation, not a regex on the rendered HTML, otherwise you lose markdown context.
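Here's a sketch of that remark pass. The plugin name and option shape are mine, and it assumes the [^id] markers survive tokenization as plain text (remark-gfm footnotes only fire when a matching definition exists, so bare markers stay literal):
// src/lib/remarkCitations.ts — illustrative sketch, not a battle-tested plugin
import { visit, SKIP } from 'unist-util-visit'
import type { Root, Text, PhrasingContent } from 'mdast'
import type { Citation } from '@/src/types/chat'
const MARKER = /\[\^([\w-]+)\]/g
export function remarkCitations({ citations }: { citations: Citation[] }) {
const byId = new Map(citations.map((c) => [c.id, c]))
return (tree: Root) => {
visit(tree, 'text', (node: Text, index, parent) => {
if (!parent || index === undefined) return
const parts: PhrasingContent[] = []
let last = 0
for (const m of node.value.matchAll(MARKER)) {
const citation = byId.get(m[1])
// Citation event not here yet? Leave the marker in place; a later
// render resolves it once the state catches up.
if (!citation) continue
const at = m.index ?? 0
if (at > last) parts.push({ type: 'text', value: node.value.slice(last, at) })
parts.push({
type: 'link',
url: citation.url,
title: citation.quote,
children: [{ type: 'text', value: `[${citation.title.slice(0, 24)}]` }],
})
last = at + m[0].length
}
if (parts.length === 0) return
if (last < node.value.length) {
parts.push({ type: 'text', value: node.value.slice(last) })
}
parent.children.splice(index, 1, ...parts)
// Skip over the nodes we just inserted.
return [SKIP, index + parts.length]
})
}
}
Wire it in with remarkPlugins={[remarkGfm, [remarkCitations, { citations }]]}; the links it emits can then be styled with a components.a override, or swapped for whatever CitationMarker renders.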
If a citation arrives before its anchor text, hold it in state and let the marker render once the anchor catches up. The endIndex field tells you where in the message it belongs.
Mid-stream errors: the part everyone skips
Networks drop. Models hit rate limits. Tokens run out mid-message. The default behavior of fetch + ReadableStream on a dropped connection is silence — the stream just stops, and your UI shows half a sentence forever.
Three things to handle: keep the partial text visible, surface the failure with an honest retry, and don't lose the citations that already arrived. One component covers all three:
// src/components/StreamingMessage.tsx
import { StreamingMarkdown } from './StreamingMarkdown'
import type { StructuredState } from '@/src/hooks/useStructuredStream'
type Props = {
state: StructuredState
onRetry: () => void
}
export function StreamingMessage({ state, onRetry }: Props) {
return (
<div className="space-y-2">
<StreamingMarkdown text={state.text} streaming={state.status === 'streaming'} />
{state.status === 'error' && (
<div className="rounded border border-red-500/40 bg-red-500/10 p-3 text-sm">
<div className="font-medium text-red-300">
Stream interrupted
</div>
<div className="text-red-200/80 mt-1">
{state.error ?? 'Connection dropped'}
</div>
<button
onClick={onRetry}
className="mt-2 rounded bg-red-500/20 px-3 py-1 text-xs hover:bg-red-500/30"
>
Retry from here
</button>
</div>
)}
{state.status === 'done' && state.citations.length > 0 && (
<details className="text-xs text-zinc-400">
<summary className="cursor-pointer">
Sources ({state.citations.length})
</summary>
<ul className="mt-1 space-y-1">
{state.citations.map((c) => (
<li key={c.id}>
<a href={c.url} className="hover:underline">{c.title}</a>
</li>
))}
</ul>
</details>
)}
</div>
)
}
Notice: when the error hits, we keep the partial text visible. The user can read what arrived before the failure. The retry button is honest — "retry from here" — not a generic "Try again" that throws away the partial response.
For the retry itself, send the partial text back as an assistant message with prefill semantics (Claude supports this directly; with OpenAI you concatenate into the next user turn). The model continues from where it left off. Don't restart from scratch unless the partial response is unusable — you've already paid for those tokens.
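A sketch of what that retry request can look like — the endpoint and message shape are placeholders, but the trailing assistant turn is the important part:
// Hypothetical helper; /api/chat and the body shape are assumptions.
function retryFromPartial(
start: (url: string, body: unknown) => Promise<void>,
userPrompt: string,
partial: string
) {
return start('/api/chat', {
messages: [
{ role: 'user', content: userPrompt },
// A trailing assistant message acts as a prefill: the model continues
// this text instead of starting over. The Anthropic API rejects
// trailing whitespace on a final assistant turn, hence trimEnd().
{ role: 'assistant', content: partial.trimEnd() },
],
})
}
Note that the hook above resets text when start is called, so whatever owns the message list needs to stitch the partial response and the continuation back together.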
Smart auto-scroll: the scroll-jacking trap
Default behavior: as text streams in, the message grows, the user scrolls up to read something earlier, the next chunk arrives, and you yank them back to the bottom. Infuriating.
The rule: stick to the bottom only if the user is already near the bottom. Once they scroll up, stop chasing them.
// src/hooks/useStickyScroll.ts
import { useEffect, useRef } from 'react'
export function useStickyScroll(dep: unknown) {
const ref = useRef<HTMLDivElement | null>(null)
const stuckRef = useRef(true)
useEffect(() => {
const el = ref.current
if (!el) return
const onScroll = () => {
const distance = el.scrollHeight - el.scrollTop - el.clientHeight
stuckRef.current = distance < 80
}
el.addEventListener('scroll', onScroll, { passive: true })
return () => el.removeEventListener('scroll', onScroll)
}, [])
useEffect(() => {
const el = ref.current
if (!el || !stuckRef.current) return
el.scrollTop = el.scrollHeight
}, [dep])
return ref
}
Apply it to the scroll container, passing state.text as the dep so it triggers on every chunk:
const scrollRef = useStickyScroll(state.text)
// The container needs a bounded height, or overflow-y never engages.
return (
<div ref={scrollRef} className="overflow-y-auto">
{/* message list */}
</div>
)
The 80px threshold is forgiving — a user who scrolls up two lines and back down still gets auto-scroll. A user who scrolls up to read earlier in the conversation does not.
Gotchas worth knowing
A few things that bit me along the way:
- useMemo on the markdown preprocessor matters. Without it, every chunk re-runs the regex on the entire accumulated string. On a 2000-token response that's noticeable lag.
- Don't setState per byte. Decode the chunk, batch the update. The hook above does this implicitly via React's batching, but if you're manually setting state more than once per chunk you'll re-render too often.
- AbortError is not an error. When the user closes the tab or you call stop(), you'll get an AbortError from fetch. Filter it out of your error UI or every cancellation looks like a crash.
- Citation indices drift. If you do any text post-processing (trimming whitespace, normalizing newlines), citation startIndex/endIndex no longer line up. Apply citations before any text mutation, or carry them in a separate stream that doesn't depend on absolute positions.
If you're building a full RAG chat with sources, the next post in this series — citations and source attribution for RAG chatbots — goes deeper on Claude's Citations API specifically. For testing this kind of UI without burning real API tokens, my earlier post on testing AI features covers snapshot testing for streaming output.
What's next
Next up: Add AI-Powered Citations and Source Attribution to Your RAG Chatbot — going from streaming UX to wiring real citations from your retrieval pipeline through the UI components above.