Add AI-Powered Citations and Source Attribution to Your RAG Chatbot

Friday 08/05/2026

Your RAG chatbot gives a confident answer about your refund policy, and the user asks "where did you get that from?" You can't tell them. The answer is plausible, the source documents are loaded into the context, but there's no link between any sentence in the response and any chunk you retrieved. Half your users assume the bot is hallucinating, and the other half trust it too much.

Citations fix this — but only if you build them right. Slapping a "[1]" at the end of a paragraph is not source attribution. Real citations point to specific spans of specific documents, render as clickable markers inline with the answer, and let you measure when the model is making things up. Here's how to do it in TypeScript with Claude's Citations API and a small React UI.

Why most RAG chatbots skip citations

The naive approach is to tell the model "include [1], [2] markers and a sources list at the end." This works in demos and falls apart in production for two reasons.

First, the model invents citation numbers. It will write [3] when only two chunks were retrieved, or attribute a sentence to a chunk that has nothing to do with it. There is no mechanism forcing the citation to be grounded in the actual retrieved text.

Second, you get document-level attribution at best. The user clicks "[1]" and lands on a 40-page PDF. The whole point of a citation is that it points to the specific sentence the answer was based on. Without span-level grounding, citations are decorative.

Claude's Citations API (released in early 2025 and now stable) solves both. It guarantees that every citation references a real text span in a document you supplied, and it gives you exact character offsets to render precise markers in the UI.

What we're building

A RAG chatbot where:

  1. The backend retrieves relevant chunks from a vector store (Supabase pgvector here, but any source works — see AI search with embeddings and Supabase for setting up the embeddings + pgvector layer if you don't already have one).
  2. Claude answers using the Citations API, which returns the answer split into segments — each one tagged with the exact source span it came from.
  3. The frontend streams the response and renders each citation as a clickable marker that opens the original document at the cited passage.
  4. We add a hallucination check that flags sentences without any citation backing.

You can drop this into the RAG chatbot from week 1 or any existing pipeline.

pnpm add @anthropic-ai/sdk @supabase/supabase-js zod

The backend: retrieval + Citations API

Claude's Citations API expects each source to be sent as a document content block with citations.enabled: true. The model then returns text blocks where each piece of text can carry a citations array pointing back to specific spans of those documents.

// src/lib/rag-with-citations.ts
import Anthropic from "@anthropic-ai/sdk"
import { createClient } from "@supabase/supabase-js"

const anthropic = new Anthropic()
const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_ROLE_KEY!)

export type RetrievedChunk = {
    id: string
    documentId: string
    documentTitle: string
    content: string
    url: string
}

export type CitedSpan = {
    text: string
    citations: Array<{
        documentId: string
        documentTitle: string
        url: string
        citedText: string
        startChar: number
        endChar: number
    }>
}

export async function retrieveChunks(
    query: string,
    topK = 5
): Promise<RetrievedChunk[]> {
    const embedding = await embed(query)
    const { data, error } = await supabase.rpc("match_chunks", {
        query_embedding: embedding,
        match_count: topK,
    })
    if (error) throw error
    return data.map((row: any) => ({
        id: row.id,
        documentId: row.document_id,
        documentTitle: row.document_title,
        content: row.content,
        url: row.url,
    }))
}

async function embed(text: string): Promise<number[]> {
    const res = await fetch("https://api.openai.com/v1/embeddings", {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
            Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
        },
        body: JSON.stringify({
            model: "text-embedding-3-small",
            input: text,
        }),
    })
    if (!res.ok) throw new Error(`Embedding failed: ${res.status}`)
    const json = await res.json()
    return json.data[0].embedding
}

I'm using OpenAI for embeddings and Anthropic for generation — that's the cheap-and-good combination right now. Swap embeddings for Voyage or Cohere if you prefer. The vector store doesn't matter; pgvector is the easiest if you're already on Supabase.

Now the part that actually uses citations:

// src/lib/rag-with-citations.ts (continued)
export async function answerWithCitations(
    question: string,
    chunks: RetrievedChunk[]
): Promise<CitedSpan[]> {
    const documents = chunks.map((chunk, i) => ({
        type: "document" as const,
        source: {
            type: "text" as const,
            media_type: "text/plain" as const,
            data: chunk.content,
        },
        title: chunk.documentTitle,
        context: `Document ID: ${chunk.documentId} | URL: ${chunk.url}`,
        citations: { enabled: true },
    }))

    const response = await anthropic.messages.create({
        model: "claude-sonnet-4-6",
        max_tokens: 1024,
        system:
            "Answer the user's question using only the provided documents. " +
            "Be concise. If the documents don't contain the answer, say so explicitly — do not guess.",
        messages: [
            {
                role: "user",
                content: [
                    ...documents,
                    { type: "text", text: question },
                ],
            },
        ],
    })

    const chunkByIndex = new Map(chunks.map((c, i) => [i, c]))
    const result: CitedSpan[] = []

    for (const block of response.content) {
        if (block.type !== "text") continue
        const citations = (block.citations ?? []).map((cit: any) => {
            const chunk = chunkByIndex.get(cit.document_index)!
            return {
                documentId: chunk.documentId,
                documentTitle: chunk.documentTitle,
                url: chunk.url,
                citedText: cit.cited_text,
                startChar: cit.start_char_index ?? 0,
                endChar: cit.end_char_index ?? 0,
            }
        })
        result.push({ text: block.text, citations })
    }

    return result
}

A few things to call out.

The context field on each document is a free-form string Claude won't quote from but uses for disambiguation. Putting the document ID and URL there helps the model tell similar documents apart. It is not echoed back in the response, though: each citation carries document_index and document_title, so I keep a map from index to chunk and look the source up there, which is also faster than parsing context strings.

The response comes back as multiple text blocks. Each block is one segment of the answer with its own citations array. A typical answer with three sentences might be five blocks — three cited segments interleaved with two short uncited connectors like " and ". You concatenate the text in order; the citation metadata stays attached.
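To make the block structure concrete, here is a minimal sketch that joins segments back into the full answer and records where each citation marker belongs. The names joinSegments, AnswerSegment, and MarkerPosition are hypothetical (they are not part of the Anthropic SDK), and the sample data is made up for illustration.

```typescript
// Hypothetical helper types: a trimmed-down version of CitedSpan.
type SpanCitation = { documentTitle: string; citedText: string }
type AnswerSegment = { text: string; citations: SpanCitation[] }
type MarkerPosition = { offset: number; documentTitle: string }

// Concatenate segment text in order and note the offset (in the joined
// answer) right after each cited segment, which is where a marker goes.
function joinSegments(segments: AnswerSegment[]): { text: string; markers: MarkerPosition[] } {
    let text = ""
    const markers: MarkerPosition[] = []
    for (const seg of segments) {
        text += seg.text
        for (const cit of seg.citations) {
            markers.push({ offset: text.length, documentTitle: cit.documentTitle })
        }
    }
    return { text, markers }
}

// Made-up example: two cited segments joined by an uncited connector.
const joined = joinSegments([
    { text: "Refunds take 5 days", citations: [{ documentTitle: "Refund policy", citedText: "..." }] },
    { text: " and ", citations: [] },
    { text: "require a receipt.", citations: [{ documentTitle: "FAQ", citedText: "..." }] },
])
```

The uncited connector contributes text but no marker, so the two markers end up at the ends of the first and third segments.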

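start_char_index and end_char_index are character offsets into the document text you sent (the data field), not into the response. Because each chunk is sent as its own document here, the offsets are relative to that chunk, not to the full source file. Save them so the frontend can highlight the exact passage when the user clicks.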
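A small sketch of how the offsets can be used, assuming the end index is exclusive (so a plain slice recovers the passage) and assuming you stored each chunk's starting position in its parent document at ingestion time. The offsetInDocument field and the locateCitation helper are hypothetical and not part of the code above.

```typescript
// Hypothetical: offsetInDocument is where this chunk starts in the full source text.
type ChunkWithOffset = { content: string; offsetInDocument: number }

type LocatedCitation = {
    passage: string        // the cited text, recovered from the chunk
    documentStart: number  // start offset within the full document
    documentEnd: number    // end offset (exclusive) within the full document
}

function locateCitation(
    chunk: ChunkWithOffset,
    startChar: number,
    endChar: number
): LocatedCitation {
    return {
        // Offsets are relative to the chunk text that was sent as `data`.
        passage: chunk.content.slice(startChar, endChar),
        // Shift by the chunk's position to highlight in the full document.
        documentStart: chunk.offsetInDocument + startChar,
        documentEnd: chunk.offsetInDocument + endChar,
    }
}

// Made-up example chunk that starts 1000 characters into its document.
const located = locateCitation(
    { content: "Refunds are issued within 5 business days. Keep your receipt.", offsetInDocument: 1000 },
    0,
    42
)
```

Comparing the recovered passage against cited_text is a cheap sanity check that the offsets and the stored chunk still agree.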

The streaming endpoint

Citations work fine with streaming, and you still want streaming: a cited answer is prose the user reads as it arrives, so time to first token matters just as much as it does without citations. Here's the Next.js Pages Router endpoint:

// src/pages/api/chat.ts
import type { NextApiRequest, NextApiResponse } from "next"
import Anthropic from "@anthropic-ai/sdk"
import { retrieveChunks } from "@/src/lib/rag-with-citations"
import { z } from "zod"

const anthropic = new Anthropic()

const bodySchema = z.object({
    question: z.string().min(1).max(2000),
})

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
    if (req.method !== "POST") {
        res.status(405).end()
        return
    }

    const parsed = bodySchema.safeParse(req.body)
    if (!parsed.success) {
        res.status(400).json({ error: parsed.error.message })
        return
    }

    res.setHeader("Content-Type", "text/event-stream")
    res.setHeader("Cache-Control", "no-cache, no-transform")
    res.setHeader("Connection", "keep-alive")

    const send = (event: string, data: unknown) => {
        res.write(`event: ${event}\n`)
        res.write(`data: ${JSON.stringify(data)}\n\n`)
    }

    try {
        const chunks = await retrieveChunks(parsed.data.question, 5)
        send("sources", chunks.map((c) => ({
            documentId: c.documentId,
            documentTitle: c.documentTitle,
            url: c.url,
        })))

        const stream = anthropic.messages.stream({
            model: "claude-sonnet-4-6",
            max_tokens: 1024,
            system:
                "Answer the user's question using only the provided documents. " +
                "Be concise. If the documents don't contain the answer, say so explicitly.",
            messages: [
                {
                    role: "user",
                    content: [
                        ...chunks.map((c, i) => ({
                            type: "document" as const,
                            source: {
                                type: "text" as const,
                                media_type: "text/plain" as const,
                                data: c.content,
                            },
                            title: c.documentTitle,
                            context: `Document ID: ${c.documentId} | URL: ${c.url}`,
                            citations: { enabled: true },
                        })),
                        { type: "text", text: parsed.data.question },
                    ],
                },
            ],
        })

        for await (const event of stream) {
            if (event.type === "content_block_delta") {
                if (event.delta.type === "text_delta") {
                    send("text", { text: event.delta.text })
                } else if (event.delta.type === "citations_delta") {
                    const cit = event.delta.citation
                    const chunk = chunks[cit.document_index]
                    send("citation", {
                        documentId: chunk.documentId,
                        documentTitle: chunk.documentTitle,
                        url: chunk.url,
                        citedText: cit.cited_text,
                        startChar: cit.start_char_index ?? 0,
                        endChar: cit.end_char_index ?? 0,
                    })
                }
            }
        }

        send("done", {})
    } catch (err) {
        const message = err instanceof Error ? err.message : "Unknown error"
        send("error", { message })
    } finally {
        res.end()
    }
}

The gotcha here: citation deltas are interleaved with text deltas rather than delivered as a final batch, and each one belongs to a specific text block. You have to associate each citation with its block as it arrives. Every content_block_delta event carries the index of the block it belongs to, so the reliable approach is to forward that index to the client and key on it. The endpoint above does not forward it yet.
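One way to make that association robust is to key on the block index instead of arrival order. The sketch below assumes the server also forwards event.index with each text and citation event, which the endpoint above does not do. BlockDelta, StreamBlock, and applyDelta are hypothetical names.

```typescript
type StreamCitation = { documentId: string; citedText: string }

// Hypothetical wire format: each delta names the content block it belongs to.
type BlockDelta =
    | { kind: "text"; index: number; text: string }
    | { kind: "citation"; index: number; citation: StreamCitation }

type StreamBlock = { text: string; citations: StreamCitation[] }

// Pure reducer: returns a new array with the delta applied to its block.
// Works regardless of whether a citation arrives before, between, or after
// the text deltas of the same block.
function applyDelta(blocks: StreamBlock[], delta: BlockDelta): StreamBlock[] {
    const next = blocks.slice()
    // Make sure a slot exists for this index, even if blocks arrive out of order.
    while (next.length <= delta.index) next.push({ text: "", citations: [] })
    const current = next[delta.index]! // the loop above guarantees this slot exists
    next[delta.index] =
        delta.kind === "text"
            ? { ...current, text: current.text + delta.text }
            : { ...current, citations: [...current.citations, delta.citation] }
    return next
}
```

Because the reducer never mutates its input, it can be passed straight to a React state updater, for example setBlocks((prev) => applyDelta(prev, delta)).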

The React UI

The frontend has three jobs: render text as it streams, attach citation markers to the right spans, and pop a sidebar when a marker is clicked. I'm keeping it framework-free (no AI SDK) so you can drop it anywhere.

// src/components/RagChat.tsx
import { useState, useRef } from "react"

type Source = { documentId: string; documentTitle: string; url: string }
type Citation = Source & { citedText: string; startChar: number; endChar: number }
type Segment = { text: string; citations: Citation[] }

export function RagChat() {
    const [segments, setSegments] = useState<Segment[]>([])
    const [sources, setSources] = useState<Source[]>([])
    const [active, setActive] = useState<Citation | null>(null)
    const [loading, setLoading] = useState(false)
    const inputRef = useRef<HTMLInputElement>(null)

    async function ask(question: string) {
        setSegments([{ text: "", citations: [] }])
        setSources([])
        setLoading(true)

        const res = await fetch("/api/chat", {
            method: "POST",
            headers: { "Content-Type": "application/json" },
            body: JSON.stringify({ question }),
        })
        if (!res.ok || !res.body) {
            setLoading(false)
            return
        }

        const reader = res.body.getReader()
        const decoder = new TextDecoder()
        let buf = ""
        let currentSegmentText = ""
        let currentCitations: Citation[] = []

        const commitSegment = () => {
            if (!currentSegmentText && currentCitations.length === 0) return
            setSegments((prev) => {
                const next = [...prev]
                next[next.length - 1] = {
                    text: currentSegmentText,
                    citations: currentCitations,
                }
                return next
            })
        }

        while (true) {
            const { done, value } = await reader.read()
            if (done) break
            buf += decoder.decode(value, { stream: true })
            const events = buf.split("\n\n")
            buf = events.pop() ?? ""

            for (const raw of events) {
                const evMatch = raw.match(/^event: (\w+)/m)
                const dataMatch = raw.match(/^data: (.+)$/m)
                if (!evMatch || !dataMatch) continue
                const eventName = evMatch[1]
                const data = JSON.parse(dataMatch[1])

                if (eventName === "sources") {
                    setSources(data)
                } else if (eventName === "text") {
                    currentSegmentText += data.text
                    setSegments((prev) => {
                        const next = [...prev]
                        next[next.length - 1] = {
                            text: currentSegmentText,
                            citations: currentCitations,
                        }
                        return next
                    })
                } else if (eventName === "citation") {
                    currentCitations.push(data)
                    commitSegment()
                    currentSegmentText = ""
                    currentCitations = []
                    setSegments((prev) => [...prev, { text: "", citations: [] }])
                } else if (eventName === "done") {
                    commitSegment()
                    setLoading(false)
                } else if (eventName === "error") {
                    setLoading(false)
                }
            }
        }
    }

    return (
        <div className="flex gap-6">
            <div className="flex-1">
                <form
                    onSubmit={(e) => {
                        e.preventDefault()
                        const q = inputRef.current?.value.trim()
                        if (q) ask(q)
                    }}
                >
                    <input
                        ref={inputRef}
                        className="w-full border rounded px-3 py-2"
                        placeholder="Ask a question..."
                        disabled={loading}
                    />
                </form>

                <div className="mt-6 leading-relaxed">
                    {segments.map((seg, i) => (
                        <span key={i}>
                            {seg.text}
                            {seg.citations.map((cit, j) => (
                                <button
                                    key={j}
                                    onClick={() => setActive(cit)}
                                    className="ml-1 text-xs bg-blue-100 hover:bg-blue-200 text-blue-800 rounded px-1.5 py-0.5 align-super"
                                    title={cit.documentTitle}
                                >
                                    {cit.documentTitle.slice(0, 12)}
                                </button>
                            ))}
                        </span>
                    ))}
                </div>
            </div>

            {active && (
                <aside className="w-80 border-l pl-6">
                    <button onClick={() => setActive(null)} className="text-sm text-gray-500">
                        Close
                    </button>
                    <h3 className="font-semibold mt-2">{active.documentTitle}</h3>
                    <a
                        href={`${active.url}#:~:text=${encodeURIComponent(active.citedText.slice(0, 60))}`}
                        target="_blank"
                        rel="noreferrer"
                        className="text-blue-600 text-sm underline"
                    >
                        Open source
                    </a>
                    <blockquote className="mt-3 text-sm border-l-4 border-blue-400 pl-3 text-gray-700">
                        {active.citedText}
                    </blockquote>
                </aside>
            )}
        </div>
    )
}

A few details that matter.

Each text+citation pair becomes its own segment, so cited and uncited prose render correctly side by side. When a citation arrives, we commit the current segment and start a new one. This keeps the markers anchored to the right text instead of all clustering at the end.

The "Open source" link uses a Text Fragment URL (#:~:text=...). In Chromium-based browsers such as Chrome and Edge, and in recent versions of Safari and Firefox, this scrolls the linked page directly to the cited passage and highlights it, with no extra work needed if your sources are HTML. For PDFs you'd swap in a viewer that accepts a page and highlight query param.
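Two details can make the fragment link fail to match. As I understand the text fragment syntax, "-" has a special meaning inside a text directive and encodeURIComponent leaves it unescaped, and matching happens on word boundaries, so a snippet cut mid-word may not be found. A hedged sketch of a helper that handles both (textFragmentUrl is a hypothetical name):

```typescript
// Build a Text Fragment link from a cited passage.
function textFragmentUrl(pageUrl: string, citedText: string, maxChars = 60): string {
    // Collapse newlines and runs of spaces so the snippet matches rendered text.
    const normalized = citedText.replace(/\s+/g, " ").trim()

    let snippet = normalized
    if (normalized.length > maxChars) {
        const cut = normalized.slice(0, maxChars)
        const lastSpace = cut.lastIndexOf(" ")
        // Back up to the last complete word so the snippet ends on a word boundary.
        snippet = lastSpace > 0 ? cut.slice(0, lastSpace) : cut
    }

    // encodeURIComponent does not escape "-", which is syntax in a text directive.
    const encoded = encodeURIComponent(snippet).replace(/-/g, "%2D")

    // Drop any existing fragment before appending the text directive.
    const hashAt = pageUrl.indexOf("#")
    const base = hashAt === -1 ? pageUrl : pageUrl.slice(0, hashAt)
    return `${base}#:~:text=${encoded}`
}
```

In the sidebar, the href would then become textFragmentUrl(active.url, active.citedText) instead of the inline template string.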

I trim the citation marker label to 12 chars. With longer document titles the inline buttons start shoving the text around. Pick a label scheme that fits your docs — initials, short codes, or even just numbers if you keep a legend somewhere.
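If you go with numbers, one simple scheme is to number documents in order of first appearance so repeated citations of the same document share a label. This is only a sketch; numberSources and LabeledCitation are hypothetical names.

```typescript
type LabeledCitation = { documentId: string; documentTitle: string }

// Assign 1-based labels by first appearance; the same document always
// gets the same number, which doubles as the key for a legend.
function numberSources(citations: LabeledCitation[]): Map<string, number> {
    const labels = new Map<string, number>()
    for (const cit of citations) {
        if (!labels.has(cit.documentId)) {
            labels.set(cit.documentId, labels.size + 1)
        }
    }
    return labels
}

// Made-up example: the refund policy is cited twice but gets one label.
const sourceLabels = numberSources([
    { documentId: "doc-a", documentTitle: "Refund policy" },
    { documentId: "doc-b", documentTitle: "Shipping FAQ" },
    { documentId: "doc-a", documentTitle: "Refund policy" },
])
```

The marker button can then render sourceLabels.get(cit.documentId) and the legend can list the titles in the same order.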

Detecting hallucinations with citation coverage

A useful side effect of structured citations: any sentence in the response without at least one citation is a hallucination candidate. Not always a hallucination — sometimes Claude writes a connector sentence that shouldn't be cited — but it's a strong signal worth surfacing.

// src/lib/citation-coverage.ts
import type { CitedSpan } from "./rag-with-citations"

export type CoverageReport = {
    coverage: number
    uncitedSentences: string[]
    flagged: boolean
}

export function citationCoverage(segments: CitedSpan[]): CoverageReport {
    const fullText = segments.map((s) => s.text).join("")
    const sentences = fullText
        .split(/(?<=[.!?])\s+/)
        .map((s) => s.trim())
        .filter((s) => s.length > 0)

    const citedRanges: Array<[number, number]> = []
    let cursor = 0
    for (const seg of segments) {
        if (seg.citations.length > 0) {
            citedRanges.push([cursor, cursor + seg.text.length])
        }
        cursor += seg.text.length
    }

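    const uncitedSentences: string[] = []
    let searchFrom = 0
    for (const sentence of sentences) {
        // Locate the sentence in the full text so its range lines up with citedRanges
        // (splitting and trimming dropped the whitespace between sentences).
        const sentenceStart = fullText.indexOf(sentence, searchFrom)
        const sentenceEnd = sentenceStart + sentence.length
        const overlaps = citedRanges.some(
            ([start, end]) => start < sentenceEnd && end > sentenceStart
        )
        if (!overlaps && sentence.split(/\s+/).length > 4) {
            uncitedSentences.push(sentence)
        }
        searchFrom = sentenceEnd
    }

    // An empty answer has nothing to cite; report full coverage instead of NaN.
    const coverage =
        sentences.length === 0
            ? 1
            : (sentences.length - uncitedSentences.length) / sentences.length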
    return {
        coverage,
        uncitedSentences,
        flagged: coverage < 0.5,
    }
}

I skip sentences shorter than five words because connectors like "Here's how:" or "In summary," shouldn't count against coverage. The 50% threshold is what worked for me — your docs and prompt style will shift this. Run it on a representative sample and pick a number where false positives feel acceptable.

In production I log this metric per response. When coverage drops below the threshold, the answer goes into a queue for review. After two weeks of data you can tune the system prompt or upgrade retrieval where it consistently fails. This is far cheaper than running a separate eval LLM as a fact-checker.

Gotchas I ran into

The Citations API only works when you pass documents as document content blocks. If you stuff retrieved text into the user message body the way most basic RAG tutorials do, you get text back with no citation metadata. The whole feature hinges on this structural change.

Token usage goes up. Claude bills the document content like any other input, but adds some overhead for the citation indexing. Budget about 10–15% more input tokens than your old prompt-stuffed version. With prompt caching enabled on the documents it usually nets out cheaper, since you cache the same chunks across follow-up turns in a conversation.
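Prompt caching is switched on per content block with cache_control: { type: "ephemeral" }, and a breakpoint caches the prompt prefix up to and including that block. A minimal sketch that marks the last document block, assuming you resend the same documents in the same order on follow-up turns and that the prefix is long enough to be cacheable (there is a minimum length). CacheableDocument and withCacheBreakpoint are hypothetical names.

```typescript
// Mirrors the document blocks built earlier, plus the optional cache marker.
type CacheableDocument = {
    type: "document"
    source: { type: "text"; media_type: "text/plain"; data: string }
    title: string
    citations: { enabled: true }
    cache_control?: { type: "ephemeral" }
}

// Return a copy of the documents with a cache breakpoint on the last one,
// so everything up to and including the documents can be reused.
function withCacheBreakpoint(docs: CacheableDocument[]): CacheableDocument[] {
    return docs.map((doc, i) =>
        i === docs.length - 1
            ? { ...doc, cache_control: { type: "ephemeral" as const } }
            : doc
    )
}
```

The result goes where the documents array is spread into the user message. Changing the retrieved chunks or their order between turns changes the prefix, so the cache only pays off when follow-up questions reuse the same documents.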

Streaming citation deltas can arrive late, after the content_block_stop event for the text block they belong to. The order is "text streams in → text block ends → citations attached to that block stream in → next text block begins." Don't assume a citation always arrives mid-stream; design for the late-arrival case.

Citations for tool use are not supported yet. If your RAG pipeline is agentic and the model decides to retrieve more docs mid-conversation via a tool, those tool results don't carry citations. You can re-pass them as document blocks on the next turn to recover citations, but it's a manual stitch.

What's next

Citations make a chatbot feel honest, but they don't make it accurate — bad retrieval still produces bad citations. The natural next post is Build a Voice-Enabled AI Assistant in the Browser with TypeScript (topic #34 in the backlog), which adds a voice interface to a chatbot like this one using the Web Speech API. Same architecture, different I/O modality, same need to surface sources clearly when the model speaks an answer aloud.

If you want to dig deeper into RAG quality first, the agentic RAG pipeline post pairs well with this one — combine retrieval-quality evaluation with citation coverage and you have a chatbot that knows when it's wrong.

Vadim Alakhverdov

Software developer writing about JavaScript, web development, and developer tools.
