Build a Text-to-SQL Feature: Let Users Query Your Database in Plain English

Wednesday 20/05/2026

·13 min read
Share:

Your support team wants to know "how many trial users converted last month broken down by acquisition channel" and every time the answer requires pinging an engineer. Building a text to SQL TypeScript feature sounds like a weekend project until you remember a wrong prompt can drop a table. This post walks through a Next.js implementation that pairs LLM-generated SQL with a sandboxed Postgres role, Zod-validated output, an EXPLAIN-before-execute gate, and a self-correction loop that recovers from syntax errors automatically.

I'm not going to pretend this is solved. The naive version — send schema to GPT, run whatever it returns — is the kind of feature you ship on Friday and pull on Monday. The version below is the one I've actually deployed: it refuses to run anything that isn't a SELECT, it times out runaway queries, it surfaces ambiguity instead of guessing, and it streams results to the client as they arrive.

The architecture

Four layers, each doing one thing:

  1. Schema describer — generates a token-efficient description of the tables the LLM is allowed to see.
  2. SQL generator — calls the LLM with the user question + schema, returns a Zod-validated { sql, explanation, needs_clarification } object.
  3. Safety gate — parses the SQL, refuses non-SELECT statements, runs EXPLAIN to catch invalid references and estimate cost, then executes against a read-only role with a statement timeout.
  4. Self-correction loop — if execution fails, feeds the error back to the LLM up to N times.

If you've read my earlier post on structured output with Zod, this stacks on top of it. The trick is what you put in the schema and how the safety gate is positioned.

Step 1: Create a sandboxed Postgres role

Do this first. Not later. The LLM doesn't get to authenticate as your app user.

-- migrations/001_text_to_sql_role.sql
CREATE ROLE ai_query LOGIN PASSWORD 'rotate-me';

-- Read-only access to a curated set of tables
REVOKE ALL ON ALL TABLES IN SCHEMA public FROM ai_query;
GRANT USAGE ON SCHEMA public TO ai_query;
GRANT SELECT ON public.customers, public.orders, public.products TO ai_query;

-- Hard timeout so a Cartesian join can't hang the box
ALTER ROLE ai_query SET statement_timeout = '5s';
ALTER ROLE ai_query SET lock_timeout = '1s';
ALTER ROLE ai_query SET idle_in_transaction_session_timeout = '10s';

-- Block expensive operations even on SELECT
ALTER ROLE ai_query SET default_transaction_read_only = on;

The statement_timeout is the load-bearing piece. Without it, a SELECT * against your events table with a bad join condition will pin a Postgres backend for hours. With it, the worst case is five seconds of CPU and an error message.

Connect via a dedicated pool so the connection string with this role never leaks into your normal request path:

// src/lib/db-readonly.ts
import { Pool } from 'pg'

export const readOnlyPool = new Pool({
    connectionString: process.env.AI_QUERY_DATABASE_URL,
    max: 5,
    idleTimeoutMillis: 30_000,
    statement_timeout: 5_000,
})

Step 2: Describe the schema for the LLM

The LLM needs to know what tables and columns exist. Don't dump the entire information_schema — it's noisy and most of it is irrelevant. Maintain a hand-curated description:

// src/lib/schema-context.ts
export type ColumnDef = {
    name: string
    type: string
    description: string
    example?: string
}

export type TableDef = {
    name: string
    description: string
    columns: ColumnDef[]
}

export const SCHEMA: TableDef[] = [
    {
        name: 'customers',
        description: 'One row per signed-up customer. Excludes soft-deleted accounts.',
        columns: [
            { name: 'id', type: 'uuid', description: 'Primary key' },
            { name: 'email', type: 'text', description: 'Login email, lowercase' },
            {
                name: 'plan',
                type: 'text',
                description: 'Current plan',
                example: "one of: 'trial', 'starter', 'pro', 'enterprise'",
            },
            { name: 'signed_up_at', type: 'timestamptz', description: 'Account creation time, UTC' },
            {
                name: 'acquisition_channel',
                type: 'text',
                description: 'How they found us',
                example: "'organic', 'ads', 'referral', 'direct'",
            },
        ],
    },
    {
        name: 'orders',
        description: 'One row per completed order. Refunds are negative-amount rows.',
        columns: [
            { name: 'id', type: 'uuid', description: 'Primary key' },
            { name: 'customer_id', type: 'uuid', description: 'FK to customers.id' },
            { name: 'amount_cents', type: 'integer', description: 'Amount in cents, can be negative for refunds' },
            { name: 'currency', type: 'text', description: "ISO 4217, e.g. 'USD'" },
            { name: 'created_at', type: 'timestamptz', description: 'Order time, UTC' },
        ],
    },
]

export function formatSchemaForPrompt(): string {
    return SCHEMA.map((t) => {
        const cols = t.columns
            .map((c) => `  - ${c.name} (${c.type}): ${c.description}${c.example ? `. Examples: ${c.example}` : ''}`)
            .join('\n')
        return `TABLE ${t.name}\n  ${t.description}\n${cols}`
    }).join('\n\n')
}

The description fields are where most of the quality lives. "amount_cents can be negative for refunds" stops the LLM from confidently writing SUM(amount_cents) for revenue and being off by however much refund volume you have. Spend the half-hour to write these properly. For larger schemas (50+ tables), you'd want to retrieve only the relevant tables for each query instead of stuffing everything into the prompt — the same embeddings + semantic search pattern from AI search with embeddings and Supabase works equally well over a schema catalog.

Step 3: Generate SQL with Zod-validated output

// src/lib/generate-sql.ts
import Anthropic from '@anthropic-ai/sdk'
import { z } from 'zod'
import { formatSchemaForPrompt } from './schema-context'

const client = new Anthropic()

export const SqlResponseSchema = z.object({
    needs_clarification: z.boolean(),
    clarification_question: z.string().nullable(),
    sql: z.string().nullable(),
    explanation: z.string(),
})

export type SqlResponse = z.infer<typeof SqlResponseSchema>

const SYSTEM_PROMPT = `You translate natural-language questions into a single PostgreSQL SELECT statement.

Rules you must follow:
- Output a single SELECT statement only. Never INSERT, UPDATE, DELETE, DDL, or transaction commands.
- If the question is ambiguous (e.g. unspecified time range, "top customers" without a metric), set needs_clarification=true and ask one specific question instead of guessing.
- Use only the tables and columns listed in the schema. Do not invent columns.
- Always include explicit LIMIT (default 100) unless the user asks for an aggregate.
- For date ranges with no year specified, assume the current year.
- Return amounts in dollars (amount_cents / 100.0) for human-readable results.
- If aggregating across currencies, only include rows where currency='USD' unless the user asks otherwise.

Schema:
${formatSchemaForPrompt()}

Respond as JSON matching: { needs_clarification, clarification_question, sql, explanation }`

export async function generateSql(question: string, priorError?: string): Promise<SqlResponse> {
    const userContent = priorError
        ? `Question: ${question}\n\nYour previous SQL failed with: ${priorError}\n\nReturn corrected SQL.`
        : `Question: ${question}`

    const response = await client.messages.create({
        model: 'claude-sonnet-4-6',
        max_tokens: 1024,
        system: SYSTEM_PROMPT,
        messages: [{ role: 'user', content: userContent }],
    })

    const text = response.content
        .filter((block): block is Anthropic.TextBlock => block.type === 'text')
        .map((b) => b.text)
        .join('')

    const jsonMatch = text.match(/\{[\s\S]*\}/)
    if (!jsonMatch) {
        throw new Error('LLM did not return JSON')
    }

    const parsed = SqlResponseSchema.safeParse(JSON.parse(jsonMatch[0]))
    if (!parsed.success) {
        throw new Error(`Invalid response shape: ${parsed.error.message}`)
    }

    return parsed.data
}

A few decisions worth flagging. I'm using a JSON parse off the text block rather than tool use — for a single-shot structured output this is simpler and the failure mode is the same (you re-prompt). If this were a multi-step agent that needed to look up additional data before generating SQL, I'd use the tool-use loop pattern from Build a Multi-Step AI Agent with Tool Use in TypeScript instead. The system prompt is doing real work: the needs_clarification branch is what stops the LLM from confidently inventing "last month" when the user asked "how were sales recently."

Step 4: The safety gate

Even with a read-only role, you want a second layer before the SQL touches the database. The LLM is allowed to be wrong; it is not allowed to send DROP TABLE and let Postgres reject it for you.

// src/lib/sql-gate.ts
import { Parser } from 'node-sql-parser'
import { readOnlyPool } from './db-readonly'

const parser = new Parser()
const ALLOWED_TABLES = new Set(['customers', 'orders', 'products'])

export type GateResult =
    | { ok: true; rowEstimate: number; cost: number }
    | { ok: false; reason: string }

export async function gateSql(sql: string): Promise<GateResult> {
    let ast
    try {
        ast = parser.astify(sql, { database: 'postgresql' })
    } catch (e) {
        return { ok: false, reason: `Unparseable SQL: ${(e as Error).message}` }
    }

    const statements = Array.isArray(ast) ? ast : [ast]
    if (statements.length !== 1) {
        return { ok: false, reason: 'Multiple statements not allowed' }
    }

    const stmt = statements[0]
    if (stmt.type !== 'select') {
        return { ok: false, reason: `Only SELECT allowed, got ${stmt.type}` }
    }

    const tables = parser.tableList(sql, { database: 'postgresql' })
    for (const t of tables) {
        const [mode, , name] = t.split('::')
        if (mode !== 'select') {
            return { ok: false, reason: `Disallowed access mode: ${mode}` }
        }
        if (!ALLOWED_TABLES.has(name)) {
            return { ok: false, reason: `Table not allowed: ${name}` }
        }
    }

    try {
        const explainRes = await readOnlyPool.query(`EXPLAIN (FORMAT JSON) ${sql}`)
        const plan = explainRes.rows[0]['QUERY PLAN'][0].Plan
        const rowEstimate = plan['Plan Rows'] as number
        const cost = plan['Total Cost'] as number

        if (rowEstimate > 100_000) {
            return { ok: false, reason: `Estimated ${rowEstimate} rows — too large` }
        }
        if (cost > 100_000) {
            return { ok: false, reason: `Query cost estimate ${cost.toFixed(0)} too high` }
        }

        return { ok: true, rowEstimate, cost }
    } catch (e) {
        return { ok: false, reason: `EXPLAIN failed: ${(e as Error).message}` }
    }
}

node-sql-parser catches non-SELECT statements before they ever reach Postgres. The EXPLAIN step does double duty: it validates that columns and tables actually exist (Postgres returns an error for unknown identifiers during planning), and it gives you the planner's row estimate so you can refuse a query that would scan the whole orders table. The row and cost thresholds are tunable — start conservative and loosen them once you have telemetry on real usage.

Step 5: Self-correction loop

The LLM will write SQL that EXPLAIN rejects. Most often it's hallucinated a column name or used INTERVAL '1 month' syntax slightly wrong. Don't fail the user request — feed the error back and let it try again.

// src/lib/text-to-sql.ts
import { generateSql, SqlResponse } from './generate-sql'
import { gateSql } from './sql-gate'
import { readOnlyPool } from './db-readonly'

const MAX_ATTEMPTS = 3

export type TextToSqlResult =
    | { kind: 'clarify'; question: string }
    | { kind: 'success'; sql: string; explanation: string; rows: unknown[] }
    | { kind: 'error'; reason: string; lastSql: string | null }

export async function textToSql(question: string): Promise<TextToSqlResult> {
    let priorError: string | undefined
    let lastSql: string | null = null

    for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
        const generated: SqlResponse = await generateSql(question, priorError)

        if (generated.needs_clarification && generated.clarification_question) {
            return { kind: 'clarify', question: generated.clarification_question }
        }
        if (!generated.sql) {
            priorError = 'You returned no SQL and no clarification question.'
            continue
        }

        lastSql = generated.sql
        const gate = await gateSql(generated.sql)
        if (!gate.ok) {
            priorError = gate.reason
            continue
        }

        try {
            const result = await readOnlyPool.query(generated.sql)
            return {
                kind: 'success',
                sql: generated.sql,
                explanation: generated.explanation,
                rows: result.rows,
            }
        } catch (e) {
            priorError = (e as Error).message
        }
    }

    return { kind: 'error', reason: priorError ?? 'Unknown', lastSql }
}

Three attempts is what I've landed on. One is too few (transient hallucinations dominate), five wastes tokens on questions the LLM genuinely can't answer with the schema it has. If you hit attempt 3 with errors, surface that to the user — don't pretend success. The same retry-vs-give-up tradeoff I covered in handling AI API rate limits and errors applies here: at some point a fresh error means the question is bad, not the model.

Step 6: The Next.js route and streaming UI

The query itself is fast enough to return synchronously, but the LLM call adds 2-4 seconds. Stream the stages so the UI can show progress:

// src/pages/api/query.ts
import type { NextApiRequest, NextApiResponse } from 'next'
import { textToSql } from '@/src/lib/text-to-sql'

export const config = { api: { responseLimit: false } }

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
    if (req.method !== 'POST') return res.status(405).end()

    const { question } = req.body as { question?: string }
    if (!question || question.length > 500) {
        return res.status(400).json({ error: 'Question must be 1-500 chars' })
    }

    res.setHeader('Content-Type', 'text/event-stream')
    res.setHeader('Cache-Control', 'no-cache, no-transform')
    res.setHeader('Connection', 'keep-alive')

    const send = (event: string, data: unknown) => {
        res.write(`event: ${event}\ndata: ${JSON.stringify(data)}\n\n`)
    }

    send('status', { stage: 'thinking' })

    try {
        const result = await textToSql(question)

        if (result.kind === 'clarify') {
            send('clarify', { question: result.question })
        } else if (result.kind === 'success') {
            send('sql', { sql: result.sql, explanation: result.explanation })
            send('rows', { rows: result.rows })
        } else {
            send('error', { reason: result.reason, lastSql: result.lastSql })
        }
    } catch (e) {
        send('error', { reason: (e as Error).message, lastSql: null })
    } finally {
        send('done', {})
        res.end()
    }
}

The client component is straightforward — consume the SSE stream, render a clarification prompt if clarify arrives, otherwise render the explanation, the SQL in a collapsible <details>, and the rows in a table. The SSE plumbing here is the same pattern I walked through in streaming Claude API responses in Next.js, just emitting structured stage events instead of raw tokens. Showing the generated SQL is non-negotiable: technical users will spot a wrong join immediately, and even non-technical users gain trust from seeing the machinery.

Handling ambiguity instead of guessing

The single biggest quality win in this whole pipeline is the needs_clarification field. A user asks "who are our top customers?" Top by what? Revenue, order count, recency? An LLM left to its own devices will pick one and confidently produce a wrong answer. With the clarification branch, the UI shows "Top by total revenue, order count, or most recent purchase?" and the user resolves it.

The system prompt asks the LLM to prefer clarification over guessing for any underspecified question. In practice this catches: missing time ranges, ambiguous metrics, undefined groupings ("by region" when there's no region column), and questions that reference data outside the allowed schema.

What can still go wrong

A few honest caveats from running this in production:

  • Cost-bombing. Even with EXPLAIN gates, a determined user can write 500 queries in a minute. Wrap the endpoint in per-user rate limiting.
  • Schema drift. The hand-written schema description must stay in sync with migrations. Add a check in CI that introspects information_schema and diffs against your SCHEMA constant.
  • Multi-tenant data. If your tables have a tenant_id column, the LLM will forget to filter on it. Wrap every query in a transaction that sets a tenant GUC, and use Postgres row-level security to enforce it. Don't trust the LLM to add the WHERE tenant_id = $1.
  • Cached EXPLAIN. Stats can go stale and the cost estimate becomes meaningless. Run ANALYZE on a schedule.

What's next

Once you trust the pipeline enough to expose it to users, the next thing they ask for is charts. The natural follow-up is generative UI with Vercel AI SDK — instead of returning a JSON table, have the LLM stream a <BarChart> component with the right axis and series configuration based on the query results. Same self-correction loop, same safety gate, but the output is a rendered visualization rather than a row dump.

Share:
VA

Vadim Alakhverdov

Software developer writing about JavaScript, web development, and developer tools.

Related Posts