Structured Output with Zod: Force Any LLM to Return Typed JSON

Wednesday 18/03/2026


You're calling an LLM and you need a JSON object back — maybe a list of extracted products, a structured form response, or categorized feedback. So you add "respond in JSON" to your prompt and hope for the best. Sometimes it works. Sometimes you get markdown-wrapped JSON. Sometimes you get a friendly paragraph explaining the JSON it would have returned. And now you're writing regex to strip code fences, wrapping JSON.parse in try/catch, and wondering why you became a developer.

Structured output with Zod schemas solves this properly. Instead of hoping the model returns valid JSON, you define the shape you want as a Zod schema, and the LLM is constrained to match it. You get TypeScript types inferred automatically, runtime validation built in, and no more parsing roulette. Here's how to do it across Claude, OpenAI, and Vercel AI SDK.

Why Zod and not just JSON Schema

You could hand-write JSON Schema and pass it to the API. But Zod gives you three things JSON Schema doesn't:

  1. TypeScript types are inferred — z.infer<typeof MySchema> gives you the type for free. No maintaining a schema AND a type.
  2. Runtime validation — even if the model returns garbage, schema.safeParse() catches it with detailed error messages.
  3. Composable — build complex schemas from smaller ones. Reuse them across endpoints, validation layers, and LLM calls.

Every major AI SDK now accepts Zod schemas natively. It's the de facto standard for structured output in TypeScript.

Setting up the project

mkdir structured-output-demo && cd structured-output-demo
pnpm init
pnpm add zod @anthropic-ai/sdk openai ai @ai-sdk/anthropic @ai-sdk/openai
pnpm add -D typescript @types/node tsx

// tsconfig.json
{
    "compilerOptions": {
        "target": "ES2022",
        "module": "ESNext",
        "moduleResolution": "bundler",
        "strict": true,
        "esModuleInterop": true,
        "outDir": "dist"
    },
    "include": ["src"]
}

The use case: extracting product data from descriptions

Let's say you're building a tool that takes messy, unstructured product descriptions from suppliers and turns them into clean, typed product objects. Here's the Zod schema:

// src/schemas.ts
import { z } from 'zod'

export const DimensionsSchema = z.object({
    width: z.number().describe('Width in centimeters'),
    height: z.number().describe('Height in centimeters'),
    depth: z.number().describe('Depth in centimeters'),
    weight: z.number().optional().describe('Weight in grams, if mentioned'),
})

export const ProductSchema = z.object({
    name: z.string().describe('Clean product name without marketing fluff'),
    category: z
        .enum(['electronics', 'clothing', 'home', 'sports', 'other'])
        .describe('Product category'),
    price: z.object({
        amount: z.number().describe('Price as a number'),
        currency: z.enum(['USD', 'EUR', 'GBP']).describe('Currency code'),
    }),
    features: z
        .array(z.string())
        .min(1)
        .max(10)
        .describe('Key product features as short bullet points'),
    dimensions: DimensionsSchema.optional().describe('Physical dimensions if mentioned'),
    inStock: z.boolean().describe('Whether the product appears to be available'),
})

export type Product = z.infer<typeof ProductSchema>

Notice the .describe() calls — these become part of the schema description the LLM sees, guiding it on what to put where. They're not just for documentation.

Method 1: Claude (Anthropic SDK) with tool use

Claude doesn't have a native "structured output" mode like OpenAI, but you can achieve the same thing using tool use. Define a tool whose input schema matches your Zod schema, and Claude will return structured data as the tool call arguments.

// src/claude-structured.ts
import Anthropic from '@anthropic-ai/sdk'
import { zodToJsonSchema } from 'zod-to-json-schema'
import { ProductSchema, type Product } from './schemas.js'

const client = new Anthropic()

export async function extractProductClaude(description: string): Promise<Product> {
    const jsonSchema = zodToJsonSchema(ProductSchema, {
        target: 'openApi3',
    })

    const response = await client.messages.create({
        model: 'claude-sonnet-4-20250514',
        max_tokens: 1024,
        tools: [
            {
                name: 'extract_product',
                description:
                    'Extract structured product data from an unstructured description',
                input_schema: jsonSchema as Anthropic.Tool['input_schema'],
            },
        ],
        tool_choice: { type: 'tool', name: 'extract_product' },
        messages: [
            {
                role: 'user',
                content: `Extract structured product data from this description:\n\n${description}`,
            },
        ],
    })

    const toolBlock = response.content.find(
        (block): block is Anthropic.ToolUseBlock => block.type === 'tool_use'
    )

    if (!toolBlock) {
        throw new Error('Claude did not return a tool call')
    }

    // Validate with Zod — don't trust the LLM blindly
    const parsed = ProductSchema.safeParse(toolBlock.input)
    if (!parsed.success) {
        throw new Error(
            `Invalid product data from Claude: ${parsed.error.issues.map((i) => i.message).join(', ')}`
        )
    }

    return parsed.data
}

You need zod-to-json-schema to convert the Zod schema to JSON Schema for Claude's API:

pnpm add zod-to-json-schema

The key trick: tool_choice: { type: 'tool', name: 'extract_product' } forces Claude to use the tool, so you always get structured output instead of a text response.

Method 2: OpenAI with response_format

OpenAI has native structured output support via response_format. It constrains the model's output at the token level, so you're guaranteed valid JSON matching your schema.

// src/openai-structured.ts
import OpenAI from 'openai'
import { zodResponseFormat } from 'openai/helpers/zod'
import { ProductSchema, type Product } from './schemas.js'

const client = new OpenAI()

export async function extractProductOpenAI(description: string): Promise<Product> {
    const response = await client.chat.completions.create({
        model: 'gpt-4o',
        messages: [
            {
                role: 'system',
                content: 'Extract structured product data from unstructured descriptions.',
            },
            {
                role: 'user',
                content: description,
            },
        ],
        response_format: zodResponseFormat(ProductSchema, 'product'),
    })

    const content = response.choices[0].message.content
    if (!content) {
        throw new Error('OpenAI returned no content')
    }

    // Still validate — defense in depth
    const parsed = ProductSchema.safeParse(JSON.parse(content))
    if (!parsed.success) {
        throw new Error(
            `Invalid product data from OpenAI: ${parsed.error.issues.map((i) => i.message).join(', ')}`
        )
    }

    return parsed.data
}

OpenAI's zodResponseFormat helper (built into the openai package) handles the Zod-to-JSON-Schema conversion for you. One gotcha: OpenAI's structured output doesn't support all Zod features. z.union(), z.record(), and z.transform() will throw errors. Stick to objects, arrays, enums, and primitives.

Method 3: Vercel AI SDK (provider-agnostic)

If you want one API that works across providers, Vercel AI SDK's generateObject is the cleanest option. It handles the schema conversion and validation internally.

// src/ai-sdk-structured.ts
import { generateObject } from 'ai'
import { anthropic } from '@ai-sdk/anthropic'
import { openai } from '@ai-sdk/openai'
import { ProductSchema, type Product } from './schemas.js'

export async function extractProductAISdk(
    description: string,
    provider: 'claude' | 'openai' = 'claude'
): Promise<Product> {
    const model =
        provider === 'claude'
            ? anthropic('claude-sonnet-4-20250514')
            : openai('gpt-4o')

    const { object } = await generateObject({
        model,
        schema: ProductSchema,
        prompt: `Extract structured product data from this description:\n\n${description}`,
    })

    // generateObject already validates against the schema
    // but the return type is inferred from ProductSchema automatically
    return object
}

That's it. generateObject returns a fully typed object matching your Zod schema. Under the hood, it uses tool calling for Claude and structured output for OpenAI. The object property is already typed as Product — no casting needed.

Handling nested and complex schemas

Real-world data isn't flat. Here's a more complex schema for extracting a full product listing with variants:

// src/schemas-advanced.ts
import { z } from 'zod'

const VariantSchema = z.object({
    sku: z.string().describe('Stock keeping unit identifier'),
    color: z.string().optional(),
    size: z.string().optional(),
    priceModifier: z
        .number()
        .describe('Price difference from base price, can be negative'),
    available: z.boolean(),
})

export const ProductListingSchema = z.object({
    product: z.object({
        name: z.string(),
        brand: z.string().describe('Brand name, or "Unknown" if not mentioned'),
        description: z
            .string()
            .max(200)
            .describe('Clean product description in 1-2 sentences'),
        category: z.enum([
            'electronics',
            'clothing',
            'home',
            'sports',
            'beauty',
            'food',
            'other',
        ]),
    }),
    pricing: z.object({
        basePrice: z.number(),
        currency: z.enum(['USD', 'EUR', 'GBP']),
        onSale: z.boolean(),
        originalPrice: z.number().optional().describe('Only if the item is on sale'),
    }),
    variants: z.array(VariantSchema).describe('Product variants like sizes and colors'),
    tags: z.array(z.string()).max(5).describe('Search tags for this product'),
    confidence: z
        .number()
        .min(0)
        .max(1)
        .describe('How confident you are in the extraction accuracy, 0-1'),
})

export type ProductListing = z.infer<typeof ProductListingSchema>

The confidence field is a neat trick — ask the LLM to self-report how confident it is in the extraction. You can use this downstream to flag low-confidence results for human review.
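A sketch of that downstream routing — the threshold and the needsReview queue are hypothetical, not part of any SDK:

```typescript
// Hypothetical routing step: extractions the model scores below a
// threshold go to a human review queue instead of straight through.
type ScoredExtraction = { name: string; confidence: number }

function partitionByConfidence(
    items: ScoredExtraction[],
    threshold = 0.8
): { accepted: ScoredExtraction[]; needsReview: ScoredExtraction[] } {
    const accepted: ScoredExtraction[] = []
    const needsReview: ScoredExtraction[] = []
    for (const item of items) {
        if (item.confidence >= threshold) accepted.push(item)
        else needsReview.push(item)
    }
    return { accepted, needsReview }
}

const { accepted, needsReview } = partitionByConfidence([
    { name: 'Galaxy Buds3 Pro', confidence: 0.95 },
    { name: 'Mystery gadget', confidence: 0.4 },
])
console.log(accepted.length, needsReview.length) // 1 1
```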

Error recovery: what to do when validation fails

Even with structured output, things can go wrong — especially with edge-case inputs. Here's a retry pattern with error feedback:

// src/extract-with-retry.ts
import { generateObject } from 'ai'
import { anthropic } from '@ai-sdk/anthropic'
import { z } from 'zod'
import { ProductSchema, type Product } from './schemas.js'

export async function extractWithRetry(
    description: string,
    maxRetries: number = 2
): Promise<Product> {
    let lastError: string | undefined

    for (let attempt = 0; attempt <= maxRetries; attempt++) {
        try {
            const prompt = lastError
                ? `Extract structured product data from this description. A previous attempt failed validation with this error: "${lastError}". Please fix the issue.\n\n${description}`
                : `Extract structured product data from this description:\n\n${description}`

            const { object } = await generateObject({
                model: anthropic('claude-sonnet-4-20250514'),
                schema: ProductSchema,
                prompt,
            })

            return object
        } catch (error) {
            if (attempt === maxRetries) {
                throw new Error(
                    `Failed to extract product after ${maxRetries + 1} attempts: ${error}`
                )
            }
            lastError = error instanceof Error ? error.message : String(error)
        }
    }

    // TypeScript needs this, but we'll never reach it
    throw new Error('Unreachable')
}

Feeding the validation error back to the LLM on retry is much more effective than just retrying blindly. The model sees what went wrong and corrects it.

Gotchas and things that will bite you

1. .describe() matters more than you think. Without descriptions, the LLM guesses what each field means from the name alone. price is obvious; priceModifier is not. Add descriptions to anything ambiguous.

2. Enums are your friend. Use z.enum() instead of z.string() whenever the set of values is known. It constrains the output and prevents creative interpretations like "Electronics & Gadgets" when you wanted "electronics".

3. Optional fields need clear guidance. If a field is optional, describe when it should be present vs. absent. Otherwise the model might always include it with a made-up value.

4. OpenAI's structured output has schema restrictions. No z.union(), no z.record(), no z.transform(), no z.refine(). If you need these, use Claude's tool-use approach or Vercel AI SDK which handles the workarounds for you.

5. Always validate, even with guaranteed structured output. OpenAI guarantees schema-valid JSON at the token level, but that doesn't mean the content is correct. A price of 0 is valid JSON but probably wrong. Add .min(0.01) or custom refinements and validate after parsing.

6. Watch your token usage. Complex schemas with many descriptions increase input token count. For high-volume use cases, benchmark with and without descriptions — sometimes shorter field names with good system prompts work better.

Putting it all together

// src/main.ts
import { extractProductAISdk } from './ai-sdk-structured.js'

const messyDescription = `
    🔥 SALE! Brand new Samsung Galaxy Buds3 Pro — premium noise-cancelling
    wireless earbuds. Was $249.99, now just $179.99!! Available in Graphite
    and White. Amazing 360 Audio with head tracking. IP57 water resistant.
    24hr battery with case. USB-C charging. Ships free! Almost sold out
    in White color. Dimensions: 1.8 x 2.1 x 1.7 cm, weighs about 5.4g
    per earbud.
`

async function main() {
    const product = await extractProductAISdk(messyDescription)

    console.log('Extracted product:')
    console.log(JSON.stringify(product, null, 2))
    // {
    //   name: "Samsung Galaxy Buds3 Pro",
    //   category: "electronics",
    //   price: { amount: 179.99, currency: "USD" },
    //   features: [
    //     "Premium noise cancelling",
    //     "360 Audio with head tracking",
    //     "IP57 water resistant",
    //     "24-hour battery with case",
    //     "USB-C charging",
    //     "Free shipping"
    //   ],
    //   dimensions: { width: 1.8, height: 2.1, depth: 1.7, weight: 5.4 },
    //   inStock: true
    // }

    // Full TypeScript support — product is typed as Product
    console.log(`${product.name}: $${product.price.amount} ${product.price.currency}`)
}

main().catch(console.error)

Run it with npx tsx src/main.ts and you get clean, typed JSON from a messy supplier listing. Every time.

What's next

Once you've got structured output working, the natural next step is building agents that act on that data. Check out how to build a human-in-the-loop AI agent with Vercel AI SDK — where the structured output from your extraction step feeds into an approval workflow before the agent takes action.


Vadim Alakhverdov

Software developer writing about JavaScript, web development, and developer tools.
