Add AI Image Generation to Your Next.js App with Replicate, Fal, and Cloudflare R2

Wednesday 27/05/2026

Most "add AI image generation to Next.js" tutorials stop at the API call. You paste a prompt, get a temporary URL back, dump it into an <img>, and call it done. Then you deploy, the provider's CDN expires the URL after an hour, your hero images 404 across your blog, and the on-call developer has a long Tuesday. Real image generation in production is a pipeline problem: prompt → model → durable storage → signed delivery → cost accounting.

This post walks through the whole pipeline in TypeScript, using Replicate or Fal for inference (Flux 1.1 Pro), Cloudflare R2 for storage, and Next.js as the host. We'll cover async webhooks (because image gen takes 10–30 seconds and an API route shouldn't sit there waiting), prompt safety filtering, deduplication via content hashing, and per-image cost math at scale. It pairs naturally with the posts on handling AI API rate limits in production and the real cost of running an AI feature.

Why R2, and why webhooks

Two architectural choices set up everything else.

Cloudflare R2 over S3 because R2 has zero egress fees. You're going to be re-serving each generated image hundreds or thousands of times - to the requesting user, to other visitors, to OpenGraph crawlers, to RSS readers. On S3 at $0.09/GB egress, a single 1MB hero image served 100k times costs $9 in bandwidth. On R2 it costs $0. The storage rate ($0.015/GB-month) is also cheaper. There's no reason to use S3 for this use case in 2026.
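
The egress arithmetic is easy to check. Here is a minimal sketch, assuming decimal gigabytes (1 GB = 10^9 bytes) and a flat per-GB rate; egressCostUsd is a hypothetical helper, not part of any SDK:

```typescript
// Hypothetical helper: bandwidth cost of serving one object many times.
// Assumes decimal gigabytes (1 GB = 1e9 bytes) and a flat per-GB rate.
function egressCostUsd(objectBytes: number, requests: number, usdPerGb: number): number {
    const totalGb = (objectBytes * requests) / 1e9
    return totalGb * usdPerGb
}

const s3Cost = egressCostUsd(1_000_000, 100_000, 0.09) // 1MB image, 100k views, S3 rate
const r2Cost = egressCostUsd(1_000_000, 100_000, 0) // same traffic, R2 rate

console.log(s3Cost.toFixed(2), r2Cost.toFixed(2)) // prints "9.00 0.00"
```

Real S3 bills have tiered rates and a free allowance, so treat this as an order-of-magnitude check rather than an invoice.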

Webhooks over polling because Flux 1.1 Pro takes 8–25 seconds per image. Next.js serverless functions on Vercel have a 60s hard limit on the Hobby plan and 300s on Pro, but you don't want to burn execution time waiting - every second a serverless function is held open is billable, and your user's browser is sitting on a pending request. Webhooks decouple the "submit job" call (fast) from the "image is ready" event (whenever).

Install and set up

pnpm add replicate @fal-ai/client @aws-sdk/client-s3 @aws-sdk/s3-request-presigner zod
pnpm add -D @types/node

R2 uses the S3 API, which is why @aws-sdk/client-s3 is in the list. Pick replicate or @fal-ai/client based on the provider you're going with - I'll show both. Fal is generally faster (cold-start optimized) but Replicate has a broader model catalog. For Flux specifically, both are good; Fal usually wins on latency, Replicate usually wins on reliability under load.

Environment variables you'll need:

# .env.local
REPLICATE_API_TOKEN=r8_xxx
FAL_KEY=xxx:xxx
R2_ACCOUNT_ID=xxx
R2_ACCESS_KEY_ID=xxx
R2_SECRET_ACCESS_KEY=xxx
R2_BUCKET=ai-generated-images
WEBHOOK_SECRET=whsec_xxx
NEXT_PUBLIC_R2_PUBLIC_URL=https://images.yourdomain.com

The public URL is a custom domain you point at the R2 bucket via Cloudflare's "Connect Custom Domain" - necessary because the default <account-id>.r2.cloudflarestorage.com endpoint is the authenticated S3 API, not a public URL a browser can load images from.
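
A missing variable tends to surface as a confusing SDK error deep inside a request, so it can help to check the environment once at startup. This is a minimal sketch; missingEnv and assertEnv are hypothetical helpers, and the list leaves out REPLICATE_API_TOKEN and FAL_KEY because which one you need depends on the provider you pick:

```typescript
// Hypothetical startup check: fail fast if a required variable is missing.
// Variable names match the .env.local listing; add your provider's key as needed.
const REQUIRED_ENV = [
    'R2_ACCOUNT_ID',
    'R2_ACCESS_KEY_ID',
    'R2_SECRET_ACCESS_KEY',
    'R2_BUCKET',
    'WEBHOOK_SECRET',
    'NEXT_PUBLIC_R2_PUBLIC_URL',
] as const

type EnvName = (typeof REQUIRED_ENV)[number]

// Returns the names that are unset or blank in the given environment object.
function missingEnv(env: Record<string, string | undefined>): EnvName[] {
    return REQUIRED_ENV.filter((name) => !env[name]?.trim())
}

// Throws a single error listing everything that is missing.
function assertEnv(env: Record<string, string | undefined>): void {
    const missing = missingEnv(env)
    if (missing.length > 0) {
        throw new Error(`Missing environment variables: ${missing.join(', ')}`)
    }
}
```

In the app you would call assertEnv(process.env) once, for example at the top of src/lib/r2.ts.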

The R2 client

R2 is S3-compatible, so the same SDK works. The only quirk is the endpoint format.

// src/lib/r2.ts
import { S3Client, PutObjectCommand, GetObjectCommand, HeadObjectCommand } from '@aws-sdk/client-s3'
import { getSignedUrl } from '@aws-sdk/s3-request-presigner'

// R2 speaks the S3 API; only the endpoint and the 'auto' region differ.
export const r2 = new S3Client({
    region: 'auto',
    endpoint: `https://${process.env.R2_ACCOUNT_ID}.r2.cloudflarestorage.com`,
    credentials: {
        accessKeyId: process.env.R2_ACCESS_KEY_ID!,
        secretAccessKey: process.env.R2_SECRET_ACCESS_KEY!,
    },
})

const BUCKET = process.env.R2_BUCKET!

// Uploads the bytes and returns the public (custom domain) URL for the key.
export async function uploadImage(key: string, bytes: Uint8Array, contentType: string) {
    await r2.send(
        new PutObjectCommand({
            Bucket: BUCKET,
            Key: key,
            Body: bytes,
            ContentType: contentType,
            CacheControl: 'public, max-age=31536000, immutable',
        })
    )
    return `${process.env.NEXT_PUBLIC_R2_PUBLIC_URL}/${key}`
}

export async function objectExists(key: string): Promise<boolean> {
    try {
        await r2.send(new HeadObjectCommand({ Bucket: BUCKET, Key: key }))
        return true
    } catch (err) {
        // A missing key surfaces as a NotFound error (HTTP 404); anything else is a real failure
        const e = err as { name?: string; $metadata?: { httpStatusCode?: number } }
        if (e.name === 'NotFound' || e.$metadata?.httpStatusCode === 404) return false
        throw err
    }
}

// Presign a GET (not a PUT) so the URL can be used to read a private object.
export async function signedReadUrl(key: string, expiresInSeconds = 3600) {
    const cmd = new GetObjectCommand({ Bucket: BUCKET, Key: key })
    return getSignedUrl(r2, cmd, { expiresIn: expiresInSeconds })
}

CacheControl: immutable is important - once an image is generated and stored, it never changes (content-addressable storage), so Cloudflare and browsers can cache it forever. The key contains a content hash, so any edit produces a new key anyway.

Prompt safety: a thin filter that catches the common stuff

You don't want to ship a "generate any image from any prompt" feature without at least a basic filter. The provider APIs have their own safety classifiers but they fire late (after you've already paid for inference) and they're inconsistent. A pre-flight filter is cheap insurance.

// src/lib/safety.ts
const BLOCKED_TERMS = [
    // NSFW / violence - populate this list per your product policy
    'nude',
    'naked',
    'gore',
    // ... your blocklist
]

const SUSPICIOUS_INJECTION_MARKERS = [
    'ignore previous',
    'system prompt',
    '</prompt>',
]

export type SafetyResult =
    | { allowed: true; sanitized: string }
    | { allowed: false; reason: string }

export function checkPrompt(rawPrompt: string): SafetyResult {
    const prompt = rawPrompt.trim().slice(0, 1000)
    if (!prompt) return { allowed: false, reason: 'empty_prompt' }

    const lower = prompt.toLowerCase()
    for (const term of BLOCKED_TERMS) {
        if (lower.includes(term)) {
            return { allowed: false, reason: `blocked_term:${term}` }
        }
    }
    for (const marker of SUSPICIOUS_INJECTION_MARKERS) {
        if (lower.includes(marker)) {
            return { allowed: false, reason: 'prompt_injection_suspected' }
        }
    }
    return { allowed: true, sanitized: prompt }
}

This is intentionally simple. If your product needs serious content moderation, layer in an LLM-based classifier or a dedicated service like Azure AI Content Safety. For most B2B SaaS use cases (blog headers, product mockups, internal avatars), a blocklist plus the provider's built-in classifier is enough.
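
One cheap refinement is worth considering: plain substring matching also blocks innocent words that happen to contain a blocked term. The sketch below is a hypothetical variant (findBlockedTerm is not part of the filter above) that only matches whole words:

```typescript
// Escape regex metacharacters so blocklist entries are matched literally.
function escapeRegExp(text: string): string {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Hypothetical variant of the blocklist check: match whole words only,
// case-insensitively. Returns the first blocked term found, or null.
function findBlockedTerm(prompt: string, blockedTerms: string[]): string | null {
    for (const term of blockedTerms) {
        const pattern = new RegExp(`\\b${escapeRegExp(term)}\\b`, 'i')
        if (pattern.test(prompt)) return term
    }
    return null
}

const sampleTerms = ['nude', 'gore']
console.log(findBlockedTerm('a denuded hillside at dawn', sampleTerms)) // null
console.log(findBlockedTerm('A NUDE statue', sampleTerms)) // "nude"
```

The trade-off is that whole-word matching misses variants like plurals, so the blocklist has to list them explicitly. Also note that \b only understands ASCII word characters.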

Deduplicating via content hashing

If two users prompt for the exact same string with the same seed, you've already paid for that image. Don't pay again. A simple SHA-256 of (prompt + seed + model + dimensions) becomes both the dedup key and the R2 object key.

// src/lib/hash.ts
import crypto from 'node:crypto'

export type GenerationInput = {
    prompt: string
    seed: number
    model: string
    width: number
    height: number
}

export function imageKey(input: GenerationInput): string {
    const canonical = JSON.stringify({
        prompt: input.prompt.trim().toLowerCase(),
        seed: input.seed,
        model: input.model,
        width: input.width,
        height: input.height,
    })
    const hash = crypto.createHash('sha256').update(canonical).digest('hex')
    return `flux/${hash.slice(0, 2)}/${hash}.webp`
}

The slice(0, 2) prefix shards across 256 directories - not strictly necessary on R2 (object storage doesn't care) but it makes the bucket browsable in the Cloudflare UI without choking on a million sibling keys.

Before submitting a job, call objectExists(key). If it's there, return the URL immediately. Free image, zero latency.
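
To make "the exact same string" concrete, here is a small self-contained check of what the key treats as a duplicate. The imageKey body is repeated from above only so the snippet runs on its own; the sample prompts and seeds are made up:

```typescript
import { createHash } from 'node:crypto'

type GenerationInput = {
    prompt: string
    seed: number
    model: string
    width: number
    height: number
}

// Same canonicalization as imageKey above, repeated so this snippet is standalone.
function imageKey(input: GenerationInput): string {
    const canonical = JSON.stringify({
        prompt: input.prompt.trim().toLowerCase(),
        seed: input.seed,
        model: input.model,
        width: input.width,
        height: input.height,
    })
    const hash = createHash('sha256').update(canonical).digest('hex')
    return `flux/${hash.slice(0, 2)}/${hash}.webp`
}

const base = { seed: 42, model: 'black-forest-labs/flux-1.1-pro', width: 1024, height: 1024 }
const keyA = imageKey({ ...base, prompt: 'A fox in a library' })
const keyB = imageKey({ ...base, prompt: '  a fox in a LIBRARY ' }) // differs only in case and padding
const keyC = imageKey({ ...base, prompt: 'A fox in a library', seed: 43 }) // different seed

console.log(keyA === keyB) // true: treated as the same image
console.log(keyA === keyC) // false: a new seed is a new image
```

Two consequences follow. Lowercasing means prompts that differ only in capitalization share one cached image, and because the route picks a random seed when none is supplied, cache hits only happen for requests that pass an explicit seed.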

Submitting a Replicate job with webhook

// src/app/api/generate/route.ts
import Replicate from 'replicate'
import { NextResponse } from 'next/server'
import { z } from 'zod'
// Assumes the default create-next-app alias, where '@/*' maps to './src/*'
import { checkPrompt } from '@/lib/safety'
import { imageKey } from '@/lib/hash'
import { objectExists } from '@/lib/r2'
import { savePendingJob } from '@/lib/jobs'

const replicate = new Replicate({ auth: process.env.REPLICATE_API_TOKEN! })

const RequestSchema = z.object({
    prompt: z.string().min(1).max(1000),
    seed: z.number().int().optional(),
    width: z.number().int().min(256).max(2048).default(1024),
    height: z.number().int().min(256).max(2048).default(1024),
})

export async function POST(req: Request) {
    const body = await req.json().catch(() => null)
    const parsed = RequestSchema.safeParse(body)
    if (!parsed.success) {
        return NextResponse.json({ error: 'invalid_input' }, { status: 400 })
    }

    const safety = checkPrompt(parsed.data.prompt)
    if (!safety.allowed) {
        return NextResponse.json({ error: safety.reason }, { status: 400 })
    }

    const seed = parsed.data.seed ?? Math.floor(Math.random() * 1_000_000)
    const key = imageKey({
        prompt: safety.sanitized,
        seed,
        model: 'black-forest-labs/flux-1.1-pro',
        width: parsed.data.width,
        height: parsed.data.height,
    })

    if (await objectExists(key)) {
        return NextResponse.json({
            status: 'ready',
            url: `${process.env.NEXT_PUBLIC_R2_PUBLIC_URL}/${key}`,
        })
    }

    const prediction = await replicate.predictions.create({
        model: 'black-forest-labs/flux-1.1-pro',
        input: {
            prompt: safety.sanitized,
            seed,
            width: parsed.data.width,
            height: parsed.data.height,
            output_format: 'webp',
            output_quality: 90,
        },
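        // Derives the app origin by stripping 'images.' from the image domain (assumes the app lives on the parent domain)
        webhook: `${process.env.NEXT_PUBLIC_R2_PUBLIC_URL!.replace('images.', '')}/api/webhooks/replicate`,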
        webhook_events_filter: ['completed'],
    })

    await savePendingJob({
        jobId: prediction.id,
        key,
        provider: 'replicate',
        createdAt: Date.now(),
    })

    return NextResponse.json({ status: 'pending', jobId: prediction.id, key })
}

savePendingJob writes to whatever store you want - Redis, Postgres, even a Map if you're prototyping. The webhook handler needs to read it back to know which R2 key to upload to.
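
For the prototyping case, a Map-backed store could look like the sketch below. The function names match the imports used in the routes, but the Job shape and the status values are assumptions. Add export to each function when you drop it into src/lib/jobs.ts.

```typescript
// Hypothetical in-memory job store for local prototyping only.
// State lives in one process, so it will not survive restarts or be shared
// between serverless instances; use Redis or Postgres for anything deployed.
type JobStatus = 'pending' | 'ready' | 'error'

type Job = {
    jobId: string
    key: string
    provider: 'replicate' | 'fal'
    createdAt: number
    status: JobStatus
    url?: string
    error?: string
}

const jobs = new Map<string, Job>()

// The functions are async so a real database can replace the Map later
// without changing any call site.
async function savePendingJob(input: Omit<Job, 'status' | 'url' | 'error'>): Promise<void> {
    jobs.set(input.jobId, { ...input, status: 'pending' })
}

async function getPendingJob(jobId: string): Promise<Job | undefined> {
    return jobs.get(jobId)
}

async function markJobReady(jobId: string, url: string): Promise<void> {
    const job = jobs.get(jobId)
    if (job) jobs.set(jobId, { ...job, status: 'ready', url })
}

async function markJobFailed(jobId: string, error: string): Promise<void> {
    const job = jobs.get(jobId)
    if (job) jobs.set(jobId, { ...job, status: 'error', error })
}
```

The /api/jobs/[jobId] route the frontend polls can then return the stored status, url, and error fields directly.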

Handling the Replicate webhook

The webhook gets called when the prediction completes. Replicate signs each delivery and sends webhook-id, webhook-timestamp, and webhook-signature headers, which you should verify. Then you fetch the temporary image URL Replicate returns, upload it to R2, and mark the job as ready.

// src/app/api/webhooks/replicate/route.ts
import { NextResponse } from 'next/server'
import crypto from 'node:crypto'
// Assumes the default create-next-app alias, where '@/*' maps to './src/*'
import { uploadImage } from '@/lib/r2'
import { getPendingJob, markJobReady, markJobFailed } from '@/lib/jobs'

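// Replicate signs the string "<webhook-id>.<webhook-timestamp>.<raw body>" with
// HMAC-SHA256, keyed with the base64-decoded part of the whsec_ secret, and
// sends the result base64-encoded.
function verifyReplicateSignature(
    body: string,
    id: string | null,
    timestamp: string | null,
    signatureHeader: string | null
): boolean {
    if (!id || !timestamp || !signatureHeader) return false
    // Reject deliveries more than 5 minutes old to limit replay attacks
    const ageSeconds = Math.abs(Date.now() / 1000 - Number(timestamp))
    if (!Number.isFinite(ageSeconds) || ageSeconds > 300) return false

    const key = Buffer.from(process.env.WEBHOOK_SECRET!.replace(/^whsec_/, ''), 'base64')
    const expected = crypto.createHmac('sha256', key).update(`${id}.${timestamp}.${body}`).digest()
    // The header is a space-separated list of "v1,<base64 signature>" entries
    return signatureHeader.split(' ').some((entry) => {
        const provided = Buffer.from(entry.split(',')[1] ?? '', 'base64')
        return provided.length === expected.length && crypto.timingSafeEqual(expected, provided)
    })
}

type ReplicatePayload = {
    id: string
    status: 'succeeded' | 'failed' | 'canceled'
    output?: string | string[]
    error?: string
}

export async function POST(req: Request) {
    const raw = await req.text()
    const valid = verifyReplicateSignature(
        raw,
        req.headers.get('webhook-id'),
        req.headers.get('webhook-timestamp'),
        req.headers.get('webhook-signature')
    )
    if (!valid) {
        return NextResponse.json({ error: 'invalid_signature' }, { status: 401 })
    }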

    const payload = JSON.parse(raw) as ReplicatePayload
    const job = await getPendingJob(payload.id)
    if (!job) {
        return NextResponse.json({ error: 'job_not_found' }, { status: 404 })
    }

    if (payload.status !== 'succeeded' || !payload.output) {
        await markJobFailed(payload.id, payload.error ?? 'generation_failed')
        return NextResponse.json({ ok: true })
    }

    const imageUrl = Array.isArray(payload.output) ? payload.output[0] : payload.output
    const res = await fetch(imageUrl)
    if (!res.ok) {
        await markJobFailed(payload.id, `fetch_failed_${res.status}`)
        return NextResponse.json({ error: 'fetch_failed' }, { status: 502 })
    }

    const bytes = new Uint8Array(await res.arrayBuffer())
    const publicUrl = await uploadImage(job.key, bytes, 'image/webp')
    await markJobReady(payload.id, publicUrl)

    return NextResponse.json({ ok: true })
}

Two things to watch out for here. First, Replicate signs webhooks with a single account-level signing secret, not a per-prediction one - make sure WEBHOOK_SECRET holds your account's default webhook secret (the whsec_... value Replicate's API returns), not a string you made up. Second, the output field can be a string or a string array depending on the model. Flux returns a single URL string for one image, but other models return arrays. Always normalize.
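
The normalization step can live in one small function. This is a sketch with a hypothetical name (firstOutputUrl); unlike the inline ternary in the handler, it also covers an empty array, which would otherwise pass undefined to fetch:

```typescript
// Hypothetical helper: reduce a prediction's `output` field to one URL.
// Accepts a single string, an array of strings, or nothing at all.
function firstOutputUrl(output: string | string[] | null | undefined): string | null {
    const candidate = Array.isArray(output) ? output[0] : output
    return typeof candidate === 'string' && candidate.length > 0 ? candidate : null
}

console.log(firstOutputUrl('https://example.com/a.webp')) // "https://example.com/a.webp"
console.log(firstOutputUrl(['https://example.com/b.webp', 'https://example.com/c.webp'])) // "https://example.com/b.webp"
console.log(firstOutputUrl([])) // null
```

In the handler, a null result would be treated the same way as a failed prediction.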

The Fal alternative - one synchronous call, no webhook needed

Fal is simpler operationally because it offers a "subscribe" pattern that holds the connection open until the image is ready, with built-in streaming progress updates. For UX where the user is actively waiting (a "generate avatar" button), this is often better than a webhook flow.

// src/lib/fal.ts
import { fal } from '@fal-ai/client'
import { uploadImage, objectExists } from './r2'
import { imageKey, type GenerationInput } from './hash'

fal.config({ credentials: process.env.FAL_KEY! })

type FalFluxOutput = {
    images: Array<{ url: string; content_type: string }>
}

export async function generateWithFal(input: Omit<GenerationInput, 'model'>) {
    const key = imageKey({ ...input, model: 'fal-ai/flux-pro/v1.1' })
    if (await objectExists(key)) {
        return { url: `${process.env.NEXT_PUBLIC_R2_PUBLIC_URL}/${key}`, cached: true }
    }

    const result = await fal.subscribe('fal-ai/flux-pro/v1.1', {
        input: {
            prompt: input.prompt,
            seed: input.seed,
            image_size: { width: input.width, height: input.height },
            num_inference_steps: 28,
            output_format: 'webp',
        },
        logs: false,
    })

    const data = result.data as FalFluxOutput
    if (!data.images?.[0]?.url) {
        throw new Error('fal_no_image_returned')
    }

    const res = await fetch(data.images[0].url)
    if (!res.ok) throw new Error(`fal_fetch_failed_${res.status}`)

    const bytes = new Uint8Array(await res.arrayBuffer())
    const publicUrl = await uploadImage(key, bytes, data.images[0].content_type)
    return { url: publicUrl, cached: false }
}

For Fal you can run this directly in a Next.js API route - Flux at 28 inference steps is typically 6–12s, comfortably inside Vercel's 60s budget. Keep the webhook pattern (Replicate flow above) for background jobs or anywhere you can't keep a connection open, like a Cloudflare Worker, where the free plan allows only 10ms of CPU time per request (30s is the paid-plan default).

The frontend: optimistic preview, then real image

The user clicks "generate." You want them to see something immediately - a skeleton, a blurred placeholder, a "generating..." state - and then swap to the real image. Don't show a spinning loader on a blank rectangle; that's the cheap-looking version every AI app started with.

// src/components/GenerateImage.tsx
'use client'
import { useState } from 'react'

type Status = 'idle' | 'pending' | 'ready' | 'error'

export function GenerateImage() {
    const [prompt, setPrompt] = useState('')
    const [status, setStatus] = useState<Status>('idle')
    const [url, setUrl] = useState<string | null>(null)
    const [error, setError] = useState<string | null>(null)

    async function generate() {
        setStatus('pending')
        setError(null)
        setUrl(null)

        const res = await fetch('/api/generate', {
            method: 'POST',
            headers: { 'content-type': 'application/json' },
            body: JSON.stringify({ prompt, width: 1024, height: 1024 }),
        })

        if (!res.ok) {
            const body = (await res.json().catch(() => ({}))) as { error?: string }
            setError(body.error ?? 'request_failed')
            setStatus('error')
            return
        }

        const body = (await res.json()) as
            | { status: 'ready'; url: string }
            | { status: 'pending'; jobId: string; key: string }

        if (body.status === 'ready') {
            setUrl(body.url)
            setStatus('ready')
            return
        }

        pollUntilReady(body.jobId)
    }

    async function pollUntilReady(jobId: string) {
        for (let i = 0; i < 40; i++) {
            await new Promise((r) => setTimeout(r, 1500))
            const res = await fetch(`/api/jobs/${jobId}`)
            if (!res.ok) continue
            const body = (await res.json()) as { status: Status; url?: string; error?: string }
            if (body.status === 'ready' && body.url) {
                setUrl(body.url)
                setStatus('ready')
                return
            }
            if (body.status === 'error') {
                setError(body.error ?? 'generation_failed')
                setStatus('error')
                return
            }
        }
        setError('timeout')
        setStatus('error')
    }

    return (
        <div className="space-y-4">
            <textarea
                value={prompt}
                onChange={(e) => setPrompt(e.target.value)}
                className="w-full border rounded p-2"
                rows={3}
                placeholder="A photo of a fox in a library"
            />
            <button
                onClick={generate}
                disabled={status === 'pending' || !prompt.trim()}
                className="px-4 py-2 bg-black text-white rounded disabled:opacity-50"
            >
                {status === 'pending' ? 'Generating…' : 'Generate'}
            </button>

            {status === 'pending' && (
                <div className="w-full aspect-square bg-gradient-to-br from-zinc-100 to-zinc-300 animate-pulse rounded" />
            )}
            {status === 'ready' && url && (
                <img src={url} alt={prompt} className="w-full rounded" />
            )}
            {status === 'error' && error && (
                <div className="text-red-600 text-sm">Failed: {error}</div>
            )}
        </div>
    )
}

The polling pattern is a stopgap. For a real production setup, swap it for Server-Sent Events or a websocket - but if you're early in product, polling every 1.5s is fine. Webhook fires, your DB updates, the next poll returns ready.
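
If you stay on polling for a while, backing off between requests cuts the load noticeably. The sketch below is an assumption of mine rather than something the component above does: pollDelays is a hypothetical helper that grows the delay each attempt, caps it, and stops once the total wait would exceed a budget.

```typescript
// Hypothetical polling schedule: each delay is the previous one times `factor`,
// capped at `capMs`; the list ends before the total wait would exceed `budgetMs`.
function pollDelays(baseMs: number, factor: number, capMs: number, budgetMs: number): number[] {
    if (baseMs <= 0 || factor < 1) {
        throw new Error('baseMs must be positive and factor must be at least 1')
    }
    const delays: number[] = []
    let total = 0
    let next = baseMs
    while (total + next <= budgetMs) {
        delays.push(next)
        total += next
        next = Math.min(Math.round(next * factor), capMs)
    }
    return delays
}

// Start at 1s, grow by 1.5x, cap at 5s, give up after about a minute.
const delays = pollDelays(1000, 1.5, 5000, 60_000)
console.log(delays.length) // 14 requests, versus 40 at a fixed 1.5s interval
console.log(delays.slice(0, 5)) // [ 1000, 1500, 2250, 3375, 5000 ]
```

In pollUntilReady you would loop over these delays instead of the fixed 40 iterations, sleeping for each one before fetching the job status.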

Cost per image at scale

At May 2026 prices (call this approximate - providers change pricing more often than they ship features):

Flux 1.1 Pro on Replicate: ~$0.04 per image
Flux 1.1 Pro on Fal: ~$0.04 per image
OpenAI gpt-image-1 (high quality): ~$0.19 per image - much pricier, but better at text rendering and product photography style
R2 storage: $0.015/GB-month. A 1024x1024 WebP at 90% quality is ~80KB. 100k images = ~8GB = $0.12/month
R2 egress: $0

For a side-project blog generating 10 hero images a week and serving them to 1k visitors a month: ~$1.60/month (about 40 images at $0.04), with storage costs too small to notice. For a SaaS generating 10k images per month with serious traffic: ~$400/month inference, about $0.01/month of new storage each month (roughly $0.14/month after a year, at ~9.6GB), $0 egress. The dedup cache means returning users and crawlers cost zero.
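
If you want to plug in your own volumes, the same arithmetic fits in a few lines. This is a sketch using the approximate rates listed above as constants; the function names are hypothetical and the 80KB average is the post's own estimate:

```typescript
// Approximate rates from the list above; adjust when providers change pricing.
const USD_PER_IMAGE = 0.04 // Flux 1.1 Pro on Replicate or Fal
const R2_USD_PER_GB_MONTH = 0.015
const AVG_IMAGE_BYTES = 80_000 // ~80KB per 1024x1024 WebP

// Inference spend for a number of freshly generated (non-cached) images.
function inferenceCostUsd(images: number): number {
    return images * USD_PER_IMAGE
}

// Monthly storage bill for everything currently in the bucket (decimal GB).
function storageCostUsdPerMonth(storedImages: number): number {
    return ((storedImages * AVG_IMAGE_BYTES) / 1e9) * R2_USD_PER_GB_MONTH
}

console.log(inferenceCostUsd(10_000).toFixed(2)) // "400.00"
console.log(storageCostUsdPerMonth(100_000).toFixed(2)) // "0.12"
```

Inference dominates by several orders of magnitude, which is why the dedup cache matters more than any storage tuning.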

When to pick which provider:

Fal: Best for interactive UX (user clicks → user waits → image appears). Lowest cold-start latency. Subscribe pattern means no webhook plumbing.
Replicate: Best for batch jobs, background processing, and when you need the bigger model catalog (SDXL fine-tunes, ControlNet, Stable Diffusion 3.5, etc.). Webhook flow is mature.
OpenAI image API: Pick when you specifically need text rendering inside images, photorealistic product shots, or you're already deep in the OpenAI ecosystem and don't want a third vendor. Worth the price premium for those cases only.

Gotchas worth knowing

A few things that tripped me up:

Webhook delivery isn't guaranteed. Replicate retries with exponential backoff but if your endpoint is down too long, the webhook is lost. Always have a background sweeper that polls predictions.get(jobId) for jobs older than 5 minutes still marked pending. Belt and braces.
R2 public domain caching is opaque. Once a key is cached at Cloudflare edge, even a delete-and-replace upload at the same key won't invalidate it for up to a day. Content-addressable keys (the hash pattern above) sidestep this entirely - never reuse a key.
Vercel's 4.5MB response body limit will bite you if you try to base64-encode the image and return it in the API response. Don't do that. Return the URL, let the browser fetch from R2.
Fal's subscribe holds the request open until the result is ready. It works on Vercel's Node runtime but not on the Edge runtime. Set export const runtime = 'nodejs' on routes using it.
Prompt safety classifiers can deny you mid-generation with no refund. Always pre-filter before paying for inference.
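
The sweeper from the first gotcha mostly comes down to picking the right jobs. Here is a minimal sketch of that selection step, assuming job records carry a status and a createdAt timestamp in milliseconds; findStaleJobs and the PendingJob shape are hypothetical:

```typescript
// Assumed job record shape; only the fields the sweeper needs.
type PendingJob = {
    jobId: string
    status: 'pending' | 'ready' | 'error'
    createdAt: number // epoch milliseconds
}

// Returns jobs still marked pending after `maxAgeMs` (default: 5 minutes).
function findStaleJobs(jobs: PendingJob[], nowMs: number, maxAgeMs = 5 * 60 * 1000): PendingJob[] {
    return jobs.filter((job) => job.status === 'pending' && nowMs - job.createdAt > maxAgeMs)
}

const now = 1_000_000_000
const stale = findStaleJobs(
    [
        { jobId: 'old-pending', status: 'pending', createdAt: now - 6 * 60 * 1000 },
        { jobId: 'new-pending', status: 'pending', createdAt: now - 60 * 1000 },
        { jobId: 'old-ready', status: 'ready', createdAt: now - 60 * 60 * 1000 },
    ],
    now
)
console.log(stale.map((job) => job.jobId)) // [ 'old-pending' ]
```

A scheduled task would then call predictions.get(jobId) for each stale job and push succeeded ones through the same fetch-and-upload path the webhook handler uses.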

What's next

The polling-from-the-browser pattern works but stops scaling around the time you have multiple users generating in parallel - every poll is an API request to your server, and a long generation queue creates a thundering herd. The clean fix is to put image generation jobs into a proper queue with status updates, so the API route returns instantly and the browser subscribes to job updates over SSE. That's exactly what I'll cover in the next post: why your AI feature needs a job queue, and how to add one with BullMQ. It pairs directly with this image pipeline - BullMQ owns the generation job, your webhook handler updates the queue state, and your frontend subscribes to the queue, not the inference provider.