PII Redaction Middleware: Strip Sensitive Data Before It Reaches the LLM

Wednesday 03/06/2026

Your support assistant works great in the demo. Then legal asks one question: "where does the customer's data go when they type their credit card into the chat?" The honest answer - "straight to a third-party model provider, logged on their side for 30 days" - is the kind of thing that stalls a SOC2 audit and gets a feature pulled from a healthcare or fintech roadmap. You do not need to turn the LLM off. You need a layer between your user's input and the model that strips the sensitive bits out first.

This is PII redaction, and doing it well in TypeScript is mostly an exercise in careful regex, a reversible token map, and an audit trail. This post builds a reusable PII redaction middleware for LLM calls in a Next.js app: it detects emails, phone numbers, credit cards, SSNs, and IP addresses (plus any custom patterns you add), swaps them for placeholder tokens before the LLM call, and rehydrates the original values in the response. We will cover reversible vs irreversible redaction, where regex stops being enough (and Microsoft Presidio takes over), and the real tradeoff nobody mentions: redaction degrades the context the model sees. If you are also wiring up observability or eval gates in CI, redaction belongs in the same hardening pass.

Why a middleware, not inline string replacement

The naive version is a function you call before every anthropic.messages.create. That works until you have five call sites, three of which forget to call it, and one that redacts the input but logs the raw text anyway. Redaction has to be a choke point: a single place every LLM-bound payload passes through, with logging baked in so compliance can prove it ran. That is what a middleware gives you.

There are two redaction modes, and you must pick deliberately:

  • Irreversible - replace john@acme.com with [EMAIL] and throw the original away. Safest, but the model loses information. If the user asks "reply to my email," the assistant cannot, because it never saw an address.
  • Reversible - replace john@acme.com with a unique token like [EMAIL_1], keep a { "[EMAIL_1]": "john@acme.com" } map in memory for the duration of the request, and swap the real value back into the model's response. The model reasons over placeholders; the user sees real data. The PII never leaves your server.

Reversible is what most product features need. Build that, and downgrade to irreversible by simply discarding the map.
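
As a rough sketch of that downgrade, assuming the [TYPE_N] token format described above: strip the counter from every token and never hand the map to anyone. The helper name toIrreversible is hypothetical and not part of the middleware built below.

```typescript
// Hypothetical helper: collapse reversible tokens like [EMAIL_2] into the
// irreversible form [EMAIL]. The caller simply never keeps the token map.
function toIrreversible(redactedText: string): string {
    // [A-Z_]+ backtracks so that the final "_<digits>" is the counter.
    return redactedText.replace(/\[([A-Z_]+)_\d+\]/g, '[$1]')
}

const reversibleForm = 'Reply to [EMAIL_1], cc [EMAIL_2], card [CREDIT_CARD_1]'
const irreversibleForm = toIrreversible(reversibleForm)
// irreversibleForm: 'Reply to [EMAIL], cc [EMAIL], card [CREDIT_CARD]'
```

Once the counters are gone the two addresses are indistinguishable, which is exactly the information loss the irreversible mode trades for safety.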

The detector

Start with the detection layer. Each detector has a name, a regex, and an optional validator (for things like credit cards where a regex match is not enough).

// src/lib/pii/detectors.ts
export type PIIType =
    | 'EMAIL'
    | 'PHONE'
    | 'CREDIT_CARD'
    | 'SSN'
    | 'IP'

export interface Detector {
    type: PIIType
    pattern: RegExp
    validate?: (match: string) => boolean
}

// Luhn check - a 16-digit number that matches the regex is not
// necessarily a card. This kills most false positives.
function luhnValid(value: string): boolean {
    const digits = value.replace(/\D/g, '')
    if (digits.length < 13 || digits.length > 19) return false
    let sum = 0
    let double = false
    for (let i = digits.length - 1; i >= 0; i--) {
        let d = Number(digits[i])
        if (double) {
            d *= 2
            if (d > 9) d -= 9
        }
        sum += d
        double = !double
    }
    return sum % 10 === 0
}

export const DETECTORS: Detector[] = [
    {
        type: 'EMAIL',
        pattern: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
    },
    {
        type: 'CREDIT_CARD',
        // 13-19 digits, optionally separated by spaces or hyphens
        pattern: /\b(?:\d[ -]*?){13,19}\b/g,
        validate: luhnValid,
    },
    {
        type: 'SSN',
        pattern: /\b\d{3}-\d{2}-\d{4}\b/g,
    },
    {
        type: 'PHONE',
        // North American formats; adapt per locale
        pattern: /\b(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b/g,
    },
    {
        type: 'IP',
        pattern: /\b(?:\d{1,3}\.){3}\d{1,3}\b/g,
    },
]

Order matters. Credit cards must be checked before phone numbers, because a phone regex can greedily eat part of a card number. The redactor below processes detectors in array order and replaces each matched span with a token before the next detector runs, so put the most specific patterns first.
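
To make the ordering problem concrete, here is a small self-contained sketch. It copies the card and phone patterns and the Luhn check from above into a hypothetical applyInOrder helper (an illustration, not the redactor itself) and runs them in both orders over a Luhn-valid 14-digit number, commonly used as a Diners Club test card, written in 4-6-4 grouping.

```typescript
type Rule = { type: string; pattern: RegExp; validate?: (m: string) => boolean }

function luhnCheck(value: string): boolean {
    const digits = value.replace(/\D/g, '')
    if (digits.length < 13 || digits.length > 19) return false
    let sum = 0
    let double = false
    for (let i = digits.length - 1; i >= 0; i--) {
        let d = Number(digits[i])
        if (double) {
            d *= 2
            if (d > 9) d -= 9
        }
        sum += d
        double = !double
    }
    return sum % 10 === 0
}

const CARD_RULE: Rule = {
    type: 'CREDIT_CARD',
    pattern: /\b(?:\d[ -]*?){13,19}\b/g,
    validate: luhnCheck,
}
const PHONE_RULE: Rule = {
    type: 'PHONE',
    pattern: /\b(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b/g,
}

// Hypothetical mini-redactor: applies the rules in the order given.
function applyInOrder(input: string, rules: Rule[]): string {
    let working = input
    for (const rule of rules) {
        let n = 0
        working = working.replace(rule.pattern, (match: string) => {
            if (rule.validate && !rule.validate(match)) return match
            n += 1
            return `[${rule.type}_${n}]`
        })
    }
    return working
}

const orderSample = 'card 3056 930902 5904'
const cardFirst = applyInOrder(orderSample, [CARD_RULE, PHONE_RULE])
// cardFirst: 'card [CREDIT_CARD_1]'
const phoneFirst = applyInOrder(orderSample, [PHONE_RULE, CARD_RULE])
// phoneFirst: 'card 3056 [PHONE_1]' (the first four card digits leak)
```

With the phone rule first, the last ten digits look like a 3-3-4 phone number, the card rule never sees a full 14-digit run, and four real card digits reach the model.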

The redactor

The redactor walks the text, applies each detector in turn, and builds the reversible token map. The trick is to redact onto a single working copy: by the time a later detector runs, earlier matches have already been swapped for tokens, so it cannot re-match that text, and a small guard skips anything that already looks like a placeholder.

// src/lib/pii/redactor.ts
import { DETECTORS, type PIIType } from './detectors'

export interface RedactionResult {
    redacted: string
    // token -> original value, e.g. "[EMAIL_1]" -> "john@acme.com"
    map: Record<string, string>
    // counts per type, for the audit log (no raw values)
    found: Record<PIIType, number>
}

export function redact(input: string): RedactionResult {
    const map: Record<string, string> = {}
    const found = {} as Record<PIIType, number>
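    // Per-type counters live outside the loop, so two detectors that share a
    // type (say, two PHONE locales) never mint the same token twice.
    const counters = new Map<PIIType, number>()
    // Each pass rewrites the working copy; later detectors only ever see
    // text in which earlier matches are already replaced by tokens.
    let working = input

    for (const detector of DETECTORS) {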
        working = working.replace(detector.pattern, (match) => {
            if (detector.validate && !detector.validate(match)) {
                return match // leave non-PII matches untouched
            }
            // Skip if we somehow matched an existing placeholder.
            if (/^\[[A-Z_]+_\d+\]$/.test(match.trim())) return match

            const n = (counters.get(detector.type) ?? 0) + 1
            counters.set(detector.type, n)
            const token = `[${detector.type}_${n}]`
            map[token] = match
            found[detector.type] = (found[detector.type] ?? 0) + 1
            return token
        })
    }

    return { redacted: working, map, found }
}

export function rehydrate(text: string, map: Record<string, string>): string {
    let out = text
    for (const [token, original] of Object.entries(map)) {
        // Replace every occurrence the model echoed back.
        out = out.split(token).join(original)
    }
    return out
}

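rehydrate uses split().join() rather than a regex replace so both sides are treated literally: the token's square brackets would otherwise be parsed as a regex character class, and a $ sequence such as $& inside the original value would be expanded as a replacement pattern. String.prototype.replaceAll with a string pattern handles the token side too, but it still interprets $ sequences in a string replacement, so pass the original through a replacer function (() => original) if you go that route.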

A gotcha worth flagging: the model sometimes mangles tokens. It might return [EMAIL 1] or [email_1] instead of [EMAIL_1]. In production I normalize the model output with a tolerant regex before rehydrating, and log any token in the map that never got rehydrated - that is a signal the model dropped customer data on the floor.
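
One way that normalization could look, as a minimal sketch rather than the exact production code: match anything token-shaped, canonicalize it, and rehydrate only when the canonical form is a key in the map. The names rehydrateTolerant and TOKEN_LIKE are hypothetical, and the set of tolerated separators (space, underscore, hyphen) is an assumption.

```typescript
type TokenMap = Record<string, string>

// Matches bracketed text that looks like a token even if the model changed
// case or separators: [EMAIL_1], [email 1], [Credit-Card_2], [ PHONE_3 ].
const TOKEN_LIKE = /\[\s*([A-Za-z]+(?:[ _-][A-Za-z]+)*)[ _-]?(\d+)\s*\]/g

function rehydrateTolerant(
    text: string,
    map: TokenMap
): { text: string; missing: string[] } {
    const used = new Set<string>()
    const out = text.replace(TOKEN_LIKE, (match: string, rawType: string, n: string) => {
        // Canonical form: upper case, underscores between words.
        const token = `[${rawType.toUpperCase().replace(/[ -]/g, '_')}_${n}]`
        const original = map[token]
        // Not one of our tokens (e.g. "[Section 2]"): leave it untouched.
        if (original === undefined) return match
        used.add(token)
        // A function replacer inserts the value literally, "$" included.
        return original
    })
    // Tokens the model never echoed back; log these, never the values.
    const missing = Object.keys(map).filter((t) => !used.has(t))
    return { text: out, missing }
}

const sampleMap: TokenMap = {
    '[EMAIL_1]': 'j.doe+tag@a.io',
    '[PHONE_1]': '415-555-0100',
}
const tolerantResult = rehydrateTolerant('Sent to [email 1]. See [Section 2].', sampleMap)
// tolerantResult.text:    'Sent to j.doe+tag@a.io. See [Section 2].'
// tolerantResult.missing: ['[PHONE_1]']
```

Checking against the map keeps the tolerant pattern from rewriting bracketed text the model wrote on its own, and the missing list is what you would feed to the audit log.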

The middleware

Now the choke point. This wraps an Anthropic call so every payload is redacted on the way in and rehydrated on the way out, with an audit log in between.

// src/lib/pii/withRedaction.ts
import Anthropic from '@anthropic-ai/sdk'
import { redact, rehydrate } from './redactor'
import { logRedaction } from './audit'

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY })

export interface RedactedCallOptions {
    userInput: string
    systemPrompt: string
    requestId: string
}

export async function callLLMWithRedaction({
    userInput,
    systemPrompt,
    requestId,
}: RedactedCallOptions): Promise<string> {
    const { redacted, map, found } = redact(userInput)

    // Audit log records COUNTS, never raw PII values.
    await logRedaction({ requestId, found, tokens: Object.keys(map) })

    let response: Anthropic.Message
    try {
        response = await anthropic.messages.create({
            model: 'claude-haiku-4-5-20251001',
            max_tokens: 1024,
            system: systemPrompt,
            messages: [{ role: 'user', content: redacted }],
        })
    } catch (err) {
        // Never let a failure fall through to a raw retry that skips redaction.
        console.error(`[${requestId}] LLM call failed`, err)
        throw new Error('LLM request failed after redaction')
    }

    const text = response.content
        .filter((b): b is Anthropic.TextBlock => b.type === 'text')
        .map((b) => b.text)
        .join('')

    return rehydrate(text, map)
}

Two production rules are encoded here. First, the audit log gets counts and token names, never the underlying values - logging the PII you just redacted defeats the point and is itself a violation. Second, the catch block throws instead of silently retrying, because a "fallback" path that re-sends the raw input is exactly the kind of bug that leaks data under load.

The audit logger is deliberately boring:

// src/lib/pii/audit.ts
import type { PIIType } from './detectors'

export interface RedactionAuditEntry {
    requestId: string
    found: Record<PIIType, number>
    tokens: string[]
}

export async function logRedaction(entry: RedactionAuditEntry): Promise<void> {
    // Swap for your sink: Datadog, a DB table, an append-only S3 log.
    console.info(
        JSON.stringify({
            event: 'pii_redaction',
            at: new Date().toISOString(),
            requestId: entry.requestId,
            counts: entry.found,
            tokenCount: entry.tokens.length,
        })
    )
}

Wiring it into a Next.js route handler is then trivial:

// src/app/api/chat/route.ts
// Assumes the default create-next-app alias for a src/ layout: "@/*" -> "./src/*"
import { callLLMWithRedaction } from '@/lib/pii/withRedaction'
import { randomUUID } from 'crypto'

export async function POST(req: Request) {
    const { message } = await req.json()
    if (typeof message !== 'string' || message.length > 8000) {
        return Response.json({ error: 'Invalid input' }, { status: 400 })
    }

    const requestId = randomUUID()
    try {
        const reply = await callLLMWithRedaction({
            userInput: message,
            systemPrompt: 'You are a helpful support assistant.',
            requestId,
        })
        return Response.json({ reply, requestId })
    } catch {
        return Response.json({ error: 'Something went wrong' }, { status: 502 })
    }
}

Where regex stops working: names and addresses

Regex is great for structured PII - anything with a predictable shape. It is useless for names, street addresses, and free-form locations, because "John from Boston" has no pattern. For those you need named-entity recognition.

The standard tool is Microsoft Presidio, which is Python. You have two integration options from a TypeScript app:

  1. Run Presidio as a sidecar service. Deploy presidio-analyzer as a container, POST text to its HTTP API, and merge its findings with your regex detectors. This gives you NER-quality detection for PERSON, LOCATION, NRP, etc.
  2. Use a JS-native NER model via @huggingface/transformers running a token-classification model in a worker. Lighter to deploy, lower accuracy than Presidio.

Here is the sidecar call. Presidio returns its findings as character spans with a confidence score; you then normalize those into the same found/map structure and union them with the regex pass:

// src/lib/pii/presidio.ts
interface PresidioFinding {
    entity_type: string
    start: number
    end: number
    score: number
}

export async function analyzeWithPresidio(
    text: string,
    minScore = 0.6
): Promise<PresidioFinding[]> {
    const res = await fetch(`${process.env.PRESIDIO_URL}/analyze`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ text, language: 'en' }),
    })
    if (!res.ok) throw new Error(`Presidio returned ${res.status}`)
    const findings = (await res.json()) as PresidioFinding[]
    return findings.filter((f) => f.score >= minScore)
}

The minScore threshold is the dial you will spend the most time tuning. Too low and you redact every capitalized word; too high and real names slip through. Start at 0.6 and adjust against your own data.
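
The normalization step could look roughly like this. It is a sketch under assumptions: applyFindings is a hypothetical helper, overlapping spans are resolved by keeping the higher score, and offsets are treated as code-point indexes (Presidio runs in Python, whose string offsets count code points, whereas JavaScript strings index UTF-16 units). The offsets only make sense against the exact string that was sent to the analyzer.

```typescript
interface SpanFinding {
    entity_type: string
    start: number
    end: number
    score: number
}

// Hypothetical helper: turn span findings into tokens plus a reversible map.
function applyFindings(
    text: string,
    findings: SpanFinding[]
): { redacted: string; map: Record<string, string> } {
    // Index by code point so emoji and other astral characters do not
    // shift the offsets (assumption about how the analyzer counts).
    const chars = Array.from(text)

    // Keep the highest-scoring finding wherever spans overlap.
    const kept: SpanFinding[] = []
    for (const f of [...findings].sort((a, b) => b.score - a.score)) {
        const overlaps = kept.some((k) => f.start < k.end && k.start < f.end)
        if (!overlaps) kept.push(f)
    }
    kept.sort((a, b) => a.start - b.start)

    const map: Record<string, string> = {}
    const counters = new Map<string, number>()
    let redacted = ''
    let cursor = 0
    for (const f of kept) {
        const n = (counters.get(f.entity_type) ?? 0) + 1
        counters.set(f.entity_type, n)
        const token = `[${f.entity_type}_${n}]`
        map[token] = chars.slice(f.start, f.end).join('')
        redacted += chars.slice(cursor, f.start).join('') + token
        cursor = f.end
    }
    redacted += chars.slice(cursor).join('')
    return { redacted, map }
}

const nerResult = applyFindings('Email John from Boston', [
    { entity_type: 'PERSON', start: 6, end: 10, score: 0.85 },
    { entity_type: 'LOCATION', start: 16, end: 22, score: 0.85 },
    { entity_type: 'NRP', start: 6, end: 15, score: 0.4 }, // overlaps PERSON, dropped
])
// nerResult.redacted: 'Email [PERSON_1] from [LOCATION_1]'
```

The resulting map has the same token-to-value shape as the regex redactor's, so the two can be merged with an object spread before rehydration.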

The tradeoff nobody warns you about

Redaction is not free - it degrades the model's context. With an overly loose numeric detector, "Email John at john@acme.com about order 5512" becomes "Email [PERSON_1] at [EMAIL_1] about order [CREDIT_CARD_1]" - and now you have two problems. The order number 5512 got misdetected as a card fragment (the 13-19 digit pattern and Luhn check above would leave it alone, but short numerics are noisy and any looser pattern will catch them), and the model has lost the human-readable anchors that help it reason. Over-redaction makes the assistant dumber.

The fix is scoping. Do not redact everything blindly:

  • Only redact in the user-input channel, not your own trusted system prompt or retrieved documents you control.
  • Tune detectors to your domain. If order numbers are 4 digits, tighten the credit-card minimum length and validator so they are never candidates.
  • Measure it. Run your eval suite with redaction on and off and watch the quality delta. If task success drops 15%, your detectors are too aggressive.
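
As one illustration of domain tuning, a detector can look at what precedes a match before deciding. The sketch below is hypothetical: the ID_PREFIX wording, the 24-character lookback, and the function name are all assumptions, and the Luhn check is left out for brevity.

```typescript
const CARD_PATTERN = /\b(?:\d[ -]*?){13,19}\b/g

// Hypothetical domain rule: a number introduced as an order, invoice, or
// ticket id is treated as an identifier, not a card.
const ID_PREFIX = /\b(?:order|invoice|ticket)\s*(?:no\.?|number|#)?\s*:?\s*$/i

function redactCardsWithContext(input: string): string {
    let n = 0
    // With no capture groups, the replacer receives (match, offset, whole).
    return input.replace(CARD_PATTERN, (match: string, offset: number, whole: string) => {
        const before = whole.slice(Math.max(0, offset - 24), offset)
        if (ID_PREFIX.test(before)) return match // domain exemption
        n += 1
        return `[CREDIT_CARD_${n}]`
    })
}

const contextSample = 'order 4242424242424242, paid with 4242 4242 4242 4242'
const contextResult = redactCardsWithContext(contextSample)
// contextResult: 'order 4242424242424242, paid with [CREDIT_CARD_1]'
```

The exemption cuts both ways: a real card number typed after the word "order" would now pass through, so a rule like this only makes sense when your identifiers and card numbers cannot be confused, and it belongs in the test suite alongside the other detectors.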

Test the detectors like a security feature

A redaction layer that silently stops working is worse than none, because you think you are covered. Lock it down with a test suite of real PII patterns - including the tricky ones that should not match.

// src/lib/pii/redactor.test.ts
import { describe, it, expect } from 'vitest'
import { redact, rehydrate } from './redactor'

describe('PII redaction', () => {
    it('redacts and reversibly rehydrates an email', () => {
        const { redacted, map } = redact('Reach me at jane@acme.com please')
        expect(redacted).toBe('Reach me at [EMAIL_1] please')
        expect(rehydrate(redacted, map)).toContain('jane@acme.com')
    })

    it('redacts a valid Luhn credit card', () => {
        const { redacted } = redact('card: 4242 4242 4242 4242')
        expect(redacted).toMatch(/\[CREDIT_CARD_1\]/)
    })

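    it('does NOT redact a number that fails Luhn', () => {
        // 1234 5678 1234 5670 actually passes Luhn (checksum 60), so it
        // cannot be the negative case; ...5678 gives 68 and fails.
        const { redacted } = redact('order 1234 5678 1234 5678')
        expect(redacted).toContain('1234 5678 1234 5678')
    })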

    it('handles multiple emails with distinct tokens', () => {
        const { map } = redact('a@x.com and b@y.com')
        expect(Object.keys(map)).toEqual(['[EMAIL_1]', '[EMAIL_2]'])
    })

    it('rehydrates values containing regex-special chars', () => {
        const { redacted, map } = redact('j.doe+tag@a.io')
        expect(rehydrate(redacted, map)).toBe('j.doe+tag@a.io')
    })
})

The negative test - a number that fails the Luhn check stays untouched - is the one that catches a botched detector edit. Treat these tests as a regression gate and run them in CI on every change to the pii/ directory.

What's next

Redaction protects the data going into the model. The other half of input safety is protecting the model from being manipulated by that input - users who craft messages to override your system prompt or exfiltrate data. The natural follow-up is prompt injection defense for JavaScript apps: input sanitization, output validation, and middleware that catches the common attack patterns before they hit your model.

Vadim Alakhverdov

Software developer writing about JavaScript, web development, and developer tools.
