Anthropic Agent Skills in TypeScript: Package Reusable Instructions and Code as Tools

Wednesday 17/06/2026

·11 min read
Share:

You've written the same prompt three times this month. The one that explains how your invoices are structured, which fields are mandatory, how to compute the tax line, and what "looks valid" means. It lives as a 2,000-token string at the top of three different features, each copy slightly drifted from the others, and every one of them re-explains the same rules to Claude on every single call. When the rules change, you hunt down all three.

Anthropic Agent Skills fix this. A skill is a folder — a SKILL.md file plus optional scripts — that Claude discovers, loads on demand, and runs in a sandbox. Now that Agent Skills are an open standard with first-class Claude API support, you can build one with the @anthropic-ai/sdk, upload it once, and have every feature pull it in by ID. This is a hands-on TypeScript SDK tutorial: we'll build a real invoice-processing skill, upload it, run it, version it, and test it in CI.

If you've never thought about structuring code for AI agents, the same instincts apply here — see How to Structure Your TypeScript Codebase So AI Coding Agents Work Better.

What a skill actually is

A skill is just a directory with one required file:

// invoice-processing/ — the skill folder
invoice-processing/
├── SKILL.md            # required: frontmatter + instructions
├── REFERENCE.md        # optional: detailed rules, loaded only when needed
└── scripts/
    └── validate_invoice.py   # optional: deterministic code Claude runs via bash

The magic is progressive disclosure, and it's the whole reason skills beat a giant system prompt. Claude loads content in three stages:

  1. Metadata (always loaded): the name and description from the frontmatter, ~100 tokens. Claude knows the skill exists and when to use it.
  2. Instructions (loaded when triggered): the body of SKILL.md, read via bash only when a request matches the description.
  3. Resources and scripts (loaded as needed): REFERENCE.md, schemas, and scripts. A script's code never enters the context window — Claude runs it and sees only the output.

That last point is the one people miss. A 300-line validation script costs you zero context tokens until Claude runs it, and even then you only pay for its stdout. Stuffing that logic into a system prompt costs you the full token count on every request, forever.

Here's the SKILL.md:

<!-- invoice-processing/SKILL.md -->
---
name: invoice-processing
description: Extract structured data from invoices (PDF or text) and validate it. Use when the user uploads an invoice, asks to parse a bill, or mentions invoice extraction, line items, or invoice totals.
---

# Invoice Processing

Extract invoice data into this exact JSON shape, then validate it before returning.

## Extraction

Pull these fields:
- `invoice_number` (string)
- `issue_date` (ISO 8601 date)
- `currency` (ISO 4217 code, e.g. "USD")
- `line_items`: array of `{ description, quantity, unit_price, amount }`
- `subtotal`, `tax`, `total` (numbers)

For PDFs, read text with `pdfplumber` (pre-installed). Never guess a field — if it
is missing or ambiguous, set it to null and add a note.

## Validation

After extracting, run the validation script and include its result:

```bash
python scripts/validate_invoice.py extracted.json

If validation fails, report which checks failed. Do not silently "fix" the numbers.

See REFERENCE.md for tax-rounding and multi-currency rules.


The `description` is doing real work — it's the only part Claude sees at startup, so it has to say *what the skill does and when to reach for it*. Write it for the model, not for a human reader.

**Gotcha #1:** the `name` field has rules that will bite you. Lowercase letters, numbers, and hyphens only; max 64 characters; and it **cannot contain the words "anthropic" or "claude"**. Naming your skill `claude-invoice-helper` returns a 400. The `description` maxes out at 1024 characters.

## The validation script

Scripts are where you put logic that must be *deterministic* — Claude is great at reading a messy PDF, terrible at reliably adding up 40 line items. Offload the math:

```python
# invoice-processing/scripts/validate_invoice.py
import json
import sys
from decimal import Decimal


def validate(invoice: dict) -> dict:
    errors: list[str] = []

    required = ["invoice_number", "issue_date", "currency", "total"]
    for field in required:
        if invoice.get(field) in (None, ""):
            errors.append(f"missing required field: {field}")

    items = invoice.get("line_items") or []
    if not items:
        errors.append("no line items found")

    # Line items must sum to subtotal (money math uses Decimal, never float)
    line_sum = sum(Decimal(str(i.get("amount", 0))) for i in items)
    subtotal = Decimal(str(invoice.get("subtotal", 0)))
    if items and abs(line_sum - subtotal) > Decimal("0.01"):
        errors.append(f"line items sum to {line_sum}, but subtotal is {subtotal}")

    # subtotal + tax must equal total
    tax = Decimal(str(invoice.get("tax", 0)))
    total = Decimal(str(invoice.get("total", 0)))
    if abs(subtotal + tax - total) > Decimal("0.01"):
        errors.append(f"subtotal ({subtotal}) + tax ({tax}) != total ({total})")

    return {"valid": len(errors) == 0, "errors": errors}


if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        result = validate(json.load(f))
    print(json.dumps(result))
    sys.exit(0 if result["valid"] else 1)

No any-equivalent hand-waving — this either passes or tells Claude exactly which invariant broke, and Claude relays that instead of confidently returning wrong totals.

Uploading the skill

Install the SDK (this assumes a recent version — check your package.json):

pnpm add @anthropic-ai/sdk

Upload the folder as individual files. The SDK's toFile helper wraps each one, and the first argument's filename must keep the folder prefix so the skill's directory structure is preserved in the container:

// src/skills/upload-invoice-skill.ts
import Anthropic, { toFile } from '@anthropic-ai/sdk'
import fs from 'fs'

const client = new Anthropic() // reads ANTHROPIC_API_KEY from the environment

export async function uploadInvoiceSkill(): Promise<{ id: string; version: string }> {
    try {
        const skill = await client.beta.skills.create({
            display_title: 'Invoice Processing',
            files: [
                await toFile(
                    fs.createReadStream('invoice-processing/SKILL.md'),
                    'invoice-processing/SKILL.md',
                    { type: 'text/markdown' }
                ),
                await toFile(
                    fs.createReadStream('invoice-processing/REFERENCE.md'),
                    'invoice-processing/REFERENCE.md',
                    { type: 'text/markdown' }
                ),
                await toFile(
                    fs.createReadStream('invoice-processing/scripts/validate_invoice.py'),
                    'invoice-processing/scripts/validate_invoice.py',
                    { type: 'text/x-python' }
                ),
            ],
        })

        console.log(`Created skill ${skill.id} (version ${skill.latest_version})`)
        return { id: skill.id, version: skill.latest_version }
    } catch (err) {
        if (err instanceof Anthropic.APIError) {
            // 400s here are almost always a bad `name` or `description` in SKILL.md
            console.error(`Skill upload failed (${err.status}): ${err.message}`)
        }
        throw err
    }
}

skill.id looks like skill_01AbCd.... Store it — that's how every feature references the skill from now on. Custom skills uploaded through the API are workspace-wide, so every member of your API workspace can use it by ID.

Running it

Now the payoff. To use a skill in a Messages call you pass it in the container parameter alongside the code execution tool (skills run inside that sandbox), plus three beta headers:

// src/skills/process-invoice.ts
import Anthropic from '@anthropic-ai/sdk'
import fs from 'fs'

const client = new Anthropic()

export async function processInvoice(skillId: string, pdfPath: string) {
    // 1. Upload the invoice file so the container can read it
    const file = await client.beta.files.upload(
        { file: fs.createReadStream(pdfPath) },
        { headers: { 'anthropic-beta': 'files-api-2025-04-14' } }
    )

    // 2. Run the message with the skill loaded
    const response = await client.beta.messages.create({
        model: 'claude-opus-4-8',
        max_tokens: 4096,
        betas: [
            'code-execution-2025-08-25',
            'skills-2025-10-02',
            'files-api-2025-04-14',
        ],
        container: {
            skills: [{ type: 'custom', skill_id: skillId, version: 'latest' }],
        },
        tools: [{ type: 'code_execution_20250825', name: 'code_execution' }],
        messages: [
            {
                role: 'user',
                content: [
                    { type: 'text', text: 'Extract and validate this invoice. Return the JSON.' },
                    { type: 'container_upload', file_id: file.id },
                ],
            },
        ],
    })

    for (const block of response.content) {
        if (block.type === 'text') return block.text
    }
    return null
}

Notice what isn't in this code: any explanation of your invoice schema, any validation logic, any tax rules. The system prompt is empty. Claude saw the one-line skill description, decided this request matched, read SKILL.md, ran validate_invoice.py, and returned the result. The instructions and the script lived in the skill — loaded only because they were needed.

Gotcha #2 — the big one. On the Claude API surface, the skill's sandbox has no network access and no runtime package installation. Only pre-installed packages are available (pdfplumber, pandas, openpyxl, etc. — the code execution tool list). If your skill's script tries to pip install something or call an external API, it works in Claude Code but fails silently on the API. Design API skills to be self-contained. If you need a live lookup, that belongs in a tool you control, not in the skill — see How to Build an MCP Server in TypeScript from Scratch.

When a skill beats a giant system prompt

Not everything should be a skill. Use the decision this way:

| Use a skill when… | Keep it in the prompt when… | | --- | --- | | The same instructions are reused across features | It's one-off, conversation-specific guidance | | You have deterministic logic (math, validation, parsing) | There's no code, just a short instruction | | The full instructions are large but rarely all needed | The guidance is tiny and always relevant | | You want to version and roll back behavior independently | The behavior changes every request |

The progressive-disclosure math is the clincher. A 4,000-token invoice spec in your system prompt costs 4,000 input tokens on every call, even the ones that aren't about invoices. As a skill, it costs ~100 tokens (the description) until a request actually triggers it. Across a busy app, that's a real chunk of your bill — the kind of thing I dug into in The Real Cost of Running an AI Feature in Production.

Versioning and testing in CI

When the tax rules change, you don't redeploy three features — you ship a new skill version:

// src/skills/version-invoice-skill.ts
import Anthropic, { toFile } from '@anthropic-ai/sdk'
import fs from 'fs'

const client = new Anthropic()

export async function publishNewVersion(skillId: string) {
    // You can upload a whole folder as a single zip for a new version
    const version = await client.beta.skills.versions.create(skillId, {
        files: [await toFile(fs.createReadStream('invoice-processing.zip'), 'skill.zip')],
    })
    console.log(`Published version ${version.version}`)
    return version.version
}

Features that pin version: 'latest' pick it up immediately; features that pin a specific version string (custom versions are epoch timestamps like 1759178010641129) keep running the old behavior until you bump them. That's your safe rollout and rollback lever.

The part most people skip: test skills before they go live. A skill is behavior, and behavior regresses. Run a fixture invoice through the new version and assert on the validation output before promoting it:

// src/skills/invoice-skill.test.ts
import { describe, it, expect } from 'vitest'
import { processInvoice } from './process-invoice'

describe('invoice-processing skill', () => {
    it('flags an invoice whose totals do not add up', async () => {
        // fixtures/bad-total.pdf: subtotal 100 + tax 10, total wrongly printed as 120
        const result = await processInvoice(process.env.SKILL_ID!, 'fixtures/bad-total.pdf')
        expect(result).toMatch(/subtotal.*tax.*!= total/i)
    })

    it('extracts a clean invoice without errors', async () => {
        const result = await processInvoice(process.env.SKILL_ID!, 'fixtures/clean.pdf')
        expect(result).toMatch(/"valid":\s*true/)
    })
})

This is fuzzy (you're asserting on natural-language output), so keep assertions loose — match on the signal the validation script emits, not Claude's exact phrasing. The same evaluation patterns from How to Test AI Features: Unit Testing LLM-Powered Code apply directly. Wire this into your pipeline so uploading a broken skill version fails the build instead of breaking production.

The mental model

A skill is your domain expertise, packaged once: instructions Claude reads on demand, scripts Claude runs without burning context, versioned independently of any feature that uses it. You stop copy-pasting prompts, your validation logic becomes deterministic code instead of a hopeful instruction, and changing the rules is a single upload. Up to 8 skills can be active per request, so you can compose them — an invoice-processing skill and an xlsx skill in the same call to extract a bill and drop it into a spreadsheet.

What's next

A skill gives Claude reusable expertise, but it still forgets everything between sessions. The natural next step is real cross-session memory — preferences, past tasks, and context that survives. Next up: Give Your AI Agent Persistent Memory with Anthropic Managed Agents (memory_stores, vaults, and sessions), and how the managed approach trades control for not having to run your own vector store. Pair it with this post and you've got an agent that's both specialized and persistent.

Share:
VA

Vadim Alakhverdov

Software developer writing about JavaScript, web development, and developer tools.

Related Posts