Give Your AI Agent Persistent Memory with Anthropic Managed Agents
Friday 19/06/2026 · 10 min read
Your assistant is great for exactly one conversation. The user tells it they prefer metric units, that their company's fiscal year starts in April, that the last report came back too verbose, and the moment the session ends, all of it evaporates. Next time they're back to square one, re-explaining context the agent should already know. The usual fix is to bolt on your own memory layer: embed everything, stuff it in a vector DB, write retrieval glue, and pray your top-K is relevant. That's a lot of infrastructure for "remember what the user told you."
Anthropic Managed Agents give you another option. With the Managed Agents beta (managed-agents-2026-04-01) in the TypeScript SDK, you get persistent memory across sessions through memory_stores, MCP credentials through vaults, and stateful conversations through sessions — all hosted by Anthropic. This is a hands-on tutorial: we'll build a personal assistant that remembers user preferences across sessions, and I'll be honest about what you give up versus rolling your own embeddings pipeline like the one in Build an Agentic RAG Pipeline That Retries and Reformulates Queries.
The mental model: agent once, session every run
The single thing people get wrong with Managed Agents is treating the agent like a per-request object. It isn't. An agent is a persisted, versioned config — model, system prompt, tools — that you create once and reference by ID forever. A session is one stateful run against that agent, inside a container Anthropic provisions as the agent's workspace.
Agent (created once, store the ID) ──► Session (created every run)
  model, system, tools                 references agent by ID
                                       + memory stores, + vault IDs
So the flow is always: create an agent and an environment at setup time, store the IDs, then spin up a session per interaction. If you find yourself calling agents.create() at the top of your request handler, stop — you're accumulating orphaned agents and paying creation latency for nothing.
pnpm add @anthropic-ai/sdk
// src/agent/setup.ts: run ONCE, persist the returned IDs
import Anthropic from '@anthropic-ai/sdk'

const client = new Anthropic() // reads ANTHROPIC_API_KEY from the environment

export async function setup() {
  // The container where the agent's tools execute
  const environment = await client.beta.environments.create({
    name: 'assistant-env',
    config: { type: 'cloud', networking: { type: 'unrestricted' } },
  })

  // The agent config: model/system/tools live HERE, never on the session
  const agent = await client.beta.agents.create({
    name: 'Personal Assistant',
    model: 'claude-opus-4-8',
    system:
      'You are a personal assistant. Before starting any task, check your ' +
      'memory mount for the user’s preferences and past context, and write ' +
      'new durable facts back to it as you learn them.',
    // The agent toolset gives Claude read/write/glob/grep, which is how it touches memory
    tools: [{ type: 'agent_toolset_20260401', default_config: { enabled: true } }],
  })

  return { environmentId: environment.id, agentId: agent.id }
}
The SDK sets the managed-agents-2026-04-01 beta header automatically on every client.beta.{agents,environments,sessions,vaults,memoryStores}.* call, so you never pass it by hand.
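One way to enforce the create-once rule is to put setup() behind a small load-or-create guard. This is a minimal sketch, not SDK code: IdStore, inMemoryIdStore, and ensureAgentIds are hypothetical names, and in production the store would be a database row or a config file instead of process memory.

```typescript
// Hypothetical shape of the IDs returned by the one-time setup step.
export type AgentIds = { agentId: string; environmentId: string }

// Hypothetical persistence interface. Not part of the Anthropic SDK.
export interface IdStore {
  load(): Promise<AgentIds | null>
  save(ids: AgentIds): Promise<void>
}

// Returns persisted IDs if they exist; otherwise runs setup once and saves the result.
export async function ensureAgentIds(
  store: IdStore,
  setup: () => Promise<AgentIds>,
): Promise<AgentIds> {
  const existing = await store.load()
  if (existing) return existing
  const created = await setup()
  await store.save(created)
  return created
}

// In-memory store for local experiments and tests only.
export function inMemoryIdStore(): IdStore {
  let saved: AgentIds | null = null
  return {
    load: async () => saved,
    save: async (ids: AgentIds) => {
      saved = ids
    },
  }
}
```

A request handler would call ensureAgentIds(store, setup) and hand the returned IDs to the session code. The sketch does not guard against two processes running setup at the same moment; a lock or a unique constraint on the persisted row would cover that.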
Creating a memory store
A memory store is a workspace-scoped collection of small text documents that survives across sessions. The description is passed to the model, so write it for Claude, not for a human reader:
// src/memory/store.ts
import Anthropic from '@anthropic-ai/sdk'

const client = new Anthropic()

export async function createUserStore(userId: string) {
  const store = await client.beta.memoryStores.create({
    name: `user-${userId}`,
    description: `Preferences and project context for user ${userId}. Check before any task.`,
  })

  // Optionally seed it with reference material before the first session runs.
  // Memories are addressed by path; prefer many small files over one big one.
  await client.beta.memoryStores.memories.create(store.id, {
    path: '/preferences/formatting.md',
    content: 'Reports use metric units. Fiscal year starts in April. Keep summaries under 200 words.',
  })

  return store.id // memstore_01Hx...
}
Each file is capped at 100KB, which is the API nudging you toward the right shape: lots of small, well-named files instead of one giant blob. That granularity is what lets the agent grep for the one thing it needs instead of reading everything.
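To keep seed data in that shape, you can split notes into one file per topic and check each against the cap before uploading. The helper below is a sketch under stated assumptions: the names (toMemoryFiles, slug, utf8Bytes) are hypothetical, and "100KB" is read as 100 * 1024 bytes, which you should confirm against the API reference.

```typescript
// ASSUMPTION: "100KB" is read as 100 * 1024 bytes. Confirm the exact limit in the API reference.
const ASSUMED_MAX_MEMORY_BYTES = 100 * 1024

export type MemoryFile = { path: string; content: string }

// Size in UTF-8 bytes, which is what a byte cap measures (not string length).
export function utf8Bytes(text: string): number {
  return new TextEncoder().encode(text).length
}

// Turns a label like "Fiscal Year" into a file-name-safe slug: "fiscal-year".
export function slug(label: string): string {
  return label.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '')
}

// Maps { label: note } pairs to one small file per label under /<topic>/.
export function toMemoryFiles(
  topic: string,
  notes: Record<string, string>,
  maxBytes: number = ASSUMED_MAX_MEMORY_BYTES,
): MemoryFile[] {
  return Object.entries(notes).map(([label, content]) => {
    const size = utf8Bytes(content)
    if (size > maxBytes) {
      throw new Error(`"${label}" is ${size} bytes, over the ${maxBytes}-byte cap; split it further`)
    }
    return { path: `/${topic}/${slug(label)}.md`, content }
  })
}
```

Each returned entry has the same path and content fields that the seeding call in createUserStore passes, so you could loop over the result and create one memory per file.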
Attaching memory to a session
Now the payoff. Attach the store to a session via the resources array. Two things to internalize here, because both will bite you otherwise:
- Memory stores attach at session-create time only. There's no "add memory to a running session": you can't resources.add() a memory store later. Decide up front.
- The store is mounted into the container as a filesystem directory at /mnt/memory/<store-name>/. The agent reads and writes it with ordinary file tools (read, write, grep, glob); there are no special "memory tools." A note describing the mount is auto-injected into the system prompt, so Claude knows it's there.
// src/session/run.ts
import Anthropic from '@anthropic-ai/sdk'

const client = new Anthropic()

export async function runTurn(opts: {
  agentId: string
  environmentId: string
  memoryStoreId: string
  userMessage: string
}) {
  const session = await client.beta.sessions.create({
    agent: opts.agentId, // string shorthand = latest agent version
    environment_id: opts.environmentId,
    resources: [
      {
        type: 'memory_store',
        memory_store_id: opts.memoryStoreId,
        access: 'read_write', // or 'read_only'; enforced at the mount level
        instructions: 'User preferences and project notes. Read before acting; record new durable facts.',
      },
    ],
  })

  // STREAM FIRST, then send. The stream only delivers events emitted after it
  // opens, so if you send first you miss the agent's earliest output.
  const stream = await client.beta.sessions.events.stream(session.id)
  await client.beta.sessions.events.send(session.id, {
    events: [{ type: 'user.message', content: [{ type: 'text', text: opts.userMessage }] }],
  })

  let answer = ''
  for await (const event of stream) {
    if (event.type === 'agent.message') {
      for (const block of event.content) {
        if (block.type === 'text') answer += block.text
      }
    }
    if (event.type === 'session.status_terminated') break
    // Idle fires transiently (e.g. between tool calls). Only break on a TERMINAL
    // stop reason: 'requires_action' means the agent is waiting on you.
    if (event.type === 'session.status_idle' && event.stop_reason.type !== 'requires_action') {
      break
    }
  }
  return answer
}
Run this twice across two separate sessions and the second one already knows the user wants metric units and 200-word summaries, because those facts live in the store (seeded host-side in createUserStore, or written to the mount by an earlier session) and the store persists. No embeddings, no retrieval call, no vector DB. The agent decided what to read and what to remember using plain file operations.
Gotcha — don't break on bare session.status_idle. The session goes idle transiently: between parallel tool calls, while waiting for a tool confirmation, or while awaiting a result you owe it. If you break on the first idle, you'll cut the agent off mid-task. Gate on the stop_reason — requires_action means keep going, it needs you; anything else (end_turn, retries_exhausted) is terminal.
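The stop condition is easier to test if you pull it out of the loop into a pure function. This sketch uses local event types that model only the fields the loop in this post reads; the real SDK event types carry more, and isTerminal is a hypothetical helper name.

```typescript
// Local, simplified event shapes: just the fields the streaming loop inspects.
export type StreamEvent =
  | { type: 'session.status_terminated' }
  | { type: 'session.status_idle'; stop_reason: { type: string } }
  | { type: 'agent.message' }

// True only when the loop should stop reading the stream.
export function isTerminal(event: StreamEvent): boolean {
  // A terminated session never resumes.
  if (event.type === 'session.status_terminated') return true
  // Idle is terminal unless the agent is waiting on the host.
  if (event.type === 'session.status_idle') {
    return event.stop_reason.type !== 'requires_action'
  }
  // Messages and any other events: keep streaming.
  return false
}
```

Inside the for await loop, the two break checks then collapse to a single `if (isTerminal(event)) break`, and the gating rule can be unit-tested without opening a session.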
Vaults: secrets, but only the MCP kind
Vaults are often summarized as a place that "holds secrets," and that's the one place I want to be precise, because it's easy to over-read. A vault stores MCP OAuth credentials that Anthropic auto-refreshes and injects when the agent calls an MCP server. It is not a general-purpose secrets manager, and the credentials never enter the container: they're added by an Anthropic-side proxy after the request leaves the sandbox, so even a prompt-injected agent can't read them.
If your assistant needs to hit, say, a calendar MCP server on the user's behalf:
// src/vault/credentials.ts
import Anthropic from '@anthropic-ai/sdk'

const client = new Anthropic()

export async function vaultForUser() {
  const vault = await client.beta.vaults.create({ name: 'user-mcp-creds' })

  await client.beta.vaults.credentials.create(vault.id, {
    display_name: 'Calendar MCP',
    auth: {
      type: 'mcp_oauth',
      mcp_server_url: 'https://mcp.example.com/mcp',
      access_token: process.env.CAL_ACCESS_TOKEN!,
      expires_at: '2026-07-01T00:00:00Z',
      refresh: {
        refresh_token: process.env.CAL_REFRESH_TOKEN!,
        client_id: process.env.CAL_CLIENT_ID!,
        token_endpoint: 'https://example.com/oauth/token',
        token_endpoint_auth: { type: 'none' }, // public OAuth client
      },
    },
  })

  return vault.id // vlt_...
}
Then declare the MCP server on the agent (no auth there) and pass vault_ids: [vaultId] on sessions.create(). The refresh block is what enables auto-refresh: Anthropic posts the refresh_token grant to your token_endpoint before the access token expires.
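If auto-refresh fails, it helps to know what a refresh grant looks like on the wire so you can probe your token endpoint yourself. The sketch below builds a standard OAuth 2.0 refresh_token grant (RFC 6749, section 6) for a public client. The exact request Anthropic sends isn't shown in this post, so treat the shape as an assumption based on the OAuth spec; the function names are hypothetical.

```typescript
// Form-encoded body of a standard refresh_token grant for a public client,
// which identifies itself with client_id and sends no client secret.
export function buildRefreshGrantBody(refreshToken: string, clientId: string): string {
  const params = new URLSearchParams()
  params.set('grant_type', 'refresh_token')
  params.set('refresh_token', refreshToken)
  params.set('client_id', clientId)
  return params.toString()
}

// Request options for a manual probe of your own token endpoint with fetch().
export function refreshGrantRequest(refreshToken: string, clientId: string) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: buildRefreshGrantBody(refreshToken, clientId),
  }
}
```

Passing the result of refreshGrantRequest as the second argument to fetch(tokenEndpoint, ...) should return a fresh access token if the endpoint accepts public-client refreshes. Many providers rotate the refresh token on use, so run the probe with a token you can afford to invalidate.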
The trap that costs people an afternoon: hosted MCP servers want OAuth bearer tokens, not the service's native API key. A Notion ntn_ integration token authenticates against Notion's REST API but will not work as a vault credential for the Notion MCP server — different auth systems entirely. And if you need a non-MCP secret inside the container (an aws CLI, a raw curl to some API), there's currently no way to set container env vars — keep that call host-side via a custom tool instead. Don't paste the key into the system prompt; it persists in the session's event history.
Memory hygiene and the audit trail
Every mutation to a memory — whether the agent writes it or you do host-side — produces an immutable memory version (memver_...). That's your audit log and your rollback lever. The one you'll actually reach for: redaction, for when a secret or piece of PII leaks into a memory and a user asks for deletion.
// src/memory/redact.ts: scrub content from history, keep the audit trail
import Anthropic from '@anthropic-ai/sdk'

const client = new Anthropic()

export async function redactVersion(storeId: string, versionId: string) {
  // Clears content/path but preserves who changed it and when
  await client.beta.memoryStores.memoryVersions.redact(versionId, {
    memory_store_id: storeId,
  })
}
A pattern worth adopting: attach two stores to a session — one read_only shared-reference store (company policies, formatting standards) and one read_write per-user store. You get up to 8 stores per session, so you can slice memory by owner and lifecycle instead of dumping everything into one bucket. The read_only flag is enforced at the filesystem level, so the agent physically cannot corrupt your shared reference data.
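The two-store pattern can be captured in a small builder that produces the resources array for sessions.create(). This is a sketch: the field names mirror the memory_store resource shown earlier in this post, the limit of 8 is the figure quoted above, and buildMemoryResources and the instruction strings are hypothetical.

```typescript
// Local type mirroring the memory_store resource used in this post's session code.
export type MemoryResource = {
  type: 'memory_store'
  memory_store_id: string
  access: 'read_only' | 'read_write'
  instructions: string
}

// Per-session limit as stated in this post.
const MAX_STORES_PER_SESSION = 8

// Shared stores mount read-only; the per-user store mounts read-write.
export function buildMemoryResources(
  sharedStoreIds: string[],
  userStoreId: string,
): MemoryResource[] {
  const shared: MemoryResource[] = sharedStoreIds.map((id) => ({
    type: 'memory_store',
    memory_store_id: id,
    access: 'read_only',
    instructions: 'Shared reference material. Read only; never record user facts here.',
  }))
  const user: MemoryResource = {
    type: 'memory_store',
    memory_store_id: userStoreId,
    access: 'read_write',
    instructions: 'User preferences and project notes. Read before acting; record new durable facts.',
  }
  const resources = [...shared, user]
  if (resources.length > MAX_STORES_PER_SESSION) {
    throw new Error(`${resources.length} stores exceeds the limit of ${MAX_STORES_PER_SESSION}`)
  }
  return resources
}
```

Failing fast on the host keeps an over-limit mistake out of the session-create call, where the error would be harder to trace back to the store list.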
Managed vs. DIY: what you actually trade
So when do you reach for this over the embeddings-in-a-vector-DB approach from the agentic RAG post or the basic 100-line RAG chatbot?
| | Managed memory stores | DIY embeddings + vector DB |
| --- | --- | --- |
| Infra | None — Anthropic hosts it | You run Pinecone/pgvector, ingestion, retrieval glue |
| Retrieval | Agent greps/reads the mount itself | You control top-K, reranking, hybrid search |
| Best for | Per-user preferences, task history, evolving context | Large corpora where retrieval quality is the product |
| Cost shape | Tokens the agent pulls into context | Embedding + vector-store ops + tokens |
The honest tradeoff is control for convenience. With memory stores you don't tune retrieval — the agent decides what to read, and whatever it reads lands in the context window and gets billed (the same token-cost reality I dug into in The Real Cost of Running an AI Feature in Production). If your use case is "remember this user's preferences and what we did last time," that's a great deal — you delete a whole subsystem. If your use case is "answer questions over 10M documents with high recall," you still want a real retrieval pipeline where you own the ranking. They're not competitors so much as different jobs.
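To put a rough number on "whatever it reads gets billed," a back-of-the-envelope estimator is enough. The sketch below assumes about 4 characters per token, a common rule of thumb for English prose and not a tokenizer count, and takes the input-token price as a parameter so no price is hard-coded. Both function names are hypothetical.

```typescript
// ASSUMPTION: roughly 4 characters per token for English prose. Real counts
// depend on the tokenizer and the content.
const ASSUMED_CHARS_PER_TOKEN = 4

// Rough token count for memory content the agent reads into context.
export function estimateTokens(
  charsRead: number,
  charsPerToken: number = ASSUMED_CHARS_PER_TOKEN,
): number {
  return Math.ceil(charsRead / charsPerToken)
}

// usdPerMillionInputTokens is an input; supply your model's current price.
export function estimateReadCostUsd(charsRead: number, usdPerMillionInputTokens: number): number {
  return (estimateTokens(charsRead) * usdPerMillionInputTokens) / 1e6
}
```

Multiply the result by the number of sessions per user per month to see whether trimming or splitting memory files is worth the effort.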
The mental model, again
Persistent memory used to mean you build the storage, the embeddings, and the retrieval. Managed Agents flip it: memory is a filesystem the agent reads and writes with normal tools, vaults inject MCP credentials the container never sees, and sessions keep it all stateful — none of which you operate. Create the agent once, attach a per-user store and a shared read-only store, and your assistant stops forgetting.
What's next
A persistent assistant is also a more expensive assistant — every memory file the agent reads is context tokens, and multi-step agentic loops amplify that fast. The natural follow-up is the economics of all this: The Real Cost of AI Agents: A Founder's Guide to Agent Economics and Margins — why agents break the unit economics that single-call features rely on, and the levers (model routing, prompt caching, step caps) that bring per-task cost back under control. Pair it with this post and you'll know not just how to give your agent memory, but what that memory costs you at scale.