How to Migrate from OpenAI Assistants API to Responses API in TypeScript
Monday 04/05/2026 · 12 min read
OpenAI is sunsetting the Assistants API in mid-2026, and your client.beta.threads.runs.createAndPoll() calls have a stopwatch on them. If you skim the migration docs, the Responses API looks like a different shape entirely — no threads, no runs, no assistant_id. The mapping isn't obvious, and the parts that are one-to-one have just enough subtle differences to break things if you copy-paste blindly.
I migrated a production integration last month. This post is the field guide I wish I'd had: a concept-by-concept map from Assistants to Responses, full TypeScript before/after code diffs for the patterns that actually matter (memory, file search, tools, streaming), and the migration gotchas that aren't in the official guide.
Why the migration matters
Beyond the deprecation date, the Responses API is genuinely better:
- Stateless by default — no polling loop, no runs.retrieve() until status is completed.
- One round-trip per turn — client.responses.create() returns the full response. Streaming uses standard SSE.
- Cheaper — no per-thread storage costs, no separate assistant resource to maintain.
- Built-in conversation memory — pass previous_response_id and OpenAI manages the history server-side.
The tradeoff: every Assistants concept (thread, run, assistant, file batch) maps to a different shape, and a few features (mainly the polling-based file batch upload flow) require rebuilding from primitives.
The conceptual map
| Assistants API | Responses API |
| --- | --- |
| client.beta.assistants.create() | No equivalent — pass instructions and tools per request, or store a config in your code |
| client.beta.threads.create() | previous_response_id parameter (server-managed) or your own message array |
| threads.messages.create() + runs.createAndPoll() | client.responses.create() |
| runs.steps.list() | response.output[] array (tool calls + text in order) |
| file_search tool | file_search tool — same name, different config shape |
| code_interpreter tool | code_interpreter tool — wrap with container config |
| Vector stores (vectorStores.create()) | Vector stores (vector_stores.create()) — moved out of beta, mostly drop-in |
| runs.submitToolOutputs() | Pass tool output as a function call result in the next responses.create() call |
The mental shift: Assistants is server-orchestrated and stateful (you create resources, then poll). Responses is client-orchestrated and idempotent (you send a request, you get a response, you optionally pass a pointer to the previous response). Most of the migration is unwinding the resource lifecycle.
Setup
pnpm install openai@^5.0.0
You need the official SDK v5 or later; the responses namespace isn't stable in earlier versions. All examples below assume OPENAI_API_KEY is set in the environment.
// src/lib/openai.ts
import OpenAI from 'openai'
export const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
})
Pattern 1: Basic conversation with memory
This is the 80% case. An Assistants integration that maintains conversation history across turns.
Before — Assistants API
// src/lib/assistants-chat.ts (BEFORE)
import { openai } from './openai'
export async function createChatAssistant() {
const assistant = await openai.beta.assistants.create({
name: 'Support Bot',
instructions: 'You are a helpful support agent.',
model: 'gpt-4o',
})
return assistant.id
}
export async function startThread(): Promise<string> {
const thread = await openai.beta.threads.create()
return thread.id
}
export async function sendMessage(
assistantId: string,
threadId: string,
userMessage: string
): Promise<string> {
await openai.beta.threads.messages.create(threadId, {
role: 'user',
content: userMessage,
})
const run = await openai.beta.threads.runs.createAndPoll(threadId, {
assistant_id: assistantId,
})
if (run.status !== 'completed') {
throw new Error(`Run failed with status: ${run.status}`)
}
const messages = await openai.beta.threads.messages.list(threadId, {
order: 'desc',
limit: 1,
})
const last = messages.data[0]
const textContent = last.content.find((c) => c.type === 'text')
if (!textContent || textContent.type !== 'text') {
throw new Error('No text content in response')
}
return textContent.text.value
}
Three resources to track (assistant, thread, run), one polling loop hidden inside createAndPoll, and a content-array unwrap that's easy to get wrong.
After — Responses API
// src/lib/responses-chat.ts (AFTER)
import { openai } from './openai'
const SYSTEM_INSTRUCTIONS = 'You are a helpful support agent.'
export async function sendMessage(
userMessage: string,
previousResponseId?: string
): Promise<{ text: string; responseId: string }> {
const response = await openai.responses.create({
model: 'gpt-4o',
instructions: SYSTEM_INSTRUCTIONS,
input: userMessage,
previous_response_id: previousResponseId,
})
return {
text: response.output_text,
responseId: response.id,
}
}
That's the whole thing. Conversation memory is previous_response_id. The first call passes undefined; every subsequent call passes the responseId returned from the previous turn. OpenAI stores the message history for 30 days by default, so you only need to persist that one ID per conversation.
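A quick usage sketch of the helper above (the messages are just examples):
// turn 1: no previous response yet
const first = await sendMessage('My order never arrived.')
// turn 2: pass the ID from turn 1 so the model sees the history
const second = await sendMessage('It was order #1042.', first.responseId)
console.log(second.text)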
If you're wiring this into a Next.js API route:
// src/pages/api/chat.ts
import type { NextApiRequest, NextApiResponse } from 'next'
import { sendMessage } from '@/src/lib/responses-chat'
export default async function handler(req: NextApiRequest, res: NextApiResponse) {
if (req.method !== 'POST') {
return res.status(405).end()
}
const { message, previousResponseId } = req.body as {
message: string
previousResponseId?: string
}
try {
const result = await sendMessage(message, previousResponseId)
res.status(200).json(result)
} catch (err) {
const msg = err instanceof Error ? err.message : 'unknown error'
res.status(500).json({ error: msg })
}
}
The client persists responseId in localStorage or a session cookie and sends it back on the next turn. No threads, no runs, no polling.
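Here's a minimal browser-side sketch of that wiring, assuming the /api/chat route above; the localStorage key name is illustrative:
// src/lib/chat-client.ts (browser)
// Persist the last responseId between turns and send it with every request.
export async function chatTurn(message: string): Promise<string> {
  const previousResponseId = localStorage.getItem('chat:lastResponseId') ?? undefined
  const res = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message, previousResponseId }),
  })
  if (!res.ok) throw new Error(`chat request failed: ${res.status}`)
  const data = (await res.json()) as { text: string; responseId: string }
  localStorage.setItem('chat:lastResponseId', data.responseId)
  return data.text
}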
Pattern 2: File search (RAG)
The Assistants API made file search look effortless — attach a vector store, ask questions, get answers with citations. The Responses API has the same feature but the tool config shape changed.
Before — Assistants with file_search
// src/lib/assistants-rag.ts (BEFORE)
import fs from 'fs'
import { openai } from './openai'
export async function createRagAssistant(filePaths: string[]): Promise<{
assistantId: string
vectorStoreId: string
}> {
const vectorStore = await openai.beta.vectorStores.create({
name: 'Docs',
})
const fileStreams = filePaths.map((p) => fs.createReadStream(p))
await openai.beta.vectorStores.fileBatches.uploadAndPoll(vectorStore.id, {
files: fileStreams,
})
const assistant = await openai.beta.assistants.create({
name: 'Docs Bot',
instructions: 'Answer questions using the attached docs. Cite sources.',
model: 'gpt-4o',
tools: [{ type: 'file_search' }],
tool_resources: {
file_search: { vector_store_ids: [vectorStore.id] },
},
})
return { assistantId: assistant.id, vectorStoreId: vectorStore.id }
}
After — Responses with file_search
// src/lib/responses-rag.ts (AFTER)
import fs from 'fs'
import { openai } from './openai'
export async function createVectorStore(filePaths: string[]): Promise<string> {
const vectorStore = await openai.vectorStores.create({ name: 'Docs' })
const fileStreams = filePaths.map((p) => fs.createReadStream(p))
await openai.vectorStores.fileBatches.uploadAndPoll(vectorStore.id, {
files: fileStreams,
})
return vectorStore.id
}
export async function askWithRag(
question: string,
vectorStoreId: string,
previousResponseId?: string
): Promise<{ text: string; responseId: string; citations: string[] }> {
const response = await openai.responses.create({
model: 'gpt-4o',
instructions: 'Answer questions using the attached docs. Cite sources.',
input: question,
previous_response_id: previousResponseId,
tools: [
{
type: 'file_search',
vector_store_ids: [vectorStoreId],
},
],
include: ['file_search_call.results'],
})
const citations: string[] = []
for (const item of response.output) {
if (item.type === 'message') {
for (const content of item.content) {
if (content.type === 'output_text' && content.annotations) {
for (const ann of content.annotations) {
if (ann.type === 'file_citation') {
citations.push(ann.filename)
}
}
}
}
}
}
return {
text: response.output_text,
responseId: response.id,
citations: [...new Set(citations)],
}
}
Two real differences worth noting:
- vectorStores moved out of beta — the SDK exposes both namespaces for now, but the beta one will be removed along with the Assistants API.
- tool_resources is gone — the vector store IDs go directly on the file_search tool config.
The include: ['file_search_call.results'] flag is optional but useful — it returns the actual chunks the model retrieved, which you'll want for debugging and citation UX. If you're building citation rendering, my post on streaming AI UX with citations covers the frontend side.
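Wiring the two helpers together looks roughly like this (the file paths and question are placeholders):
// index the docs once, then ask questions against the store
const vectorStoreId = await createVectorStore(['./docs/returns-policy.md', './docs/shipping.md'])
const answer = await askWithRag('What is the return window for opened items?', vectorStoreId)
console.log(answer.text)
console.log('cited files:', answer.citations)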
Pattern 3: Custom function calling
Your Assistants integration probably has at least one custom function. The polling-with-tool-outputs dance was the most awkward part of the old API.
Before — Assistants tool loop
// src/lib/assistants-tools.ts (BEFORE)
import { openai } from './openai'
const tools = [
{
type: 'function' as const,
function: {
name: 'lookup_order',
description: 'Look up an order by ID',
parameters: {
type: 'object',
properties: { orderId: { type: 'string' } },
required: ['orderId'],
},
},
},
]
async function lookupOrder(orderId: string) {
return { id: orderId, status: 'shipped', total: 49.99 }
}
export async function chatWithTools(
assistantId: string,
threadId: string,
userMessage: string
): Promise<string> {
await openai.beta.threads.messages.create(threadId, {
role: 'user',
content: userMessage,
})
let run = await openai.beta.threads.runs.createAndPoll(threadId, {
assistant_id: assistantId,
})
while (run.status === 'requires_action') {
const calls = run.required_action?.submit_tool_outputs.tool_calls ?? []
const outputs = await Promise.all(
calls.map(async (call) => {
const args = JSON.parse(call.function.arguments)
const result = await lookupOrder(args.orderId)
return {
tool_call_id: call.id,
output: JSON.stringify(result),
}
})
)
run = await openai.beta.threads.runs.submitToolOutputsAndPoll(threadId, run.id, {
tool_outputs: outputs,
})
}
if (run.status !== 'completed') {
throw new Error(`Run ended with status: ${run.status}`)
}
const messages = await openai.beta.threads.messages.list(threadId, { order: 'desc', limit: 1 })
const text = messages.data[0].content.find((c) => c.type === 'text')
if (!text || text.type !== 'text') throw new Error('no text')
return text.text.value
}
After — Responses tool loop
// src/lib/responses-tools.ts (AFTER)
import { openai } from './openai'
const tools = [
{
type: 'function' as const,
name: 'lookup_order',
description: 'Look up an order by ID',
parameters: {
type: 'object',
properties: { orderId: { type: 'string' } },
required: ['orderId'],
},
},
]
async function lookupOrder(orderId: string) {
return { id: orderId, status: 'shipped', total: 49.99 }
}
export async function chatWithTools(
userMessage: string,
previousResponseId?: string
): Promise<{ text: string; responseId: string }> {
let response = await openai.responses.create({
model: 'gpt-4o',
input: userMessage,
previous_response_id: previousResponseId,
tools,
})
while (response.output.some((item) => item.type === 'function_call')) {
const toolOutputs = []
for (const item of response.output) {
if (item.type !== 'function_call') continue
const args = JSON.parse(item.arguments)
let result: unknown
if (item.name === 'lookup_order') {
result = await lookupOrder(args.orderId)
} else {
result = { error: `unknown tool: ${item.name}` }
}
toolOutputs.push({
type: 'function_call_output' as const,
call_id: item.call_id,
output: JSON.stringify(result),
})
}
response = await openai.responses.create({
model: 'gpt-4o',
previous_response_id: response.id,
input: toolOutputs,
tools,
})
}
return { text: response.output_text, responseId: response.id }
}
Three things changed:
- Tool definition is flatter — type: 'function' and the function fields are siblings, not nested under a function key.
- No submitToolOutputs — pass tool results as function_call_output items in the input of the next responses.create() call, with previous_response_id pointing to the response that contained the function calls.
- The loop structure is identical, just expressed differently. You're still looping until there are no pending tool calls.
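A quick usage sketch of the helper above (the question is just an example):
// the tool loop runs internally until the model stops requesting calls
const reply = await chatWithTools('What is the status of order 1042?')
console.log(reply.text)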
Pattern 4: Streaming
If you used runs.stream() with the Assistants API, the equivalent is responses.create() with stream: true (the SDK also ships a responses.stream() helper).
Before — Assistants streaming
// src/lib/assistants-stream.ts (BEFORE)
import { openai } from './openai'
export async function* streamReply(
assistantId: string,
threadId: string,
userMessage: string
): AsyncGenerator<string> {
await openai.beta.threads.messages.create(threadId, {
role: 'user',
content: userMessage,
})
const stream = openai.beta.threads.runs.stream(threadId, { assistant_id: assistantId })
for await (const event of stream) {
if (event.event === 'thread.message.delta') {
for (const block of event.data.delta.content ?? []) {
if (block.type === 'text' && block.text?.value) {
yield block.text.value
}
}
}
}
}
After — Responses streaming
// src/lib/responses-stream.ts (AFTER)
import { openai } from './openai'
export async function* streamReply(
userMessage: string,
previousResponseId?: string
): AsyncGenerator<string> {
const stream = await openai.responses.create({
model: 'gpt-4o',
input: userMessage,
previous_response_id: previousResponseId,
stream: true,
})
for await (const event of stream) {
if (event.type === 'response.output_text.delta') {
yield event.delta
}
}
}
The event types are flatter (response.output_text.delta instead of thread.message.delta with nested content blocks). If you also need to stream tool call deltas, listen for response.function_call_arguments.delta.
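Consuming the generator works the same as before; a rough sketch that writes deltas to stdout:
// print tokens as they arrive
for await (const chunk of streamReply('Summarize the last support ticket')) {
  process.stdout.write(chunk)
}
process.stdout.write('\n')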
Migration checklist
Walk through your codebase in this order:
- [ ] Audit usage: grep for openai.beta.assistants, openai.beta.threads, openai.beta.vectorStores, runs.createAndPoll, and submitToolOutputs. Each match is a migration site.
- [ ] Decide on memory strategy: server-managed (previous_response_id) or client-managed (you build the message array yourself; a sketch follows after this checklist). Server-managed is simpler; client-managed gives you full control over context window pruning.
- [ ] Replace assistants.create() with a config object in your code (instructions string, model name, tools array). You don't need a server-side resource anymore.
- [ ] Replace threads with previous_response_id. Persist that ID wherever you persisted thread IDs.
- [ ] Update tool definitions to the flatter shape (no nested function key for custom functions).
- [ ] Rewrite tool loops to pass function_call_output items in input instead of calling submitToolOutputs.
- [ ] Move vectorStores calls out of beta and inline vector_store_ids into the file_search tool config.
- [ ] Update streaming event handlers to the new event types.
- [ ] Test conversation continuity: the first turn after a previous_response_id lookup is where bugs hide.
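If you go the client-managed route, the shape is roughly this: you own the array, the pruning, and the persistence. A minimal sketch (the helper and type names are illustrative):
// src/lib/client-managed-chat.ts (sketch: client-managed history)
import { openai } from './openai'
type HistoryItem = { role: 'user' | 'assistant'; content: string }
export async function sendWithHistory(
  history: HistoryItem[],
  userMessage: string
): Promise<{ text: string; history: HistoryItem[] }> {
  const input = [...history, { role: 'user' as const, content: userMessage }]
  const response = await openai.responses.create({
    model: 'gpt-4o',
    instructions: 'You are a helpful support agent.',
    input,
    store: false, // nothing kept server-side; retention and pruning are yours
  })
  return {
    text: response.output_text,
    history: [...input, { role: 'assistant', content: response.output_text }],
  }
}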
Gotchas I hit
A few things that aren't in the official migration doc:
previous_response_id expires after 30 days. If your app has long-lived conversations (a saved support session a user comes back to in a month), you'll get a 404 on the next turn. For anything that needs to outlive that window, fall back to client-managed history and send the full message array yourself.
Tool-call IDs differ. Assistants uses tool_call_id, Responses uses call_id. Easy to miss in a search-and-replace.
output_text is a convenience, not the source of truth. It concatenates all text outputs in the response. If you have multi-turn tool calling and want to display intermediate text (rare but real), iterate response.output directly.
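A rough sketch of walking the output array directly, given a response returned by responses.create():
// collect every text segment in order, including text emitted between tool calls
const segments: string[] = []
for (const item of response.output) {
  if (item.type === 'message') {
    for (const content of item.content) {
      if (content.type === 'output_text') {
        segments.push(content.text)
      }
    }
  }
}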
Error semantics changed. Assistants would set run.status = 'failed' and you'd inspect run.last_error. Responses throws an OpenAI.APIError you catch normally. Update any code that branched on run.status.
The instructions field is per-request, not persistent. With Assistants, you set instructions once at assistant-creation time. With Responses, you pass them on every call. Treat your instructions as a constant in code, not a resource.
Don't double-pay during migration. If you keep both code paths live behind a feature flag, you're paying twice for tokens during testing. Run the new path with synthetic traffic in a staging environment first.
What's next
Once your conversations work, the natural next layer is observability — knowing which prompts regress, which tools fail, and where latency lives. I wrote up adding LLM observability with Langfuse, which traces every Responses API call with token usage and tool invocations. Set that up before your migration goes to production — you'll catch behavioral diffs between the Assistants and Responses paths in hours instead of weeks.