Why Most AI Agent Architectures Are Overengineered (And What to Build Instead)

Monday 30/03/2026


You just spent two weeks building a multi-agent system with a planner agent, an executor agent, a critic agent, and a router that coordinates between them. It mostly works — except when the planner hallucinates steps, the executor ignores the plan, and the critic disagrees with itself. You're debugging agent-to-agent communication instead of solving the actual problem.

Here's the uncomfortable truth: 90% of "AI agent" use cases don't need agents at all. They need a single LLM call with well-defined tools. The multi-agent architecture you're building isn't just overkill — it's actively making your system harder to debug, more expensive to run, and less reliable.

I've built both kinds of systems. Let me show you when simple tool-calling beats a multi-agent framework, and how to refactor one into the other.

The complexity trap

The AI agent ecosystem has a complexity problem. Frameworks like LangGraph, CrewAI, and AutoGen encourage you to think in terms of agents with roles — a researcher agent, a writer agent, a reviewer agent. It feels intuitive because it maps to how human teams work.

But LLMs aren't humans. They don't benefit from specialization the way a frontend developer benefits from not having to write SQL. Every "agent" in your system is the same model with a different system prompt. When you split work across agents, you're paying for:

  • Extra LLM calls — each agent handoff is at minimum one new API call
  • Lost context — you have to serialize and pass context between agents, and something always gets lost
  • Compounding errors — each agent can hallucinate independently, and errors cascade
  • Debugging hell — when the output is wrong, which agent broke? Was it the plan, the execution, or the handoff?

Compare this to a single LLM call with tools. The model sees the full context, calls the tools it needs, and returns a result. One call to debug. One place where context lives. One set of token costs.

The simple tool-calling loop

Here's the pattern that replaces most multi-agent architectures. It's a single loop where the LLM decides which tools to call and when to stop:

// src/lib/tool-loop.ts
import Anthropic from "@anthropic-ai/sdk";

interface Tool {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
  execute: (input: Record<string, unknown>) => Promise<string>;
}

interface ToolLoopOptions {
  client: Anthropic;
  model: string;
  systemPrompt: string;
  tools: Tool[];
  maxIterations?: number;
}

export async function runToolLoop(
  userMessage: string,
  options: ToolLoopOptions
): Promise<string> {
  const { client, model, systemPrompt, tools, maxIterations = 10 } = options;

  // Strip the execute function before sending tool definitions to the API;
  // the cast is needed because our Tool interface types input_schema loosely
  const anthropicTools = tools.map(
    ({ execute, ...toolDef }) => toolDef
  ) as Anthropic.Tool[];
  const toolMap = new Map(tools.map((t) => [t.name, t.execute]));

  const messages: Anthropic.MessageParam[] = [
    { role: "user", content: userMessage },
  ];

  for (let i = 0; i < maxIterations; i++) {
    const response = await client.messages.create({
      model,
      max_tokens: 4096,
      system: systemPrompt,
      tools: anthropicTools,
      messages,
    });

    // If the model didn't use any tools, we're done
    if (response.stop_reason === "end_turn") {
      const textBlock = response.content.find((b) => b.type === "text");
      return textBlock?.text ?? "";
    }

    // Process tool calls
    const toolUseBlocks = response.content.filter(
      (b) => b.type === "tool_use"
    );

    if (toolUseBlocks.length === 0) {
      const textBlock = response.content.find((b) => b.type === "text");
      return textBlock?.text ?? "";
    }

    // Add the assistant's response (with tool_use blocks) to messages
    messages.push({ role: "assistant", content: response.content });

    // Execute each tool and collect results
    const toolResults: Anthropic.ToolResultBlockParam[] = [];

    for (const toolUse of toolUseBlocks) {
      if (toolUse.type !== "tool_use") continue;

      const executor = toolMap.get(toolUse.name);
      if (!executor) {
        toolResults.push({
          type: "tool_result",
          tool_use_id: toolUse.id,
          content: `Error: unknown tool "${toolUse.name}"`,
          is_error: true,
        });
        continue;
      }

      try {
        const result = await executor(
          toolUse.input as Record<string, unknown>
        );
        toolResults.push({
          type: "tool_result",
          tool_use_id: toolUse.id,
          content: result,
        });
      } catch (err) {
        toolResults.push({
          type: "tool_result",
          tool_use_id: toolUse.id,
          content: `Error: ${err instanceof Error ? err.message : String(err)}`,
          is_error: true,
        });
      }
    }

    messages.push({ role: "user", content: toolResults });
  }

  throw new Error(
    `Tool loop exceeded ${maxIterations} iterations without completing`
  );
}

That's it. No orchestrator, no planner, no critic. The LLM is the planner — it decides which tools to call based on the task. Let's put it to work.

Real example: a "multi-agent" support system in one loop

A common multi-agent example is customer support: a triage agent routes to a billing agent, a technical agent, or a returns agent. Each "agent" has specialized knowledge and tools.

Here's how to build the same thing without agents:

// src/support-bot.ts
import Anthropic from "@anthropic-ai/sdk";
import { runToolLoop } from "./lib/tool-loop";

const client = new Anthropic();

const tools = [
  {
    name: "lookup_order",
    description:
      "Look up a customer order by order ID or email. Returns order details including status, items, and shipping info.",
    input_schema: {
      type: "object" as const,
      properties: {
        order_id: { type: "string", description: "Order ID (e.g., ORD-12345)" },
        email: { type: "string", description: "Customer email address" },
      },
      required: [],
    },
    execute: async (input: Record<string, unknown>) => {
      // In production, query your database
      return JSON.stringify({
        order_id: input.order_id ?? "ORD-12345",
        status: "shipped",
        items: [{ name: "Wireless Keyboard", price: 79.99, qty: 1 }],
        tracking: "1Z999AA10123456784",
        shipped_at: "2026-03-28",
      });
    },
  },
  {
    name: "check_billing",
    description:
      "Check billing status, recent charges, and payment methods for a customer.",
    input_schema: {
      type: "object" as const,
      properties: {
        email: { type: "string", description: "Customer email" },
      },
      required: ["email"],
    },
    execute: async (input: Record<string, unknown>) => {
      return JSON.stringify({
        email: input.email,
        plan: "Pro",
        last_charge: { amount: 29.99, date: "2026-03-01", status: "paid" },
        payment_method: "Visa ending 4242",
        next_billing_date: "2026-04-01",
      });
    },
  },
  {
    name: "initiate_return",
    description:
      "Start a return process for an order. Returns a return label URL and instructions.",
    input_schema: {
      type: "object" as const,
      properties: {
        order_id: { type: "string", description: "Order ID to return" },
        reason: {
          type: "string",
          description: "Return reason",
          enum: ["defective", "wrong_item", "changed_mind", "other"],
        },
      },
      required: ["order_id", "reason"],
    },
    execute: async (input: Record<string, unknown>) => {
      return JSON.stringify({
        return_id: "RET-67890",
        label_url: "https://shipping.example.com/label/RET-67890",
        instructions:
          "Print the label and drop off at any UPS location within 14 days.",
        refund_estimate: "3-5 business days after we receive the item",
      });
    },
  },
  {
    name: "search_knowledge_base",
    description:
      "Search the help center knowledge base for articles matching a query.",
    input_schema: {
      type: "object" as const,
      properties: {
        query: { type: "string", description: "Search query" },
      },
      required: ["query"],
    },
    execute: async (input: Record<string, unknown>) => {
      return JSON.stringify({
        results: [
          {
            title: "How to reset your password",
            url: "/help/reset-password",
            snippet: "Go to Settings > Security > Reset Password...",
          },
          {
            title: "Shipping times and tracking",
            url: "/help/shipping",
            snippet: "Standard shipping takes 3-5 business days...",
          },
        ],
      });
    },
  },
  {
    name: "escalate_to_human",
    description:
      "Escalate the conversation to a human support agent when the issue can't be resolved automatically.",
    input_schema: {
      type: "object" as const,
      properties: {
        summary: {
          type: "string",
          description: "Brief summary of the issue for the human agent",
        },
        priority: {
          type: "string",
          enum: ["low", "medium", "high"],
          description: "Priority level",
        },
      },
      required: ["summary", "priority"],
    },
    execute: async (input: Record<string, unknown>) => {
      return JSON.stringify({
        ticket_id: "TKT-11111",
        estimated_wait: "Under 5 minutes",
        message: "A human agent will be with you shortly.",
      });
    },
  },
];

const systemPrompt = `You are a customer support assistant for an e-commerce store.
You handle billing questions, order lookups, returns, and technical issues.

Guidelines:
- Always look up relevant data before answering (don't guess order statuses or billing info)
- If you can resolve the issue with available tools, do so
- If the issue requires human judgment (e.g., exceptions to policy, angry customer), escalate
- Be friendly but concise — customers want answers, not essays
- Never make up information that isn't returned by a tool`;

async function main() {
  const answer = await runToolLoop(
    "Hi, I ordered a keyboard last week (order ORD-12345) but I want to return it — I changed my mind.",
    { client, model: "claude-sonnet-4-6-20250514", systemPrompt, tools }
  );

  console.log(answer);
}

main();

The model handles triage implicitly. It reads the message, decides it needs to look up the order, then initiates a return — calling exactly the tools it needs. No router agent, no handoff protocol, no inter-agent message format. And if the customer follows up with a billing question in the same conversation, the model just calls the billing tool next. No "transferring you to the billing department."
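One practical note: runToolLoop as written starts a fresh history on every call. To support that billing follow-up in the same conversation, you'd thread the message array across turns. Here's a minimal sketch — the Message type is a simplified stand-in for Anthropic.MessageParam, and chatTurn is a hypothetical helper, not part of the SDK:

```typescript
// Sketch: carry conversation state across turns, assuming runToolLoop is
// adapted to accept a message array instead of a single string.
// Message is a simplified stand-in for Anthropic.MessageParam.
type Message = { role: "user" | "assistant"; content: string };

async function chatTurn(
  history: Message[],
  userMessage: string,
  run: (messages: Message[]) => Promise<string>
): Promise<{ history: Message[]; answer: string }> {
  // Append the new user message to the running history...
  const messages: Message[] = [...history, { role: "user", content: userMessage }];
  // ...run the tool loop over the full history...
  const answer = await run(messages);
  // ...and persist the assistant's reply so the next turn has context.
  return {
    history: [...messages, { role: "assistant", content: answer }],
    answer,
  };
}
```

Each turn sees everything that came before it, so the order lookup from turn one is still in context when the billing question arrives in turn two.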

When you actually need multiple agents

I'm not saying multi-agent architectures are always wrong. There are legitimate cases:

Different models for different tasks. If your triage step needs a cheap, fast model (Haiku) but your technical analysis needs a powerful one (Opus), separate agents make sense — you're optimizing cost per step.

Parallel execution. If you need to search three different data sources simultaneously and synthesize the results, launching parallel tool calls or parallel agents can reduce latency. Though note that Claude already supports parallel tool use within a single call.

Genuinely long-running workflows. If your task spans hours or days — monitoring a deployment, waiting for human approval at multiple stages — a single LLM context window won't work. You need persistent state and resumable agents. I covered this pattern in my post about building a human-in-the-loop agent with Vercel AI SDK.

Adversarial validation. If you need one model to check another's work (like a code review agent checking a code generation agent), the separation is the point. Using the same model to both generate and review creates blind spots.
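On the parallel-execution case: often you don't need parallel agents at all — the sequential for loop inside runToolLoop can execute a batch of tool calls concurrently instead. A sketch, assuming the executors are independent and order of side effects doesn't matter:

```typescript
// Sketch: execute a batch of independent tool calls concurrently.
// ToolCall/ToolResult are simplified stand-ins for the SDK's block types.
type ToolCall = { id: string; name: string; input: Record<string, unknown> };
type ToolResult = { tool_use_id: string; content: string; is_error?: boolean };

async function executeInParallel(
  calls: ToolCall[],
  toolMap: Map<string, (input: Record<string, unknown>) => Promise<string>>
): Promise<ToolResult[]> {
  // Promise.all preserves input order, so results line up with their calls
  return Promise.all(
    calls.map(async (call): Promise<ToolResult> => {
      const executor = toolMap.get(call.name);
      if (!executor) {
        return {
          tool_use_id: call.id,
          content: `Error: unknown tool "${call.name}"`,
          is_error: true,
        };
      }
      try {
        return { tool_use_id: call.id, content: await executor(call.input) };
      } catch (err) {
        return {
          tool_use_id: call.id,
          content: `Error: ${err instanceof Error ? err.message : String(err)}`,
          is_error: true,
        };
      }
    })
  );
}
```

If three searches each take two seconds, this turns six seconds of sequential waiting into roughly two — without adding a single agent.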

Here's a quick decision framework:

// src/lib/should-use-agents.ts
interface UseCaseProfile {
  needsDifferentModels: boolean;
  hasParallelIndependentTasks: boolean;
  spansMultipleSessions: boolean;
  requiresAdversarialValidation: boolean;
  toolCount: number;
}

function shouldUseMultipleAgents(profile: UseCaseProfile): {
  recommendation: "single-loop" | "multi-agent";
  reason: string;
} {
  if (profile.requiresAdversarialValidation) {
    return {
      recommendation: "multi-agent",
      reason: "Adversarial validation needs separate model contexts",
    };
  }

  if (profile.spansMultipleSessions) {
    return {
      recommendation: "multi-agent",
      reason: "Long-running workflows need persistent state per agent",
    };
  }

  if (profile.needsDifferentModels && profile.hasParallelIndependentTasks) {
    return {
      recommendation: "multi-agent",
      reason: "Different models + parallel execution justifies the complexity",
    };
  }

  if (profile.toolCount > 20) {
    return {
      recommendation: "multi-agent",
      reason:
        "Too many tools degrade model performance — split into focused agents",
    };
  }

  return {
    recommendation: "single-loop",
    reason:
      "A single tool-calling loop handles this with less complexity and better context",
  };
}

The threshold of ~20 tools is based on practical experience. Models start to get confused or ignore tools when the tool list gets too long. If you have 8 tools, you don't need a multi-agent framework. If you have 40, consider splitting them across focused agents.

Refactoring a multi-agent system: before and after

Here's what a typical overengineered setup looks like and what it becomes after simplification.

Before — three agents with orchestration:

// src/overengineered.ts (DON'T do this for simple use cases)
class ResearchAgent {
  async run(query: string): Promise<string> {
    // LLM call #1: "You are a research agent..."
    // Searches the web, returns findings
    return "...findings...";
  }
}

class WriterAgent {
  async run(research: string, topic: string): Promise<string> {
    // LLM call #2: "You are a writing agent..."
    // Takes research, writes a draft
    return "...draft...";
  }
}

class EditorAgent {
  async run(draft: string): Promise<string> {
    // LLM call #3: "You are an editing agent..."
    // Reviews and polishes the draft
    return "...final version...";
  }
}

class Orchestrator {
  async run(topic: string): Promise<string> {
    const research = await new ResearchAgent().run(topic);
    const draft = await new WriterAgent().run(research, topic);
    const final = await new EditorAgent().run(draft);
    return final;
    // 3 LLM calls, context lost between each
  }
}

After — one loop with tools:

// src/simplified.ts
const contentTools = [
  {
    name: "web_search",
    description: "Search the web for information on a topic",
    input_schema: {
      type: "object" as const,
      properties: {
        query: { type: "string", description: "Search query" },
      },
      required: ["query"],
    },
    execute: async (input: Record<string, unknown>) => {
      // Call your search API (Serper, Tavily, etc.)
      return JSON.stringify({ results: ["result1", "result2"] });
    },
  },
  {
    name: "save_draft",
    description: "Save a draft article to review later",
    input_schema: {
      type: "object" as const,
      properties: {
        title: { type: "string" },
        content: { type: "string" },
      },
      required: ["title", "content"],
    },
    execute: async (input: Record<string, unknown>) => {
      // Save to your CMS or file system
      return `Draft saved: ${input.title}`;
    },
  },
];

const answer = await runToolLoop(
  "Research the latest developments in WebGPU and write a short article about it.",
  {
    client,
    model: "claude-sonnet-4-6-20250514",
    systemPrompt: `You are a technical content writer. When given a topic:
1. Search for current information using the web_search tool
2. Write a well-researched article based on what you find
3. Save the draft using the save_draft tool
4. Return a summary of what you wrote`,
    tools: contentTools,
  }
);

Same result, one-third the LLM calls, full context preserved throughout. The model researches, writes, and self-edits in a single context window. If the research reveals something that changes the angle, the model adapts immediately — no need to re-plan or re-route.

The cost argument

Let's do some quick math. Say you're processing 1,000 customer support queries per day with Claude Sonnet.

Multi-agent approach (triage + specialist + summary = 3 calls per query):

  • Average input: 2,000 tokens per call (prompt + context passing)
  • Average output: 500 tokens per call
  • 3 calls × 1,000 queries = 3,000 API calls/day
  • Tokens: 6M input + 1.5M output per day

Single-loop approach (1-2 calls per query, tool results inline):

  • Average input: 2,500 tokens (slightly more context in one call)
  • Average output: 500 tokens
  • 1.5 calls × 1,000 queries = 1,500 API calls/day
  • Tokens: 3.75M input + 750K output per day

The single-loop approach uses roughly 40-50% fewer tokens because you're not re-sending system prompts and context for each agent. You're also not serializing and deserializing agent state, which always bloats the token count.
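As a sanity check on that arithmetic — the per-million-token prices below are placeholders, not quoted rates, so plug in your model's actual pricing:

```typescript
// Rough daily-cost comparison for the scenario above.
// Prices are assumptions — substitute your model's actual rates.
const PRICE_PER_M_INPUT = 3.0;   // USD per 1M input tokens (assumed)
const PRICE_PER_M_OUTPUT = 15.0; // USD per 1M output tokens (assumed)

function dailyCost(inputTokensM: number, outputTokensM: number): number {
  return inputTokensM * PRICE_PER_M_INPUT + outputTokensM * PRICE_PER_M_OUTPUT;
}

// Multi-agent: 3 calls/query × 1,000 queries = 6M input + 1.5M output per day
const multiAgent = dailyCost(6, 1.5);
// Single-loop: ~1.5 calls/query = 3.75M input + 0.75M output per day
const singleLoop = dailyCost(3.75, 0.75);

console.log({ multiAgent, singleLoop, savings: 1 - singleLoop / multiAgent });
```

Under these assumed prices that's $40.50/day versus $22.50/day — a ~44% cost reduction, in line with the token savings above.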

For production token cost strategies, check out my post on the real cost of running an AI feature and caching AI responses.

A gotcha: tool descriptions matter more than you think

When you move from multi-agent to single-loop, the quality of your tool descriptions becomes critical. In a multi-agent system, the router prompt tells the model which "agent" to call. In a single-loop system, the model picks tools based on their descriptions.

Bad tool description:

"Handle billing"

Good tool description:

"Check billing status, recent charges, refund eligibility, and payment methods for a customer. Use this when the customer asks about charges, invoices, payments, or subscription billing."

The model needs enough context in the description to know when to use the tool, not just what it does. Spend time on these — they're your new routing logic.
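In tool-definition form, the good version looks like this — note how the description carries both the "what" and the "when":

```typescript
// The "good" description from above, as a tool definition. The description
// doubles as routing logic: it tells the model what the tool does AND when
// to reach for it.
const checkBillingTool = {
  name: "check_billing",
  description:
    "Check billing status, recent charges, refund eligibility, and payment " +
    "methods for a customer. Use this when the customer asks about charges, " +
    "invoices, payments, or subscription billing.",
  input_schema: {
    type: "object" as const,
    properties: {
      email: { type: "string", description: "Customer email" },
    },
    required: ["email"],
  },
};
```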

What's next

If you want to see a case where multiple agents do make sense, my next post covers Claude Agent SDK vs OpenAI Agents SDK — building the same autonomous tool with both frameworks. That's a case where the SDK abstractions actually earn their complexity.

For the simple tool-calling loop pattern, grab the runToolLoop function from this post and start replacing your agent graphs. You might be surprised how far a single loop takes you.


Vadim Alakhverdov

Software developer writing about JavaScript, web development, and developer tools.
