
Agentic AI Architecture: Memory, Tools, Control Loops & Multi-Agent Orchestration

TopicTrick Team


What Makes a System "Agentic"?

The word "agentic" describes a spectrum of autonomy:

| Level | Description | Example |
| --- | --- | --- |
| 0 — Static | Single prompt → single response | ChatGPT one-shot answer |
| 1 — Tool Use | LLM calls tools, gets results, responds | LLM calls a weather API |
| 2 — RAG | LLM retrieves context before responding | LLM searches documents |
| 3 — Multi-Step | LLM plans and executes multiple steps | LLM researches, drafts, reviews |
| 4 — Autonomous | LLM decides its own tool sequence, recovers from errors | AI coding agent that writes, tests, debugs |
| 5 — Multi-Agent | Multiple specialised LLMs collaborate with handoffs | Researcher + Coder + Reviewer agents |

Level 3+ requires deliberate architectural design — naive implementations at this level hallucinate tool calls, loop infinitely, and produce unauditable results.


The ReAct Control Loop: Think → Act → Observe

ReAct (Reasoning + Acting) is the foundational pattern for agentic control:

python
from anthropic import Anthropic

client = Anthropic()

def run_agent(task: str, tools: list[dict], max_iterations: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    iteration = 0
    
    while iteration < max_iterations:
        iteration += 1
        
        # THINK: LLM reasons about what to do next
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=4096,
            tools=tools,
            messages=messages,
        )
        
        # Check stop condition:
        if response.stop_reason == "end_turn":
            # Extract final text response — agent is done
            # (default to "" in case the final turn contains no text block)
            return next((b.text for b in response.content if hasattr(b, "text")), "")
        
        # ACT: Extract tool calls from response
        tool_uses = [b for b in response.content if b.type == "tool_use"]
        
        if not tool_uses:
            break  # No tools called, no end — unexpected state
        
        # Append assistant's reasoning + tool calls to conversation:
        messages.append({"role": "assistant", "content": response.content})
        
        # OBSERVE: Execute each tool and collect results
        tool_results = []
        for tool_use in tool_uses:
            result = execute_tool(tool_use.name, tool_use.input)  # Your tool router
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": tool_use.id,
                "content": str(result),
            })
        
        # Feed observations back to LLM:
        messages.append({"role": "user", "content": tool_results})
    
    # Max iterations reached — return partial result with warning
    return f"[Agent reached max_iterations={max_iterations}] Last state: {messages[-1]}"

The max_iterations guard is non-negotiable — without it, infinite loops burn tokens and money.


The Four Memory Layers

Effective agents require different memory systems for different timescales:

Practical implementation:

python
class AgentMemory:
    def __init__(self, vector_store, session_store):
        self.vector_store = vector_store   # Long-term semantic memory (Pinecone/Chroma)
        self.session_store = session_store  # Episodic memory (Redis/database)
        self.working_memory: list = []     # In-context window (messages list)
    
    def remember(self, content: str, metadata: dict):
        """Store to long-term semantic memory."""
        embedding = embed(content)  # embed() = your embedding model of choice
        self.vector_store.upsert(embedding, content, metadata)
    
    def recall(self, query: str, k: int = 5) -> list[str]:
        """Retrieve relevant long-term memories."""
        embedding = embed(query)
        results = self.vector_store.query(embedding, k=k)
        return [r.content for r in results]
    
    def inject_relevant_context(self, task: str) -> str:
        """Build context-enriched system prompt from long-term memory."""
        memories = self.recall(task)
        if memories:
            return f"Relevant past context:\n" + "\n".join(f"- {m}" for m in memories)
        return ""
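To make the layering concrete, here is a self-contained toy sketch of the long-term semantic layer. `toy_embed`, `cosine`, and `ToyVectorStore` are illustrative stand-ins for a real embedding model and vector database (Pinecone/Chroma), not production components:

```python
# Toy sketch of long-term semantic memory: bag-of-words "embeddings"
# plus cosine similarity stand in for a real embedder + vector DB.
import math
from dataclasses import dataclass, field

def toy_embed(text: str) -> dict[str, float]:
    """Bag-of-words 'embedding' -- stand-in for a real embedding model."""
    vec: dict[str, float] = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0.0) + 1.0
    return vec

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

@dataclass
class ToyVectorStore:
    items: list = field(default_factory=list)

    def upsert(self, embedding, content, metadata):
        self.items.append((embedding, content, metadata))

    def query(self, embedding, k: int = 5):
        # Rank stored memories by similarity to the query embedding:
        scored = sorted(self.items, key=lambda it: cosine(embedding, it[0]), reverse=True)
        return [content for _, content, _ in scored[:k]]

store = ToyVectorStore()
store.upsert(toy_embed("user prefers metric units"), "user prefers metric units", {})
store.upsert(toy_embed("project deadline is Friday"), "project deadline is Friday", {})

print(store.query(toy_embed("what units does the user prefer?"), k=1))
# ['user prefers metric units']
```

The same `upsert`/`query` shape maps directly onto real vector stores, so swapping the toy pieces for an embedding API and Chroma/Pinecone changes nothing structural.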

Tool Calling: Architecture and Security

Tools are the agent's interface to the real world. Every tool must be:

  1. Defined with a precise schema (the LLM must understand inputs/outputs)
  2. Idempotent where possible (safe to retry on timeout)
  3. Sandboxed (execution cannot escape its security boundary)
python
# Tool schema (sent to the LLM):
tools = [
    {
        "name": "run_python",
        "description": "Execute Python code and return stdout/stderr. Use for calculations, data processing.",
        "input_schema": {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "Python code to execute. No imports of os, subprocess, sys."
                }
            },
            "required": ["code"]
        }
    },
    {
        "name": "web_search",
        "description": "Search the web for current information. Returns top 5 result snippets.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "max_results": {"type": "integer", "default": 5}
            },
            "required": ["query"]
        }
    }
]

# Sandboxed code execution (using E2B or Firecracker VMs):
import shlex

import e2b

def execute_tool(name: str, inputs: dict) -> str:
    if name == "run_python":
        with e2b.Sandbox() as sandbox:
            # Runs in isolated microVM — cannot access host filesystem
            result = sandbox.process.start_and_wait(
                "python3 -c " + shlex.quote(inputs["code"])  # quote so arbitrary code survives shell parsing
            )
            return result.stdout or result.stderr
    elif name == "web_search":
        return search_api.query(inputs["query"], inputs.get("max_results", 5))
    raise ValueError(f"Unknown tool: {name}")
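Schema definition only helps if you enforce it. Below is a minimal stdlib sketch of validating a tool call against its `input_schema` before execution, catching hallucinated tool names and parameters; a production system might use the `jsonschema` library instead. The `tools` list here is a trimmed local copy for illustration:

```python
# Minimal "validate before execute" guard: check a proposed tool call
# against its declared input_schema. Stdlib-only sketch.
JSON_TYPES = {"string": str, "integer": int, "number": (int, float),
              "boolean": bool, "object": dict, "array": list}

def validate_tool_call(name: str, inputs: dict, tools: list[dict]) -> list[str]:
    """Return a list of validation errors (empty list means the call is valid)."""
    schema = next((t["input_schema"] for t in tools if t["name"] == name), None)
    if schema is None:
        return [f"unknown tool: {name}"]  # hallucinated tool name
    errors = []
    for req in schema.get("required", []):
        if req not in inputs:
            errors.append(f"missing required parameter: {req}")
    for key, value in inputs.items():
        prop = schema.get("properties", {}).get(key)
        if prop is None:
            errors.append(f"unexpected parameter: {key}")  # hallucinated parameter
        elif not isinstance(value, JSON_TYPES[prop["type"]]):
            errors.append(f"{key}: expected {prop['type']}, got {type(value).__name__}")
    return errors

tools = [{"name": "web_search",
          "input_schema": {"type": "object",
                           "properties": {"query": {"type": "string"},
                                          "max_results": {"type": "integer"}},
                           "required": ["query"]}}]

print(validate_tool_call("web_search", {"query": "MCP spec"}, tools))  # []
print(validate_tool_call("search_web", {"query": "MCP spec"}, tools))  # ['unknown tool: search_web']
```

Rejected calls should be fed back to the LLM as tool results describing the error, so the agent can self-correct instead of crashing.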

The Model Context Protocol (MCP)

MCP (Anthropic, 2024) is an open standard for connecting LLM agents to tools, data sources, and services through a unified protocol:

text
Without MCP:                        With MCP:
Each agent integrates tools         Agent connects to MCP servers
individually (bespoke code):        (standardised protocol):

Agent → custom Slack code           Agent → MCP Client → MCP Server (Slack)
Agent → custom DB code              Agent → MCP Client → MCP Server (PostgreSQL)
Agent → custom Calendar code        Agent → MCP Client → MCP Server (Google Calendar)

Tools are not reusable across        Any MCP-compatible agent can use
agents or frameworks                any MCP server — plug and play

In 2026, major IDEs (VS Code, JetBrains), cloud providers, and SaaS tools publish MCP servers, creating an ecosystem of standardised agent tools.


Multi-Agent Patterns: Supervisor and Swarm

Supervisor Pattern: A coordinator agent routes tasks to specialised subagents:

python
# Supervisor decides which specialist to call:
AGENTS = {
    "researcher":  ResearchAgent(),   # Web search + summarisation
    "coder":       CoderAgent(),      # Code generation + execution
    "writer":      WriterAgent(),     # Long-form content generation
    "reviewer":    ReviewerAgent(),   # Quality checking + critique
}

def supervisor(task: str) -> str:
    # Supervisor LLM decides the workflow:
    plan = supervisor_llm.plan(task)  
    # e.g.: ["researcher", "coder", "reviewer"]
    
    context = task
    for agent_name in plan:
        agent = AGENTS[agent_name]
        result = agent.run(context)
        context = f"{context}\n\nAgent ({agent_name}) output:\n{result}"
    
    return context
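To see the data flow without any LLM calls, here is a runnable toy version of the supervisor loop: stub agents are plain functions and the plan is hard-coded where a supervisor LLM would normally decide it. All names and outputs are illustrative:

```python
# Runnable toy supervisor: stub agents are plain functions; the plan is
# fixed here, standing in for a supervisor LLM's routing decision.
AGENTS = {
    "researcher": lambda ctx: "Found 3 sources on topic X.",
    "coder":      lambda ctx: "Wrote analysis script (42 lines).",
    "reviewer":   lambda ctx: "Approved: sources cited, code runs.",
}

def supervisor(task: str, plan: list[str]) -> str:
    context = task
    for agent_name in plan:
        result = AGENTS[agent_name](context)  # each specialist sees all prior output
        context = f"{context}\n\nAgent ({agent_name}) output:\n{result}"
    return context

transcript = supervisor("Summarise topic X", ["researcher", "coder", "reviewer"])
print(transcript)
```

The accumulating `context` string is the simplest handoff mechanism; real systems typically pass structured messages so each specialist gets only the context it needs.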

Swarm Pattern: Agents autonomously hand off to each other based on context — no central coordinator:

python
# Each agent decides if it should transfer to another agent:
def research_agent(messages, context):
    response = llm.call(messages, tools=[web_search, transfer_to_coder])
    if response.tool == "transfer_to_coder":
        return coder_agent(messages + [response], context)
    return response.content
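A runnable sketch of that handoff, with rule-based decisions standing in for the LLM calls (the keyword check and agent names are illustrative, not a real routing policy):

```python
# Toy swarm handoff: each agent inspects the conversation and either
# answers itself or transfers control to a peer. No central coordinator.
def research_agent(messages: list[str]) -> str:
    messages.append("researcher: gathered requirements")
    if any("code" in m for m in messages):   # stands in for the LLM deciding to hand off
        return coder_agent(messages)
    return "researcher: summary delivered"

def coder_agent(messages: list[str]) -> str:
    messages.append("coder: implemented feature")
    return "coder: done"

print(research_agent(["user: please write code for a parser"]))  # coder: done
print(research_agent(["user: summarise this paper"]))            # researcher: summary delivered
```

Note the shared `messages` list travels with the handoff, so the receiving agent inherits the full conversation — the defining trait of the swarm pattern.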

Failure Modes and Reliability Engineering

| Failure Mode | Cause | Mitigation |
| --- | --- | --- |
| Infinite loop | Agent retries failing tool forever | `max_iterations` limit + exponential backoff |
| Hallucinated tool calls | LLM invents tool names/parameters | Strict tool schema validation before execution |
| Context window overflow | Long tasks fill the context window | Episodic memory summarisation, sliding window |
| Goal drift | Agent forgets original task after many steps | Inject original goal into every iteration prompt |
| Irreversible actions | Agent deletes files, sends emails | Confirmation step for destructive tools |
| Token cost explosion | Complex tasks with many tool calls | Budget limits (`max_tokens_per_task`), cost alerts |
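The retry-with-backoff mitigation for failing tools can be sketched as follows; function names and delay values are illustrative:

```python
# Bounded retry with exponential backoff: a failing tool is retried a few
# times with growing delays, then the failure is surfaced to the agent as
# an observation instead of looping forever.
import time

def call_with_backoff(tool_fn, inputs: dict, max_retries: int = 3,
                      base_delay: float = 0.01) -> str:
    for attempt in range(max_retries):
        try:
            return tool_fn(inputs)
        except Exception as exc:
            if attempt == max_retries - 1:
                # Give the LLM the failure as a tool result so it can re-plan:
                return f"[tool failed after {max_retries} attempts: {exc}]"
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, 0.04s...

calls = {"n": 0}
def flaky_tool(inputs):
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("upstream timeout")
    return "ok"

print(call_with_backoff(flaky_tool, {}))  # ok
```

Returning the failure as a string observation, rather than raising, keeps the ReAct loop alive: the agent can choose a different tool or report the problem to the user.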

Frequently Asked Questions

When should I use an agent vs a fixed workflow? Use a fixed workflow (LangGraph conditional edges, prompt chaining) when the steps are known and the sequence is predictable. Use an agent when the steps are unknown upfront and the LLM needs to decide its own strategy based on intermediate results. Agents are powerful but expensive and hard to debug — always prefer the simpler fixed workflow unless genuine autonomy is required.

How do I evaluate if my agent is actually working correctly? Agentic evaluation requires trajectory evaluation — not just the final answer, but whether the agent took reasonable steps to get there. Tools like LangSmith, Braintrust, or custom evaluation harnesses let you record agent traces (all tool calls, reasoning steps, observations) and score them. Minimum viable evaluation: a test suite of representative tasks with expected outcomes, run after every code change.


Key Takeaway

Agentic AI architecture is where software engineering and AI research converge. The ReAct loop, multi-layer memory, sandboxed tool execution, and multi-agent orchestration are not optional refinements — they are load-bearing architectural components that determine whether your agent is reliable or a liability. As LLMs become more capable in 2026, the bottleneck shifts from model intelligence to system design: the quality of your tool schemas, your memory retrieval strategy, your failure handling, and your evaluation framework are what separate production-grade agents from impressive demos.

Read next: RAG Architecture Patterns: Building Knowledge-Grounded AI →


Part of the Software Architecture Hub — comprehensive guides from architectural foundations to advanced distributed systems patterns.