Tool Design & MCP Integration: Claude Architect Exam Domain 2

Tool descriptions that drive reliable selection, structured MCP errors, tool distribution, and server config — Domain 2 of the Claude Architect exam.

Emily Ross
9 min read

Lesson 3 of the Claude Certified Architect – Foundations course. Domain 2 is 18% of the exam (~11 questions) and is where architecture meets interface design: how you describe, scope, and error-handle tools determines whether an agent behaves reliably. If Lesson 2 was about who does the work, this lesson is about what they work with.


Tool Descriptions Are the Selection Mechanism

The foundational fact of Domain 2: tool descriptions are the primary mechanism LLMs use to select tools. Minimal descriptions ("Retrieves customer information") produce unreliable selection among similar tools, and the exam's favorite first-step fix is almost never few-shot examples or a routing layer; it is expanding the descriptions.

A production-grade description includes:

  • Purpose — what the tool does and doesn't do
  • Input formats it accepts (ID patterns, date formats)
  • Example queries it should handle
  • Edge cases and boundaries — when to use it versus similar alternatives
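As a rough illustration of those four ingredients, here is a sketch of one tool definition written as a Python dict in the common name / description / input_schema shape. The tool names `get_customer` and `lookup_order`, the `CUS-` ID pattern, and the helper `missing_sections` are hypothetical and exist only for this example.

```python
# Hypothetical tool definition; names, ID patterns, and wording are illustrative.
GET_CUSTOMER_TOOL = {
    "name": "get_customer",
    "description": (
        "Purpose: look up a customer's profile (name, email, account status). "
        "Does NOT return orders or payment history. "
        "Input: customer_id in the form CUS- followed by 6 digits, or an email address. "
        "Example queries: 'what email do we have on file for CUS-004211?', "
        "'is this account active?'. "
        "Boundaries: for anything about a specific order (status, items, refunds), "
        "use lookup_order instead."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "customer_id": {
                "type": "string",
                "description": "CUS-NNNNNN or an email address",
            },
        },
        "required": ["customer_id"],
    },
}


def missing_sections(description: str) -> list[str]:
    """Return which of the four labelled ingredients a description lacks."""
    labels = ["Purpose:", "Input:", "Example queries:", "Boundaries:"]
    return [label for label in labels if label not in description]
```

A check like `missing_sections` is only a lint for the labels used in this sketch; it says nothing about whether the wording actually separates the tool from its neighbors.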

Common misrouting causes to recognize in scenarios:

  • Near-identical descriptions on overlapping tools (analyze_content vs analyze_document) → rename and differentiate (e.g., analyze_content → extract_web_results with a web-specific description)
  • Overly generic tools → split into purpose-specific tools with defined contracts (analyze_document → extract_data_points, summarize_content, verify_claim_against_source)
  • Keyword-sensitive system prompt wording that creates unintended tool associations and overrides good descriptions — review the system prompt too, not just the tools

Exam tip: When a question says both tools have "minimal descriptions" and asks for the most effective first step, the answer is enriching descriptions. Routing layers and tool consolidation are real techniques but over-engineered as first steps.


Structured Error Responses

MCP tools signal failure with the isError flag — but the flag alone isn't enough. A generic "Operation failed" gives the agent nothing to base a recovery decision on. Structured error responses include:

json
{
  "isError": true,
  "errorCategory": "transient",     // transient | validation | business | permission
  "isRetryable": true,
  "message": "Order service timed out after 5s; retry is likely to succeed."
}

The taxonomy the exam expects:

| Category | Example | Retryable? |
| --- | --- | --- |
| Transient | Timeout, service unavailable | Yes |
| Validation | Malformed order ID | No (fix the input) |
| Business | Refund exceeds policy limit | No (explain to user / escalate) |
| Permission | Agent lacks access to the resource | No (different path needed) |
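One way to keep payloads consistent with this taxonomy is to derive retryability from the category instead of setting it by hand. The sketch below assumes the four category names and the field names from the JSON example; `build_error` is a hypothetical helper, not part of MCP.

```python
# Assumed mapping taken from the taxonomy: only transient errors are retryable.
RETRYABLE_BY_CATEGORY = {
    "transient": True,
    "validation": False,
    "business": False,
    "permission": False,
}


def build_error(category: str, message: str) -> dict:
    """Build a structured error payload; reject categories outside the taxonomy."""
    if category not in RETRYABLE_BY_CATEGORY:
        raise ValueError(f"unknown error category: {category!r}")
    return {
        "isError": True,
        "errorCategory": category,
        "isRetryable": RETRYABLE_BY_CATEGORY[category],
        "message": message,
    }
```

Centralizing the mapping means a tool author cannot accidentally ship a business error marked as retryable.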

Two subtleties worth points:

  • Business errors deserve customer-friendly explanations plus isRetryable: false, so the agent communicates the policy instead of retrying into the same wall.
  • Distinguish access failures from valid empty results. A search that returned zero matches succeeded. Marking a timeout as an empty-but-successful result silently corrupts downstream reasoning — a classic anti-pattern (more in Lesson 6).
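The second point can be sketched as a thin wrapper around a search backend. Here `backend` is a hypothetical callable that returns a list of matches or raises `TimeoutError`; the `results` and `resultCount` field names are assumptions for this example.

```python
def search_catalog(query: str, backend) -> dict:
    """Keep 'nothing matched' and 'could not search' as two distinct outcomes."""
    try:
        matches = backend(query)
    except TimeoutError:
        # A real failure: report it as an error so the agent can retry.
        return {
            "isError": True,
            "errorCategory": "transient",
            "isRetryable": True,
            "message": "Catalog search timed out; retry is likely to succeed.",
        }
    # Zero matches is still a successful search, so isError stays False.
    return {"isError": False, "results": matches, "resultCount": len(matches)}
```

With this contract the agent can truthfully say "no matching products" in one case and "let me try that again" in the other.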

In multi-agent systems, subagents should attempt local recovery for transient failures and propagate to the coordinator only what they can't resolve — including partial results and what was attempted.
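A minimal sketch of that local-recovery loop follows, assuming tools return the structured payloads described above. The function name, the report fields (`attempted`, `partialResults`, `lastError`), and the absence of backoff are simplifications for illustration; a production version would wait between retries.

```python
def run_with_local_recovery(tool, args, partial_results=None, max_attempts=3):
    """Retry retryable errors locally; return a report the coordinator can act on."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempts = []
    last_error = None
    for attempt in range(1, max_attempts + 1):
        result = tool(**args)
        if not result.get("isError"):
            return {"status": "ok", "result": result, "attempted": attempts}
        attempts.append({"attempt": attempt, "error": result.get("message", "")})
        last_error = result
        if not result.get("isRetryable"):
            break  # validation, business, permission: retrying cannot help
    # Unresolved: hand the coordinator the error, the history, and any partial work.
    return {
        "status": "unresolved",
        "lastError": last_error,
        "attempted": attempts,
        "partialResults": partial_results or [],
    }
```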


Tool Distribution: Fewer, Scoped, Purposeful

Giving one agent 18 tools instead of 4–5 measurably degrades selection reliability — more options, more decision complexity, more misuse. Domain 2 principles:

  • Scope tool access to role. A synthesis agent with web-search tools will eventually attempt web searches. Restrict each subagent to the tools its specialization needs.
  • Constrain generic tools. Replace fetch_url with load_document that validates document URLs — narrower surface, fewer failure modes.
  • Allow scoped cross-role tools for high-frequency needs. If the synthesis agent constantly needs simple fact checks, give it a narrow verify_fact tool and keep routing complex verifications through the coordinator. This is the least-privilege pattern the exam rewards.
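These principles can be sketched in a few lines. The role names, the tool allowlists, and the accepted file extensions below are hypothetical choices for the example, and `validate_document_url` stands in for whatever check a real load_document tool would run before fetching.

```python
from urllib.parse import urlparse

# Hypothetical role-to-tool allowlist; the synthesizer gets only a narrow verify_fact.
ROLE_TOOLS = {
    "researcher": {"web_search", "load_document"},
    "synthesizer": {"summarize_content", "verify_fact"},
}

# Assumed document types for this sketch.
ALLOWED_EXTENSIONS = (".pdf", ".md", ".txt")


def tools_for(role: str, all_tools: list[dict]) -> list[dict]:
    """Return only the tool definitions a role is allowed to see."""
    allowed = ROLE_TOOLS.get(role, set())
    return [tool for tool in all_tools if tool["name"] in allowed]


def validate_document_url(url: str) -> bool:
    """Accept only https URLs whose path ends in an allowed document extension."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.path.lower().endswith(ALLOWED_EXTENSIONS)
```

Because an unknown role maps to an empty set, a misconfigured subagent gets no tools rather than all of them.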

tool_choice Configuration

| Setting | Behavior | Use when |
| --- | --- | --- |
| "auto" | Model may call a tool or answer in text | Default conversational agents |
| "any" | Model must call some tool | Guaranteeing structured output when multiple schemas exist |
| {"type": "tool", "name": "extract_metadata"} | Model must call that tool | Forcing a specific step to run first, then continuing in follow-up turns |
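In a Messages API request, tool_choice is passed as an object, and the "auto" and "any" settings in the table are the `type` values of that object. The helper below is a hypothetical convenience wrapper that builds the three payloads; it is a sketch, not part of any SDK.

```python
from typing import Optional


def tool_choice(mode: str, name: Optional[str] = None) -> dict:
    """Build the tool_choice object for the three settings in the table."""
    if mode in ("auto", "any"):
        return {"type": mode}
    if mode == "tool":
        if not name:
            raise ValueError("mode 'tool' requires a tool name")
        return {"type": "tool", "name": name}
    raise ValueError(f"unsupported tool_choice mode: {mode!r}")
```

A forced first step would then pass `tool_choice("tool", "extract_metadata")` on the first turn and fall back to `tool_choice("auto")` on the follow-up turns.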

MCP Server Configuration in Claude Code

Scoping rules the exam tests directly:

  • Project scope: .mcp.json (committed to the repo) for shared team tooling. Supports environment variable expansion, e.g. ${GITHUB_TOKEN}, so credentials never land in version control.
  • User scope: ~/.claude.json for personal or experimental servers, invisible to teammates.
  • Tools from all configured servers are discovered at connection time and available simultaneously.
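To make the expansion idea concrete, the sketch below holds a project-scoped config as a string and substitutes `${NAME}` placeholders from an environment mapping. The server package name `example-github-mcp-server` is hypothetical, and `expand_env` only illustrates the concept; it is not how Claude Code implements expansion.

```python
import json
import re

# What would be committed as .mcp.json: the placeholder, never the secret.
# The package name is a made-up stand-in for a real MCP server.
MCP_JSON = """
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "example-github-mcp-server"],
      "env": {"GITHUB_TOKEN": "${GITHUB_TOKEN}"}
    }
  }
}
"""


def expand_env(value, env):
    """Recursively replace ${NAME} in strings; a missing variable raises KeyError."""
    if isinstance(value, str):
        return re.sub(r"\$\{(\w+)\}", lambda m: env[m.group(1)], value)
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    return value
```

Calling `expand_env(json.loads(MCP_JSON), os.environ)` resolves the token at load time on each developer's machine, while the committed file stays free of credentials.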

Adjacent judgment calls:

  • Prefer existing community MCP servers for standard integrations (Jira, GitHub); build custom servers only for team-specific workflows.
  • If the agent keeps using built-in tools (like Grep) instead of your more capable MCP tool, the fix is again richer MCP tool descriptions explaining capabilities and outputs.
  • MCP resources expose content catalogs — issue summaries, documentation hierarchies, database schemas — so agents can see what data exists without burning turns on exploratory tool calls. Resources for content, tools for actions.

Built-In Tools: Selection Criteria

The exam expects instant recall of the built-in toolset:

| Tool | For | Example |
| --- | --- | --- |
| Grep | Content search inside files | Find all callers of a function, locate an error message |
| Glob | File path pattern matching | **/*.test.tsx (find files by name/extension) |
| Read / Write | Full-file operations | Load a file; rewrite it entirely |
| Edit | Targeted modification via unique text match | Change one function body |
| Bash | Shell execution | Run tests, git operations |

Two tested patterns: when Edit fails on non-unique anchor text, fall back to Read + Write. And build codebase understanding incrementally — Grep for entry points, Read to follow imports and trace flows — rather than reading everything upfront.
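The Edit fallback can be simulated on plain strings. In this sketch `edit_or_rewrite` is a hypothetical stand-in for the agent's decision, and `rewrite` is a caller-supplied function that produces the full new file content, mimicking Read followed by Write.

```python
def edit_or_rewrite(text: str, anchor: str, replacement: str, rewrite) -> tuple[str, str]:
    """Apply a unique-match edit, or fall back to a full rewrite.

    Returns the new text and the name of the strategy that was used.
    """
    if text.count(anchor) == 1:
        # The anchor is unique, so a targeted edit is safe.
        return text.replace(anchor, replacement), "edit"
    # Zero or multiple matches: a targeted edit is ambiguous, rewrite the file.
    return rewrite(text), "read+write"
```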


Hands-On Exercise

  1. Write two deliberately confusable tools (get_customer, lookup_order) with one-line descriptions; measure misrouting on 10 ambiguous requests.
  2. Expand both descriptions with input formats, examples, and boundaries; re-measure.
  3. Add structured error responses covering all four categories; verify the agent retries transient errors but explains business errors.
  4. Configure one shared server in .mcp.json with ${API_TOKEN} expansion and one personal server in ~/.claude.json; confirm both toolsets are live simultaneously.
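For steps 1 and 2, a small harness is enough to compare the two description sets. In the sketch below, `select_tool` is a hypothetical callable that wraps your model call and returns the name of the tool the model chose; the harness itself makes no API calls.

```python
def misrouting_rate(cases, select_tool) -> float:
    """Fraction of requests routed to the wrong tool.

    cases: list of (request, expected_tool_name) pairs.
    select_tool: callable mapping a request string to the chosen tool name.
    """
    if not cases:
        raise ValueError("need at least one test case")
    wrong = sum(1 for request, expected in cases if select_tool(request) != expected)
    return wrong / len(cases)
```

Run it once with the one-line descriptions and once with the expanded ones, using the same 10 requests, so the only variable is the description text.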

The Four Error Categories, Concretely

Abstract taxonomies are hard to hold under exam pressure; concrete payloads are not. Here is one well-formed error response per category for a process_refund tool:

json
// Transient — retry is appropriate
{ "isError": true, "errorCategory": "transient", "isRetryable": true,
  "message": "Payment service timed out after 5s. Retrying in a few seconds is likely to succeed." }

// Validation — fix the input, don't retry as-is
{ "isError": true, "errorCategory": "validation", "isRetryable": false,
  "message": "order_id 'ORD-99x' is malformed. Expected format: ORD- followed by 8 digits." }

// Business — explain the policy, offer the alternative path
{ "isError": true, "errorCategory": "business", "isRetryable": false,
  "message": "Refund of $612 exceeds the $500 automatic limit. Explain the limit to the customer and use escalate_to_human with a case summary." }

// Permission — a different capability is required
{ "isError": true, "errorCategory": "permission", "isRetryable": false,
  "message": "This agent lacks refund authority for enterprise accounts. Route to the enterprise support queue." }

Notice what each message does beyond flagging failure: it tells the agent what to do next. That is the design bar the exam holds tool errors to — and it is why a uniform "Operation failed" is always a wrong answer.
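On the consuming side, the same fields let an agent loop pick its next step without parsing prose. This is a sketch of one possible dispatch; the action names (`retry`, `fix_input`, `explain_and_escalate`, `reroute`) are hypothetical labels, not part of MCP.

```python
# Assumed next-step labels for non-retryable categories.
ACTION_BY_CATEGORY = {
    "validation": "fix_input",
    "business": "explain_and_escalate",
    "permission": "reroute",
}


def next_action(tool_result: dict) -> str:
    """Map a structured tool result to the agent's next step."""
    if not tool_result.get("isError"):
        return "continue"
    if tool_result.get("isRetryable"):
        return "retry"
    # Unknown or missing categories fall back to escalation rather than guessing.
    return ACTION_BY_CATEGORY.get(tool_result.get("errorCategory"), "escalate")
```

A bare "Operation failed" payload would land in the fallback branch every time, which is exactly the information loss structured errors are meant to prevent.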


Worked Exam Question

A search tool queries your product catalog. For the query "waterproof hiking boots size 15" it finds nothing, and returns { "isError": true, "message": "search failed" }. The agent apologizes to the customer for a technical problem. What is wrong?

  • A. A query that matches zero products is a successful search with an empty result — it should return an empty list with isError false, so the agent can tell the customer no matching products exist.
  • B. The tool should retry the search with broader terms before returning anything.
  • C. The error message should include a stack trace so the agent can diagnose the failure.
  • D. The agent's prompt should instruct it to reword technical errors as stock-availability messages.

Answer: A. "No matches" is information, not failure. Marking it as an error makes the agent misreport reality — the mirror image of the anti-pattern where real failures are disguised as empty successes. Option B hides a product decision inside a tool, C leaks internals without helping the model act, and D papers over a data-contract bug with prompt instructions.


Key Takeaways for the Exam

  • Descriptions drive selection; enrich them before adding routing infrastructure.
  • Errors need category + retryability + human-readable message; empty results are not errors.
  • Fewer, role-scoped tools beat many generic ones; least-privilege cross-role tools for hot paths.
  • .mcp.json = shared project scope with env-var expansion; ~/.claude.json = personal.
  • Grep = content, Glob = paths, Edit → Read+Write fallback on non-unique matches.


Frequently Asked Questions

Why do tool descriptions matter so much for agent reliability?

Because descriptions are the main information the model has when choosing among tools. Humans can ask clarifying questions; the model largely matches the request against the descriptions in front of it. Two tools described as "Retrieves customer information" and "Retrieves order details" look nearly identical to a model handling "check my order for me". That is why enriching descriptions with input formats, example queries, and explicit boundaries is the highest-leverage reliability fix, and why it beats few-shot examples or routing layers as a first step.

What is the difference between .mcp.json and ~/.claude.json?

Project-scoped .mcp.json is committed to the repository, so every teammate gets the same MCP servers on clone, and environment variable expansion such as ${GITHUB_TOKEN} keeps actual credentials out of version control. User-scoped ~/.claude.json is personal: experimental servers live there without affecting anyone else. Tools from every configured server in both scopes are discovered at connection time and available simultaneously.

When should I use Grep versus Glob?

Grep searches inside files — find every caller of a function, locate an error message, trace an import. Glob matches file paths — every **/*.test.tsx, everything under src/api/. The exam also tests the Edit fallback: when Edit fails because its anchor text appears multiple times in a file, switch to Read followed by Write rather than fighting for a unique anchor.