Tool Design & MCP Integration: Claude Architect Exam Domain 2

Q: Why do tool descriptions matter so much for agent reliability?

Tool descriptions are the primary mechanism LLMs use to select tools. Minimal or overlapping descriptions cause misrouting between similar tools. A production description covers purpose, input formats, example queries, edge cases, and when to use the tool versus alternatives.

Q: What is the difference between .mcp.json and ~/.claude.json?

.mcp.json is project-scoped and committed to the repository for shared team tooling, with environment variable expansion (like ${GITHUB_TOKEN}) keeping credentials out of version control. ~/.claude.json is user-scoped for personal or experimental servers.

Q: When should I use Grep versus Glob?

Grep searches file contents - finding function callers, error messages, or imports. Glob matches file paths by pattern, such as **/*.test.tsx. If Edit fails on non-unique anchor text, fall back to Read plus Write.

← Back to Claude Certified Architect Course

Lesson 3 of the Claude Certified Architect – Foundations course. Domain 2 is 18% of the exam (~11 questions) and is where architecture meets interface design: how you describe, scope, and error-handle tools determines whether an agent behaves reliably. If Lesson 2 was about who does the work, this lesson is about what they work with.

Previous: Lesson 2 — Multi-Agent Orchestration

Back to Course Overview

Tool Descriptions Are the Selection Mechanism

The foundational fact of Domain 2: tool descriptions are the primary mechanism LLMs use to select tools. Minimal descriptions ("Retrieves customer information") produce unreliable selection among similar tools — and the exam's favorite first-step fix is almost never few-shot examples or a routing layer, it's expanding the descriptions.

A production-grade description includes:

Purpose — what the tool does and doesn't do
Input formats it accepts (ID patterns, date formats)
Example queries it should handle
Edge cases and boundaries — when to use it versus similar alternatives

Common misrouting causes to recognize in scenarios:

Near-identical descriptions on overlapping tools (analyze_content vs analyze_document) → rename and differentiate (e.g., analyze_content → extract_web_results with a web-specific description)
Overly generic tools → split into purpose-specific tools with defined contracts (analyze_document → extract_data_points, summarize_content, verify_claim_against_source)
Keyword-sensitive system prompt wording that creates unintended tool associations and overrides good descriptions — review the system prompt too, not just the tools

Exam tip: When a question says both tools have "minimal descriptions" and asks for the most effective first step, the answer is enriching descriptions. Routing layers and tool consolidation are real techniques but over-engineered as first steps.

Structured Error Responses

MCP tools signal failure with the isError flag — but the flag alone isn't enough. A generic "Operation failed" gives the agent nothing to base a recovery decision on. Structured error responses include:

json

{
  "isError": true,
  "errorCategory": "transient",     // transient | validation | business | permission
  "isRetryable": true,
  "message": "Order service timed out after 5s; retry is likely to succeed."
}

The taxonomy the exam expects:

Category	Example	Retryable?
Transient	Timeout, service unavailable	Yes
Validation	Malformed order ID	No — fix the input
Business	Refund exceeds policy limit	No — explain to user / escalate
Permission	Agent lacks access to the resource	No — different path needed

Two subtleties worth points:

Business errors deserve customer-friendly explanations plus retriable: false, so the agent communicates the policy instead of retrying into the same wall.
Distinguish access failures from valid empty results. A search that returned zero matches succeeded. Marking a timeout as an empty-but-successful result silently corrupts downstream reasoning — a classic anti-pattern (more in Lesson 6).

In multi-agent systems, subagents should attempt local recovery for transient failures and propagate to the coordinator only what they can't resolve — including partial results and what was attempted.

Tool Distribution: Fewer, Scoped, Purposeful

Giving one agent 18 tools instead of 4–5 measurably degrades selection reliability — more options, more decision complexity, more misuse. Domain 2 principles:

Scope tool access to role. A synthesis agent with web-search tools will eventually attempt web searches. Restrict each subagent to the tools its specialization needs.
Constrain generic tools. Replace fetch_url with load_document that validates document URLs — narrower surface, fewer failure modes.
Allow scoped cross-role tools for high-frequency needs. If the synthesis agent constantly needs simple fact checks, give it a narrow verify_fact tool and keep routing complex verifications through the coordinator. This is the least-privilege pattern the exam rewards.

tool_choice Configuration

Setting	Behavior	Use when
`"auto"`	Model may call a tool or answer in text	Default conversational agents
`"any"`	Model must call some tool	Guaranteeing structured output when multiple schemas exist
`{"type": "tool", "name": "extract_metadata"}`	Model must call that tool	Forcing a specific step to run first, then continuing in follow-up turns

MCP Server Configuration in Claude Code

Scoping rules the exam tests directly:

Project scope — .mcp.json (committed to the repo): shared team tooling. Supports environment variable expansion — ${GITHUB_TOKEN} — so credentials never land in version control.
User scope — ~/.claude.json: personal or experimental servers, invisible to teammates.
Tools from all configured servers are discovered at connection time and available simultaneously.

Adjacent judgment calls:

Prefer existing community MCP servers for standard integrations (Jira, GitHub); build custom servers only for team-specific workflows.
If the agent keeps using built-in tools (like Grep) instead of your more capable MCP tool, the fix is again richer MCP tool descriptions explaining capabilities and outputs.
MCP resources expose content catalogs — issue summaries, documentation hierarchies, database schemas — so agents can see what data exists without burning turns on exploratory tool calls. Resources for content, tools for actions.

Built-In Tools: Selection Criteria

The exam expects instant recall of the built-in toolset:

Tool	For	Example
Grep	Content search inside files	Find all callers of a function, locate an error message
Glob	File path pattern matching	`*/.test.tsx` — find files by name/extension
Read / Write	Full-file operations	Load a file; rewrite it entirely
Edit	Targeted modification via unique text match	Change one function body
Bash	Shell execution	Run tests, git operations

Two tested patterns: when Edit fails on non-unique anchor text, fall back to Read + Write. And build codebase understanding incrementally — Grep for entry points, Read to follow imports and trace flows — rather than reading everything upfront.

Hands-On Exercise

Write two deliberately-confusable tools (get_customer, lookup_order) with one-line descriptions; measure misrouting on 10 ambiguous requests.
Expand both descriptions with input formats, examples, and boundaries; re-measure.
Add structured error responses covering all four categories; verify the agent retries transient errors but explains business errors.
Configure one shared server in .mcp.json with ${API_TOKEN} expansion and one personal server in ~/.claude.json; confirm both toolsets are live simultaneously.

The Four Error Categories, Concretely

Abstract taxonomies are hard to hold under exam pressure; concrete payloads are not. Here is one well-formed error response per category for a process_refund tool:

json

// Transient — retry is appropriate
{ "isError": true, "errorCategory": "transient", "isRetryable": true,
  "message": "Payment service timed out after 5s. Retrying in a few seconds is likely to succeed." }

// Validation — fix the input, don't retry as-is
{ "isError": true, "errorCategory": "validation", "isRetryable": false,
  "message": "order_id 'ORD-99x' is malformed. Expected format: ORD- followed by 8 digits." }

// Business — explain the policy, offer the alternative path
{ "isError": true, "errorCategory": "business", "isRetryable": false,
  "message": "Refund of $612 exceeds the $500 automatic limit. Explain the limit to the customer and use escalate_to_human with a case summary." }

// Permission — a different capability is required
{ "isError": true, "errorCategory": "permission", "isRetryable": false,
  "message": "This agent lacks refund authority for enterprise accounts. Route to the enterprise support queue." }

Notice what each message does beyond flagging failure: it tells the agent what to do next. That is the design bar the exam holds tool errors to — and it is why a uniform "Operation failed" is always a wrong answer.

Worked Exam Question

A search tool queries your product catalog. For the query "waterproof hiking boots size 15" it finds nothing, and returns { "isError": true, "message": "search failed" }. The agent apologizes to the customer for a technical problem. What is wrong?

A. A query that matches zero products is a successful search with an empty result — it should return an empty list with isError false, so the agent can tell the customer no matching products exist.
B. The tool should retry the search with broader terms before returning anything.
C. The error message should include a stack trace so the agent can diagnose the failure.
D. The agent's prompt should instruct it to reword technical errors as stock-availability messages.

Answer: A. "No matches" is information, not failure. Marking it as an error makes the agent misreport reality — the mirror image of the anti-pattern where real failures are disguised as empty successes. Option B hides a product decision inside a tool, C leaks internals without helping the model act, and D papers over a data-contract bug with prompt instructions.

Key Takeaways for the Exam

Descriptions drive selection; enrich them before adding routing infrastructure.
Errors need category + retryability + human-readable message; empty results are not errors.
Fewer, role-scoped tools beat many generic ones; least-privilege cross-role tools for hot paths.
.mcp.json = shared project scope with env-var expansion; ~/.claude.json = personal.
Grep = content, Glob = paths, Edit → Read+Write fallback on non-unique matches.

Next: Lesson 4 — Claude Code Configuration & Workflows

Frequently Asked Questions

Why do tool descriptions matter so much for agent reliability?

Because the description is the only information the model has when choosing among tools. Humans can ask clarifying questions; the model just matches the request against the descriptions in front of it. Two tools described "Retrieves customer information" and "Retrieves order details" look nearly identical to a model handling "check my order for me" — which is why enriching descriptions with input formats, example queries, and explicit boundaries is the highest-leverage reliability fix, and why it beats few-shot examples or routing layers as a first step.

What is the difference between .mcp.json and ~/.claude.json?

Project-scoped .mcp.json is committed to the repository, so every teammate gets the same MCP servers on clone — with environment variable expansion like $ keeping actual credentials out of version control. User-scoped ~/.claude.json is personal: experimental servers live there without affecting anyone else. Tools from every configured server in both scopes are discovered at connection time and available simultaneously.

When should I use Grep versus Glob?

Grep searches inside files — find every caller of a function, locate an error message, trace an import. Glob matches file paths — every **/*.test.tsx, everything under src/api/. The exam also tests the Edit fallback: when Edit fails because its anchor text appears multiple times in a file, switch to Read followed by Write rather than fighting for a unique anchor.