What score should I aim for before booking the real exam?

The real exam passes at 720 on a 100-1,000 scale - roughly 42 of 60 questions. Aim for 48+ correct (80%, about a 820 scaled score) consistently before booking, to leave a safety margin.

Are these official Anthropic exam questions?

No. This is original, unofficial practice material modeled on the official exam guide's format, scenarios, and domain weightings. It is not affiliated with or endorsed by Anthropic.

60 Claude Certified Architect (CCA-F) Practice Questions 2026

Q: How many practice questions are in this bank and how are they weighted?

60 scenario-based questions weighted exactly to the official exam domains: 16 on Agentic Architecture (27%), 12 on Claude Code (20%), 12 on Structured Output (20%), 11 on Tool Design & MCP (18%), and 9 on Context & Reliability (15%).

← Back to Claude Certified Architect Course

This is the complete practice question bank for the Claude Certified Architect – Foundations (CCA-F) exam, built to the official specification: scenario-based multiple-choice questions with one correct answer and three plausible distractors, weighted exactly to the published domain breakdown — 16 questions on Agentic Architecture (27%), 12 on Claude Code (20%), 12 on Structured Output (20%), 11 on Tool Design & MCP (18%), and 9 on Context & Reliability (15%).

Work through each scenario block, commit to an answer before expanding the explanation, and track your domain-level accuracy. For a realistic timed rehearsal with scaled 100–1,000 scoring, use the companion interactive mock exam, and see the exam guide for strategy. New to the material? Start with the full course.

Scoring yourself: the real exam passes at 720 on a 100–1,000 scale — roughly 42 of 60 questions (70%). Aim for 48+ (80%) before booking.

Scenario 1: Customer Support Resolution Agent

You are building a customer support resolution agent with the Claude Agent SDK. It handles returns, billing disputes, and account issues through custom MCP tools (get_customer, lookup_order, process_refund, escalate_to_human), targeting 80%+ first-contact resolution.

Question 1 — Agentic Architecture & Orchestration

Your agentic loop occasionally ends the conversation while the agent still has pending work. Reviewing the code, you find the loop exits whenever the response contains any assistant text. What is the correct termination condition?

A. Terminate after a fixed maximum of 10 iterations to bound the loop deterministically.
B. Terminate when the response text contains a completion phrase such as "task complete" or "resolved".
C. Terminate when the model stops requesting the same tool twice in a row, indicating it has converged.
D. Continue the loop while stop_reason is "tool_use" and terminate only when stop_reason is "end_turn".

Show answer

Correct answer: D. stop_reason is the API contract for loop control: "tool_use" means the model needs tools executed, "end_turn" means it has finished. Responses can contain both text and tool calls, so text presence signals nothing. Phrase parsing is brittle, and iteration caps are a safety circuit breaker, not a termination condition.

Question 2 — Agentic Architecture & Orchestration

After executing the tools the model requested, what must your loop do before sending the next request so the agent can decide its next action?

A. Send only the results of the most recent tool call and drop earlier ones to save tokens.
B. Store the results in an external database the model can query through a memory tool if needed.
C. Append the tool results to the conversation history so the model can reason over what it learned.
D. Summarize the results into one sentence and replace the previous assistant turn with it.

Show answer

Correct answer: C. The agentic loop works because each iteration the model sees the full history including tool results — that is how it incorporates new information into its reasoning. Replacing or dropping results removes exactly the information the model asked for.

Question 3 — Agentic Architecture & Orchestration

Compliance requires that no refund is ever processed without a verified customer identity. Your system prompt already states this rule in bold, yet audits find rare violations. What should you implement?

A. A rewritten system prompt that repeats the verification requirement at the beginning and end of the instructions.
B. A post-hoc audit job that scans transcripts nightly and flags refunds issued without prior verification.
C. Few-shot examples demonstrating the agent calling get_customer before every refund operation.
D. A programmatic prerequisite gate that blocks process_refund until get_customer has returned a verified customer ID.

Show answer

Correct answer: D. Prompt instructions provide probabilistic compliance with a non-zero failure rate — unacceptable when errors have financial or regulatory consequences. Programmatic enforcement (hooks, prerequisite gates) gives a deterministic guarantee. Nightly audits detect violations but do not prevent them.

Question 4 — Agentic Architecture & Orchestration

A customer writes one message containing three unrelated issues: a damaged item, a duplicate charge, and a loyalty points question. How should the agent handle this?

A. Answer only the first issue and ask the customer to resubmit the other two as separate tickets.
B. Escalate to a human immediately, since multi-concern messages exceed single-agent scope.
C. Decompose the message into distinct items, investigate each in parallel using shared context, then synthesize one unified resolution.
D. Process the issues strictly sequentially in the order written, resolving each fully before reading the next.

Show answer

Correct answer: C. The tested pattern is decomposition into distinct concerns, parallel investigation with shared context, and a single synthesized response. Forcing resubmission or immediate escalation hurts first-contact resolution, and strict sequencing wastes latency when the investigations are independent.

Question 5 — Tool Design & MCP Integration

Logs show the agent frequently calls get_customer when users ask about specific orders. Both tools carry one-line descriptions ("Retrieves customer information" / "Retrieves order details") and accept similar identifier formats. What is the most effective first step?

A. Add a pre-processing routing layer that inspects each user message and enables only the tool matching detected keywords.
B. Merge both tools into one lookup_entity tool that accepts any identifier and routes internally.
C. Add eight few-shot examples to the system prompt demonstrating correct tool choice for order queries.
D. Expand each tool's description with accepted input formats, example queries, edge cases, and guidance on when to use it versus the other tool.

Show answer

Correct answer: D. Tool descriptions are the primary mechanism LLMs use for tool selection; minimal descriptions are the root cause here. Enriching them is the low-effort, high-leverage first step. Routing layers and consolidation are heavier interventions, and few-shot examples add token overhead without fixing the underlying ambiguity.

Question 6 — Tool Design & MCP Integration

When process_refund is called for an amount that violates refund policy, the tool currently returns {"isError": true, "message": "Operation failed"}. The agent responds by retrying the identical call three times. How should the error response be redesigned?

A. Return isError: false with an empty result so the agent moves on instead of retrying.
B. Return the same message but add exponential backoff between the agent's retry attempts.
C. Raise an unhandled exception so the orchestration layer terminates the conversation safely.
D. Return structured metadata: errorCategory "business", retriable: false, and a customer-friendly explanation of the policy limit.

Show answer

Correct answer: D. Uniform generic errors prevent the agent from making recovery decisions. A structured business error with retriable: false stops wasted retries, and the customer-friendly description lets the agent communicate the policy correctly. Masking the failure as success or terminating the session are both anti-patterns.

Question 7 — Tool Design & MCP Integration

lookup_order sometimes times out, and sometimes legitimately finds no orders for a customer. Both cases currently return an empty list. Why is this design a problem?

A. Timeouts should always terminate the conversation, and empty lists prevent that from happening.
B. Empty lists consume more context window tokens than structured error objects do.
C. The agent cannot distinguish an access failure needing a retry decision from a valid empty result, so it may present incomplete data as truth.
D. The MCP specification forbids returning empty arrays from tools that declare a list return type.

Show answer

Correct answer: C. A timeout is a transient access failure; zero matching orders is a successful query. Conflating them means the agent can neither retry appropriately nor trust its data. The fix is structured errors for failures (isError, category, retryability) and normal empty results for genuine no-match queries.

Question 8 — Context Management & Reliability

A customer's first message is: "I've been burned by bots before. Get me a human right now." The issue looks like a routine address change the agent could easily handle. What should the agent do?

A. Resolve the address change first, then offer the human handoff once the work is complete.
B. Run sentiment analysis and escalate only if frustration exceeds the configured threshold.
C. Escalate to a human immediately, honoring the explicit request without first attempting the investigation.
D. Ask three diagnostic questions to confirm the issue is really simple before deciding.

Show answer

Correct answer: C. An explicit customer demand for a human is a first-class escalation trigger and should be honored immediately. (Offering to resolve first is appropriate only when the customer hasn't explicitly demanded a human.) Sentiment scores are unreliable proxies and ignoring the request damages trust.

Question 9 — Context Management & Reliability

get_customer returns three plausible matches for "Maria Garcia." The agent must act on one account. What behavior should you instruct?

A. Ask the customer for an additional identifier (email, order number, postal code) to disambiguate before proceeding.
B. Select the match whose region matches the caller's IP-derived location.
C. Select the match with the most recent order activity, since that customer is most likely to be contacting support.
D. Proceed with the first match returned and correct course later if details fail to line up.

Show answer

Correct answer: A. Multiple matches require clarification, not heuristic selection. Acting on the wrong account can expose data or process refunds against the wrong customer — heuristics like recency or geolocation are exactly the shortcuts the exam flags as wrong.

Question 10 — Context Management & Reliability

In conversations beyond ~25 turns, the agent starts citing wrong refund amounts. You use progressive summarization to control context size, and the exact figures are being blurred into phrases like "a refund of around $50." What is the standard fix?

A. Summarize more aggressively so the summary is short enough to remain accurate.
B. Ask the customer to restate the amounts whenever the conversation exceeds 25 turns.
C. Extract transactional facts (amounts, dates, order numbers, statuses) into a persistent case-facts block included in every prompt outside the summarized history.
D. Switch to a model with a larger context window so summarization is never needed.

Show answer

Correct answer: C. Progressive summarization characteristically loses precise numerical values. The pattern is to persist exact transactional facts in a structured block that rides alongside the summary, immune to compression. Bigger windows delay but don't remove the problem, and re-asking customers erodes trust.

Scenario 2: Code Generation with Claude Code

Your team uses Claude Code for code generation, refactoring, debugging, and documentation, with custom slash commands, CLAUDE.md configuration, and a mix of plan mode and direct execution.

Question 11 — Claude Code Configuration & Workflows

You create a /deploy-check slash command that every developer should get automatically when they clone or pull the repository. Where does the command file belong?

A. As a named section inside the root CLAUDE.md file.
B. In each developer's ~/.claude/commands/ directory, synced via the onboarding script.
C. In .claude/config.json under a "commands" array.
D. In .claude/commands/ inside the project repository, so it is version-controlled and shared.

Show answer

Correct answer: D. Project-scoped commands live in .claude/commands/ and travel with the repo. ~/.claude/commands/ is for personal commands, CLAUDE.md holds instructions rather than command definitions, and a config.json commands array does not exist.

Question 12 — Claude Code Configuration & Workflows

A senior engineer's carefully written coding standards apply perfectly in her own sessions, but a new teammate reports Claude ignores them entirely. What is the most likely cause?

A. The standards file exceeds the maximum size and is silently truncated on other machines.
B. CLAUDE.md files require an explicit /reload command in each new session before they apply.
C. The new teammate's Claude Code version predates CLAUDE.md support and must be upgraded.
D. The standards live in her user-level ~/.claude/CLAUDE.md, which is not shared through version control.

Show answer

Correct answer: D. User-level configuration applies only to that user. Instructions the whole team needs belong in project-level configuration (.claude/CLAUDE.md or root CLAUDE.md) committed to the repo. The /memory command is the diagnostic tool that reveals which files are actually loaded.

Question 13 — Claude Code Configuration & Workflows

Test files sit next to the components they test throughout the codebase (Button.test.tsx beside Button.tsx), and all tests must follow one set of conventions regardless of directory. What is the most maintainable configuration?

A. A skill in .claude/skills/ that developers invoke manually before writing tests.
B. A CLAUDE.md file in every directory that contains at least one test file.
C. A .claude/rules/ file with YAML frontmatter paths: ["**/*.test.tsx"] so the conventions load exactly when matching files are edited.
D. A section in the root CLAUDE.md relying on Claude to infer when test conventions apply.

Show answer

Correct answer: C. Glob-scoped rules apply conventions by file pattern regardless of location — exactly right for test files scattered across the tree. Per-directory CLAUDE.md files can't track scattered files maintainably, root-file inference is unreliable, and manual skill invocation contradicts the automatic requirement.

Question 14 — Claude Code Configuration & Workflows

Your team's codebase-analysis skill produces thousands of lines of exploratory output that crowd out the main conversation. Which frontmatter option addresses this?

A. argument-hint — prompting for parameters keeps the skill's scope, and therefore output, smaller.
B. compact: true — automatically compacts the conversation after the skill finishes.
C. allowed-tools: [Read] — restricting tools reduces how much output the skill can produce.
D. context: fork — run the skill in an isolated sub-agent context so its output does not pollute the main session.

Show answer

Correct answer: D. context: fork exists precisely to isolate verbose or exploratory skill output from the main conversation, returning only the useful result. allowed-tools restricts capability (a safety control), argument-hint improves invocation UX, and compact: true is not a real frontmatter option.

Question 15 — Claude Code Configuration & Workflows

You must migrate the codebase from one ORM to another, touching roughly 50 files, with at least two viable migration strategies. How should you begin?

A. Enter plan mode to explore the codebase, compare the strategies, and design the approach before making any changes.
B. Start direct execution on the smallest file to build momentum, switching to plan mode if problems appear.
C. Split the work across three parallel sessions, each migrating a subset of files independently.
D. Write exhaustive upfront instructions specifying every file change, then run direct execution.

Show answer

Correct answer: A. Large-scale changes with multiple valid approaches and architectural implications are exactly what plan mode is for — safe exploration and design before committing. "Switch to plan mode if complexity emerges" ignores that the complexity is already stated; exhaustive upfront instructions assume knowledge you haven't gathered.

Question 16 — Claude Code Configuration & Workflows

Claude applies your conventions in some sessions but not others, and you suspect the wrong memory files are loading. What is the fastest diagnostic?

A. Enable verbose API logging and inspect the raw system prompt in the request payload.
B. Run the /memory command to see exactly which memory files are loaded in the current session.
C. Grep the codebase for every CLAUDE.md and manually diff their contents.
D. Delete all CLAUDE.md files and re-add them one at a time across sessions.

Show answer

Correct answer: B. /memory directly shows which memory files are loaded, making it the first-line diagnostic for inconsistent configuration behavior. The alternatives are slow, destructive, or infrastructure-heavy versions of what /memory does in one step.

Question 17 — Agentic Architecture & Orchestration

You resume yesterday's named session to continue a refactor, but overnight a teammate rewrote two of the modules Claude analyzed. The session's prior tool results now describe code that no longer exists. What is the most reliable approach?

A. Resume the session as-is; Claude will notice discrepancies when edits fail and self-correct.
B. Replay yesterday's entire tool call sequence in a fresh session to rebuild identical context.
C. Resume the session and instruct Claude to distrust all of its earlier conclusions.
D. Start a new session and inject a structured summary of the prior analysis plus the current state of the changed modules.

Show answer

Correct answer: D. When prior tool results are stale, resuming propagates wrong beliefs. Starting fresh with a structured summary preserves the valuable conclusions without the stale evidence. (If only specific files changed and most context were valid, resuming while naming the changed files would also work — but wholesale distrust or full replay wastes the session.)

Question 18 — Agentic Architecture & Orchestration

After a long shared analysis of the codebase, you want to evaluate two incompatible testing strategies — mocking-heavy versus integration-first — from that same baseline. Which session mechanism fits?

A. --resume in one session, alternating between strategies in successive prompts.
B. One session with /compact run between the two strategy discussions.
C. fork_session, creating independent branches that each inherit the shared analysis and diverge from there.
D. Two brand-new sessions, each re-analyzing the codebase from scratch for independence.

Show answer

Correct answer: C. fork_session exists for exactly this: exploring divergent approaches from a shared baseline without re-paying the analysis cost or letting the two explorations contaminate each other. Alternating in one session mixes contexts; fresh sessions discard the baseline.

Question 19 — Context Management & Reliability

During a multi-phase feature build, the discovery phase floods the main conversation with file listings and dead-end exploration, and later phases run short on context. What should you change?

A. Disable file-reading tools during discovery and rely on the model's general knowledge of typical project layouts.
B. Run discovery last so its output cannot interfere with the implementation phases.
C. Raise max_tokens on every request so the conversation can absorb the discovery output.
D. Delegate discovery to the Explore subagent, which isolates verbose output and returns a summary to the main conversation.

Show answer

Correct answer: D. The Explore subagent (and subagent delegation generally) exists to keep verbose discovery out of the main context while preserving its conclusions. max_tokens governs output length, not context capacity; discovery must precede implementation; and guessing at layouts defeats the purpose.

Question 20 — Context Management & Reliability

Three hours into a session, Claude answers questions about your authentication module with generic statements about "typical auth patterns" instead of the specific classes it read earlier. What is happening, and what mitigates it?

A. Context degradation — have the agent maintain a scratchpad file of key findings and reference it for subsequent questions.
B. Model drift — restart Claude Code to load a fresh model checkpoint.
C. Rate limiting — long sessions receive progressively smaller context allocations.
D. Hallucination onset — lower the temperature parameter for the remainder of the session.

Show answer

Correct answer: A. Extended sessions degrade: earlier discoveries effectively fall out of usable context and the model falls back on generic priors. Scratchpad files persist key findings outside the context window so they can be re-referenced. The other options describe mechanisms that don't exist or don't address the cause.

Scenario 3: Multi-Agent Research System

A coordinator agent delegates to specialized subagents — web search, document analysis, synthesis, and report generation — to produce comprehensive, cited research reports.

Question 21 — Agentic Architecture & Orchestration

A report on "renewable energy adoption" covers only solar power, though every subagent completed successfully. The coordinator's logs show it created subtasks for "rooftop solar," "utility-scale solar," and "solar manufacturing." What is the root cause?

A. The coordinator's task decomposition was too narrow, so the subagent assignments never covered wind, hydro, or other domains.
B. The synthesis agent filtered out non-solar findings as off-topic.
C. The document analysis agent's relevance threshold was set too high for non-solar sources.
D. The web search agent's queries were under-specified and returned solar-heavy results.

Show answer

Correct answer: A. The logs show the answer: the coordinator decomposed the broad topic into exclusively solar subtasks. The downstream agents did their assigned jobs correctly — the coverage gap was created at assignment time. Diagnosing decomposition (not downstream agents) when coverage is wrong is a core Domain 1 skill.

Question 22 — Agentic Architecture & Orchestration

The synthesis subagent produces reports that ignore everything the search and analysis agents found, synthesizing from general knowledge instead. The coordinator invokes it with the prompt "Synthesize the research findings." What is wrong?

A. Subagents do not inherit the coordinator's conversation history — the findings must be passed explicitly in the synthesis agent's prompt.
B. The synthesis agent's temperature is too high, causing it to ignore provided context.
C. The coordinator must call the synthesis agent last in a single parallel batch so memory is shared.
D. The synthesis agent needs the Task tool in its allowedTools to pull findings from its sibling agents.

Show answer

Correct answer: A. Subagent context isolation is fundamental: each subagent sees only what its prompt contains. "The research findings" refers to data the synthesis agent has never seen. The fix is passing complete findings — ideally structured with sources and metadata — directly into its prompt.

Question 23 — Agentic Architecture & Orchestration

The coordinator invokes its three research subagents one per turn, tripling wall-clock time. How do you run them in parallel?

A. Set parallel: true in each subagent's AgentDefinition.
B. Reduce each subagent's max_tokens so the sequential calls complete faster.
C. Spawn three coordinator instances, each owning one subagent.
D. Have the coordinator emit multiple Task tool calls within a single response.

Show answer

Correct answer: D. Parallel subagent execution is achieved by emitting multiple Task calls in one coordinator response rather than across separate turns. There is no parallel flag on AgentDefinition, multiple coordinators break the hub-and-spoke pattern, and max_tokens doesn't change sequencing.

Question 24 — Agentic Architecture & Orchestration

Your coordinator prompt scripts each subagent's work step-by-step ("first search X, then open the top three results, then..."). Subagents follow the script even when it clearly doesn't fit the topic. What should the coordinator's delegation prompts specify instead?

A. Nothing — subagents should receive only the raw topic string for maximum flexibility.
B. A formal state machine definition each subagent must execute deterministically.
C. Research goals and quality criteria, leaving subagents free to adapt their approach to what they find.
D. The same steps, but with more branches covering additional topic types.

Show answer

Correct answer: C. Goal-and-criteria prompts preserve subagent adaptability — the reason to use an LLM agent at all. Ever-larger procedural scripts and state machines fight the model's strengths, while a bare topic string discards the coordinator's decomposition and quality standards.

Question 25 — Agentic Architecture & Orchestration

Synthesis output for complex topics keeps arriving with obvious gaps. You want the system to notice and fix this before the report ships. Which pattern applies?

A. Raising the search agent's result count so gaps are statistically less likely on the first pass.
B. An iterative refinement loop: the coordinator evaluates synthesis output for gaps, re-delegates targeted queries to search and analysis, and re-invokes synthesis until coverage is sufficient.
C. A second synthesis agent that rewrites the first agent's report with instructions to be more comprehensive.
D. A human review step that returns gap feedback to the team the following day.

Show answer

Correct answer: B. The tested pattern is coordinator-driven iterative refinement: evaluate, re-delegate with targeted queries, re-synthesize. Rewriting without new information cannot fill evidence gaps, more first-pass results is a blunt instrument, and next-day human review abandons autonomous quality control.

Question 26 — Tool Design & MCP Integration

The synthesis agent needs to verify facts while combining findings. Currently every verification routes back through the coordinator to the search agent, adding round trips; measurement shows 85% are simple lookups (dates, names, statistics) and 15% need real investigation. What is the best design?

A. Give the synthesis agent the full web search toolset so it never needs the coordinator.
B. Have the search agent proactively cache extra context around each source in anticipation of verification needs.
C. Give the synthesis agent a scoped verify_fact tool for the simple lookups, and keep routing complex verifications through the coordinator.
D. Batch all verification requests to the end of the synthesis pass and send them to the coordinator at once.

Show answer

Correct answer: C. A narrowly scoped cross-role tool serves the high-frequency simple case under least privilege, while the coordinator path handles the complex minority. Full tool access invites cross-specialization misuse, batching creates blocking dependencies between facts, and speculative caching can't predict verification needs.

Question 27 — Tool Design & MCP Integration

Two tools — analyze_content ("Analyzes content and returns insights") and analyze_document ("Analyzes documents and returns insights") — get misrouted constantly. One actually processes web search results; the other processes uploaded PDFs. What is the right fix?

A. Lower the model's temperature so tool selection becomes more deterministic.
B. Add a system prompt rule: "use analyze_content for anything from the web."
C. Delete one tool and extend the other to accept both input types.
D. Rename them to reflect purpose (e.g., extract_web_results, analyze_pdf_document) and rewrite each description with inputs, outputs, and when to use it versus the other.

Show answer

Correct answer: D. Functional overlap in names and descriptions is the root cause of misrouting; renaming plus differentiated descriptions eliminates it. Consolidation changes the architecture rather than the labeling problem, keyword rules in the system prompt are fragile, and temperature doesn't resolve genuine ambiguity.

Question 28 — Tool Design & MCP Integration

For convenience, every subagent was configured with all 18 of the system's tools. Tool misuse and misselection are rising. What does the exam's design principle say?

A. Keep all 18 tools but sort them so the most relevant appear first in each agent's list.
B. Consolidate the 18 tools into 3 mega-tools with a mode parameter.
C. Restrict each subagent to the 4-5 tools relevant to its role — large tool sets increase decision complexity and degrade selection reliability.
D. Keep all 18 tools and add a system prompt instruction to use only role-appropriate ones.

Show answer

Correct answer: C. Tool count directly affects selection reliability; agents given tools outside their specialization tend to misuse them. Scoped tool access per role is the principle. Ordering and prompt admonitions leave the decision complexity in place, and mega-tools trade selection errors for parameter errors.

Question 29 — Context Management & Reliability

The web search subagent times out mid-research. Which failure response best enables the coordinator to recover intelligently?

A. An uncaught exception that terminates the whole research workflow for consistency.
B. A generic "search unavailable" status after the subagent exhausts internal retries.
C. Structured error context: the failure type, the attempted query, any partial results, and potential alternative approaches.
D. An empty result set marked successful, so the pipeline continues without interruption.

Show answer

Correct answer: C. The coordinator can only choose between retrying, rephrasing, rerouting, or proceeding with partials if it knows what failed and what was salvaged. Generic statuses hide that information, success-masking silently corrupts the report, and full termination discards recoverable work.

Question 30 — Context Management & Reliability

Two credible sources report different figures for the same statistic — one from a 2024 industry survey, one from a 2026 government dataset. How should the document analysis agent handle this before synthesis?

A. Include both values, each annotated with source attribution and publication date, and let the coordinator decide reconciliation before synthesis.
B. Report the newer figure only, since more recent data supersedes older data.
C. Report the average of the two figures with a footnote noting variance.
D. Discard both figures as unreliable and search for a third source that breaks the tie.

Show answer

Correct answer: A. Conflicting credible statistics are annotated with attribution — never arbitrarily resolved by the analysis agent. Publication dates matter because temporal differences may explain the gap entirely (a trend, not a contradiction). Averaging fabricates a number no source reported.

Scenario 4: Developer Productivity with Claude

You are building developer productivity tooling on the Claude Agent SDK: codebase exploration, legacy system understanding, boilerplate generation, and task automation using built-in tools (Read, Write, Bash, Grep, Glob) plus MCP servers.

Question 31 — Tool Design & MCP Integration

The agent needs to find every place the function calculateDiscount is called across a large repository. Which built-in tool fits?

A. Read — load each file and scan for the function within the agent's context.
B. Grep — it searches file contents for patterns like function names across the codebase.
C. Bash with find — enumerate files modified since the function was introduced.
D. Glob — it matches files whose names contain the string calculateDiscount.

Show answer

Correct answer: B. Grep is the content-search tool — finding callers, error messages, or import statements inside files. Glob matches file paths, not contents; reading every file wastes context; and file modification times say nothing about call sites.

Question 32 — Tool Design & MCP Integration

The agent must operate on all Storybook story files, which follow the naming pattern *.stories.tsx anywhere in the tree. Which tool and pattern apply?

A. Glob with the pattern **/*.stories.tsx to match file paths by name across all directories.
B. Bash with cat piped through a filter for the .tsx extension.
C. Read on the components directory to list its files recursively.
D. Grep for the string "stories" across all file contents.

Show answer

Correct answer: A. Glob is purpose-built for file path pattern matching — finding files by name or extension patterns like **/*.stories.tsx. Grep searches contents (many non-story files mention "stories"), and Read takes files, not directory trees.

Question 33 — Tool Design & MCP Integration

The agent tries to Edit a config file but fails repeatedly because the anchor text it targets appears four times in the file. What is the reliable fallback?

A. Read the full file, then Write the complete modified version back.
B. Retry Edit with progressively longer anchor strings until one happens to be unique.
C. Use Bash with sed to replace all four occurrences at once.
D. Split the file into four smaller files so each anchor becomes unique.

Show answer

Correct answer: A. When Edit cannot find a unique text match, the documented fallback is Read + Write: load the whole file and write back the modified version. Anchor-lengthening retries are fragile, sed replaces occurrences you didn't intend, and restructuring files to satisfy a tool is backwards.

Question 34 — Tool Design & MCP Integration

You integrated a powerful code-intelligence MCP server, but the agent keeps using built-in Grep instead of the server's semantic search tool, which carries the description "Searches code." What should you do?

A. Remove Grep from the agent's allowed tools so the MCP tool is the only search option.
B. Lower the MCP server's response latency so the agent learns to prefer it.
C. Enhance the MCP tool's description to explain in detail its capabilities, inputs, and outputs, so the agent can see why it beats Grep for semantic queries.
D. Add a system prompt rule stating that built-in tools are deprecated for this project.

Show answer

Correct answer: C. Agents choose tools by their descriptions; a three-word description cannot compete with a well-understood built-in. Enriching the description is the root-cause fix. Removing Grep breaks legitimate content searches, blanket deprecation rules are keyword-fragile, and the agent has no latency feedback loop.

Question 35 — Claude Code Configuration & Workflows

Your team needs a shared Jira MCP server (authenticated by a token each developer supplies) plus one engineer wants to trial an experimental profiling server privately. How should these be configured?

A. Both servers in .mcp.json, with the experimental one commented out for everyone else.
B. Both in each developer's ~/.claude.json, distributed through the onboarding wiki.
C. Jira in the project's .mcp.json using ${JIRA_TOKEN} environment variable expansion; the experimental server in that engineer's ~/.claude.json.
D. Jira in .mcp.json with the shared team token committed; the experimental server in a feature branch.

Show answer

Correct answer: C. Project scope (.mcp.json) is for shared team tooling, with env-var expansion keeping credentials out of version control; user scope (~/.claude.json) is for personal/experimental servers. Committing tokens is a security failure, and distributing team config via wiki defeats version control.

Question 36 — Claude Code Configuration & Workflows

The agent burns many turns making exploratory tool calls just to discover what documentation pages and database schemas exist before it can answer anything. Which MCP capability addresses this?

A. MCP prompts — predefined prompt templates that embed the catalog into every request.
B. A get_everything tool that returns the full documentation corpus in one call.
C. Increasing the tool-call budget so exploration completes more often.
D. MCP resources — expose the documentation hierarchy and schema catalog so the agent can see available content without exploratory tool calls.

Show answer

Correct answer: D. Resources are MCP's mechanism for exposing content catalogs — issue summaries, documentation hierarchies, database schemas — giving agents visibility into what exists without spending turns discovering it. Dumping the entire corpus blows the context window; a larger budget just pays the same cost more often.

Question 37 — Claude Code Configuration & Workflows

The team wants Claude Code connected to GitHub for issues and PRs. An engineer proposes writing a custom MCP server for it. What is the sound architectural guidance?

A. Build custom — first-party code is always more maintainable than community dependencies.
B. Use an existing community MCP server for the standard GitHub integration; reserve custom servers for workflows unique to your team.
C. Skip MCP and have the agent call the GitHub REST API through Bash and curl.
D. Wait for an official first-party integration before connecting GitHub at all.

Show answer

Correct answer: B. For standard integrations like GitHub or Jira, community MCP servers already encode the API surface and edge cases; custom effort should target team-specific workflows that nothing off-the-shelf covers. Raw curl through Bash discards MCP's typed tool interface.

Question 38 — Agentic Architecture & Orchestration

The task is open-ended: "add comprehensive tests to this legacy codebase." Which decomposition approach fits?

A. Adaptive decomposition: map the structure first, identify high-impact areas, create a prioritized plan, and revise it as dependencies are discovered.
B. Ask the model to write all tests in one large pass to keep full context in a single generation.
C. A fixed pipeline: generate one test file per source file in alphabetical order until coverage hits the target.
D. Randomly sample 20% of files for testing to bound the scope deterministically.

Show answer

Correct answer: A. Open-ended investigation tasks call for dynamic, adaptive decomposition — the plan should change as the structure and dependencies reveal themselves. Fixed pipelines suit predictable multi-aspect work; alphabetical order and random sampling ignore impact entirely.

Question 39 — Agentic Architecture & Orchestration

The agent must understand how payment processing flows through an unfamiliar 500-file codebase. What exploration strategy does the exam endorse?

A. Read all 500 files upfront so the agent has complete knowledge before answering.
B. Incremental tracing: Grep for entry points (e.g., route handlers, "processPayment"), then Read to follow imports and trace the flow outward.
C. Ask the developer to paste the relevant files, keeping the agent read-only.
D. Read only the README and directory names, inferring the architecture from conventions.

Show answer

Correct answer: B. Incremental understanding — search for entry points, then follow the threads — matches both context limits and how flows actually connect. Reading everything upfront exhausts context on irrelevant files; README-only inference produces the "typical patterns" failure mode.

Question 40 — Agentic Architecture & Orchestration

A multi-agent exploration of a huge monorepo sometimes crashes hours in, losing everything. How do you make the workflow resumable?

A. Log all agent output to a file that a human reviews to manually reconstruct progress.
B. Increase the heartbeat timeout so crashes become less frequent.
C. Rerun the whole exploration from scratch on failure — determinism guarantees identical results.
D. Have each agent export structured state to a known location, and on resume have the coordinator load a manifest and inject the state into agent prompts.

Show answer

Correct answer: D. Structured state persistence with a coordinator-loaded manifest is the crash-recovery pattern: work survives the process. Longer timeouts reduce frequency but not cost; LLM runs are not deterministic; and manual reconstruction doesn't scale.

Scenario 5: Claude Code for Continuous Integration

Claude Code runs inside your CI/CD pipeline for automated code review, test generation, and PR feedback. The goals are actionable findings and a minimal false-positive rate.

Question 41 — Claude Code Configuration & Workflows

Your CI job invokes claude "Review this diff" and hangs forever; logs show Claude Code awaiting interactive input. What fixes it?

A. Use the --batch flag to queue the request for asynchronous processing.
B. Export CLAUDE_HEADLESS=true before invoking the command.
C. Use the -p (or --print) flag: claude -p "Review this diff" — non-interactive mode that prints the result and exits.
D. Pipe /dev/null to stdin so the interactive prompt receives an EOF.

Show answer

Correct answer: C. -p / --print is the documented non-interactive mode for pipelines: process the prompt, print to stdout, exit. CLAUDE_HEADLESS and --batch do not exist — classic invented-feature distractors — and stdin redirection is a workaround that doesn't engage the proper mode.

Question 42 — Claude Code Configuration & Workflows

You want review findings posted automatically as inline PR comments, which requires reliably machine-parseable output with a fixed structure. Which CLI capabilities apply?

A. Piping the text output through a regex extraction script in the pipeline.
B. A system prompt instructing Claude to "always respond in valid JSON."
C. --output-format json together with --json-schema to enforce structured, schema-conformant findings.
D. The --xml flag, since XML is more robust for automated parsing than JSON.

Show answer

Correct answer: C. --output-format json plus --json-schema is the supported mechanism for structured CI output. Prompt-level JSON requests are probabilistic, regex extraction from prose is brittle, and there is no --xml flag.

Question 43 — Claude Code Configuration & Workflows

Nightly generated tests are technically valid but low-value: they duplicate existing coverage and test trivial getters. What is the highest-leverage fix?

A. Restrict generation to files with zero existing coverage.
B. Raise the temperature so generated tests are more diverse.
C. Generate five times as many tests and keep the subset a human likes best.
D. Document testing standards, criteria for valuable tests, and available fixtures in CLAUDE.md, and include existing test files in context so duplicates are avoided.

Show answer

Correct answer: D. CI-invoked Claude Code inherits project context from CLAUDE.md — documenting what makes a test valuable, plus supplying existing tests to prevent duplication, targets both failure modes directly. Temperature adds randomness, over-generation wastes human review, and zero-coverage-only misses weakly covered critical paths.

Question 44 — Prompt Engineering & Structured Output

Developers are muting the automated reviewer: it flags dozens of subjective style preferences alongside occasional real bugs. The prompt already says "only report important, high-confidence issues." What actually improves precision?

A. Route every finding through a second model that votes on whether to keep it.
B. Replace confidence language with specific categorical criteria — define which issue types to report (bugs, security flaws) and which to skip (style preferences, local idioms).
C. Have the reviewer attach a confidence score to each finding and let the pipeline filter below a threshold.
D. Strengthen the wording to "only report issues you are 95% certain about."

Show answer

Correct answer: B. Vague conservatism instructions and self-reported confidence both fail because model confidence is poorly calibrated. Specific categorical criteria — what to report, what to skip — is the tested fix. High false-positive categories can also be temporarily disabled to restore trust.

Question 45 — Prompt Engineering & Structured Output

Severity labels on review findings are inconsistent: near-identical issues are tagged "critical" one day and "minor" the next. What produces consistent classification?

A. An instruction to "think carefully about severity before labeling."
B. Explicit severity criteria with concrete code examples illustrating each severity level.
C. Removing severity levels entirely and reporting a flat list.
D. Sorting findings by severity after generation so mislabels are less visible.

Show answer

Correct answer: B. Consistent classification requires defining each level with concrete examples — the model then has reference points instead of re-deriving judgment each run. Thinking harder without criteria reproduces the inconsistency; hiding or removing labels abandons the requirement.

Question 46 — Prompt Engineering & Structured Output

When a developer pushes new commits, the review re-runs on the whole PR and reposts the same comments, burying new findings. What is the correct re-run design?

A. Review only the newest commit's diff in isolation from the rest of the PR.
B. Include the prior review findings in context and instruct Claude to report only new or still-unaddressed issues.
C. Debounce reviews to run once per day regardless of push frequency.
D. Delete all previous review comments before each re-run so duplicates never coexist.

Show answer

Correct answer: B. Passing prior findings and asking for only new or still-open issues eliminates duplicates while preserving continuity. Deleting history destroys the audit trail, isolated newest-commit review misses interactions with earlier changes, and daily debouncing delays feedback.

Question 47 — Prompt Engineering & Structured Output

You add "review your own work critically before finalizing" to the code-generation prompt, but generated bugs still slip through the same session's self-review. Why, and what works better?

A. The model needs the test suite passing before it can meaningfully self-review.
B. Self-review needs extended thinking enabled to be effective.
C. The instruction must appear at the start of the prompt rather than the end.
D. The generating session retains its own reasoning and won't question its decisions — use a second, independent Claude instance without that context to review.

Show answer

Correct answer: D. A model reviewing in the same session inherits the reasoning that produced the bug, so it confirms rather than questions. Independent review instances, lacking that context, catch subtle issues that self-review instructions and extended thinking miss.

Question 48 — Prompt Engineering & Structured Output

A 14-file PR reviewed in one pass yields uneven depth, missed obvious bugs, and contradictory verdicts on identical code in different files. What restructuring fixes this?

A. A model with a larger context window so all 14 files receive full attention.
B. Three full-PR passes, keeping only findings that appear at least twice.
C. Per-file passes for local issues plus a separate integration pass for cross-file data flow.
D. A repository rule capping PRs at 4 files before automated review runs.

Show answer

Correct answer: C. The failure is attention dilution, which context window size does not fix. Focused per-file passes ensure uniform depth, and the integration pass covers cross-file concerns. Consensus voting suppresses intermittently caught real bugs, and PR-size mandates shift the burden to developers.

Question 49 — Prompt Engineering & Structured Output

Finance asks you to cut analysis API costs. Two workflows exist: a pre-merge check developers wait on, and a nightly technical-debt report read the next morning. Where does the Message Batches API (50% savings) fit?

A. Both workflows, with a fallback that re-sends the pre-merge check synchronously if the batch is slow.
B. The nightly report only — batch has up to 24-hour processing with no latency SLA, which is fine overnight but unacceptable for a blocking check.
C. Both workflows — batches usually complete well within an hour in practice.
D. Neither workflow — batch results cannot be reliably matched back to their requests.

Show answer

Correct answer: B. Batch processing suits latency-tolerant, non-blocking work; "usually fast" is not a basis for blocking developer merges. Results correlate cleanly via custom_id, so option C's premise is false, and the fallback design adds complexity where simply matching API to workflow suffices.

Question 50 — Agentic Architecture & Orchestration

Your review workflow always has the same aspects — correctness, security, performance, and test coverage — and single-prompt reviews shortchange whichever aspect comes last. Which decomposition pattern applies?

A. Dynamic adaptive decomposition driven by what each pass discovers.
B. A single pass with the four aspects reordered randomly each run to average out the neglect.
C. Four parallel full reviews with results concatenated without deduplication.
D. Prompt chaining: a fixed sequential pipeline with one focused pass per aspect.

Show answer

Correct answer: D. A predictable multi-aspect workflow with known structure is the textbook case for prompt chaining — fixed, focused sequential passes. Adaptive decomposition is for open-ended investigation; randomized ordering institutionalizes the neglect; and concatenated parallel reviews produce contradictions.

Scenario 6: Structured Data Extraction

You are building an extraction system that pulls structured information from unstructured documents, validates output against JSON schemas, handles edge cases gracefully, and feeds downstream systems.

Question 51 — Prompt Engineering & Structured Output

Downstream parsing keeps failing on malformed JSON: trailing commas, unquoted keys, markdown fences around the payload. What is the most reliable way to guarantee schema-compliant output?

A. Lower the temperature to zero so the model formats output more carefully.
B. Add "return only valid JSON, no markdown" prominently in the system prompt.
C. Post-process the text output with a lenient JSON repair library.
D. Define an extraction tool whose input schema is your output schema and read the structured data from the tool_use block.

Show answer

Correct answer: D. Tool use with JSON schemas is the guaranteed path to syntactically valid, schema-shaped output — it eliminates the malformed-JSON class entirely. Prompt instructions and low temperature reduce but don't eliminate errors, and repair libraries guess at intent.

Question 52 — Prompt Engineering & Structured Output

Your invoice schema marks vendor_tax_id as required, and for documents that lack a tax ID the model fills the field with plausible-looking fabricated values. What is the schema-level fix?

A. Add a prompt instruction: "never invent tax IDs."
B. Fine-tune the model on invoices that lack tax IDs.
C. Post-validate tax IDs against a checksum and discard failures.
D. Make the field optional/nullable so the model can return null when the information is absent from the document.

Show answer

Correct answer: D. A required field on absent information forces the model to fabricate a value to satisfy the schema — the design creates the hallucination. Nullable fields remove the pressure. Prompt admonitions fight the schema instead of fixing it, and checksum filtering catches only detectably invalid fabrications.

Question 53 — Prompt Engineering & Structured Output

You have separate extraction tools for invoices, receipts, and purchase orders, and incoming documents arrive with unknown type. You need guaranteed structured output — never a conversational text reply. Which tool_choice setting applies?

A. "any" — the model must call some tool, and it selects the schema matching the document it sees.
B. "none", relying on the system prompt to mandate tool usage.
C. Forced selection of the invoice tool, since invoices are the most common type.
D. "auto" — the model naturally prefers tools when they are relevant.

Show answer

Correct answer: A. tool_choice: "any" guarantees a tool call while leaving schema selection to the model — exactly right for unknown document types with multiple schemas. "auto" permits text replies, forcing the invoice tool mis-extracts receipts and POs, and "none" prevents tool calls entirely.

Question 54 — Prompt Engineering & Structured Output

An extraction fails Pydantic validation: dates in the wrong format and a missing currency code. How should the retry request be constructed?

A. Include the original document, the failed extraction, and the specific validation errors so the model can self-correct.
B. Send only the validation errors to minimize tokens, since the model remembers the document.
C. Switch to a different model family for the retry to get an independent attempt.
D. Resend the identical request — validation failures are usually transient sampling noise.

Show answer

Correct answer: A. Retry-with-error-feedback works because the model sees what it produced and precisely what was wrong. The API is stateless — the model remembers nothing between requests, so the document must be re-sent. Identical resends reproduce structural errors.

Question 55 — Prompt Engineering & Structured Output

One recurring validation failure is a contract_effective_date field that is consistently empty — and inspection shows the date genuinely appears only in a separate amendment document you never provide. Will retry-with-error-feedback fix this?

A. No — date fields categorically require regex post-processing rather than model extraction.
B. No — retries cannot recover information absent from the source; the fix is supplying the amendment document (or making the field nullable).
C. Yes — after enough retries the model will locate the date it previously overlooked.
D. Yes — with the validation error highlighted, the model will infer the date from context clues.

Show answer

Correct answer: B. Retry fixes format and structural errors, not missing source information. When the data isn't in the provided document, every retry fails identically (or worse, fabricates). The distinction between retryable format errors and unretryable absent-information errors is directly tested.

Question 56 — Prompt Engineering & Structured Output

Extractions pass schema validation, yet invoices regularly arrive downstream where line items don't sum to the stated total. What does this illustrate, and what is the design response?

A. Strict schemas eliminate syntax errors but not semantic errors — add self-check fields such as calculated_total alongside stated_total with a conflict_detected boolean.
B. Tool use is the wrong mechanism for numeric data — extract totals with regex instead.
C. The schema is under-constrained — adding stricter type annotations to the amount fields will catch the mismatch.
D. The model is too small for arithmetic — route all documents to a larger model.

Show answer

Correct answer: A. Schema conformance guarantees shape, not truth: values can be well-typed and mutually inconsistent. Designing semantic validation into the schema (calculated vs stated totals, conflict flags) surfaces discrepancies for routing. Stricter types can't express cross-field arithmetic.

Question 57 — Tool Design & MCP Integration

Your extraction MCP tool returns {"isError": true, "message": "failed"} for every problem — oversized documents, malformed PDFs, permission-restricted files alike. The agent responds identically to all three. What is the fix?

A. Return structured error metadata — errorCategory (validation/permission/transient), isRetryable, and a description — so the agent can choose chunking, skipping, or retrying appropriately.
B. Map every error to a numeric code from an internal wiki page the agent can look up.
C. Have the tool retry internally until the document processes, hiding errors from the agent.
D. Log errors server-side and always return success so the pipeline never stalls.

Show answer

Correct answer: A. Uniform error responses prevent differentiated recovery: an oversized document needs chunking, a permission error needs skipping/escalation, a timeout deserves retry. Structured categories and retryability flags enable those decisions. Silent success-masking is the canonical anti-pattern.

Question 58 — Context Management & Reliability

Your dashboard shows 97% overall extraction accuracy, so leadership wants to remove human review entirely. What must you verify first?

A. That the 97% holds when re-measured on a different random seed.
B. That the model version is pinned so accuracy cannot drift after review is removed.
C. That competitors operate at comparable automation levels for the same document classes.
D. Accuracy segmented by document type and by field — aggregate metrics can mask poor performance on specific segments.

Show answer

Correct answer: D. Aggregate accuracy hides segment failures — 97% overall can coexist with much worse results on one document type or one field. Segment-level validation (plus stratified sampling of high-confidence extractions on an ongoing basis) is the prerequisite for reducing review.

Question 59 — Context Management & Reliability

You have capacity to human-review only 10% of extractions. How should that capacity be allocated?

A. Review the 10% of documents with the highest dollar amounts.
B. Review the first 10% of each day's volume to catch problems early in the batch.
C. Review a uniform random 10% so every document type has equal representation.
D. Route extractions with low calibrated field-level confidence or ambiguous/contradictory source documents to review, calibrating thresholds against a labeled validation set.

Show answer

Correct answer: D. Confidence-routed review spends scarce reviewer attention where errors concentrate — but only after calibrating the model's confidence scores against labeled data, since raw confidence is unreliable. Uniform sampling has a place for monitoring, not as the primary allocation.

Question 60 — Agentic Architecture & Orchestration

Regulation requires that extracted records pass a validate_extraction step before store_record ever runs. The prompt instructs this ordering, but logs show occasional direct stores. What is the correct enforcement?

A. A nightly reconciliation job that deletes stored records lacking validation stamps.
B. A programmatic prerequisite that blocks store_record until validate_extraction has succeeded for that record.
C. Few-shot transcripts demonstrating validation always preceding storage.
D. A louder prompt: capitalize the ordering rule and repeat it in three places.

Show answer

Correct answer: B. Required orderings with compliance consequences need deterministic gates, not probabilistic prompt compliance — the same principle as identity-before-refund. Prompt emphasis and examples lower the violation rate without eliminating it; after-the-fact deletion means the violation still occurred.

How Did You Score?

Correct	Approx. scaled score	Verdict
54–60	910–1000	Exam-ready — book it
48–53	820–895	Ready with a safety margin
42–47	730–805	Borderline — shore up weak domains
Below 42	Under 720	Not yet — revisit the course lessons

Weak on a specific domain? Each maps to a course lesson: Agentic Loops, Multi-Agent Orchestration, Tool Design & MCP, Claude Code Workflows, Structured Output, and Context & Reliability.

This is unofficial practice material created for exam preparation and is not affiliated with or endorsed by Anthropic.