Claude Extended Thinking: How to Unlock Deep Reasoning

What makes a skilled human expert impressive is not just knowing the answer; it is visibly thinking through a hard problem before committing to a conclusion. They consider alternatives, check their assumptions, notice contradictions, and revise their reasoning before speaking.
Claude's extended thinking mode is designed to replicate exactly this behaviour. Instead of producing an immediate response, Claude works through a problem step by step in a dedicated thinking phase before generating its final answer. The thinking is transparent — you can see the full reasoning trace if you want to.
For certain categories of problems, extended thinking produces dramatically better results than standard Claude responses. For others, it is unnecessary overhead. This guide explains exactly what extended thinking is, how to use it, and when it is worth the additional cost.
What is Extended Thinking?
Extended thinking is a feature that gives Claude a dedicated scratchpad phase before it produces its final response. During this phase, Claude reasons through the problem — exploring different approaches, testing hypotheses, identifying errors in its own reasoning, and building towards a conclusion.
The thinking phase is visible in the API response as a series of thinking content blocks. These blocks contain Claude's internal reasoning and are not shown to end users by default — they are there for developers who want to audit or understand how Claude reached its conclusion.
The final response — the text block — is what you would show to your users. It is typically more accurate, better structured, and more thoroughly reasoned than a response produced without extended thinking.
Extended Thinking is Not Just Chain-of-Thought
Extended thinking is different from asking Claude to 'think step by step' in your prompt. Prompt-based chain-of-thought adds reasoning to the output text. Extended thinking uses a genuinely separate computational phase that runs before the response is generated — the thinking tokens are processed differently from output tokens and unlock qualitatively different reasoning capability.
Adaptive Thinking vs Extended Thinking
Anthropic offers two modes of enhanced reasoning for the Claude 4.6 models:
Adaptive Thinking (Recommended for Opus 4.6)
Adaptive thinking lets Claude dynamically decide how much thinking to apply based on the difficulty of the problem. Simple questions get quick answers. Complex problems get deep reasoning. You control the overall thinking budget with the effort parameter rather than micromanaging thinking tokens.
- effort: "low": Minimal thinking — fast, token-efficient, suitable for straightforward tasks
- effort: "medium": Moderate thinking — good balance for most professional tasks
- effort: "high": Maximum thinking — reserved for the most difficult reasoning tasks
Adaptive thinking is the recommended default for Claude Opus 4.6. It automatically allocates thinking budget where it actually helps rather than applying fixed thinking to every query regardless of complexity.
Budget-Constrained Extended Thinking
Extended thinking for Claude Sonnet 4.6 uses a budget_tokens parameter to set a maximum number of thinking tokens. Claude will spend up to (but not more than) that token budget on reasoning before producing its response.
Enabling Extended Thinking via the API
Python — Adaptive Thinking (Opus 4.6)
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,
    thinking={
        "type": "adaptive",
        "effort": "high"
    },
    messages=[{
        "role": "user",
        "content": "What are the key trade-offs in choosing between a microservices architecture and a monolithic architecture for a fintech startup with 3 developers and 1 year of runway?"
    }]
)

# Print thinking blocks and the final response separately
for block in response.content:
    if block.type == "thinking":
        print("=== THINKING ===")
        print(block.thinking)
        print("=== END THINKING ===")
    elif block.type == "text":
        print("=== RESPONSE ===")
        print(block.text)

Python — Budget Thinking (Sonnet 4.6)
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=[{
        "role": "user",
        "content": "Review this Python function for correctness, edge cases, and potential improvements."
    }]
)

max_tokens Must Exceed Budget Tokens
The max_tokens parameter covers the total of thinking tokens plus output tokens, and the API rejects requests where budget_tokens is not below max_tokens. If you set budget_tokens to 10,000, max_tokens must be at least 10,000 plus the expected output length. Setting max_tokens too low truncates the response before Claude can finish its thinking or its answer. A safe starting point is max_tokens of 16,000 for moderate tasks.
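As a rule of thumb, you can derive a safe max_tokens from the thinking budget and the output length you expect. The helper below is a sketch: the function name and the 20% safety margin are our own choices, not part of the SDK.

```python
def safe_max_tokens(budget_tokens: int, expected_output_tokens: int) -> int:
    """Pick a max_tokens value that covers thinking plus output.

    max_tokens must cover both the thinking budget and the final
    response, so we add the budget to the expected output length,
    padding the output estimate by 20% in case the answer runs long.
    """
    margin = 1.2  # headroom on the output estimate (our own choice)
    return budget_tokens + int(expected_output_tokens * margin)

# With a 10,000-token thinking budget and ~5,000 tokens of expected output:
print(safe_max_tokens(10000, 5000))  # → 16000
```

Pass the result straight to messages.create as max_tokens alongside your budget_tokens setting.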
Understanding the Thinking Content Blocks
When extended thinking is enabled, the response content array includes thinking blocks alongside text blocks.
# Example response.content structure:
[
    {
        "type": "thinking",
        "thinking": "Let me think about this carefully. The user is asking about microservices vs monolith for a fintech startup. Key considerations: team size (3 devs is very small), runway (1 year is tight), fintech domain (regulatory complexity, security requirements)..."
    },
    {
        "type": "text",
        "text": "For a 3-developer fintech startup with 1 year of runway, a well-structured monolith is almost certainly the right choice. Here is why..."
    }
]

The thinking block shows Claude's raw reasoning — it may be exploratory, sometimes contradictory, and often quite long. The text block is the polished, final response.
Should You Show Thinking to Users?
In most consumer-facing applications, you would not show the thinking block to users — only the text block. However, there are valuable use cases for showing or using the thinking output:
- Debugging and auditing: Developers can review thinking blocks to understand why Claude reached a particular conclusion and identify reasoning errors
- Trust-building in regulated industries: In healthcare or legal applications, showing the reasoning trace to qualified professionals lets them verify the logic before acting on the conclusion
- Educational tools: Showing how Claude works through a problem step by step can help students learn problem-solving approaches
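In code, this usually means splitting the content array once and routing the pieces to different destinations. A minimal sketch, using plain dicts shaped like the JSON structure shown earlier (the real SDK returns block objects with .type attributes, so adapt the access accordingly):

```python
def split_content(content):
    """Separate thinking blocks (for logs/audits) from text blocks (for users)."""
    thinking = [b["thinking"] for b in content if b["type"] == "thinking"]
    text = [b["text"] for b in content if b["type"] == "text"]
    return thinking, text

# Dict stand-in for response.content, mirroring the structure above
content = [
    {"type": "thinking", "thinking": "Weighing monolith vs microservices..."},
    {"type": "text", "text": "A well-structured monolith is the right choice."},
]

reasoning, answer = split_content(content)
# Show only the answer to end users; keep the reasoning for your audit log.
print("".join(answer))  # → A well-structured monolith is the right choice.
```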
When Does Extended Thinking Make a Real Difference?
Extended thinking is not universally better — it is specifically better for problems where reasoning quality is the limiting factor.
Tasks Where Extended Thinking Excels
- Complex multi-step mathematics: Proofs, derivations, and calculations with many interdependent steps where an error early propagates to an incorrect conclusion
- Code debugging: Identifying subtle bugs in complex codebases, especially logic errors where the cause is non-obvious
- Strategic analysis: Business strategy, architectural decisions, trade-off analysis where multiple competing factors must be weighed systematically
- Legal and compliance review: Identifying regulatory risks in complex documents where thorough, methodical reading is required
- Scientific reasoning: Interpreting experimental data, evaluating research methodology, drawing conclusions from evidence
- Competitive programming challenges: Algorithm design problems that require exploring multiple approaches before finding the correct solution
Tasks Where Extended Thinking Adds Little Value
- Simple factual questions with well-established answers
- Creative writing, brainstorming, and open-ended generation tasks
- Classification, tagging, and extraction tasks where the patterns are clear
- Summarisation of straightforward documents
- Any task where speed is critical and accuracy on complex reasoning is not
Practical Example: Code Architecture Review
Here is a concrete example where extended thinking produces a qualitatively better result than standard mode.
Task: Review a proposed database schema for a multi-tenant SaaS application and identify design issues.
Without extended thinking, Claude might identify two or three obvious issues — missing indexes, nullable foreign keys. With extended thinking on high effort, Claude's reasoning trace reveals it worked through:
- Tenant isolation approach — are all tables filtered by tenant_id? Are there any tables that lack the tenant_id column that should have it?
- Data leakage risks — are there any query patterns that could allow cross-tenant data access if a developer forgets to add the tenant filter?
- Scalability implications — with millions of rows per tenant, what happens to queries that do full table scans?
- Migration complexity — if the schema needs to change in future, how difficult would it be given this current design?
The final response identifies issues the standard model missed — including the subtle data isolation vulnerability — because the thinking phase allowed thorough, systematic analysis.
Pricing for Extended Thinking
Thinking tokens are billed as output tokens; rates vary by model and are detailed on the Anthropic pricing page. As a rough guide:
- Every thinking token adds directly to your output token bill, so a long reasoning trace can cost as much as a long response
- With effort: "high", a single complex query can generate thousands of thinking tokens, adding meaningful cost
- Use extended thinking selectively — on the tasks where it genuinely improves quality, not on every request
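To see what a given request actually consumed, read the usage counts on the response. The sketch below assumes thinking tokens are included in the output token count; the prices and the helper name are placeholders of our own, so substitute current figures from the Anthropic pricing page.

```python
from types import SimpleNamespace

def estimate_cost_usd(usage, input_per_mtok: float, output_per_mtok: float) -> float:
    """Rough cost estimate from a response's usage counts.

    usage.output_tokens includes thinking tokens, so a high-effort
    request shows up directly as a larger output token count.
    Prices are passed per million tokens.
    """
    return (usage.input_tokens * input_per_mtok
            + usage.output_tokens * output_per_mtok) / 1_000_000

# In real code this would be response.usage; placeholder prices below.
usage = SimpleNamespace(input_tokens=500, output_tokens=12000)
print(round(estimate_cost_usd(usage, 15.0, 75.0), 4))  # → 0.9075
```

Logging this per request makes it easy to spot which queries are driving thinking-token spend.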
Start with effort: medium
When first experimenting with adaptive thinking, start with effort: medium. In many cases, medium effort produces results close to high effort at lower cost. Run the same test query at medium and high effort and compare — if the quality difference is not meaningful for your use case, medium is the right default.
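One way to run that comparison is to keep the request identical and vary only the effort field. The request builder below is our own helper, not part of the SDK; the commented-out loop shows how you might use it with a live client and an API key.

```python
def adaptive_request(model: str, prompt: str, effort: str,
                     max_tokens: int = 16000) -> dict:
    """Build kwargs for messages.create with adaptive thinking at a given effort."""
    assert effort in ("low", "medium", "high"), f"unknown effort: {effort}"
    return {
        "model": model,
        "max_tokens": max_tokens,
        "thinking": {"type": "adaptive", "effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }

# With a live client (requires an API key):
# client = anthropic.Anthropic()
# for effort in ("medium", "high"):
#     resp = client.messages.create(**adaptive_request("claude-opus-4-6", PROMPT, effort))
#     # ...compare the text blocks and token usage at each effort level
```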
Summary
Extended thinking is one of Claude's most distinctive capabilities. It transforms Claude from a model that generates responses quickly into one that reasons carefully before committing to an answer. For complex technical, analytical, and strategic tasks, this difference is substantial.
The key decisions:
- Use adaptive thinking with the effort parameter for Claude Opus 4.6
- Use budget tokens for Sonnet 4.6 when you need explicit control over thinking cost
- Always check that max_tokens exceeds budget_tokens plus expected output
- Use extended thinking selectively for tasks where reasoning quality is the constraint
- Review thinking blocks during development and auditing to verify Claude's reasoning
In our next post, we look at a capability that is critical for building production AI applications — getting Claude to produce structured, machine-readable output every time: Structured Outputs with Claude: Getting JSON Every Time.
This post is part of the Anthropic AI Tutorial Series. Don't forget to check out our previous post: Advanced Prompting Techniques: Chain-of-Thought, Role Prompts, and Few-Shot.
