Claude Extended Thinking: How to Unlock Deep Reasoning

What makes a skilled human expert impressive is not just knowing the answer; it is visibly thinking through a hard problem before committing to a conclusion. They consider alternatives, check their assumptions, notice contradictions, and revise their reasoning before speaking.
Claude's extended thinking mode is designed to replicate exactly this behaviour. Instead of producing an immediate response, Claude works through a problem step by step in a dedicated thinking phase before generating its final answer. The thinking is transparent — you can see the full reasoning trace if you want to.
For certain categories of problems, extended thinking produces dramatically better results than standard Claude responses. For others, it is unnecessary overhead. This guide explains exactly what extended thinking is, how to use it, and when it is worth the additional cost.
What is Extended Thinking?
Extended thinking is a feature that gives Claude a dedicated scratchpad phase before it produces its final response. During this phase, Claude reasons through the problem — exploring different approaches, testing hypotheses, identifying errors in its own reasoning, and building towards a conclusion.
The thinking phase is visible in the API response as a series of thinking content blocks. These blocks contain Claude's internal reasoning and are not shown to end users by default — they are there for developers who want to audit or understand how Claude reached its conclusion.
The final response — the text block — is what you would show to your users. It is typically more accurate, better structured, and more thoroughly reasoned than a response produced without extended thinking.
Extended Thinking is Not Just Chain-of-Thought
Extended thinking is different from asking Claude to 'think step by step' in your prompt. Prompt-based chain-of-thought adds reasoning to the output text. Extended thinking uses a genuinely separate computational phase that runs before the response is generated — the thinking tokens are processed differently from output tokens and unlock qualitatively different reasoning capability.
Adaptive Thinking vs Extended Thinking
Anthropic offers two modes of enhanced reasoning for the Claude 4.6 models:
Adaptive Thinking (Recommended for Opus 4.6)
Adaptive thinking lets Claude dynamically decide how much thinking to apply based on the difficulty of the problem. Simple questions get quick answers. Complex problems get deep reasoning. You control the overall thinking budget with the effort parameter rather than micromanaging thinking tokens.
- effort: "low": Minimal thinking — fast, token-efficient, suitable for straightforward tasks
- effort: "medium": Moderate thinking — good balance for most professional tasks
- effort: "high": Maximum thinking — reserved for the most difficult reasoning tasks
Adaptive thinking is the recommended default for Claude Opus 4.6. It automatically allocates thinking budget where it actually helps rather than applying fixed thinking to every query regardless of complexity.
Budget-Constrained Extended Thinking
Extended thinking for Claude Sonnet 4.6 uses a budget_tokens parameter to set a maximum number of thinking tokens. Claude will spend up to (but not more than) that token budget on reasoning before producing its response.
Enabling Extended Thinking via the API
Python — Adaptive Thinking (Opus 4.6)
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,
    thinking={
        "type": "adaptive",
        "effort": "high"
    },
    messages=[{
        "role": "user",
        "content": "What are the key trade-offs in choosing between a microservices architecture and a monolithic architecture for a fintech startup with 3 developers and 1 year of runway?"
    }]
)

# Print thinking blocks and the final response separately
for block in response.content:
    if block.type == "thinking":
        print("=== THINKING ===")
        print(block.thinking)
        print("=== END THINKING ===")
    elif block.type == "text":
        print("=== RESPONSE ===")
        print(block.text)

Python — Budget Thinking (Sonnet 4.6)
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=[{
        "role": "user",
        "content": "Review this Python function for correctness, edge cases, and potential improvements."
    }]
)

max_tokens Must Exceed Budget Tokens
The max_tokens parameter covers the total of thinking tokens plus output tokens, and the API rejects requests where budget_tokens is not below max_tokens. If you set budget_tokens to 10,000, max_tokens must be at least 10,000 plus the expected output length. Setting max_tokens too low truncates the response before Claude can finish its thinking or its answer. A safe starting point is max_tokens of 16,000 for moderate tasks.
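As a rule of thumb, you can derive a safe max_tokens from the thinking budget and the output length you expect. The helper below is a sketch: the function name and the 20% safety margin are our own choices, not part of the SDK.

```python
def safe_max_tokens(budget_tokens: int, expected_output_tokens: int) -> int:
    """Pick a max_tokens value that covers thinking plus output.

    max_tokens must cover both the thinking budget and the final
    response, so we add the budget to the expected output length,
    padding the output estimate by 20% in case the answer runs long.
    """
    margin = 1.2  # headroom on the output estimate (our own choice)
    return budget_tokens + int(expected_output_tokens * margin)

# With a 10,000-token thinking budget and ~5,000 tokens of expected output:
print(safe_max_tokens(10000, 5000))  # → 16000
```

Pass the result straight to messages.create as max_tokens alongside your budget_tokens setting.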
Understanding the Thinking Content Blocks
When extended thinking is enabled, the response content array includes thinking blocks alongside text blocks.
# Example response.content structure:
[
    {
        "type": "thinking",
        "thinking": "Let me think about this carefully. The user is asking about microservices vs monolith for a fintech startup. Key considerations: team size (3 devs is very small), runway (1 year is tight), fintech domain (regulatory complexity, security requirements)..."
    },
    {
        "type": "text",
        "text": "For a 3-developer fintech startup with 1 year of runway, a well-structured monolith is almost certainly the right choice. Here is why..."
    }
]

The thinking block shows Claude's raw reasoning — it may be exploratory, sometimes contradictory, and often quite long. The text block is the polished, final response.
Should You Show Thinking to Users?
In most consumer-facing applications, you would not show the thinking block to users — only the text block. However, there are valuable use cases for showing or using the thinking output:
- Debugging and auditing: Developers can review thinking blocks to understand why Claude reached a particular conclusion and identify reasoning errors
- Trust-building in regulated industries: In healthcare or legal applications, showing the reasoning trace to qualified professionals lets them verify the logic before acting on the conclusion
- Educational tools: Showing how Claude works through a problem step by step can help students learn problem-solving approaches
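In code, this usually means splitting the content array once and routing the pieces to different destinations. A minimal sketch, using plain dicts shaped like the JSON structure shown earlier (the real SDK returns block objects with .type attributes, so adapt the access accordingly):

```python
def split_content(content):
    """Separate thinking blocks (for logs/audits) from text blocks (for users)."""
    thinking = [b["thinking"] for b in content if b["type"] == "thinking"]
    text = [b["text"] for b in content if b["type"] == "text"]
    return thinking, text

# Dict stand-in for response.content, mirroring the structure above
content = [
    {"type": "thinking", "thinking": "Weighing monolith vs microservices..."},
    {"type": "text", "text": "A well-structured monolith is the right choice."},
]

reasoning, answer = split_content(content)
# Show only the answer to end users; keep the reasoning for your audit log.
print("".join(answer))  # → A well-structured monolith is the right choice.
```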
When Does Extended Thinking Make a Real Difference?
Extended thinking is not universally better — it is specifically better for problems where reasoning quality is the limiting factor.
Tasks Where Extended Thinking Excels
- Complex multi-step mathematics: Proofs, derivations, and calculations with many interdependent steps where an error early propagates to an incorrect conclusion
- Code debugging: Identifying subtle bugs in complex codebases, especially logic errors where the cause is non-obvious
- Strategic analysis: Business strategy, architectural decisions, trade-off analysis where multiple competing factors must be weighed systematically
- Legal and compliance review: Identifying regulatory risks in complex documents where thorough, methodical reading is required
- Scientific reasoning: Interpreting experimental data, evaluating research methodology, drawing conclusions from evidence
- Competitive programming challenges: Algorithm design problems that require exploring multiple approaches before finding the correct solution
Tasks Where Extended Thinking Adds Little Value
- Simple factual questions with well-established answers
- Creative writing, brainstorming, and open-ended generation tasks
- Classification, tagging, and extraction tasks where the patterns are clear
- Summarisation of straightforward documents
- Any task where speed is critical and accuracy on complex reasoning is not
Practical Example: Code Architecture Review
Here is a concrete example where extended thinking produces a qualitatively better result than standard mode.
Task: Review a proposed database schema for a multi-tenant SaaS application and identify design issues.
Without extended thinking, Claude might identify two or three obvious issues — missing indexes, nullable foreign keys. With extended thinking on high effort, Claude's reasoning trace reveals it worked through:
- Tenant isolation approach — are all tables filtered by tenant_id? Are there any tables that lack the tenant_id column that should have it?
- Data leakage risks — are there any query patterns that could allow cross-tenant data access if a developer forgets to add the tenant filter?
- Scalability implications — with millions of rows per tenant, what happens to queries that do full table scans?
- Migration complexity — if the schema needs to change in future, how difficult would it be given this current design?
The final response identifies issues the standard model missed — including the subtle data isolation vulnerability — because the thinking phase allowed thorough, systematic analysis.
Pricing for Extended Thinking
Thinking tokens are billed as output tokens; rates vary by model and are detailed on the Anthropic pricing page. As a rough guide:
- Every thinking token adds directly to your output token bill, so a long reasoning trace can cost as much as a long response
- With effort: "high", a single complex query can generate thousands of thinking tokens, adding meaningful cost
- Use extended thinking selectively — on the tasks where it genuinely improves quality, not on every request
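To see what a given request actually consumed, read the usage counts on the response. The sketch below assumes thinking tokens are included in the output token count; the prices and the helper name are placeholders of our own, so substitute current figures from the Anthropic pricing page.

```python
from types import SimpleNamespace

def estimate_cost_usd(usage, input_per_mtok: float, output_per_mtok: float) -> float:
    """Rough cost estimate from a response's usage counts.

    usage.output_tokens includes thinking tokens, so a high-effort
    request shows up directly as a larger output token count.
    Prices are passed per million tokens.
    """
    return (usage.input_tokens * input_per_mtok
            + usage.output_tokens * output_per_mtok) / 1_000_000

# In real code this would be response.usage; placeholder prices below.
usage = SimpleNamespace(input_tokens=500, output_tokens=12000)
print(round(estimate_cost_usd(usage, 15.0, 75.0), 4))  # → 0.9075
```

Logging this per request makes it easy to spot which queries are driving thinking-token spend.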
Start with effort: medium
When first experimenting with adaptive thinking, start with effort: medium. In many cases, medium effort produces results close to high effort at lower cost. Run the same test query at medium and high effort and compare — if the quality difference is not meaningful for your use case, medium is the right default.
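One way to run that comparison is to keep the request identical and vary only the effort field. The request builder below is our own helper, not part of the SDK; the commented-out loop shows how you might use it with a live client and an API key.

```python
def adaptive_request(model: str, prompt: str, effort: str,
                     max_tokens: int = 16000) -> dict:
    """Build kwargs for messages.create with adaptive thinking at a given effort."""
    assert effort in ("low", "medium", "high"), f"unknown effort: {effort}"
    return {
        "model": model,
        "max_tokens": max_tokens,
        "thinking": {"type": "adaptive", "effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }

# With a live client (requires an API key):
# client = anthropic.Anthropic()
# for effort in ("medium", "high"):
#     resp = client.messages.create(**adaptive_request("claude-opus-4-6", PROMPT, effort))
#     # ...compare the text blocks and token usage at each effort level
```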
Summary
Extended thinking is one of Claude's most distinctive capabilities. It transforms Claude from a model that generates responses quickly into one that reasons carefully before committing to an answer. For complex technical, analytical, and strategic tasks, this difference is substantial.
The key decisions:
- Use adaptive thinking with the effort parameter for Claude Opus 4.6
- Use budget tokens for Sonnet 4.6 when you need explicit control over thinking cost
- Always check that max_tokens exceeds budget_tokens plus expected output
- Use extended thinking selectively for tasks where reasoning quality is the constraint
- Review thinking blocks during development and auditing to verify Claude's reasoning
In our next post, we look at a capability that is critical for building production AI applications — getting Claude to produce structured, machine-readable output every time: Structured Outputs with Claude: Getting JSON Every Time.
This post is part of the Anthropic AI Tutorial Series. Don't forget to check out our previous post: Advanced Prompting Techniques: Chain-of-Thought, Role Prompts, and Few-Shot.
