
Claude API Pricing Explained: Tokens, Cost Tiers, and Batch Savings

TopicTrick

One of the most common questions from developers new to the Claude API is: "How much is this actually going to cost me?" It is a practical, important question — and the answer is more nuanced than a simple price list suggests.

Claude's pricing model is usage-based. You pay for the tokens you consume, not for a monthly seat licence. That is good news for development and experimentation, where your usage is low. It also means you need to understand the pricing model properly before you scale, because the wrong design decisions can make your costs significantly higher than they need to be.

This guide explains exactly how Claude pricing works, what each model costs, and — most importantly — how features like prompt caching and the Batch API can dramatically reduce your real-world bill.


What is a Token?

Everything in Claude's pricing is measured in tokens. A token is the basic unit of text that the model processes — neither a character nor a word, but something in between.

The rough rule of thumb for English text:

  • 1 token ≈ 4 characters of text
  • 1 token ≈ 0.75 words
  • 100 tokens ≈ 75 words
  • 1,000 tokens ≈ 750 words ≈ roughly 1.5 pages of typed text

For code, the token count per character tends to be higher, because programming syntax breaks into many short tokens for punctuation, keywords, and whitespace. Languages such as Chinese or Japanese also tend to consume more tokens per character than English, because individual characters often map to one or more tokens in the model's vocabulary.
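For quick sanity checks during development, the rule of thumb above can be wrapped in a small helper. Note this is only the heuristic, not the model's real tokenizer, and it drifts for code and non-English text:

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate using the ~4 characters per token heuristic.

    Only an approximation for English prose -- the real tokenizer can
    differ noticeably, especially for code and non-English text.
    """
    return max(1, round(len(text) / 4))

def estimate_tokens_from_words(word_count: int) -> int:
    """Rough estimate using the ~0.75 words per token heuristic."""
    return round(word_count / 0.75)

print(estimate_tokens("a" * 400))          # ≈ 100 tokens
print(estimate_tokens_from_words(750))     # ≈ 1,000 tokens
```

For anything where the count actually matters, use the Token Counting endpoint described below instead of this heuristic.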

Count Your Tokens Before You Commit

The Anthropic API provides a Token Counting endpoint (POST /v1/messages/count_tokens) that tells you exactly how many tokens a message will consume before you send it. Use this during development to understand the cost profile of your prompts — especially your system prompts, which are charged on every single API call.
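A minimal sketch of a request body for that endpoint, assuming it takes the same shape as a standard Messages request. The model name here is a placeholder; substitute whichever model you actually use, and send the body with your API key and the usual `anthropic-version` header:

```python
import json

API_URL = "https://api.anthropic.com/v1/messages/count_tokens"

def build_count_tokens_request(model: str, system: str, user_message: str) -> dict:
    """Build a JSON body for the Token Counting endpoint.

    The endpoint accepts the familiar Messages fields (model, system,
    messages) and returns an input token count without running the
    model or charging for generation.
    """
    return {
        "model": model,
        "system": system,
        "messages": [{"role": "user", "content": user_message}],
    }

body = build_count_tokens_request(
    model="claude-sonnet-4-5",  # placeholder -- use your actual model ID
    system="You are a concise support assistant.",
    user_message="How do I reset my password?",
)
print(json.dumps(body, indent=2))
```

Checking your system prompt this way once during development tells you the fixed per-call overhead you will pay on every request.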


Input Tokens vs Output Tokens

Claude charges differently for input and output tokens, and the rates are not symmetric. Output tokens — the text Claude generates — cost significantly more than input tokens.

This is not arbitrary. Generating tokens is computationally more expensive than reading them. The model must perform a full forward pass through the neural network for each output token it generates, whereas reading the input happens in a single, parallelised forward pass.

What counts as input tokens:

• Every word in your system prompt
• Every message in the conversation history (user and assistant turns)
• Tool definitions you pass in the request
• Document content (images are converted to tokens based on their dimensions)

What counts as output tokens:

• Every word Claude generates in its response
• Tool call arguments Claude generates
• Thinking tokens when extended thinking is enabled (billed as output tokens)

Current Model Pricing (March 2026)

Claude Opus 4.6

• Input: $5.00 per million tokens
• Output: $25.00 per million tokens
• Cache writes: $6.25 per million tokens
• Cache reads: $0.50 per million tokens

Claude Sonnet 4.6

• Input: $3.00 per million tokens
• Output: $15.00 per million tokens
• Cache writes: $3.75 per million tokens
• Cache reads: $0.30 per million tokens

Claude Haiku 4.5

• Input: $1.00 per million tokens
• Output: $5.00 per million tokens
• Cache writes: $1.25 per million tokens
• Cache reads: $0.10 per million tokens

Pricing Changes Over Time

AI model pricing has consistently decreased as infrastructure costs fall and competition increases. Always check the Anthropic pricing page at anthropic.com/pricing for the current rates — what you see there will always be more current than any guide.

Real-World Cost Examples

Abstract numbers become meaningful when you apply them to actual use cases.

Example 1: A Simple Q&A Query

A user asks a single question with a 200-token system prompt, a 50-token question, and Claude gives a 300-token answer.

• Input tokens: 250 (system prompt + question)
• Output tokens: 300
• Cost using Sonnet 4.6: (250 × $0.000003) + (300 × $0.000015) = $0.00075 + $0.0045 = $0.00525 per query
• At 10,000 queries per month: $52.50
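The arithmetic for this example can be wrapped in a small helper using the Sonnet rates from the table above:

```python
# Sonnet rates from the pricing table, expressed per token.
INPUT_RATE = 3.00 / 1_000_000    # $ per input token
OUTPUT_RATE = 15.00 / 1_000_000  # $ per output token

def query_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single call at standard Sonnet rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

per_query = query_cost(250, 300)
print(f"${per_query:.5f} per query")            # ≈ $0.00525
print(f"${per_query * 10_000:.2f} per month")   # ≈ $52.50 at 10,000 queries
```

Swapping in the Opus or Haiku rates from the table gives the same calculation for the other models.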

Example 2: Document Summarisation

A user uploads a 10-page PDF (~5,000 tokens) with a 300-token system prompt and asks for a summary (50-token question). Claude produces a 500-token summary.

• Input tokens: 5,350
• Output tokens: 500
• Cost using Sonnet 4.6: (5,350 × $0.000003) + (500 × $0.000015) = $0.01605 + $0.0075 ≈ $0.024 per document
• At 1,000 documents per month: roughly $24

Example 3: Customer Support Chatbot (Multi-turn)

A typical support conversation has 8 turns. The system prompt is 500 tokens, and each turn grows the history by roughly 200 tokens. Average output per turn is 150 tokens.

• Total input tokens across 8 turns (with growing history): approximately 12,000 tokens
• Total output tokens: approximately 1,200 tokens
• Cost using Sonnet 4.6: (12,000 × $0.000003) + (1,200 × $0.000015) = $0.036 + $0.018 = $0.054 per conversation
• At 5,000 conversations per month: $270
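One way to model the growing-history arithmetic, assuming the history grows by a flat 200 tokens per turn. These assumptions are simplifications, but the totals land close to the rough figures above:

```python
SYSTEM_TOKENS = 500
HISTORY_GROWTH = 200   # tokens added to the history each turn (assumption)
OUTPUT_PER_TURN = 150
TURNS = 8

INPUT_RATE = 3.00 / 1_000_000    # Sonnet, $ per input token
OUTPUT_RATE = 15.00 / 1_000_000  # Sonnet, $ per output token

# Each turn re-sends the system prompt plus the history so far.
total_input = sum(SYSTEM_TOKENS + HISTORY_GROWTH * turn
                  for turn in range(1, TURNS + 1))
total_output = OUTPUT_PER_TURN * TURNS

cost = total_input * INPUT_RATE + total_output * OUTPUT_RATE
print(total_input, total_output, f"${cost:.4f}")
# ≈ 11,200 input tokens, 1,200 output tokens, about $0.05 per conversation
```

The key structural point this makes visible: input cost grows quadratically with conversation length, because every turn re-sends the whole history. That is exactly the pattern prompt caching is designed to attack.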

Prompt Caching: Your Most Powerful Cost-Reduction Tool

Prompt caching is arguably the most impactful cost-reduction feature available in the Claude API. It works by storing portions of your prompt on Anthropic's servers for subsequent reuse.

How Caching Works

When you enable caching on a portion of your prompt — typically the system prompt and any large, static documents you pass — Anthropic stores that processed prompt on its infrastructure. The next time you make an API call that uses the same cached content, you are charged at the cache read rate rather than the full input rate.

The standard cache duration is 5 minutes. An extended 1-hour cache is also available for content that is accessed less frequently but still important to cache.

Cache Write vs Cache Read Costs

The first time you request caching of a piece of text, you pay the cache write rate (slightly higher than standard input). Every subsequent call that reads from that cache pays the cache read rate:

• Cache read (Sonnet): $0.30 per million tokens — that is 90% cheaper than the standard $3.00 input rate
• Cache read (Opus): $0.50 per million tokens — 90% cheaper than the standard $5.00 input rate

When Caching Makes Sense

• Large system prompts: If your system prompt is 2,000 tokens and you make 10,000 API calls per month, the system prompt alone costs $60 per month at standard rates. With caching, reads on those 20 million tokens cost $6 per month, roughly a $54 monthly saving on inputs alone (the occasional cache writes add only fractions of a cent)
• Static documents: If you are building a RAG application that retrieves the same reference documents frequently, cache those documents rather than re-uploading them with every call
• Conversation history patterns: In long conversations, cache the earlier turns that are unlikely to change, so only the latest few turns are charged at full input rate
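A sketch of the system-prompt saving in the first bullet, with the simplifying assumption that the cache is written once per month. In practice the 5-minute expiry means it is re-written more often under bursty traffic, which nudges the number up slightly:

```python
MILLION = 1_000_000
INPUT_RATE = 3.00 / MILLION        # Sonnet standard input
CACHE_WRITE_RATE = 3.75 / MILLION  # first write of the cached prefix
CACHE_READ_RATE = 0.30 / MILLION   # every subsequent read

def monthly_prompt_cost(prompt_tokens, calls_per_month,
                        cached, writes_per_month=1):
    """Monthly cost of the system prompt alone, with or without caching.

    `writes_per_month` is a simplifying assumption: sustained traffic
    re-writes the cache each time it expires, so real numbers sit a
    little above this estimate.
    """
    if not cached:
        return prompt_tokens * calls_per_month * INPUT_RATE
    writes = prompt_tokens * writes_per_month * CACHE_WRITE_RATE
    reads = prompt_tokens * (calls_per_month - writes_per_month) * CACHE_READ_RATE
    return writes + reads

standard = monthly_prompt_cost(2_000, 10_000, cached=False)  # $60.00
cached = monthly_prompt_cost(2_000, 10_000, cached=True)     # ≈ $6.01
print(f"standard ${standard:.2f}  cached ${cached:.2f}")
```

Even with the write overhead included, caching keeps roughly 90% of the system-prompt spend.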

The Batch API: 50% Off for Non-Real-Time Work

The Message Batches API processes your requests asynchronously and charges 50% of the standard API price. The trade-off is that batch processing takes up to 24 hours rather than responding immediately.

When to Use the Batch API

• Overnight data processing: Classifying, tagging, or summarising large datasets that do not need real-time results
• Bulk document analysis: Processing hundreds or thousands of documents where same-day results are acceptable
• Training data generation: Generating synthetic datasets or annotations for other models
• Report generation: Generating scheduled reports that run overnight and are ready for morning review

Combine Caching and Batch for Maximum Savings

The two biggest cost levers are caching (for repeated context) and batch processing (for non-real-time work). On a large document processing workload with a shared system prompt, combining caching on the system prompt with batch processing on the document calls can reduce your effective cost by 90-95% compared to naive synchronous calls without caching.
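A rough model of that combined saving, under assumed workload numbers: every call shares a large cached reference context and runs through the Batch API. Treating the batch discount as applying to cache-read rates as well is an assumption worth checking against the current pricing page:

```python
MILLION = 1_000_000
INPUT, OUTPUT, CACHE_READ = 3.00, 15.00, 0.30  # standard Sonnet rates, $/M

# Assumed workload: every call shares a 50,000-token reference context,
# plus a 200-token query and a 500-token answer.
SHARED, QUERY, ANSWER = 50_000, 200, 500

def per_call(input_rate, output_rate, shared_rate):
    """Cost of one call given the rate applied to each token class."""
    return (SHARED * shared_rate + QUERY * input_rate
            + ANSWER * output_rate) / MILLION

# Naive: synchronous, no caching -- the shared context pays full input rate.
naive = per_call(INPUT, OUTPUT, INPUT)
# Optimised: shared context read from cache, everything halved by batch.
optimised = per_call(INPUT / 2, OUTPUT / 2, CACHE_READ / 2)

print(f"naive ${naive:.4f}  optimised ${optimised:.4f}  "
      f"saving {100 * (1 - optimised / naive):.0f}%")
```

With these assumed numbers the saving works out to roughly 93%, squarely inside the 90-95% range. The share of cost sitting in the cached context is what drives the figure: workloads dominated by unique per-call input or by output tokens save closer to the flat 50% batch discount.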


Token Counting API: Know Before You Pay

The Token Counting API lets you calculate the exact token count of a message before sending it, without being charged for the actual API call.

This is invaluable for:

• Enforcing context limits: Prevent errors by checking that a message will fit within the model's context window before sending
• Cost estimation: Show users an estimated cost before they run a large batch job
• Prompt optimisation: During development, measure the token cost of different system prompt versions to find the most efficient wording
• Dynamic truncation: If a request would exceed your budget or context window, truncate conversation history until it fits
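A minimal sketch of dynamic truncation, using the character-based heuristic for brevity (the Token Counting API gives exact counts). Dropping whole turns from the front keeps the most recent context, which usually matters most:

```python
def estimate_tokens(text: str) -> int:
    """Rough ~4 characters per token heuristic (see earlier caveats)."""
    return max(1, len(text) // 4)

def truncate_history(messages, budget_tokens):
    """Drop the oldest turns until the conversation fits the budget.

    `messages` is a list of {"role": ..., "content": ...} dicts,
    oldest first. Returns a new list; the input is left untouched.
    """
    kept = list(messages)
    while kept and sum(estimate_tokens(m["content"]) for m in kept) > budget_tokens:
        kept.pop(0)  # drop the oldest turn
    return kept

history = [
    {"role": "user", "content": "x" * 4_000},       # ~1,000 tokens
    {"role": "assistant", "content": "y" * 2_000},  # ~500 tokens
    {"role": "user", "content": "z" * 400},         # ~100 tokens
]
print(len(truncate_history(history, budget_tokens=700)))  # keeps the last 2 turns
```

In production you would also want to drop user/assistant turns in pairs so the history stays well-formed, and reserve budget for the system prompt and the expected response.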

Cost Management Best Practices

Set Spend Limits

Configure monthly spend limits in the Anthropic Console. The API will return a 429 error when the limit is reached rather than continuing to charge.

Use the Right Model for Each Task

Do not use Opus for tasks where Haiku or Sonnet is sufficient. A task that costs $0.05 with Opus costs $0.01 with Sonnet and $0.002 with Haiku. At scale, that difference is enormous.

Optimise Your System Prompts

Every unnecessary word in your system prompt costs money on every API call. Review your system prompts during development and trim anything that does not materially improve Claude's behaviour.

Monitor Usage in the Console

The Usage section of the Anthropic Console breaks down your token consumption by model, day, and API key. Review this weekly to catch unexpected spikes before they become significant bills.

Infinite Loop Risk

Agentic applications that make multiple Claude calls in a loop are at risk of runaway costs if error handling is incomplete. A bug that causes an agent to loop infinitely can consume your entire monthly budget in hours. Always implement maximum iteration limits, spend monitoring, and emergency kill switches in agent code.
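A sketch of those guardrails, with illustrative names and limits rather than any official API (`step_fn` stands in for whatever performs one model call and reports its cost):

```python
class BudgetExceeded(Exception):
    """Raised when cumulative spend passes the hard cap."""

def run_agent(step_fn, max_iterations=20, max_spend_usd=5.00):
    """Run an agent loop with hard guardrails.

    `step_fn` performs one model call and returns (done, cost_usd).
    Both limits are illustrative defaults; tune them per workload.
    """
    spent = 0.0
    for i in range(max_iterations):
        done, cost = step_fn()
        spent += cost
        if spent > max_spend_usd:
            raise BudgetExceeded(f"spent ${spent:.2f} after {i + 1} steps")
        if done:
            return spent
    raise RuntimeError(f"hit max_iterations={max_iterations} without finishing")

# Simulated agent: finishes on the 3rd step at $0.05 per call.
calls = iter([(False, 0.05), (False, 0.05), (True, 0.05)])
print(f"total ${run_agent(lambda: next(calls)):.2f}")  # total $0.15
```

The important property is that both limits fail loudly instead of silently continuing: a raised exception is a page to an on-call engineer, whereas a quietly looping agent is a bill.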


Summary

Claude API pricing is usage-based and transparent. The key levers for managing your costs are:

• Choosing the right model for each task (Haiku, Sonnet, or Opus)
• Enabling prompt caching for large, repeated context — up to 90% savings on cached tokens
• Using the Batch API for non-real-time workloads — 50% off standard rates
• Using the Token Counting API during development to understand your cost profile before deploying
• Setting spend limits in the Console to prevent runaway costs

You now have a solid practical foundation for the Claude API. Before we move into prompt engineering, let us check your understanding with a short knowledge assessment.

In our next post, we test what you have learned in Modules 1 and 2: Knowledge Check: Claude Foundations and API Basics Quiz.


This post is part of the Anthropic AI Tutorial Series. Don't forget to check out our previous post: Understanding the Claude Messages API: Roles, Turns, and System Prompts.