
Claude API Pricing Explained: Tokens, Cost Tiers, and Batch Savings

TopicTrick

One of the most common questions from developers new to the Claude API is: "How much is this actually going to cost me?" It is a practical, important question — and the answer is more nuanced than a simple price list suggests.

Claude's pricing model is usage-based. You pay for the tokens you consume, not for a monthly seat licence. That is good news for development and experimentation, where your usage is low. It also means you need to understand the pricing model properly before you scale, because the wrong design decisions can make your costs significantly higher than they need to be.

This guide explains exactly how Claude pricing works, what each model costs, and — most importantly — how features like prompt caching and the Batch API can dramatically reduce your real-world bill.


What is a Token?

Everything in Claude's pricing is measured in tokens. A token is the basic unit of text that the model processes — neither a character nor a word, but something in between.

The rough rule of thumb for English text:

  • 1 token ≈ 4 characters of text
  • 1 token ≈ 0.75 words
  • 100 tokens ≈ 75 words
  • 1,000 tokens ≈ 750 words ≈ roughly 1.5 pages of typed text

For code, the token count per character tends to be higher, because programming syntax breaks into many short tokens for punctuation, keywords, and whitespace. Languages such as Chinese or Japanese also tend to consume more tokens per character than English, because individual characters often map to one or more tokens in the model's vocabulary.
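For quick sanity checks during development, the rule of thumb above can be wrapped in a small helper. Note this is only the heuristic, not the model's real tokenizer, and it drifts for code and non-English text:

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate using the ~4 characters per token heuristic.

    Only an approximation for English prose -- the real tokenizer can
    differ noticeably, especially for code and non-English text.
    """
    return max(1, round(len(text) / 4))

def estimate_tokens_from_words(word_count: int) -> int:
    """Rough estimate using the ~0.75 words per token heuristic."""
    return round(word_count / 0.75)

print(estimate_tokens("a" * 400))          # ≈ 100 tokens
print(estimate_tokens_from_words(750))     # ≈ 1,000 tokens
```

For anything where the count actually matters, use the Token Counting endpoint described below instead of this heuristic.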

Count Your Tokens Before You Commit

The Anthropic API provides a Token Counting endpoint (POST /v1/messages/count_tokens) that tells you exactly how many tokens a message will consume before you send it. Use this during development to understand the cost profile of your prompts — especially your system prompts, which are charged on every single API call.
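A minimal sketch of a request body for that endpoint, assuming it takes the same shape as a standard Messages request. The model name here is a placeholder; substitute whichever model you actually use, and send the body with your API key and the usual `anthropic-version` header:

```python
import json

API_URL = "https://api.anthropic.com/v1/messages/count_tokens"

def build_count_tokens_request(model: str, system: str, user_message: str) -> dict:
    """Build a JSON body for the Token Counting endpoint.

    The endpoint accepts the familiar Messages fields (model, system,
    messages) and returns an input token count without running the
    model or charging for generation.
    """
    return {
        "model": model,
        "system": system,
        "messages": [{"role": "user", "content": user_message}],
    }

body = build_count_tokens_request(
    model="claude-sonnet-4-5",  # placeholder -- use your actual model ID
    system="You are a concise support assistant.",
    user_message="How do I reset my password?",
)
print(json.dumps(body, indent=2))
```

Checking your system prompt this way once during development tells you the fixed per-call overhead you will pay on every request.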


Input Tokens vs Output Tokens

Claude charges differently for input and output tokens, and the rates are not symmetric. Output tokens — the text Claude generates — cost significantly more than input tokens.

This is not arbitrary. Generating tokens is computationally more expensive than reading them. The model must perform a full forward pass through the neural network for each output token it generates, whereas reading the input happens in a single, parallelised forward pass.

What counts as input tokens:

• Every word in your system prompt
• Every message in the conversation history (user and assistant turns)
• Tool definitions you pass in the request
• Document content (images are converted to tokens based on their dimensions)

What counts as output tokens:

• Every word Claude generates in its response
• Tool call arguments Claude generates
• Thinking tokens when extended thinking is enabled (billed as output tokens)

Current Model Pricing (March 2026)

Claude Opus 4.6

• Input: $5.00 per million tokens
• Output: $25.00 per million tokens
• Cache writes: $6.25 per million tokens
• Cache reads: $0.50 per million tokens

Claude Sonnet 4.6

• Input: $3.00 per million tokens
• Output: $15.00 per million tokens
• Cache writes: $3.75 per million tokens
• Cache reads: $0.30 per million tokens

Claude Haiku 4.5

• Input: $1.00 per million tokens
• Output: $5.00 per million tokens
• Cache writes: $1.25 per million tokens
• Cache reads: $0.10 per million tokens

Pricing Changes Over Time

AI model pricing has consistently decreased as infrastructure costs fall and competition increases. Always check the Anthropic pricing page at anthropic.com/pricing for the current rates — what you see there will always be more current than any guide.

Real-World Cost Examples

Abstract numbers become meaningful when you apply them to actual use cases.

Example 1: A Simple Q&A Query

A user asks a single question with a 200-token system prompt, a 50-token question, and Claude gives a 300-token answer.

• Input tokens: 250 (system prompt + question)
• Output tokens: 300
• Cost using Sonnet 4.6: (250 × $0.000003) + (300 × $0.000015) = $0.00075 + $0.0045 = $0.00525 per query
• At 10,000 queries per month: $52.50
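The arithmetic for this example can be wrapped in a small helper using the Sonnet rates from the table above:

```python
# Sonnet rates from the pricing table, expressed per token.
INPUT_RATE = 3.00 / 1_000_000    # $ per input token
OUTPUT_RATE = 15.00 / 1_000_000  # $ per output token

def query_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single call at standard Sonnet rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

per_query = query_cost(250, 300)
print(f"${per_query:.5f} per query")            # ≈ $0.00525
print(f"${per_query * 10_000:.2f} per month")   # ≈ $52.50 at 10,000 queries
```

Swapping in the Opus or Haiku rates from the table gives the same calculation for the other models.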

Example 2: Document Summarisation

A user uploads a 10-page PDF (~5,000 tokens) with a 300-token system prompt and asks for a summary (50-token question). Claude produces a 500-token summary.

• Input tokens: 5,350
• Output tokens: 500
• Cost using Sonnet 4.6: (5,350 × $0.000003) + (500 × $0.000015) = $0.01605 + $0.0075 ≈ $0.024 per document
• At 1,000 documents per month: roughly $24

Example 3: Customer Support Chatbot (Multi-turn)

A typical support conversation has 8 turns. The system prompt is 500 tokens, and each turn grows the history by roughly 200 tokens. Average output per turn is 150 tokens.

• Total input tokens across 8 turns (with growing history): approximately 12,000 tokens
• Total output tokens: approximately 1,200 tokens
• Cost using Sonnet 4.6: (12,000 × $0.000003) + (1,200 × $0.000015) = $0.036 + $0.018 = $0.054 per conversation
• At 5,000 conversations per month: $270
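One way to model the growing-history arithmetic, assuming the history grows by a flat 200 tokens per turn. These assumptions are simplifications, but the totals land close to the rough figures above:

```python
SYSTEM_TOKENS = 500
HISTORY_GROWTH = 200   # tokens added to the history each turn (assumption)
OUTPUT_PER_TURN = 150
TURNS = 8

INPUT_RATE = 3.00 / 1_000_000    # Sonnet, $ per input token
OUTPUT_RATE = 15.00 / 1_000_000  # Sonnet, $ per output token

# Each turn re-sends the system prompt plus the history so far.
total_input = sum(SYSTEM_TOKENS + HISTORY_GROWTH * turn
                  for turn in range(1, TURNS + 1))
total_output = OUTPUT_PER_TURN * TURNS

cost = total_input * INPUT_RATE + total_output * OUTPUT_RATE
print(total_input, total_output, f"${cost:.4f}")
# ≈ 11,200 input tokens, 1,200 output tokens, about $0.05 per conversation
```

The key structural point this makes visible: input cost grows quadratically with conversation length, because every turn re-sends the whole history. That is exactly the pattern prompt caching is designed to attack.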

Prompt Caching: Your Most Powerful Cost-Reduction Tool

Prompt caching is arguably the most impactful cost-reduction feature available in the Claude API. It works by storing portions of your prompt on Anthropic's servers for subsequent reuse.

How Caching Works

When you enable caching on a portion of your prompt — typically the system prompt and any large, static documents you pass — Anthropic stores that processed prompt on its infrastructure. The next time you make an API call that uses the same cached content, you are charged at the cache read rate rather than the full input rate.

The standard cache duration is 5 minutes. An extended 1-hour cache is also available for content that is accessed less frequently but still important to cache.

Cache Write vs Cache Read Costs

The first time you request caching of a piece of text, you pay the cache write rate (slightly higher than standard input). Every subsequent call that reads from that cache pays the cache read rate:

• Cache read (Sonnet): $0.30 per million tokens — that is 90% cheaper than the standard $3.00 input rate
• Cache read (Opus): $0.50 per million tokens — 90% cheaper than the standard $5.00 input rate

When Caching Makes Sense

• Large system prompts: If your system prompt is 2,000 tokens and you make 10,000 API calls per month, the system prompt alone costs $60 per month at standard rates. With caching, reads on those 20 million tokens cost $6 per month, roughly a $54 monthly saving on inputs alone (the occasional cache writes add only fractions of a cent)
• Static documents: If you are building a RAG application that retrieves the same reference documents frequently, cache those documents rather than re-uploading them with every call
• Conversation history patterns: In long conversations, cache the earlier turns that are unlikely to change, so only the latest few turns are charged at full input rate
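A sketch of the system-prompt saving in the first bullet, with the simplifying assumption that the cache is written once per month. In practice the 5-minute expiry means it is re-written more often under bursty traffic, which nudges the number up slightly:

```python
MILLION = 1_000_000
INPUT_RATE = 3.00 / MILLION        # Sonnet standard input
CACHE_WRITE_RATE = 3.75 / MILLION  # first write of the cached prefix
CACHE_READ_RATE = 0.30 / MILLION   # every subsequent read

def monthly_prompt_cost(prompt_tokens, calls_per_month,
                        cached, writes_per_month=1):
    """Monthly cost of the system prompt alone, with or without caching.

    `writes_per_month` is a simplifying assumption: sustained traffic
    re-writes the cache each time it expires, so real numbers sit a
    little above this estimate.
    """
    if not cached:
        return prompt_tokens * calls_per_month * INPUT_RATE
    writes = prompt_tokens * writes_per_month * CACHE_WRITE_RATE
    reads = prompt_tokens * (calls_per_month - writes_per_month) * CACHE_READ_RATE
    return writes + reads

standard = monthly_prompt_cost(2_000, 10_000, cached=False)  # $60.00
cached = monthly_prompt_cost(2_000, 10_000, cached=True)     # ≈ $6.01
print(f"standard ${standard:.2f}  cached ${cached:.2f}")
```

Even with the write overhead included, caching keeps roughly 90% of the system-prompt spend.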

The Batch API: 50% Off for Non-Real-Time Work

The Message Batches API processes your requests asynchronously and charges 50% of the standard API price. The trade-off is that batch processing takes up to 24 hours rather than responding immediately.

When to Use the Batch API

• Overnight data processing: Classifying, tagging, or summarising large datasets that do not need real-time results
• Bulk document analysis: Processing hundreds or thousands of documents where same-day results are acceptable
• Training data generation: Generating synthetic datasets or annotations for other models
• Report generation: Generating scheduled reports that run overnight and are ready for morning review

Combine Caching and Batch for Maximum Savings

The two biggest cost levers are caching (for repeated context) and batch processing (for non-real-time work). On a large document processing workload with a shared system prompt, combining caching on the system prompt with batch processing on the document calls can reduce your effective cost by 90-95% compared to naive synchronous calls without caching.
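A rough model of that combined saving, under assumed workload numbers: every call shares a large cached reference context and runs through the Batch API. Treating the batch discount as applying to cache-read rates as well is an assumption worth checking against the current pricing page:

```python
MILLION = 1_000_000
INPUT, OUTPUT, CACHE_READ = 3.00, 15.00, 0.30  # standard Sonnet rates, $/M

# Assumed workload: every call shares a 50,000-token reference context,
# plus a 200-token query and a 500-token answer.
SHARED, QUERY, ANSWER = 50_000, 200, 500

def per_call(input_rate, output_rate, shared_rate):
    """Cost of one call given the rate applied to each token class."""
    return (SHARED * shared_rate + QUERY * input_rate
            + ANSWER * output_rate) / MILLION

# Naive: synchronous, no caching -- the shared context pays full input rate.
naive = per_call(INPUT, OUTPUT, INPUT)
# Optimised: shared context read from cache, everything halved by batch.
optimised = per_call(INPUT / 2, OUTPUT / 2, CACHE_READ / 2)

print(f"naive ${naive:.4f}  optimised ${optimised:.4f}  "
      f"saving {100 * (1 - optimised / naive):.0f}%")
```

With these assumed numbers the saving works out to roughly 93%, squarely inside the 90-95% range. The share of cost sitting in the cached context is what drives the figure: workloads dominated by unique per-call input or by output tokens save closer to the flat 50% batch discount.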


Token Counting API: Know Before You Pay

The Token Counting API lets you calculate the exact token count of a message before sending it, without being charged for the actual API call.

This is invaluable for:

• Enforcing context limits: Prevent errors by checking that a message will fit within the model's context window before sending
• Cost estimation: Show users an estimated cost before they run a large batch job
• Prompt optimisation: During development, measure the token cost of different system prompt versions to find the most efficient wording
• Dynamic truncation: If a request would exceed your budget or context window, truncate conversation history until it fits
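A minimal sketch of dynamic truncation, using the character-based heuristic for brevity (the Token Counting API gives exact counts). Dropping whole turns from the front keeps the most recent context, which usually matters most:

```python
def estimate_tokens(text: str) -> int:
    """Rough ~4 characters per token heuristic (see earlier caveats)."""
    return max(1, len(text) // 4)

def truncate_history(messages, budget_tokens):
    """Drop the oldest turns until the conversation fits the budget.

    `messages` is a list of {"role": ..., "content": ...} dicts,
    oldest first. Returns a new list; the input is left untouched.
    """
    kept = list(messages)
    while kept and sum(estimate_tokens(m["content"]) for m in kept) > budget_tokens:
        kept.pop(0)  # drop the oldest turn
    return kept

history = [
    {"role": "user", "content": "x" * 4_000},       # ~1,000 tokens
    {"role": "assistant", "content": "y" * 2_000},  # ~500 tokens
    {"role": "user", "content": "z" * 400},         # ~100 tokens
]
print(len(truncate_history(history, budget_tokens=700)))  # keeps the last 2 turns
```

In production you would also want to drop user/assistant turns in pairs so the history stays well-formed, and reserve budget for the system prompt and the expected response.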

Cost Management Best Practices

Set Spend Limits

Configure monthly spend limits in the Anthropic Console. The API will return a 429 error when the limit is reached rather than continuing to charge.

Use the Right Model for Each Task

Do not use Opus for tasks where Haiku or Sonnet is sufficient. A task that costs $0.05 with Opus costs $0.01 with Sonnet and $0.002 with Haiku. At scale, that difference is enormous.

Optimise Your System Prompts

Every unnecessary word in your system prompt costs money on every API call. Review your system prompts during development and trim anything that does not materially improve Claude's behaviour.

Monitor Usage in the Console

The Usage section of the Anthropic Console breaks down your token consumption by model, day, and API key. Review this weekly to catch unexpected spikes before they become significant bills.

Infinite Loop Risk

Agentic applications that make multiple Claude calls in a loop are at risk of runaway costs if error handling is incomplete. A bug that causes an agent to loop infinitely can consume your entire monthly budget in hours. Always implement maximum iteration limits, spend monitoring, and emergency kill switches in agent code.
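A sketch of those guardrails, with illustrative names and limits rather than any official API (`step_fn` stands in for whatever performs one model call and reports its cost):

```python
class BudgetExceeded(Exception):
    """Raised when cumulative spend passes the hard cap."""

def run_agent(step_fn, max_iterations=20, max_spend_usd=5.00):
    """Run an agent loop with hard guardrails.

    `step_fn` performs one model call and returns (done, cost_usd).
    Both limits are illustrative defaults; tune them per workload.
    """
    spent = 0.0
    for i in range(max_iterations):
        done, cost = step_fn()
        spent += cost
        if spent > max_spend_usd:
            raise BudgetExceeded(f"spent ${spent:.2f} after {i + 1} steps")
        if done:
            return spent
    raise RuntimeError(f"hit max_iterations={max_iterations} without finishing")

# Simulated agent: finishes on the 3rd step at $0.05 per call.
calls = iter([(False, 0.05), (False, 0.05), (True, 0.05)])
print(f"total ${run_agent(lambda: next(calls)):.2f}")  # total $0.15
```

The important property is that both limits fail loudly instead of silently continuing: a raised exception is a page to an on-call engineer, whereas a quietly looping agent is a bill.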


Summary

Claude API pricing is usage-based and transparent. The key levers for managing your costs are:

• Choosing the right model for each task (Haiku, Sonnet, or Opus)
• Enabling prompt caching for large, repeated context — up to 90% savings on cached tokens
• Using the Batch API for non-real-time workloads — 50% off standard rates
• Using the Token Counting API during development to understand your cost profile before deploying
• Setting spend limits in the Console to prevent runaway costs

You now have a solid practical foundation for the Claude API. Before we move into prompt engineering, let us check your understanding with a short knowledge assessment.

In our next post, we test what you have learned in Modules 1 and 2: Knowledge Check: Claude Foundations and API Basics Quiz.


This post is part of the Anthropic AI Tutorial Series. Don't forget to check out our previous post: Understanding the Claude Messages API: Roles, Turns, and System Prompts.