
Your First Claude API Call: Python and JavaScript Quickstart

You have your API key. Your SDK is installed. Now it is time to actually talk to Claude from code.

This is the step where everything becomes real. You go from reading about Claude's capabilities to experiencing them directly through the API — the same interface that powers production applications used by millions of people.

This guide gives you complete, working code in both Python and JavaScript for your first Claude API call, explains every line so you understand what it does, and introduces the patterns that will underpin everything you build going forward.


What We Are Building

We will start with an absolute minimum working example — just enough to send a message and get a response. Then we will layer in additional capabilities:

  • A basic single-turn message
  • Reading and understanding the response object
  • Adding a system prompt to shape Claude's behaviour
  • A multi-turn conversation
  • Streaming responses for real-time output
  • Error handling

By the end of this post, you will have a solid foundation for building any Claude-powered application.


The Minimum Working Example

Python

python
import anthropic

client = anthropic.Anthropic()
# The client automatically reads ANTHROPIC_API_KEY from your environment

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(message.content[0].text)

JavaScript / TypeScript

javascript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();
// The client automatically reads ANTHROPIC_API_KEY from your environment

const message = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: "What is the capital of France?" }],
});

console.log(message.content[0].text);

Run either of these and you will see Claude respond: The capital of France is Paris.

That is it — that is your first Claude API call. Now let us understand every part of it.
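If the call fails instead, the most common culprit is a missing key. The SDK performs the environment lookup itself, but a tiny preflight check (purely illustrative, the helper name is ours) lets a script fail fast with a clearer message:

```python
import os

def api_key_present(env=os.environ):
    """Return True if ANTHROPIC_API_KEY is set to a non-empty value.

    The SDK does this lookup on its own; checking up front just lets
    you exit with a helpful message before making any request.
    """
    return bool(env.get("ANTHROPIC_API_KEY"))
```

Call it at startup and print setup instructions when it returns False, rather than letting the first API call raise an authentication error.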


Breaking Down the Request

The model Parameter

This tells the API which Claude model to use. We are using claude-sonnet-4-6 — a strong, cost-effective choice for most tasks. Swap in claude-opus-4-6 for more complex reasoning, or claude-haiku-4-5 for maximum speed and minimum cost.

The max_tokens Parameter

This sets the maximum number of tokens Claude can generate in its response. One token is roughly three to four characters in English. Setting this to 1024 allows responses of roughly 700–800 words — more than enough for most queries. If your use case requires very long responses, increase this value.

max_tokens is a Hard Limit

If Claude reaches the max_tokens limit before finishing its response, it will stop mid-sentence. The stop_reason in the response will be 'max_tokens' instead of 'end_turn'. For conversational use, 1024 is usually fine. For document generation or long-form writing, set it higher — Sonnet 4.6 supports up to 64,000 output tokens.
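Since both billing and this limit are denominated in tokens, a rough character-based estimate helps you pick a sensible max_tokens before calling the API. This sketch applies the approximate three-to-four-characters-per-token rule from above (a heuristic, not the model's real tokenizer), plus a check for the truncation case just described:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    # This is an approximation, not an exact token count.
    return max(1, len(text) // 4)

def hit_token_limit(stop_reason: str) -> bool:
    # True when the response was cut off by max_tokens rather than
    # finishing naturally with "end_turn".
    return stop_reason == "max_tokens"
```

If hit_token_limit(message.stop_reason) is True, retry with a higher max_tokens or ask Claude to continue.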

The messages Array

This is the core of every Claude API request. The messages array contains the conversation history as a sequence of objects, each with a role and content.

Roles:

  • user: The human's turn — your question, instruction, or input
  • assistant: Claude's previous responses in a conversation

Even for a single question, you pass it inside an array. This design means the same API structure handles both one-off queries and multi-turn conversations without any change in format.
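Because the same array carries both single questions and whole conversations, it is easy to build the list up incorrectly in code. A small sanity check, assuming the conventional pattern of roles alternating and starting with user (the helper is ours, not part of the SDK):

```python
def validate_messages(messages):
    """Check that messages alternate user/assistant, starting with user."""
    if not messages:
        return False
    expected = "user"
    for msg in messages:
        if msg.get("role") != expected or "content" not in msg:
            return False
        expected = "assistant" if expected == "user" else "user"
    return True
```

Running this before each request catches history-management bugs with a clear local error instead of a rejected API call.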


Understanding the Response Object

The API returns a Message object with several important fields.

Python — Exploring the Response

python
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain APIs in one sentence."}]
)

print(message.id)                  # Unique message ID (e.g. msg_01XFDUDYJgAACzvnptvVoYEL)
print(message.model)               # Which model was used
print(message.stop_reason)         # Why Claude stopped: 'end_turn' or 'max_tokens'
print(message.usage.input_tokens)  # Tokens in your request
print(message.usage.output_tokens) # Tokens in Claude's response
print(message.content[0].text)     # The actual text response

Key Response Fields

  • id: A unique identifier for this specific API call — useful for logging and debugging
  • model: The model that actually processed the request (may differ slightly from what you specified if you used an alias)
  • stop_reason: end_turn means Claude finished naturally; max_tokens means it hit the limit; tool_use means it wants to call a tool
  • usage.input_tokens: The number of tokens in your message and system prompt — this is what you pay for on input
  • usage.output_tokens: The number of tokens in Claude's response — what you pay for on output
  • content: An array of content blocks. For plain text responses, this is typically a list with a single item of type "text"
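Because content is a list of typed blocks, robust code should filter for text blocks rather than blindly indexing [0], which breaks once other block types such as tool_use appear. A defensive sketch (the helper is ours) that accepts either SDK objects or plain dicts:

```python
def extract_text(blocks):
    """Concatenate the text of every "text" content block, skipping others."""
    parts = []
    for block in blocks:
        if isinstance(block, dict):
            # Raw JSON / dict-shaped blocks
            if block.get("type") == "text":
                parts.append(block.get("text", ""))
        elif getattr(block, "type", None) == "text":
            # SDK objects expose .type and .text attributes
            parts.append(block.text)
    return "".join(parts)
```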

Adding a System Prompt

A system prompt is a set of instructions you give Claude before the conversation begins. It shapes Claude's persona, role, constraints, and behaviour throughout the entire conversation. It is the most powerful tool you have for customising how Claude responds.

Python

python
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a senior software engineer with 15 years of experience. "
           "Give concise, practical answers. Prefer code examples over long explanations. "
           "Always mention potential pitfalls and edge cases.",
    messages=[
        {"role": "user", "content": "How should I handle API timeouts in Python?"}
    ]
)

print(message.content[0].text)

JavaScript

javascript
const message = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  system:
    "You are a senior software engineer with 15 years of experience. " +
    "Give concise, practical answers. Prefer code examples over long explanations. " +
    "Always mention potential pitfalls and edge cases.",
  messages: [
    { role: "user", content: "How should I handle API timeouts in Python?" },
  ],
});

console.log(message.content[0].text);

The system prompt is passed as a top-level string parameter — separate from the messages array. Claude treats it as authoritative instructions that override or supplement what the user asks.

System Prompts are Your Most Powerful Tool

A well-crafted system prompt transforms Claude from a general assistant into a specialised tool. A system prompt can establish Claude's persona, define its scope (what topics it will and will not address), set the response format, specify the audience, and provide background context that should inform every conversation. We cover this in depth in the Prompt Engineering module.
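Those components (persona, scope, format, audience, context) can be kept as separate strings and joined at call time. A minimal, purely illustrative helper (the function and parameter names are ours; the API itself just takes one string):

```python
def build_system_prompt(persona="", scope="", response_format="",
                        audience="", context=""):
    """Join non-empty system-prompt components into a single string.

    The API accepts one string for `system`; this only keeps the
    pieces organised and easy to vary independently.
    """
    parts = [p for p in (persona, scope, response_format, audience, context) if p]
    return " ".join(parts)
```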


Multi-Turn Conversations

To continue a conversation across multiple turns, you include the full message history in each API call. The API itself is stateless — it does not remember previous calls. You are responsible for maintaining the conversation history on your side and sending it with each request.

Python

python
conversation_history = []

# Turn 1
conversation_history.append({
    "role": "user",
    "content": "I want to learn Python. Where should I start?"
})

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=conversation_history
)

# Add Claude's response to history
conversation_history.append({
    "role": "assistant",
    "content": response.content[0].text
})

# Turn 2 — Claude remembers the conversation
conversation_history.append({
    "role": "user",
    "content": "How long will that take if I study two hours per day?"
})

response2 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=conversation_history
)

print(response2.content[0].text)

This pattern — append user message, call API, append assistant response, repeat — is the foundation of every conversational AI application.
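The append/call/append loop is worth wrapping once so no step can be forgotten. A minimal sketch with the API call injected as a callable, which keeps the history logic testable without the network (the helper is ours, not part of the SDK):

```python
def chat_turn(history, user_text, send):
    """Run one conversational turn, mutating `history` in place.

    `send` takes the full message list and returns the assistant's
    text; in real use it would wrap client.messages.create and return
    response.content[0].text.
    """
    history.append({"role": "user", "content": user_text})
    reply = send(history)
    history.append({"role": "assistant", "content": reply})
    return reply
```

A real chat loop then reduces to calling chat_turn once per user input with the same history list.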


Streaming Responses

By default, the API waits until Claude has finished generating the entire response before returning it. For short responses this is fine. For longer responses, it means your user stares at a blank screen for several seconds.

Streaming sends each piece of Claude's response as it is generated, so you can display text to the user word by word — just as Claude.ai renders responses.

Python Streaming

python
with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a short paragraph about AI safety."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

print()  # New line at the end

JavaScript Streaming

javascript
const stream = client.messages.stream({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "Write a short paragraph about AI safety." },
  ],
});

for await (const chunk of stream) {
  if (
    chunk.type === "content_block_delta" &&
    chunk.delta.type === "text_delta"
  ) {
    process.stdout.write(chunk.delta.text);
  }
}
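When streaming, you usually still need the complete text afterwards, for example to append it to the conversation history. A small accumulator sketch; the chunks argument stands in for the SDK's text_stream iterator shown above:

```python
def collect_stream(chunks, on_text=None):
    """Forward each streamed piece to a callback and return the full text.

    `chunks` is any iterable of text fragments, e.g. stream.text_stream
    from the Python SDK.
    """
    parts = []
    for text in chunks:
        if on_text is not None:
            on_text(text)  # e.g. lambda t: print(t, end="", flush=True)
        parts.append(text)
    return "".join(parts)
```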

Error Handling

A production-ready integration must handle errors gracefully. The Anthropic SDK throws specific error types that you can catch and handle appropriately.

Python Error Handling

python
import anthropic

client = anthropic.Anthropic()

try:
    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(message.content[0].text)

except anthropic.AuthenticationError:
    print("Invalid API key. Check your ANTHROPIC_API_KEY environment variable.")

except anthropic.RateLimitError:
    print("Rate limit exceeded. Wait a moment before retrying.")

except anthropic.APIStatusError as e:
    print(f"API error {e.status_code}: {e.message}")

Common Error Types

  • AuthenticationError (401): Your API key is missing, expired, or invalid
  • PermissionDeniedError (403): Your account does not have access to the requested model or feature
  • RateLimitError (429): You have exceeded your requests-per-minute or tokens-per-minute limit — implement exponential backoff retry logic
  • InternalServerError (500): A temporary Anthropic server issue — retry with backoff
  • APIConnectionError: Network issue between your server and Anthropic's API
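For the retryable errors above (429 and 500), the standard pattern is exponential backoff with jitter. A generic sketch with the call injected as a callable; in real use, retryable would be something like (anthropic.RateLimitError, anthropic.InternalServerError):

```python
import random
import time

def with_backoff(call, max_attempts=5, base_delay=1.0, retryable=(Exception,)):
    """Call `call()` and retry on `retryable` errors, doubling the delay."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # Exponential backoff with jitter proportional to base_delay.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

Jitter spreads retries out so that many clients rate-limited at the same moment do not all retry in lockstep.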

The SDK Has Built-In Retry Logic

The official Anthropic SDKs automatically retry transient errors like 429 and 500 with exponential backoff by default. You can configure the number of retries and the backoff parameters. For most applications, the default retry behaviour is sufficient without writing custom retry logic yourself.
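In the Python SDK, the retry count is set on the client via max_retries. A configuration sketch (check the SDK version you use, as option names and defaults can change between releases):

```python
import anthropic

# Raise the automatic retry count for all requests made by this client.
client = anthropic.Anthropic(max_retries=5)

# Or override it for a single request:
# client.with_options(max_retries=1).messages.create(...)
```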


Summary

You have now made your first Claude API call — and more importantly, you understand what every part of the request and response means. The patterns in this post — the messages array, the system prompt, multi-turn history management, streaming, and error handling — appear in every Claude integration you will ever build.

The next step is going deeper into the Messages API to understand all its parameters, how system prompts interact with conversation turns, and the full range of stop reasons and content types.

In our next post, we cover the Messages API in depth: Understanding the Messages API: Roles, Turns, and System Prompts.


This post is part of the Anthropic AI Tutorial Series. Don't forget to check out our previous post: How to Get Your Anthropic API Key and Set Up the Console.