
Deploy Claude on AWS Bedrock: A Production Setup Guide

TopicTrick

Calling the Anthropic API directly works perfectly for most projects. But enterprise organisations often have requirements that go beyond what a direct API key allows: AWS-managed billing, data residency guarantees, VPC-private inference with no public internet traffic, compliance logging via CloudWatch, and integration with existing AWS IAM governance.

AWS Bedrock solves all of these. It is Amazon's managed service for running foundation models — including Claude — inside your AWS account, with your existing security controls, without data leaving your VPC.

This guide takes you from zero to a production-ready Claude deployment on Bedrock: model activation, IAM policy, boto3 integration, VPC PrivateLink, CloudWatch observability, and a clear comparison to help you decide which approach is right for your workload.


AWS Bedrock vs Anthropic API: Which Should You Use?

  • Use the Anthropic API when you are building a SaaS product, a startup, or any workload where you control all the infrastructure and data residency is not a hard requirement
  • Use AWS Bedrock when your organisation has AWS-first procurement, when data must provably not leave a specific AWS region, when you need VPC-private inference, when you need integration with CloudWatch, S3, or AWS IAM, or when a central AWS bill is required

Bedrock pricing for Claude is identical to Anthropic's direct pricing per token. There is no Bedrock premium. The trade-off is configuration complexity versus governance.


Step 1: Enable Claude Models in the Bedrock Console

By default, foundation model access in Bedrock is disabled. You must request access for each model.

  1. Open the AWS Console and navigate to Amazon Bedrock
  2. In the left navigation, click Model Access
  3. Click Manage Model Access
  4. Find Anthropic in the provider list and tick the models you want: Claude Sonnet, Claude Haiku, Claude Opus
  5. Accept Anthropic's usage policy and click Save Changes
  6. Access is usually approved within minutes. Status will show Access granted

Model Access is Per Region

Bedrock model availability varies by AWS region. Claude is available in us-east-1 (N. Virginia), us-west-2 (Oregon), eu-west-1 (Ireland), eu-central-1 (Frankfurt), ap-southeast-1 (Singapore), and others. Check the Bedrock documentation for the current availability matrix. You must enable models separately in each region you plan to use.
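Before wiring up the application, it is worth confirming programmatically that access was actually granted in the region you intend to use. A minimal sketch with boto3 (the helper and function names here are my own, not part of the Bedrock API):

```python
def extract_model_ids(response: dict) -> list:
    """Pull the model IDs out of a ListFoundationModels response payload."""
    return [m["modelId"] for m in response.get("modelSummaries", [])]


def list_claude_models(region: str = "us-east-1") -> list:
    """List the Anthropic model IDs visible to this account in one region.

    Requires the bedrock:ListFoundationModels permission from Step 2.
    """
    import boto3  # imported here so extract_model_ids stays dependency-free
    # Note: "bedrock" is the control-plane client, not "bedrock-runtime"
    bedrock = boto3.client("bedrock", region_name=region)
    return extract_model_ids(bedrock.list_foundation_models(byProvider="Anthropic"))
```

If a model you ticked in the console is missing from the output, you are most likely querying a different region than the one you enabled it in.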


Step 2: IAM Policy for Bedrock Inference

Create a least-privilege IAM policy that allows only the Bedrock inference actions your application needs.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BedrockInvokeModels",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-sonnet-4-6-20250514-v1:0",
        "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-haiku-4-5-20251001-v1:0"
      ]
    },
    {
      "Sid": "BedrockListModels",
      "Effect": "Allow",
      "Action": [
        "bedrock:ListFoundationModels",
        "bedrock:GetFoundationModel"
      ],
      "Resource": "*"
    }
  ]
}
```

Attach this policy to:

  • An IAM role used by your EC2 instances, ECS tasks, or Lambda functions — using instance profiles or task roles, not access keys
  • An IAM user only for local development — never in production
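If you manage IAM in code rather than the console, the same policy can be generated and attached as an inline role policy. A sketch with boto3 (the role and policy names are placeholders, and the function names are my own):

```python
import json


def build_bedrock_invoke_policy(model_arns: list) -> dict:
    """Reproduce the invoke statement from the policy above for a given set of model ARNs."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "BedrockInvokeModels",
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream"
            ],
            "Resource": model_arns
        }]
    }


def attach_bedrock_policy(role_name: str, model_arns: list) -> None:
    """Attach the policy inline to an existing IAM role."""
    import boto3  # imported here so build_bedrock_invoke_policy has no AWS dependency
    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName="BedrockInvokePolicy",  # placeholder name
        PolicyDocument=json.dumps(build_bedrock_invoke_policy(model_arns))
    )
```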

Step 3: boto3 Integration

The module below covers the three invocation patterns you will use most: standard, streaming, and tool use.

```python
import boto3
import json
from typing import Generator, Optional

# ─── Bedrock Client Setup ─────────────────────────────────────────────────────

def get_bedrock_client(region: str = "us-east-1"):
    """
    Create a Bedrock runtime client.

    When running on AWS (EC2/ECS/Lambda), credentials come automatically
    from the instance/task/execution role — no access keys needed.
    """
    return boto3.client(
        service_name="bedrock-runtime",
        region_name=region
    )


# Bedrock uses a different model ID format from the Anthropic SDK
MODEL_IDS = {
    "claude-sonnet": "anthropic.claude-sonnet-4-6-20250514-v1:0",
    "claude-haiku": "anthropic.claude-haiku-4-5-20251001-v1:0",
    "claude-opus": "anthropic.claude-opus-4-6-20250514-v1:0"
}


# ─── Standard Invocation ──────────────────────────────────────────────────────

def invoke_claude(
    prompt: str,
    system: Optional[str] = None,
    model_key: str = "claude-sonnet",
    max_tokens: int = 1024,
    region: str = "us-east-1"
) -> str:
    """
    Invoke Claude on Bedrock with a simple prompt.
    Returns the assistant's text response.
    """
    client = get_bedrock_client(region)

    # Build request body in Anthropic Messages API format
    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [
            {"role": "user", "content": prompt}
        ]
    }

    if system:
        request_body["system"] = system

    response = client.invoke_model(
        modelId=MODEL_IDS[model_key],
        contentType="application/json",
        accept="application/json",
        body=json.dumps(request_body)
    )

    response_body = json.loads(response["body"].read())
    return response_body["content"][0]["text"]


# ─── Streaming Invocation ─────────────────────────────────────────────────────

def invoke_claude_streaming(
    prompt: str,
    system: Optional[str] = None,
    model_key: str = "claude-sonnet",
    max_tokens: int = 2048,
    region: str = "us-east-1"
) -> Generator[str, None, None]:
    """
    Stream Claude's response token by token.
    Yields text chunks as they arrive.
    """
    client = get_bedrock_client(region)

    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [
            {"role": "user", "content": prompt}
        ]
    }

    if system:
        request_body["system"] = system

    response = client.invoke_model_with_response_stream(
        modelId=MODEL_IDS[model_key],
        contentType="application/json",
        accept="application/json",
        body=json.dumps(request_body)
    )

    # Parse the event stream
    for event in response["body"]:
        chunk = json.loads(event["chunk"]["bytes"].decode())

        if chunk.get("type") == "content_block_delta":
            delta = chunk.get("delta", {})
            if delta.get("type") == "text_delta":
                yield delta.get("text", "")


# ─── Tool Use on Bedrock ──────────────────────────────────────────────────────

def invoke_claude_with_tools(
    messages: list,
    tools: list,
    system: Optional[str] = None,
    model_key: str = "claude-sonnet",
    max_tokens: int = 4096,
    region: str = "us-east-1"
) -> dict:
    """
    Invoke Claude with tool definitions on Bedrock.
    Tool use works identically to the Anthropic SDK.
    """
    client = get_bedrock_client(region)

    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "tools": tools,
        "messages": messages
    }

    if system:
        request_body["system"] = system

    response = client.invoke_model(
        modelId=MODEL_IDS[model_key],
        contentType="application/json",
        accept="application/json",
        body=json.dumps(request_body)
    )

    return json.loads(response["body"].read())


# ─── Example Usage ────────────────────────────────────────────────────────────

if __name__ == "__main__":
    # Standard call
    answer = invoke_claude(
        prompt="Explain the difference between synchronous and asynchronous programming in two paragraphs.",
        system="You are a senior software engineer explaining concepts to junior developers.",
        model_key="claude-sonnet"
    )
    print(answer)

    # Streaming call
    print("\nStreaming response:")
    for chunk in invoke_claude_streaming("What is the OSI model?"):
        print(chunk, end="", flush=True)
    print()
```
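To close the tool-use loop, the response from `invoke_claude_with_tools` has to be inspected for `tool_use` blocks, the tools executed, and the results fed back as a `tool_result` message. The helpers below sketch that plumbing; the weather tool and all helper names are illustrative, but the block shapes follow the Anthropic Messages format the code above already uses.

```python
def extract_tool_calls(response_body: dict) -> list:
    """Return (name, input, tool_use_id) for every tool_use block in a response."""
    return [
        (block["name"], block["input"], block["id"])
        for block in response_body.get("content", [])
        if block.get("type") == "tool_use"
    ]


def build_tool_result_message(tool_use_id: str, result: str) -> dict:
    """Package a tool's output as the next user message in the conversation."""
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": result
        }]
    }


# Illustrative tool definition in Anthropic's input_schema format
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
    }
}
```

A typical loop: call `invoke_claude_with_tools`, and while the response's `stop_reason` is `"tool_use"`, execute each extracted call, append the assistant message and the tool results to `messages`, and call again.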

Step 4: VPC PrivateLink — Zero Public Internet Traffic

For strict data residency, configure a Bedrock VPC endpoint so inference traffic never leaves AWS's private network.

```bash
# Create VPC endpoint for Bedrock Runtime
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0abc1234def567890 \
  --service-name com.amazonaws.us-east-1.bedrock-runtime \
  --vpc-endpoint-type Interface \
  --subnet-ids subnet-0abc1234 subnet-0def5678 \
  --security-group-ids sg-0abc1234 \
  --private-dns-enabled
```

Once the endpoint is active, all boto3 Bedrock calls from EC2 or ECS resources in that VPC automatically route through the private endpoint. No code changes required — the DNS resolves to the private endpoint IP automatically.

Security Group Configuration

The security group attached to your Bedrock VPC endpoint must allow inbound HTTPS (port 443) from your application's security group. The application's security group must allow outbound HTTPS to the endpoint's security group. This ensures only your application's infrastructure can reach the private Bedrock endpoint.
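Those two rules can also be applied in code. A sketch with boto3 (security group IDs are placeholders; the helper names are my own):

```python
def https_from_group(source_sg_id: str) -> dict:
    """An IpPermissions entry for HTTPS (port 443) restricted to one security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "UserIdGroupPairs": [{"GroupId": source_sg_id}]
    }


def wire_endpoint_security_groups(endpoint_sg: str, app_sg: str,
                                  region: str = "us-east-1") -> None:
    """Allow the application SG to reach the endpoint SG over HTTPS."""
    import boto3  # imported here so https_from_group has no AWS dependency
    ec2 = boto3.client("ec2", region_name=region)
    # Endpoint SG: inbound 443 from the application's SG
    ec2.authorize_security_group_ingress(
        GroupId=endpoint_sg,
        IpPermissions=[https_from_group(app_sg)]
    )
    # Application SG: outbound 443 to the endpoint's SG
    ec2.authorize_security_group_egress(
        GroupId=app_sg,
        IpPermissions=[https_from_group(endpoint_sg)]
    )
```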


Step 5: CloudWatch Logging and Observability

Bedrock does not log inference inputs/outputs by default for privacy reasons, but you can enable model invocation logging to S3 or CloudWatch Logs.

```python
import boto3

bedrock_mgmt = boto3.client("bedrock", region_name="us-east-1")

# Enable model invocation logging
bedrock_mgmt.put_model_invocation_logging_configuration(
    loggingConfig={
        "cloudWatchConfig": {
            "logGroupName": "/aws/bedrock/model-invocations",
            "roleArn": "arn:aws:iam::123456789012:role/BedrockCloudWatchRole",
            "largeDataDeliveryS3Config": {
                "bucketName": "my-bedrock-logs-bucket",
                "keyPrefix": "bedrock-overflow/"
            }
        },
        "s3Config": {
            "bucketName": "my-bedrock-logs-bucket",
            "keyPrefix": "bedrock-full-logs/"
        },
        "textDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": False,
        "embeddingDataDeliveryEnabled": False
    }
)
```

Once enabled, every model invocation creates a log entry with:

  • Model ID, region, and timestamp
  • Input and output token counts
  • Request and response content (if text delivery is enabled — consider data sensitivity)
  • Latency and stop reason

Use these logs with CloudWatch Insights to build dashboards tracking cost per service, error rates, and latency percentiles.
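As a concrete starting point, a Logs Insights query over those fields can aggregate token usage per model. The field paths below reflect my reading of the invocation log schema — verify them against a real log entry before relying on the numbers; the function and log group names are placeholders:

```python
import time

# Assumed field paths in the Bedrock invocation log JSON
TOKEN_USAGE_QUERY = (
    "stats sum(input.inputTokenCount) as input_tokens, "
    "sum(output.outputTokenCount) as output_tokens by modelId"
)


def start_token_usage_query(log_group: str = "/aws/bedrock/model-invocations",
                            hours: int = 24,
                            region: str = "us-east-1") -> str:
    """Start a Logs Insights query over the last `hours`; returns the query ID."""
    import boto3  # imported here so the query string can be reused without AWS
    logs = boto3.client("logs", region_name=region)
    now = int(time.time())
    response = logs.start_query(
        logGroupName=log_group,
        startTime=now - hours * 3600,
        endTime=now,
        queryString=TOKEN_USAGE_QUERY
    )
    return response["queryId"]  # poll logs.get_query_results(queryId=...) for rows
```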


Cost Management

  • Bedrock pricing equals Anthropic API pricing — there is no additional cost for using the managed service
  • Use AWS Cost Allocation Tags on your IAM roles or via resource tagging to attribute Bedrock spend to individual teams or projects
  • Enable AWS Budgets alerts at $500 / $1,000 thresholds to prevent runaway inference costs during development
  • Use Claude Haiku for high-volume, lower-complexity tasks and Claude Sonnet for production-quality inference — do not default to Opus for everything
  • Provisioned Throughput pricing is available for organisations with consistent high-volume usage — you commit to capacity for a fixed term in exchange for discounted, guaranteed throughput

Use Bedrock Batch Inference for Large Workloads

Bedrock supports batch inference for large-scale, non-latency-sensitive workloads: submit a model invocation job that reads a file of prompts from S3 and writes the results back to S3 asynchronously. Batch inference is priced at 50% of on-demand inference, making it ideal for bulk document processing, dataset annotation, or nightly report generation pipelines.
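A sketch of what that pipeline looks like: serialise prompts as JSONL records, upload them to S3, then submit the job via `create_model_invocation_job`. The bucket, role ARN, and job name are placeholders, and the `recordId`/`modelInput` record shape should be checked against the current Bedrock batch documentation:

```python
import json


def build_batch_records(prompts: list, max_tokens: int = 1024) -> str:
    """Serialise prompts as the JSONL records a Bedrock batch job reads from S3."""
    lines = []
    for i, prompt in enumerate(prompts):
        lines.append(json.dumps({
            "recordId": f"rec-{i:06d}",
            "modelInput": {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}]
            }
        }))
    return "\n".join(lines)


def submit_batch_job(input_s3_uri: str, output_s3_uri: str, role_arn: str) -> str:
    """Submit the batch job; returns the job ARN."""
    import boto3  # imported here so build_batch_records has no AWS dependency
    bedrock = boto3.client("bedrock", region_name="us-east-1")
    response = bedrock.create_model_invocation_job(
        jobName="nightly-batch",  # placeholder
        roleArn=role_arn,
        modelId="anthropic.claude-haiku-4-5-20251001-v1:0",
        inputDataConfig={"s3InputDataConfig": {"s3Uri": input_s3_uri}},
        outputDataConfig={"s3OutputDataConfig": {"s3Uri": output_s3_uri}}
    )
    return response["jobArn"]
```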


Production Checklist

  • Model access enabled and confirmed in the correct region
  • IAM policy scoped to specific model ARNs — not a wildcard bedrock:*
  • Credentials come from instance/task role — no access keys in application code or environment variables
  • VPC endpoint configured for private inference if data residency is required
  • CloudWatch logging enabled with alerts on error rate exceeding 1%
  • AWS Budgets alert set for expected monthly Bedrock spend
  • Retry logic with exponential backoff for ThrottlingException and ServiceUnavailableException
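That last item can be handled two ways: boto3's built-in retry modes, or an explicit wrapper when you want custom backoff or logging. A sketch of the explicit version (function names are my own; full-jitter backoff is one common strategy, not the only one):

```python
import random
import time

RETRYABLE_ERRORS = {"ThrottlingException", "ServiceUnavailableException"}


def backoff_delays(max_attempts: int = 5, base: float = 1.0, cap: float = 30.0) -> list:
    """Full-jitter exponential backoff: each delay is uniform in [0, min(cap, base * 2^n)]."""
    return [random.uniform(0, min(cap, base * 2 ** n)) for n in range(max_attempts)]


def invoke_with_retry(invoke_fn, max_attempts: int = 5):
    """Run a zero-argument Bedrock call, retrying only on throttling-style errors."""
    from botocore.exceptions import ClientError
    for attempt, delay in enumerate(backoff_delays(max_attempts)):
        try:
            return invoke_fn()
        except ClientError as err:
            code = err.response["Error"]["Code"]
            if code not in RETRYABLE_ERRORS or attempt == max_attempts - 1:
                raise  # non-retryable, or out of attempts
            time.sleep(delay)
```

Alternatively, pass `botocore.config.Config(retries={"max_attempts": 10, "mode": "adaptive"})` when creating the bedrock-runtime client and let botocore handle the backoff for you.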

Summary

AWS Bedrock is the right choice when your organisation needs Claude's capabilities inside existing AWS governance — VPC-private traffic, IAM access control, AWS billing, and CloudWatch auditing.

  • The request and response bodies are identical to the Anthropic Messages API — switching from the Anthropic SDK to Bedrock mostly means swapping the client for boto3 and passing a Bedrock model ID
  • Model IDs follow the anthropic.claude-<model>-v<n>:0 format — check the Bedrock console for exact current identifiers
  • VPC PrivateLink ensures zero public internet egress for all model inference
  • Cost is identical to direct Anthropic API pricing — you gain governance, not additional cost

Final post in the series: Final Knowledge Test + Anthropic AI Series Recap: Are You Ready to Build?.


This post is part of the Anthropic AI Tutorial Series. Previous post: Project: Build a Data Analyst Agent — CSV Insights in Plain English.