
Deploy Claude on AWS Bedrock: A Production Setup Guide

TopicTrick

Calling the Anthropic API directly works perfectly for most projects. But enterprise organisations often have requirements that go beyond what a direct API key allows: AWS-managed billing, data residency guarantees, VPC-private inference with no public internet traffic, compliance logging via CloudWatch, and integration with existing AWS IAM governance.

AWS Bedrock solves all of these. It is Amazon's managed service for running foundation models — including Claude — inside your AWS account, with your existing security controls, without data leaving your VPC.

This guide takes you from zero to a production-ready Claude deployment on Bedrock: model activation, IAM policy, boto3 integration, VPC PrivateLink, CloudWatch observability, and a clear comparison to help you decide which approach is right for your workload.


AWS Bedrock vs Anthropic API: Which Should You Use?

  • Use the Anthropic API when you are building a SaaS product, a startup, or any workload where you control all the infrastructure and data residency is not a hard requirement
  • Use AWS Bedrock when your organisation has AWS-first procurement, when data must provably not leave a specific AWS region, when you need VPC-private inference, when you need integration with CloudWatch, S3, or AWS IAM, or when a central AWS bill is required

Bedrock pricing for Claude is identical to Anthropic's direct pricing per token. There is no Bedrock premium. The trade-off is configuration complexity versus governance.


Step 1: Enable Claude Models in the Bedrock Console

By default, foundation model access in Bedrock is disabled. You must request access for each model.

  1. Open the AWS Console and navigate to Amazon Bedrock
  2. In the left navigation, click Model Access
  3. Click Manage Model Access
  4. Find Anthropic in the provider list and tick the models you want: Claude Sonnet, Claude Haiku, Claude Opus
  5. Accept Anthropic's usage policy and click Save Changes
  6. Access is usually approved within minutes. Status will show Access granted

Model Access is Per Region

Bedrock model availability varies by AWS region. Claude is available in us-east-1 (N. Virginia), us-west-2 (Oregon), eu-west-1 (Ireland), eu-central-1 (Frankfurt), ap-southeast-1 (Singapore), and others. Check the Bedrock documentation for the current availability matrix. You must enable models separately in each region you plan to use.
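Before wiring up the application, it is worth confirming programmatically that access was actually granted in the region you intend to use. A minimal sketch with boto3 (the helper and function names here are my own, not part of the Bedrock API):

```python
def extract_model_ids(response: dict) -> list:
    """Pull the model IDs out of a ListFoundationModels response payload."""
    return [m["modelId"] for m in response.get("modelSummaries", [])]


def list_claude_models(region: str = "us-east-1") -> list:
    """List the Anthropic model IDs visible to this account in one region.

    Requires the bedrock:ListFoundationModels permission from Step 2.
    """
    import boto3  # imported here so extract_model_ids stays dependency-free
    # Note: "bedrock" is the control-plane client, not "bedrock-runtime"
    bedrock = boto3.client("bedrock", region_name=region)
    return extract_model_ids(bedrock.list_foundation_models(byProvider="Anthropic"))
```

If a model you ticked in the console is missing from the output, you are most likely querying a different region than the one you enabled it in.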


Step 2: IAM Policy for Bedrock Inference

Create a least-privilege IAM policy that allows only the Bedrock inference actions your application needs.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BedrockInvokeModels",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-sonnet-4-6-20250514-v1:0",
        "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-haiku-4-5-20251001-v1:0"
      ]
    },
    {
      "Sid": "BedrockListModels",
      "Effect": "Allow",
      "Action": [
        "bedrock:ListFoundationModels",
        "bedrock:GetFoundationModel"
      ],
      "Resource": "*"
    }
  ]
}
```

Attach this policy to:

  • An IAM role used by your EC2 instances, ECS tasks, or Lambda functions — using instance profiles or task roles, not access keys
  • An IAM user only for local development — never in production
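If you manage IAM in code rather than the console, the same policy can be generated and attached as an inline role policy. A sketch with boto3 (the role and policy names are placeholders, and the function names are my own):

```python
import json


def build_bedrock_invoke_policy(model_arns: list) -> dict:
    """Reproduce the invoke statement from the policy above for a given set of model ARNs."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "BedrockInvokeModels",
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream"
            ],
            "Resource": model_arns
        }]
    }


def attach_bedrock_policy(role_name: str, model_arns: list) -> None:
    """Attach the policy inline to an existing IAM role."""
    import boto3  # imported here so build_bedrock_invoke_policy has no AWS dependency
    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName="BedrockInvokePolicy",  # placeholder name
        PolicyDocument=json.dumps(build_bedrock_invoke_policy(model_arns))
    )
```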

Step 3: boto3 Integration

The module below covers the three invocation patterns you will use most: standard, streaming, and tool use.

```python
import boto3
import json
from typing import Generator, Optional

# ─── Bedrock Client Setup ─────────────────────────────────────────────────────

def get_bedrock_client(region: str = "us-east-1"):
    """
    Create a Bedrock runtime client.

    When running on AWS (EC2/ECS/Lambda), credentials come automatically
    from the instance/task/execution role — no access keys needed.
    """
    return boto3.client(
        service_name="bedrock-runtime",
        region_name=region
    )


# Bedrock uses a different model ID format from the Anthropic SDK
MODEL_IDS = {
    "claude-sonnet": "anthropic.claude-sonnet-4-6-20250514-v1:0",
    "claude-haiku": "anthropic.claude-haiku-4-5-20251001-v1:0",
    "claude-opus": "anthropic.claude-opus-4-6-20250514-v1:0"
}


# ─── Standard Invocation ──────────────────────────────────────────────────────

def invoke_claude(
    prompt: str,
    system: Optional[str] = None,
    model_key: str = "claude-sonnet",
    max_tokens: int = 1024,
    region: str = "us-east-1"
) -> str:
    """
    Invoke Claude on Bedrock with a simple prompt.
    Returns the assistant's text response.
    """
    client = get_bedrock_client(region)

    # Build request body in Anthropic Messages API format
    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [
            {"role": "user", "content": prompt}
        ]
    }

    if system:
        request_body["system"] = system

    response = client.invoke_model(
        modelId=MODEL_IDS[model_key],
        contentType="application/json",
        accept="application/json",
        body=json.dumps(request_body)
    )

    response_body = json.loads(response["body"].read())
    return response_body["content"][0]["text"]


# ─── Streaming Invocation ─────────────────────────────────────────────────────

def invoke_claude_streaming(
    prompt: str,
    system: Optional[str] = None,
    model_key: str = "claude-sonnet",
    max_tokens: int = 2048,
    region: str = "us-east-1"
) -> Generator[str, None, None]:
    """
    Stream Claude's response token by token.
    Yields text chunks as they arrive.
    """
    client = get_bedrock_client(region)

    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [
            {"role": "user", "content": prompt}
        ]
    }

    if system:
        request_body["system"] = system

    response = client.invoke_model_with_response_stream(
        modelId=MODEL_IDS[model_key],
        contentType="application/json",
        accept="application/json",
        body=json.dumps(request_body)
    )

    # Parse the event stream
    for event in response["body"]:
        chunk = json.loads(event["chunk"]["bytes"].decode())

        if chunk.get("type") == "content_block_delta":
            delta = chunk.get("delta", {})
            if delta.get("type") == "text_delta":
                yield delta.get("text", "")


# ─── Tool Use on Bedrock ──────────────────────────────────────────────────────

def invoke_claude_with_tools(
    messages: list,
    tools: list,
    system: Optional[str] = None,
    model_key: str = "claude-sonnet",
    max_tokens: int = 4096,
    region: str = "us-east-1"
) -> dict:
    """
    Invoke Claude with tool definitions on Bedrock.
    Tool use works identically to the Anthropic SDK.
    """
    client = get_bedrock_client(region)

    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "tools": tools,
        "messages": messages
    }

    if system:
        request_body["system"] = system

    response = client.invoke_model(
        modelId=MODEL_IDS[model_key],
        contentType="application/json",
        accept="application/json",
        body=json.dumps(request_body)
    )

    return json.loads(response["body"].read())


# ─── Example Usage ────────────────────────────────────────────────────────────

if __name__ == "__main__":
    # Standard call
    answer = invoke_claude(
        prompt="Explain the difference between synchronous and asynchronous programming in two paragraphs.",
        system="You are a senior software engineer explaining concepts to junior developers.",
        model_key="claude-sonnet"
    )
    print(answer)

    # Streaming call
    print("\nStreaming response:")
    for chunk in invoke_claude_streaming("What is the OSI model?"):
        print(chunk, end="", flush=True)
    print()
```
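To close the tool-use loop, the response from `invoke_claude_with_tools` has to be inspected for `tool_use` blocks, the tools executed, and the results fed back as a `tool_result` message. The helpers below sketch that plumbing; the weather tool and all helper names are illustrative, but the block shapes follow the Anthropic Messages format the code above already uses.

```python
def extract_tool_calls(response_body: dict) -> list:
    """Return (name, input, tool_use_id) for every tool_use block in a response."""
    return [
        (block["name"], block["input"], block["id"])
        for block in response_body.get("content", [])
        if block.get("type") == "tool_use"
    ]


def build_tool_result_message(tool_use_id: str, result: str) -> dict:
    """Package a tool's output as the next user message in the conversation."""
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": result
        }]
    }


# Illustrative tool definition in Anthropic's input_schema format
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
    }
}
```

A typical loop: call `invoke_claude_with_tools`, and while the response's `stop_reason` is `"tool_use"`, execute each extracted call, append the assistant message and the tool results to `messages`, and call again.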

Step 4: VPC PrivateLink — Zero Public Internet Traffic

For strict data residency, configure a Bedrock VPC endpoint so inference traffic never leaves AWS's private network.

```bash
# Create VPC endpoint for Bedrock Runtime
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0abc1234def567890 \
  --service-name com.amazonaws.us-east-1.bedrock-runtime \
  --vpc-endpoint-type Interface \
  --subnet-ids subnet-0abc1234 subnet-0def5678 \
  --security-group-ids sg-0abc1234 \
  --private-dns-enabled
```

Once the endpoint is active, all boto3 Bedrock calls from EC2 or ECS resources in that VPC automatically route through the private endpoint. No code changes required — the DNS resolves to the private endpoint IP automatically.

Security Group Configuration

The security group attached to your Bedrock VPC endpoint must allow inbound HTTPS (port 443) from your application's security group. The application's security group must allow outbound HTTPS to the endpoint's security group. This ensures only your application's infrastructure can reach the private Bedrock endpoint.
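Those two rules can also be applied in code. A sketch with boto3 (security group IDs are placeholders; the helper names are my own):

```python
def https_from_group(source_sg_id: str) -> dict:
    """An IpPermissions entry for HTTPS (port 443) restricted to one security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "UserIdGroupPairs": [{"GroupId": source_sg_id}]
    }


def wire_endpoint_security_groups(endpoint_sg: str, app_sg: str,
                                  region: str = "us-east-1") -> None:
    """Allow the application SG to reach the endpoint SG over HTTPS."""
    import boto3  # imported here so https_from_group has no AWS dependency
    ec2 = boto3.client("ec2", region_name=region)
    # Endpoint SG: inbound 443 from the application's SG
    ec2.authorize_security_group_ingress(
        GroupId=endpoint_sg,
        IpPermissions=[https_from_group(app_sg)]
    )
    # Application SG: outbound 443 to the endpoint's SG
    ec2.authorize_security_group_egress(
        GroupId=app_sg,
        IpPermissions=[https_from_group(endpoint_sg)]
    )
```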


Step 5: CloudWatch Logging and Observability

Bedrock does not log inference inputs/outputs by default for privacy reasons, but you can enable model invocation logging to S3 or CloudWatch Logs.

```python
import boto3

bedrock_mgmt = boto3.client("bedrock", region_name="us-east-1")

# Enable model invocation logging
bedrock_mgmt.put_model_invocation_logging_configuration(
    loggingConfig={
        "cloudWatchConfig": {
            "logGroupName": "/aws/bedrock/model-invocations",
            "roleArn": "arn:aws:iam::123456789012:role/BedrockCloudWatchRole",
            "largeDataDeliveryS3Config": {
                "bucketName": "my-bedrock-logs-bucket",
                "keyPrefix": "bedrock-overflow/"
            }
        },
        "s3Config": {
            "bucketName": "my-bedrock-logs-bucket",
            "keyPrefix": "bedrock-full-logs/"
        },
        "textDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": False,
        "embeddingDataDeliveryEnabled": False
    }
)
```

Once enabled, every model invocation creates a log entry with:

  • Model ID, region, and timestamp
  • Input and output token counts
  • Request and response content (if text delivery is enabled — consider data sensitivity)
  • Latency and stop reason

Use these logs with CloudWatch Insights to build dashboards tracking cost per service, error rates, and latency percentiles.
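As a concrete starting point, a Logs Insights query over those fields can aggregate token usage per model. The field paths below reflect my reading of the invocation log schema — verify them against a real log entry before relying on the numbers; the function and log group names are placeholders:

```python
import time

# Assumed field paths in the Bedrock invocation log JSON
TOKEN_USAGE_QUERY = (
    "stats sum(input.inputTokenCount) as input_tokens, "
    "sum(output.outputTokenCount) as output_tokens by modelId"
)


def start_token_usage_query(log_group: str = "/aws/bedrock/model-invocations",
                            hours: int = 24,
                            region: str = "us-east-1") -> str:
    """Start a Logs Insights query over the last `hours`; returns the query ID."""
    import boto3  # imported here so the query string can be reused without AWS
    logs = boto3.client("logs", region_name=region)
    now = int(time.time())
    response = logs.start_query(
        logGroupName=log_group,
        startTime=now - hours * 3600,
        endTime=now,
        queryString=TOKEN_USAGE_QUERY
    )
    return response["queryId"]  # poll logs.get_query_results(queryId=...) for rows
```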


Cost Management

  • Bedrock pricing equals Anthropic API pricing — there is no additional cost for using the managed service
  • Use AWS Cost Allocation Tags on your IAM roles or via resource tagging to attribute Bedrock spend to individual teams or projects
  • Enable AWS Budgets alerts at $500 / $1,000 thresholds to prevent runaway inference costs during development
  • Use Claude Haiku for high-volume, lower-complexity tasks and Claude Sonnet for production-quality inference — do not default to Opus for everything
  • Provisioned Throughput pricing is available for organisations with consistent high-volume usage — you commit to capacity for a fixed term in exchange for discounted, guaranteed throughput

Use Bedrock Batch Inference for Large Workloads

Bedrock supports batch inference for large-scale, non-latency-sensitive workloads: submit a model invocation job that reads a file of prompts from S3 and writes the results back to S3 asynchronously. Batch inference is priced at 50% of on-demand inference, making it ideal for bulk document processing, dataset annotation, or nightly report generation pipelines.
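A sketch of what that pipeline looks like: serialise prompts as JSONL records, upload them to S3, then submit the job via `create_model_invocation_job`. The bucket, role ARN, and job name are placeholders, and the `recordId`/`modelInput` record shape should be checked against the current Bedrock batch documentation:

```python
import json


def build_batch_records(prompts: list, max_tokens: int = 1024) -> str:
    """Serialise prompts as the JSONL records a Bedrock batch job reads from S3."""
    lines = []
    for i, prompt in enumerate(prompts):
        lines.append(json.dumps({
            "recordId": f"rec-{i:06d}",
            "modelInput": {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}]
            }
        }))
    return "\n".join(lines)


def submit_batch_job(input_s3_uri: str, output_s3_uri: str, role_arn: str) -> str:
    """Submit the batch job; returns the job ARN."""
    import boto3  # imported here so build_batch_records has no AWS dependency
    bedrock = boto3.client("bedrock", region_name="us-east-1")
    response = bedrock.create_model_invocation_job(
        jobName="nightly-batch",  # placeholder
        roleArn=role_arn,
        modelId="anthropic.claude-haiku-4-5-20251001-v1:0",
        inputDataConfig={"s3InputDataConfig": {"s3Uri": input_s3_uri}},
        outputDataConfig={"s3OutputDataConfig": {"s3Uri": output_s3_uri}}
    )
    return response["jobArn"]
```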


Production Checklist

  • Model access enabled and confirmed in the correct region
  • IAM policy scoped to specific model ARNs — not a wildcard bedrock:*
  • Credentials come from instance/task role — no access keys in application code or environment variables
  • VPC endpoint configured for private inference if data residency is required
  • CloudWatch logging enabled with alerts on error rate exceeding 1%
  • AWS Budgets alert set for expected monthly Bedrock spend
  • Retry logic with exponential backoff for ThrottlingException and ServiceUnavailableException
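That last item can be handled two ways: boto3's built-in retry modes, or an explicit wrapper when you want custom backoff or logging. A sketch of the explicit version (function names are my own; full-jitter backoff is one common strategy, not the only one):

```python
import random
import time

RETRYABLE_ERRORS = {"ThrottlingException", "ServiceUnavailableException"}


def backoff_delays(max_attempts: int = 5, base: float = 1.0, cap: float = 30.0) -> list:
    """Full-jitter exponential backoff: each delay is uniform in [0, min(cap, base * 2^n)]."""
    return [random.uniform(0, min(cap, base * 2 ** n)) for n in range(max_attempts)]


def invoke_with_retry(invoke_fn, max_attempts: int = 5):
    """Run a zero-argument Bedrock call, retrying only on throttling-style errors."""
    from botocore.exceptions import ClientError
    for attempt, delay in enumerate(backoff_delays(max_attempts)):
        try:
            return invoke_fn()
        except ClientError as err:
            code = err.response["Error"]["Code"]
            if code not in RETRYABLE_ERRORS or attempt == max_attempts - 1:
                raise  # non-retryable, or out of attempts
            time.sleep(delay)
```

Alternatively, pass `botocore.config.Config(retries={"max_attempts": 10, "mode": "adaptive"})` when creating the bedrock-runtime client and let botocore handle the backoff for you.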

Summary

AWS Bedrock is the right choice when your organisation needs Claude's capabilities inside existing AWS governance — VPC-private traffic, IAM access control, AWS billing, and CloudWatch auditing.

  • The request and response bodies are identical to the Anthropic Messages API — switching from the Anthropic SDK to Bedrock mostly means swapping the client for boto3 and passing a Bedrock model ID
  • Model IDs follow the anthropic.claude-<model>-v<n>:0 format — check the Bedrock console for exact current identifiers
  • VPC PrivateLink ensures zero public internet egress for all model inference
  • Cost is identical to direct Anthropic API pricing — you gain governance, not additional cost

Final post in the series: Final Knowledge Test + Anthropic AI Series Recap: Are You Ready to Build?.


This post is part of the Anthropic AI Tutorial Series. Previous post: Project: Build a Data Analyst Agent — CSV Insights in Plain English.