Deploy Claude on AWS Bedrock: A Production Setup Guide

Calling the Anthropic API directly works well for most projects. But enterprise organisations often have requirements that go beyond what a direct API key allows: AWS-managed billing, data residency guarantees, VPC-private inference with no public internet traffic, compliance logging via CloudWatch, and integration with existing AWS IAM governance.
AWS Bedrock solves all of these. It is Amazon's managed service for running foundation models — including Claude — inside your AWS account, with your existing security controls, without data leaving your VPC.
This guide takes you from zero to a production-ready Claude deployment on Bedrock: model activation, IAM policy, boto3 integration, VPC PrivateLink, CloudWatch observability, and a clear comparison to help you decide which approach is right for your workload.
AWS Bedrock vs Anthropic API: Which Should You Use?
- Use the Anthropic API when you are building a SaaS product, a startup, or any workload where you control all the infrastructure and data residency is not a hard requirement
- Use AWS Bedrock when your organisation has AWS-first procurement, when data must provably not leave a specific AWS region, when you need VPC-private inference, when you need integration with CloudWatch, S3, or AWS IAM, or when a central AWS bill is required
Bedrock pricing for Claude is identical to Anthropic's direct pricing per token. There is no Bedrock premium. The trade-off is configuration complexity versus governance.
Step 1: Enable Claude Models in the Bedrock Console
By default, foundation model access in Bedrock is disabled. You must request access for each model.
- Open the AWS Console and navigate to Amazon Bedrock
- In the left navigation, click Model Access
- Click Manage Model Access
- Find Anthropic in the provider list and tick the models you want: Claude Sonnet, Claude Haiku, Claude Opus
- Accept Anthropic's usage policy and click Save Changes
- Access is usually approved within minutes. Status will show Access granted
Model Access is Per Region
Bedrock model availability varies by AWS region. Claude is available in us-east-1 (N. Virginia), us-west-2 (Oregon), eu-west-1 (Ireland), eu-central-1 (Frankfurt), ap-southeast-1 (Singapore), and others. Check the Bedrock documentation for the current availability matrix. You must enable models separately in each region you plan to use.
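Once access shows as granted, you can confirm from code which Claude models are visible in a region. The sketch below is illustrative, not an official utility: `granted_claude_model_ids` and `list_available_claude_models` are hypothetical helper names, and the filtering is pure Python so it can be exercised without AWS credentials.

```python
def granted_claude_model_ids(model_summaries: list[dict]) -> list[str]:
    """Filter a list_foundation_models response down to active Claude model IDs."""
    return [
        m["modelId"]
        for m in model_summaries
        if m.get("providerName") == "Anthropic"
        and m.get("modelLifecycle", {}).get("status") == "ACTIVE"
    ]


def list_available_claude_models(region: str = "us-east-1") -> list[str]:
    """Query the Bedrock control plane ("bedrock", not "bedrock-runtime")
    for the active Claude model IDs visible in this region.
    Requires AWS credentials with bedrock:ListFoundationModels."""
    import boto3  # deferred so the pure helper above needs no AWS deps

    bedrock = boto3.client("bedrock", region_name=region)
    response = bedrock.list_foundation_models(byProvider="anthropic")
    return granted_claude_model_ids(response["modelSummaries"])
```

A model appearing in the list confirms regional availability; the definitive access check is a small InvokeModel call.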
Step 2: IAM Policy for Bedrock Inference
Create a least-privilege IAM policy that allows only the Bedrock inference actions your application needs.
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BedrockInvokeModels",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-sonnet-4-6-20250514-v1:0",
        "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-haiku-4-5-20251001-v1:0"
      ]
    },
    {
      "Sid": "BedrockListModels",
      "Effect": "Allow",
      "Action": [
        "bedrock:ListFoundationModels",
        "bedrock:GetFoundationModel"
      ],
      "Resource": "*"
    }
  ]
}
```

Attach this policy to:
- An IAM role used by your EC2 instances, ECS tasks, or Lambda functions — using instance profiles or task roles, not access keys
- An IAM user only for local development — never in production
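If you manage the policy in code, the model ARNs can be generated per region instead of hand-edited. A minimal sketch; `bedrock_invoke_policy` is a hypothetical helper, and the model IDs are the ones used in this guide:

```python
import json


def bedrock_invoke_policy(region: str, model_ids: list[str]) -> dict:
    """Build a least-privilege policy document scoped to specific model ARNs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "BedrockInvokeModels",
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                # Foundation-model ARNs have an empty account-ID field
                "Resource": [
                    f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
                    for model_id in model_ids
                ],
            }
        ],
    }


# Render the policy JSON ready for `aws iam create-policy`
policy_json = json.dumps(
    bedrock_invoke_policy("us-east-1", ["anthropic.claude-haiku-4-5-20251001-v1:0"]),
    indent=2,
)
```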
Step 3: boto3 Integration
```python
import json
from typing import Generator, Optional

import boto3

# ─── Bedrock Client Setup ──────────────────────────────────────────────────────

def get_bedrock_client(region: str = "us-east-1"):
    """
    Create a Bedrock runtime client.

    When running on AWS (EC2/ECS/Lambda), credentials come automatically
    from the instance/task/execution role — no access keys needed.
    """
    return boto3.client(
        service_name="bedrock-runtime",
        region_name=region
    )


# Bedrock uses a different model ID format from the Anthropic SDK
MODEL_IDS = {
    "claude-sonnet": "anthropic.claude-sonnet-4-6-20250514-v1:0",
    "claude-haiku": "anthropic.claude-haiku-4-5-20251001-v1:0",
    "claude-opus": "anthropic.claude-opus-4-6-20250514-v1:0"
}


# ─── Standard Invocation ──────────────────────────────────────────────────────

def invoke_claude(
    prompt: str,
    system: Optional[str] = None,
    model_key: str = "claude-sonnet",
    max_tokens: int = 1024,
    region: str = "us-east-1"
) -> str:
    """
    Invoke Claude on Bedrock with a simple prompt.
    Returns the assistant's text response.
    """
    client = get_bedrock_client(region)

    # Build the request body in Anthropic Messages API format
    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [
            {"role": "user", "content": prompt}
        ]
    }

    if system:
        request_body["system"] = system

    response = client.invoke_model(
        modelId=MODEL_IDS[model_key],
        contentType="application/json",
        accept="application/json",
        body=json.dumps(request_body)
    )

    response_body = json.loads(response["body"].read())
    return response_body["content"][0]["text"]


# ─── Streaming Invocation ─────────────────────────────────────────────────────

def invoke_claude_streaming(
    prompt: str,
    system: Optional[str] = None,
    model_key: str = "claude-sonnet",
    max_tokens: int = 2048,
    region: str = "us-east-1"
) -> Generator[str, None, None]:
    """
    Stream Claude's response token by token.
    Yields text chunks as they arrive.
    """
    client = get_bedrock_client(region)

    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [
            {"role": "user", "content": prompt}
        ]
    }

    if system:
        request_body["system"] = system

    response = client.invoke_model_with_response_stream(
        modelId=MODEL_IDS[model_key],
        contentType="application/json",
        accept="application/json",
        body=json.dumps(request_body)
    )

    # Parse the event stream
    for event in response["body"]:
        chunk = json.loads(event["chunk"]["bytes"].decode())

        if chunk.get("type") == "content_block_delta":
            delta = chunk.get("delta", {})
            if delta.get("type") == "text_delta":
                yield delta.get("text", "")


# ─── Tool Use on Bedrock ──────────────────────────────────────────────────────

def invoke_claude_with_tools(
    messages: list,
    tools: list,
    system: Optional[str] = None,
    model_key: str = "claude-sonnet",
    max_tokens: int = 4096,
    region: str = "us-east-1"
) -> dict:
    """
    Invoke Claude with tool definitions on Bedrock.
    Tool use works identically to the Anthropic SDK.
    """
    client = get_bedrock_client(region)

    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "tools": tools,
        "messages": messages
    }

    if system:
        request_body["system"] = system

    response = client.invoke_model(
        modelId=MODEL_IDS[model_key],
        contentType="application/json",
        accept="application/json",
        body=json.dumps(request_body)
    )

    return json.loads(response["body"].read())


# ─── Example Usage ─────────────────────────────────────────────────────────────

if __name__ == "__main__":
    # Standard call
    answer = invoke_claude(
        prompt="Explain the difference between synchronous and asynchronous programming in two paragraphs.",
        system="You are a senior software engineer explaining concepts to junior developers.",
        model_key="claude-sonnet"
    )
    print(answer)

    # Streaming call
    print("\nStreaming response:")
    for chunk in invoke_claude_streaming("What is the OSI model?"):
        print(chunk, end="", flush=True)
    print()
```

Step 4: VPC PrivateLink — Zero Public Internet Traffic
For strict data residency, configure a Bedrock VPC endpoint so inference traffic never leaves AWS's private network.
```shell
# Create a VPC endpoint for Bedrock Runtime
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0abc1234def567890 \
  --service-name com.amazonaws.us-east-1.bedrock-runtime \
  --vpc-endpoint-type Interface \
  --subnet-ids subnet-0abc1234 subnet-0def5678 \
  --security-group-ids sg-0abc1234 \
  --private-dns-enabled
```

Once the endpoint is active, all boto3 Bedrock calls from EC2 or ECS resources in that VPC automatically route through the private endpoint. No code changes are required: private DNS resolves the Bedrock hostname to the endpoint's private IP.
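To sanity-check private DNS from inside the VPC, resolve the Bedrock runtime hostname and confirm the IP lands in your VPC's CIDR. A minimal sketch with hypothetical helper names; substitute your own CIDR:

```python
import ipaddress
import socket


def ip_in_cidr(ip: str, cidr: str) -> bool:
    """True if an IP address falls inside the given CIDR block."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


def bedrock_resolves_privately(vpc_cidr: str, region: str = "us-east-1") -> bool:
    """Resolve the Bedrock runtime hostname and report whether it resolves
    to an address inside the VPC CIDR. Run from an instance inside the VPC."""
    ip = socket.gethostbyname(f"bedrock-runtime.{region}.amazonaws.com")
    return ip_in_cidr(ip, vpc_cidr)
```

From outside the VPC the hostname resolves to a public AWS IP, so expect `False` there; inside the VPC with private DNS enabled it should resolve to an endpoint ENI address in your CIDR.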
Security Group Configuration
The security group attached to your Bedrock VPC endpoint must allow inbound HTTPS (port 443) from your application's security group. The application's security group must allow outbound HTTPS to the endpoint's security group. This ensures only your application's infrastructure can reach the private Bedrock endpoint.
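The inbound rule can also be applied in code. A hedged sketch using boto3's EC2 client; the function names and security group IDs are illustrative placeholders:

```python
def https_between_sgs_permission(source_sg_id: str) -> list[dict]:
    """IpPermissions allowing HTTPS (port 443) only from one security group."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": 443,
            "ToPort": 443,
            "UserIdGroupPairs": [{"GroupId": source_sg_id}],
        }
    ]


def allow_app_to_endpoint(endpoint_sg_id: str, app_sg_id: str,
                          region: str = "us-east-1") -> None:
    """Add the inbound HTTPS rule on the endpoint's security group."""
    import boto3  # deferred so the helper above stays testable offline

    ec2 = boto3.client("ec2", region_name=region)
    ec2.authorize_security_group_ingress(
        GroupId=endpoint_sg_id,
        IpPermissions=https_between_sgs_permission(app_sg_id),
    )
```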
Step 5: CloudWatch Logging and Observability
Bedrock does not log inference inputs/outputs by default for privacy reasons, but you can enable model invocation logging to S3 or CloudWatch Logs.
```python
import boto3

bedrock_mgmt = boto3.client("bedrock", region_name="us-east-1")

# Enable model invocation logging
bedrock_mgmt.put_model_invocation_logging_configuration(
    loggingConfig={
        "cloudWatchConfig": {
            "logGroupName": "/aws/bedrock/model-invocations",
            "roleArn": "arn:aws:iam::123456789012:role/BedrockCloudWatchRole",
            "largeDataDeliveryS3Config": {
                "bucketName": "my-bedrock-logs-bucket",
                "keyPrefix": "bedrock-overflow/"
            }
        },
        "s3Config": {
            "bucketName": "my-bedrock-logs-bucket",
            "keyPrefix": "bedrock-full-logs/"
        },
        "textDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": False,
        "embeddingDataDeliveryEnabled": False
    }
)
```

Once enabled, every model invocation creates a log entry with:
- Model ID, region, and timestamp
- Input and output token counts
- Request and response content (if text delivery is enabled — consider data sensitivity)
- Latency and stop reason
Use these logs with CloudWatch Insights to build dashboards tracking cost per service, error rates, and latency percentiles.
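For instance, a per-invocation cost estimate can be derived from the logged token counts. The prices below are illustrative placeholders, not current Claude pricing; substitute the published per-million-token rates for the models you actually use:

```python
# Illustrative prices in USD per million tokens -- NOT current Claude pricing
PRICES_PER_MILLION_TOKENS = {
    "claude-sonnet": {"input": 3.00, "output": 15.00},
    "claude-haiku": {"input": 1.00, "output": 5.00},
}


def invocation_cost(model_key: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one invocation from its logged token counts."""
    prices = PRICES_PER_MILLION_TOKENS[model_key]
    return (
        input_tokens * prices["input"] + output_tokens * prices["output"]
    ) / 1_000_000
```

Summing this over log entries grouped by a team tag gives a simple cost-per-service dashboard input.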
Cost Management
- Bedrock pricing equals Anthropic API pricing — there is no additional cost for using the managed service
- Use AWS Cost Allocation Tags on your IAM roles or via resource tagging to attribute Bedrock spend to individual teams or projects
- Enable AWS Budgets alerts at $500 / $1,000 thresholds to prevent runaway inference costs during development
- Use Claude Haiku for high-volume, lower-complexity tasks and Claude Sonnet for production-quality inference — do not default to Opus for everything
- Provisioned Throughput pricing is available for organisations with consistent high-volume usage; contact AWS about committed-use discounts
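The budget alert above can be provisioned in code as well. A minimal sketch against the AWS Budgets API via boto3; the account ID, budget name, and email address are placeholders:

```python
def monthly_cost_budget(name: str, limit_usd: int) -> dict:
    """A monthly cost budget definition in the shape the Budgets API expects."""
    return {
        "BudgetName": name,
        "BudgetLimit": {"Amount": str(limit_usd), "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    }


def create_bedrock_budget(account_id: str, email: str) -> None:
    """Create a $500/month budget that emails at 80% of actual spend."""
    import boto3  # deferred so the helper above stays testable offline

    budgets = boto3.client("budgets", region_name="us-east-1")
    budgets.create_budget(
        AccountId=account_id,
        Budget=monthly_cost_budget("bedrock-monthly", 500),
        NotificationsWithSubscribers=[
            {
                "Notification": {
                    "NotificationType": "ACTUAL",
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": 80.0,
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
            }
        ],
    )
```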
Use Bedrock Batch Inference for Large Workloads
Bedrock supports batch inference for large-scale, non-latency-sensitive workloads: instead of calling InvokeModel per prompt, you submit a model invocation job that reads a file of prompts from S3 and writes the results back to S3 asynchronously. Batch inference costs 50% less than on-demand inference, making it ideal for bulk document processing, dataset annotation, or nightly report generation pipelines.
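Batch jobs read their input as JSON Lines from S3, one record per prompt. The sketch below builds that input file under the assumption that each record carries a recordId and a modelInput in the Messages format used in Step 3; verify the exact record schema against the Bedrock batch inference documentation before relying on it:

```python
import json


def batch_input_lines(prompts: list[str], max_tokens: int = 1024) -> str:
    """Serialize prompts as JSON Lines records for a Bedrock batch job."""
    lines = []
    for i, prompt in enumerate(prompts):
        record = {
            "recordId": f"rec-{i:06d}",
            "modelInput": {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)


# Upload the result to S3, then start the job with the control-plane
# client: boto3.client("bedrock").create_model_invocation_job(...)
```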
Production Checklist
- Model access enabled and confirmed in the correct region
- IAM policy scoped to specific model ARNs, not a wildcard `bedrock:*`
- Credentials come from the instance/task role; no access keys in application code or environment variables
- VPC endpoint configured for private inference if data residency is required
- CloudWatch logging enabled with alerts on error rate exceeding 1%
- AWS Budgets alert set for expected monthly Bedrock spend
- Retry logic with exponential backoff for ThrottlingException and ServiceUnavailableException
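The retry item above can be sketched as a small wrapper. boto3 surfaces ThrottlingException and ServiceUnavailableException as ClientError with an error code in the response, which is what the wrapper inspects; the backoff parameters are illustrative:

```python
import random
import time


def invoke_with_backoff(call, max_attempts: int = 5, base_delay: float = 1.0):
    """Retry a Bedrock call on throttling/availability errors with
    exponential backoff and full jitter."""
    retryable = {"ThrottlingException", "ServiceUnavailableException"}
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            code = getattr(exc, "response", {}).get("Error", {}).get("Code", "")
            if code not in retryable or attempt == max_attempts - 1:
                raise
            # Sleep a random duration in [0, base_delay * 2^attempt) seconds
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))


# Usage with the invoke_claude helper from Step 3:
# answer = invoke_with_backoff(lambda: invoke_claude("Hello"))
```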
Summary
AWS Bedrock is the right choice when your organisation needs Claude's capabilities inside existing AWS governance — VPC-private traffic, IAM access control, AWS billing, and CloudWatch auditing.
- Request and response bodies use the same Anthropic Messages API format, so prompts, system messages, streaming, and tool use carry over unchanged; switching from the Anthropic SDK mostly means swapping the client for boto3 and a different model ID
- Model IDs follow the `anthropic.claude-<model>-v<n>:0` format — check the Bedrock console for the exact current identifiers
- VPC PrivateLink ensures zero public internet egress for all model inference
- Cost is identical to direct Anthropic API pricing — you gain governance, not additional cost
Final post in the series: Final Knowledge Test + Anthropic AI Series Recap: Are You Ready to Build?.
This post is part of the Anthropic AI Tutorial Series. Previous post: Project: Build a Data Analyst Agent — CSV Insights in Plain English.
