How do I send an image to Claude via the API?

Include an image content block in your message with type: image and either a base64-encoded data source or a URL source. For base64, set source.type to base64, provide the media_type (image/jpeg, image/png, etc.), and the base64-encoded data string. For URLs, set source.type to url and provide the publicly accessible image URL. The Files API is more efficient for images sent across multiple requests.

How many pages of a PDF can Claude analyse?

Claude can process multi-page PDFs. Each page is rendered as an image, so token usage grows with the number of pages and the size of each rendered page. Very long PDFs (100+ pages) can consume a significant portion of the context window and may need to be split. For large documents, consider extracting only the relevant pages before sending, or using a RAG approach to retrieve specific sections rather than sending the full document.

Can Claude extract structured data from images like invoices or forms?

Yes - combining vision with tool use or structured output instructions is a powerful pattern for document data extraction. Send the image and instruct Claude to extract specific fields (invoice number, date, line items, totals) as JSON. This works reliably for clearly printed documents. Handwritten documents, low-resolution scans, or complex table layouts may require additional validation and retry logic.

Claude Vision API: OCR, Charts & PDF Extraction (2026)


Text is only one of the ways humans communicate and work. An enormous proportion of business information lives in images - screenshots of systems, scanned documents, charts in presentations, photos of physical infrastructure, and pages from PDFs. Until AI models gained vision capabilities, processing this content required manual extraction into text before any AI could work with it.

Claude's vision capabilities change that. You can send Claude an image, a PDF page, or a screenshot and ask it to read, analyse, reason about, and extract information from what it sees. This post covers everything you need to know about working with visual content in the Claude API - from the mechanics of sending images to practical extraction patterns for real-world business documents.

What Can Claude Vision Analyse?

Claude vision accepts JPEG, PNG, GIF (first frame), and WebP images of up to 5MB each via the API at the time of writing, as well as PDFs sent as document content blocks or via the Files API. It can read text in scanned documents, describe and analyse charts and graphs, identify UI elements in screenshots, interpret architecture and flowchart diagrams, and extract structured data from forms and invoices. Claude cannot process video or audio content.

What Claude Can See and Understand

Claude's vision capabilities handle a broad range of visual content:

Photographs: People, objects, scenes, physical environments, and products
Screenshots: UI interfaces, error messages, dashboards, web pages
Charts and graphs: Bar charts, line graphs, pie charts - Claude can read values and describe trends
Documents and forms: Scanned text, handwritten notes, form fields, table data
Diagrams: Architecture diagrams, flowcharts, network maps, org charts
PDFs: Multi-page documents including text, images, and tables within PDFs
Code screenshots: Claude can read and reason about code visible in an image

Claude cannot process video files, animated GIFs (it sees the first frame), or audio embedded in media. For video, you would extract frames and send them as individual images.

Sending an Image to Claude

Images are passed as content blocks within the messages array. Claude supports both base64-encoded images and URL-referenced images.

From a URL

The simplest method - if your image is publicly accessible, pass the URL directly:

python

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://example.com/architecture-diagram.png"
                    }
                },
                {
                    "type": "text",
                    "text": "Describe the components in this architecture diagram and identify any potential single points of failure."
                }
            ]
        }
    ]
)

print(response.content[0].text)

From a Local File (Base64 Encoding)

For private images or files that are not publicly hosted:

python

import anthropic
import base64
from pathlib import Path

client = anthropic.Anthropic()

# Read and encode the image
image_path = Path("screenshot.png")
image_data = base64.standard_b64encode(image_path.read_bytes()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "What error is shown in this screenshot? What is the most likely cause?"
                }
            ]
        }
    ]
)

print(response.content[0].text)

Supported Image Formats

Claude supports JPEG, PNG, GIF (first frame), and WebP image formats. For PDFs, you have two options: convert pages to images with a library like pdf2image and send them as base64 images, or use the Files API to upload the full PDF. The Files API approach is covered in the next post and is recommended for multi-page PDF processing.
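The four supported formats correspond to four media_type values. As a minimal sketch, the helper below builds a base64 image block from a local file and picks the media_type from the file extension. The name image_block_from_file and the extension table are illustrative assumptions, not part of the Anthropic SDK, and the extension is trusted without inspecting the file contents.

```python
import base64
from pathlib import Path

# Extension -> media_type for the four formats listed above (illustrative table)
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block_from_file(path):
    """Build a base64 image content block, inferring media_type from the extension."""
    path = Path(path)
    media_type = MEDIA_TYPES.get(path.suffix.lower())
    if media_type is None:
        raise ValueError(f"Unsupported image extension: {path.suffix!r}")
    data = base64.standard_b64encode(path.read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }
```

The returned dictionary can be placed directly in the content list of a user message, in the same position as the hand-written image block in the base64 example.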

Supported Image Sizes and Limits

Maximum image file size: 5MB per image via the API at the time of writing (confirm against the Anthropic vision documentation)
Maximum images per request: up to 100 images
Very large images are automatically downscaled internally - you do not strictly need to resize them before sending, but oversized uploads add transfer time and latency without improving results
For optimal performance, resize images to 1280px on the longest edge before sending - beyond this size, the quality gain is negligible but token cost increases
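To make the resizing advice concrete, here is a small sketch that computes target dimensions for a 1280px longest edge and estimates the resulting token cost. The function names are hypothetical. The estimate uses the approximation given in Anthropic's vision documentation, (width x height) / 750 tokens, which only applies to images small enough that the API does not downscale them, so treat the number as a rough guide rather than a billing figure.

```python
def fit_longest_edge(width, height, max_edge=1280):
    """Return (width, height) scaled down so the longest edge is at most max_edge.

    The aspect ratio is preserved; images already within the limit are unchanged.
    """
    longest = max(width, height)
    if longest <= max_edge:
        return width, height
    scale = max_edge / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def estimate_image_tokens(width, height):
    """Rough token estimate for an image the API will not downscale: (w * h) / 750."""
    return round(width * height / 750)


# A 4K screenshot resized to the recommended longest edge
new_width, new_height = fit_longest_edge(3840, 2160)
print(new_width, new_height)                         # 1280 720
print(estimate_image_tokens(new_width, new_height))  # 1229
```

The computed size can then be passed to whichever image library you already use for the actual resize, for example Pillow's Image.resize((new_width, new_height)).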

Practical Use Cases with Code

Document Data Extraction

Extract structured data from a scanned invoice, receipt, or form:

python

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {"type": "base64", "media_type": "image/jpeg", "data": invoice_base64}
                },
                {
                    "type": "text",
                    "text": """Extract the following fields from this invoice and return as JSON:
{
  "vendor_name": "",
  "invoice_number": "",
  "invoice_date": "",
  "due_date": "",
  "total_amount": "",
  "line_items": [{"description": "", "quantity": 0, "unit_price": 0, "total": 0}]
}
Return only the JSON. Do not include any explanation."""
                }
            ]
        }
    ]
)

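import json

# json.loads raises json.JSONDecodeError if the reply is not bare JSON,
# so validate the result before writing it to downstream systems
invoice_data = json.loads(response.content[0].text)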
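Even with an instruction to return only JSON, a reply can occasionally arrive wrapped in a Markdown code fence, which makes a direct json.loads call fail. The sketch below is one defensive way to parse the reply. The helper name parse_json_reply is an assumption for illustration, and it only handles the common case of a single fenced block.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built here to keep this example readable


def parse_json_reply(text):
    """Parse a model reply as JSON, tolerating one optional Markdown code fence."""
    cleaned = text.strip()
    # Match an opening fence (optionally tagged "json"), the payload, and a closing fence
    fenced = re.match(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", cleaned, re.DOTALL)
    if fenced:
        cleaned = fenced.group(1)
    # Still raises json.JSONDecodeError if the payload is not valid JSON
    return json.loads(cleaned)
```

In the invoice example, this would replace the direct call: invoice_data = parse_json_reply(response.content[0].text). Catching json.JSONDecodeError around that call is a natural place to add retry logic for low-quality scans.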

Chart and Graph Analysis

python

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {"type": "url", "url": "https://example.com/quarterly-sales.png"}
                },
                {
                    "type": "text",
                    "text": "Analyse this sales chart. What is the overall trend? Which quarter had the highest growth? Are there any anomalies worth investigating?"
                }
            ]
        }
    ]
)

Multi-Image Comparison

Claude can work with multiple images in a single request - useful for comparing before/after states, different designs, or multiple document pages:

python

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Here are two versions of the same login page. Image 1 is the current version:"},
                {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": v1_base64}},
                {"type": "text", "text": "Image 2 is the proposed redesign:"},
                {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": v2_base64}},
                {"type": "text", "text": "From a UX perspective, what are the key differences? Which version would likely convert better and why?"}
            ]
        }
    ]
)

Label Your Images in Multi-Image Prompts

When sending multiple images in a single request, always label them explicitly in your text content - 'Image 1:', 'Image 2:', and so on. Claude can refer to images by the labels you provide, which makes the conversation much clearer and prevents confusion when Claude needs to distinguish between images in its response.
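The labelling pattern is easy to automate. The following sketch interleaves 'Image N:' labels with base64 image blocks and appends the question at the end. labelled_image_content is a hypothetical helper name, and the inputs are assumed to be already base64-encoded.

```python
def labelled_image_content(images, question):
    """Interleave 'Image N:' text labels with image blocks, then append the question.

    images: list of (media_type, base64_data) tuples, in the order to be numbered.
    """
    content = []
    for index, (media_type, data) in enumerate(images, start=1):
        content.append({"type": "text", "text": f"Image {index}:"})
        content.append({
            "type": "image",
            "source": {"type": "base64", "media_type": media_type, "data": data},
        })
    content.append({"type": "text", "text": question})
    return content
```

The returned list can be used as the content of a single user message, so the prompt can refer to 'Image 1' and 'Image 2' without ambiguity.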

Processing PDFs with Vision

For PDF documents, you have two main approaches:

Approach 1 - Convert Pages to Images

Use a Python library to convert PDF pages to images, then send each page as an image:

python

from pdf2image import convert_from_path
import base64
from io import BytesIO

# Convert PDF pages to PIL Images (pdf2image requires the poppler utilities to be installed)
pages = convert_from_path("contract.pdf", dpi=200)

# Encode each page
encoded_pages = []
for page in pages:
    buffer = BytesIO()
    page.save(buffer, format="JPEG", quality=85)
    encoded_pages.append(base64.standard_b64encode(buffer.getvalue()).decode("utf-8"))

# Build content blocks for the first 5 pages
content = [{"type": "text", "text": "Review this contract and identify all obligations, payment terms, and termination clauses."}]
for i, page_data in enumerate(encoded_pages[:5]):
    content.append({"type": "text", "text": f"Page {i+1}:"})
    content.append({
        "type": "image",
        "source": {"type": "base64", "media_type": "image/jpeg", "data": page_data}
    })

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=[{"role": "user", "content": content}]
)
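The example above sends only the first five pages. For longer documents, one option is to split the encoded pages into fixed-size batches and send one request per batch. The sketch below shows only the batching step. batch_pages is a hypothetical helper, and the batch size of 5 is an arbitrary assumption. Note that each batch is analysed independently, so clauses that span a batch boundary may need a follow-up pass.

```python
def batch_pages(encoded_pages, batch_size=5):
    """Yield (first_page_number, pages) tuples covering encoded_pages in order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(encoded_pages), batch_size):
        # Page numbers are 1-based so labels match the original document
        yield start + 1, encoded_pages[start:start + batch_size]


# Example with placeholder strings standing in for base64 page data
for first_page, pages_in_batch in batch_pages(["p1", "p2", "p3", "p4", "p5", "p6", "p7"], 3):
    print(first_page, pages_in_batch)
```

Inside the loop you would build the content blocks as before, labelling each page with first_page plus its offset, and call client.messages.create once per batch.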

Approach 2 - Files API (Recommended for Multi-Page PDFs)

The Files API (covered in the next post) lets you upload a PDF once and reference it by file ID, which is cleaner and more efficient for large documents.

Accessibility and Responsible Use

Do not use Claude's vision to identify individuals by face - this raises significant privacy concerns and Anthropic's policies restrict this use case
Be transparent with users when their submitted images are being processed by AI
For healthcare documents, legal contracts, and other sensitive materials, ensure your data handling complies with applicable regulations before sending images to external APIs

Image Data and Privacy

Every image you send to Claude via the API leaves your environment. Treat image data with the same care you would apply to any sensitive text data. Do not send images containing personal identification information, medical records, or confidential business data unless you have assessed the data processing implications and your users have given appropriate consent.

Summary

Claude's vision capabilities unlock a category of automation that was previously impossible with text-only AI: working directly with the visual information that fills modern business workflows. From invoice extraction to infrastructure diagram analysis, Claude can be the reading layer that processes your visual content and converts it into actionable structured data.

Core takeaways:

Use URL source for publicly accessible images - simpler and no encoding overhead
Use base64 encoding for private, local, or dynamically generated images
Use multiple images in one request for comparison and multi-page documents
Label images explicitly when sending more than one in a request
Resize large images to ~1280px longest edge for optimal cost efficiency

Next up: Claude Files API Tutorial: Upload Once, Use Many Times.

Vision analysis is commonly combined with Claude structured outputs to extract data from images into validated JSON - for example, parsing invoice fields from a scanned image into a database record. For computer-based visual automation, see Claude computer use explained.

The Anthropic vision documentation lists current image format support, size limits, and PDF processing guidelines. The pdf2image Python library is a widely used tool for converting PDF pages to images before sending them to Claude's vision API.

This post is part of the Anthropic AI Tutorial Series. Previous post: Claude Web Search Tool: Real-Time Data in Your AI App.

Frequently Asked Questions

Q: What types of visual inputs can Claude analyse? Claude Vision supports images in JPEG, PNG, GIF, and WebP formats, plus PDFs (which are rendered page by page). You can pass images via base64 encoding, a URL (for publicly accessible images), or a file ID from the Files API. Claude can describe image content, extract text (OCR), read charts and diagrams, analyse screenshots, compare multiple images, and answer questions about visual content.

Q: How are tokens counted for image inputs in the Claude API? Image token cost depends on image dimensions: the more pixels, the more tokens. Anthropic's vision documentation gives a rough estimate of (width in pixels x height in pixels) / 750 tokens for an image that does not need downscaling, so a 200x200 pixel image costs roughly 53 tokens. A full-page screenshot at 1080p is downscaled internally and typically uses around 1,500-2,000 tokens. Resize images to the minimum resolution needed for your task - if you are extracting text from a document, 1000px wide is usually sufficient and much cheaper than sending a 4K scan.

Q: How do you extract text from a PDF using Claude? Pass the PDF as a document content block: {"type": "document", "source": {"type": "base64", "media_type": "application/pdf", "data": "<base64>"}} and ask Claude to extract the text. For multi-page PDFs, Claude processes all pages but context window limits apply - very long PDFs may need to be split. The Files API is ideal for PDFs you will reference repeatedly, as it avoids re-uploading the base64 payload each time.
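As a sketch of the document content block described in the last answer, the helpers below build the block and a complete user message from a local PDF. The function names and the default instruction are illustrative assumptions. Only the block structure comes from the answer above.

```python
import base64
from pathlib import Path


def pdf_document_block(path):
    """Build a base64 'document' content block for a local PDF file."""
    data = base64.standard_b64encode(Path(path).read_bytes()).decode("utf-8")
    return {
        "type": "document",
        "source": {"type": "base64", "media_type": "application/pdf", "data": data},
    }


def pdf_extraction_message(path, instruction="Extract all text from this PDF."):
    """Return one user message that pairs the PDF with a text instruction."""
    return {
        "role": "user",
        "content": [pdf_document_block(path), {"type": "text", "text": instruction}],
    }
```

You would then pass messages=[pdf_extraction_message("contract.pdf")] to client.messages.create, in the same way as the image examples earlier in the post.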

Part of the Claude AI Masterclass.