Build Your First AI Coding Agent with the Claude API

Every AI coding agent you have heard of — Cursor, Devin, GitHub Copilot Coding Agent — is built on the same foundation: an LLM with access to tools that can read and write files, execute commands, and observe the results. The specific tools differ; the loop is the same.
What Is an AI Coding Agent, Exactly?
An AI coding agent is a loop: Claude decides which tool to call, your code executes that tool, the result feeds back to Claude, and the cycle repeats until the task is done. The agent reads files, writes code, runs tests, reads the results, and iterates — all without human intervention. Built with Claude's tool use API and a sandboxed executor, this architecture is the foundation of every major commercial coding agent.
In this project you will build that loop yourself. By the end, you will have a working AI coding agent that can:
- Read any file in your project directory
- Write and edit source files
- Execute shell commands (run tests, install packages, run scripts)
- List directory contents
- Search files for patterns
- Iterate on failures until the task is complete
This is not a wrapper around Claude Code. You are building the agent loop from scratch so you understand exactly how it works — and so you can extend it for your own use cases.
Prerequisites
pip install anthropicPython 3.11 or later. Set your API key:
export ANTHROPIC_API_KEY="your-api-key"You should already understand how Claude's tool use works. If not, read Claude Tool Use Explained first.
The Architecture
The agent is built from three layers:
┌─────────────────────────────────────────────┐
│ Tool Executor │
│ (safely runs the tools Claude requests) │
├─────────────────────────────────────────────┤
│ Agent Loop │
│ (sends messages, handles tool calls, │
│ feeds results back, iterates) │
├─────────────────────────────────────────────┤
│ Claude API │
│ (reasons, plans, decides which tools to │
│ call and with what arguments) │
└─────────────────────────────────────────────┘Claude never executes code directly — it returns JSON specifying which tool to call and what arguments to pass. Your Python code runs the tool, captures the output, and returns it to Claude. Claude decides what to do next. This loop repeats until Claude returns a final response with no tool calls.
Step 1: Define the Tools
Tools are defined as JSON schemas. Claude reads these to understand what is available and what arguments each tool expects.
# agent/tools.py
TOOLS = [
{
"name": "read_file",
"description": (
"Read the complete contents of a file. "
"Use this to understand existing code before making changes."
),
"input_schema": {
"type": "object",
"properties": {
"path": {
"type": "string",
"description": "Relative path to the file from the project root"
}
},
"required": ["path"]
}
},
{
"name": "write_file",
"description": (
"Write content to a file, creating it if it does not exist "
"or overwriting it if it does. Use for creating new files or "
"replacing file contents entirely."
),
"input_schema": {
"type": "object",
"properties": {
"path": {"type": "string", "description": "Relative path to the file"},
"content": {"type": "string", "description": "The complete file content to write"}
},
"required": ["path", "content"]
}
},
{
"name": "edit_file",
"description": (
"Replace a specific string in a file with new content. "
"The old_string must exist exactly once in the file. "
"Use for targeted edits without rewriting the whole file."
),
"input_schema": {
"type": "object",
"properties": {
"path": {"type": "string"},
"old_string": {"type": "string", "description": "Exact text to find and replace"},
"new_string": {"type": "string", "description": "Replacement text"}
},
"required": ["path", "old_string", "new_string"]
}
},
{
"name": "run_command",
"description": (
"Execute a shell command in the project directory. "
"Use for running tests, installing packages, or executing scripts. "
"Returns stdout, stderr, and exit code."
),
"input_schema": {
"type": "object",
"properties": {
"command": {
"type": "string",
"description": "Shell command to run (e.g. 'pytest tests/', 'pip install requests')"
},
"timeout_seconds": {
"type": "integer",
"description": "Maximum execution time in seconds (default: 30)",
"default": 30
}
},
"required": ["command"]
}
},
{
"name": "list_directory",
"description": "List files and directories at a given path.",
"input_schema": {
"type": "object",
"properties": {
"path": {
"type": "string",
"description": "Relative path to list (use '.' for project root)"
}
},
"required": ["path"]
}
},
{
"name": "search_files",
"description": (
"Search for a string pattern across all files in the project. "
"Returns matching file paths and line numbers."
),
"input_schema": {
"type": "object",
"properties": {
"pattern": {"type": "string", "description": "String or regex to search for"},
"file_extension": {
"type": "string",
"description": "Filter by file extension (e.g. '.py', '.ts'). Optional.",
"default": ""
}
},
"required": ["pattern"]
}
}
]Step 2: Implement the Tool Executor
The tool executor is the bridge between Claude's JSON requests and your filesystem. Security is paramount here — restrict all file operations to the project directory.
# agent/executor.py
import os
import re
import subprocess
from pathlib import Path
class ToolExecutor:
"""
Executes tools requested by Claude.
All file operations are restricted to project_root.
"""
def __init__(self, project_root: str):
self.root = Path(project_root).resolve()
def _safe_path(self, relative_path: str) -> Path:
"""
Resolve a relative path and verify it stays within project root.
Raises ValueError if the path would escape the project directory.
"""
resolved = (self.root / relative_path).resolve()
if not str(resolved).startswith(str(self.root)):
raise ValueError(
f"Path traversal blocked: '{relative_path}' resolves outside project root"
)
return resolved
def read_file(self, path: str) -> str:
safe = self._safe_path(path)
if not safe.exists():
return f"ERROR: File does not exist: {path}"
if not safe.is_file():
return f"ERROR: Path is not a file: {path}"
try:
return safe.read_text(encoding="utf-8")
except Exception as e:
return f"ERROR: Could not read file: {e}"
def write_file(self, path: str, content: str) -> str:
safe = self._safe_path(path)
safe.parent.mkdir(parents=True, exist_ok=True)
safe.write_text(content, encoding="utf-8")
return f"OK: Written {len(content)} bytes to {path}"
def edit_file(self, path: str, old_string: str, new_string: str) -> str:
safe = self._safe_path(path)
if not safe.exists():
return f"ERROR: File does not exist: {path}"
content = safe.read_text(encoding="utf-8")
count = content.count(old_string)
if count == 0:
return f"ERROR: old_string not found in {path}"
if count > 1:
return f"ERROR: old_string appears {count} times in {path} — must be unique"
new_content = content.replace(old_string, new_string, 1)
safe.write_text(new_content, encoding="utf-8")
return f"OK: Replaced 1 occurrence in {path}"
def run_command(self, command: str, timeout_seconds: int = 30) -> str:
# Block obviously dangerous commands
blocked = ["rm -rf", "sudo", "mkfs", "dd if=", ":(){:|:&};:"]
for blocked_cmd in blocked:
if blocked_cmd in command.lower():
return f"ERROR: Command blocked for safety: contains '{blocked_cmd}'"
try:
result = subprocess.run(
command,
shell=True,
cwd=str(self.root),
capture_output=True,
text=True,
timeout=timeout_seconds
)
output = []
if result.stdout:
output.append(f"STDOUT:\n{result.stdout}")
if result.stderr:
output.append(f"STDERR:\n{result.stderr}")
output.append(f"EXIT CODE: {result.returncode}")
return "\n".join(output) if output else f"EXIT CODE: {result.returncode}"
except subprocess.TimeoutExpired:
return f"ERROR: Command timed out after {timeout_seconds} seconds"
except Exception as e:
return f"ERROR: {e}"
def list_directory(self, path: str) -> str:
safe = self._safe_path(path)
if not safe.exists():
return f"ERROR: Path does not exist: {path}"
lines = []
for item in sorted(safe.iterdir()):
rel = item.relative_to(self.root)
marker = "/" if item.is_dir() else ""
lines.append(f"{rel}{marker}")
return "\n".join(lines) if lines else "(empty directory)"
def search_files(self, pattern: str, file_extension: str = "") -> str:
matches = []
for file_path in self.root.rglob("*"):
if not file_path.is_file():
continue
if file_extension and not file_path.suffix == file_extension:
continue
# Skip common non-source directories
parts = file_path.parts
if any(p in parts for p in [".git", "node_modules", "__pycache__", ".venv", "venv"]):
continue
try:
content = file_path.read_text(encoding="utf-8", errors="ignore")
for i, line in enumerate(content.splitlines(), 1):
if re.search(pattern, line):
rel = file_path.relative_to(self.root)
matches.append(f"{rel}:{i}: {line.strip()}")
except Exception:
continue
if not matches:
return f"No matches found for pattern: {pattern}"
return "\n".join(matches[:50]) # cap at 50 matches
def execute(self, tool_name: str, tool_input: dict) -> str:
"""Dispatch a tool call from Claude."""
dispatch = {
"read_file": lambda i: self.read_file(i["path"]),
"write_file": lambda i: self.write_file(i["path"], i["content"]),
"edit_file": lambda i: self.edit_file(i["path"], i["old_string"], i["new_string"]),
"run_command": lambda i: self.run_command(i["command"], i.get("timeout_seconds", 30)),
"list_directory": lambda i: self.list_directory(i["path"]),
"search_files": lambda i: self.search_files(i["pattern"], i.get("file_extension", "")),
}
if tool_name not in dispatch:
return f"ERROR: Unknown tool: {tool_name}"
try:
return dispatch[tool_name](tool_input)
except (KeyError, TypeError) as e:
return f"ERROR: Invalid arguments for {tool_name}: {e}"Path Traversal Protection
The _safe_path method resolves the full absolute path and checks it starts with the project root. This blocks directory traversal attacks like '../../etc/passwd'. Never skip this check — Claude may produce unexpected paths when reasoning about a task.
Step 3: The Agent Loop
The agent loop is the core of the system. It sends messages to Claude, handles tool calls, feeds results back, and repeats until Claude signals completion.
# agent/loop.py
import anthropic
from .tools import TOOLS
from .executor import ToolExecutor
SYSTEM_PROMPT = """You are an expert software engineer and coding agent.
Your job is to complete coding tasks by using the tools available to you:
- Read files to understand the existing codebase before making changes
- Write and edit files to implement the requested changes
- Run commands to verify your changes work (run tests, execute scripts)
- Search files to find relevant code
- List directories to understand project structure
Guidelines:
1. ALWAYS read relevant files before making changes to understand existing patterns
2. Make targeted, minimal changes — don't rewrite files unnecessarily
3. After writing code, run tests to verify correctness
4. If tests fail, read the error output carefully and fix the root cause
5. Keep iterating until tests pass or you determine the task is complete
6. Communicate clearly what you are doing and why at each step
You are operating in a sandboxed environment. All file operations are restricted to the project directory."""
class CodingAgent:
def __init__(
self,
project_root: str,
model: str = "claude-sonnet-4-6",
max_iterations: int = 30,
verbose: bool = True,
):
self.client = anthropic.Anthropic()
self.executor = ToolExecutor(project_root)
self.model = model
self.max_iterations = max_iterations
self.verbose = verbose
def _log(self, msg: str) -> None:
if self.verbose:
print(msg)
def run(self, task: str) -> str:
"""
Run the agent on a task description.
Returns Claude's final response text.
"""
self._log(f"\n{'='*60}")
self._log(f"TASK: {task}")
self._log(f"{'='*60}\n")
messages = [{"role": "user", "content": task}]
iteration = 0
while iteration < self.max_iterations:
iteration += 1
self._log(f"[Iteration {iteration}] Calling Claude...")
response = self.client.messages.create(
model=self.model,
max_tokens=8096,
system=SYSTEM_PROMPT,
tools=TOOLS,
messages=messages,
)
self._log(f"[Iteration {iteration}] Stop reason: {response.stop_reason}")
# Append Claude's response to message history
messages.append({"role": "assistant", "content": response.content})
# If no tool calls, Claude is done
if response.stop_reason == "end_turn":
# Extract the final text response
final_text = ""
for block in response.content:
if hasattr(block, "text"):
final_text += block.text
self._log(f"\n[DONE] Final response:\n{final_text}")
return final_text
# Process all tool calls in this response
if response.stop_reason == "tool_use":
tool_results = []
for block in response.content:
if block.type != "tool_use":
continue
tool_name = block.name
tool_input = block.input
tool_use_id = block.id
self._log(f"\n[Tool] {tool_name}({tool_input})")
# Execute the tool
result = self.executor.execute(tool_name, tool_input)
# Truncate very long outputs to avoid context overflow
if len(result) > 8000:
result = result[:8000] + "\n... [output truncated at 8000 chars]"
self._log(f"[Result] {result[:200]}{'...' if len(result) > 200 else ''}")
tool_results.append({
"type": "tool_result",
"tool_use_id": tool_use_id,
"content": result,
})
# Return tool results to Claude
messages.append({"role": "user", "content": tool_results})
return f"Agent stopped after {self.max_iterations} iterations without completing the task."Step 4: Run It on a Real Task
Create a test project to run the agent against:
mkdir test_project
cd test_project
# Create a simple Python module with a bug
cat > calculator.py << 'EOF'
def add(a, b):
return a + b
def subtract(a, b):
return a - b
def multiply(a, b):
return a * b
def divide(a, b):
return a / b # Bug: no division by zero check
EOF
# Create a test file with a failing test
cat > test_calculator.py << 'EOF'
import pytest
from calculator import add, subtract, multiply, divide
def test_add():
assert add(2, 3) == 5
def test_subtract():
assert subtract(10, 4) == 6
def test_multiply():
assert multiply(3, 4) == 12
def test_divide_normal():
assert divide(10, 2) == 5.0
def test_divide_by_zero():
with pytest.raises(ValueError, match="Cannot divide by zero"):
divide(10, 0)
EOFNow run the agent:
# run_agent.py
from agent.loop import CodingAgent
agent = CodingAgent(project_root="./test_project", verbose=True)
result = agent.run(
"The test_divide_by_zero test is failing. "
"Fix the divide function in calculator.py to raise a ValueError "
"with the message 'Cannot divide by zero' when the divisor is zero. "
"Run the tests to confirm all 5 pass."
)What you will observe:
============================================================
TASK: The test_divide_by_zero test is failing. Fix the divide function...
============================================================
[Iteration 1] Calling Claude...
[Iteration 1] Stop reason: tool_use
[Tool] read_file({'path': 'calculator.py'})
[Result] def add(a, b):
return a + b...
[Tool] read_file({'path': 'test_calculator.py'})
[Result] import pytest
from calculator import add...
[Iteration 2] Calling Claude...
[Iteration 2] Stop reason: tool_use
[Tool] edit_file({'path': 'calculator.py', 'old_string': 'def divide(a, b):\n return a / b # Bug: no division by zero check', 'new_string': 'def divide(a, b):\n if b == 0:\n raise ValueError("Cannot divide by zero")\n return a / b'})
[Result] OK: Replaced 1 occurrence in calculator.py
[Iteration 3] Calling Claude...
[Iteration 3] Stop reason: tool_use
[Tool] run_command({'command': 'pytest test_calculator.py -v'})
[Result] STDOUT:
test_calculator.py::test_add PASSED
test_calculator.py::test_subtract PASSED
test_calculator.py::test_multiply PASSED
test_calculator.py::test_divide_normal PASSED
test_calculator.py::test_divide_by_zero PASSED
5 passed in 0.12s
EXIT CODE: 0
[Iteration 4] Calling Claude...
[Iteration 4] Stop reason: end_turn
[DONE] Final response:
All 5 tests are now passing. I fixed the `divide` function in `calculator.py` by adding a check for division by zero before performing the division. When `b == 0`, the function now raises a `ValueError` with the message "Cannot divide by zero" as expected by the test.The agent autonomously read both files, identified the fix, edited the file, ran the tests, confirmed all 5 passed, and reported completion. Total: 4 iterations, no human intervention.
Step 5: Try a More Complex Task
result = agent.run(
"Add a square_root function to calculator.py. "
"It should raise a ValueError('Cannot take square root of negative number') "
"for negative inputs. "
"Write comprehensive tests for it in test_calculator.py. "
"Make sure all tests pass."
)The agent will: read calculator.py, decide where to add the function, check for import math or add it, write the function, read test_calculator.py, add test cases, run all tests, fix any issues, confirm pass.
Common Failure Modes and Fixes
Agent loops without progress: Add a stagnation check — if the same tool is called with the same arguments twice, break the loop:
# In the agent loop, track tool call history
tool_call_history = set()
for block in response.content:
if block.type == "tool_use":
call_sig = f"{block.name}:{json.dumps(block.input, sort_keys=True)}"
if call_sig in tool_call_history:
return "Agent is stuck in a loop — duplicate tool call detected."
tool_call_history.add(call_sig)Context window overflow on large files: Truncate file reads over a character limit and tell Claude:
MAX_FILE_CHARS = 20_000
def read_file(self, path: str) -> str:
content = safe.read_text(encoding="utf-8")
if len(content) > MAX_FILE_CHARS:
return content[:MAX_FILE_CHARS] + f"\n... [truncated — file is {len(content)} chars total]"
return contentDangerous command execution: Extend the blocklist in run_command and consider adding an approval step for destructive operations in production agents.
Full Project Structure
agent/
├── __init__.py
├── tools.py ← Tool definitions (JSON schemas)
├── executor.py ← Tool implementations (filesystem + shell)
└── loop.py ← Agent loop (Claude API + message history)
test_project/
├── calculator.py ← Target codebase
└── test_calculator.py
run_agent.py ← Entry pointKey Takeaways
- An AI coding agent is an agentic loop: Claude decides which tools to call → your code executes them → results feed back to Claude → repeat
- Claude never executes code directly — it returns tool call requests as structured JSON; your code runs the tools safely
- Path traversal protection in the file tools is non-negotiable — always resolve and validate paths before any filesystem operation
- Blocklisting dangerous commands in the shell tool prevents the agent from accidentally executing destructive operations
- The agent works best on well-defined tasks with testable outcomes — it can iterate on failing tests and confirm success automatically
- Adding context truncation and loop detection makes your agent significantly more robust in production
What's Next in the AI Coding Agents Series
- What Are AI Coding Agents?
- AI Coding Agents Compared: GitHub Copilot vs Cursor vs Devin vs Claude Code
- Build Your First AI Coding Agent with the Claude API ← you are here
- Build an Automated GitHub PR Review Agent
- Build an Autonomous Bug Fixer Agent
- AI Coding Agents in CI/CD: Automate Code Reviews and Fixes in Production
This post is part of the AI Coding Agents Series. Previous post: AI Coding Agents Compared: GitHub Copilot vs Cursor vs Devin vs Claude Code.
For the underlying concepts, see Claude Tool Use Explained, Claude Agentic Loop Explained, and Claude Structured Outputs and JSON. For security considerations when running agents in production, see Basic Threat Detection for Developers.
External Resources
- Anthropic Tool Use documentation — the full reference for defining tools and handling tool_use responses.
- Anthropic Messages API reference — complete parameter reference for the messages.create call used in the agent loop.
