
Build Your First AI Coding Agent with the Claude API

TopicTrick

The AI coding agents you have probably heard of (Cursor, Devin, GitHub Copilot Coding Agent) are all built on the same foundation: an LLM with access to tools that can read and write files, execute commands, and observe the results. The specific tools differ; the loop is the same.

In this project you will build that loop yourself. By the end, you will have a working AI coding agent that can:

  • Read any file in your project directory
  • Write and edit source files
  • Execute shell commands (run tests, install packages, run scripts)
  • List directory contents
  • Search files for patterns
  • Iterate on failures until the task is complete

This is not a wrapper around Claude Code. You are building the agent loop from scratch so you understand exactly how it works, and so you can extend it for your own use cases.


Prerequisites

```bash
pip install anthropic pytest
```

You need Python 3.11 or later (`pytest` is only used to run the example project's tests in Step 4). Set your API key:

```bash
export ANTHROPIC_API_KEY="your-api-key"
```

You should already understand how Claude's tool use works. If not, read Claude Tool Use Explained first.


The Architecture

The agent is built from three layers:

```text
┌─────────────────────────────────────────────┐
│ Tool Executor                               │
│ (safely runs the tools Claude requests)     │
├─────────────────────────────────────────────┤
│ Agent Loop                                  │
│ (sends messages, handles tool calls,        │
│ feeds results back, iterates)               │
├─────────────────────────────────────────────┤
│ Claude API                                  │
│ (reasons, plans, decides which tools to     │
│ call and with what arguments)               │
└─────────────────────────────────────────────┘
```

Claude never executes code directly. It returns a `tool_use` block naming the tool to call and the JSON arguments to pass. Your Python code runs the tool, captures the output, and sends it back to Claude as a `tool_result`. Claude then decides what to do next. This loop repeats until Claude returns a final response with no tool calls.
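To make the hand-off concrete, here is a minimal sketch of the two message shapes involved, written as plain Python dicts so it runs without an API key. A `tool_use` block carries an `id`, a `name`, and an `input` object; your code answers with a `tool_result` block that echoes the same id as `tool_use_id`. The sample block, `build_tool_results`, and `fake_run_tool` are hypothetical stand-ins for a real API response and for the executor built in Step 2.

```python
from typing import Callable

# Sample assistant content: a text block plus one tool_use block (illustrative data)
assistant_content = [
    {"type": "text", "text": "Let me look at the file first."},
    {
        "type": "tool_use",
        "id": "toolu_example_01",  # hypothetical id; real ids are generated by the API
        "name": "read_file",
        "input": {"path": "calculator.py"},
    },
]


def build_tool_results(content: list[dict], run_tool: Callable[[str, dict], str]) -> dict:
    """Run every tool_use block and wrap all outputs in a single user message."""
    results = []
    for block in content:
        if block["type"] != "tool_use":
            continue  # text blocks are Claude's commentary; nothing to execute
        output = run_tool(block["name"], block["input"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],  # must match the id of the request it answers
            "content": output,
        })
    return {"role": "user", "content": results}


def fake_run_tool(name: str, tool_input: dict) -> str:
    # Hypothetical stand-in for ToolExecutor.execute
    return f"(contents of {tool_input['path']})"


reply = build_tool_results(assistant_content, fake_run_tool)
print(reply)
```

The agent loop in Step 3 does exactly this with real SDK objects: it appends Claude's content as an assistant message, then appends one user message holding every `tool_result`.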


Step 1: Define the Tools

Each tool is defined by a name, a description, and a JSON Schema for its input. Claude reads these definitions to learn which tools are available and what arguments each one expects.

```python
# agent/tools.py

TOOLS = [
    {
        "name": "read_file",
        "description": (
            "Read the complete contents of a file. "
            "Use this to understand existing code before making changes."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Relative path to the file from the project root"
                }
            },
            "required": ["path"]
        }
    },
    {
        "name": "write_file",
        "description": (
            "Write content to a file, creating it if it does not exist "
            "or overwriting it if it does. Use for creating new files or "
            "replacing file contents entirely."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Relative path to the file"},
                "content": {"type": "string", "description": "The complete file content to write"}
            },
            "required": ["path", "content"]
        }
    },
    {
        "name": "edit_file",
        "description": (
            "Replace a specific string in a file with new content. "
            "The old_string must exist exactly once in the file. "
            "Use for targeted edits without rewriting the whole file."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "old_string": {"type": "string", "description": "Exact text to find and replace"},
                "new_string": {"type": "string", "description": "Replacement text"}
            },
            "required": ["path", "old_string", "new_string"]
        }
    },
    {
        "name": "run_command",
        "description": (
            "Execute a shell command in the project directory. "
            "Use for running tests, installing packages, or executing scripts. "
            "Returns stdout, stderr, and exit code."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "command": {
                    "type": "string",
                    "description": "Shell command to run (e.g. 'pytest tests/', 'pip install requests')"
                },
                "timeout_seconds": {
                    "type": "integer",
                    "description": "Maximum execution time in seconds (default: 30)",
                    "default": 30
                }
            },
            "required": ["command"]
        }
    },
    {
        "name": "list_directory",
        "description": "List files and directories at a given path.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Relative path to list (use '.' for project root)"
                }
            },
            "required": ["path"]
        }
    },
    {
        "name": "search_files",
        "description": (
            "Search for a string pattern across all files in the project. "
            "Returns matching file paths and line numbers."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "pattern": {"type": "string", "description": "String or regex to search for"},
                "file_extension": {
                    "type": "string",
                    "description": "Filter by file extension (e.g. '.py', '.ts'). Optional.",
                    "default": ""
                }
            },
            "required": ["pattern"]
        }
    }
]
```
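Mistakes in these schemas are easy to make and tedious to debug once the agent is running: a `required` entry with no matching property, or two tools sharing a name. A small sanity check run at startup catches them early. This is a sketch; `check_tool_definitions` is a hypothetical helper, and the name pattern reflects the `^[a-zA-Z0-9_-]{1,64}$` rule in Anthropic's tool-use documentation at the time of writing, so confirm it against the current docs.

```python
import re

# Tool-name rule as documented by Anthropic at the time of writing (verify against current docs)
TOOL_NAME_RE = re.compile(r"^[a-zA-Z0-9_-]{1,64}$")


def check_tool_definitions(tools: list[dict]) -> list[str]:
    """Return a list of problems found in the tool definitions (empty if none)."""
    problems = []
    seen = set()
    for tool in tools:
        name = tool.get("name", "")
        if not TOOL_NAME_RE.match(name):
            problems.append(f"invalid tool name: {name!r}")
        if name in seen:
            problems.append(f"duplicate tool name: {name!r}")
        seen.add(name)
        if not tool.get("description"):
            problems.append(f"{name}: missing description")
        schema = tool.get("input_schema", {})
        # Every tool in this project takes a JSON object as its input
        if schema.get("type") != "object":
            problems.append(f"{name}: input_schema type must be 'object'")
        props = schema.get("properties", {})
        for field in schema.get("required", []):
            if field not in props:
                problems.append(f"{name}: required field {field!r} not in properties")
    return problems


# Example: one valid tool and one with a typo in "required"
sample_tools = [
    {"name": "read_file", "description": "Read a file.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}},
                      "required": ["path"]}},
    {"name": "list_directory", "description": "List a directory.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}},
                      "required": ["pth"]}},
]
print(check_tool_definitions(sample_tools))
```

In the project itself you could run `assert not check_tool_definitions(TOOLS)` at the bottom of `agent/tools.py`, so a broken schema fails at import time rather than halfway through a task.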

Step 2: Implement the Tool Executor

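The tool executor is the bridge between Claude's JSON requests and your filesystem. Security matters most here: every file operation must stay inside the project directory. Note that this confines the file tools only; a shell command started by run_command can still reach anything your user account can.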

```python
# agent/executor.py
import re
import subprocess
from pathlib import Path

# Directories that search_files never descends into
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv", "venv"}


class ToolExecutor:
    """
    Executes tools requested by Claude.
    All file operations are restricted to project_root.
    """

    def __init__(self, project_root: str):
        self.root = Path(project_root).resolve()

    def _safe_path(self, relative_path: str) -> Path:
        """
        Resolve a relative path and verify it stays within project root.
        Raises ValueError if the path would escape the project directory.
        """
        resolved = (self.root / relative_path).resolve()
        # Compare path components, not string prefixes: "/srv/app-old" starts with "/srv/app"
        if not resolved.is_relative_to(self.root):
            raise ValueError(
                f"Path traversal blocked: '{relative_path}' resolves outside project root"
            )
        return resolved

    def read_file(self, path: str) -> str:
        safe = self._safe_path(path)
        if not safe.exists():
            return f"ERROR: File does not exist: {path}"
        if not safe.is_file():
            return f"ERROR: Path is not a file: {path}"
        try:
            return safe.read_text(encoding="utf-8")
        except Exception as e:
            return f"ERROR: Could not read file: {e}"

    def write_file(self, path: str, content: str) -> str:
        safe = self._safe_path(path)
        safe.parent.mkdir(parents=True, exist_ok=True)
        safe.write_text(content, encoding="utf-8")
        return f"OK: Wrote {len(content)} characters to {path}"

    def edit_file(self, path: str, old_string: str, new_string: str) -> str:
        safe = self._safe_path(path)
        if not safe.is_file():
            return f"ERROR: File does not exist: {path}"
        content = safe.read_text(encoding="utf-8")
        count = content.count(old_string)
        if count == 0:
            return f"ERROR: old_string not found in {path}"
        if count > 1:
            return f"ERROR: old_string appears {count} times in {path}; it must be unique"
        safe.write_text(content.replace(old_string, new_string, 1), encoding="utf-8")
        return f"OK: Replaced 1 occurrence in {path}"

    def run_command(self, command: str, timeout_seconds: int = 30) -> str:
        # Block obviously dangerous commands. A substring blocklist is easy to
        # evade (e.g. "rm -r -f"), so treat it as a guardrail, not a sandbox.
        blocked = ["rm -rf", "sudo", "mkfs", "dd if=", ":(){:|:&};:"]
        lowered = command.lower()
        for pattern in blocked:
            if pattern in lowered:
                return f"ERROR: Command blocked for safety: contains '{pattern}'"

        try:
            result = subprocess.run(
                command,
                shell=True,
                cwd=str(self.root),
                capture_output=True,
                text=True,
                timeout=timeout_seconds,
            )
        except subprocess.TimeoutExpired:
            return f"ERROR: Command timed out after {timeout_seconds} seconds"
        except Exception as e:
            return f"ERROR: {e}"

        output = []
        if result.stdout:
            output.append(f"STDOUT:\n{result.stdout}")
        if result.stderr:
            output.append(f"STDERR:\n{result.stderr}")
        output.append(f"EXIT CODE: {result.returncode}")
        return "\n".join(output)

    def list_directory(self, path: str) -> str:
        safe = self._safe_path(path)
        if not safe.exists():
            return f"ERROR: Path does not exist: {path}"
        if not safe.is_dir():
            return f"ERROR: Path is not a directory: {path}"
        lines = []
        for item in sorted(safe.iterdir()):
            marker = "/" if item.is_dir() else ""
            lines.append(f"{item.relative_to(self.root)}{marker}")
        return "\n".join(lines) if lines else "(empty directory)"

    def search_files(self, pattern: str, file_extension: str = "") -> str:
        # Compile once so an invalid regex is reported instead of silently matching nothing
        try:
            regex = re.compile(pattern)
        except re.error as e:
            return f"ERROR: Invalid regex '{pattern}': {e}"
        matches = []
        for file_path in self.root.rglob("*"):
            if not file_path.is_file():
                continue
            if file_extension and file_path.suffix != file_extension:
                continue
            rel = file_path.relative_to(self.root)
            # Skip common non-source directories (checked relative to the project root)
            if any(part in SKIP_DIRS for part in rel.parts):
                continue
            try:
                content = file_path.read_text(encoding="utf-8", errors="ignore")
            except OSError:
                continue
            for i, line in enumerate(content.splitlines(), 1):
                if regex.search(line):
                    matches.append(f"{rel}:{i}: {line.strip()}")
        if not matches:
            return f"No matches found for pattern: {pattern}"
        return "\n".join(matches[:50])  # cap at 50 matches

    def execute(self, tool_name: str, tool_input: dict) -> str:
        """Dispatch a tool call from Claude. Always returns a string, never raises."""
        dispatch = {
            "read_file": lambda i: self.read_file(i["path"]),
            "write_file": lambda i: self.write_file(i["path"], i["content"]),
            "edit_file": lambda i: self.edit_file(i["path"], i["old_string"], i["new_string"]),
            "run_command": lambda i: self.run_command(i["command"], i.get("timeout_seconds", 30)),
            "list_directory": lambda i: self.list_directory(i["path"]),
            "search_files": lambda i: self.search_files(i["pattern"], i.get("file_extension", "")),
        }
        if tool_name not in dispatch:
            return f"ERROR: Unknown tool: {tool_name}"
        try:
            return dispatch[tool_name](tool_input)
        except (KeyError, TypeError) as e:
            return f"ERROR: Invalid arguments for {tool_name}: {e}"
        except (ValueError, OSError) as e:
            # Blocked path traversal and filesystem errors go back to Claude, not up the stack
            return f"ERROR: {e}"
```

Path Traversal Protection

The _safe_path method resolves the full absolute path and checks that the result is still inside the project root. This blocks directory traversal attempts like '../../etc/passwd', and because resolve() follows symbolic links, a symlink pointing outside the project is caught as well. Never skip this check: Claude may produce unexpected paths when reasoning about a task.
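A plain string `startswith` comparison looks sufficient for this check but has a gap: if the root is `/tmp/proj`, the sibling directory `/tmp/proj-backup` also starts with that string. `Path.is_relative_to` (Python 3.9+) compares whole path components instead. The sketch below uses `PurePosixPath` so nothing touches the disk; the paths are made-up examples.

```python
from pathlib import PurePosixPath

root = PurePosixPath("/tmp/proj")  # example project root

# Already-resolved absolute paths, as _safe_path sees them after .resolve()
candidates = {
    "inside": PurePosixPath("/tmp/proj/src/app.py"),
    "sibling": PurePosixPath("/tmp/proj-backup/secrets.env"),
    "outside": PurePosixPath("/etc/passwd"),
}

results = {}
for label, path in candidates.items():
    naive = str(path).startswith(str(root))  # string prefix: fooled by "proj-backup"
    component = path.is_relative_to(root)    # whole path components must match
    results[label] = (naive, component)
    print(f"{label:8} startswith={naive!s:5} is_relative_to={component}")
```

The `sibling` row is the one that matters: the string check accepts it and the component check rejects it. Inside `_safe_path`, the equivalent guard is `if not resolved.is_relative_to(self.root): raise ValueError(...)`.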


Step 3: The Agent Loop

The agent loop is the core of the system. It sends messages to Claude, handles tool calls, feeds results back, and repeats until Claude signals completion.

```python
# agent/loop.py
import anthropic

from .executor import ToolExecutor
from .tools import TOOLS

SYSTEM_PROMPT = """You are an expert software engineer and coding agent.

Your job is to complete coding tasks by using the tools available to you:
- Read files to understand the existing codebase before making changes
- Write and edit files to implement the requested changes
- Run commands to verify your changes work (run tests, execute scripts)
- Search files to find relevant code
- List directories to understand project structure

Guidelines:
1. ALWAYS read relevant files before making changes to understand existing patterns
2. Make targeted, minimal changes; don't rewrite files unnecessarily
3. After writing code, run tests to verify correctness
4. If tests fail, read the error output carefully and fix the root cause
5. Keep iterating until tests pass or you determine the task is complete
6. Communicate clearly what you are doing and why at each step

You are operating in a sandboxed environment. All file operations are restricted to the project directory."""


class CodingAgent:
    def __init__(
        self,
        project_root: str,
        model: str = "claude-sonnet-4-6",
        max_iterations: int = 30,
        verbose: bool = True,
    ):
        self.client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
        self.executor = ToolExecutor(project_root)
        self.model = model
        self.max_iterations = max_iterations
        self.verbose = verbose

    def _log(self, msg: str) -> None:
        if self.verbose:
            print(msg)

    def run(self, task: str) -> str:
        """
        Run the agent on a task description.
        Returns Claude's final response text.
        """
        self._log(f"\n{'=' * 60}")
        self._log(f"TASK: {task}")
        self._log(f"{'=' * 60}\n")

        messages = [{"role": "user", "content": task}]

        for iteration in range(1, self.max_iterations + 1):
            self._log(f"[Iteration {iteration}] Calling Claude...")

            response = self.client.messages.create(
                model=self.model,
                max_tokens=8192,
                system=SYSTEM_PROMPT,
                tools=TOOLS,
                messages=messages,
            )

            self._log(f"[Iteration {iteration}] Stop reason: {response.stop_reason}")

            # Append Claude's response to the message history
            messages.append({"role": "assistant", "content": response.content})

            # Anything other than "tool_use" (normally "end_turn", but also e.g.
            # "max_tokens") means there is nothing left to execute, so stop here
            if response.stop_reason != "tool_use":
                final_text = "".join(
                    block.text for block in response.content if block.type == "text"
                )
                self._log(f"\n[DONE] Final response:\n{final_text}")
                return final_text

            # Execute every tool call in this response
            tool_results = []
            for block in response.content:
                if block.type != "tool_use":
                    continue

                self._log(f"\n[Tool] {block.name}({block.input})")
                result = self.executor.execute(block.name, block.input)

                # Truncate very long outputs to avoid context overflow
                if len(result) > 8000:
                    result = result[:8000] + "\n... [output truncated at 8000 chars]"

                self._log(f"[Result] {result[:200]}{'...' if len(result) > 200 else ''}")

                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                    "is_error": result.startswith("ERROR"),  # flags failed tool runs to Claude
                })

            # Return all tool results to Claude in a single user message
            messages.append({"role": "user", "content": tool_results})

        return f"Agent stopped after {self.max_iterations} iterations without completing the task."
```
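The loop truncates long tool output by keeping the first 8,000 characters. For test runs that can discard the most useful part, because pytest prints its summary of failures at the end. A variant that keeps both ends is a small change; `truncate_middle` below is a hypothetical helper, and the 8,000 default simply mirrors the loop.

```python
def truncate_middle(text: str, limit: int = 8000) -> str:
    """Keep the start and end of `text`, dropping the middle if it exceeds `limit`.

    The marker line makes the result slightly longer than `limit`.
    """
    if len(text) <= limit:
        return text
    half = limit // 2
    dropped = len(text) - 2 * half
    return f"{text[:half]}\n... [{dropped} chars truncated] ...\n{text[-half:]}"


# A long fake test log whose important line is at the very end
log = "collecting...\n" + "x" * 20_000 + "\nFAILED test_calculator.py::test_divide_by_zero"
short = truncate_middle(log)
print(len(short), short.endswith("test_divide_by_zero"))
```

To use it, replace the `result[:8000] + ...` slice in `run()` with `result = truncate_middle(result)`.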

Step 4: Run It on a Real Task

Create a test project to run the agent against:

```bash
mkdir test_project
cd test_project

# Create a simple Python module with a bug
cat > calculator.py << 'EOF'
def add(a, b):
    return a + b

def subtract(a, b):
    return a - b

def multiply(a, b):
    return a * b

def divide(a, b):
    return a / b  # Bug: no division by zero check
EOF

# Create a test file with a failing test
cat > test_calculator.py << 'EOF'
import pytest
from calculator import add, subtract, multiply, divide

def test_add():
    assert add(2, 3) == 5

def test_subtract():
    assert subtract(10, 4) == 6

def test_multiply():
    assert multiply(3, 4) == 12

def test_divide_normal():
    assert divide(10, 2) == 5.0

def test_divide_by_zero():
    with pytest.raises(ValueError, match="Cannot divide by zero"):
        divide(10, 0)
EOF

# Go back to the parent directory, where agent/ and run_agent.py live
cd ..
```

Now run the agent:

```python
# run_agent.py
from agent.loop import CodingAgent

agent = CodingAgent(project_root="./test_project", verbose=True)

result = agent.run(
    "The test_divide_by_zero test is failing. "
    "Fix the divide function in calculator.py to raise a ValueError "
    "with the message 'Cannot divide by zero' when the divisor is zero. "
    "Run the tests to confirm all 5 pass."
)
```

Output from one run looks like this (the exact steps and wording vary between runs):

```text
============================================================
TASK: The test_divide_by_zero test is failing. Fix the divide function...
============================================================

[Iteration 1] Calling Claude...
[Iteration 1] Stop reason: tool_use

[Tool] read_file({'path': 'calculator.py'})
[Result] def add(a, b):
    return a + b...

[Tool] read_file({'path': 'test_calculator.py'})
[Result] import pytest
from calculator import add...
[Iteration 2] Calling Claude...
[Iteration 2] Stop reason: tool_use

[Tool] edit_file({'path': 'calculator.py', 'old_string': 'def divide(a, b):\n    return a / b  # Bug: no division by zero check', 'new_string': 'def divide(a, b):\n    if b == 0:\n        raise ValueError("Cannot divide by zero")\n    return a / b'})
[Result] OK: Replaced 1 occurrence in calculator.py
[Iteration 3] Calling Claude...
[Iteration 3] Stop reason: tool_use

[Tool] run_command({'command': 'pytest test_calculator.py -v'})
[Result] STDOUT:
test_calculator.py::test_add PASSED
test_calculator.py::test_subtract PASSED
test_calculator.py::test_multiply PASSED
test_calculator.py::test_divide_normal PASSED
test_calculator.py::test_divide_by_zero PASSED
5 passed in 0.12s
EXIT CODE: 0
[Iteration 4] Calling Claude...
[Iteration 4] Stop reason: end_turn

[DONE] Final response:
All 5 tests are now passing. I fixed the `divide` function in `calculator.py` by adding a check for division by zero before performing the division. When `b == 0`, the function now raises a `ValueError` with the message "Cannot divide by zero" as expected by the test.
```

The agent read both files, identified the fix, edited the file, ran the tests, confirmed that all 5 passed, and reported completion. In this run that took 4 iterations with no human intervention.


Step 5: Try a More Complex Task

```python
result = agent.run(
    "Add a square_root function to calculator.py. "
    "It should raise a ValueError('Cannot take square root of negative number') "
    "for negative inputs. "
    "Write comprehensive tests for it in test_calculator.py. "
    "Make sure all tests pass."
)
```

A typical run reads calculator.py, decides where the new function belongs, adds `import math` if it is missing, writes the function, reads test_calculator.py, adds test cases, runs the full suite, fixes any failures, and confirms that everything passes.
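It helps to know what a correct result looks like before judging the agent's work. Here is one implementation that meets the task as worded; the agent may well produce something different (for example using `** 0.5` instead of `math.sqrt`), and that is fine as long as the behaviour matches.

```python
import math


def square_root(x: float) -> float:
    """Return the square root of x; negative inputs are rejected as the task requires."""
    if x < 0:
        raise ValueError("Cannot take square root of negative number")
    return math.sqrt(x)


print(square_root(16))  # 4.0

try:
    square_root(-1)
except ValueError as err:
    print(err)  # Cannot take square root of negative number
```

Useful tests for the agent's version cover a perfect square, zero, a non-perfect square (compared with `pytest.approx`), and the negative-input error message.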


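Common Failure Modes and Fixes

Agent loops without progress: Add a stagnation check. If the same tool is called with identical arguments twice and no file has changed in between, stop the loop:

```python
import json

# Create the history once per run(), before the iteration loop, so it persists across iterations
tool_call_history: set[str] = set()

# Inside the loop, before executing the tool calls in `response`:
for block in response.content:
    if block.type != "tool_use":
        continue
    call_sig = f"{block.name}:{json.dumps(block.input, sort_keys=True)}"
    if call_sig in tool_call_history:
        return "Agent is stuck in a loop: duplicate tool call detected."
    tool_call_history.add(call_sig)
    # After a file changes, re-running the same command (e.g. pytest) is real progress
    if block.name in ("write_file", "edit_file"):
        tool_call_history.clear()
```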

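Context window overflow on large files: Truncate file reads over a character limit, and tell Claude the file's full size so it knows it has seen only part of it:

```python
# In agent/executor.py. Keep the limit below the loop's 8,000-character cap on
# tool results; otherwise the loop truncates again and the size note is lost.
MAX_FILE_CHARS = 7_500


class ToolExecutor:
    ...

    def read_file(self, path: str) -> str:
        safe = self._safe_path(path)
        if not safe.is_file():
            return f"ERROR: File does not exist or is not a file: {path}"
        content = safe.read_text(encoding="utf-8", errors="replace")
        if len(content) > MAX_FILE_CHARS:
            return content[:MAX_FILE_CHARS] + f"\n... [truncated: file is {len(content)} chars total]"
        return content
```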

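Dangerous command execution: A substring blocklist is easy to evade (`rm -r -f` slips past `rm -rf`), so extending it only goes so far. For production agents, also run commands in an isolated container and add a human approval step for destructive operations.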
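An approval step can be as small as a wrapper that asks a human before running anything that matches a "needs review" pattern. This is a sketch: `ApprovalGate`, its pattern list, and the injected `ask` callable are hypothetical. `ask` defaults to `input()` so someone can answer at the terminal, and a test can swap in a scripted answer. Like the blocklist, pattern matching is a speed bump, not a sandbox.

```python
import re
from typing import Callable

# Hypothetical patterns for commands that should require an explicit "yes"
REVIEW_PATTERNS = (
    r"\brm\b",                    # any deletion
    r"\bgit\s+push\b",            # publishing changes
    r"\bpip\s+uninstall\b",       # removing packages
    r"\bcurl\b.*\|\s*(ba)?sh\b",  # piping a download into a shell
)


class ApprovalGate:
    def __init__(self, ask: Callable[[str], str] = input,
                 patterns: tuple[str, ...] = REVIEW_PATTERNS):
        self.ask = ask
        self.patterns = [re.compile(p) for p in patterns]

    def needs_review(self, command: str) -> bool:
        return any(p.search(command) for p in self.patterns)

    def allowed(self, command: str) -> bool:
        """Return True if the command may run; prompt a human for risky ones."""
        if not self.needs_review(command):
            return True
        answer = self.ask(f"Agent wants to run: {command!r}. Allow? [y/N] ")
        return answer.strip().lower() in ("y", "yes")


# Demo with a scripted "human" who always refuses
gate = ApprovalGate(ask=lambda prompt: "n")
print(gate.allowed("pytest -q"))     # True: no review needed
print(gate.allowed("rm -r build/"))  # False: refused
```

In `ToolExecutor.run_command`, you would check `gate.allowed(command)` right after the blocklist and return an `ERROR:` string when it is refused, so Claude learns the command did not run and can choose another approach.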


Full Project Structure

```text
agent/
├── __init__.py
├── tools.py       ← Tool definitions (JSON schemas)
├── executor.py    ← Tool implementations (filesystem + shell)
└── loop.py        ← Agent loop (Claude API + message history)
test_project/
├── calculator.py  ← Target codebase
└── test_calculator.py
run_agent.py       ← Entry point
```

Key Takeaways

  • An AI coding agent is an agentic loop: Claude decides which tools to call → your code executes them → results feed back to Claude → repeat
  • Claude never executes code directly. It returns tool call requests as structured JSON, and your code runs the tools safely
  • Path traversal protection in the file tools is non-negotiable: always resolve and validate paths before any filesystem operation
  • Blocklisting dangerous commands in the shell tool catches obvious accidents, but it is easy to evade and is no substitute for isolation or human approval
  • The agent works best on well-defined tasks with testable outcomes, because it can iterate on failing tests and confirm success automatically
  • Context truncation and loop detection make your agent noticeably more robust in production

What's Next in the AI Coding Agents Series

1. What Are AI Coding Agents?
2. AI Coding Agents Compared: GitHub Copilot vs Cursor vs Devin vs Claude Code
3. Build Your First AI Coding Agent with the Claude API ← you are here
4. Build an Automated GitHub PR Review Agent
5. Build an Autonomous Bug Fixer Agent
6. AI Coding Agents in CI/CD: Automate Code Reviews and Fixes in Production

This post is part of the AI Coding Agents Series. Previous post: AI Coding Agents Compared: GitHub Copilot vs Cursor vs Devin vs Claude Code.