
Build an Autonomous Bug Fixer Agent with Claude

TopicTrick

Debugging is expensive. A developer reads a bug report, finds the failing test, reads the code, forms a hypothesis, makes a change, re-runs the tests, discovers the change did not work, and tries again. This loop can take 30 minutes for a simple bug and hours for a subtle one.

A bug fixer agent compresses that loop. Given a failing test (or an error description), the agent reads the relevant code, reasons about the root cause, applies a fix, runs the tests, and keeps iterating until the tests pass. For well-defined, reproducible bugs — the kind that come with a failing test — this process can be fully automated.

In this project you will build an autonomous bug fixer agent that accepts a bug report (described in natural language, or as a failing test command), explores the codebase, fixes the bug, verifies the fix with tests, and produces a clean diff of its changes.

This project extends the agent loop architecture from Build Your First AI Coding Agent. If you have not built that agent yet, read that post first — this project reuses the same ToolExecutor and TOOLS definitions.


Prerequisites

bash
pip install anthropic

Reuse agent/tools.py and agent/executor.py from the previous project. This post adds the bug fixer's specialised loop and prompt on top of that foundation.
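If you are rebuilding rather than reusing those files, each entry in TOOLS follows the Anthropic Messages API tool schema. A sketch of one entry, the read_file tool (the description text here is illustrative, not copied from the previous post):

```python
# One entry in the TOOLS list passed to client.messages.create(tools=...).
# The previous post defines the full set (reading, editing, running commands,
# searching code, listing directories); this sketch shows the shared shape.
READ_FILE_TOOL = {
    "name": "read_file",
    "description": "Read the contents of a file at the given relative path.",
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {
                "type": "string",
                "description": "Path relative to the project root",
            },
        },
        "required": ["path"],
    },
}
```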


What Makes a Bug Fixer Different from a General Agent

A general coding agent handles open-ended tasks. A bug fixer has a specific, measurable success condition: all targeted tests pass. This tighter loop allows for a more focused architecture:

  1. Reproduce: run the failing test(s) to confirm the failure and capture the error
  2. Explore: read the relevant source files to understand the code involved
  3. Hypothesise: reason about what is causing the failure
  4. Fix: make a targeted change to address the root cause
  5. Verify: run the tests again to confirm the fix works
  6. Check regressions: run the full test suite to confirm nothing else broke
  7. Report: produce a diff and explanation

The agent iterates steps 3–5 until either the tests pass or it exhausts its retry limit.
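The steps above collapse into a simple control loop. A minimal sketch, where run_tests and apply_fix are hypothetical callables standing in for the agent's tool use:

```python
def fix_loop(run_tests, apply_fix, max_attempts: int = 5) -> bool:
    """Sketch of the reproduce -> fix -> verify loop (steps 1 and 3-5).

    run_tests() returns (passed, output); apply_fix(output) makes one
    targeted change based on the latest failure output. Both are
    hypothetical stand-ins for the agent's actual tool calls.
    """
    passed, output = run_tests()  # Step 1: reproduce the failure
    attempts = 0
    while not passed and attempts < max_attempts:
        apply_fix(output)                 # Steps 3-4: hypothesise and fix
        passed, output = run_tests()      # Step 5: verify
        attempts += 1
    return passed
```

Steps 6 and 7 (regression check and report) run once, after this inner loop succeeds.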


Step 1: Bug Fixer System Prompt

The system prompt is more constrained than a general agent's: it focuses the model on root cause analysis and minimal, targeted fixes.

python
# bug_fixer/prompts.py

BUG_FIXER_SYSTEM = """You are an expert software engineer specialising in debugging and root cause analysis.

Your job is to fix bugs in a codebase. You have access to tools that let you read files, edit files, run commands, search code, and list directories.

## Your Process

1. **Reproduce first**: Always run the failing test(s) before touching anything to confirm the failure and see the exact error
2. **Read before writing**: Read the relevant source files to understand the code before making any changes
3. **Minimal fixes**: Make the smallest possible change that fixes the bug. Do not refactor, improve, or extend code beyond what is necessary
4. **Fix root causes**: Don't mask bugs with try/except. Fix the actual problem
5. **Verify your fix**: After every edit, run the failing tests to check if they pass
6. **Check for regressions**: Once the target tests pass, run the full test suite to ensure no regressions
7. **Stop when done**: Once all targeted tests pass and the full suite passes, stop. Do not continue making changes

## Rules

- Never delete tests to make them pass
- Never change test assertions to make them pass — fix the source code, not the tests
- If you cannot find the root cause after 5 attempts, explain clearly what you tried and why you are stuck
- Be explicit about your reasoning at each step

When you have successfully fixed the bug, end your response with exactly:
BUG_FIXED: <one-line description of what you changed and why>

If you cannot fix the bug, end with:
BUG_UNFIXED: <explanation of what you tried and why you could not resolve it>"""


def build_fix_prompt(bug_report: str, test_command: str | None = None) -> str:
    parts = [f"## Bug Report\n\n{bug_report}"]
    if test_command:
        parts.append(
            f"## Test Command\n\nRun this command to reproduce the failure:\n```\n{test_command}\n```"
        )
    parts.append(
        "\nStart by running the failing tests to see the exact error, "
        "then explore the relevant code to identify and fix the root cause."
    )
    return "\n\n".join(parts)
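Because the prompt pins the termination markers to an exact format, an orchestrator can parse the outcome deterministically. A minimal sketch of that parsing:

```python
import re


def parse_outcome(final_text: str) -> tuple[bool, str]:
    """Extract the BUG_FIXED/BUG_UNFIXED signal from the agent's final message.

    Returns (success, description); falls back to an unresolved result
    when neither marker is present.
    """
    match = re.search(r"BUG_FIXED:\s*(.+)", final_text)
    if match:
        return True, match.group(1).strip()
    match = re.search(r"BUG_UNFIXED:\s*(.+)", final_text, re.DOTALL)
    if match:
        return False, match.group(1).strip()
    return False, "no termination signal found"
```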

Step 2: The Bug Fixer Agent

python
# bug_fixer/agent.py
import difflib
import re
from dataclasses import dataclass, field
from pathlib import Path

import anthropic

from agent.tools import TOOLS
from agent.executor import ToolExecutor
from bug_fixer.prompts import BUG_FIXER_SYSTEM, build_fix_prompt


@dataclass
class BugFixResult:
    success: bool
    description: str
    iterations: int
    files_changed: list[str] = field(default_factory=list)
    diff: str = ""
    error: str = ""


class BugFixerAgent:
    def __init__(
        self,
        project_root: str,
        model: str = "claude-sonnet-4-6",
        max_iterations: int = 20,
        verbose: bool = True,
    ):
        self.root = Path(project_root).resolve()
        self.client = anthropic.Anthropic()
        self.executor = ToolExecutor(project_root)
        self.model = model
        self.max_iterations = max_iterations
        self.verbose = verbose

    def _log(self, msg: str) -> None:
        if self.verbose:
            print(msg)

    def _snapshot(self) -> dict[str, str]:
        """Capture the current state of tracked source files for diff generation."""
        snapshot = {}
        for ext in [".py", ".js", ".ts", ".go", ".java", ".rb", ".rs"]:
            for path in self.root.rglob(f"*{ext}"):
                if any(p in path.parts for p in [".git", "node_modules", "__pycache__", ".venv", "venv"]):
                    continue
                try:
                    snapshot[str(path.relative_to(self.root))] = path.read_text(encoding="utf-8")
                except (OSError, UnicodeDecodeError):
                    pass
        return snapshot

    def _compute_diff(self, before: dict[str, str], after: dict[str, str]) -> tuple[list[str], str]:
        """Compute which files changed and produce a unified diff.

        Files created by the agent appear as all-added hunks, because their
        "before" content is the empty string.
        """
        changed = []
        diff_lines = []
        for path, after_content in sorted(after.items()):
            before_content = before.get(path, "")
            if before_content == after_content:
                continue
            changed.append(path)
            diff_lines.extend(difflib.unified_diff(
                before_content.splitlines(),
                after_content.splitlines(),
                fromfile=f"a/{path}",
                tofile=f"b/{path}",
                lineterm="",
            ))
        return changed, "\n".join(diff_lines)

    def fix(self, bug_report: str, test_command: str | None = None) -> BugFixResult:
        """
        Run the bug fixer on a bug report.
        Returns a BugFixResult with success status, diff, and description.
        """
        self._log(f"\n{'=' * 60}")
        self._log(f"BUG REPORT: {bug_report[:100]}...")
        self._log(f"{'=' * 60}\n")

        # Snapshot the before state
        before_snapshot = self._snapshot()

        messages = [
            {"role": "user", "content": build_fix_prompt(bug_report, test_command)}
        ]

        iteration = 0
        final_text = ""

        while iteration < self.max_iterations:
            iteration += 1
            self._log(f"[Iteration {iteration}/{self.max_iterations}]")

            response = self.client.messages.create(
                model=self.model,
                max_tokens=8096,
                system=BUG_FIXER_SYSTEM,
                tools=TOOLS,
                messages=messages,
            )

            messages.append({"role": "assistant", "content": response.content})

            # Log text blocks and watch for the termination signal
            for block in response.content:
                if hasattr(block, "text") and block.text.strip():
                    self._log(f"[Claude] {block.text[:300]}")
                    if "BUG_FIXED:" in block.text or "BUG_UNFIXED:" in block.text:
                        final_text = block.text

            if response.stop_reason == "end_turn":
                break

            # Execute tool calls
            if response.stop_reason == "tool_use":
                tool_results = []
                for block in response.content:
                    if block.type != "tool_use":
                        continue
                    self._log(f"  [Tool] {block.name}({str(block.input)[:100]})")
                    result = self.executor.execute(block.name, block.input)
                    if len(result) > 6000:
                        result = result[:6000] + "\n...[truncated]"
                    self._log(f"  [Result] {result[:150]}{'...' if len(result) > 150 else ''}")
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    })
                messages.append({"role": "user", "content": tool_results})

        # Snapshot the after state and compute the diff
        after_snapshot = self._snapshot()
        files_changed, diff = self._compute_diff(before_snapshot, after_snapshot)

        # Determine success from the termination signal
        success = "BUG_FIXED:" in final_text
        if success:
            match = re.search(r"BUG_FIXED:\s*(.+)", final_text)
            description = match.group(1).strip() if match else "Bug fixed"
        else:
            match = re.search(r"BUG_UNFIXED:\s*(.+)", final_text, re.DOTALL)
            description = match.group(1).strip()[:500] if match else "Agent did not resolve the bug"

        self._log(f"\n{'=' * 60}")
        self._log(f"RESULT: {'SUCCESS' if success else 'FAILED'}")
        self._log(f"FILES CHANGED: {files_changed}")
        self._log(f"DESCRIPTION: {description}")
        self._log(f"{'=' * 60}\n")

        return BugFixResult(
            success=success,
            description=description,
            iterations=iteration,
            files_changed=files_changed,
            diff=diff,
        )

Step 3: Create a Test Codebase with Bugs

Let's set up a realistic project with multiple bugs to fix:

python
# setup_test_project.py
from pathlib import Path

root = Path("./buggy_project")
root.mkdir(exist_ok=True)

# Source file with several bugs
(root / "user_service.py").write_text('''
import hashlib
from datetime import datetime


class UserService:
    def __init__(self):
        self.users = {}

    def create_user(self, username: str, email: str, password: str) -> dict:
        """Create a new user account."""
        if username in self.users:
            raise ValueError(f"Username {username} already exists")

        # Bug 1: MD5 is not suitable for password hashing, but that's a quality issue.
        # Actual bug: should hash the password, not store it plain
        user = {
            "username": username,
            "email": email,
            "password": password,  # Bug: storing plain text password
            "created_at": datetime.utcnow().isoformat(),
            "active": True,
        }
        self.users[username] = user
        return {"username": username, "email": email, "created_at": user["created_at"]}

    def authenticate(self, username: str, password: str) -> bool:
        """Check if username/password is correct."""
        if username not in self.users:
            return False
        return self.users[username]["password"] == password

    def get_user(self, username: str) -> dict | None:
        """Get a user by username. Returns None if not found."""
        return self.users.get(username)

    def deactivate_user(self, username: str) -> None:
        """Deactivate a user account."""
        if username not in self.users:
            raise KeyError(f"User {username} not found")
        self.users[username]["active"] = Flase  # Bug 2: typo — Flase instead of False

    def get_active_users(self) -> list[str]:
        """Return list of active usernames."""
        # Bug 3: returns all users, not just active ones
        return [u for u in self.users]
''', encoding="utf-8")

# Test file
(root / "test_user_service.py").write_text('''
import pytest
from user_service import UserService


@pytest.fixture
def service():
    return UserService()


def test_create_user(service):
    result = service.create_user("alice", "alice@example.com", "secret123")
    assert result["username"] == "alice"
    assert result["email"] == "alice@example.com"
    assert "password" not in result  # password must not be in return value

def test_duplicate_user_raises(service):
    service.create_user("alice", "alice@example.com", "secret")
    with pytest.raises(ValueError):
        service.create_user("alice", "other@example.com", "other")

def test_authenticate_correct_password(service):
    service.create_user("alice", "alice@example.com", "secret123")
    assert service.authenticate("alice", "secret123") is True

def test_authenticate_wrong_password(service):
    service.create_user("alice", "alice@example.com", "secret123")
    assert service.authenticate("alice", "wrongpassword") is False

def test_authenticate_unknown_user(service):
    assert service.authenticate("nobody", "anything") is False

def test_deactivate_user(service):
    service.create_user("alice", "alice@example.com", "secret")
    service.deactivate_user("alice")
    user = service.get_user("alice")
    assert user["active"] is False  # Tests the typo bug

def test_deactivate_nonexistent_raises(service):
    with pytest.raises(KeyError):
        service.deactivate_user("nobody")

def test_get_active_users(service):
    service.create_user("alice", "alice@example.com", "secret")
    service.create_user("bob", "bob@example.com", "secret")
    service.deactivate_user("bob")
    active = service.get_active_users()
    assert "alice" in active
    assert "bob" not in active  # Tests the filtering bug
''', encoding="utf-8")

print("Test project created in ./buggy_project/")
print("Run: cd buggy_project && pytest -v to see failures")

Run this to create the project, then see the failures:

bash
python setup_test_project.py
cd buggy_project
pytest -v

You should see failures on test_deactivate_user (typo bug) and test_get_active_users (filtering bug).


Step 4: Run the Bug Fixer

python
# run_bug_fixer.py
from bug_fixer.agent import BugFixerAgent

agent = BugFixerAgent(project_root="./buggy_project", verbose=True)

result = agent.fix(
    bug_report=(
        "Two tests are failing in test_user_service.py:\n"
        "1. test_deactivate_user — deactivating a user does not set active to False\n"
        "2. test_get_active_users — get_active_users returns ALL users, not just active ones\n\n"
        "Fix both bugs in user_service.py. Do not modify the tests."
    ),
    test_command="pytest test_user_service.py -v",
)

print(f"\nSuccess: {result.success}")
print(f"Files changed: {result.files_changed}")
print(f"Iterations: {result.iterations}")
print(f"Description: {result.description}")
if result.diff:
    print(f"\nDiff:\n{result.diff}")

Expected agent behaviour:

[Iteration 1]
  [Tool] run_command({'command': 'pytest test_user_service.py -v'})
  [Result] FAILED test_user_service.py::test_deactivate_user
           FAILED test_user_service.py::test_get_active_users
           ...NameError: name 'Flase' is not defined...
[Iteration 2]
  [Tool] read_file({'path': 'user_service.py'})
  [Result] ...def deactivate_user... self.users[username]["active"] = Flase...
           ...def get_active_users... return [u for u in self.users]...
[Iteration 3]
  [Tool] edit_file({'path': 'user_service.py', 'old_string': '= Flase', 'new_string': '= False'})
  [Tool] edit_file({'path': 'user_service.py', 'old_string': 'return [u for u in self.users]', 'new_string': 'return [u for u, data in self.users.items() if data["active"]]'})
[Iteration 4]
  [Tool] run_command({'command': 'pytest test_user_service.py -v'})
  [Result] 8 passed in 0.08s
[Iteration 5]
  [Tool] run_command({'command': 'pytest -v'})
  [Result] 8 passed in 0.08s EXIT CODE: 0

BUG_FIXED: Fixed NameError typo 'Flase' → 'False' in deactivate_user, and fixed get_active_users to filter by active=True using dict.items()

============================================================
RESULT: SUCCESS
FILES CHANGED: ['user_service.py']

Step 5: Handling Edge Cases

Bug Cannot Be Reproduced

Sometimes a bug report is vague. The agent handles this gracefully because it runs the tests first:

python
result = agent.fix(
    bug_report="Something is wrong with the user service. Users can't log in.",
    test_command="pytest test_user_service.py::test_authenticate_correct_password -v",
)

If the test passes, Claude will report it cannot reproduce the bug and describe what it checked.

Multiple Related Bugs in Different Files

python
result = agent.fix(
    bug_report=(
        "pytest tests/ is reporting 3 failures across auth.py and validation.py. "
        "The error messages are in the test output."
    ),
    test_command="pytest tests/ -v --tb=short",
)

The agent will explore both files, identify related issues, and fix them in a single session.

No Test — Error Description Only

python
result = agent.fix(
    bug_report=(
        "Running the app with `python app.py` produces this traceback:\n\n"
        "AttributeError: 'NoneType' object has no attribute 'split'\n"
        "File 'app.py', line 47, in parse_config\n"
        "    parts = config_value.split(',')\n\n"
        "This happens when the CONFIG_VALUE environment variable is not set."
    ),
    test_command="python app.py",
)

The agent will read app.py, find the relevant code, add a None check, and verify the fix.
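The fix the agent typically converges on looks something like this. A sketch only: parse_config and CONFIG_VALUE come from the bug report above, and the surrounding app.py is hypothetical.

```python
# Hypothetical app.py excerpt, reconstructed from the traceback in the report.
import os


def parse_config() -> list[str]:
    # Before the fix: os.environ.get("CONFIG_VALUE") returned None when the
    # variable was unset, so config_value.split(",") raised AttributeError.
    # The minimal fix is a fallback to the empty string.
    config_value = os.environ.get("CONFIG_VALUE") or ""
    return [part.strip() for part in config_value.split(",") if part.strip()]
```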


Integrating with GitHub Issues

To automatically fix bugs reported as GitHub issues:

python
# github_bug_fixer.py
import os

from github import Github

from bug_fixer.agent import BugFixerAgent


def fix_github_issue(repo_name: str, issue_number: int, project_root: str) -> None:
    gh = Github(os.environ["GITHUB_TOKEN"])
    repo = gh.get_repo(repo_name)
    issue = repo.get_issue(issue_number)

    # Build the bug report from the issue
    bug_report = f"Title: {issue.title}\n\n{issue.body or '(no description)'}"

    # Run the fixer
    agent = BugFixerAgent(project_root=project_root)
    result = agent.fix(bug_report=bug_report)

    # Post the result as an issue comment
    if result.success:
        comment = (
            f"🤖 **AI Bug Fixer**: Fixed in {result.iterations} iterations.\n\n"
            f"**Change:** {result.description}\n\n"
            f"**Files modified:** {', '.join(result.files_changed)}\n\n"
            f"A PR has been opened with the fix."
        )
    else:
        comment = (
            f"🤖 **AI Bug Fixer**: Could not automatically fix this issue after "
            f"{result.iterations} iterations.\n\n"
            f"**What was tried:** {result.description}\n\n"
            f"This bug requires human investigation."
        )

    issue.create_comment(comment)
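The success comment mentions an opened PR, so you will want a step that turns the agent's working-tree edits into one. A hedged sketch using the git CLI plus PyGithub's Repository.create_pull; the branch naming scheme and the `main` base branch are assumptions, and pushing requires configured credentials:

```python
import subprocess


def fix_branch_name(issue_number: int) -> str:
    """Deterministic branch name for an automated fix (naming is an assumption)."""
    return f"ai-fix/issue-{issue_number}"


def open_fix_pr(repo, project_root: str, issue_number: int, description: str):
    """Commit the agent's edits on a branch, push, and open a PR.

    Assumes `repo` is a PyGithub Repository, the working tree in
    project_root contains the agent's edits, and the base branch is main.
    """
    branch = fix_branch_name(issue_number)

    def git(*args: str) -> None:
        subprocess.run(["git", "-C", project_root, *args], check=True)

    git("checkout", "-b", branch)
    git("add", "-A")
    git("commit", "-m", f"Fix #{issue_number}: {description}")
    git("push", "-u", "origin", branch)

    return repo.create_pull(
        title=f"Fix #{issue_number}: {description}",
        body=f"Automated fix for #{issue_number}.\n\n{description}",
        head=branch,
        base="main",
    )
```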

Key Takeaways

  • A bug fixer agent differs from a general agent in having a measurable success condition — tests passing
  • Always reproduce first: running the failing test before any edits anchors every subsequent decision in evidence
  • Minimal changes are the key constraint — agents that over-fix introduce regressions. The system prompt must enforce this explicitly
  • Termination signals (BUG_FIXED: / BUG_UNFIXED:) give the orchestrator a reliable way to determine the outcome without parsing free-form text
  • Regression testing after the targeted fix is non-optional — agents can fix one thing and break another
  • Diff generation lets you review exactly what the agent changed before merging to production

What's Next in the AI Coding Agents Series

  1. What Are AI Coding Agents?
  2. AI Coding Agents Compared: GitHub Copilot vs Cursor vs Devin vs Claude Code
  3. Build Your First AI Coding Agent with the Claude API
  4. Build an Automated GitHub PR Review Agent
  5. Build an Autonomous Bug Fixer Agent ← you are here
  6. AI Coding Agents in CI/CD: Automate Code Reviews and Fixes in Production

This post is part of the AI Coding Agents Series. Previous post: Build an Automated GitHub PR Review Agent.