Build an Autonomous Bug Fixer Agent with Claude

Debugging is expensive. A developer encounters a bug report, finds the failing test, reads the code, forms a hypothesis, makes a change, re-runs the tests, finds it did not work, and tries again. This loop can take 30 minutes for a simple bug and hours for a subtle one.
A bug fixer agent compresses that loop. Given a failing test (or an error description), the agent reads the relevant code, reasons about the root cause, applies a fix, runs the tests, and keeps iterating until the tests pass. For well-defined, reproducible bugs — the kind that come with a failing test — this process can be fully automated.
In this project you will build an autonomous bug fixer agent that accepts a bug report (described in natural language, or as a failing test command), explores the codebase, fixes the bug, verifies the fix with tests, and produces a clean diff of its changes.
This project extends the agent loop architecture from Build Your First AI Coding Agent. If you have not built that agent yet, read that post first — this project reuses the same ToolExecutor and TOOLS definitions.
Prerequisites
```bash
pip install anthropic
```
Reuse `agent/tools.py` and `agent/executor.py` from the previous project. This post adds the bug fixer's specialised loop and prompt on top of that foundation.
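If you do not have that code to hand, the agent below only depends on a small surface area. Here is a minimal sketch of the assumed interface (an illustration, not the implementation from the previous post; the real `TOOLS` list and executor support all five tools):

```python
# agent/executor.py — minimal sketch of the interface the bug fixer relies on.
# TOOLS (from agent/tools.py) is a list of Anthropic tool schemas covering
# read_file, edit_file, run_command, search_code, and list_directory.
from pathlib import Path


class ToolExecutor:
    def __init__(self, project_root: str):
        self.root = Path(project_root).resolve()

    def execute(self, name: str, tool_input: dict) -> str:
        """Dispatch a tool call by name and return its output as a string."""
        if name == "read_file":
            return (self.root / tool_input["path"]).read_text(encoding="utf-8")
        # ... edit_file, run_command, search_code, list_directory elided ...
        return f"Unknown tool: {name}"
```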
What Makes a Bug Fixer Different from a General Agent
A general coding agent handles open-ended tasks. A bug fixer has a specific, measurable success condition: all targeted tests pass. This tighter loop allows for a more focused architecture:
1. Reproduce: run the failing test(s) to confirm the failure and capture the error
2. Explore: read the relevant source files to understand the code involved
3. Hypothesise: reason about what is causing the failure
4. Fix: make a targeted change to address the root cause
5. Verify: run the tests again to confirm the fix works
6. Check regressions: run the full test suite to confirm nothing else broke
7. Report: produce a diff and explanation
The agent iterates steps 3–5 until either the tests pass or it exhausts its retry limit.
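The cycle above can be sketched as plain control flow. This is illustrative pseudologic, not the real agent: `run_tests`, `explore`, `propose_fix`, and `apply_fix` are placeholders for the tool-driven actions Claude performs.

```python
# Schematic of the bug-fixer cycle. run_tests returns None on success
# or a failure description; the other callables stand in for agent actions.

def fix_bug(run_tests, explore, propose_fix, apply_fix, max_attempts: int = 5) -> bool:
    failure = run_tests()                           # 1. Reproduce
    if failure is None:
        return True                                 # nothing to fix
    context = explore(failure)                      # 2. Explore
    for _ in range(max_attempts):
        hypothesis = propose_fix(failure, context)  # 3. Hypothesise
        apply_fix(hypothesis)                       # 4. Fix
        failure = run_tests()                       # 5. Verify
        if failure is None:
            return True                             # 6/7: regressions and report follow
    return False                                    # retry limit exhausted
```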
Step 1: Bug Fixer System Prompt
The system prompt is more constrained than a general agent's — it focuses the model on root cause analysis and minimal, targeted fixes.
````python
# bug_fixer/prompts.py

BUG_FIXER_SYSTEM = """You are an expert software engineer specialising in debugging and root cause analysis.

Your job is to fix bugs in a codebase. You have access to tools that let you read files, edit files, run commands, search code, and list directories.

## Your Process

1. **Reproduce first**: Always run the failing test(s) before touching anything to confirm the failure and see the exact error
2. **Read before writing**: Read the relevant source files to understand the code before making any changes
3. **Minimal fixes**: Make the smallest possible change that fixes the bug. Do not refactor, improve, or extend code beyond what is necessary
4. **Fix root causes**: Don't mask bugs with try/except. Fix the actual problem
5. **Verify your fix**: After every edit, run the failing tests to check if they pass
6. **Check for regressions**: Once the target tests pass, run the full test suite to ensure no regressions
7. **Stop when done**: Once all targeted tests pass and the full suite passes, stop. Do not continue making changes

## Rules

- Never delete tests to make them pass
- Never change test assertions to make them pass — fix the source code, not the tests
- If you cannot find the root cause after 5 attempts, explain clearly what you tried and why you are stuck
- Be explicit about your reasoning at each step

When you have successfully fixed the bug, end your response with exactly:
BUG_FIXED: <one-line description of what you changed and why>

If you cannot fix the bug, end with:
BUG_UNFIXED: <explanation of what you tried and why you could not resolve it>"""


def build_fix_prompt(bug_report: str, test_command: str | None = None) -> str:
    parts = [f"## Bug Report\n\n{bug_report}"]
    if test_command:
        parts.append(f"## Test Command\n\nRun this command to reproduce the failure:\n```\n{test_command}\n```")
    parts.append(
        "\nStart by running the failing tests to see the exact error, "
        "then explore the relevant code to identify and fix the root cause."
    )
    return "\n\n".join(parts)
````

Step 2: The Bug Fixer Agent
```python
# bug_fixer/agent.py
import difflib
import re
from pathlib import Path
from dataclasses import dataclass, field
import anthropic

from agent.tools import TOOLS
from agent.executor import ToolExecutor
from bug_fixer.prompts import BUG_FIXER_SYSTEM, build_fix_prompt


@dataclass
class BugFixResult:
    success: bool
    description: str
    iterations: int
    files_changed: list[str] = field(default_factory=list)
    diff: str = ""
    error: str = ""


class BugFixerAgent:
    def __init__(
        self,
        project_root: str,
        model: str = "claude-sonnet-4-6",
        max_iterations: int = 20,
        verbose: bool = True,
    ):
        self.root = Path(project_root).resolve()
        self.client = anthropic.Anthropic()
        self.executor = ToolExecutor(project_root)
        self.model = model
        self.max_iterations = max_iterations
        self.verbose = verbose

    def _log(self, msg: str) -> None:
        if self.verbose:
            print(msg)

    def _snapshot(self) -> dict[str, str]:
        """Capture current state of tracked source files for diff generation."""
        snapshot = {}
        for ext in [".py", ".js", ".ts", ".go", ".java", ".rb", ".rs"]:
            for path in self.root.rglob(f"*{ext}"):
                parts = path.parts
                if any(p in parts for p in [".git", "node_modules", "__pycache__", ".venv", "venv"]):
                    continue
                try:
                    snapshot[str(path.relative_to(self.root))] = path.read_text(encoding="utf-8")
                except Exception:
                    pass
        return snapshot

    def _compute_diff(self, before: dict[str, str], after: dict[str, str]) -> tuple[list[str], str]:
        """Compute which files changed and produce a unified diff."""
        changed = []
        diff_lines = []

        # Modified and newly created files (a new file diffs against empty content)
        for path in sorted(after):
            before_content = before.get(path, "")
            if before_content != after[path]:
                changed.append(path)
                diff_lines.extend(difflib.unified_diff(
                    before_content.splitlines(),
                    after[path].splitlines(),
                    fromfile=f"a/{path}",
                    tofile=f"b/{path}",
                    lineterm="",
                ))

        # Files deleted by the agent
        for path in sorted(set(before) - set(after)):
            changed.append(path)
            diff_lines.append(f"DELETED FILE: {path}")

        return changed, "\n".join(diff_lines)

    def fix(self, bug_report: str, test_command: str | None = None) -> BugFixResult:
        """
        Run the bug fixer on a bug report.
        Returns a BugFixResult with success status, diff, and description.
        """
        self._log(f"\n{'='*60}")
        self._log(f"BUG REPORT: {bug_report[:100]}...")
        self._log(f"{'='*60}\n")

        # Snapshot before state
        before_snapshot = self._snapshot()

        messages = [
            {"role": "user", "content": build_fix_prompt(bug_report, test_command)}
        ]

        iteration = 0
        final_text = ""

        while iteration < self.max_iterations:
            iteration += 1
            self._log(f"[Iteration {iteration}/{self.max_iterations}]")

            response = self.client.messages.create(
                model=self.model,
                max_tokens=8192,
                system=BUG_FIXER_SYSTEM,
                tools=TOOLS,
                messages=messages,
            )

            messages.append({"role": "assistant", "content": response.content})

            # Log text blocks and watch for the termination signal
            for block in response.content:
                if hasattr(block, "text") and block.text.strip():
                    self._log(f"[Claude] {block.text[:300]}")
                    if "BUG_FIXED:" in block.text or "BUG_UNFIXED:" in block.text:
                        final_text = block.text

            # Anything other than a tool request (end_turn, max_tokens, ...) ends the loop
            if response.stop_reason != "tool_use":
                break

            # Execute tool calls
            tool_results = []
            for block in response.content:
                if block.type != "tool_use":
                    continue
                self._log(f"  [Tool] {block.name}({str(block.input)[:100]})")
                result = self.executor.execute(block.name, block.input)
                if len(result) > 6000:
                    result = result[:6000] + "\n...[truncated]"
                self._log(f"  [Result] {result[:150]}{'...' if len(result) > 150 else ''}")
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })
            messages.append({"role": "user", "content": tool_results})

        # Snapshot after state and compute diff
        after_snapshot = self._snapshot()
        files_changed, diff = self._compute_diff(before_snapshot, after_snapshot)

        # Determine success
        success = "BUG_FIXED:" in final_text
        if success:
            match = re.search(r"BUG_FIXED:\s*(.+)", final_text)
            description = match.group(1).strip() if match else "Bug fixed"
        else:
            match = re.search(r"BUG_UNFIXED:\s*(.+)", final_text, re.DOTALL)
            description = match.group(1).strip()[:500] if match else "Agent did not resolve the bug"

        self._log(f"\n{'='*60}")
        self._log(f"RESULT: {'SUCCESS' if success else 'FAILED'}")
        self._log(f"FILES CHANGED: {files_changed}")
        self._log(f"DESCRIPTION: {description}")
        self._log(f"{'='*60}\n")

        return BugFixResult(
            success=success,
            description=description,
            iterations=iteration,
            files_changed=files_changed,
            diff=diff,
        )
```

Step 3: Create a Test Codebase with Bugs
Let's set up a realistic project with multiple bugs to fix:
```python
# setup_test_project.py
from pathlib import Path

root = Path("./buggy_project")
root.mkdir(exist_ok=True)

# Source file with several bugs
(root / "user_service.py").write_text('''
import hashlib
from datetime import datetime


class UserService:
    def __init__(self):
        self.users = {}

    def create_user(self, username: str, email: str, password: str) -> dict:
        """Create a new user account."""
        if username in self.users:
            raise ValueError(f"Username {username} already exists")

        # Bug 1: the password should be hashed (hashlib is imported for that),
        # but it is stored in plain text instead
        user = {
            "username": username,
            "email": email,
            "password": password,  # Bug: storing plain text password
            "created_at": datetime.utcnow().isoformat(),
            "active": True,
        }
        self.users[username] = user
        return {"username": username, "email": email, "created_at": user["created_at"]}

    def authenticate(self, username: str, password: str) -> bool:
        """Check if username/password is correct."""
        if username not in self.users:
            return False
        return self.users[username]["password"] == password

    def get_user(self, username: str) -> dict | None:
        """Get a user by username. Returns None if not found."""
        return self.users.get(username)

    def deactivate_user(self, username: str) -> None:
        """Deactivate a user account."""
        if username not in self.users:
            raise KeyError(f"User {username} not found")
        self.users[username]["active"] = Flase  # Bug 2: typo — Flase instead of False

    def get_active_users(self) -> list[str]:
        """Return list of active usernames."""
        # Bug 3: returns all users, not just active ones
        return [u for u in self.users]
''', encoding="utf-8")

# Test file
(root / "test_user_service.py").write_text('''
import pytest
from user_service import UserService


@pytest.fixture
def service():
    return UserService()


def test_create_user(service):
    result = service.create_user("alice", "alice@example.com", "secret123")
    assert result["username"] == "alice"
    assert result["email"] == "alice@example.com"
    assert "password" not in result  # password must not be in return value


def test_duplicate_user_raises(service):
    service.create_user("alice", "alice@example.com", "secret")
    with pytest.raises(ValueError):
        service.create_user("alice", "other@example.com", "other")


def test_authenticate_correct_password(service):
    service.create_user("alice", "alice@example.com", "secret123")
    assert service.authenticate("alice", "secret123") is True


def test_authenticate_wrong_password(service):
    service.create_user("alice", "alice@example.com", "secret123")
    assert service.authenticate("alice", "wrongpassword") is False


def test_authenticate_unknown_user(service):
    assert service.authenticate("nobody", "anything") is False


def test_deactivate_user(service):
    service.create_user("alice", "alice@example.com", "secret")
    service.deactivate_user("alice")
    user = service.get_user("alice")
    assert user["active"] is False  # Tests the typo bug


def test_deactivate_nonexistent_raises(service):
    with pytest.raises(KeyError):
        service.deactivate_user("nobody")


def test_get_active_users(service):
    service.create_user("alice", "alice@example.com", "secret")
    service.create_user("bob", "bob@example.com", "secret")
    service.deactivate_user("bob")
    active = service.get_active_users()
    assert "alice" in active
    assert "bob" not in active  # Tests the filtering bug
''', encoding="utf-8")

print("Test project created in ./buggy_project/")
print("Run: cd buggy_project && pytest -v to see failures")
```

Run this to create the project, then see the failures:
```bash
python setup_test_project.py
cd buggy_project
pytest -v
```
You should see failures on `test_deactivate_user` (typo bug) and `test_get_active_users` (filtering bug).
Step 4: Run the Bug Fixer
```python
# run_bug_fixer.py
from bug_fixer.agent import BugFixerAgent

agent = BugFixerAgent(project_root="./buggy_project", verbose=True)

result = agent.fix(
    bug_report=(
        "Two tests are failing in test_user_service.py:\n"
        "1. test_deactivate_user — deactivating a user does not set active to False\n"
        "2. test_get_active_users — get_active_users returns ALL users, not just active ones\n\n"
        "Fix both bugs in user_service.py. Do not modify the tests."
    ),
    test_command="pytest test_user_service.py -v",
)

print(f"\nSuccess: {result.success}")
print(f"Files changed: {result.files_changed}")
print(f"Iterations: {result.iterations}")
print(f"Description: {result.description}")
if result.diff:
    print(f"\nDiff:\n{result.diff}")
```

Expected agent behaviour:
```
[Iteration 1]
  [Tool] run_command({'command': 'pytest test_user_service.py -v'})
  [Result] FAILED test_user_service.py::test_deactivate_user
           FAILED test_user_service.py::test_get_active_users
           ...NameError: name 'Flase' is not defined...
[Iteration 2]
  [Tool] read_file({'path': 'user_service.py'})
  [Result] ...def deactivate_user... self.users[username]["active"] = Flase...
           ...def get_active_users... return [u for u in self.users]...
[Iteration 3]
  [Tool] edit_file({'path': 'user_service.py', 'old_string': '= Flase', 'new_string': '= False'})
  [Tool] edit_file({'path': 'user_service.py',
                    'old_string': 'return [u for u in self.users]',
                    'new_string': 'return [u for u, data in self.users.items() if data["active"]]'})
[Iteration 4]
  [Tool] run_command({'command': 'pytest test_user_service.py -v'})
  [Result] 8 passed in 0.08s
[Iteration 5]
  [Tool] run_command({'command': 'pytest -v'})
  [Result] 8 passed in 0.08s
           EXIT CODE: 0

BUG_FIXED: Fixed NameError typo 'Flase' → 'False' in deactivate_user,
and fixed get_active_users to filter by active=True using dict.items()

============================================================
RESULT: SUCCESS
FILES CHANGED: ['user_service.py']
```
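Before trusting a run, it is worth persisting the diff so it can be reviewed like any other change. A small convenience sketch (assuming `result` is the `BugFixResult` returned by `agent.fix` above):

```python
from pathlib import Path

def save_patch(diff: str, path: str = "bugfix.patch") -> Path:
    """Write the agent's diff to disk so it can be reviewed before committing."""
    out = Path(path)
    out.write_text(diff + "\n", encoding="utf-8")
    return out
```

Usage: `save_patch(result.diff)` after a successful run.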
Step 5: Handling Edge Cases
Bug Cannot Be Reproduced
Sometimes a bug report is vague. The agent handles this gracefully because it runs the tests first:
```python
result = agent.fix(
    bug_report="Something is wrong with the user service. Users can't log in.",
    test_command="pytest test_user_service.py::test_authenticate_correct_password -v",
)
```
If the test passes, Claude will report that it cannot reproduce the bug and describe what it checked.
Multiple Related Bugs in Different Files
```python
result = agent.fix(
    bug_report=(
        "pytest tests/ is reporting 3 failures across auth.py and validation.py. "
        "The error messages are in the test output."
    ),
    test_command="pytest tests/ -v --tb=short",
)
```
The agent will explore both files, identify related issues, and fix them in a single session.
No Test — Error Description Only
```python
result = agent.fix(
    bug_report=(
        "Running the app with `python app.py` produces this traceback:\n\n"
        "AttributeError: 'NoneType' object has no attribute 'split'\n"
        "File 'app.py', line 47, in parse_config\n"
        "    parts = config_value.split(',')\n\n"
        "This happens when the CONFIG_VALUE environment variable is not set."
    ),
    test_command="python app.py",
)
```
The agent will read app.py, find the relevant code, add a None check, and verify the fix.
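For illustration, the kind of fix the agent would typically land on for that traceback looks like this (hypothetical, since app.py itself is not shown in this post, and the empty-list default is an assumption):

```python
import os

def parse_config() -> list[str]:
    # Guard against CONFIG_VALUE being unset, the cause of the AttributeError.
    config_value = os.environ.get("CONFIG_VALUE")
    if config_value is None:
        return []  # fall back to an empty config rather than crashing
    return [part.strip() for part in config_value.split(",")]
```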
Integrating with GitHub Issues
To automatically fix bugs reported as GitHub issues:
```python
# github_bug_fixer.py
import os
from github import Github  # PyGithub: pip install PyGithub
from bug_fixer.agent import BugFixerAgent


def fix_github_issue(repo_name: str, issue_number: int, project_root: str) -> None:
    gh = Github(os.environ["GITHUB_TOKEN"])
    repo = gh.get_repo(repo_name)
    issue = repo.get_issue(issue_number)

    # Build bug report from issue
    bug_report = f"Title: {issue.title}\n\n{issue.body or '(no description)'}"

    # Run the fixer
    agent = BugFixerAgent(project_root=project_root)
    result = agent.fix(bug_report=bug_report)

    # Post result as issue comment
    if result.success:
        comment = (
            f"🤖 **AI Bug Fixer**: Fixed in {result.iterations} iterations.\n\n"
            f"**Change:** {result.description}\n\n"
            f"**Files modified:** {', '.join(result.files_changed)}\n\n"
            "Review the local changes and open a PR with the fix."
        )
    else:
        comment = (
            f"🤖 **AI Bug Fixer**: Could not automatically fix this issue after "
            f"{result.iterations} iterations.\n\n"
            f"**What was tried:** {result.description}\n\n"
            "This bug requires human investigation."
        )

    issue.create_comment(comment)
```

Key Takeaways
- A bug fixer agent differs from a general agent in having a measurable success condition — tests passing
- Always reproduce first: running the failing test before any edits anchors every subsequent decision in evidence
- Minimal changes are the key constraint — agents that over-fix introduce regressions. The system prompt must enforce this explicitly
- Termination signals (`BUG_FIXED:`/`BUG_UNFIXED:`) give the orchestrator a reliable way to parse the outcome without needing an LLM to interpret free-form text
- Regression testing after the targeted fix is non-optional — agents can fix one thing and break another
- Diff generation lets you review exactly what the agent changed before merging to production
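To make the termination-signal takeaway concrete: the orchestrator can extract the outcome with a single regex that mirrors the signals defined in the system prompt (a standalone sketch of the parsing already embedded in `BugFixerAgent.fix`):

```python
import re

def parse_outcome(text: str) -> tuple[bool, str]:
    """Extract the BUG_FIXED/BUG_UNFIXED signal from the agent's final message."""
    match = re.search(r"BUG_(FIXED|UNFIXED):\s*(.+)", text, re.DOTALL)
    if not match:
        return False, "no termination signal found"
    return match.group(1) == "FIXED", match.group(2).strip()
```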
What's Next in the AI Coding Agents Series
- What Are AI Coding Agents?
- AI Coding Agents Compared: GitHub Copilot vs Cursor vs Devin vs Claude Code
- Build Your First AI Coding Agent with the Claude API
- Build an Automated GitHub PR Review Agent
- Build an Autonomous Bug Fixer Agent ← you are here
- AI Coding Agents in CI/CD: Automate Code Reviews and Fixes in Production
This post is part of the AI Coding Agents Series. Previous post: Build an Automated GitHub PR Review Agent.
