
Build an Autonomous Bug Fixer Agent with Claude API

TopicTrick Team

Debugging is expensive. A developer encounters a bug report, finds the failing test, reads the code, forms a hypothesis, makes a change, re-runs the tests, finds it did not work, and tries again. This loop can take 30 minutes for a simple bug and hours for a subtle one.

What Does an Autonomous Bug Fixer Agent Do?

A bug fixer agent accepts a bug report or failing test command, runs the tests to reproduce the failure, reads the relevant source files, identifies the root cause, applies a minimal targeted fix, re-runs the full test suite to verify no regressions, and outputs a diff — all autonomously. Built with Claude's tool use API, it can compress a 30-minute debugging cycle into a handful of autonomous iterations.

A bug fixer agent compresses that loop. Given a failing test (or an error description), the agent reads the relevant code, reasons about the root cause, applies a fix, runs the tests, and keeps iterating until the tests pass. For well-defined, reproducible bugs — the kind that come with a failing test — this process can be fully automated.

In this project you will build an autonomous bug fixer agent that accepts a bug report (described in natural language, or as a failing test command), explores the codebase, fixes the bug, verifies the fix with tests, and produces a clean diff of its changes.

This project extends the agent loop architecture from Build Your First AI Coding Agent. If you have not built that agent yet, read that post first — this project reuses the same ToolExecutor and TOOLS definitions.


Prerequisites

bash
pip install anthropic

Reuse the agent/tools.py and agent/executor.py from the previous project. This post adds the bug fixer's specialised loop and prompt on top of that foundation.


What Makes a Bug Fixer Different from a General Agent

A general coding agent handles open-ended tasks. A bug fixer has a specific, measurable success condition: all targeted tests pass. This tighter loop allows for a more focused architecture:

  1. Reproduce: run the failing test(s) to confirm the failure and capture the error
  2. Explore: read the relevant source files to understand the code involved
  3. Hypothesise: reason about what is causing the failure
  4. Fix: make a targeted change to address the root cause
  5. Verify: run the tests again to confirm the fix works
  6. Check regressions: run the full test suite to confirm nothing else broke
  7. Report: produce a diff and explanation

The agent iterates steps 3–5 until either the tests pass or it exhausts its retry limit.
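The iteration over steps 3–5 can be sketched as a plain control loop. The callables here are hypothetical stand-ins for the model-driven steps; the real implementation in Step 2 drives this loop through Claude's tool use.

```python
# Hypothetical sketch of the reproduce → explore → fix → verify loop.
def fix_bug(run_tests, explore, hypothesise, apply_fix, max_attempts=5):
    error = run_tests()                 # 1. Reproduce
    if error is None:
        return "cannot reproduce"
    context = explore(error)            # 2. Explore
    for _attempt in range(max_attempts):
        cause = hypothesise(error, context)  # 3. Hypothesise
        apply_fix(cause)                     # 4. Fix
        error = run_tests()                  # 5. Verify
        if error is None:
            return "fixed"
    return "gave up"
```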


Step 1: Bug Fixer System Prompt

The system prompt is more constrained than a general agent — it focuses the model on root cause analysis and minimal, targeted fixes.

python
# bug_fixer/prompts.py

BUG_FIXER_SYSTEM = """You are an expert software engineer specialising in debugging and root cause analysis.

Your job is to fix bugs in a codebase. You have access to tools that let you read files, edit files, run commands, search code, and list directories.

## Your Process

1. **Reproduce first**: Always run the failing test(s) before touching anything to confirm the failure and see the exact error
2. **Read before writing**: Read the relevant source files to understand the code before making any changes
3. **Minimal fixes**: Make the smallest possible change that fixes the bug. Do not refactor, improve, or extend code beyond what is necessary
4. **Fix root causes**: Don't mask bugs with try/except. Fix the actual problem
5. **Verify your fix**: After every edit, run the failing tests to check if they pass
6. **Check for regressions**: Once the target tests pass, run the full test suite to ensure no regressions
7. **Stop when done**: Once all targeted tests pass and the full suite passes, stop. Do not continue making changes

## Rules

- Never delete tests to make them pass
- Never change test assertions to make them pass — fix the source code, not the tests
- If you cannot find the root cause after 5 attempts, explain clearly what you tried and why you are stuck
- Be explicit about your reasoning at each step

When you have successfully fixed the bug, end your response with exactly:
BUG_FIXED: <one-line description of what you changed and why>

If you cannot fix the bug, end with:
BUG_UNFIXED: <explanation of what you tried and why you could not resolve it>"""


def build_fix_prompt(bug_report: str, test_command: str | None = None) -> str:
    parts = [f"## Bug Report\n\n{bug_report}"]
    if test_command:
        parts.append(f"## Test Command\n\nRun this command to reproduce the failure:\n```\n{test_command}\n```")
    parts.append(
        "\nStart by running the failing tests to see the exact error, "
        "then explore the relevant code to identify and fix the root cause."
    )
    return "\n\n".join(parts)
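Because the system prompt pins down exact sentinel strings, the outcome can be parsed deterministically without asking an LLM to interpret free-form text. A small helper, parse_outcome (a hypothetical name), mirroring the regexes the agent uses in Step 2:

```python
import re

def parse_outcome(text: str) -> tuple[bool, str]:
    """Extract the BUG_FIXED / BUG_UNFIXED sentinel from the model's final text."""
    m = re.search(r"BUG_FIXED:\s*(.+)", text)
    if m:
        return True, m.group(1).strip()
    m = re.search(r"BUG_UNFIXED:\s*(.+)", text, re.DOTALL)
    if m:
        return False, m.group(1).strip()
    return False, "no sentinel found"
```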

Step 2: The Bug Fixer Agent

python
# bug_fixer/agent.py
import re
from pathlib import Path
from dataclasses import dataclass, field
import anthropic

from agent.tools import TOOLS
from agent.executor import ToolExecutor
from bug_fixer.prompts import BUG_FIXER_SYSTEM, build_fix_prompt


@dataclass
class BugFixResult:
    success: bool
    description: str
    iterations: int
    files_changed: list[str] = field(default_factory=list)
    diff: str = ""
    error: str = ""


class BugFixerAgent:
    def __init__(
        self,
        project_root: str,
        model: str = "claude-sonnet-4-6",
        max_iterations: int = 20,
        verbose: bool = True,
    ):
        self.root = Path(project_root).resolve()
        self.client = anthropic.Anthropic()
        self.executor = ToolExecutor(project_root)
        self.model = model
        self.max_iterations = max_iterations
        self.verbose = verbose

    def _log(self, msg: str) -> None:
        if self.verbose:
            print(msg)

    def _snapshot(self) -> dict[str, str]:
        """Capture current state of tracked source files for diff generation."""
        snapshot = {}
        for ext in [".py", ".js", ".ts", ".go", ".java", ".rb", ".rs"]:
            for path in self.root.rglob(f"*{ext}"):
                parts = path.parts
                if any(p in parts for p in [".git", "node_modules", "__pycache__", ".venv", "venv"]):
                    continue
                try:
                    snapshot[str(path.relative_to(self.root))] = path.read_text(encoding="utf-8")
                except Exception:
                    pass
        return snapshot

    def _compute_diff(self, before: dict[str, str], after: dict[str, str]) -> tuple[list[str], str]:
        """Compute which files changed and produce a unified diff."""
        import difflib  # local import so this method is self-contained

        changed = []
        diff_lines = []

        for path, after_content in sorted(after.items()):
            before_content = before.get(path, "")  # new files diff against empty
            if before_content == after_content:
                continue
            changed.append(path)
            # difflib handles inserted and deleted lines correctly, unlike a
            # naive line-by-line zip comparison
            diff_lines.extend(difflib.unified_diff(
                before_content.splitlines(),
                after_content.splitlines(),
                fromfile=f"a/{path}",
                tofile=f"b/{path}",
                lineterm="",
            ))

        # Files deleted by the agent
        for path in sorted(set(before) - set(after)):
            changed.append(path)
            diff_lines.append(f"DELETED FILE: {path}")

        return changed, "\n".join(diff_lines)

    def fix(self, bug_report: str, test_command: str | None = None) -> BugFixResult:
        """
        Run the bug fixer on a bug report.
        Returns a BugFixResult with success status, diff, and description.
        """
        self._log(f"\n{'='*60}")
        self._log(f"BUG REPORT: {bug_report[:100]}...")
        self._log(f"{'='*60}\n")

        # Snapshot before state
        before_snapshot = self._snapshot()

        messages = [
            {"role": "user", "content": build_fix_prompt(bug_report, test_command)}
        ]

        iteration = 0
        final_text = ""

        while iteration < self.max_iterations:
            iteration += 1
            self._log(f"[Iteration {iteration}/{self.max_iterations}]")

            response = self.client.messages.create(
                model=self.model,
                max_tokens=8192,
                system=BUG_FIXER_SYSTEM,
                tools=TOOLS,
                messages=messages,
            )

            messages.append({"role": "assistant", "content": response.content})

            # Extract any text blocks for logging
            for block in response.content:
                if hasattr(block, "text") and block.text.strip():
                    self._log(f"[Claude] {block.text[:300]}")

            # Check for termination signals in text blocks
            for block in response.content:
                if hasattr(block, "text"):
                    if "BUG_FIXED:" in block.text or "BUG_UNFIXED:" in block.text:
                        final_text = block.text

            if response.stop_reason == "end_turn":
                break

            # Execute tool calls
            if response.stop_reason == "tool_use":
                tool_results = []
                for block in response.content:
                    if block.type != "tool_use":
                        continue
                    self._log(f"  [Tool] {block.name}({str(block.input)[:100]})")
                    result = self.executor.execute(block.name, block.input)
                    if len(result) > 6000:
                        result = result[:6000] + "\n...[truncated]"
                    self._log(f"  [Result] {result[:150]}{'...' if len(result) > 150 else ''}")
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    })
                messages.append({"role": "user", "content": tool_results})

        # Snapshot after state and compute diff
        after_snapshot = self._snapshot()
        files_changed, diff = self._compute_diff(before_snapshot, after_snapshot)

        # Determine success
        success = "BUG_FIXED:" in final_text
        description = ""
        if success:
            match = re.search(r"BUG_FIXED:\s*(.+)", final_text)
            description = match.group(1).strip() if match else "Bug fixed"
        else:
            match = re.search(r"BUG_UNFIXED:\s*(.+)", final_text, re.DOTALL)
            description = match.group(1).strip()[:500] if match else "Agent did not resolve the bug"

        self._log(f"\n{'='*60}")
        self._log(f"RESULT: {'SUCCESS' if success else 'FAILED'}")
        self._log(f"FILES CHANGED: {files_changed}")
        self._log(f"DESCRIPTION: {description}")
        self._log(f"{'='*60}\n")

        return BugFixResult(
            success=success,
            description=description,
            iterations=iteration,
            files_changed=files_changed,
            diff=diff,
        )

Step 3: Create a Test Codebase with Bugs

Let's set up a realistic project with multiple bugs to fix:

python
# setup_test_project.py
from pathlib import Path

root = Path("./buggy_project")
root.mkdir(exist_ok=True)

# Source file with several bugs
(root / "user_service.py").write_text('''
import hashlib
from datetime import datetime


class UserService:
    def __init__(self):
        self.users = {}

    def create_user(self, username: str, email: str, password: str) -> dict:
        """Create a new user account."""
        if username in self.users:
            raise ValueError(f"Username {username} already exists")

        # Bug 1: the password is stored in plain text instead of hashed.
        # (No test catches this one directly — the hashlib import hints at the fix.)
        user = {
            "username": username,
            "email": email,
            "password": password,  # Bug: storing plain text password
            "created_at": datetime.utcnow().isoformat(),
            "active": True,
        }
        self.users[username] = user
        return {"username": username, "email": email, "created_at": user["created_at"]}

    def authenticate(self, username: str, password: str) -> bool:
        """Check if username/password is correct."""
        if username not in self.users:
            return False
        return self.users[username]["password"] == password

    def get_user(self, username: str) -> dict | None:
        """Get a user by username. Returns None if not found."""
        return self.users.get(username)

    def deactivate_user(self, username: str) -> None:
        """Deactivate a user account."""
        if username not in self.users:
            raise KeyError(f"User {username} not found")
        self.users[username]["active"] = Flase  # Bug 2: typo — Flase instead of False

    def get_active_users(self) -> list[str]:
        """Return list of active usernames."""
        # Bug 3: returns all users, not just active ones
        return [u for u in self.users]
''', encoding="utf-8")

# Test file
(root / "test_user_service.py").write_text('''
import pytest
from user_service import UserService


@pytest.fixture
def service():
    return UserService()


def test_create_user(service):
    result = service.create_user("alice", "alice@example.com", "secret123")
    assert result["username"] == "alice"
    assert result["email"] == "alice@example.com"
    assert "password" not in result  # password must not be in return value

def test_duplicate_user_raises(service):
    service.create_user("alice", "alice@example.com", "secret")
    with pytest.raises(ValueError):
        service.create_user("alice", "other@example.com", "other")

def test_authenticate_correct_password(service):
    service.create_user("alice", "alice@example.com", "secret123")
    assert service.authenticate("alice", "secret123") is True

def test_authenticate_wrong_password(service):
    service.create_user("alice", "alice@example.com", "secret123")
    assert service.authenticate("alice", "wrongpassword") is False

def test_authenticate_unknown_user(service):
    assert service.authenticate("nobody", "anything") is False

def test_deactivate_user(service):
    service.create_user("alice", "alice@example.com", "secret")
    service.deactivate_user("alice")
    user = service.get_user("alice")
    assert user["active"] is False   # Tests the typo bug

def test_deactivate_nonexistent_raises(service):
    with pytest.raises(KeyError):
        service.deactivate_user("nobody")

def test_get_active_users(service):
    service.create_user("alice", "alice@example.com", "secret")
    service.create_user("bob", "bob@example.com", "secret")
    service.deactivate_user("bob")
    active = service.get_active_users()
    assert "alice" in active
    assert "bob" not in active   # Tests the filtering bug
''', encoding="utf-8")

print("Test project created in ./buggy_project/")
print("Run 'cd buggy_project && pytest -v' to see the failures")

Run this to create the project, then see the failures:

bash
python setup_test_project.py
cd buggy_project
pytest -v

You should see failures on test_deactivate_user (typo bug) and test_get_active_users (filtering bug).


Step 4: Run the Bug Fixer

python
# run_bug_fixer.py
from bug_fixer.agent import BugFixerAgent

agent = BugFixerAgent(project_root="./buggy_project", verbose=True)

result = agent.fix(
    bug_report=(
        "Two tests are failing in test_user_service.py:\n"
        "1. test_deactivate_user — deactivating a user does not set active to False\n"
        "2. test_get_active_users — get_active_users returns ALL users, not just active ones\n\n"
        "Fix both bugs in user_service.py. Do not modify the tests."
    ),
    test_command="pytest test_user_service.py -v"
)

print(f"\nSuccess: {result.success}")
print(f"Files changed: {result.files_changed}")
print(f"Iterations: {result.iterations}")
print(f"Description: {result.description}")
if result.diff:
    print(f"\nDiff:\n{result.diff}")

Expected agent behaviour:

text
[Iteration 1]
[Tool] run_command({'command': 'pytest test_user_service.py -v'})
[Result] FAILED test_user_service.py::test_deactivate_user
         FAILED test_user_service.py::test_get_active_users
         ...NameError: name 'Flase' is not defined...

[Iteration 2]
[Tool] read_file({'path': 'user_service.py'})
[Result] ...def deactivate_user... self.users[username]["active"] = Flase...
         ...def get_active_users... return [u for u in self.users]...

[Iteration 3]
[Tool] edit_file({'path': 'user_service.py', 'old_string': '= Flase', 'new_string': '= False'})
[Tool] edit_file({'path': 'user_service.py',
    'old_string': 'return [u for u in self.users]',
    'new_string': 'return [u for u, data in self.users.items() if data["active"]]'})

[Iteration 4]
[Tool] run_command({'command': 'pytest test_user_service.py -v'})
[Result] 8 passed in 0.08s

[Iteration 5]
[Tool] run_command({'command': 'pytest -v'})
[Result] 8 passed in 0.08s
EXIT CODE: 0

BUG_FIXED: Fixed NameError typo 'Flase' → 'False' in deactivate_user,
and fixed get_active_users to filter by active=True using dict.items()

============================================================
RESULT: SUCCESS
FILES CHANGED: ['user_service.py']

Step 5: Handling Edge Cases

Bug Cannot Be Reproduced

Sometimes a bug report is vague. The agent handles this gracefully because it runs the tests first:

python
result = agent.fix(
    bug_report="Something is wrong with the user service. Users can't log in.",
    test_command="pytest test_user_service.py::test_authenticate_correct_password -v"
)

If the test passes, Claude will report it cannot reproduce the bug and describe what it checked.
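You can also avoid spending API calls on unreproducible reports by running the test command yourself before invoking the agent. A sketch, assuming the test runner signals failure with a non-zero exit code (true for pytest):

```python
import subprocess

def reproduces(test_command: str, cwd: str = ".") -> bool:
    """Return True if the test command fails, i.e. the bug is reproducible."""
    proc = subprocess.run(test_command, shell=True, cwd=cwd,
                          capture_output=True, text=True, timeout=300)
    return proc.returncode != 0

# Only invoke the agent when there is a real failure to chase:
# if reproduces("pytest test_user_service.py -v", cwd="./buggy_project"):
#     result = agent.fix(...)
```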

Multiple Related Bugs in Different Files

python
result = agent.fix(
    bug_report=(
        "pytest tests/ is reporting 3 failures across auth.py and validation.py. "
        "The error messages are in the test output."
    ),
    test_command="pytest tests/ -v --tb=short"
)

The agent will explore both files, identify related issues, and fix them in a single session.

No Test — Error Description Only

python
result = agent.fix(
    bug_report=(
        "Running the app with `python app.py` produces this traceback:\n\n"
        "AttributeError: 'NoneType' object has no attribute 'split'\n"
        "File 'app.py', line 47, in parse_config\n"
        "  parts = config_value.split(',')\n\n"
        "This happens when the CONFIG_VALUE environment variable is not set."
    ),
    test_command="python app.py"
)

The agent will read app.py, find the relevant code, add a None check, and verify the fix.
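For illustration, the minimal fix the agent should land on here looks roughly like this (parse_config and CONFIG_VALUE come from the hypothetical bug report above; the return type is an assumption):

```python
import os

def parse_config() -> list[str]:
    """Parse CONFIG_VALUE into a list, tolerating an unset variable."""
    config_value = os.environ.get("CONFIG_VALUE")
    # Minimal fix: guard against None instead of letting .split() crash
    if config_value is None:
        return []
    return [part.strip() for part in config_value.split(",")]
```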


Integrating with GitHub Issues

To automatically fix bugs reported as GitHub issues:

python
# github_bug_fixer.py
import os
from github import Github
from bug_fixer.agent import BugFixerAgent

def fix_github_issue(repo_name: str, issue_number: int, project_root: str) -> None:
    gh = Github(os.environ["GITHUB_TOKEN"])
    repo = gh.get_repo(repo_name)
    issue = repo.get_issue(issue_number)

    # Build bug report from issue
    bug_report = f"Title: {issue.title}\n\n{issue.body or '(no description)'}"

    # Run the fixer
    agent = BugFixerAgent(project_root=project_root)
    result = agent.fix(bug_report=bug_report)

    # Post result as issue comment
    if result.success:
        comment = (
            f"🤖 **AI Bug Fixer**: Fixed in {result.iterations} iterations.\n\n"
            f"**Change:** {result.description}\n\n"
            f"**Files modified:** {', '.join(result.files_changed)}\n\n"
            f"**Diff:**\n\n{result.diff}"
        )
    else:
        comment = (
            f"🤖 **AI Bug Fixer**: Could not automatically fix this issue after "
            f"{result.iterations} iterations.\n\n"
            f"**What was tried:** {result.description}\n\n"
            f"This bug requires human investigation."
        )

    issue.create_comment(comment)
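To actually ship the fix, you could commit the agent's changes to a branch and open a pull request — a sketch using plain git commands plus PyGithub's create_pull, assuming a checked-out repo with an `origin` remote and a `main` base branch; the branch naming convention is an assumption:

```python
import subprocess

def open_fix_pr(repo, project_root: str, issue_number: int, description: str) -> None:
    """Commit the agent's changes to a new branch and open a PR (sketch)."""
    branch = f"bugfix/issue-{issue_number}"

    def run(*cmd: str) -> None:
        # Run a git command in the project root, raising on failure
        subprocess.run(cmd, cwd=project_root, check=True)

    run("git", "checkout", "-b", branch)
    run("git", "commit", "-am", f"Fix #{issue_number}: {description}")
    run("git", "push", "-u", "origin", branch)
    repo.create_pull(
        title=f"Fix #{issue_number}",
        body=f"Automated fix by the bug fixer agent.\n\n{description}",
        head=branch,
        base="main",
    )
```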

Key Takeaways

  • A bug fixer agent differs from a general agent in having a measurable success condition — tests passing
  • Always reproduce first: running the failing test before any edits anchors every subsequent decision in evidence
  • Minimal changes are the key constraint — agents that over-fix introduce regressions. The system prompt must enforce this explicitly
  • Termination signals (BUG_FIXED: / BUG_UNFIXED:) give the orchestrator a reliable way to parse the outcome without LLM parsing of free-form text
  • Regression testing after the targeted fix is non-optional — agents can fix one thing and break another
  • Diff generation lets you review exactly what the agent changed before merging to production

What's Next in the AI Coding Agents Series

  1. What Are AI Coding Agents?
  2. AI Coding Agents Compared: GitHub Copilot vs Cursor vs Devin vs Claude Code
  3. Build Your First AI Coding Agent with the Claude API
  4. Build an Automated GitHub PR Review Agent
  5. Build an Autonomous Bug Fixer Agent ← you are here
  6. AI Coding Agents in CI/CD: Automate Code Reviews and Fixes in Production

This post is part of the AI Coding Agents Series. Previous post: Build an Automated GitHub PR Review Agent.

To integrate this bug fixer agent into your CI/CD pipeline, see AI Coding Agents in CI/CD: Automate Reviews and Bug Fixes. For the agentic loop fundamentals, see Claude Agentic Loop Explained and Claude Tool Use Explained.

External Resources

  • pytest documentation — the test framework used throughout this project for reproducing and verifying bug fixes.
  • Python subprocess module — official docs for the subprocess calls used in the tool executor's run_command method.