
AI Coding Agents in CI/CD: Automate Code Reviews and Bug Fixes in Production

TopicTrick

Running an AI coding agent locally is impressive. Running it reliably in production — triggered automatically by CI/CD events, bounded by cost controls, gated by human approval where appropriate, and observable when something goes wrong — is a different engineering challenge.

This post is about that second step. You have built the agents (PR reviewer, bug fixer, general coding agent). Now you will integrate them into a production CI/CD pipeline with the safety and observability patterns that make autonomous agents trustworthy.

This post assumes you have built the agents from the previous posts in this series: PR Review Agent and Bug Fixer Agent. We are orchestrating those agents here, not rebuilding them.


The Target Architecture

The complete CI/CD integration has four automated workflows:

```
┌────────────────────────────────────────┐
│ Developer pushes code                  │
└───────────────────┬────────────────────┘
                    │
┌───────────────────▼────────────────────┐
│ GitHub Actions: PR opened/updated      │
└───────────────────┬────────────────────┘
                    │
┌───────────────────▼────────────────────┐
│ 1. AI PR Review Agent                  │
│    Posts review, requests changes      │
└───────────────────┬────────────────────┘
                    │ (if: review passes)
┌───────────────────▼────────────────────┐
│ 2. CI Test Suite runs                  │
│    (your existing tests)               │
└───────────────────┬────────────────────┘
                    │
┌───────────────────▼────────────────────┐
│ 3. Auto-fix Agent (if tests fail)      │
│    Attempts fix, opens new commit      │
└───────────────────┬────────────────────┘
                    │ (human approval gate)
┌───────────────────▼────────────────────┐
│ 4. Nightly maintenance agent           │
│    Runs on schedule: fix flaky tests,  │
│    update deps, clean tech debt        │
└────────────────────────────────────────┘
```

Each workflow is a separate GitHub Actions job. They are connected by outputs, conditions, and manual approval gates where warranted.
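As a sketch of that wiring, a downstream job in the same workflow file could gate on the review job's outputs via `needs`. The `deploy-preview` job name here is hypothetical; `ai-review` and its outputs match Workflow 1 below:

```yaml
# Sketch only: a hypothetical downstream job gated on the review job's outputs.
jobs:
  ai-review:
    # ... as defined in Workflow 1 ...
  deploy-preview:
    needs: ai-review
    if: needs.ai-review.outputs.verdict != 'REQUEST_CHANGES'
    runs-on: ubuntu-latest
    steps:
      - run: echo "Proceeding (${{ needs.ai-review.outputs.issues_count }} major issues flagged)"
```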


Workflow 1: Automated PR Review

```yaml
# .github/workflows/ai-pr-review.yml
name: AI PR Review

on:
  pull_request:
    types: [opened, synchronize, ready_for_review]

concurrency:
  group: ai-review-${{ github.event.pull_request.number }}
  cancel-in-progress: true  # Cancel previous review if PR is updated

jobs:
  ai-review:
    runs-on: ubuntu-latest
    # Security: only run on PRs from the same repo (not forks)
    if: |
      github.event.pull_request.draft == false &&
      github.event.pull_request.head.repo.full_name == github.repository
    timeout-minutes: 10

    outputs:
      verdict: ${{ steps.review.outputs.verdict }}
      issues_count: ${{ steps.review.outputs.issues_count }}

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: pip install anthropic PyGithub

      - name: Run AI review
        id: review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          REPO_NAME: ${{ github.repository }}
        run: |
          python - <<'EOF'
          import os
          from pr_reviewer.agent import PRReviewAgent

          agent = PRReviewAgent(
              github_token=os.environ["GITHUB_TOKEN"],
              repo_name=os.environ["REPO_NAME"],
          )
          review = agent.review_pr(
              int(os.environ["PR_NUMBER"]),
              post_to_github=True,
          )

          # Export outputs for downstream jobs
          verdict = review.get("verdict", "COMMENT")
          issues = [i for i in review.get("issues", []) if i["severity"] in ("critical", "major")]
          print(f"verdict={verdict}")
          print(f"issues_count={len(issues)}")

          with open(os.environ["GITHUB_OUTPUT"], "a") as f:
              f.write(f"verdict={verdict}\n")
              f.write(f"issues_count={len(issues)}\n")
          EOF

      - name: Block merge on critical issues
        if: steps.review.outputs.verdict == 'REQUEST_CHANGES'
        run: |
          echo "AI review requested changes. Merge blocked until issues are resolved."
          exit 1
```

Workflow 2: Auto-Fix on Test Failure

This workflow triggers when tests fail on a PR and attempts an automatic fix:

```yaml
# .github/workflows/ai-auto-fix.yml
name: AI Auto-Fix

on:
  workflow_run:
    workflows: ["CI Tests"]  # Name of your test workflow
    types: [completed]
  # Also allow manual trigger
  workflow_dispatch:
    inputs:
      pr_number:
        description: "PR number to attempt auto-fix on"
        required: true

jobs:
  attempt-fix:
    runs-on: ubuntu-latest
    # Only run when tests fail on a PR branch (not main)
    if: |
      github.event.workflow_run.conclusion == 'failure' &&
      github.event.workflow_run.head_branch != 'main' &&
      github.event.workflow_run.head_branch != 'master'
    timeout-minutes: 20

    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.workflow_run.head_sha }}
          token: ${{ secrets.GITHUB_TOKEN }}

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: pip install anthropic pytest PyGithub

      - name: Configure git
        run: |
          git config user.name "AI Bug Fixer Bot"
          git config user.email "ai-bot@yourcompany.com"

      - name: Run auto-fix agent
        id: autofix
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          python - <<'EOF'
          import json
          from bug_fixer.agent import BugFixerAgent

          # Get the test failure output
          # In production, parse this from the CI test output artifact
          bug_report = (
              "The CI test suite failed on this PR. "
              "Run 'pytest -v --tb=short' to see the failing tests. "
              "Fix the failing tests without modifying the test files. "
              "Run the full test suite to confirm no regressions."
          )

          agent = BugFixerAgent(
              project_root=".",
              max_iterations=15,
          )
          result = agent.fix(
              bug_report=bug_report,
              test_command="pytest -v --tb=short",
          )

          # Write result to file for next step
          with open("fix_result.json", "w") as f:
              json.dump({
                  "success": result.success,
                  "description": result.description,
                  "files_changed": result.files_changed,
                  "iterations": result.iterations,
              }, f)

          if result.success:
              print("Auto-fix succeeded")
              raise SystemExit(0)
          else:
              print(f"Auto-fix failed: {result.description}")
              raise SystemExit(1)
          EOF

      - name: Commit and push fix
        if: steps.autofix.outcome == 'success'
        run: |
          # Only commit if there are changes
          if git diff --quiet; then
            echo "No file changes to commit"
            exit 1
          fi

          RESULT=$(cat fix_result.json | python3 -c "import sys,json; print(json.load(sys.stdin)['description'])")
          git add -A
          git commit -m "fix: AI auto-fix — $RESULT [skip ci]"
          git push

      - name: Post fix comment to PR
        if: always()
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          REPO_NAME: ${{ github.repository }}
        run: |
          python - <<'EOF'
          import json
          import os
          from github import Github

          with open("fix_result.json") as f:
              result = json.load(f)

          gh = Github(os.environ["GITHUB_TOKEN"])
          repo = gh.get_repo(os.environ["REPO_NAME"])

          # Find the PR for this branch
          head_sha = "${{ github.event.workflow_run.head_sha }}"
          prs = repo.get_pulls(state="open")
          pr = next((p for p in prs if p.head.sha == head_sha), None)

          if pr:
              if result["success"]:
                  body = (
                      f"🤖 **Auto-Fix Applied**\n\n"
                      f"The CI failure was automatically fixed in {result['iterations']} iterations.\n\n"
                      f"**Change:** {result['description']}\n\n"
                      f"**Files modified:** {', '.join(result['files_changed'])}\n\n"
                      f"Please review the commit and re-run CI to confirm."
                  )
              else:
                  body = (
                      f"🤖 **Auto-Fix Failed**\n\n"
                      f"The AI bug fixer could not automatically resolve the CI failure "
                      f"after {result['iterations']} iterations.\n\n"
                      f"**What was tried:** {result['description'][:500]}\n\n"
                      f"Manual investigation is required."
                  )
              pr.create_issue_comment(body)
          EOF
```

Auto-Push Safety

Automatically pushing AI-generated commits to PR branches is powerful but carries risk. Consider requiring a human approval step (GitHub Environments with required reviewers) before pushing to the branch. Use '[skip ci]' in auto-fix commit messages to prevent infinite CI loops. Never auto-push directly to main or master.
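A second line of defense against loops, besides [skip ci], is for the auto-fix job to bail out when the failing commit was itself produced by the bot. A minimal sketch; the marker strings are assumptions matching the commit message format used in the workflow above:

```python
# Sketch: skip the auto-fix run when the last commit looks AI-generated.
# SKIP_MARKERS is an assumption matching the bot's commit message format.
import subprocess

SKIP_MARKERS = ("[skip ci]", "fix: AI auto-fix")


def is_bot_commit(message: str) -> bool:
    """Return True if the commit message looks like an AI-generated commit."""
    return any(marker in message for marker in SKIP_MARKERS)


def head_commit_message(repo_dir: str = ".") -> str:
    """Read the full commit message of HEAD in the given checkout."""
    return subprocess.run(
        ["git", "-C", repo_dir, "log", "-1", "--pretty=%B"],
        capture_output=True, text=True, check=True,
    ).stdout
```

The auto-fix job would call `is_bot_commit(head_commit_message())` as its first step and exit cleanly when it returns True.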


Workflow 3: Human-in-the-Loop Gate

For higher-stakes operations, require explicit human approval before the agent acts:

```yaml
# .github/workflows/ai-dependency-update.yml
name: AI Dependency Update

on:
  schedule:
    - cron: "0 9 * * 1"  # Every Monday at 9am UTC
  workflow_dispatch:

jobs:
  generate-update-plan:
    runs-on: ubuntu-latest
    environment: ai-actions  # <- GitHub Environment with required reviewers
    # This job pauses until a reviewer approves it in the GitHub Actions UI
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Run dependency update agent
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: python run_dependency_agent.py
```

To set up the environment gate:

1. Go to repository Settings → Environments → New environment
2. Name it ai-actions
3. Add Required reviewers — your senior engineers
4. Any job using environment: ai-actions will pause and send a notification requesting approval

Workflow 4: Nightly Maintenance Agent

A scheduled agent that handles low-risk maintenance tasks overnight:

```yaml
# .github/workflows/ai-nightly-maintenance.yml
name: AI Nightly Maintenance

on:
  schedule:
    - cron: "0 2 * * *"  # 2am UTC every night

jobs:
  maintenance:
    runs-on: ubuntu-latest
    timeout-minutes: 30

    strategy:
      matrix:
        task:
          - name: "fix-flaky-tests"
            prompt: "Find and fix any flaky tests (tests that sometimes pass and sometimes fail without code changes). Run the test suite 3 times and identify tests with inconsistent results."
          - name: "update-type-hints"
            prompt: "Add missing Python type hints to functions that lack them in the src/ directory. Focus on public API functions. Run mypy to verify no type errors are introduced."
          - name: "remove-dead-code"
            prompt: "Identify and remove unused imports and obviously dead code (unreachable code, unused variables). Run tests to verify nothing breaks."

    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install anthropic pytest mypy
      - name: Run maintenance task
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          TASK_NAME: ${{ matrix.task.name }}
          TASK_PROMPT: ${{ matrix.task.prompt }}
        run: python run_maintenance_agent.py
      - name: Open PR if changes were made
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          TASK_NAME: ${{ matrix.task.name }}
        run: |
          if ! git diff --quiet; then
            git config user.name "AI Maintenance Bot"
            git config user.email "ai-bot@yourcompany.com"
            git checkout -b "ai-maintenance/$TASK_NAME-$(date +%Y%m%d)"
            git add -A
            git commit -m "chore(ai): $TASK_NAME maintenance"
            git push origin HEAD
            gh pr create \
              --title "AI Maintenance: $TASK_NAME" \
              --body "Automated maintenance task. Please review before merging." \
              --label "ai-generated"
          fi
```

Cost Controls

Without controls, autonomous agents can run up significant API costs. Three layers of protection:

1. Per-Run Token Budget

```python
# Wrap the Anthropic client with token tracking
import anthropic


class BudgetedClient:
    def __init__(self, max_tokens_per_run: int = 100_000):
        self.client = anthropic.Anthropic()
        self.tokens_used = 0
        self.max_tokens = max_tokens_per_run

    def messages_create(self, **kwargs) -> anthropic.types.Message:
        if self.tokens_used >= self.max_tokens:
            raise RuntimeError(
                f"Token budget exceeded: used {self.tokens_used}/{self.max_tokens}"
            )
        response = self.client.messages.create(**kwargs)
        self.tokens_used += response.usage.input_tokens + response.usage.output_tokens
        return response

    @property
    def estimated_cost_usd(self) -> float:
        # claude-sonnet-4-6 pricing: $3/M input, $15/M output (approximate)
        return self.tokens_used * 0.000009  # conservative average
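The blended $0.000009-per-token average above is convenient but coarse. Since the API's usage object reports input and output tokens separately, an exact split is just as easy. A small sketch; the per-million rates are assumptions and should be checked against current pricing:

```python
# Cost estimate from separate input/output token counts.
# Rates are assumptions ($3/M input, $15/M output); verify against current pricing.
INPUT_USD_PER_TOKEN = 3 / 1_000_000
OUTPUT_USD_PER_TOKEN = 15 / 1_000_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate API cost in USD using separate input/output rates."""
    return input_tokens * INPUT_USD_PER_TOKEN + output_tokens * OUTPUT_USD_PER_TOKEN
```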

2. Iteration Limits

Always set max_iterations conservatively. Most tasks complete in 5–10 iterations. An agent still running at iteration 25 has either hit an edge case or is looping — either way, stopping it is the right call.

```python
agent = BugFixerAgent(max_iterations=15)  # Stop after 15 iterations
```

3. GitHub Actions Monthly Spending Limit

In GitHub Settings → Billing, set a spending limit for Actions. This caps compute costs regardless of how many workflows run.


Observability: Tracking Agent Activity

In production, you need to know what your agents did, how long they ran, and how much they cost.

```python
# observability/agent_logger.py
import json
import os
from collections import Counter
from datetime import datetime
from pathlib import Path


class AgentLogger:
    """
    Logs agent runs to a JSONL file for auditing and cost tracking.
    Each line is one agent run.
    """

    def __init__(self, log_path: str = "./agent_runs.jsonl"):
        self.log_path = Path(log_path)

    def log_run(
        self,
        agent_type: str,
        task: str,
        success: bool,
        iterations: int,
        tokens_used: int,
        files_changed: list[str],
        error: str = "",
    ) -> None:
        entry = {
            "timestamp": datetime.utcnow().isoformat(),
            "agent_type": agent_type,
            "task": task[:200],
            "success": success,
            "iterations": iterations,
            "tokens_used": tokens_used,
            "estimated_cost_usd": round(tokens_used * 0.000009, 4),
            "files_changed": files_changed,
            "error": error[:500] if error else "",
            "github_run_id": os.environ.get("GITHUB_RUN_ID", "local"),
            "github_actor": os.environ.get("GITHUB_ACTOR", ""),
        }

        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

    def summary(self, last_n_runs: int = 50) -> dict:
        """Return aggregated metrics for the last N runs."""
        if not self.log_path.exists():
            return {}

        lines = self.log_path.read_text().strip().splitlines()
        runs = [json.loads(line) for line in lines[-last_n_runs:]]

        return {
            "total_runs": len(runs),
            "success_rate": sum(r["success"] for r in runs) / len(runs),
            "avg_iterations": sum(r["iterations"] for r in runs) / len(runs),
            "total_tokens": sum(r["tokens_used"] for r in runs),
            "total_cost_usd": sum(r["estimated_cost_usd"] for r in runs),
            "most_changed_files": _top_files(runs),
        }


def _top_files(runs: list[dict]) -> list[str]:
    counter = Counter()
    for run in runs:
        for f in run.get("files_changed", []):
            counter[f] += 1
    return [f for f, _ in counter.most_common(10)]
```

Production Safety Checklist

Before enabling autonomous AI agents in your CI/CD pipeline, verify each item:

• Fork PR protection: workflows check that the PR is from the same repo, not a fork
• Branch protection: agents cannot push directly to main or master
• Iteration limits: every agent has a max_iterations ceiling
• Token budget: per-run token limit prevents runaway costs
• Path traversal protection: file tool restricts all operations to the project directory
• Command blocklist: shell tool blocks destructive commands (rm -rf, sudo)
• Skip-CI commits: auto-fix commits include [skip ci] to prevent CI loops
• Human approval gates: high-stakes actions route through GitHub Environments with required reviewers
• Agent run logging: every run is logged with token count, outcome, and files changed
• Secrets are secrets: .env files are in .gitignore; secrets are in GitHub Secrets, not hardcoded
• Timeout on all jobs: every GitHub Actions job has a timeout-minutes to cap runaway compute

The Developer Experience

When this pipeline is running in production, the developer experience looks like this:

1. Developer opens a PR
2. Within 2–3 minutes: AI review appears as a PR review comment with structured feedback
3. If the review requests changes, the PR is blocked until addressed
4. Developer pushes fixes, CI runs, tests pass
5. If tests fail: an auto-fix attempt is made within 5 minutes; if successful, a fix commit appears; if not, a comment explains what the agent tried
6. PR is approved and merged
7. Overnight: maintenance agents run on a schedule, keeping the codebase clean

Human reviewers are still needed for architecture, security decisions, and the AI review output itself — but all of the first-pass mechanical work is handled automatically.


Key Takeaways

• CI/CD integration turns one-off agent scripts into continuous, automated engineering assistance
• GitHub Actions provides the orchestration layer: events, outputs, conditions, concurrency controls, and secrets management
• Human approval gates via GitHub Environments are the right tool for higher-stakes agent actions
• Cost control requires three layers: per-run token budgets, iteration limits, and platform-level spending caps
• Observability — structured logging of every agent run — is what lets you confidently expand agent autonomy over time as you verify reliability
• Start narrow: deploy the PR review agent first (read-only, low risk), measure quality, then gradually expand to auto-fix and maintenance

AI Coding Agents Series — Complete

You have now completed the full AI Coding Agents Series:

1. What Are AI Coding Agents?
2. AI Coding Agents Compared: GitHub Copilot vs Cursor vs Devin vs Claude Code
3. Build Your First AI Coding Agent with the Claude API
4. Build an Automated GitHub PR Review Agent
5. Build an Autonomous Bug Fixer Agent
6. AI Coding Agents in CI/CD: Automate Code Reviews and Fixes in Production ← you are here

For more agent architecture patterns, see the Anthropic AI Tutorial Series — particularly the posts on AI Agents and Model Context Protocol.