AI Coding Agents in CI/CD: Automate Code Reviews and Bug Fixes in Production

Running an AI coding agent locally is impressive. Running it reliably in production — triggered automatically by CI/CD events, bounded by cost controls, gated by human approval where appropriate, and observable when something goes wrong — is a different engineering challenge.
This post is about that second step: integrating the agents you have already built (PR reviewer, bug fixer, general coding agent) into a production CI/CD pipeline, with the safety and observability patterns that make autonomous agents trustworthy.
This post assumes you have built the agents from the previous posts in this series, the PR Review Agent and the Bug Fixer Agent. We are orchestrating those agents here, not rebuilding them.
The Target Architecture
The complete CI/CD integration has four automated workflows:
```
┌────────────────────────────────────────┐
│ Developer pushes code                  │
└───────────────────┬────────────────────┘
                    │
┌───────────────────▼────────────────────┐
│ GitHub Actions: PR opened/updated      │
└───────────────────┬────────────────────┘
                    │
┌───────────────────▼────────────────────┐
│ 1. AI PR Review Agent                  │
│    Posts review, requests changes      │
└───────────────────┬────────────────────┘
                    │
           (if: review passes)
                    │
┌───────────────────▼────────────────────┐
│ 2. CI Test Suite runs                  │
│    (your existing tests)               │
└───────────────────┬────────────────────┘
                    │
┌───────────────────▼────────────────────┐
│ 3. Auto-fix Agent (if tests fail)      │
│    Attempts fix, opens new commit      │
└───────────────────┬────────────────────┘
                    │
          (human approval gate)
                    │
┌───────────────────▼────────────────────┐
│ 4. Nightly maintenance agent           │
│    Runs on schedule: fix flaky tests,  │
│    update deps, clean tech debt        │
└────────────────────────────────────────┘
```
Each workflow is a separate GitHub Actions job. They are connected by outputs, conditions, and manual approval gates where warranted.
Workflow 1: Automated PR Review
```yaml
# .github/workflows/ai-pr-review.yml
name: AI PR Review

on:
  pull_request:
    types: [opened, synchronize, ready_for_review]

concurrency:
  group: ai-review-${{ github.event.pull_request.number }}
  cancel-in-progress: true  # Cancel previous review if PR is updated

jobs:
  ai-review:
    runs-on: ubuntu-latest
    # Security: only run on PRs from the same repo (not forks)
    if: |
      github.event.pull_request.draft == false &&
      github.event.pull_request.head.repo.full_name == github.repository
    timeout-minutes: 10

    outputs:
      verdict: ${{ steps.review.outputs.verdict }}
      issues_count: ${{ steps.review.outputs.issues_count }}

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: pip install anthropic PyGithub

      - name: Run AI review
        id: review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          REPO_NAME: ${{ github.repository }}
        run: |
          python - <<'EOF'
          import os, json
          from pr_reviewer.agent import PRReviewAgent

          agent = PRReviewAgent(
              github_token=os.environ["GITHUB_TOKEN"],
              repo_name=os.environ["REPO_NAME"],
          )
          review = agent.review_pr(
              int(os.environ["PR_NUMBER"]),
              post_to_github=True
          )

          # Export outputs for downstream jobs
          verdict = review.get("verdict", "COMMENT")
          issues = [i for i in review.get("issues", []) if i["severity"] in ("critical", "major")]
          print(f"verdict={verdict}")
          print(f"issues_count={len(issues)}")

          with open(os.environ["GITHUB_OUTPUT"], "a") as f:
              f.write(f"verdict={verdict}\n")
              f.write(f"issues_count={len(issues)}\n")
          EOF

      - name: Block merge on critical issues
        if: steps.review.outputs.verdict == 'REQUEST_CHANGES'
        run: |
          echo "AI review requested changes. Merge blocked until issues are resolved."
          exit 1
```
Workflow 2: Auto-Fix on Test Failure
This workflow triggers when tests fail on a PR and attempts an automatic fix:
```yaml
# .github/workflows/ai-auto-fix.yml
name: AI Auto-Fix

on:
  workflow_run:
    workflows: ["CI Tests"]  # Name of your test workflow
    types: [completed]
  # Also allow manual trigger
  workflow_dispatch:
    inputs:
      pr_number:
        description: "PR number to attempt auto-fix on"
        required: true

jobs:
  attempt-fix:
    runs-on: ubuntu-latest
    # Only run when tests fail on a PR branch (not main)
    if: |
      github.event.workflow_run.conclusion == 'failure' &&
      github.event.workflow_run.head_branch != 'main' &&
      github.event.workflow_run.head_branch != 'master'
    timeout-minutes: 20

    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.workflow_run.head_sha }}
          token: ${{ secrets.GITHUB_TOKEN }}

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: pip install anthropic pytest

      - name: Configure git
        run: |
          git config user.name "AI Bug Fixer Bot"
          git config user.email "ai-bot@yourcompany.com"

      - name: Run auto-fix agent
        id: autofix
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          python - <<'EOF'
          import os, json
          from bug_fixer.agent import BugFixerAgent

          # Get the test failure output
          # In production, parse this from the CI test output artifact
          bug_report = (
              "The CI test suite failed on this PR. "
              "Run 'pytest -v --tb=short' to see the failing tests. "
              "Fix the failing tests without modifying the test files. "
              "Run the full test suite to confirm no regressions."
          )

          agent = BugFixerAgent(
              project_root=".",
              max_iterations=15
          )
          result = agent.fix(
              bug_report=bug_report,
              test_command="pytest -v --tb=short"
          )

          # Write result to file for next step
          with open("fix_result.json", "w") as f:
              json.dump({
                  "success": result.success,
                  "description": result.description,
                  "files_changed": result.files_changed,
                  "iterations": result.iterations,
              }, f)

          if result.success:
              print("Auto-fix succeeded")
              exit(0)
          else:
              print(f"Auto-fix failed: {result.description}")
              exit(1)
          EOF

      - name: Commit and push fix
        if: steps.autofix.outcome == 'success'
        run: |
          # Only commit if there are changes
          if git diff --quiet; then
            echo "No file changes to commit"
            exit 1
          fi

          RESULT=$(cat fix_result.json | python3 -c "import sys,json; print(json.load(sys.stdin)['description'])")
          git add -A
          git commit -m "fix: AI auto-fix — $RESULT [skip ci]"
          git push

      - name: Post fix comment to PR
        if: always()
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          REPO_NAME: ${{ github.repository }}
        run: |
          python - <<'EOF'
          import os, json
          from github import Github

          with open("fix_result.json") as f:
              result = json.load(f)

          gh = Github(os.environ["GITHUB_TOKEN"])
          repo = gh.get_repo(os.environ["REPO_NAME"])

          # Find the PR for this branch
          head_sha = "${{ github.event.workflow_run.head_sha }}"
          prs = repo.get_pulls(state="open")
          pr = next((p for p in prs if p.head.sha == head_sha), None)

          if pr:
              if result["success"]:
                  body = (
                      f"🤖 **Auto-Fix Applied**\n\n"
                      f"The CI failure was automatically fixed in {result['iterations']} iterations.\n\n"
                      f"**Change:** {result['description']}\n\n"
                      f"**Files modified:** {', '.join(result['files_changed'])}\n\n"
                      f"Please review the commit and re-run CI to confirm."
                  )
              else:
                  body = (
                      f"🤖 **Auto-Fix Failed**\n\n"
                      f"The AI bug fixer could not automatically resolve the CI failure "
                      f"after {result['iterations']} iterations.\n\n"
                      f"**What was tried:** {result['description'][:500]}\n\n"
                      f"Manual investigation is required."
                  )
              # PyGithub: comment on the PR's issue thread
              pr.create_issue_comment(body)
          EOF
```
Auto-Push Safety
Automatically pushing AI-generated commits to PR branches is powerful but carries risk. Consider requiring a human approval step (GitHub Environments with required reviewers) before pushing to the branch. Use '[skip ci]' in auto-fix commit messages to prevent infinite CI loops. Never auto-push directly to main or master.
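Two of those guardrails are simple enough to enforce in code before any push happens. A minimal sketch (the helper names `ensure_skip_ci` and `can_auto_push` are illustrative, not part of the agents from earlier posts):

```python
# Guardrails checked before an auto-fix commit is pushed.
# Both helpers are illustrative sketches, not part of the BugFixerAgent API.
PROTECTED_BRANCHES = {"main", "master"}


def ensure_skip_ci(message: str) -> str:
    """Append [skip ci] so an auto-fix commit cannot retrigger CI in a loop."""
    return message if "[skip ci]" in message else f"{message} [skip ci]"


def can_auto_push(branch: str) -> bool:
    """Refuse to auto-push to protected branches."""
    return branch not in PROTECTED_BRANCHES
```

Run checks like these in the workflow step that commits, and fail loudly when `can_auto_push` returns `False`.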
Workflow 3: Human-in-the-Loop Gate
For higher-stakes operations, require explicit human approval before the agent acts:
```yaml
# .github/workflows/ai-dependency-update.yml
name: AI Dependency Update

on:
  schedule:
    - cron: "0 9 * * 1"  # Every Monday at 9am UTC
  workflow_dispatch:

jobs:
  generate-update-plan:
    runs-on: ubuntu-latest
    environment: ai-actions  # <- GitHub Environment with required reviewers
    # This job pauses until a reviewer approves it in the GitHub Actions UI
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Run dependency update agent
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          python run_dependency_agent.py
```
To set up the environment gate:
- Go to repository Settings → Environments → New environment
- Name it `ai-actions`
- Add Required reviewers: your senior engineers
- Any job using `environment: ai-actions` will pause and send a notification requesting approval
Workflow 4: Nightly Maintenance Agent
A scheduled agent that handles low-risk maintenance tasks overnight:
```yaml
# .github/workflows/ai-nightly-maintenance.yml
name: AI Nightly Maintenance

on:
  schedule:
    - cron: "0 2 * * *"  # 2am UTC every night

jobs:
  maintenance:
    runs-on: ubuntu-latest
    timeout-minutes: 30

    strategy:
      matrix:
        task:
          - name: "fix-flaky-tests"
            prompt: "Find and fix any flaky tests (tests that sometimes pass and sometimes fail without code changes). Run the test suite 3 times and identify tests with inconsistent results."
          - name: "update-type-hints"
            prompt: "Add missing Python type hints to functions that lack them in the src/ directory. Focus on public API functions. Run mypy to verify no type errors are introduced."
          - name: "remove-dead-code"
            prompt: "Identify and remove unused imports and obviously dead code (unreachable code, unused variables). Run tests to verify nothing breaks."

    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install anthropic pytest mypy
      - name: Run maintenance task
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          TASK_NAME: ${{ matrix.task.name }}
          TASK_PROMPT: ${{ matrix.task.prompt }}
        run: |
          python run_maintenance_agent.py
      - name: Open PR if changes were made
        run: |
          if ! git diff --quiet; then
            git config user.name "AI Maintenance Bot"
            git config user.email "ai-bot@yourcompany.com"
            git checkout -b "ai-maintenance/$TASK_NAME-$(date +%Y%m%d)"
            git add -A
            git commit -m "chore(ai): $TASK_NAME maintenance"
            git push origin HEAD
            gh pr create \
              --title "AI Maintenance: $TASK_NAME" \
              --body "Automated maintenance task. Please review before merging." \
              --label "ai-generated"
          fi
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          TASK_NAME: ${{ matrix.task.name }}
```
Cost Controls
Without controls, autonomous agents can run up significant API costs. Three layers of protection:
1. Per-Run Token Budget
```python
# Wrap the Anthropic client with token tracking
import anthropic


class BudgetedClient:
    def __init__(self, max_tokens_per_run: int = 100_000):
        self.client = anthropic.Anthropic()
        self.tokens_used = 0
        self.max_tokens = max_tokens_per_run

    def messages_create(self, **kwargs) -> anthropic.types.Message:
        if self.tokens_used >= self.max_tokens:
            raise RuntimeError(
                f"Token budget exceeded: used {self.tokens_used}/{self.max_tokens}"
            )
        response = self.client.messages.create(**kwargs)
        self.tokens_used += response.usage.input_tokens + response.usage.output_tokens
        return response

    @property
    def estimated_cost_usd(self) -> float:
        # claude-sonnet-4-6 pricing: $3/M input, $15/M output (approximate)
        return self.tokens_used * 0.000009  # conservative average
```
2. Iteration Limits
Always set max_iterations conservatively. Most tasks complete in 5–10 iterations. An agent still running at iteration 25 has either hit an edge case or is looping — either way, stopping it is the right call.
```python
agent = BugFixerAgent(max_iterations=15)  # Stop after 15 iterations
```
3. GitHub Actions Monthly Spending Limit
In GitHub Settings → Billing, set a spending limit for Actions. This caps compute costs regardless of how many workflows run.
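The platform cap is a backstop; it helps to sanity-check expected API spend before relying on it. A back-of-envelope estimate using the same blended ~$9 per million tokens figure as the per-run budget wrapper (`monthly_cost_usd` is an illustrative helper, not part of any agent):

```python
# Back-of-envelope monthly API spend, using the same blended estimate of
# ~$9 per million tokens ($0.000009/token) as BudgetedClient.
BLENDED_COST_PER_TOKEN = 0.000009


def monthly_cost_usd(runs_per_day: int, avg_tokens_per_run: int) -> float:
    """Estimated monthly API spend for the agent fleet (30-day month)."""
    return runs_per_day * 30 * avg_tokens_per_run * BLENDED_COST_PER_TOKEN
```

For example, 20 agent runs a day at 50k tokens each comes to about $270/month, which tells you roughly where the budgets and caps should sit.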
Observability: Tracking Agent Activity
In production, you need to know what your agents did, how long they ran, and how much they cost.
```python
# observability/agent_logger.py
import json
import os
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path


class AgentLogger:
    """
    Logs agent runs to a JSONL file for auditing and cost tracking.
    Each line is one agent run.
    """

    def __init__(self, log_path: str = "./agent_runs.jsonl"):
        self.log_path = Path(log_path)

    def log_run(
        self,
        agent_type: str,
        task: str,
        success: bool,
        iterations: int,
        tokens_used: int,
        files_changed: list[str],
        error: str = "",
    ) -> None:
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "agent_type": agent_type,
            "task": task[:200],
            "success": success,
            "iterations": iterations,
            "tokens_used": tokens_used,
            "estimated_cost_usd": round(tokens_used * 0.000009, 4),
            "files_changed": files_changed,
            "error": error[:500] if error else "",
            "github_run_id": os.environ.get("GITHUB_RUN_ID", "local"),
            "github_actor": os.environ.get("GITHUB_ACTOR", ""),
        }

        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

    def summary(self, last_n_runs: int = 50) -> dict:
        """Return aggregated metrics for the last N runs."""
        if not self.log_path.exists():
            return {}

        lines = self.log_path.read_text().strip().splitlines()
        runs = [json.loads(line) for line in lines[-last_n_runs:]]
        if not runs:
            return {}

        return {
            "total_runs": len(runs),
            "success_rate": sum(r["success"] for r in runs) / len(runs),
            "avg_iterations": sum(r["iterations"] for r in runs) / len(runs),
            "total_tokens": sum(r["tokens_used"] for r in runs),
            "total_cost_usd": sum(r["estimated_cost_usd"] for r in runs),
            "most_changed_files": _top_files(runs),
        }


def _top_files(runs: list[dict]) -> list[str]:
    counter = Counter()
    for run in runs:
        for f in run.get("files_changed", []):
            counter[f] += 1
    return [f for f, _ in counter.most_common(10)]
```
Production Safety Checklist
Before enabling autonomous AI agents in your CI/CD pipeline, verify each item:
- ☐ Fork PR protection: workflows check that the PR is from the same repo, not a fork
- ☐ Branch protection: agents cannot push directly to `main` or `master`
- ☐ Iteration limits: every agent has a `max_iterations` ceiling
- ☐ Token budget: per-run token limit prevents runaway costs
- ☐ Path traversal protection: file tool restricts all operations to the project directory
- ☐ Command blocklist: shell tool blocks destructive commands (`rm -rf`, `sudo`)
- ☐ Skip-CI commits: auto-fix commits include `[skip ci]` to prevent CI loops
- ☐ Human approval gates: high-stakes actions route through GitHub Environments with required reviewers
- ☐ Agent run logging: every run is logged with token count, outcome, and files changed
- ☐ Secrets are secrets: `.env` files are in `.gitignore`; secrets live in GitHub Secrets, not hardcoded
- ☐ Timeout on all jobs: every GitHub Actions job has a `timeout-minutes` limit to cap runaway compute
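Some of these items can be verified mechanically. A minimal pre-flight sketch that scans workflow files with plain string checks (a heuristic rather than a real YAML parse; `check_workflows` is an illustrative helper, and it assumes workflows live under `.github/workflows`):

```python
# Pre-flight heuristic for two checklist items: a timeout on every
# workflow and a fork-PR guard in pull_request workflows.
# String matching only; a real implementation would parse the YAML.
from pathlib import Path


def check_workflows(workflow_dir: str = ".github/workflows") -> list[str]:
    problems = []
    for wf in sorted(Path(workflow_dir).glob("*.yml")):
        text = wf.read_text()
        if "timeout-minutes" not in text:
            problems.append(f"{wf.name}: no timeout-minutes found")
        if "pull_request" in text and "head.repo.full_name" not in text:
            problems.append(f"{wf.name}: no fork-PR guard found")
    return problems
```

Run it as a CI step and fail the build when the returned list is non-empty.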
The Developer Experience
When this pipeline is running in production, the developer experience looks like this:
- Developer opens a PR
- Within 2–3 minutes: AI review appears as a PR review comment with structured feedback
- If the review requests changes, the PR is blocked until addressed
- Developer pushes fixes, CI runs, tests pass
- If tests fail: an auto-fix attempt is made within 5 minutes; if successful, a fix commit appears; if not, a comment explains what the agent tried
- PR is approved and merged
- Overnight: maintenance agents run on a schedule, keeping the codebase clean
Human reviewers are still needed for architecture, security decisions, and the AI review output itself — but all of the first-pass mechanical work is handled automatically.
Key Takeaways
- CI/CD integration turns one-off agent scripts into continuous, automated engineering assistance
- GitHub Actions provides the orchestration layer: events, outputs, conditions, concurrency controls, and secrets management
- Human approval gates via GitHub Environments are the right tool for higher-stakes agent actions
- Cost control requires three layers: per-run token budgets, iteration limits, and platform-level spending caps
- Observability — structured logging of every agent run — is what lets you confidently expand agent autonomy over time as you verify reliability
- Start narrow: deploy the PR review agent first (read-only, low risk), measure quality, then gradually expand to auto-fix and maintenance
AI Coding Agents Series — Complete
You have now completed the full AI Coding Agents Series:
- What Are AI Coding Agents?
- AI Coding Agents Compared: GitHub Copilot vs Cursor vs Devin vs Claude Code
- Build Your First AI Coding Agent with the Claude API
- Build an Automated GitHub PR Review Agent
- Build an Autonomous Bug Fixer Agent
- AI Coding Agents in CI/CD: Automate Code Reviews and Fixes in Production ← you are here
For more agent architecture patterns, see the Anthropic AI Tutorial Series — particularly the posts on AI Agents and Model Context Protocol.
