Project: Build an AI-Powered IT Incident Report Generator with Claude

Manual incident reporting is one of IT's most error-prone, time-consuming tasks. Under pressure during an active outage, engineers write inconsistent notes, forget key fields, and delay formal documentation until hours after resolution. That gap means lost data, missed patterns, and failed audits.
Claude can generate a complete, structured incident report from raw notes in seconds. It classifies severity, extracts the timeline, proposes the most probable root cause, suggests remediation steps, and formats everything to your organisation's template, the same way every time.
This project builds a complete incident report generator: paste in raw incident notes and receive a professional, structured report ready for your ITSM ticketing system.
Schema and Tool Design
The heart of this system is a structured output tool whose schema marks every incident field as required, so Claude is steered to fill in each section rather than leave it blank.
import anthropic
from datetime import datetime
import json

client = anthropic.Anthropic()

# ─── Report Schema ─────────────────────────────────────────────────────────────

INCIDENT_REPORT_TOOL = {
    "name": "generate_incident_report",
    "description": "Generates a complete, structured IT incident report from raw notes.",
    "input_schema": {
        "type": "object",
        "properties": {
            "incident_title": {
                "type": "string",
                "description": "Brief, clear title for the incident (max 100 chars)"
            },
            "severity": {
                "type": "string",
                "enum": ["P1 - Critical", "P2 - High", "P3 - Medium", "P4 - Low"],
                "description": "Severity based on impact and affected users"
            },
            "incident_type": {
                "type": "string",
                "enum": [
                    "Infrastructure Outage", "Security Incident", "Application Failure",
                    "Data Integrity Issue", "Network Disruption", "Database Issue",
                    "Authentication Failure", "Performance Degradation", "Other"
                ]
            },
            "affected_systems": {
                "type": "array",
                "items": {"type": "string"},
                "description": "List of systems, services, or components affected"
            },
            "affected_users_estimate": {
                "type": "string",
                "description": "Estimated number or percentage of users impacted"
            },
            "incident_start": {
                "type": "string",
                "description": "Best estimate of incident start time from notes"
            },
            "incident_detected": {
                "type": "string",
                "description": "When the incident was first detected or reported"
            },
            "incident_resolved": {
                "type": "string",
                "description": "When the incident was fully resolved, or 'Ongoing'"
            },
            "summary": {
                "type": "string",
                "description": "2-3 sentence factual summary of what happened and impact"
            },
            "timeline": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "time": {"type": "string"},
                        "event": {"type": "string"}
                    },
                    "required": ["time", "event"]
                },
                "description": "Chronological list of key events extracted from notes"
            },
            "root_cause": {
                "type": "string",
                "description": "Most probable root cause based on the notes. Use 'Under investigation' if unclear."
            },
            "contributing_factors": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Secondary factors that contributed to the incident or its impact"
            },
            "resolution_steps_taken": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Chronological actions taken to resolve the incident"
            },
            "recommended_follow_up_actions": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "action": {"type": "string"},
                        "priority": {"type": "string", "enum": ["Immediate", "This Sprint", "Next Quarter"]},
                        "owner": {"type": "string", "description": "Role or team responsible"}
                    },
                    "required": ["action", "priority", "owner"]
                }
            },
            "lessons_learned": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Key learnings that should be acted on to prevent recurrence"
            },
            "requires_post_mortem": {
                "type": "boolean",
                "description": "True for P1/P2 incidents or any with significant data loss or security impact"
            }
        },
        "required": [
            "incident_title", "severity", "incident_type", "affected_systems",
            "affected_users_estimate", "incident_start", "incident_detected",
            "incident_resolved", "summary", "timeline", "root_cause",
            "contributing_factors", "resolution_steps_taken",
            "recommended_follow_up_actions", "lessons_learned", "requires_post_mortem"
        ]
    }
}

Why Tool Use for Structured Reports?
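Setting tool_choice to the generate_incident_report tool forces Claude to answer through that tool, and the schema's required list tells it which fields must be present, which a free-text prompt cannot do. When Claude cannot determine a value, the field descriptions give it a defined fallback such as 'Under investigation' rather than letting it skip the field. This matters for audit-trail integrity and for ITSM integration, where missing fields cause import failures. The schema guides the model rather than acting as a hard validator by default, so it is still worth checking the returned dict before importing it.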
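A defensive check before export costs little. The sketch below is an illustration, not part of the original project: find_missing_fields is a hypothetical helper, and the shortened REQUIRED_FIELDS list stands in for INCIDENT_REPORT_TOOL["input_schema"]["required"]. It assumes that an empty list is an acceptable value but a blank string is not.

```python
# Hypothetical helper (not part of the original project): confirm that a
# report dict returned by the tool call contains every required field.

# Shortened stand-in for INCIDENT_REPORT_TOOL["input_schema"]["required"].
REQUIRED_FIELDS = ["incident_title", "severity", "root_cause", "timeline"]


def find_missing_fields(report: dict, required: list) -> list:
    """Return required field names that are absent, None, or blank strings."""
    missing = []
    for field in required:
        value = report.get(field)
        # Empty lists are allowed (e.g. no timeline yet); blank text is not.
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing


draft = {
    "incident_title": "Connection pool exhaustion on prod-db-01",
    "severity": "P2 - High",
    "root_cause": "   ",  # whitespace only, so it counts as missing
    "timeline": [],
}
missing = find_missing_fields(draft, REQUIRED_FIELDS)
print(missing)
```

In the real pipeline the same function could run on the dict returned by generate_incident_report, with the report held back from export whenever the returned list is non-empty.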
Report Generation Engine
# ─── Generation Engine ─────────────────────────────────────────────────────────

SYSTEM_PROMPT = """You are an expert IT incident management specialist and technical writer.
You produce professional, accurate, and actionable incident reports from raw engineer notes.

Guidelines:
- Extract facts from the notes; do not invent information not present in the notes
- If a timeline entry has no specific time, estimate it as relative time (e.g., "T+15min")
- Severity classification: P1=total service outage or data breach, P2=major feature down or >20% users affected, P3=partial degradation, P4=cosmetic or minor
- Root cause should be factual and specific, not vague generalisations
- Follow-up actions must be specific and actionable, not generic advice
"""


def generate_incident_report(raw_notes: str, additional_context: dict = None) -> dict:
    """
    Generate a structured incident report from raw notes.

    Args:
        raw_notes: Raw text notes from engineers or ticket history
        additional_context: Optional dict with known fields e.g. {"reported_by": "Alice"}

    Returns:
        Structured incident report as a dict
    """
    context_str = ""
    if additional_context:
        context_str = "\n\nAdditional known context:\n" + json.dumps(additional_context, indent=2)

    message = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        system=SYSTEM_PROMPT,
        tools=[INCIDENT_REPORT_TOOL],
        # Force Claude to answer through the report tool
        tool_choice={"type": "tool", "name": "generate_incident_report"},
        messages=[
            {
                "role": "user",
                "content": f"""RAW INCIDENT NOTES:
{raw_notes}
{context_str}

Generate a complete, structured incident report from these notes."""
            }
        ]
    )

    # A response cut off at max_tokens can carry an incomplete tool input
    if message.stop_reason == "max_tokens":
        raise ValueError("Report was truncated at max_tokens; raise the limit and retry")

    # Extract tool use result
    for block in message.content:
        if block.type == "tool_use":
            return block.input

    raise ValueError("Claude did not return a structured report; check tool configuration")


# ─── Report Formatter ──────────────────────────────────────────────────────────

from datetime import timezone  # used to stamp the report in real UTC


def format_report_markdown(report: dict, report_id: str = None) -> str:
    """Format the structured report as a Markdown document."""

    # Take the time in UTC explicitly so the "UTC" label below is accurate
    now_utc = datetime.now(timezone.utc)
    rid = report_id or f"INC-{now_utc.strftime('%Y%m%d-%H%M')}"
    generated_at = now_utc.strftime("%Y-%m-%d %H:%M UTC")

    post_mortem_flag = "**Post-Mortem Required**" if report.get("requires_post_mortem") else "Not required"

    # Timeline block (escape pipes so event text cannot break the table)
    timeline_lines = []
    for entry in report.get("timeline", []):
        event = entry.get("event", "").replace("|", "\\|")
        timeline_lines.append(f"| {entry.get('time', '?')} | {event} |")
    timeline_table = (
        "| Time | Event |\n|------|-------|\n" + "\n".join(timeline_lines)
        if timeline_lines else "_No timeline extracted._"
    )

    # Follow-up actions
    follow_up_lines = []
    for action in report.get("recommended_follow_up_actions", []):
        follow_up_lines.append(
            f"- [{action.get('priority')}] **{action.get('action')}** (Owner: {action.get('owner')})"
        )
    follow_up_str = "\n".join(follow_up_lines) or "None identified."

    # Bullet lists, each with a fallback line when the list is empty
    affected_systems = ", ".join(report.get("affected_systems", [])) or "Unknown"
    contributing = "\n".join(f"- {factor}" for factor in report.get("contributing_factors", [])) or "- None identified"
    resolution_steps = "\n".join(f"- {step}" for step in report.get("resolution_steps_taken", [])) or "- None documented"
    lessons = "\n".join(f"- {lesson}" for lesson in report.get("lessons_learned", [])) or "- None identified"

    return f"""# Incident Report: {rid}

**Generated:** {generated_at}
**Title:** {report.get("incident_title")}
**Severity:** {report.get("severity")}
**Type:** {report.get("incident_type")}
**Post-Mortem:** {post_mortem_flag}

---

## Impact

| Field | Value |
|-------|-------|
| Affected Systems | {affected_systems} |
| Estimated Users Affected | {report.get("affected_users_estimate", "Unknown")} |
| Incident Start | {report.get("incident_start", "Unknown")} |
| Detected At | {report.get("incident_detected", "Unknown")} |
| Resolved At | {report.get("incident_resolved", "Unknown")} |

---

## Summary

{report.get("summary", "")}

---

## Timeline

{timeline_table}

---

## Root Cause Analysis

**Root Cause:** {report.get("root_cause", "Under investigation")}

**Contributing Factors:**
{contributing}

---

## Resolution

**Steps Taken:**
{resolution_steps}

---

## Follow-Up Actions

{follow_up_str}

---

## Lessons Learned

{lessons}
"""


def export_report_json(report: dict, output_path: str) -> None:
    """Export report as JSON for ITSM integration."""
    # Write UTF-8 explicitly so the file does not depend on the platform default
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(report, f, indent=2, ensure_ascii=False)
    print(f"Report exported to {output_path}")

Pipeline and Demo
# ─── Example Usage ─────────────────────────────────────────────────────────────

SAMPLE_NOTES = """
Started getting alerts around 14:22 - database connection pool exhausted on prod-db-01.
App servers started returning 503s to users. Estimated about 3000 users affected based
on Cloudwatch error rate.

Sarah noticed that a new deployment went out at 14:15 - the new order processing service.
Jon checked the slow query log and found the new service was running a full table scan
on the orders table on every request - no index on the customer_id column.

We rolled back the deployment at 14:48. Services recovered within 2 minutes of rollback.
Db connection pool back to normal by 14:51.

The orders table has 12 million rows. The query was running in ~4 seconds each time.
Under normal load that was fine in staging but prod traffic is 50x higher.

We need to: add the index, fix the query, update the staging load test to match prod volume,
and check if any other new queries have similar issues.
"""

if __name__ == "__main__":
    print("Generating incident report...")

    report_data = generate_incident_report(
        raw_notes=SAMPLE_NOTES,
        additional_context={
            "reported_by": "Monitoring System (PagerDuty)",
            "team": "Platform Engineering",
            "environment": "Production"
        }
    )

    # Format as Markdown
    markdown_report = format_report_markdown(report_data, report_id="INC-2026041401")
    print(markdown_report)

    # Export as JSON for ITSM
    export_report_json(report_data, "incident_report.json")

Integrate with Your ITSM Ticketing System
Most ITSM and incident-management platforms (Jira Service Management, ServiceNow, PagerDuty, Opsgenie) expose REST APIs that accept structured JSON. Map the report dict onto the platform's own field names and POST it to auto-populate a ticket, attach the report to an existing alert ticket, or create a new post-mortem task. This can save engineers 30-60 minutes of manual data entry after an incident.
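As a rough sketch of that hand-off, the code below maps a report onto a generic ticket payload and prepares a POST request using only the standard library. Everything platform-specific here is an assumption: ITSM_URL, the payload field names (summary, priority, description, labels), and bearer-token authentication are placeholders, since each vendor defines its own schema and auth scheme. The request is built but deliberately not sent.

```python
import json
import urllib.request

# Hypothetical endpoint and field names: every ITSM platform defines its own
# schema, so check the vendor's API reference before adapting this.
ITSM_URL = "https://itsm.example.com/api/incidents"


def build_ticket_payload(report: dict) -> dict:
    """Map the report dict onto a generic (assumed) ticket structure."""
    return {
        "summary": report["incident_title"],
        "priority": report["severity"].split(" - ")[0],  # "P2 - High" -> "P2"
        "description": report["summary"],
        "labels": report.get("affected_systems", []),
    }


def build_request(report: dict, token: str) -> urllib.request.Request:
    """Prepare the POST request; the caller decides when to send it."""
    body = json.dumps(build_ticket_payload(report)).encode("utf-8")
    return urllib.request.Request(
        ITSM_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


example_report = {
    "incident_title": "Connection pool exhaustion on prod-db-01",
    "severity": "P2 - High",
    "summary": "A new deployment ran full table scans and exhausted the pool.",
    "affected_systems": ["prod-db-01", "order-processing"],
}
request = build_request(example_report, token="placeholder-token")
print(request.get_method(), request.full_url)

# Sending is left to the caller, for example:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status)
```

Keeping the mapping in its own function means the field translation can be unit-tested without network access, and swapped per platform without touching the report generator.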
Severity Classification Logic
Claude classifies severity by reasoning over the notes. The system prompt already states the rules in condensed form; if classifications drift, you can reinforce them there with the fuller definitions:
- P1 – Critical: Complete service outage, data breach, or full unavailability of a business-critical system affecting all or most users
- P2 – High: Major feature unavailable, more than 20% of users affected, significant revenue or compliance impact, or SLA breach imminent
- P3 – Medium: Partial degradation, workaround available, limited subset of users affected
- P4 – Low: Cosmetic issue, documentation error, no functional user impact
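One way to catch an under-classified incident is to compute the minimum severity the written rules imply from a few facts you already know, then compare it with Claude's answer. The sketch below is an assumption layered on the rules above, not part of the original project: minimum_severity and is_underclassified are hypothetical helpers, and the inputs are a simplification (revenue, compliance, and SLA criteria for P2 are left out).

```python
# Hypothetical cross-check (not part of the original project): derive the
# minimum severity implied by the written rules and compare it with the
# severity Claude returned.

# Ordered from most to least urgent, matching the schema's enum values.
SEVERITY_ORDER = ["P1 - Critical", "P2 - High", "P3 - Medium", "P4 - Low"]


def minimum_severity(total_outage: bool, data_breach: bool,
                     percent_users_affected: float,
                     functional_impact: bool) -> str:
    """Apply a simplified form of the P1-P4 rules to known incident facts."""
    if total_outage or data_breach:
        return "P1 - Critical"
    if percent_users_affected > 20:
        return "P2 - High"
    if functional_impact:
        return "P3 - Medium"
    return "P4 - Low"


def is_underclassified(claude_severity: str, floor: str) -> bool:
    """True when Claude's severity is less urgent than the rules require."""
    return SEVERITY_ORDER.index(claude_severity) > SEVERITY_ORDER.index(floor)


floor = minimum_severity(total_outage=False, data_breach=False,
                         percent_users_affected=35.0, functional_impact=True)
print(floor)
print(is_underclassified("P3 - Medium", floor))
```

A mismatch need not block the report; flagging it for the reviewing engineer is usually enough.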
Human Review is Mandatory
This system generates reports as a starting point — a human engineer must review and approve before submitting to leadership or external stakeholders. Claude works from notes; if notes are incomplete or inaccurate, the report will reflect that. The goal is to eliminate blank-page paralysis and ensure no field is forgotten, not to replace engineering judgment.
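The review requirement can also be enforced in code rather than left to convention. The following is a minimal sketch under assumed names: DraftReport, approve, and release are hypothetical and not part of the original project. The idea is simply that nothing leaves the pipeline until a named reviewer has signed off.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DraftReport:
    """Wraps a generated report together with a hypothetical approval record."""
    data: dict
    approved_by: Optional[str] = None

    def approve(self, reviewer: str) -> None:
        """Record the engineer who reviewed and approved the report."""
        if not reviewer.strip():
            raise ValueError("A named reviewer is required")
        self.approved_by = reviewer


def release(draft: DraftReport) -> dict:
    """Return the report for submission only after a human has approved it."""
    if draft.approved_by is None:
        raise PermissionError("Report has not been reviewed by an engineer")
    return {**draft.data, "approved_by": draft.approved_by}


draft = DraftReport(data={"incident_title": "Connection pool exhaustion"})
try:
    release(draft)  # blocked: nobody has approved the draft yet
except PermissionError as error:
    print(f"Blocked: {error}")

draft.approve("on-call engineer")
released = release(draft)
print(released["approved_by"])
```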
Summary
This incident report generator solves a real operational pain point: turning raw, chaotic incident notes into professional, consistent, structured documentation in seconds.
- The tool use schema marks every field as required, so reports do not arrive with a missing root cause or blank follow-up actions
- Severity and type classification are handled automatically against written rules, reducing debate about severity during triage
- JSON export enables direct integration with Jira, ServiceNow, or any ITSM with a REST API
- The system is additive, not authoritative: it accelerates reporting while engineers provide oversight
Next IT pro project: Project: Build a Data Analyst Agent — CSV Insights in Plain English.
This post is part of the Anthropic AI Tutorial Series. Previous post: Project: Build a RAG App with Claude — Query Your Own Documents.
