Guides

How to Build an AI-Powered Runbook for Incident Response

Marcus

I've written a lot of runbooks over the years. Most of them ended up in a Confluence graveyard, opened once during an audit and never again during an actual incident. The problem isn't that runbooks are useless — it's that static, 47-page Word documents don't match how humans actually respond under pressure. You need something that adapts, that gives you the next step based on what you're seeing right now, not what someone imagined you'd see six months ago.

That's where AI comes in. Not as a replacement for your runbook, but as the engine that makes it dynamic. I spent three months rebuilding our incident response playbooks around AI prompts, and the results changed how my team handles incidents. Here's exactly how I did it.

Why Traditional Runbooks Fail During Real Incidents

Let's be honest about the failure mode. A traditional runbook assumes a linear incident: alert fires, analyst investigates, analyst follows steps 1 through 15, incident resolved. Real incidents don't work that way. You're halfway through step 4 when new IOCs appear. The scope changes. The attacker pivots. Your neat checklist is now irrelevant to what's actually happening on screen.

The second problem is context. A static runbook says "check for lateral movement." An AI-augmented runbook takes the specific IOCs from your current incident, the affected hosts, the timeline, and generates a targeted hunting query for this lateral movement pattern. That's the difference between a generic instruction and an actionable step.

The Architecture: Prompt Templates + Live Data

Here's the framework I use. Each runbook phase (detection, containment, eradication, recovery, lessons learned) gets a set of prompt templates. These templates have variables that get filled in with live incident data. The analyst feeds the output of one phase into the next.

You don't need a fancy platform for this. I started with a shared doc of prompt templates that analysts copy-pasted into Claude. We've since moved to a lightweight Python wrapper that pulls data from our SIEM (Splunk) and feeds it into API calls, but the concept works at any maturity level.
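Concretely, the template layer can be tiny. Here's a minimal sketch using only Python's standard library; the phase names, variable names, and wording are illustrative, not our exact library:

```python
from string import Template

# One template per runbook phase; ${...} slots are filled with
# live incident data at run time. Names here are illustrative.
TEMPLATES = {
    "triage": Template(
        "I'm a SOC analyst investigating an alert.\n"
        "Raw alert data: ${alert}\n"
        "Log context (past 60 minutes): ${logs}\n"
        "Classify as likely true positive, likely false positive, "
        "or indeterminate, and explain your reasoning."
    ),
    "containment": Template(
        "Confirmed incident: ${incident_type}. Affected assets: ${assets}. "
        "IOCs: ${iocs}. Generate prioritized containment actions and flag "
        "anything that could cause business disruption."
    ),
}

def render(phase: str, **incident_data: str) -> str:
    """Fill a phase template with live incident data.

    substitute() raises KeyError on a missing variable, which is what
    you want: an incomplete prompt should fail loudly before it ever
    reaches the model.
    """
    return TEMPLATES[phase].substitute(**incident_data)
```

An analyst (or a wrapper script) calls `render("triage", alert=..., logs=...)` and pastes or sends the result. Everything else in this post is variations on that one move.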

Phase 1: Detection and Initial Triage

The first prompt template I use when an alert fires looks like this:

"I'm a SOC analyst investigating an alert. Here's the raw alert data: [PASTE ALERT]. Here's the relevant log context from the past 60 minutes for the affected host: [PASTE LOGS]. Classify this alert as likely true positive, likely false positive, or indeterminate. Explain your reasoning. If likely true positive, identify the MITRE ATT&CK technique and suggest three immediate investigation steps specific to this alert type."

The key phrase is "specific to this alert type." Generic advice is useless. I want the AI to look at the actual data and tell me what to check next for this situation. In practice, Claude handles this well about 80% of the time. The other 20% of the time, it gives generic advice anyway, and you need to push back with follow-up prompts.

Phase 2: Containment Prompts

Once you've confirmed an incident, containment is where AI shines brightest. Here's my containment prompt template:

"Confirmed incident: [TYPE]. Affected assets: [LIST]. Current indicators of compromise: [IOCs]. Our environment runs [ENDPOINT TOOL] for endpoint, [FIREWALL] for network, and [SIEM] for logging. Generate specific containment actions I can execute right now, in priority order. For each action, include the specific CLI command or console step for the tools I listed. Flag any containment action that could cause business disruption."

That last sentence is critical. During a ransomware incident last quarter, the AI-generated containment plan correctly flagged that isolating a specific server would break the payroll batch process that was mid-run. A generic runbook would have said "isolate affected hosts" and we'd have caused an outage on top of an incident.

When I tested this with CrowdStrike Falcon as our endpoint tool, the AI generated accurate RTR (Real Time Response) commands for host isolation and process killing. It even suggested the correct Splunk queries to verify containment was holding. That's the value — not replacing the analyst's judgment, but generating the specific commands so the analyst can focus on decision-making instead of syntax.
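To give a flavor of what "verify containment is holding" looks like in query form, here's a hypothetical helper that turns the incident's IOCs into a Splunk SPL search. The field names (`host`, `index=main`) are placeholders; match them to your own schema:

```python
def containment_check_query(hosts: list[str], iocs: list[str],
                            index: str = "main") -> str:
    """Build an SPL search that looks for lingering IOC activity on
    supposedly contained hosts. Field names are placeholders; adjust
    to your own Splunk index and sourcetype schema."""
    host_clause = " OR ".join(f'host="{h}"' for h in hosts)
    ioc_clause = " OR ".join(f'"{i}"' for i in iocs)
    return (
        f"search index={index} ({host_clause}) ({ioc_clause}) "
        f"earliest=-15m | stats count by host"
    )
```

If that search returns nonzero counts after isolation, containment isn't holding.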

Phase 3: Eradication Prompts

Eradication is trickier because it requires understanding persistence mechanisms. My template:

"We've contained an incident involving [DESCRIPTION]. The attacker used [TECHNIQUES]. Affected hosts are [LIST]. Generate a comprehensive eradication checklist for this specific attack type, including: persistence mechanisms to check, registry keys to inspect, scheduled tasks to review, and any service accounts that may have been compromised. For each item, provide the specific command to check it on Windows/Linux as applicable."

I've found AI is particularly good at generating thorough persistence-mechanism checklists. It catches things that tired analysts miss at 3 AM — like checking for WMI event subscriptions or DLL search-order hijacking when the primary indicator was a scheduled task. The breadth of AI's training data means it "remembers" persistence mechanisms that your analyst might not think of under pressure.
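For a sense of what a good eradication checklist contains, here is a small hand-curated (not AI-generated) slice as a data structure. The commands are standard Windows and Linux built-ins; the structure itself is just one way to organize the output:

```python
# A hand-curated slice of a persistence checklist: each mechanism
# maps an OS to the built-in command that inspects it.
PERSISTENCE_CHECKS = {
    "scheduled tasks": {
        "windows": "schtasks /query /fo LIST /v",
        "linux": "crontab -l; ls /etc/cron.d",
    },
    "run keys": {
        "windows": r"reg query HKLM\Software\Microsoft\Windows\CurrentVersion\Run",
    },
    "wmi event subscriptions": {
        "windows": "Get-WmiObject -Namespace root\\subscription -Class __EventFilter",
    },
    "systemd timers": {
        "linux": "systemctl list-timers --all",
    },
}

def checks_for(os_name: str) -> list[tuple[str, str]]:
    """Return (mechanism, command) pairs applicable to one OS."""
    return [(mech, cmds[os_name])
            for mech, cmds in PERSISTENCE_CHECKS.items()
            if os_name in cmds]
```

The point of the AI prompt is that it fills this table out far more exhaustively, and tailored to the specific techniques the attacker actually used.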

Phase 4: Recovery and Validation

For recovery, the prompt focuses on verification:

"Incident [TYPE] has been eradicated from [HOSTS]. Before returning these systems to production, generate a validation checklist. Include: specific log queries to confirm no remaining attacker activity, baseline comparisons to run, and monitoring rules to add for the next 30 days to detect any recurrence. Our SIEM is Splunk and our EDR is CrowdStrike."

The 30-day monitoring rules are something I never would have included in a traditional runbook. AI consistently suggests post-incident detection rules tailored to the specific attack, which is exactly what you need to catch an attacker who comes back through the same vector.

Phase 5: Lessons Learned (The One Everyone Skips)

This is where AI saves the most time, honestly. After an incident, nobody wants to write the post-mortem. So I use this prompt:

"Here's the timeline of our incident response: [PASTE TIMELINE WITH TIMESTAMPS]. Generate a structured post-incident report including: executive summary (3 sentences), detailed timeline, root cause analysis, what went well, what needs improvement, and specific action items with suggested owners. Format for both a technical audience and an executive audience."

This alone saves 2-3 hours per incident. The AI-generated report isn't perfect — you need to edit it, add context only you know, and correct any assumptions. But going from a first draft to a final report is much faster than starting from a blank page after an exhausting incident.
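One practical detail: the timeline pastes far more cleanly if it's machine-formatted first. A hypothetical helper that normalizes whatever the team logged during the incident into the sorted, one-event-per-line format the prompt expects:

```python
from datetime import datetime

def format_timeline(events: list[tuple[str, str]]) -> str:
    """Sort (iso_timestamp, description) pairs chronologically and
    emit one event per line, ready to paste into the prompt."""
    ordered = sorted(events, key=lambda e: datetime.fromisoformat(e[0]))
    return "\n".join(f"{ts}  {desc}" for ts, desc in ordered)
```

Events get logged out of order during a real incident; sorting them before the AI sees them prevents a garbled "detailed timeline" section in the report.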

Building Your Own: Practical Steps

  • Start with your top 5 alert types. Don't try to build AI runbooks for everything. Pick the five alerts that generate the most volume or the most analyst confusion.
  • Write prompt templates, not scripts. Keep the human in the loop. The analyst should be reading and evaluating AI output, not blindly executing it.
  • Include your tool names in every prompt. Generic advice is useless. Telling the AI you use Splunk, CrowdStrike, and Palo Alto means you get actionable commands instead of abstract suggestions.
  • Version your prompts. Put them in Git. When a prompt produces bad output during an incident, update it afterward. Your prompt library should improve with every incident.
  • Test with tabletop exercises. Run your AI runbooks through tabletop scenarios before you need them in production. You'll find gaps in the prompts that are easy to fix when you're not under pressure.

What I'd Do Differently

If I were starting over, I'd invest earlier in the API integration. Copy-pasting into a chat window works for proof of concept, but the real value comes when incident data flows automatically into prompts. We use a Python script that pulls the last hour of logs from Splunk for an affected host and formats them into our prompt template. That automation alone cut our initial triage time by 40%.
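For reference, a stripped-down sketch of that kind of wrapper, using only the standard library. The `/services/search/jobs/export` endpoint is Splunk's standard REST export API, but the hostname, token handling, index, and search string are placeholders you'd replace with your own:

```python
import json
import urllib.parse
import urllib.request

SPLUNK_URL = "https://splunk.example.internal:8089"  # placeholder host

def fetch_recent_logs(host: str, token: str, minutes: int = 60) -> list[dict]:
    """Pull the last N minutes of events for one host via Splunk's
    REST export endpoint, which streams one JSON object per line.
    All connection details here are placeholders."""
    body = urllib.parse.urlencode({
        "search": f'search index=main host="{host}"',
        "earliest_time": f"-{minutes}m",
        "output_mode": "json",
    }).encode()
    req = urllib.request.Request(
        f"{SPLUNK_URL}/services/search/jobs/export",
        data=body,
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return [json.loads(line) for line in resp if line.strip()]

def triage_prompt(alert: str, logs: list[dict]) -> str:
    """Drop live alert and log data into the Phase 1 template."""
    log_text = "\n".join(json.dumps(e) for e in logs[:200])  # cap prompt size
    return (
        "I'm a SOC analyst investigating an alert.\n"
        f"Raw alert data: {alert}\n"
        f"Log context from the past 60 minutes:\n{log_text}\n"
        "Classify this alert as likely true positive, likely false "
        "positive, or indeterminate, and explain your reasoning."
    )
```

The output of `triage_prompt` goes straight into the model API call; the analyst only sees the response, never the plumbing.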

I'd also add feedback loops sooner. After every incident, rate whether each AI-generated response was helpful, partially helpful, or not helpful. After a few months, you'll have data on which prompt templates need refinement and which phases AI handles well versus poorly.
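The feedback loop doesn't need tooling either. A few lines over a log of (template, rating) pairs will surface the weak templates; the three-level rating scheme below mirrors the one above, but the scoring values are my own invention:

```python
from collections import Counter

# Map the three rating levels to scores; the weights are arbitrary.
RATINGS = {"helpful": 2, "partial": 1, "not_helpful": 0}

def weakest_templates(rows: list[tuple[str, str]], n: int = 3) -> list[str]:
    """rows: (template_name, rating) pairs logged after each incident.
    Returns the n templates with the lowest average rating, i.e. the
    ones to refine first."""
    totals: Counter = Counter()
    counts: Counter = Counter()
    for name, rating in rows:
        totals[name] += RATINGS[rating]
        counts[name] += 1
    avg = {name: totals[name] / counts[name] for name in counts}
    return sorted(avg, key=avg.get)[:n]
```

Run it quarterly and the refinement backlog writes itself.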

Start with one playbook for your noisiest alert type. Wire the data in, run it for two weeks, and measure whether your mean-time-to-investigate actually drops. If it does, build the next one. If it doesn't, your prompts need work — not your strategy.