The Sysadmin's Guide to AI-Powered Documentation
Every sysadmin I've ever met has the same dirty secret: their documentation is terrible. Not because they're lazy — because documentation is the thing that gets sacrificed when the server's on fire, the ticket queue is overflowing, and there are only so many hours in a day. You always plan to go back and document that firewall change. You never do.
I've been using AI tools to attack this problem for about six months now, and the results have been genuinely surprising. Not perfect — I'll be honest about where things fall apart — but good enough that my documentation coverage went from maybe 40% to closer to 85%. Here's what works.
Runbook Generation: From Tribal Knowledge to Actual Pages
The biggest documentation gap in most shops is runbooks. The procedure for restarting the payment processing service lives in Dave's head. Dave's on vacation. Good luck.
My approach is simple and slightly sneaky. Next time you perform a procedure, record it. Not a screen recording — just open a text file and jot down every command you run, every screen you click through, every decision point. Don't worry about formatting. Just capture the raw sequence.
Then feed that messy notes file to Claude or ChatGPT with this prompt: "Convert these raw notes into a structured runbook. Include: purpose, prerequisites, step-by-step procedure with exact commands, expected output at each step, troubleshooting for common failures, and rollback steps. Target audience is a junior sysadmin who hasn't performed this procedure before."
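If you'd rather not jot commands by hand, `script(1)` can do the raw capture for you. A minimal sketch, assuming a Linux box with util-linux's `script`; the notes path is just an example:

```shell
# Record raw procedure notes with script(1).
# Interactive use: `script -q "$NOTES"` opens a subshell and logs every
# command and its output until you type `exit`.
NOTES="/tmp/runbook-notes-$(date +%F).txt"

# Non-interactive demo: -c logs a single command instead of a subshell.
script -q -c 'uname -a' "$NOTES"

echo "raw notes captured in $NOTES"
```

The resulting file is messy (it includes terminal control output), but that's fine: messy is exactly what the prompt above is designed to clean up.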
The "junior sysadmin" framing is key. It forces the AI to spell out things you'd skip because they're obvious to you. Things like "connect to the VPN first" or "you'll need sudo access on this box." Those obvious-to-you steps are exactly what's missing from most runbooks.
I've generated about 35 runbooks this way. Average time: 15 minutes of note-taking during the procedure, plus 20 minutes editing the AI output. Compare that to the 2-3 hours it takes to write a good runbook from scratch. That's roughly a 75% time savings, and more importantly, these runbooks actually get created instead of living on a "someday" list forever.
Network Diagram Descriptions (Not Diagrams Themselves — Yet)
Let me be upfront: AI can't generate good network diagrams directly. I've tried. Every tool that claims to do this produces something that looks like a toddler attacked Visio. What AI can do is generate the structured description that makes creating the diagram much faster.
Feed your firewall rules, routing tables, VLAN configs, and subnet information into an AI prompt asking it to: "Analyze these network configurations and produce a structured description of the network topology, including: all subnets and their purposes, routing relationships between segments, firewall rules summarized by zone, and any anomalies or inconsistencies you notice."
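Collecting those inputs is scriptable. A rough sketch for a Linux host — the commands assume iproute2 and nftables (with an iptables fallback); substitute your vendor's `show` commands, and note the output path is just an example:

```shell
# Bundle network state into one file to paste into the prompt above.
OUT="/tmp/net-snapshot.txt"
{
  echo "## interfaces"
  ip -brief addr 2>/dev/null || echo "(ip not available)"
  echo "## routes"
  ip route 2>/dev/null || echo "(ip not available)"
  echo "## firewall"
  nft list ruleset 2>/dev/null || iptables-save 2>/dev/null \
    || echo "(no readable firewall rules)"
} > "$OUT"
echo "snapshot written to $OUT"
```

Run it on each router or host you care about, concatenate the snapshots, and paste the lot into one prompt.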
Take that structured output and use it as the blueprint for your diagram in draw.io, Lucidchart, or whatever you prefer. You'll build the diagram in half the time because the AI has already done the analysis work — figuring out which subnets talk to which, where the choke points are, and how traffic flows.
Bonus: the AI-generated description often catches problems you weren't looking for. I had it flag a subnet that had routing to the internet but no firewall rules — turned out to be a test VLAN from a project two years ago that nobody decommissioned. That alone justified the whole exercise.
Change Log Reconstruction: Documenting the Past
Here's a use case I didn't expect to work as well as it does. You have a server that's been modified by five different people over three years. Nobody documented anything. You need to understand what's been done to this thing.
Gather up whatever you can: shell history files, modification timestamps on config files, Git history if you're lucky enough to have configs in version control, backup diffs if you have those. Feed the whole mess into AI with: "Reconstruct a change log for this system based on the evidence provided. For each change, estimate when it happened, what was modified, and speculate on why based on the nature of the change. Flag any changes that look like they might have been emergency fixes or workarounds."
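Gathering that evidence is mostly a loop over timestamps and histories. A sketch, assuming GNU find and git on the box; the target directory and output path are examples:

```shell
TARGET="/etc"
OUT="/tmp/change-evidence.txt"
{
  echo "## config files modified in the last 3 years (newest first)"
  find "$TARGET" -maxdepth 2 -type f -newermt "3 years ago" \
       -printf '%TY-%Tm-%Td %p\n' 2>/dev/null | sort -r | head -n 50
  echo "## shell history (whatever is readable)"
  for h in /root/.bash_history /home/*/.bash_history; do
    [ -r "$h" ] && { echo "--- $h"; tail -n 200 "$h"; }
  done
  echo "## git log (if configs are under version control)"
  git -C "$TARGET" log --oneline -n 20 2>/dev/null || echo "(not a git repo)"
} > "$OUT"
echo "evidence bundle written to $OUT"
```

Add backup diffs or package manager logs to the bundle if you have them; more evidence means less speculation in the output.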
The output isn't perfect history — it's informed speculation. But informed speculation is infinitely better than "nobody knows what happened to this box." I've used this to reconstruct change histories for three legacy servers that were about to be migrated, and the reconstructed logs caught two configuration changes that would have broken things in the new environment.
Tool Comparison: What Actually Works
I've tested four AI tools for documentation work. Here's the honest breakdown.
Claude (Sonnet/Opus): Best for long-form documentation. The 200K context window means you can feed in massive config files and get coherent analysis. System prompt customization through Projects is great for maintaining consistent documentation style. Weakness: sometimes over-explains things.
ChatGPT (GPT-4o): Good all-rounder. Custom GPTs let you build a documentation assistant with your style guide baked in. Better at generating concise bullet-point procedures. Weakness: the context window is smaller, so it struggles with very large config files.
GitHub Copilot: Underrated for infrastructure-as-code documentation. If your configs are in VS Code, Copilot can generate inline comments and README files that are surprisingly accurate. Weakness: only useful if you're already in the code editor.
Ollama + Llama 3: If you can't send configs to a cloud provider (and honestly, for production network configs, maybe you shouldn't), running a local model works. Quality is maybe 70% of Claude for documentation tasks, but 70% quality with zero data exposure is the right trade-off for sensitive infrastructure docs. Weakness: slower, less capable with complex analysis.
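For the local route, the whole pipeline stays on the box. A sketch using Ollama's CLI — the model name `llama3` and the config path are examples, and the snippet falls back to a notice where Ollama isn't installed:

```shell
CFG="/etc/hostname"
DRAFT="/tmp/doc-draft.txt"
if command -v ollama >/dev/null 2>&1; then
  # Prompt passed as a single argument; the config is inlined into it.
  ollama run llama3 \
    "Document this config file for a junior sysadmin: $(cat "$CFG")" > "$DRAFT"
else
  echo "ollama not installed; pull a model first (see the Ollama docs)" > "$DRAFT"
fi
cat "$DRAFT"
```

Nothing in that pipeline touches the network except Ollama's own model download, which happens once, up front.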
The Maintenance Problem (And a Partial Solution)
Generating documentation is the easy part. Keeping it current is where everything falls apart. Three months after you write that beautiful runbook, the application gets upgraded and half the steps are wrong.
My partial solution: I have a quarterly "documentation audit" prompt. I feed in the existing runbook plus the current system configs and ask: "Compare this documentation against the current configuration. Identify any steps that appear outdated, any new components not covered in the documentation, and any procedures that may have changed." It catches about 60% of drift. The other 40% requires a human who knows the system to review and update.
I've also started adding a "Last AI-audited" date stamp to every document. When someone opens a runbook and sees it was last audited two weeks ago versus eight months ago, they know how much to trust it. Simple, but it changes behavior.
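The date stamp also makes staleness greppable. A sketch of a checker — the `Last AI-audited:` stamp format is just this article's convention, GNU date is assumed, and the demo creates two sample runbooks in a temp directory rather than touching real docs:

```shell
# Demo data (illustrative): one fresh runbook, one stale one.
DOCS="$(mktemp -d)"
printf 'Last AI-audited: %s\n\nRestart payment service...\n' "$(date +%F)" \
  > "$DOCS/fresh.txt"
printf 'Last AI-audited: 2023-01-15\n\nRotate TLS certs...\n' > "$DOCS/stale.txt"

CUTOFF="$(date -d '90 days ago' +%s)"   # GNU date; on macOS: date -v-90d +%s
REPORT="/tmp/stale-report.txt"
: > "$REPORT"
for f in "$DOCS"/*.txt; do
  # First ISO date in the file is taken as the audit stamp.
  stamp="$(grep -m1 -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}' "$f")"
  if [ -z "$stamp" ] || [ "$(date -d "$stamp" +%s)" -lt "$CUTOFF" ]; then
    echo "STALE: $f (audited: ${stamp:-never})" >> "$REPORT"
  fi
done
cat "$REPORT"
```

Wire something like that into cron and you get a monthly nag list of documents overdue for an audit, instead of relying on whoever happens to open the doc to notice.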
Getting Started Without Overcomplicating It
Don't try to document everything at once. Pick your top five undocumented procedures — the ones where if the person who knows them got hit by a bus, you'd be in real trouble. Generate runbooks for those first. Then do five more next month. In six months you'll have documentation coverage that would have taken two years of "I'll get to it eventually."
Keep the AI-generated docs in the same system as your other docs. Don't create a separate "AI docs" section — that's a fast track to them being ignored. And always have a human review before publishing. AI occasionally gets the order of steps wrong or misses a prerequisite that's obvious to anyone who's actually done the procedure. A 10-minute human review catches those issues and turns a good draft into a reliable document.