
Incident Response Runbook Generator


Why this prompt matters

During an incident, engineers need step-by-step procedures they can execute immediately, not documentation to read. Runbooks that spell out the exact commands to run help reduce mean time to recovery (MTTR).

What we use it for

Operations

Prompt

Create an incident response runbook for the described system. Include: 1) Detection criteria (what alerts trigger this), 2) Severity assessment matrix, 3) Step-by-step mitigation procedures, 4) Communication templates for stakeholders, 5) Post-incident review checklist, 6) Preventive measures to implement after resolution.

Result

## Runbook: Database Connection Pool Exhaustion

### Detection

- Alert: `db_active_connections > 90% of pool_size` for 5 minutes
- Symptom: API requests returning 503 with increasing latency
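The alert condition above can be sketched as a small check over connection-count samples. This is a minimal illustration, not the rule syntax of any particular monitoring system: the function name `pool_alert_firing`, the sample format, and the choice to evaluate the condition at the most recent sample are all assumptions.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

# Hypothetical sample format: (timestamp, active connection count).
Sample = Tuple[datetime, int]


def pool_alert_firing(
    samples: Iterable[Sample],
    pool_size: int,
    threshold: float = 0.90,
    window: timedelta = timedelta(minutes=5),
) -> bool:
    """Return True if usage has stayed above threshold * pool_size
    for at least `window`, measured up to the most recent sample."""
    limit = threshold * pool_size
    breach_start: Optional[datetime] = None
    last_ts: Optional[datetime] = None
    for ts, active in sorted(samples):
        last_ts = ts
        if active > limit:
            # Remember when the current run of breaches began.
            if breach_start is None:
                breach_start = ts
        else:
            # Any sample at or below the limit resets the run.
            breach_start = None
    return breach_start is not None and last_ts - breach_start >= window
```

A single sample at or below the limit resets the timer, which matches the "for 5 minutes" wording: the alert fires only on a sustained breach, not on a brief spike.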

### Severity Assessment

| Metric | Sev 1 | Sev 2 | Sev 3 |
|--------|-------|-------|-------|
| Error rate | >10% | 5-10% | <5% |
| Affected users | All | Subset | Internal |
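The severity matrix can be read as a lookup that is straightforward to encode. The sketch below assumes that when the two metrics disagree, the more severe level wins. The matrix itself does not say how to combine them, so that rule, along with the names `assess_severity` and `USER_SCOPE_SEVERITY`, is hypothetical.

```python
# Error rates are fractions (0.07 means 7%). Scope labels mirror the matrix.
USER_SCOPE_SEVERITY = {"all": 1, "subset": 2, "internal": 3}


def severity_from_error_rate(error_rate: float) -> int:
    """Map an error rate to Sev 1-3 using the matrix thresholds."""
    if error_rate > 0.10:
        return 1
    if error_rate >= 0.05:
        return 2  # the 5-10% band, inclusive of both ends
    return 3


def assess_severity(error_rate: float, user_scope: str) -> int:
    """Combine both metrics. Assumption: the more severe (lower) level wins."""
    return min(severity_from_error_rate(error_rate),
               USER_SCOPE_SEVERITY[user_scope])
```

For example, `assess_severity(0.01, "all")` yields Sev 1 under this rule: the error rate is low, but every user is affected.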

### Mitigation Steps

1. Verify the current connection count: `SELECT count(*) FROM pg_stat_activity;`
2. Identify long-running queries: `SELECT pid, query, now() - query_start AS duration FROM pg_stat_activity WHERE state = 'active' ORDER BY duration DESC LIMIT 10;`
3. Kill stuck queries, substituting each `pid` identified in step 2: `SELECT pg_terminate_backend(<pid>);`
4. If the problem persists: scale the connection pool via the `DATABASE_POOL_SIZE` env var
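Steps 2 and 3 involve a judgment call about which queries count as "stuck". One way to make that explicit is a helper that takes the rows from the long-running-query check and produces termination statements for an engineer to review before running. This is a sketch under stated assumptions: the `ActiveQuery` type, the `termination_statements` function, and the 5-minute cutoff are hypothetical and not part of the runbook.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class ActiveQuery:
    """One row from the long-running-query check (hypothetical shape)."""
    pid: int
    query: str
    duration: timedelta


def termination_statements(
    rows: Iterable[ActiveQuery],
    max_duration: timedelta = timedelta(minutes=5),  # assumed cutoff
    limit: int = 10,
) -> List[str]:
    """Build pg_terminate_backend statements for queries running longer
    than max_duration, longest first. Nothing is executed here."""
    stuck = sorted(
        (r for r in rows if r.duration > max_duration),
        key=lambda r: r.duration,
        reverse=True,
    )
    # pid is an int, so formatting it into the statement is safe.
    return [f"SELECT pg_terminate_backend({r.pid});" for r in stuck[:limit]]
```

Returning statements rather than executing them keeps a human in the loop, which is usually desirable when terminating backends on a production database.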


Because the prompt names six required sections, the runbooks it generates share a consistent structure that teams can review and refine over time.
