What was broken.
Engineering teams drowning in alerts. Median time-to-fix kept growing; postmortems written weeks late, missing detail. On-call engineers were burning out from interrupt-driven work.
An AI copilot for on-call engineers — triages alerts, runs runbooks, and drafts postmortems before the coffee finishes brewing.
Engineering teams drowning in alerts. Median time-to-fix kept growing; postmortems written weeks late, missing detail. On-call engineers were burning out from interrupt-driven work.
Designed a multi-agent system where each agent owns a domain — triage, diagnosis, comms, postmortem. Built a streaming UI that reveals each agent's reasoning and tool calls so engineers stay in the loop instead of trusting blindly. Every action is reversible with one keystroke.
Shipped to 40k+ engineers within six months. 62% of common incidents now resolve without human input. Median MTTR dropped from 47min to 12min. Postmortems are drafted automatically; engineers just edit and approve.