/ 01 — The problem
What was broken.
LLM features were shipping with no regression safety. A prompt change at 4pm broke production at 5pm. No team had a clear way to say 'is this prompt still good?'
An internal tool that lets teams ship LLM features with the same confidence as normal code: behavioural specs, regression tests, and a 200ms feedback loop.
Designed a spec format that reads like English but compiles to a deterministic check. The spec runner caches results deterministically, so rerunning unchanged specs costs nothing. Built into the existing CI, so there are no new tools to learn.
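To make the idea concrete, here is a minimal sketch of how an English-like spec could compile to deterministic checks and how a content-hashed cache makes unchanged reruns free. It is an illustration under stated assumptions, not the tool's actual format or code: the rule phrasings, `compile_rule`, `cache_key`, and `SpecRunner` are all hypothetical names, and the model is stood in for by any callable that maps a prompt to a string.

```python
import hashlib
import re
from typing import Callable, Dict, List

# A compiled check is a pure function of the model output.
Check = Callable[[str], bool]


def compile_rule(rule: str) -> Check:
    """Turn one English-like rule (hypothetical grammar) into a predicate."""
    rule = rule.strip()
    m = re.fullmatch(r'must contain "(.+)"', rule)
    if m:
        needle = m.group(1).lower()
        return lambda out: needle in out.lower()
    m = re.fullmatch(r'must not contain "(.+)"', rule)
    if m:
        needle = m.group(1).lower()
        return lambda out: needle not in out.lower()
    m = re.fullmatch(r"must be at most (\d+) words", rule)
    if m:
        limit = int(m.group(1))
        return lambda out: len(out.split()) <= limit
    raise ValueError(f"unrecognised rule: {rule!r}")


def cache_key(prompt: str, rules: List[str]) -> str:
    """Hash the prompt and rules; any edit to either yields a new key."""
    payload = "\x00".join([prompt, *rules]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class SpecRunner:
    """Runs specs against a model callable, skipping unchanged ones."""

    def __init__(self, model: Callable[[str], str]) -> None:
        self.model = model              # assumed: prompt in, text out
        self.cache: Dict[str, bool] = {}
        self.model_calls = 0            # counts real (uncached) runs

    def run(self, prompt: str, rules: List[str]) -> bool:
        key = cache_key(prompt, rules)
        if key in self.cache:
            # Unchanged prompt and spec: reuse the stored verdict.
            return self.cache[key]
        checks = [compile_rule(r) for r in rules]
        self.model_calls += 1
        output = self.model(prompt)
        result = all(check(output) for check in checks)
        self.cache[key] = result
        return result
```

In this sketch, calling `run` twice with the same prompt and rules invokes the model only once; changing a single character in either produces a new hash and forces a fresh run. A real runner would also need to pin model version and sampling settings for the verdict to be reproducible, which is left out here.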
Adopted by every team shipping LLM features. 6 prompt regressions caught before deploy in the first month. Engineers describe it as 'finally feels like normal software'.