The full-session evaluation framing is the right call - most teams don't realize...

atarus · 2026-03-03T22:35:54

We track failure modes directly in production instead of relying on simulation. If a failure mode suddenly starts showing up too often, we can alert on it promptly. With an approach that goes from simulation to monitoring, I worry the feedback would be delayed.
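
For anyone wondering what this kind of tracking could look like, here is a minimal sketch, not our actual setup. It assumes each finished session is already labeled with zero or more failure-mode names; `FailureModeMonitor`, `window`, and `threshold` are hypothetical names made up for illustration.

```python
from collections import Counter, deque


class FailureModeMonitor:
    """Hypothetical helper: tracks failure-mode rates over the last `window` sessions."""

    def __init__(self, window=100, threshold=0.1):
        self.window = window        # number of recent sessions to keep (assumed value)
        self.threshold = threshold  # alert when a mode's rate exceeds this (assumed value)
        self.sessions = deque()
        self.counts = Counter()

    def record(self, failure_modes):
        """Record one finished session and return the modes whose rate is above the threshold."""
        modes = set(failure_modes)  # count each mode at most once per session
        self.sessions.append(modes)
        self.counts.update(modes)
        if len(self.sessions) > self.window:
            # Drop the oldest session so rates reflect only recent traffic.
            self.counts.subtract(self.sessions.popleft())
        total = len(self.sessions)
        return sorted(m for m, c in self.counts.items() if c / total > self.threshold)


monitor = FailureModeMonitor(window=4, threshold=0.5)
for labels in ([], ["loop"], ["loop"]):
    alerts = monitor.record(labels)
print(alerts)  # modes currently above the threshold
```

The sliding window is what keeps the feedback fast: a spike shows up as soon as enough recent sessions carry the same label, and it clears again once those sessions age out.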

Tracking in production also feeds back into simulation: you can replay those production conversations as test cases to check that you are not regressing.
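
A rough sketch of the replay idea, under stated assumptions: `replay_regressions`, `agent`, and `detect_failures` are hypothetical names, the agent is any callable taking the transcript so far plus the next user turn, and the recorded user turns are replayed verbatim (a simplification, since real users would react to the new replies).

```python
def replay_regressions(conversations, agent, detect_failures):
    """Replay recorded user turns through `agent`; return failure modes found per conversation."""
    regressions = {}
    for conv_id, user_turns in conversations.items():
        transcript = []
        for turn in user_turns:
            reply = agent(transcript, turn)   # agent sees the history and the new user turn
            transcript.append((turn, reply))
        failures = detect_failures(transcript)
        if failures:
            regressions[conv_id] = failures
    return regressions


# Toy stand-ins, purely illustrative.
def toy_agent(transcript, turn):
    return "" if "refund" in turn else "ok: " + turn


def toy_detector(transcript):
    return ["empty_reply"] if any(reply == "" for _, reply in transcript) else []


recorded = {"c1": ["hi", "I want a refund"], "c2": ["hello"]}
report = replay_regressions(recorded, toy_agent, toy_detector)
print(report)
```

Running the same detector on replays that you run in production means a failure mode seen live becomes a regression test without extra labeling work.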