> Is 75 minutes really considered that long of a time? [...] When I worked on firmware we frequently spent _weeks_ trying to diagnose what part of the firmware was broken.
One might spend weeks diagnosing a problem if it only happens 0.01% of the time, correlates with nothing, goes away when retried, and can't be reproduced in a test environment.
But 0.01%-and-it-goes-away-when-retried does not make a high-priority incident. High-priority incidents tend to be repeatable problems that weren't there an hour ago.
Generally, a well-designed, properly resourced, business-critical system will be simple enough and well enough monitored that problems can be diagnosed in a good deal less than 75 minutes, even if rolling out a full fix takes longer.
Of course, I don't know how common well-designed, properly resourced, business-critical systems are.