I did something similar. Suppose there are 5 services that can each fail independently (unlikely in practice, but let's assume), each with 99% uptime. Then P(all up at the same time) = 0.99^5 ≈ 95%, so the chance that at least one is down at any given moment is 1 - 0.99^5 ≈ 5%. That's roughly 5x the original 1% downtime. And with hundreds of microservices in the overall architecture, many of them indirectly connected to each other, I think this number could go much higher.
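A quick back-of-the-envelope sketch of that compounding, in Python. The 99% uptime and the service counts are just illustrative assumptions, and it treats failures as independent, which is optimistic since correlated failures tend to make things worse:

```python
# Availability of a chain of services that must all be up at once.
# Assumes identical per-service uptime and independent failures.

def chain_availability(per_service_uptime: float, num_services: int) -> float:
    """P(all services up at the same time)."""
    return per_service_uptime ** num_services

for n in (1, 5, 20, 100):
    up = chain_availability(0.99, n)
    print(f"{n:>3} services: {up:6.2%} all up, {1 - up:6.2%} at least one down")
```

With these assumed numbers, 5 services give you about 4.9% "something is down", 20 services about 18%, and at 100 services you're below 40% all-up.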
Moreover, at least where I work, the failure rate is clearly higher than 5%. But with a cottage industry of observability tools, cloud-native solutions, blah blah, pointing out basic maths to people in responsible positions is a sure-fire way to get fired. I'm already being marked as someone opposed to progress, so I can basically take my statistics and shove them. There are a million times more data points about the reliability of microservices, and they can't all be wrong.