Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

1 and 2 can be overridden via flag. 3 is practically the whole point of the software


With 3 I mean that in cases where there was an unambiguously correct way to recover from the situation, etcd did not automatically recover. My wrapper program would always recover from thise situations. (It's been a number of years and the exact details are hazy now, though.)


If the majority of quorum is truly down, then you’re down. That is by design. There’s no good way to recover from this without potentially losing state so the system correctly does nothing at this point. Sure you can force it into working state with external intervention but that’s up to you


Like I said I'm hazy on the details, this was a small thing I did a long time ago. But I do remember our on-call having to deal with a lot of manual repair of etcd quorum, and I noticed the runbook to fix it had no steps that needed any human decision making, so I made that wrapper program to automate the recovery. It wasn't complex either, IIRC it was about one or two pages of code, mostly logging.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: