It's fast, but doing that on both sides before adding looked a bit inelegant to me, and I figured it could maybe be avoided by doing "something something bit operations". Then I dropped the thought and clicked the link.
On a modern architecture, given that most integers are u32 by default but the underlying CPU handles 64 bits natively, I'd just cast to u64 and call it a day.
Actually, I was curious to see whether GCC would be smart enough to automatically pick the best implementation for the underlying architecture, but that doesn't appear to be the case.
There's a lot more register spilling in the 64-bit version, since GCC decides to do a true 64-bit add using two registers and an adc.
My code, for reference:
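#include <stdint.h>

/* Widen to 64 bits so the sum cannot overflow, then halve. */
uint32_t avg_64bits(uint32_t a, uint32_t b) {
    uint64_t la = a;
    uint64_t lb = b;
    return (la + lb) / 2;
}

/* Halve each operand first, then add back the 1 that is lost when both are odd. */
uint32_t avg_patented_do_not_steal(uint32_t a, uint32_t b) {
    return (a / 2) + (b / 2) + (a & b & 1);
}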
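
For what it's worth, the "something something bit operations" idea can be made concrete. This is only a sketch of one well-known trick, not what the linked article or GCC does: since a + b == 2*(a & b) + (a ^ b), the rounded-down average is (a & b) + ((a ^ b) >> 1), which never overflows 32 bits. The names avg_bitwise, avg_wide and check_avg_bitwise are made up for illustration, and the sample values are just edge cases I picked.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name. Shared bits (a & b) count fully; differing bits
 * (a ^ b) count half. The result never exceeds max(a, b). */
uint32_t avg_bitwise(uint32_t a, uint32_t b) {
    return (a & b) + ((a ^ b) >> 1);
}

/* Reference: the widening approach, same idea as avg_64bits above. */
uint32_t avg_wide(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a + b) / 2);
}

/* Spot-check that both versions agree on a handful of edge cases. */
void check_avg_bitwise(void) {
    static const uint32_t samples[] = {
        0u, 1u, 2u, 3u, 0x7fffffffu, 0x80000000u, 0xfffffffeu, 0xffffffffu
    };
    const unsigned n = sizeof samples / sizeof samples[0];
    for (unsigned i = 0; i < n; i++) {
        for (unsigned j = 0; j < n; j++) {
            assert(avg_bitwise(samples[i], samples[j]) ==
                   avg_wide(samples[i], samples[j]));
        }
    }
}
```

Calling check_avg_bitwise() from a main() should pass silently. Whether this compiles to anything better than the other two versions on a given target is something I haven't measured.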