Thanks for the reply and clarification! That's impressive throughput. Am I interpreting the behavior correctly in thinking that your code has one branch per int, and that the branch will be predicted well when the integers are all of similar magnitude?
I never fully grokked the Zipf distribution, but am I correct to conclude that the worst case for vu128 would be alternating integers on either side of the branch condition?
> Am I interpreting the behavior correctly in thinking that your code has
> one branch per int, and that the branch will be predicted well when the
> integers are all of similar magnitude?
Yep, that's generally correct. In practice the branch seems to be mostly harmless either way -- interleaving values on either side of it doesn't seem to affect throughput much on my machine (results may vary).
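Roughly, the encode path has the shape below -- a simplified sketch, not the actual vu128 source; the length-prefix details are approximated from the description in the original post:

// Simplified sketch of the vu128-style branch structure (not the real
// implementation; the prefix byte layout here is only approximate).
pub fn encode_u64_sketch(buf: &mut [u8; 9], value: u64) -> usize {
    if value < 0xF0 {
        // Fast path: one predictable branch, one byte written.
        buf[0] = value as u8;
        1
    } else {
        // Slow path: compute the payload length once, then do a fixed-size
        // 8-byte write and report only the bytes that matter.
        let len = 8 - (value.leading_zeros() as usize / 8);
        buf[0] = 0xF0 | (len as u8);
        buf[1..9].copy_from_slice(&value.to_le_bytes());
        1 + len
    }
}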
> I never fully grokked the Zipf distribution, but am I correct to conclude
> that the worst case for vu128 would be alternating integers on either side
> of the branch condition?
It depends on whether you're measuring throughput by values/sec or bytes/sec.
In terms of bytes/sec, the worst case is a set of integers exclusively in the range [0xF0, 0xFF] -- big enough to skip the fast path, small enough that each byte has to go through the full bit-masking.
In terms of values/sec, the worst is really big integers because writing 8 bytes (or 16, for `encode_u128`) to a buffer takes longer than writing one byte.
However, regardless of how you measure, the worst case for vu128 is faster than VLQ/LEB128 -- except for distributions containing only values in [0x00, 0x7F], in which case they're all equivalent.
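For reference, VLQ/LEB128 encoders are built around a per-byte loop with a data-dependent exit, something like this generic sketch (not any particular library's code):

// Generic LEB128/VLQ unsigned encoder: one loop iteration (and one branch)
// per output byte, with the loop count depending on the value's magnitude.
pub fn encode_leb128_u64(buf: &mut Vec<u8>, mut value: u64) {
    loop {
        let byte = (value & 0x7F) as u8;
        value >>= 7;
        if value == 0 {
            buf.push(byte); // final byte: continuation bit clear
            break;
        }
        buf.push(byte | 0x80); // continuation bit set, more bytes follow
    }
}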
---
Also, after some comments on Lobsters about cases where space efficiency in the u16 range really matters, I'm experimenting with a version that tries to match or exceed VLQ/LEB128 for size efficiency at all bit lengths.
It has three more branches, but in practice that hasn't turned out to matter much -- indefinite loops seem to be slower than a fixed set of branches, so it's still significantly faster than VLQ/LEB128 on most sets of integers.
// like LEB128 but continuation bits are packed into the
// leading byte -- sometimes called "prefixed varint"
0xxxxxxx // 7 bits of payload
10xxxxxx xxxxxxxx // 14 bits of payload
110xxxxx xxxxxxxx xxxxxxxx // 21 bits of payload
1110xxxx xxxxxxxx xxxxxxxx xxxxxxxx // 28 bits of payload
// For payloads > 28 bits, use (0xF0 | length) type prefix,
// same as original post.
1111____ xxxxxxxx xxxxxxxx [...]
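In code, those branches look roughly like this -- a simplified sketch of the 32-bit cases; the payload byte order shown is one reasonable packing and may not match the experimental version exactly:

// Sketch of the prefixed-varint branches for payloads up to 28 bits.
// Payload bits are packed big-endian after the tag bits in the leading byte;
// the actual experimental encoding may order bytes differently.
pub fn encode_prefixed_u32(buf: &mut [u8; 5], value: u32) -> usize {
    if value < (1 << 7) {
        buf[0] = value as u8;                 // 0xxxxxxx
        1
    } else if value < (1 << 14) {
        buf[0] = 0x80 | (value >> 8) as u8;   // 10xxxxxx
        buf[1] = value as u8;
        2
    } else if value < (1 << 21) {
        buf[0] = 0xC0 | (value >> 16) as u8;  // 110xxxxx
        buf[1] = (value >> 8) as u8;
        buf[2] = value as u8;
        3
    } else if value < (1 << 28) {
        buf[0] = 0xE0 | (value >> 24) as u8;  // 1110xxxx
        buf[1] = (value >> 16) as u8;
        buf[2] = (value >> 8) as u8;
        buf[3] = value as u8;
        4
    } else {
        // > 28 bits of payload: fall back to the (0xF0 | length) prefix.
        buf[0] = 0xF0 | 4;
        buf[1..5].copy_from_slice(&value.to_le_bytes());
        5
    }
}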
Thanks for the link to the slides, also clever!