I guess you assume to build one timeflake per ms for one year. That's about 3e13 timeflakes. Ordinary UUIDs have a collisions at the order of 1e18 hashes (cf. https://en.wikipedia.org/wiki/Universally_unique_identifier#...). So we are "only" 5 orders of magnitude away. For me the collision quality feels the same.
If you assume one per ms then timeflake and UUIDv1 both have a 0% chance of collision as they encode the timestamp.
You might go above this rate some of the time though and we might as well strive for perfection.
One of the nice things about UUIDs is that you can just merge data together willy-nilly without worrying about collisions - for example, if you gets bought by Google. You can't really do this with timeflake.
>> If you assume one per ms then timeflake and UUIDv1 both have a 0% chance of collision as they encode the timestamp.
They can be generated anywhere at any time. It two machines are generating even 1 per hour, they will have collisions if their clocks are synchronized. The random bits are important.
You'd run into a bigger problem than collisions a lot sooner (IMHO): time ordering. "ms" precision is very misleading, because keeping clocks really in sync while processing this much data is a real challenge.
Sure, maybe it doesn't matter that one server/emitter and thus a bunch of IDs are out of order, but if this is really just for helping indexes, then a simple 0.1 second precision would make more sense. (Or even simply truncating the timestamp to however many bits to get ~2-30 sec precision, and using a data store with LevelDB-like staged compaction. Which is virtually a requirement at this scale.)
Which is a pretty safe assumption, considering just generating
those 128bits 50 million times a millisecond results in over 20
million TB of data in a year. I think it's safe to say this is
not a common scenario.
There are close to 32 billion millisecond in a year. If the chance of a collision happening within a millisecond is 1 in a billion then my intuition is that the probability of a collision happening at least once a year is quite high, indeed.
It's like a dice: You have 1 in 6 chance of getting a 6 every time your roll the dice. Now, if you roll the dice 100 times the chance that you'll get a 6 at least once is then pretty high.
They correctly state you can generate 50 million of them per millisecond with only a one in a billion chance of a collision (per millisecond).
Unfortunately this means the chance of a collision in a year is 99.999999999998% which isn't great!
(I think they've given too many bits to the timestamp.)
Also, the uuid format it creates isn't a valid uuid which is a pity.
If you want to have "only" a one in a billion chance of a collision a year, you can only generate about 300 a millisecond.