Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I'd worry a bit about the chance of collisions.

They correctly state you can generate 50 million of them per millisecond with only a one in a billion chance of a collision (per millisecond).

Unfortunately this means the chance of a collision in a year is 99.999999999998% which isn't great!

(I think they've given too many bits to the timestamp.)

Also, the uuid format it creates isn't a valid uuid which is a pity.

If you want to have "only" a one in a billion chance of a collision a year, you can only generate about 300 a millisecond.



Look for a different solution if you need 1 quintillion IDs per year. Noted.


I guess you assume to build one timeflake per ms for one year. That's about 3e13 timeflakes. Ordinary UUIDs have a collisions at the order of 1e18 hashes (cf. https://en.wikipedia.org/wiki/Universally_unique_identifier#...). So we are "only" 5 orders of magnitude away. For me the collision quality feels the same.


If you assume one per ms then timeflake and UUIDv1 both have a 0% chance of collision as they encode the timestamp.

You might go above this rate some of the time though and we might as well strive for perfection.

One of the nice things about UUIDs is that you can just merge data together willy-nilly without worrying about collisions - for example, if you gets bought by Google. You can't really do this with timeflake.


>> If you assume one per ms then timeflake and UUIDv1 both have a 0% chance of collision as they encode the timestamp.

They can be generated anywhere at any time. It two machines are generating even 1 per hour, they will have collisions if their clocks are synchronized. The random bits are important.


You'd run into a bigger problem than collisions a lot sooner (IMHO): time ordering. "ms" precision is very misleading, because keeping clocks really in sync while processing this much data is a real challenge.

Sure, maybe it doesn't matter that one server/emitter and thus a bunch of IDs are out of order, but if this is really just for helping indexes, then a simple 0.1 second precision would make more sense. (Or even simply truncating the timestamp to however many bits to get ~2-30 sec precision, and using a data store with LevelDB-like staged compaction. Which is virtually a requirement at this scale.)


> Unfortunately this means the chance of a collision in a year is 99.999999999998% which isn't great!

How was this computed? My intuition begs to differ.


Probability of no collision in a single millisecond: p_1 = 999999999/1000000000

Milliseconds in a year: t = 1000 * 60 * 60 * 24 * 365

Probability of no collision happening for every millisecond of a year: p_2 = p_1^t

Probability of a collision happening for at least one millisecond of a year: p_3 = 1 - p_2

https://www.wolframalpha.com/input/?i=1+-+%28999999999%2F100...


There are 31,536,000,000 milliseconds in a year. If you do (1-1E-9)^3.1536E10, you get a 2E-14 chance of no collision in a year.


I think the presumption is you will not be generating 50 million of them each millisecond.


Which is a pretty safe assumption, considering just generating those 128bits 50 million times a millisecond results in over 20 million TB of data in a year. I think it's safe to say this is not a common scenario.


There are close to 32 billion millisecond in a year. If the chance of a collision happening within a millisecond is 1 in a billion then my intuition is that the probability of a collision happening at least once a year is quite high, indeed.

It's like a dice: You have 1 in 6 chance of getting a 6 every time your roll the dice. Now, if you roll the dice 100 times the chance that you'll get a 6 at least once is then pretty high.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: