Hacker News | da_chicken's comments

Nah, it's much easier than that.

The total amount of computer data across all of humanity is less than 1 yottabyte. We're expected to cross 1 yottabyte within the next decade, and will probably do so before 2030. That's all data, everywhere, including nation-states.

The birthday paradox says that you reach a 50% chance of at least one collision (as a conservative first-order approximation) at roughly the square root of the domain size. sqrt(2^256) is 2^128.

Now, a 256-bit identifier takes up 32 bytes of storage. 2^128 * 32 bytes ≈ 10^16 yottabytes. That's 10 quadrillion yottabytes just to store the keys. And even then it's only even odds whether you'll have a collision or not.

And if the 50% number scares them, well, you reach a 1% chance of a collision at around 2^128 * 0.1 keys. So you don't hit a 1% chance over the whole life of the system until you've stored about a quadrillion yottabytes of keys.

Because you're never getting anywhere near the square root of the size, the chances of any collision occurring are flatly astronomical.
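As a sanity check, the standard birthday-bound approximation n ≈ sqrt(2·d·ln(1/(1−p))) reproduces those numbers. A quick Python sketch (mine, not from the comment above):

```python
import math

def keys_for_collision_probability(bits: int, p: float) -> float:
    """Approximate number of random keys drawn from a 2**bits space
    at which the probability of at least one collision reaches p
    (birthday-bound approximation)."""
    d = 2.0 ** bits
    return math.sqrt(2.0 * d * math.log(1.0 / (1.0 - p)))

YOTTABYTE = 10.0 ** 24  # bytes

for p in (0.5, 0.01):
    n = keys_for_collision_probability(256, p)
    storage_yb = n * 32 / YOTTABYTE  # 32 bytes per 256-bit key
    print(f"p={p}: ~2^{math.log2(n):.1f} keys, ~{storage_yb:.2e} YB of keys")
```

For p = 0.5 this gives about 2^128.2 keys (~10^16 yottabytes of key storage), and for p = 0.01 about 0.14 * 2^128 keys, i.e. the "quadrillion yottabytes" figure above.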


That's quite expensive. Most systems that need this sort of data will instead implement some form of audit log or audit table. Which is still quite expensive.

At the record level, I've seldom seen more than an add timestamp, an add user id, a last-change timestamp, and a last-change user id. Even then, that covers any change to the whole row, not every field. It's still relatively expensive.


> Which is still quite expensive.

OTOH, if they managed to do that in an efficient way, they have something really interesting.


Writing to disk is never free.

True, but you do it with blocks that often contain padding. If you can make the padding useful, that's a win.

Yeah, this seemed like a very long way to say, "Our RDBMS has system catalogs," as if it's 1987.

But then, they're also doing JOINs with the USING clause, which seems like one of those things that everybody tries... until they hit one of the several reasons not to use it, and then they go back to the ON clause, which is explicit and concrete and works great in all cases.

Personally, I'd like to hear more about the claims made about Snowflake IDs.


> doing JOINs with the USING clause

I'm ashamed to say that despite using SQL since the late 1980s, and being someone who likes reading manuals and textbooks, I'd never come across USING. Probably a bit late for me now to use it (or not) :-(


I didn't really write USING in anger until around 10 years ago, and I have been around a long time too.

Not all databases support it. But once you start using it (pun intended), a lot of naming conventions snap into place.

It has some funky semantics you should be aware of. Consider this:

  CREATE TABLE foo (x INT);

  CREATE TABLE bar (x INT);

  SELECT * FROM foo JOIN bar USING (x);


There is only one `x` in the above `SELECT *` - the automatically disambiguated one. Which is typically what you want.

I've used SQL for around a decade and also never came across it. I'm maintaining SQL code with hundreds if not thousands of basic primary key joins and this could make those queries way more concise. Now I want to know the reasons for not using USING!

There are reasons for not USING.

First, you need to be aware of the implicit disambiguation. When you join with USING, you are introducing a hidden column that represents both sides. This is typically what you want - but it can bite you.

Consider this PostgreSQL example:

  CREATE TABLE foo (x INT);
  INSERT INTO foo VALUES (1);

  CREATE TABLE bar (x FLOAT);
  INSERT INTO bar VALUES (1);

  SELECT pg_typeof(x) FROM foo JOIN bar USING (x);

The type of x is double precision, because x was implicitly upcast, as we can see with EXPLAIN:

  Merge Join  (cost=338.29..931.54 rows=28815 width=4)
    Merge Cond: (bar.x = ((foo.x)::double precision))

Arguably, you should never be joining on keys of different types. It's just bad design. But you don't always get that choice if someone else made the data model for you.

It also means that this actually works:

  CREATE TABLE foo (x INT);
  INSERT INTO foo VALUES (1);

  CREATE TABLE bar (x INT);
  INSERT INTO bar VALUES (1);

  CREATE TABLE baz (x INT);
  INSERT INTO baz VALUES (1);

  SELECT *
  FROM foo
  JOIN bar USING (x)
  JOIN baz USING (x);

Which might not be what you expected :-)

If you are both the data modeller and the query writer - I have not been able to come up with a reason for not USING.


Thanks for the reply. The use case I have in mind is joining onto an INT primary key using a foreign key column of another table. This alone would remove a massive amount of boilerplate code.

@da_chicken: You can read more about Snowflake ID in the Wiki page linked in the article.

The short story:

They are a bit like UUIDs in that you can generate them across a system in a distributed way without coordination. Unlike UUIDs, they are only 64-bit.

The first bits of the snowflake ID are structured in such a way that the values end up roughly sequentially ordered on disk. That makes them great for large tables where you need to locate specific values (such as those that store query information).
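For illustration, the classic Twitter Snowflake layout (1 unused sign bit, 41-bit millisecond timestamp, 10-bit machine id, 12-bit sequence) can be sketched like this. Whether the IDs discussed in the article use exactly these widths is an assumption on my part:

```python
# Classic Twitter Snowflake bit layout (assumed here for illustration):
# | 1 unused sign bit | 41-bit ms timestamp since a custom epoch
# | 10-bit machine id | 12-bit per-millisecond sequence |
EPOCH_MS = 1288834974657  # Twitter's custom epoch (2010-11-04)

def make_snowflake(timestamp_ms: int, machine_id: int, sequence: int) -> int:
    assert 0 <= machine_id < 1024 and 0 <= sequence < 4096
    return ((timestamp_ms - EPOCH_MS) << 22) | (machine_id << 12) | sequence

# Because the timestamp occupies the high bits, an ID generated later
# always compares greater, so a B-tree index keeps them in rough
# arrival order regardless of which machine generated them.
a = make_snowflake(EPOCH_MS + 1000, machine_id=1, sequence=0)
b = make_snowflake(EPOCH_MS + 2000, machine_id=900, sequence=7)
assert a < b
```

The timestamp-in-the-high-bits trick is the whole reason they cluster well on disk.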


No, that's like calling Amazon a social media platform.

YouTube is a content delivery platform that has social media features. You can tell because if you shut off all the comments, people still visit the site in droves. But if you shut off the videos and left the comments then nobody would visit the site at all.

Now, it's possible that YouTube doesn't realize that, but I think they're just unwilling to make any changes at all if it doesn't give them any competitive advantages.


YouTube used to have direct messages until 2019.

Reimplemented in Nov 2025 for Ireland and Poland.


Yep. Gabe was right when he said it, and he's still right now. Valve knows the product is the service. This is why the Epic Games Store and the Microsoft Store have such a hard time. Good games come and go, but good service is good service.

And now Valve is pushing to leave Windows, because they see which way the wind blows in Redmond. They don't want to be leashed to Microsoft in 2026 any more than Microsoft wanted to be leashed to IBM in 1986.


Yeah, that already exists.

Protocol multiplexing/demultiplexing is a feature of software like sslh, nginx, and HAProxy, and they don't need to listen on multiple ports to speak multiple protocols or connect multiple services. Many advanced reverse proxies can do this with stream sniffing of some flavor.

People already do actually run everything through port 443 simultaneously.


Protocol multiplexing exists. But you will have to agree on a single protocol, which I view as impossible since different applications have different requirements.

If you route all your traffic through HTTPS, that comes with all the upsides, for example the security layer (TLS), but also downsides, for example the overhead of its headers. Currently we have an overarching network-layer protocol, IP, which delivers traffic to the host, and transport protocols like TCP and UDP that divide it into ports; these ports speak different protocols. If you move the multiplexing higher up the OSI stack, you are violating the principle of separation and making your stack less flexible: you are mixing OSI layers 4 (Transport) up through 6 (Presentation). Conflating these layers can lead to big problems, since they include things like the Transport layer, where, for example, the difference between UDP and TCP lives.

The beauty of the network stack is that there are certain layers that separate responsibility. This allows the stack to apply to wildly different scenarios. However, I do agree that there should be no filtering applied on behalf of the customers.


> Protocol multiplexing exists. But you will have to agree on a single protocol

I may be misunderstanding your message here, but the requirement to agree on a single protocol isn't true when you're using multiplexing. I think you're confusing tunneling with multiplexing.

With multiplexing, you have multiple protocols listening on a single port. The multiplexer server sniffs the first few bytes of what the client sends to determine what protocol is being used, then decides which back-end to forward the connection to.

Neither the client nor the final back-end need to be aware that multiplexing is happening, and likely aren't.

Through this, you can use both HTTPS and Telnet on port 443 without the Telnet client needing to have any changes done.
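A toy sketch of that sniffing step (my own illustration, not sslh's actual code; the byte patterns for TLS and HTTP are the well-known ones):

```python
def classify(first_bytes: bytes) -> str:
    """Guess the protocol from the first bytes a client sends,
    the way sslh/HAProxy-style demultiplexers do."""
    # A TLS record starts with content type 0x16 (handshake)
    # followed by a 0x03 0x0X protocol version.
    if len(first_bytes) >= 3 and first_bytes[0] == 0x16 and first_bytes[1] == 0x03:
        return "tls"    # forward to the HTTPS back-end
    # Plain HTTP starts with an ASCII method name.
    if first_bytes[:4] in (b"GET ", b"POST", b"HEAD", b"PUT "):
        return "http"
    return "other"      # e.g. Telnet: forward to a default back-end

assert classify(b"\x16\x03\x01\x02\x00") == "tls"
assert classify(b"GET / HTTP/1.1\r\n") == "http"
assert classify(b"hello\r\n") == "other"
```

One wrinkle: some protocols expect the server to talk first (a Telnet server usually sends its banner before the client types anything), so real demultiplexers like sslh also fall back to a default back-end if the client sends nothing within a timeout.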


Boy, wait until I tell you what happened with http!

That's my question. Why is there infrastructure with open access to port 23 on the Internet? That shouldn't be a problem the service provider has to solve, but it should absolutely be illegal for whoever is in charge of managing the service or providing equipment to the people managing it. That is like selling a car without seatbelts.

We are past the point where not putting infrastructure equipment behind a firewall should merely result in a fine. It's beyond the point where this is only negligence.


I've lived in Michigan for about the same length of time, and even with the terrible service our current power companies are providing the only time I've lost power for more than a few minutes during the winter has been after an ice storm.

Edit has not shipped with Windows since Win9x -- which is to say, it hasn't shipped since Windows required DOS -- and what you linked is an homage project, not the same program.

And you know what else was included in versions of MS-DOS that shipped with Win9x? Edlin. The editor so basic most people can't figure out how to use it.


Edit, what I linked, is shipping with Windows right now. This HN thread is about the current version of Windows. We are not talking about Windows XP.
