
I don't think NoSQL people usually claim that SQL isn't scalable, just that it's unnecessarily complicated to scale.

You generally have to partition your data horizontally and thus give up many of the features that SQL has to offer: ACID transactions, unique keys, auto-increment primary keys, etc.

Then you have to come up with your own solutions to replace those features: eventual consistency, UUID keys, map/reduce, etc. And these happen to be exactly the kind of features that many NoSQL databases can give you out-of-the-box.
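
To make the key problem concrete, here is a minimal sketch of hash-based horizontal partitioning, assuming a hypothetical four-shard setup where each shard is just an in-memory dict. The names `new_key`, `shard_for`, `put`, and `get` are illustrative and not taken from any particular database. UUID keys stand in for auto-increment IDs because they can be generated without asking another shard for the next number.

```python
import hashlib
import uuid

NUM_SHARDS = 4  # hypothetical cluster size, chosen only for illustration

# Each "shard" is a plain dict standing in for a separate database server.
shards = [dict() for _ in range(NUM_SHARDS)]


def new_key() -> str:
    """Generate a primary key without coordinating with any other shard."""
    return str(uuid.uuid4())


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard with a stable hash.

    Python's built-in hash() is salted per process, so a cryptographic
    digest is used to keep the mapping stable across restarts.
    """
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def put(key: str, row: dict) -> None:
    """Store a row on whichever shard owns the key."""
    shards[shard_for(key)][key] = row


def get(key: str):
    """Fetch a row from the shard that owns the key, or None if absent."""
    return shards[shard_for(key)].get(key)
```

Note what is missing: a unique constraint on any column other than the key, or a transaction touching two keys, would have to span shards, which is exactly the feature loss described above.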



> You generally have to partition your data horizontally and thus give up many of the features that SQL has to offer

There are plenty of databases that will partition data without giving up any SQL features, but they cost money.


> There are plenty of databases that will partition data without giving up any SQL features, but they cost money.

They also either rely on a single huge SAN for storage (single point of failure + expensive as hell) like Oracle RAC, or they require specialized gear like infiniband to reduce intra-node latency like Exadata (starting price: seven figures) or they're analytics databases that are designed for huge queries with latencies to match like Vertica, ParAccel, etc. (Think minutes between data being loaded and being available to query.)

I'll take NoSQL, thanks.


AFAIK Exadata wasn't even originally meant for OLTP. Seems like another case of a high-latency analytics/warehousing system being marketed as a "distributed database". They're now claiming that they can get OLTP-grade performance with SSDs on Exadata, but I don't buy it. The promotion of Exadata is ironic, given that I remember one of their engineers claiming (on his personal weblog) not too long ago that OLTP on top of shared-nothing was impossible.

As for the high-latency analytics databases (Vertica, Greenplum et al), I don't see much market for them either. Their big advantage over Hadoop was claimed to be the ability of non-programmer analysts to use them (via SQL), but Hive (which now even has JDBC drivers for it, allowing it to work with existing OLAP tools) solves that problem as well.


Why would the need for "specialized gear like infiniband to reduce intra-node latency" be limited to parallel databases? (I assume you mean "inter-node"...)


This whole discussion is about parallel databases since that's the only way to scale beyond the performance of one machine.


Well, replace "parallel databases" with your favorite term for the parallel databases that fall outside NoSQL (VoltDB, Exadata, sharded MySQL, etc). My point is that the alleged need for high-speed interconnects is orthogonal to SQL vs. NoSQL.


But it's not, because SQL databases (strictly speaking, any database requiring strong consistency, which mostly means RDBMSes) are highly latency-sensitive, whereas NoSQL databases like Cassandra design around that by saying "hey, you might not see the most recent write for a few ms, unless you request a higher consistency level." And most apps are fine with that. As a bonus you get multi-datacenter replication with basically the same code, another place most RDBMSes are weak.

It's a classic design hack -- redefining your goal as an easier problem.
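
A rough sketch of the tradeoff, assuming a toy cluster of N replicas where a write is acknowledged by W of them and a read consults R of them. This is not Cassandra's API; `Cluster`, `Replica`, `w`, and `r` are hypothetical names. To show the worst case deterministically, writes land on the first W replicas, reads ask the last R, and the remaining replicas never catch up (a real system would propagate the write to them in the background).

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    # key -> (version, value); version 0 means "never written"
    data: dict = field(default_factory=dict)


class Cluster:
    def __init__(self, n: int):
        self.replicas = [Replica() for _ in range(n)]
        self.version = 0  # simplistic global version counter

    def write(self, key, value, w: int) -> None:
        """Acknowledge the write once the first w replicas have it."""
        self.version += 1
        for replica in self.replicas[:w]:
            replica.data[key] = (self.version, value)

    def read(self, key, r: int):
        """Ask the last r replicas and return the newest value seen."""
        answers = [rep.data.get(key, (0, None)) for rep in self.replicas[-r:]]
        return max(answers, key=lambda a: a[0])[1]


strong = Cluster(3)
strong.write("k", "v1", w=2)
print(strong.read("k", r=2))  # R + W > N: the read set overlaps the write set

weak = Cluster(3)
weak.write("k", "v1", w=1)
print(weak.read("k", r=1))    # R + W <= N: the read can miss the write
```

With R + W > N the read and write sets must share at least one replica, so the read sees the latest write at the cost of waiting on more nodes. With lower levels the request returns sooner but may be stale.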


Oh, I see what you're saying. Yes, the interconnect is orthogonal (although you could argue that strong consistency requires more complex protocols like 2PC so interconnect latency becomes critical).
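
To illustrate why interconnect latency matters for 2PC, here is a minimal sketch assuming a coordinator that contacts all participants in parallel and waits for the slowest one in each phase. `Participant` and `two_phase_commit` are hypothetical names, the round-trip times are made-up inputs, and disk flushes and failure recovery are ignored.

```python
class Participant:
    def __init__(self, rtt_ms: float, will_vote_yes: bool = True):
        self.rtt_ms = rtt_ms                # round trip to the coordinator
        self.will_vote_yes = will_vote_yes  # whether prepare succeeds here
        self.state = "init"

    def prepare(self) -> bool:
        """Phase 1: vote on whether this node can commit."""
        self.state = "prepared" if self.will_vote_yes else "aborted"
        return self.will_vote_yes

    def finish(self, commit: bool) -> None:
        """Phase 2: apply the coordinator's decision."""
        self.state = "committed" if commit else "aborted"


def two_phase_commit(participants):
    """Run both phases; return (committed, estimated latency in ms)."""
    slowest = max(p.rtt_ms for p in participants)
    # Phase 1: collect votes from every participant.
    votes = [p.prepare() for p in participants]
    commit = all(votes)
    # Phase 2: broadcast the decision to every participant.
    for p in participants:
        p.finish(commit)
    # Each phase costs one round trip to the slowest participant.
    return commit, 2 * slowest
```

Under these assumptions every transaction pays at least two round trips to the slowest node, so going from a 0.1 ms interconnect to a 50 ms cross-datacenter link moves the floor from about 0.2 ms to about 100 ms per commit.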


Consistency, Availability, Partition tolerance. NoSQL stores usually sacrifice consistency, and instead settle for eventual consistency. SQL (i.e. RDBMS) stores, with their usual emphasis on transactions, must necessarily sacrifice something else. SQL that doesn't hew to a hard line on consistency and transactions doesn't really have all the features of SQL. This is the distinction that matters most, IMHO, in the NoSQL strand of thinking.


I admit that the SQL vendors (besides MySQL) made a mistake by putting ACID above scalability; that's clearly not always the right choice. However, CAP still allows a SQL database that is scalable, consistent, and available.


No, that's exactly what CAP doesn't allow. Unless by scalable you mean non-horizontal scaling. In which case yes, but we already knew that big machines make things fast.


> CAP still allows a SQL database that is scalable, consistent, and available

Name one please. It seems you are either fundamentally mistaken about what CAP implies or are constraining the "solution" to a clustered system that is effectively a single RDBMS hiding behind lots of tightly-coupled components.


A tightly-coupled (whatever that means) cluster sounds like a perfectly legitimate way to scale to me.


And what if that datacentre goes down? And what if you want reasonable (<50ms) latencies in different parts of the world?


Except for PostgreSQL.


It certainly seems that the NoSQL movement is largely fueled by two facts: 1. MySQL sucks 2. Oracle is expensive

I'd love to see PostgreSQL get more attention, as I feel that they have scaling up and scaling out handled fairly well, whereas MySQL/InnoDB has a hard time even scaling up (which is why the Drizzle project even exists).


Does Postgres partition across a cluster? That's what we're talking about here.


Yes. This is what Skype does.


I'm not even so sure that they are unnecessarily complicated to scale, or that you should need or have to replace the features you list. I am sure, however, that everything you list ends up needing to happen if the solution chosen doesn't apply well to the problem at hand.

When you really dig deep down into each and every article on this subject, whether for or against NoSQL, the most important (and yet unstated) fact is this:

It isn't that RDBMS systems scale or don't scale, or that NoSQL systems scale or don't scale; it's that any solution which prioritizes (just for example) consistency and availability is not going to scale effectively for a problem that instead prioritizes availability and partition tolerance.

I am willing to bet that any time a problem and a given solution don't align on the two attributes they've respectively prioritized from CAP, there will be a claim that the solution "doesn't scale". The reality is just that the solution wasn't applicable to the problem at hand. If instead one evaluates solutions which match the problem's CAP priorities, the solutions will scale effectively (modulo their individual pros and cons relative to the other options within the evaluated CAP-priority-matching set of possible solutions, of course.)



