"Such a platform can yield very satisfactory performance for tens or hundreds of thousands of active users"
There are 253 million Internet users in China alone. What happens when your site needs to scale from 0.001% of them using it simultaneously (roughly 2,500 users) to 1% of them (roughly 2.5 million)? Within a month?
"Of course if you index poorly or create some horrendous joins"
Which in the Twitter and Facebook cases is exactly what they have to do on many of their requests. As I've personally found out, relying on a database to do a join across a social network graph is a recipe for disaster. One day you'll be woken up because your database's query planner decided to switch from a hash join to a full table scan against tens of millions of users on every request. Then you'll be left either trying to tweak the query plan back into working order, or actually doing what you should've done in the first place: architect around message queues and custom daemons better suited to your query load.
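To make that concrete, here's a toy sketch of the fan-out-on-write shape I'm talking about (every name and data structure here is made up for illustration): new statuses land on a queue, a worker daemon copies them into precomputed per-follower timelines, and reads become a single key lookup instead of a graph join.

    # All names here are hypothetical. Fan-out-on-write: push each new status
    # through a queue and precompute per-follower timelines, instead of joining
    # the follower graph against the status table at read time.
    from collections import defaultdict, deque
    from queue import Queue

    FOLLOWERS = {"alice": ["bob", "carol"]}              # toy follower graph
    TIMELINES = defaultdict(lambda: deque(maxlen=800))   # precomputed feeds
    new_statuses = Queue()

    def post_status(author, text):
        # The web tier only enqueues; nothing on the request path touches the graph.
        new_statuses.put((author, text))

    def fanout_worker():
        # A custom daemon drains the queue and writes into each follower's timeline.
        while not new_statuses.empty():
            author, text = new_statuses.get()
            for follower in FOLLOWERS.get(author, []):
                TIMELINES[follower].appendleft((author, text))

    post_status("alice", "hello")
    fanout_worker()
    print(TIMELINES["bob"])   # reading a feed is now a key lookup, not a join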
"Even with billions upon billions of help tickets."
At 50 million tweets a day, Twitter would hit 18 billion tweets within a year. Good luck architecting a database system to handle that kind of load. That is, one in which the database system is serving all of the requests (including Twitter streams) and isn't just being used for data warehousing.
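The back-of-the-envelope, for anyone who wants to check the arithmetic:

    tweets_per_day = 50_000_000
    print(f"{tweets_per_day * 365:,} tweets in a year")   # 18,250,000,000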
"Such a solution — even on a stodgy old RDBMS — is scalable far beyond any real world need"
The disconnect here is that this guy's needs are not the needs of a lot of us who are actively looking at alternatives. He is simply not familiar with the problem domain.
I don't think the original poster was making the argument that Twitter should run fine on a SQL database; in fact, I think he seemed to indicate the opposite. Namely, that large, nominally non-relational datasets that can afford to lose a little data here and there, or at the very least just take a while to save it, are really what you need for serving up a big, fresh pile o' Social Networking.
So, bringing up Twitter or Facebook really doesn't make a good case against RDBMSes as a good tool in the toolbox -- they've got a very unique set of needs that don't apply to a lot of the rest of the world. So, of course, SQL isn't the best solution when you're dealing with trillions of rows of data and don't really want to spend hundreds of millions of dollars on the infrastructure required to guarantee that you never go down and never lose a tweet.
And keep in mind, RDBMSes helped them get to the point where they could enjoy these problems; Twitter probably wouldn't exist in all its current glory if they had spent a year building it to be 'scalable' before launching.
I think a lot of people end up hating RDBMSes and SQL because of one or more of the following: (a) their only experience is with MySQL, which really isn't that awesome; (b) they've been burned by bad schema design; or (c) they don't really get relational algebra or set theory.
For an example of 'bad schema design', I once worked at a company that had indices on nearly every column of their DB, even though almost none of these ever got queried. There was one database table with five indices on three columns, and of course this was the table that logged every single HTTP request processed by the front end. Including API calls. Did I mention that this table was never queried by any part of the application?
It was a poor design decision, and sure enough, it completely torpedoed performance. But the problem wasn't the RDBMS, because it did exactly what it was told to do, no matter how asinine.
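For the curious, the pattern looked roughly like this -- a hypothetical reconstruction with made-up table and column names, but the shape is the same: a write-only log table where every insert has to maintain a pile of indexes nobody ever reads.

    # Hypothetical reconstruction: a request-log table that is write-only in
    # practice, yet carries five indexes that every single INSERT must maintain.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
    CREATE TABLE request_log (
        id        INTEGER PRIMARY KEY,
        ts        TEXT,
        path      TEXT,
        client_ip TEXT
    );
    -- Five indexes over three columns, none of which any query ever uses.
    CREATE INDEX idx_log_ts      ON request_log (ts);
    CREATE INDEX idx_log_path    ON request_log (path);
    CREATE INDEX idx_log_ip      ON request_log (client_ip);
    CREATE INDEX idx_log_ts_path ON request_log (ts, path);
    CREATE INDEX idx_log_path_ip ON request_log (path, client_ip);
    """)

    # Every HTTP request (API calls included) now pays to keep all five indexes
    # current on a table nobody reads.
    db.execute("INSERT INTO request_log (ts, path, client_ip) VALUES (?, ?, ?)",
               ("2010-03-01T12:00:00Z", "/api/statuses", "10.0.0.1"))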
So, in short, RDBMSes aren't the solution to all problems, but they do solve a lot of problems adequately. NoSQL databases also serve an important role in the toolbox, but they are much more narrowly focused.
I'd also suggest that even in the social networking space, there are multiple types of data, and some of it should be stored in an ACID environment (which probably means SQL). You may not care if you can't save a few thousand (or million) status updates to your data store immediately, but you need to care a lot more about your customer profile data (accounts, passwords, etc.), and changes there should be as ACID as possible. Your advertising or subscriber billing models should probably be ACID-backed as well. In both of these cases, that probably means SQL.
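A minimal sketch of that split (all names here are hypothetical): billing changes go through a transaction and either commit entirely or not at all, while status updates just get enqueued for a best-effort store to pick up later.

    # All names hypothetical. Account/billing changes run inside a transaction;
    # status updates are merely enqueued and can tolerate delayed persistence.
    import sqlite3
    from queue import Queue

    accounts = sqlite3.connect(":memory:")
    accounts.execute(
        "CREATE TABLE account (id INTEGER PRIMARY KEY, email TEXT, balance_cents INTEGER)")
    accounts.execute("INSERT INTO account VALUES (1, 'a@example.com', 0)")
    status_updates = Queue()   # drained later by a best-effort store

    def change_billing(account_id, delta_cents):
        # Money moves inside a transaction: the whole change commits or none of it does.
        with accounts:
            accounts.execute(
                "UPDATE account SET balance_cents = balance_cents + ? WHERE id = ?",
                (delta_cents, account_id))

    def post_status(user_id, text):
        # Best effort: losing one of these is annoying, not catastrophic.
        status_updates.put((user_id, text))

    change_billing(1, 999)
    post_status(42, "hello world")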
However, the original author's point basically boiled down to: if you define scalability as the problems you can scale an RDBMS to solve, RDBMS systems are scalable. I'm not big on arguing the finer points of someone's tautology.
The particulars of a situation determine the scalability of a solution. For a lot of us working at web scale or on interesting new problems, an RDBMS won't scale. Sometimes it won't scale within the constraints we have, but sometimes it won't scale because we won't be able to build the system we're trying to build. His example of a company-internal billing system really only served to highlight the disconnect between the crowd following along well-trod ground, and the people out front doing innovative work.
No. The author provides counterexamples to the argument, "RDBMSes don't scale."
As for the "innovative work" versus "well-trod ground," there are still businesses who need better, more innovative solutions to well-trod problems, and I, for one, am not willing to ignore money on the table. The problem I'm working on works well with a combination RDBMS/key-value system for different pieces of the puzzle.
"[SQL deal for when] "Data consistency and reliability is a primary concern"."
I'm curious: let's say you have a Twitter-scale app that must satisfy consistency and reliability as primary requirements. Is there really a NoSQL solution that can take you there without (effectively) raising the costs to the point where a scalable, money-is-no-object SQL solution would do just as well? (Kinda like how difficult-to-extract North Sea oil became economically viable once oil prices broke through a certain ceiling?)
Well, "reliability" can be sliced in a couple of different ways since that term can cover both the A & P in the CAP options and it can also mean the elimination of single points of failure and an architecture that can degrade gracefully when components fail. Some NoSQL systems let you select the mix of consistency and reliability you need at a rather fine-grained level -- one thing that does distinguish these systems from the traditional RDBMS is that you are almost never in an all or nothing situation regarding any particular part of the data space unless you explicitly want to create that choice to enable other options.
NoSQL gets scalability by giving up a huge feature set and, yes, cross-object consistency. I question the lower-reliability argument, though. Dynamo, one of the original NoSQL systems, allows each instance to specify how reliable it wants writes to be. Most of the NoSQL systems let you tune the reliability factor, just as MySQL and PostgreSQL do.
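A toy illustration of that per-write knob -- this isn't any real client API, just the idea: the caller picks W, the number of replicas that must acknowledge before the write counts as successful.

    # Not a real client API, just the idea: the caller chooses how many of the
    # N replicas must acknowledge a write before it is considered successful.
    import random

    N_REPLICAS = 3
    replicas = [dict() for _ in range(N_REPLICAS)]   # stand-ins for storage nodes

    def write(key, value, w):
        acks = 0
        for replica in replicas:
            if random.random() < 0.9:    # pretend a replica occasionally misses the write
                replica[key] = value
                acks += 1
        return acks >= w                 # durable "enough" by the caller's own standard

    print(write("status:1", "hello", w=1))                 # fast, best effort
    print(write("user:42:email", "a@example.com", w=3))    # demand every replica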
The disconnect here is that only a tiny percentage of sites on the Internet have the requirements of Twitter and Facebook. They are outliers. Using them as an example here is ridiculous.
You don't need to have Twitter's userbase to have Twitter's data problem. Look at someone like FlightCaster -- they're using a NoSQL database to handle the huge amount of data required to be useful to even their first user.
With the advent of cheap, reliable, available commodity hardware and network access, previously difficult data problems are solvable. SQL doesn't make sense for all of those problems.