Guess the only valid point is 3: "Not novel at all — it represents a specific implementation of well known techniques developed nearly 25 years ago"
Probably true.
The Map/Reduce paradigm was being discussed in papers on functional programming languages in the early 1980s.
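For anyone who hasn't seen the functional-programming version, the paradigm really is just two higher-order functions. Here's a minimal Python sketch of the canonical word-count example (my illustration, not taken from any of those papers):

    from functools import reduce

    # The two functions MapReduce is named after: map transforms each
    # element, reduce (a fold) combines the results into one value.
    words = ["the", "map", "reduce", "the", "paradigm"]
    counts = reduce(
        lambda acc, w: {**acc, w: acc.get(w, 0) + 1},  # reduce: fold one word in
        map(str.lower, words),                         # map: normalize each word
        {},                                            # initial accumulator
    )
    print(counts)  # {'the': 2, 'map': 1, 'reduce': 1, 'paradigm': 1}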
There really are cases where a "full scan" is the fastest way to do something: when it works, sequential I/O can be orders of magnitude faster than the random-access I/O you get with indexes, particularly if you'd have to build the index just to do the job. I've written systems that process hundreds of millions of facts, and I can do a "full scan" of these in 20 minutes on an ordinary desktop computer, whereas it takes about four days to load them into an index in MySQL or an RDF database.
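A toy illustration of that trade-off (my sketch, not the system described above; the file name, record format, and sizes are made up, and an in-memory dict badly understates the random-I/O cost of a real on-disk index, but the shape of the comparison is the same):

    import os
    import time

    PATH = "facts.tsv"        # assumed flat file of tab-separated facts
    N_RECORDS = 1_000_000     # scale up to make the gap more visible

    def make_facts(path, n):
        """Write n synthetic (key, value) records, one per line."""
        with open(path, "w", encoding="utf-8") as f:
            for i in range(n):
                f.write(f"subject{i % 10_000}\tfact number {i}\n")

    def full_scan(path, predicate):
        """One sequential pass; OS read-ahead keeps the I/O sequential."""
        hits = 0
        with open(path, "r", encoding="utf-8") as f:
            for line in f:
                if predicate(line):
                    hits += 1
        return hits

    def build_index(path):
        """The up-front cost you pay before an indexed lookup is possible."""
        index = {}
        with open(path, "r", encoding="utf-8") as f:
            for lineno, line in enumerate(f):
                key = line.split("\t", 1)[0]
                index.setdefault(key, []).append(lineno)
        return index

    if __name__ == "__main__":
        if not os.path.exists(PATH):
            make_facts(PATH, N_RECORDS)
        t0 = time.perf_counter()
        hits = full_scan(PATH, lambda rec: "fact number 500000\n" in rec)
        print(f"full scan: {hits} hit(s) in {time.perf_counter() - t0:.2f}s")
        t0 = time.perf_counter()
        idx = build_index(PATH)
        print(f"index build: {len(idx)} keys in {time.perf_counter() - t0:.2f}s")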
Now we know that SQL databases can be parallelized quite a bit, and commercial products exist that do it, which leaves two questions for extra credit: (i) why do the "cool kids" completely ignore these commercial products, and (ii) why are there no Open Source projects in this direction?
> Guess the only valid point is 3: "Not novel at all — it represents a specific implementation of well known techniques developed nearly 25 years ago" Probably true.
Unfortunately, the USPTO does not agree. MapReduce was patented in 2010.