Funny how that was one of the first things I had to teach myself. When I started school, distributed systems were something only the biggest businesses had, so the curriculum didn't exist yet.
Hadoop is not exactly the best example of a distributed system, but it does contain all the core components. If you want a highly efficient distributed system, I suggest writing your own. This ground is still being tested.
Sockets and asynchrony are tricky things, and I'm sure better ways of achieving distributed computing exist.
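To make the "tricky" part concrete, here is a minimal sketch of the socket plumbing involved, using Python's asyncio (my choice of language and library; nothing in the thread specifies one). Even this toy echo round-trip has to deal with partial reads and backpressure, the kind of detail that bites when you scale it up:

```python
import asyncio

async def handle(reader, writer):
    # read() may return fewer bytes than requested -- one of the tricky bits
    data = await reader.read(1024)
    writer.write(data)
    await writer.drain()  # backpressure: wait for the kernel buffer to accept the bytes
    writer.close()
    await writer.wait_closed()

async def roundtrip(payload: bytes) -> bytes:
    # Port 0 lets the OS pick a free port, avoiding collisions.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(payload)
        await writer.drain()
        echoed = await reader.read(1024)
        writer.close()
        await writer.wait_closed()
        return echoed

result = asyncio.run(roundtrip(b"ping"))
print(result)
```

Everything here runs in one process; a real distributed system adds reconnects, timeouts, and framing on top of exactly these calls.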
1) Compute-intensive, job-centric?
2) Compute-intensive, parallel reductions?
3) Database-intensive, map-reduce?
4) Database-intensive, sharded, non-normalized?
...
The various forms of distributed systems are something many people don't fully grasp. It's rather easy to build your own #1 or #3 (Hadoop is #3). Facebook has done an alright job at #4. Parallel reductions on distributed systems are another matter. I'm thinking million-factor by billion-row matrices; that is something we have yet to explore. Sure, we've done thousand-factor by billion-row no problem, since that's essentially a map-reduce. But doing matrix reductions on 1e6 by 1e9+ is not typically done... at all. One would need to find alternate ways of representing those 1e6 features as separate matrices, perhaps some form of Bernoulli/Bayes combination, and accept a 1000-fold increase in the number of operations.
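A toy sketch of why the row dimension shards cleanly while the feature dimension does not, in Python/NumPy (my choice; sizes are tiny stand-ins for the 1e9-row, 1e6-factor case). A Gram matrix X^T X over row shards reduces by plain element-wise summation, which is exactly the commutative/associative property map-reduce relies on; the feature dimension offers no such free lunch:

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_features, n_workers = 1200, 8, 4  # toy stand-ins for 1e9 rows x 1e6 factors

X = rng.standard_normal((n_rows, n_features))

# Shard rows across workers; each worker computes a local Gram matrix.
# The distributed "reduce" step is an element-wise sum of these partials,
# so any grouping or ordering of the workers gives the same answer.
shards = np.array_split(X, n_workers, axis=0)
partials = [shard.T @ shard for shard in shards]
gram = np.sum(partials, axis=0)

assert np.allclose(gram, X.T @ X)
print(gram.shape)
```

Note the output is n_features x n_features: at 1e6 factors that single reduced matrix is already ~4 TB of float32, which is why the feature dimension itself would need to be decomposed, as suggested above.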
// Forgive me for the rant; this is something I like to think about in my spare time. You're right that the self-taught often don't have this skill. My value-add is that a lot of the school-educated don't possess it either.