I worked on the Netflix Challenge last year but did not get far, and gave up after submitting a few mediocre results. Actually, I was proud of myself: my result did not explode, and it landed within range of the Cinematch algorithm. The experience has given me a lot of respect for companies that develop recommendation algorithms.
There should be more great competitions like this. At this point, I'd say the benefits from all the research and development by the Netflix Challenge community, as well as the experience gained by hobbyists like myself as a result of this competition, have already exceeded the $1 million prize.
I saw an informal study on the economics of prizes. Apparently, the monetary value of the effort put into solving the problem, the media publicity, etc., added together, exceeds the value of the prize itself by anywhere between a factor of 10 and 50, depending on the prize.
...that asks exactly how much better the experience will be for a particular customer if all this research results in a recommendation engine that really is 10% better than Cinematch. Undoubtedly the marketing value (to Netflix) of the challenge is incredible, and I don't question the reasonableness of their desire to eke satisfaction out of every potentially satisfied customer; but surely there's somewhere else in their business where they could more easily improve their margin and their customers' experience.
I, personally, use Netflix a bunch, and love the service. Furthermore, I love the fact that they sponsored this contest and provided a large research database to support it. I guess I'm just always a bit disappointed by breathless mainstream press coverage that doesn't discuss these other meta-contest questions. That and I wish I could somehow get Netflix to send me season 4 of Lost before season 5 starts.
That's hilarious. The list of movies that are hard to classify reads like a list of my favorites...
[Like] “Napoleon Dynamite” — culturally or politically polarizing and hard to classify, including “I Heart Huckabees,” “Lost in Translation,” “Fahrenheit 9/11,” “The Life Aquatic With Steve Zissou,” “Kill Bill: Volume 1” and “Sideways.”
I would classify you as a pretentious pseudo-intellectual scenester, based on the fact that you've just demonstrated mastery of that demographic's primary modus operandi.
So then, if only Netflix had a checkbox on its registration form:
[ ] Yes, I am a pretentious pseudo-intellectual scenester!
Perhaps you've discovered the most important demographic unit of the century? I guess, then, the greater problem would be to glean which people answered the question sarcastically... damn.
No, it must be more complicated than that. If they were just consistently popular with a specific demographic, they would be easy to predict - the problem seems to be that even if you like Lost in Translation, you might still hate Kill Bill, or Fahrenheit 9/11.
After signing up for Netflix last month, I realized that I am one of those users that algorithm writers probably hate. I usually rate things "1" or "5", because a movie I did not like means I just wasted 2 hours of my time. On the other hand, a movie I did like was an enjoyable and relaxing 2 hours.
So I rate everything to the polar ends, and also tend to have pretty varied choices for what movies I like. Usually the only genre I avoid is Horror/Psycho/Chainsaw Death films.
I try hard to avoid judging people by their tastes.
I try to judge films by how I react to them, but this is difficult to do if at the same time I'm asking myself how others will react to my opinion of the film.
There are films I don't want to see and books I don't want to read because I heard statements like yours which I know would cloud my judgment and spoil my enjoyment.
Luckily, I already saw and liked enough of the films you dismiss to be able to safely ignore your condemnation of those I haven't seen yet.
PCA is an incredibly useful technique. At work we've been using it to model the structure of the yield curve, i.e. the graph of interest rates vs. maturity. Turns out you can decompose most daily movements of the yield curve into three components: a parallel shift up/down, steepening/flattening, and a "bow" where 2s5s flattens, 5s10s steepens, and 10s30s flattens. It would be interesting to build an interest rate model that evolves these three components forward in time... it would probably be most useful for short time scales where the principal components are unlikely to change.
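A minimal sketch of that decomposition, using synthetic daily curve changes (numpy only; the level/slope/bow labels are the standard interpretation of the first three components, and the toy data here is constructed to exhibit them):

```python
import numpy as np

# Toy example: daily changes of a yield curve observed at a few maturities.
# Real work would use actual rate data; here we synthesize level/slope/bow moves.
maturities = np.array([2, 5, 10, 30])                       # years
rng = np.random.default_rng(0)
n_days, n_mats = 500, len(maturities)
level = rng.normal(0, 5, n_days)[:, None] * np.ones(n_mats)
slope = rng.normal(0, 2, n_days)[:, None] * np.linspace(-1, 1, n_mats)
bow   = rng.normal(0, 1, n_days)[:, None] * np.array([-1.0, 1.0, 1.0, -1.0])
noise = rng.normal(0, 0.5, (n_days, n_mats))
daily_changes = level + slope + bow + noise                 # basis points

# PCA via eigendecomposition of the covariance matrix of daily changes.
cov = np.cov(daily_changes, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]                           # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print("variance explained by first 3 PCs:", (eigvals / eigvals.sum())[:3].round(3))
print("PC1 (parallel shift):", eigvecs[:, 0].round(2))
print("PC2 (steepening):    ", eigvecs[:, 1].round(2))
print("PC3 (bow):           ", eigvecs[:, 2].round(2))
```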
Unrelated question: I've recently been reading a lot about wavelets and multiscale analysis. My application area is text processing and topic models for legal document analysis. Wavelet transforms, or statistical modeling in the wavelet domain, seem like the kind of thing that would have been tried many times over in finance. Do you know of any instances where they turn out to be useful for time series?
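To be concrete, the kind of decomposition I mean looks like the sketch below (this assumes the PyWavelets package; the signal and the choice of wavelet are arbitrary):

```python
import numpy as np
import pywt  # PyWavelets

# A toy "time series": a slow oscillation plus a faster component.
t = np.linspace(0, 1, 512)
signal = np.sin(2 * np.pi * 3 * t) + 0.3 * np.sin(2 * np.pi * 40 * t)

# Multilevel discrete wavelet transform: splits the series into a coarse
# approximation plus detail coefficients at successively finer scales.
coeffs = pywt.wavedec(signal, 'db4', level=4)
approx, details = coeffs[0], coeffs[1:]
print("approximation length:", len(approx))
print("detail lengths (coarse to fine):", [len(d) for d in details])

# The transform is invertible, so nothing is lost in the decomposition.
reconstructed = pywt.waverec(coeffs, 'db4')
print("max reconstruction error:", np.max(np.abs(reconstructed - signal)))
```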
I've heard people talk about it, but never seen any concrete applications to finance. If you know of any papers or introductory material, I'd be thrilled to check it out. I know nothing about wavelets or multiscale analysis -- I couldn't even define them if you asked -- but I have a decent math and statistics background so I'd love to take a look.
My first thought was: the rating system itself puts a ceiling on the accuracy of any prediction. And it seems everyone knows that (both Netflix and the 30,000 hackers).
Then why spend so many resources improving the existing rating system by 10%, instead of experimenting with new kinds of rating systems? (I'm sure someone could come up with something clever yet simple.) Yes, it's costly to change the infrastructure. But if you don't do it, some startup will come along and beat them to it.
The worst part is that Netflix is paying people to think inside the box (the rating system).
I read about Bertoni a while ago and was inspired by his out-of-the-box approach (behavioral economics). Wouldn't that give Netflix a hint - "Yo Netflix, here's this dude who's getting the fastest-growing results by extracting more qualitative information out of the quantitative rating system. Maybe you should just design a new rating system that better organizes that qualitative information? Just maybe."
I really wish there were an open source incremental SVD-based library for collaborative filtering; the algorithm is known, and we know it's pretty efficient, scalable, and high performance. Sure, there is Mahout (http://lucene.apache.org/mahout/), but development seems pretty slow, if there is still any dev work on it at all.
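For reference, the core of that incremental (Funk-style) SVD fits in a few lines of Python; this is a toy sketch with made-up hyperparameters, not Mahout's actual API:

```python
import numpy as np

def funk_svd(ratings, n_users, n_items, n_factors=40,
             lr=0.005, reg=0.02, n_epochs=20):
    """SGD matrix factorization over observed ratings only.

    ratings: iterable of (user, item, rating) triples.
    Learns U (n_users x k) and V (n_items x k) so U[u] . V[i] ~ rating.
    """
    rng = np.random.default_rng(42)
    U = rng.normal(0, 0.1, (n_users, n_factors))
    V = rng.normal(0, 0.1, (n_items, n_factors))
    for _ in range(n_epochs):
        for u, i, r in ratings:
            u_vec = U[u].copy()             # keep pre-update value for V's step
            err = r - u_vec @ V[i]          # prediction error on this rating
            U[u] += lr * (err * V[i] - reg * u_vec)
            V[i] += lr * (err * u_vec - reg * V[i])
    return U, V

# Toy usage: (user, item, rating) triples.
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (1, 1, 2.0)]
U, V = funk_svd(ratings, n_users=2, n_items=2, n_factors=4)
print("predicted rating for (user 1, item 0):", round(U[1] @ V[0], 2))
```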
BTW, SVD is not scalable. I define scalable algorithms as having time and space complexity O(n log n) or less. Full SVD is generally O(n^2) or worse, since it requires matrix multiplication. You'll have to shard your computations in order to scale indefinitely.
If those teams got together and hacked up a program that ran 2 or more independent systems (entries) at the same time and averaged the results, they'd probably get to 10 percent.
They actually do one better than your suggestion: they use machine learning to figure out how to weight one team's results against the other's.
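Roughly like this, sketched with synthetic probe data and a plain least-squares blend (the actual teams used more elaborate blending, but the idea is the same):

```python
import numpy as np

rng = np.random.default_rng(1)
true_ratings = rng.uniform(1, 5, 1000)             # held-out probe ratings
team_a = true_ratings + rng.normal(0, 0.9, 1000)   # two imperfect predictors
team_b = true_ratings + rng.normal(0, 1.1, 1000)

# Learn blend weights (plus an intercept) by least squares on the probe set,
# instead of using a fixed 50/50 average.
X = np.column_stack([np.ones(1000), team_a, team_b])
w, *_ = np.linalg.lstsq(X, true_ratings, rcond=None)

def rmse(pred):
    return np.sqrt(np.mean((pred - true_ratings) ** 2))

print("team A alone:  ", rmse(team_a))
print("simple average:", rmse((team_a + team_b) / 2))
print("learned blend: ", rmse(X @ w))
```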