
> If X and Y are two independent random variables, X representing skill and Y representing self-assessment of skill, X and Y will be negatively correlated

1. As mentioned in another comment, X and Y can't be independent and correlated at the same time

2. The point of the article is to show that you can replicate the results of the DK paper starting from purely random data. In the article X and Y don't mean anything, they're just random variables that the author draws samples from. The fact that you can get DK results from this very strongly suggests that DK is just an artifact of statistics, not an actual result.
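The construction in point 2 can be sketched roughly as follows. This is a minimal simulation, not the article's own code: it assumes x and y are independent uniform percentiles on [0, 100], and the function name `simulate_quartiles`, the sample size, and the seed are made up for illustration. It groups the samples into quartiles of x and averages both variables within each group, similar to a quartile chart.

```python
import random
from statistics import mean


def simulate_quartiles(n=10_000, seed=0):
    """Return (mean x, mean y) for each quartile of x."""
    rng = random.Random(seed)
    # Assumption: x ("test score") and y ("self-assessment") are independent
    # uniform draws on [0, 100]; by construction they carry no meaning.
    pairs = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(n)]
    pairs.sort(key=lambda p: p[0])  # sort by x so equal slices are x-quartiles
    size = n // 4
    result = []
    for q in range(4):
        chunk = pairs[q * size:(q + 1) * size]
        result.append((mean(p[0] for p in chunk), mean(p[1] for p in chunk)))
    return result


quartiles = simulate_quartiles()
for i, (mx, my) in enumerate(quartiles, start=1):
    print(f"Q{i}: mean x = {mx:5.1f}, mean y = {my:5.1f}, mean y - x = {my - mx:+6.1f}")
```

The mean of y stays near 50 in every quartile while the mean of x climbs, so the bottom quartile appears to overestimate and the top quartile to underestimate, even though the two variables are unrelated by construction.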

> What do you mean by "the plot of y-x ~ x will always look that way" - what way?

See Figure 8 in the article, also panel B in Figure 10.

> The shape of the plot will necessarily depend on the relationship between x and y.

What I meant to say is that since X and Y are simulated data, not actual observations, the shape in Figure 8 will not actually depend on any possible relationship between ability and self-assessment. It's just a statistical artifact.

Figure 9 is based on this simulated data as well, and since it closely replicates Figure 2 there's good reason to believe that Figure 2 itself is actually just a statistical artifact and that the DK data don't actually show the purported correlation.

The article strengthens this point by citing a few papers and by showing corrected results in Figure 11.



> 1. As mentioned in another comment, X and Y can't be independent and correlated at the same time

And as I replied to that comment: "sorry, my bad: Y - X and X will be negatively correlated."
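For reference, a short derivation of that corrected claim, assuming X and Y are independent with finite, nonzero variances (the symbols $\sigma_X^2$ and $\sigma_Y^2$ are introduced here and do not appear in the thread):

```latex
\begin{align*}
\operatorname{Cov}(Y - X,\, X) &= \operatorname{Cov}(Y, X) - \operatorname{Var}(X) = 0 - \sigma_X^2 = -\sigma_X^2, \\
\operatorname{Var}(Y - X) &= \sigma_Y^2 + \sigma_X^2, \\
\operatorname{Corr}(Y - X,\, X) &= \frac{-\sigma_X^2}{\sigma_X \sqrt{\sigma_X^2 + \sigma_Y^2}}
  = -\frac{\sigma_X}{\sqrt{\sigma_X^2 + \sigma_Y^2}} < 0.
\end{align*}
```

With equal variances this works out to $-1/\sqrt{2} \approx -0.71$.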

> 2. The point of the article is to show that you can replicate the results of the DK paper starting from purely random data

You (and many others) are using "purely random data" as if it were always the right null hypothesis, and using it to cast doubt on the results. But taking zero correlation between skill and self-assessment of that skill as the null model makes no sense to me; it is in fact more extreme than the claim DK is making. In other words, sure: if you assume something more extreme than the claim and generate data from that assumption, you'll get the same effect, only more extreme.


The hypothesis and the distribution of the data are irrelevant. The artifact comes from x appearing on both sides of the regression equation (y - x regressed on x).
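One way to see the effect of x sitting on both sides is a small sketch like the one below. It is an illustration under stated assumptions, not anything from the article: the helper names `ols_slope` and `slopes`, the three distributions, and the model y = beta * x + noise are all hypothetical choices. Algebraically, the least-squares slope of y - x on x is the slope of y on x minus 1, so it comes out negative whenever the slope of y on x is below 1.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def slopes(draw, beta, n=20_000, seed=1):
    """Return (slope of y on x, slope of y - x on x) for y = beta * x + noise."""
    rng = random.Random(seed)
    xs = [draw(rng) for _ in range(n)]
    ys = [beta * x + draw(rng) for x in xs]  # noise is independent of x
    diffs = [y - x for x, y in zip(xs, ys)]
    return ols_slope(xs, ys), ols_slope(xs, diffs)


# Hypothetical distributions, chosen only to vary the shape of the data.
distributions = {
    "uniform": lambda rng: rng.uniform(0, 1),
    "gaussian": lambda rng: rng.gauss(0, 1),
    "exponential": lambda rng: rng.expovariate(1.0),
}

for name, draw in distributions.items():
    for beta in (0.0, 0.5):
        b, b_diff = slopes(draw, beta)
        print(f"{name:12s} beta={beta}: slope(y~x)={b:+.3f}, slope(y-x~x)={b_diff:+.3f}")
```

In every row the second slope is the first one shifted down by exactly 1 (up to floating-point error), regardless of which distribution generated the samples.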



