> The fact that the statistical artifact is seen in completely uncorrelated data is only shown as a demonstration that it is not itself evidence of the claimed effect
I don't understand this part. "Completely uncorrelated data" is usually taken to represent the null hypothesis, but that's not the case here. In the DK paper, the implicit null hypothesis is "people of all skill levels are good at estimating their performance". In this case the "completely uncorrelated data" matches an alternative hypothesis, "people's skills have nothing to do with their ability to estimate their performance in tasks testing that skill". This hypothesis doesn't outright contradict the DK proposed hypothesis (and is certainly not the DK null hypothesis), so getting similar results is unsurprising to me, and I'm not sure that we learn from it anything about the DK results.
As for the other study cited, the figure shown in the article doesn't give a lot of information on density, and looking at the paper itself, figure 4 does actually seem to show that self-assessment gradually shifts left with increasing level of education.
The null hypothesis for Dunning-Kruger isn't "people of all skill levels are good at estimating their performance", it's "people of all skill levels have equal bias in estimating their performance" (remember that Dunning-Kruger isn't that lower skilled people are bad at estimating their skill, it's that they systematically overestimate their skill). The randomly generated data used is one example of that, albeit an unrealistic one: a world in which all people are completely incapable of estimating their performance at all. In this case all of them have no bias at all in their (totally random) estimates. The fact that the artifact is seen in such data is a powerful demonstration that it is not evidence of the Dunning-Kruger effect.
Re the other experiment, as I say it's just one study and I haven't looked deeply into it or others (and nor do I have a position on whether there is a real effect of this nature, or any great interest in it). The point is the article isn't claiming their randomly generated data example is evidence against the Dunning-Kruger effect itself. That needs further experiments such as the one they showed. The random data example is a demonstration that the original paper's analysis is flawed and doesn't support its conclusions.
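To make that concrete, here's a rough sketch of the kind of simulation the article describes (my own made-up version with scores on a 0-100 percentile scale, not the article's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
actual = rng.uniform(0, 100, n)    # "actual" test percentile: pure chance
estimate = rng.uniform(0, 100, n)  # self-assessed percentile: pure chance, zero bias

# Group by quartile of actual score, as Dunning & Kruger did
edges = np.percentile(actual, [25, 50, 75])
quartile = np.digitize(actual, edges)
for q in range(4):
    m = quartile == q
    print(f"quartile {q + 1}: mean actual {actual[m].mean():5.1f}, "
          f"mean estimate {estimate[m].mean():5.1f}")
```

Mean estimates sit near 50 in every quartile while mean actual scores run roughly 12, 37, 62, 87, so the bottom quartile appears to "overestimate" and the top quartile to "underestimate", even though by construction nobody has any systematic bias at all.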
> The null hypothesis for Dunning-Kruger isn't "people of all skill levels are good at estimating their performance", it's "people of all skill levels have equal bias in estimating their performance"
OK,
> remember that Dunning-Kruger isn't that lower skilled people are bad at estimating their skill, it's that they systematically overestimate their skill
The way I see it these aren't very different - conditioning on low skill and randomly sampling will tend to give far more overestimates than underestimates. What is the distinction between being bad at the skill and bad at estimation, versus being bad at the skill and systematically overestimating?
> The fact that the artifact is seen in such data is a powerful demonstration that it is not evidence of the Dunning-Kruger effect
"Such data" refers to a world where everyone have absolutely no idea how good or bad they are. To me that is a much stronger argument than that made by DK. So perhaps our differences all come down to our priors. My prior belief (before looking at any data) is that people would know how good they are at a certain skill. If your prior is to expect that people don't know how good they are, then your arguments make sense to me. If however your prior is that people do know how good they are, but are also biased (all in the same direction), then I don't understand how the random data experiment reveals anything relevant to your beliefs.
> But when you're bad at the skill and can't underestimate, they look the same.
The (definitional) difference between bias and variance has nothing to do with whether you're bad at the skill or not. It's just the mean vs the variance of a probability distribution.
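To put numbers on it, here's a toy illustration (the values are made up, and I'm ignoring for the moment that real percentiles are clipped to 0-100) of a low-skill person, true percentile 10, under the two scenarios:

```python
import numpy as np

rng = np.random.default_rng(1)
true_pct = 10
biased = rng.normal(true_pct + 30, 5, 10_000)  # systematically overestimates by ~30
noisy = rng.normal(true_pct, 30, 10_000)       # unbiased, just very uncertain

print(biased.mean() - true_pct)  # ~30: the gap survives averaging -> bias
print(noisy.mean() - true_pct)   # ~0: errors cancel out on average -> variance only
```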
If there's a good faith acknowledgement on your part that there's something here you're not getting then I'm very happy to try and help you understand it, and in the spirit of hn I'm assuming that is the case as you've claimed. I'm definitely not interested in any sort of motivated argument, though. If you're attached to the ideas you're putting forward here in some way I have no desire to try and dissuade you.
Operating on the former assumption, I'm not really clear where the misunderstanding lies at this stage, but perhaps it would help if you were to expand on in what sense you think being "bad at the skill" would make bias and variance "look the same"?
Here's my main point of confusion - what does the random data experiment have to do with the DK results?
As stated elsewhere DK has 2 claims:
1. Low-skilled people overestimate their performance and skilled people underestimate their performance
2. Skill correlates with self-assessment accuracy
My first issue with the article is that it implies that since we get effect #1 with random data, that invalidates the respective DK conclusion. This IMO is misleading because random data represents a null model that is very different from my intuitive null model, that of people being generally capable of assessing their skills (which I truly believe).
My second issue is that there's no relationship between effect 2 and the random data experiment, which doesn't exhibit anything of the sort. We can have a discussion about the cited papers and effect 2, as the reproduced plot doesn't show density and the density plots from the paper do seem to support DK, but that's not my main gripe with the article.
As far as I can see (having checked wiki and the abstract of the original paper - I'm no expert on this) the DK effect is only the first of those claims. However it sounds like claim 2 is less significant here anyway.
Re claim 1 the random numbers example is "all noise, no signal" and I can see the objection that a more convincing example might be to demonstrate the "false" DK effect in an example that does have some signal (i.e. a positive relationship between actual and estimated skill), but that is easy to do and I hope you'll be able to see why if you see my reply at https://news.ycombinator.com/item?id=31042619 and read the comments under the article I mentioned there.
The point is that the DK analysis involves comparing two things which both contain the same single sample from a noise source. Pure noise like the random numbers in the example displays a powerful DK effect due to autocorrelation that says nothing interesting (just that a single random sample of noise is correlated with itself), and that powerful effect can swamp any actual relationships in the distributions. To avoid that effect appearing, you have to make sure that if the two things you are comparing contain samples of a single noise source they are separate, independent samples of it. The experiment with the education level groups achieves this because the education level is "measured" as a separate event from the "actual" skill measurement so they have separate noise sources (and even if they didn't the noise source would have been sampled separately and independently).
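As a concrete sketch of that (again my own made-up parameters, this time with real signal in the data):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
skill = rng.normal(50, 15, n)            # latent true skill
test = skill + rng.normal(0, 10, n)      # measured test score (one noisy sample)
retest = skill + rng.normal(0, 10, n)    # an independent second measurement
estimate = skill + rng.normal(0, 10, n)  # unbiased self-assessment

def gap_by_quartile(grouping, gap):
    edges = np.percentile(grouping, [25, 50, 75])
    q = np.digitize(grouping, edges)
    return [round(float(gap[q == i].mean()), 1) for i in range(4)]

# DK-style analysis: the grouping variable and the "overestimation" share the
# same noise sample -> bottom quartile looks like overestimators, top like underestimators.
print(gap_by_quartile(test, estimate - test))

# Group by the independent re-test instead: the apparent bias vanishes (~0 in every quartile).
print(gap_by_quartile(retest, estimate - test))
```

The underlying population is identical in both prints; the only thing that changes is whether the grouping variable and the quantity being compared share the same sample of noise.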
I have to say, during the discussion above I hadn't thought through it deeply enough to grok this level of it, and while pondering your last comment I went through a phase of "hang on, am I actually understanding this myself?", so I apologise and retract any suggestion of bad faith.
Thanks! No worries, I appreciate you writing this.
> I can see the objection that a more convincing example might be to demonstrate the "false" DK effect in an example that does have some signal
More than that - as it is, the argument is meaningless to me. It states that DK is trivial in a world where all people have no ability whatsoever to assess their own performance. OK, and finding dinosaur bones is uninteresting in a world where dinosaurs roam free. Both are true, but both are irrelevant in our world (considering my priors). To give a less hyperbolic example, suppose I found some population of people whose weight and height correlate much less than we currently measure, through some biological mechanism of very high variance in bone density or something. To me this article is like saying "well yeah, but this finding is uninteresting, for example if you take purely random weight and height you get an even stronger effect of short people with very high bone density and tall people with very low bone density".
Regarding all the rest - I don't really understand all this "comparing things which both contain the same single sample from a noise source". I'm currently willing to bet (albeit not too much) that any synthetic data experiment you'll come up with, that doesn't display an effect through the DK analysis, will turn out to be based on assumptions that strongly align with my prior, which is that subjects' self-assessment of their performance is correlated to their performance, with 0 bias (on average) and noise that is small (but not negligible) compared to the signal. Would be interested to be proven wrong.
> Pure noise like the random numbers in the example displays a powerful DK effect due to autocorrelation that says nothing interesting
On the contrary, finding out that the distribution in the real world is like that ("pure noise") would be very surprising (therefore interesting, in a sense) to me.
Sorry, but this doesn't make sense to me. There has to be a boundary - a person who got all the answers wrong can't underestimate their performance, and a person who got all the answers right can't overestimate their performance. You could make the case that boundary effects are all DK is about, but that's not what the article is doing (and also I don't think such a claim is supported by the DK plot).
I'm watching this thread with interest and will try to restate my understanding of GP's argument, by means of an example.
If a person's true skill is 5 on a 1-100 point scale, but the person is completely unaware of their true skill and guesses randomly, then their estimate will be heavily biased in the direction of overestimating their skill, even if they are not intrinsically motivated to overestimate it, simply because far more of the available guesses are higher than their true ability.
In other words, the available probability space itself is biased in the direction of overestimating their ability, for those people.
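For what it's worth, a quick numerical check of that restatement (made-up setup: true skill of 5, a completely uninformed uniform guess on a 1-100 scale):

```python
import numpy as np

rng = np.random.default_rng(3)
true_skill = 5
guesses = rng.integers(1, 101, 100_000)  # uniform random guesses on 1-100

print((guesses > true_skill).mean())  # ~0.95: nearly every guess is an overestimate
print(guesses.mean() - true_skill)    # ~45.5: large overestimate on average
```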
Is that right andersource?
I don't know the statistical right answer here, but curious to know.
It's worth reading the discussion between author and Nicolas Bonneel that starts with the first comment below the article. The author's explanation is very helpful regarding this point.
The main point is that in the paper's randomly generated numbers example, the DK effect disappears if you measure the actual "skill" and the "prediction error" in separate, independent experiments. In the example if you take a "person" and conduct the test you get a totally random result, and you get another, independent totally random result if you test them again. If you perform your "actual skill" measurement using one of those test runs and your "skill estimation error" measurement using another, the DK effect disappears completely.
So, to the extent the result of your skills test has any "noisiness" to it, if you analyse it the way Dunning & Kruger did, the autocorrelation resulting from using the same sample of that noise in the two things you're trying to assess the relationship between will show up as a powerful DK effect, and can easily swamp any actual correlations in the underlying distribution.
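In the pure-noise case this can be sketched directly (my own rough version, not the article's code): treat each test run as an independent uniform draw and see what happens to the per-quartile "overestimation" when the grouping uses a separate run:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
run_1 = rng.uniform(0, 100, n)     # the "actual skill" measurement
run_2 = rng.uniform(0, 100, n)     # an independent re-test of the same "people"
estimate = rng.uniform(0, 100, n)  # self-assessment, also pure noise

def gap_by_quartile(grouping, gap):
    q = np.digitize(grouping, np.percentile(grouping, [25, 50, 75]))
    return [round(float(gap[q == i].mean()), 1) for i in range(4)]

print(gap_by_quartile(run_1, estimate - run_1))  # strong apparent DK effect
print(gap_by_quartile(run_2, estimate - run_1))  # effect disappears (~0 in every quartile)
```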
Edit: Also worth mentioning footnote 3 on the article, which points out that the use of quantiles introduces a separate bias for the same reason you mention (about there being a minimum and maximum score).
It's not a complete tautology though: if people's estimates of their skill were accurate in an unbiased way, we wouldn't see a DK effect (or we'd only see a slight one, since you can't really be unbiased at the low and high ends of the spectrum, as the other comment pointed out). This isn't true in the case of uniform random data, or in the real data we see, but it could be true of some data.
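A quick sketch of that last scenario (assumed numbers: accurate, unbiased estimates with small noise, clipped to the 0-100 scale):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50_000
score = rng.uniform(0, 100, n)                           # actual performance
estimate = np.clip(score + rng.normal(0, 5, n), 0, 100)  # accurate, unbiased, small noise

q = np.digitize(score, np.percentile(score, [25, 50, 75]))
print([round(float((estimate - score)[q == i].mean()), 2) for i in range(4)])
# Only a slight boundary effect at the two ends, nothing like the crossover in the random data.
```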