Excel has everything to do with it. From the article:
> In email correspondence seen by Retraction Watch and a follow-up Zoom call, Heshmati told the student he had used Excel’s autofill function to mend the data.
The most charitable interpretation is that the professor has no idea what autofill is or isn't capable of doing, so he misused it.
That’s not even the worst case, though. Worse by far than interpolating (whatever method is used) is simply taking data from countries with a similar name to fill gaps.
Flash Fill and Series Fill are great at recognizing simple patterns like 1,2,3,4,5 or 2,4,6,8,10 and can sometimes (unreliably!) recognize more complicated multiplications, repetitions or formulaic string modifications (like house numbers and street names). I use both frequently.
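To make the "simple patterns" case concrete, here is a minimal Python sketch of a constant-step fill. It illustrates the general idea only and is not Excel's actual algorithm; `extend_series` is a made-up name.

```python
def extend_series(seed, count):
    """Continue a numeric seed by its constant step (hypothetical helper)."""
    if not seed:
        raise ValueError("need at least one seed value")
    if len(seed) == 1:
        # A single value carries no step information, so just repeat it.
        return [seed[0]] * count
    # Collect the differences between consecutive seed values.
    steps = {b - a for a, b in zip(seed, seed[1:])}
    if len(steps) != 1:
        # No single constant step: refuse instead of guessing.
        raise ValueError("no constant step found; refusing to guess")
    step = steps.pop()
    return [seed[-1] + step * i for i in range(1, count + 1)]


print(extend_series([1, 2, 3, 4, 5], 3))
print(extend_series([2, 4, 6, 8, 10], 2))
```

The sketch raises an error on a seed like 1,2,4 rather than inventing a continuation. A real fill tool may well produce something anyway, which is exactly where the output needs checking.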
But neither these tools nor an LLM has real understanding; they're not going out into the real world and collecting data. The results need to be verified. Automatically filling an index column with 1,2,3... is one thing; automatically filling a data column with guesses from pattern matching is another. This problem is only going to get worse as LLMs proliferate and can fill in more complicated data more reliably, but still imperfectly and still opaquely.
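If gaps do get filled by a formula, the bare minimum is to track which values were invented so they can be checked and disclosed. A minimal sketch, assuming plain linear interpolation between known neighbours in the same series; `fill_gaps` is a hypothetical helper, not something Excel provides.

```python
def fill_gaps(values):
    """Linearly interpolate interior None gaps.

    Returns (filled, imputed), where imputed[i] is True only for
    positions whose value was computed rather than observed.
    Leading and trailing gaps are left as None: there is nothing
    on one side to interpolate from.
    """
    filled = list(values)
    imputed = [False] * len(values)
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk each pair of consecutive observed positions.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            frac = (i - left) / span
            filled[i] = values[left] + frac * (values[right] - values[left])
            imputed[i] = True
    return filled, imputed


series = [1.0, None, 3.0, None, None, 6.0, None]
filled, imputed = fill_gaps(series)
for value, guessed in zip(filled, imputed):
    print(value, "(imputed)" if guessed else "")
```

The point of the second return value is that a guess stays labelled as a guess, and a gap with no observation on one side stays a gap instead of being filled from somewhere else.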
I don't use autofill precisely because I have no insight into the algorithm behind it. I suspect it's very "stupid," but I don't know exactly how stupid it is.
It sounds like he didn't realize or notice that autofill was using adjacent cells (from other countries, in this instance). That behavior would shock me as well. I thought autofill only took columns of data into account.
> It sounds like he didn't realize or notice that autofill was using adjacent cells (from other countries, in this instance).
No, it doesn’t sound like that: “in several instances where there were no observations to use for the autofill operation, the professor had taken the values from an adjacent country in the spreadsheet.”
The headline isn't blaming Excel. "Undisclosed tinkering in Excel" is not far from the accurate headline: "Misuse of commonly-used Excel tool..."
The interesting thing is that a supposedly sophisticated person was using an unsophisticated tool and then relying on the data it produced. Failing to mention "Excel" in this case is doing a disservice to the thrust of the problem.
The accurate headline would be "Incomplete data falsified."
The tools have nothing to do with it; the same could have been done with paper and pencil. Autofill is not the problem: it just repeats a single number or continues a series, and when using it you see exactly what the results are.
This was not done by mistake, in ignorance, or because the tool encouraged it. It was done to knowingly falsify incomplete data so that they could finish their paper and publish.