The obvious way to assess if availability is the factor is to look at open access publications. Elsevier (unsurprisingly) say it makes no difference [1] and there are confounding effects which are stronger. For example, papers with code tend to be cited more in ML. And it's very difficult to control for article quality, which is a big factor (MDPI's quality is extremely hit/miss, for example). Open access may improve your paper's reach, but will it actually give you more citations? Unclear. I'm very much in favour of open research though.
On the matter of Sci-Hub: I already have access to all the journals I commonly use via my university. Those journals almost certainly track downloads (Elsevier even has a recommender system when you download a paper). All Sci-Hub does is reduce friction for researchers who can't be bothered to use their institutional login while off campus. Admittedly that includes a lot of people, but for the people who measurably contribute to citation metrics, does Sci-Hub actually improve availability? I doubt it, at least in universities with budgets for subscriptions. Before Sci-Hub there were always routes to get papers: ask the author, or ask a colleague in a neighbouring university if they have access. Or you can ask your library, which may be able to get it.
My point is that it was sometimes a bit more work, but in the grand scheme of publishing a paper, it wasn't so bad to network to get access to things.
Sci-Hub has certainly enabled the general public to access research, and no doubt tons of small companies are using it to avoid paying subscription fees, but those people aren't publishing significant numbers of papers in places where we can measure it.
All this study shows is that paper downloads from anywhere are a proxy for popularity/hype, which can be a proxy for citation count. I'm sure Elsevier or Taylor or Nature could provide a similar correlation between downloads and citations. There is also a bias here - if I'm looking for papers, I'm probably going to download the most highly cited stuff first because that's a weak signal that it's a useful paper.
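To make the proxy argument concrete, here is a toy simulation (my own sketch, not anything from the study). It assumes a hidden "popularity" score per paper that drives both downloads and citations, with independent unit-variance noise on each; the names `simulate` and `pearson` and all the numbers are made up for illustration. Downloads have no causal effect on citations here, yet the two still correlate.

```python
import random
import statistics


def simulate(n_papers=5000, seed=0):
    """Generate fake (downloads, citations) scores driven by a shared hidden factor."""
    rng = random.Random(seed)
    downloads, citations = [], []
    for _ in range(n_papers):
        popularity = rng.gauss(0, 1)  # hypothetical hidden confounder (hype/quality)
        # Each observed score is popularity plus its own independent noise.
        # Note that citations never depend on downloads.
        downloads.append(popularity + rng.gauss(0, 1))
        citations.append(popularity + rng.gauss(0, 1))
    return downloads, citations


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


downloads, citations = simulate()
r = pearson(downloads, citations)
print(f"correlation between downloads and citations: {r:.2f}")
```

Under these assumptions the expected correlation is 0.5 (shared variance 1 out of total variance 2 on each side), even though removing every download would change no citation count in the model. It doesn't show that Sci-Hub has no effect, only that a download/citation correlation by itself can't tell you either way.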
I'd be more interested to see, as you said, temporal analysis of before/after Sci-Hub.
Or alternatively geographical studies - does this affect lower income countries more? I imagine that this is a boon to researchers who didn't have access because their institution couldn't afford it.
[1] https://www.elsevier.com/connect/citation-metrics-and-open-a...