Zotero does a good job at it, doesn't it?

KingMob · on March 21, 2024

As a daily Zotero user, not really. The nicest thing I can say about it is, it has plugins and is FOSS. Maybe the new 7.0 release will blow me away, but I've been waiting for it to get out of beta forever.

More fundamentally, we need to stop disseminating scholarly work as PDFs, a format primarily designed for print. Plain HTML would be an improvement. Even better than HTML would be an extended variant with scholarly-specific semantic markup and universal, animated, explorable figures. Embedded notebooks would be cool, too, but disseminating data would still be a major challenge. (And I don't just mean storage/transfer; a lot of researchers are reluctant to share source data to the world.)

Unlisted6446 · on March 21, 2024

So I'm a researcher who almost always uses PDFs... Does HTML have the reproducibility that PDF promises? My feeling is that if I store a PDF, it'll look the same in a decade. But is HTML the same way? It seems like it relies on the web browser and many other things... How would one manage things like images and GIFs? Is there a way to keep everything in one HTML file that's easily shareable and feels secure?

telegtron · on March 21, 2024

The potential to freeze an HTML page in time with minimal changes at render time is already there. [0] Such an ability can even be baked directly into the rendered HTML page so the viewer would be able to download a copy of the page as it is seen at a given time. Other archiving facilities, such as archive.org, take static snapshots of accessible pages if allowed by the publisher of the page and requested by anyone who wants to make that snapshot.

My point is that it is possible to achieve in principle and in practice, albeit it might not be practiced as often as one would like to see.

-------

[0] See SingleFile by gildas at https://addons.mozilla.org/en-US/firefox/addon/single-file/: “Save an entire web page—including images and styling—as a single HTML file.”
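To make the "everything in one HTML file" idea concrete, here is a minimal sketch of one ingredient: embedding local images as base64 `data:` URIs so the page no longer depends on separate image files. This is not how SingleFile is implemented; the function name `inline_images` and the regex are illustrative assumptions, and a real tool would also have to handle stylesheets, fonts, scripts, and unquoted or single-quoted attributes.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Matches the src attribute of <img> tags with a double-quoted value.
# Assumption: simple, well-formed markup; this is not a full HTML parser.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)


def inline_images(html: str, base_dir: Path) -> str:
    """Replace local <img src="..."> references with base64 data: URIs."""

    def to_data_uri(match: re.Match) -> str:
        src = match.group(2)
        # Leave remote and already-inlined images untouched.
        if src.startswith(("http://", "https://", "data:")):
            return match.group(0)
        path = base_dir / src
        # Leave references to missing files as they are.
        if not path.is_file():
            return match.group(0)
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        payload = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"{match.group(1)}data:{mime};base64,{payload}{match.group(3)}"

    return IMG_SRC.sub(to_data_uri, html)
```

Calling `inline_images(page_html, Path("paper_dir"))` (both names hypothetical) returns the same markup with each local image carried inside the file itself, which is what makes a single saved page portable.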

hju22_-3 · on March 22, 2024

I like SingleFile, but it's not perfect. It usually works just fine, but will occasionally drop the ball depending on the type of JavaScript on the page.

For example, I once backed up a page using it, and while it got all the content, it did not grab the JavaScript necessary for the images to display correctly.

cxr · on March 22, 2024

> Does HTML have the reproducibility that PDF promises? My feeling is that if I store a PDF, it'll look the same in a decade.

Feelings and promises are each one thing. Reality is another. PDF doesn't even look "the same" today. I have serious questions about how often folks who think that PDF is reliably consistent from system to system step outside their bubble and just how diverse their setups are that they're testing on.

> is HTML the same way?

Well, the status quo for copy-and-paste in HTML isn't dogshit, it's comparatively trivial to find and use tools that can thoroughly and exhaustively search your collection (or even write your own), and HTML is a dead simple plain-text format that, if worst comes to worst, you can read with your eyes (unlike needing to run a bunch of inscrutable code from a PostScript subset through an interpreter before you can do anything with it). So, no, I wouldn't call them the same.
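As a rough illustration of the "write your own" point, a search over a folder of saved HTML files fits in a few dozen lines of standard-library Python. This is only a sketch under stated assumptions: the names `TextExtractor`, `html_to_text`, and `search_collection` are made up for this example, files are assumed to be UTF-8 with an `.html` extension, and matching is a plain case-insensitive substring test.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect text content, skipping the bodies of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Return the text of an HTML document with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def search_collection(root: Path, needle: str) -> list[Path]:
    """List the .html files under root whose text contains needle."""
    needle = needle.casefold()
    hits = []
    for path in sorted(root.rglob("*.html")):
        text = html_to_text(path.read_text(encoding="utf-8", errors="replace"))
        if needle in text.casefold():
            hits.append(path)
    return hits
```

Something like `search_collection(Path("papers"), "reproducibility")` (directory name hypothetical) would then return the matching files, with no text-extraction step beyond parsing the markup.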

nickpsecurity · on March 21, 2024

Machines and humans can both easily use HTML/XML. Extracting information from PDFs is so much harder that there are deep learning products dedicated to doing it. They still make mistakes, too.

I’d much rather have something akin to the CHM files where everything I need is in one file, easy to analyze, and has good readers.

cygnion · on March 21, 2024

I explored tools to export/interchange PDF to HTML in the KnowledgeGarden app, but the results were not optimal, suffering from non-standard layout and poor typesetting of equations. Publishers of scholarly articles generate web pages of papers, but they're not replicas of PDF files.

Re. self-contained HTML (and slightly off-topic), look at TiddlyWiki [1], which keeps data, code, and layout all in one interactive HTML file, local or hosted. Extensibility, plugins, and a community of contributors are some of its key highlights.

[1] https://www.tiddlywiki.com

theGnuMe · on March 21, 2024

I'd like to see PDFs move to Computational Notebooks. One can dream.

gerroo · on March 21, 2024

That'd be so nice. Imagine executing the code for an AI paper and seeing the beautiful visualizations as you read it.

chipdart · on March 21, 2024

> As a daily Zotero user, not really. The nicest thing I can say about it is, it has plugins and is FOSS. Maybe the new 7.0 release will blow me away, but I've been waiting for it to get out of beta forever.

Can you elaborate on where you think Zotero drops the ball?

drhelix · on March 22, 2024

One major issue with Zotero is the lack of Android support. They've been working on an Android version or app or something for what feels like forever.

Then there is the way you store the PDFs. If you want to sync between multiple computers, you have to either know how to work with WebDAV, or know how to point Zotero at the location where you keep your PDFs, or (what they would most certainly love) pay a lot of money for not much storage space on their system. That last option is the one I don't like, because I just don't trust anyone these days. You get invested in a system and build your routine around it, only for them to shut it down, sell it, whatever, and then poof, you have to start over.

People keep calling Zotero FOSS, but if they were truly FOSS they would have a much more transparent way for people to roll their own self-hosted Zotero server. Instead, what they have is a dump of an old version with next to zero documentation, and a bunch of stubborn people who have managed to get something working, but not quite.

I get that they are trying to make money but I am sure they could do that and be more transparent.

The other thing is the reliance on so many plugins. While Zotero itself may last a while, who can say anything about the many devs of the many plugins that you end up relying on in order to make Zotero bow to your routine? I like ZotFile and a few others, but how long are they going to last? Also, after reinstalling my system, getting back to my routine is a huge pain, because I have to remember all the settings for each and every plugin I install. They should come up with a way to save all these settings and restore them, and no, don't do it through another plugin!

omnster · on March 22, 2024

Here's a guide I found useful for setting up Zotero storage. In brief, it relies on ZotFile to flatten the storage (keep all PDFs in one directory) and on Better BibTeX.

It helped me get rid of exactly the pain with fresh installs that you mention, and I realized that the two plugins give me most of the functionality that I want.

https://habr.com/ru/articles/443798/