ar5iv tracks the arXiv collection with a one-month lag, precisely to signal that this is not the "official" arXiv rendering. It is also a showcase that predates the arXiv /html/ route but largely uses the same technology. Nowadays it is maintained by the same people (hi!).
There used to be another showcase, called arxiv-vanity. They captured what happened pretty well with their farewell post on their homepage:
You can help make LaTeXML better, or you can simply report issues when you spot them while reading. Some we collect automatically (any errors and missing packages), but others we can't: wrong colors, broken aspect ratios of figures, weirdly laid out author lists, etc.
As a very brief update: a larger update is pending.
You will spot many (many) issues with our current coverage and fidelity of the paper rendering. When they jump out at you, please report them to us. All reports from the last 2 years have landed on GitHub. We have made a bit of progress since, but there is (a lot) more low-hanging fruit to pick.
The main bottleneck at the moment is developer time. And the main vehicle for improvements on the LaTeX side of things continues to be LaTeXML. Happy to field any questions.
I would like to write code for latexml to translate a package but I found the documentation to be hard to understand. That might be what is holding developers back. I looked at this a year ago and gave up.
Tell us what you would need described in a tutorial to be productive, as well as your background with the technologies involved (TeX/LaTeX, perl, XML, XSLT, HTML). Probably best as a new issue:
Just passing by to mention that if you get excited about seeing your upgrades in arXiv itself, we can talk about contributing them to the arXiv HTML pages.
But seeing your plans for Science Stack, all the best with the endeavour!
And I am curious to know if arXiv:2105.10386 works well.
It works! After the initial data load (big paper), the scrolling and performance work nicely.
You can visit it at sciencestack.ai/arxiv/2105.10386
Note: no support for nomenclature/index yet.
I'm also working on refactoring the data/JSON to a streaming model (right now it's one big JSON dump on load).
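For anyone curious what "streaming instead of one big dump" can look like, here is a minimal sketch. It assumes the paper is served as JSON Lines (one small JSON object per line); the record shape (`type`, `text`, `caption`) and the `iter_blocks` helper are hypothetical and only there for illustration, not how Science Stack actually stores its data.

```python
import io
import json
from typing import Iterator, TextIO


def iter_blocks(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines between records
            yield json.loads(line)


# Hypothetical paper content: one JSON object per line instead of one big array.
sample = io.StringIO(
    '{"type": "heading", "text": "Introduction"}\n'
    '{"type": "paragraph", "text": "First paragraph."}\n'
    '\n'
    '{"type": "figure", "caption": "Figure 1"}\n'
)

blocks = iter_blocks(sample)
first = next(blocks)  # usable before the rest of the stream has been read
rest = list(blocks)   # remaining records, parsed one line at a time
```

The point of the design is that the reader can render `first` right away, while a single large JSON document has to arrive and parse completely before anything shows up.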
To me one of the exciting aspects of HTML is that we can theme the same article in different ways, tailored to individual preferences - just swap in a different CSS file.
Having a two-column theme, or left-aligned vs justified themes, could be workable in the long run. I hope that we get to see some browser extensions modding the pages before too long.
The reason for the current justified text is that it is the default aesthetic for a LaTeX-based article, and a lot of authors expect it.
For the image widths, there is some CSS fine-tuning that is still needed on the arXiv HTML side. I think that will get fixed soon; it just needs the right height directive set.
Getting subfigures emulated via flexbox is one of our more recent LaTeXML enhancements, and still has some ongoing work (working on it today actually). It can be a bit finicky to test - there are easily 20 different ways people can write LaTeX for subfigures in arXiv.