> For the scientific literature, we need a ChatGPT equivalent to reconstruct LaTeX source that can reproduce each page. (We really need a successor to LaTeX that isn't such an arcane language, and can author fixed and flowable text with equal ease.)
Check out Nougat: OCRing scientific papers with a deep net trained end to end. It was released by Meta a few days ago.
“PDF format leads to a loss of semantic information, particularly for mathematical expressions. We propose Nougat (Neural Optical Understanding for Academic Documents), a Visual Transformer model that performs an Optical Character Recognition (OCR) task for processing scientific documents into a markup language, and demonstrate the effectiveness of our model on a new dataset of scientific documents.”
Except the only thing it has in common with the macOS version is a color scheme. Plus it hasn't been updated in years.
You don't actually get any of the things that made iA Writer for macOS so great, and all the technical issues (such as broken trackpad scrolling) are incredibly distracting, defeating almost the entire point of the app.
Yeah, ultimately this is what turned me off. I used the beta and loved it for as long as it lasted. When it ended it was hard to migrate elsewhere, but I couldn't convince myself it was a good idea to keep investing in a service that is entirely at the mercy of its direct competitors and shows no signs it's working to reduce that risk and cost.
Unless they mitigate those risks, they will only exist for as long as Google or Bing wants them to. The only ways they survive are:
- Mitigating those risks and costs (e.g., building/using their own index; well-designed caching could help)
- Staying small enough in terms of searches and users to be under the radar for Google and Microsoft
- Praying for the mercy of two of the most ruthlessly anticompetitive companies in existence (laughable)
- Convincing Google or Microsoft that they are worthwhile to acquire (but this kills the service for me anyways)
Hiking the price by 150% for the stated reason that "my direct competitor increased my costs" certainly shows the pressure is on and working as intended. On the off chance that Kagi devs or management read this, PLEASE find a way to stop being totally reliant on Google, Bing, etc. Unless you are going for an acquisition exit to Google or Microsoft, it will kill your company eventually.
They have their own index[1]. It's not easy, when a bunch of sites block anyone who isn't Google or Bing. But this is the same strategy Brave seems to be pursuing, where they try to rely more and more on their own indices.
> The crawler is hybrid, using async python requests and puppeteer with uBlock Origin. The way detection works is we count the number of uBO blocked requests on the page, and if too many (threshold is set to 5), we kick it out, leaving only "clean" pages in the index.
Fascinating; cnn.com reports 47 on the front page, npr.org is at 16, developer.hashicorp.com is at 9. I don't think that metric is doing what they think it is, or rather maybe they're trying to target only savannah.gnu.org-style sites or something.
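To make the quoted rule concrete, here is a minimal sketch of that kind of threshold filter. It is not Kagi's actual code: `CrawledPage`, `is_clean`, and `filter_clean` are hypothetical names, and reading "too many" as "strictly more than 5" is an assumption, since the quote doesn't say whether exactly 5 passes.

```python
from dataclasses import dataclass

# Threshold taken from the quoted description; treating "too many" as
# "strictly more than 5" is an assumption.
BLOCKED_REQUEST_THRESHOLD = 5


@dataclass
class CrawledPage:
    """Hypothetical record for one crawled page."""
    url: str
    blocked_requests: int  # requests uBlock Origin blocked while loading the page


def is_clean(page: CrawledPage, threshold: int = BLOCKED_REQUEST_THRESHOLD) -> bool:
    """A page counts as clean if its blocked-request count does not exceed the threshold."""
    return page.blocked_requests <= threshold


def filter_clean(pages: list[CrawledPage], threshold: int = BLOCKED_REQUEST_THRESHOLD) -> list[CrawledPage]:
    """Keep only the pages that pass the threshold check."""
    return [page for page in pages if is_clean(page, threshold)]


# Counts are the ones reported in the comment above.
pages = [
    CrawledPage("cnn.com", 47),
    CrawledPage("npr.org", 16),
    CrawledPage("developer.hashicorp.com", 9),
]

kept = filter_clean(pages)
print([page.url for page in kept])
```

With those three counts, nothing survives the filter under this reading, which is the concern: a plain documentation site gets dropped alongside the ad-heavy news front pages.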
Is there a legal issue with spoofing the user agent to be the Google crawler? Spoofing is certainly enough to get rid of article paywalls for 99% of the sites I've encountered. At least last I heard, you can also work around the Cloudflare captcha by just routing requests through a worker on their service.
Sure. USB-C's port has a little tab inside, I think it's like 0.6 mm. Because they put a TON of pins on this little tab (24 pins!), the tab itself in almost all cases has to be plastic or some other non-metal.
Lightning by contrast tends to have a metal tab which is relatively beefy at 1.5 mm. This is much stronger, materials- and dimensions-wise, AND it is on a part (the cable) that is pretty easily replaceable. So that's a win win win.
Despite this, the port itself is SMALLER for Lightning. So add another win?
Finally, with Lightning the interface is simple: it's one piece inside another, the tolerances seem really good, and there is a little indent to help seat a Lightning cable. With USB-C you have a shell going around a tab, so the load goes onto the interior surface with flexing, then a shell around that, so tab -> surrounded by cable shell -> surrounded by port shell. In my experience this just results in a fair bit of slop from accumulated tolerances.