tarleb's comments

tarleb · on March 22, 2023

Maybe take a look at pandoc before writing your own DSL. You can use it with --pdf-engine=weasyprint, so your DSL can be Markdown, reStructuredText, Org-mode, ...
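
A minimal sketch of that workflow, assuming both pandoc and WeasyPrint are installed. The file names and the Markdown content are made up for illustration; only the `--pdf-engine=weasyprint` option comes from the comment above.

```shell
# Write a tiny Markdown source file (hypothetical content and file names).
cat > report.md <<'EOF'
---
title: Quarterly report
---

# Summary

Some *Markdown* text that ends up in the PDF.
EOF

# Convert only if both tools are available; otherwise just say so.
if command -v pandoc >/dev/null 2>&1 && command -v weasyprint >/dev/null 2>&1; then
  pandoc report.md --pdf-engine=weasyprint -o report.pdf
else
  echo "pandoc or weasyprint not found; skipping PDF generation"
fi
```

Swapping the input for a reStructuredText or Org file should only require changing the file name, since pandoc guesses the input format from the extension.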

airstrike · on March 22, 2023

Thank you. pandoc may be part of the glue too, I have to think about it. I probably do want my own DSL, as none of the existing ones fit my specific domain.

tarleb · on Jan 9, 2021

You might like these:

An empirical study on the impact of static typing on software maintainability. Stefan Hanenberg, Sebastian Kleinschmager, Romain Robbes, Éric Tanter, Andreas Stefik. Empir Software Eng, (2013-12-11). DOI: 10.1007/s10664-013-9289-1.

An Empirical Investigation of the Effects of Type Systems and Code Completion on API Usability using TypeScript and JavaScript in MS Visual Studio. Lars Fischer, Stefan Hanenberg, Proceedings of the 11th Symposium on Dynamic Languages (154--167), 2015.

A large-scale study of programming languages and code quality in GitHub. Ray et al., 2014

The TL;DR is: typing matters, but so does tooling. Programmers using dynamic languages are slightly slower and appear to produce more defects. There is a measurable benefit to static typing, but it's small.

UncleMeat · on Jan 9, 2021

The paper by Ray et al. has been harshly criticized (https://dl.acm.org/doi/pdf/10.1145/3340571).

jose_zap · on Jan 9, 2021

Interestingly, in this paper, Haskell displays a negative correlation with defect rate.

tarleb · on Oct 25, 2020

> I've decided to evaluate pandoc and see if it might be useful for supporting Markdown and Word formats, etc. If it is, then I'll reach out to John MacFarlane and ask about a commercial license (or just something in writing), perhaps in exchange for sponsorship on GitHub.

Better to just use a GPL-compatible distribution method: pandoc has 349 contributors; none of them signed a copyright assignment, so you'd need permission from each and every contributor to use the software in a way not permitted by the GPL.

If you need a freelancer with deep pandoc knowledge, please do reach out. I'm happy to help.

tarleb · on Oct 25, 2020

Or HTML+CSS with WeasyPrint or Prince; the latter is free for personal use.

tarleb · on Oct 25, 2020

Arch, I presume? That's mostly due to a manpower problem on the side of the Arch Haskell maintainers. Try our pandoc Docker images or use pandoc-bin from the AUR for a bloat-free version. https://hub.docker.com/u/pandoc

flootgrumk · on Oct 25, 2020

Considering what pandoc does and how it is used, Docker is massive overkill imho. What pandoc should actually do is come as a tarball and be buildable the traditional `configure`, `make`, `make install` way, like all Unix tools of a similar fashion. Haskell, atm, is no language for this.

tarleb · on Oct 25, 2020

Hahaha, that's actually some quality and funny trolling. Not bad :D

For everybody interested in alternative installation methods: all pandoc releases are available as statically compiled binaries for Linux, and via installers on macOS and Windows. All major package managers ship a more-or-less recent version of pandoc. Compiling is as simple as getting the "stack" tool and running `stack install`.

tarleb · on Oct 25, 2020

I'm the author of pandoc's org-mode parser. Can you drop me a mail (listed on my GitHub profile <https://github.com/tarleb>) or post to the pandoc-discuss mailing list?

bzg · on Oct 25, 2020

Thanks for writing this parser!

FYI, https://orgmode.org/list/[email protected] is about enhancing Org's syntax documentation. If you have specific needs/ideas that you'd like to share, please don't hesitate.

tarleb · on Oct 25, 2020

I'm a long-time (7 years) contributor to pandoc. Other frequent contributors often drop by here as well. Happy to answer questions; ask us anything.

einpoklum · on Oct 25, 2020

I speak (and write) a right-to-left language.

I'm not a pandoc user (so far); and have struggled many times in the past with bugs and lacking features in LibreOffice and LaTeX regarding right-to-left text layout and language-specific issues.

My question: How "trustworthy" is pandoc in handling right-to-left content and side-stepping the minefield of target format issues involving such content? Is this subject getting explicit attention from maintainers?

tarleb · on Oct 25, 2020

Pandoc should be usable for users of all languages and scripts. It is possible to define the document's language via the `lang` metadata field; the `dir` attribute can be set to `ltr` or `rtl` for individual text elements.
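
A small sketch of what that can look like in pandoc Markdown, based on the `lang` and `dir` variables described in the pandoc manual. The file name and sample text are illustrative only, and how well the direction carries over depends on the output format.

```shell
# Hypothetical example document: Arabic base language, right-to-left base
# direction, with one span explicitly marked left-to-right.
cat > rtl-example.md <<'EOF'
---
lang: ar
dir: rtl
---

مرحبا بالعالم [pandoc]{dir=ltr}
EOF

# Produce a standalone HTML file if pandoc is installed.
if command -v pandoc >/dev/null 2>&1; then
  pandoc -s rtl-example.md -o rtl-example.html
else
  echo "pandoc not found; skipping conversion"
fi
```

The `[text]{dir=ltr}` form is pandoc's bracketed span syntax; a fenced div with the same attribute would do the equivalent for whole blocks.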

Core contributors are westerners or Russian (US, UK, Switzerland, Germany, Russia), and we rely heavily on user reports to improve non-LTR scripts and languages. But the goal is to make pandoc work flawlessly for everyone.

mcswell · on Oct 25, 2020

I have used Xe(La)TeX and the bidi package for mixed rtl and ltr script documents. I don't recall any problems with that. There's also a polyglossia package, but I have less experience with that.

mb2100 · on Oct 25, 2020

see https://pandoc.org/MANUAL.html#language-variables

harry8 · on Oct 25, 2020

There seem to be not so many Haskell applications that succeed to the point where they are of general use, as in not simply useful to programmers doing programming (probably in Haskell). At least this is a frequent observation about Haskell and one I've made myself. https://news.ycombinator.com/item?id=11907839 Obviously around here the ideal is we keep language wars/boosterism/accusations of being a virus etc. out of it (hey, I /like/ Haskell; I've just found it useful for my brain rather than being especially useful for performing the data transformations that come my way).

/If/ you accept that premise, why do you think Pandoc has been so very successful where perhaps other applications written in Haskell have not? The problem domain (something about writing parsers)? The contributors? The culture? Something else entirely?

Of course if you reject that premise I'd also be interested to hear your thoughts on it in as much detail as you care to provide.

Cheers.

tarleb · on Oct 25, 2020

First, let me challenge the premise: the list of popular Haskell projects on GitHub is far longer than you might expect. Pandoc isn't even the most popular one: https://github.com/search?q=language%3Ahaskell+stars%3A%3E10...

But there still may be some truth to the claim. A simple fact is that smaller mind share -> fewer programs -> less chance for extremely successful projects. From personal experience: it took me three tries and multiple months to get comfortable enough with Haskell to the point that I was able to write my first contribution to pandoc (the org-mode parser), despite having dabbled in functional-style Lisp for years before that. But Haskell, as used by pandoc, isn't difficult. In fact, I often find it easier to use Haskell, thanks to its excellent type system. It's just very different and requires a bit more investment up front, with huge benefits lurking down the road.

Data to support my claim that Haskell is actually easy to use: over 300 people have contributed to pandoc, with over 100 contributing Haskell code. Many of those contributors have never written any Haskell before, but the type system helped them to find their way.

I talked a bit about the whole topic here: https://youtu.be/JpNEIpLtCHs

harry8 · on Oct 26, 2020

Just to address the premise with the data in the link you provide. Click your link, remove anything that is a compiler, a linter, some other parser of programming languages, a library for use when programming haskell or a programming framework and that list gets very, very dramatically shorter.

I don't think that's entirely fair fwiw, it's github ordered by stars, that will turn up things used by programmers for programming in any language. But either way I don't find the refutation convincing.

I'd love it if the premise was no longer fair. That the data really does not support it. I want monad tutorials, there are thousands. That is no exaggeration. I want Haskell applications useful for something that isn't programming a computer - really not much.

I was kind of hoping you'd say something about the parsing problem domain and why that /seems/ to work particularly well with haskell but other domains not quite so much, at least yet, and whether that can be changed or is simply the nature of statically typed, pure functional programming languages (I really hope not).

It's not "successful" let alone "extremely successful" programs so much as "existent" that is the bar that needs clearing first.

Pandoc is great. Haskell works well for those of you hacking on it. I've used it, liked it and thank you for it! It isn't necessary to have an opinion on the topic at all, of course.

tikej · on Oct 25, 2020

Thank you for the ever-improving org-mode parser. Org-mode is in general difficult since it's a bit of a moving target, so I'm surprised that it's so well supported!

tarleb · on Oct 25, 2020

Thanks, comments like yours make my day :)

Not sure if I'll ever find the time, but I'd like to make the org-parser less useful for Emacs users. The idea is to write an org exporter which produces pandoc's AST JSON format; all Emacs Org settings would be respected that way, the detour through pandoc's parser would no longer be necessary, and remaining parser incompatibilities wouldn't matter for users exporting from Emacs through pandoc. Well, some day...
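
For reference, this is roughly how to look at the JSON AST that such an exporter would have to emit, here obtained by the current route through pandoc's own Org parser. The file name and Org content are made up; the exact JSON depends on the installed pandoc version.

```shell
# Hypothetical Org file with a title, a TODO heading and some emphasis.
cat > notes.org <<'EOF'
#+TITLE: Notes

* TODO Write the exporter
Some /emphasized/ text.
EOF

# Print pandoc's JSON representation of the document, if pandoc is installed.
if command -v pandoc >/dev/null 2>&1; then
  pandoc -f org -t json notes.org
else
  echo "pandoc not found; skipping conversion"
fi
```

The same JSON can be fed back in with `pandoc -f json`, which is the entry point an Emacs-side exporter could target.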

tikej · on Oct 26, 2020

That will be great. Org's greatest power is also its weakness: its coupling with Emacs. I mean it's great in all aspects except getting other people to use it.

Pandoc makes it possible/bearable to interact with the rest of the world (I'm in the process of moving more things to org).

Being able to export directly to pandoc's AST JSON will probably make it possible to avoid using other programs to edit content at all! I'll wait for this day to come; perhaps I'll even learn enough Elisp to contribute until then. ;)

tome · on Oct 27, 2020

> There seem to be not so many haskell applications that succeed to the point where they are of general use, as in not simply useful to programmers doing programming (probably in Haskell) At least this is a frequent observation about Haskell and one I've made myself.

Yes, repeatedly, and I'd love to know why you think it matters and what it is indicative of!

amelius · on Oct 25, 2020

Perhaps performance plays a role; transforming documents is usually not a bottleneck (unless you are running some server farm).

Also transforming documents seems like a task well suited to functional languages.

mb2100 · on Oct 27, 2020

I heard somebody say "Haskell people tend to write libraries, Rust people command-line tools". Pandoc is the exception that proves the rule ;-)

deskamess · on Oct 25, 2020

Are there any filters/plugins that could create a good workflow for converting a pdf that is multiple pages of very clear text images? Think of each page having a few printed multiple choice questions. Is there an easy way to get it into a text document?

Some command (or commands) that can be wrapped in a script:

> convert2txtViaOCR.sh -i input.pdf -o output.txt

Thanks.

hadley · on Oct 26, 2020

Could you shoot me an email? I’m always on the lookout for pandoc freelancers.

nwaheed · on Oct 25, 2020

I want to use it in a commercial product; is that allowed?

dwheeler · on Oct 25, 2020

I presume you mean a proprietary license. Probably yes, you just have to obey the license. The Linux kernel and git are also GPL. In general, if you're not linking it into your software you're fine, but see the license for details.

Under US law at least, open source software is commercial: https://dwheeler.com/essays/commercial-floss.html

tarleb · on Oct 25, 2020

Pandoc is licensed under the GPL version 2 or later. I know of a couple of companies where pandoc is used in proprietary systems server-side. IANAL, so best to consult one for your specific use case.

tarleb · on April 13, 2020

Pandoc can even free you of the second step by using WeasyPrint as PDF engine:

    pandoc --pdf-engine=weasyprint -t html …

tarleb · on April 12, 2020

Fully free and open source note-taking app with a similar focus: https://zettlr.com

gexla · on April 12, 2020

Is this spam? You created an account just to post this link? Did you read the article?

I looked through the features and this appears to be the same dime-a-dozen-me-too app which doesn't have the one feature everyone loves about Roam.

Cheers to the person who built this. I don't want to crap on anyone's work. But this isn't at all like Roam from what I can see.

tarleb · on April 12, 2020

Indeed, this shouldn't have been a top comment. It is more a reply to the `org-roam`. Also no, I'm not involved in the development of that app, I just happen to like it.

BTW, if your intention is not to crap on anyone's work, maybe don't call something a dime-a-dozen app?