I found sokuza-kanren [0] to be a wonderful introduction. It starts with the absolute basics and builds into a minimal logic system through a series of well-documented steps.
I think Python is the most common and simplest language that the majority of devs would understand for learning the concepts, but it is likely too slow for practical applications.
One of the things miniKanren has going for it is that tons of folks have built little embedded implementations in whatever your favorite host language is. For instance, here's https://web.archive.org/web/20211205175513id_/https://erik-j... a walkthrough of how you'd implement the basics in Python. Comes with an explanation.
> loops like this are easy for compilers to analyze, in a way that makes them not representative of real code
Which makes it a perfectly fine benchmark to measure whether a particular compiler implements these optimisations. The benchmark also highlights fine implementation details. I did not know about Dart's interrupt checks, for instance.
I see these microbenchmarks as genuinely useful: I can analyse them and the logic behind them, and apply the results to interpreter design. Consider [0] for example. Any sane compiler would do this kind of optimisation, but I've seen only one production interpreter (daScript) doing it.
Actually, when I first saw this, I set out to investigate. I managed to reproduce the same performance in an isolated environment, and I have been sitting on a post about how this works for quite some time. Hopefully, I will have time to finish it this weekend.
Yes, it is that fast.
Long story short, the trick is a completely novel way of implementing interpreters.
It is tree-based, and I should stress that the tree is not the AST but a highly specialized tree-based representation produced from the AST. A node is basically equivalent to an "instruction" in a bytecode-based interpreter. The usual arguments against tree-based interpreters do not apply: there is no pointer-chasing, since the nodes are packed tightly and stay in the CPU cache. Dispatch overhead is just a virtual function call, so it should be comparable to a computed-goto-based interpreter.
The nodes operate on machine-native primitive types, and values are passed between nodes in registers most of the time - oftentimes the interpreter doesn't even need to touch main memory. There is no encoding or decoding of values and no type checking during evaluation. The interpreter even goes to great lengths to avoid conversion of values in most paths, i.e. if your code operates on uint32_ts, they will be passed around as such until a node requires them to be cast, effectively creating typed execution paths.
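To make that concrete, here is a minimal sketch in C++ of what such a node design could look like. This is my own simplification, not the actual implementation - names like NodeU32 and ExecState are invented - but it shows the key points: nodes are typed, dispatch is one virtual call, and exec() returns a machine-native uint32_t that can stay in a register.

    #include <cstddef>
    #include <cstdint>
    #include <memory>
    #include <vector>

    // Hypothetical names, not the real engine: each node is one "instruction",
    // exec() is one virtual call, and the result is an untagged uint32_t.
    struct ExecState {
        std::vector<uint32_t> locals;   // machine-native slots, no boxing
    };

    struct NodeU32 {
        virtual ~NodeU32() = default;
        virtual uint32_t exec(ExecState& s) const = 0;
    };

    struct ConstU32 : NodeU32 {
        uint32_t value;
        explicit ConstU32(uint32_t v) : value(v) {}
        uint32_t exec(ExecState&) const override { return value; }
    };

    struct LocalU32 : NodeU32 {
        std::size_t slot;
        explicit LocalU32(std::size_t i) : slot(i) {}
        uint32_t exec(ExecState& s) const override { return s.locals[slot]; }
    };

    struct AddU32 : NodeU32 {
        std::unique_ptr<NodeU32> lhs, rhs;
        AddU32(std::unique_ptr<NodeU32> a, std::unique_ptr<NodeU32> b)
            : lhs(std::move(a)), rhs(std::move(b)) {}
        uint32_t exec(ExecState& s) const override {
            // Operands are already uint32_t: no type check, no unboxing.
            return lhs->exec(s) + rhs->exec(s);
        }
    };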
Before interpretation starts there is also a "compilation" phase which performs a ton of optimizations, of which the most interesting, I think, is node fusion. As you may guess, it replaces smaller nodes with larger and more specialized ones. In my tests I implemented it for a naive AST-walker-like interpreter and got a 3x performance improvement.
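Again, only my guess at the general idea, reusing the NodeU32/LocalU32/ConstU32/AddU32 types from the sketch above: a fused node collapses a common shape like Add(Local, Const) into a single dispatch, and a small bottom-up rewrite pass installs it during the "compilation" phase.

    // A fused node collapses Add(Local, Const) into a single dispatch.
    struct AddLocalConstU32 : NodeU32 {
        std::size_t slot;
        uint32_t value;
        AddLocalConstU32(std::size_t i, uint32_t v) : slot(i), value(v) {}
        uint32_t exec(ExecState& s) const override {
            return s.locals[slot] + value;   // one virtual call instead of three
        }
    };

    // Bottom-up rewrite pass run during the "compilation" phase.
    std::unique_ptr<NodeU32> fuse(std::unique_ptr<NodeU32> n) {
        if (auto* add = dynamic_cast<AddU32*>(n.get())) {
            add->lhs = fuse(std::move(add->lhs));
            add->rhs = fuse(std::move(add->rhs));
            auto* l = dynamic_cast<LocalU32*>(add->lhs.get());
            auto* c = dynamic_cast<ConstU32*>(add->rhs.get());
            if (l && c)
                return std::make_unique<AddLocalConstU32>(l->slot, c->value);
        }
        return n;
    }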
I can go on but I better finish the post.
I would also add that writing an interpreter in this style is far simpler than writing even the simplest JIT, but it does impose certain requirements on the language itself.
All in all, the design is nothing short of genius.
Interesting, thanks for the info. So it seems that not needing the type checks/conversions of a dynamic language has a positive influence on performance, though I would expect significantly less than a factor of two. The specialized AST means the interpreter is closer to a bytecode interpreter than to an AST interpreter and doesn't suffer from the drawbacks of the latter, but that presumably is not responsible for the measured speed-up (it merely avoids a slow-down). The mentioned optimizations must then be responsible for the speed-up, but LuaJIT also has many optimizations.
Could be dynamic frequency scaling. To minimize its impact when benchmarking, one can pin the process to a single core and warm it up before running the benchmark.
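Roughly like this - a Linux-specific C++ sketch; the 300 ms warm-up budget and core 0 are arbitrary choices:

    // Pin to one core, then spin briefly so the core reaches its steady-state
    // frequency before the measured section starts.
    #include <sched.h>     // CPU_ZERO/CPU_SET need _GNU_SOURCE; g++ defines it by default
    #include <chrono>
    #include <cstdio>

    static void pin_to_core(int core) {
        cpu_set_t mask;
        CPU_ZERO(&mask);
        CPU_SET(core, &mask);
        sched_setaffinity(0, sizeof(mask), &mask);   // 0 = current process
    }

    volatile unsigned sink;  // keeps the warm-up loop from being optimized away

    int main() {
        pin_to_core(0);
        auto start = std::chrono::steady_clock::now();
        while (std::chrono::steady_clock::now() - start < std::chrono::milliseconds(300))
            sink = sink + 1;
        std::puts("pinned and warmed up; run the benchmark now");
        // ... time the actual benchmark here ...
    }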
That's the point. This hand waving allows the user to build more complex systems.
The cost is efficiency, but make no mistake: not thinking about the heap allows people to build much more complex programs. It's a trade-off between complexity and efficiency.
Then could it conversely be argued that not worrying about the stack (tail recursion) allows people to build more complex programs? Probably depends on which is more worrisome on average, heap or stack.
Au contraire. It is a tradeoff between application programmer effort and compiler writer effort. Amortization implies it is better to have an extremely advanced IR optimizer.
Good question! Off the top of my head: non-opinionated memory management, nice syntax (and meta-semantics), strong static typing, an "interpreted with the possibility of AOT compilation" execution model, and easy, performant native interop. Let me expand on the points in order.
Non-opinionated memory management means that I am in ultimate control of each allocation the language makes. Even if it requires GC, it should allow me to redefine malloc/free and sandbox it into a memory region of my choosing (like Lua). It should not allocate memory willy-nilly and free it as it pleases, especially if it aims for a "systems language" role.
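For reference, this is roughly what that looks like with Lua's C API: lua_newstate takes a user-supplied allocator, so every allocation the VM makes goes through one function you control. The budget-enforcing allocator below is just my illustrative sketch (link against liblua):

    #include <cstdlib>
    extern "C" {
    #include <lua.h>
    #include <lualib.h>
    #include <lauxlib.h>
    }

    struct Budget { size_t used, limit; };

    // lua_Alloc callback: behaves like realloc/free, but refuses to let the VM
    // grow past a fixed budget. Lua raises a memory error when we return NULL.
    static void* budgeted_alloc(void* ud, void* ptr, size_t osize, size_t nsize) {
        Budget* b = static_cast<Budget*>(ud);
        if (nsize == 0) {                       // acts as free()
            if (ptr) b->used -= osize;
            std::free(ptr);
            return nullptr;
        }
        size_t old = ptr ? osize : 0;           // osize is a type tag when ptr == NULL
        if (b->used - old + nsize > b->limit)
            return nullptr;
        void* p = std::realloc(ptr, nsize);
        if (p) b->used = b->used - old + nsize;
        return p;
    }

    int main() {
        Budget budget{0, 1 << 20};              // cap the whole VM at 1 MiB
        lua_State* L = lua_newstate(budgeted_alloc, &budget);
        luaL_openlibs(L);
        luaL_dostring(L, "print('hello from a memory-capped Lua')");
        lua_close(L);
    }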
Nice syntax is the most opinionated point of mine. It should definitely be C-like and not, for example, Python-like or Lisp-like. After all, everyone can read JavaScript. I like the idea of extending C syntax with first-class blocks (like Ruby) and with semantic macros and quoting like Lisp. C has a tradition of making language constructs first-class - that was the motivation for varargs - but its ability to create new language constructs is flawed. You can #define a "foreach" with "for"-like syntax, but it still falls short compared to what proper semantic macros can achieve. It would be nice to be able to define a "foreach" as a hygienic macro with a quote.
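For comparison, here's the kind of preprocessor "foreach" I mean (my own sketch, in C++ for brevity): it works, but only on real arrays, and it isn't hygienic - the iterator name is pasted in verbatim and nothing protects against capture.

    #include <cstdio>

    #define countof(a) (sizeof(a) / sizeof((a)[0]))
    // Preprocessor "foreach": works only on real arrays whose size is known at
    // compile time, and the iterator name is injected with no hygiene at all.
    #define foreach(it, arr) \
        for (auto* it = (arr); it != (arr) + countof(arr); ++it)

    int main() {
        int xs[] = {1, 2, 3, 4};
        foreach(x, xs)
            std::printf("%d ", *x);
        std::printf("\n");
    }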
Typing is a non-question - I don't want to have to deal with dynamically-typed languages anymore. Extra points for local type inference. Strong static typing also enables a lot of performance enhancements down the line.
The execution model is the next salient point. My perfect language should have a REPL (I'm that spoiled by Ruby and RoR) and also be performant enough in release/production builds. That is why it should allow both live coding and ultra-optimized AOT builds right down to native machine code. In my opinion, the best way to do AOT compilation is to target C or C++ - the AOT compiler shouldn't have to implement any optimizations Clang or GCC already have. Maybe it could target LLVM IR instead.
Native interop should be zero-cost and ideally based on LLVM tooling. The performance cost of calling a C or C++ function should be negligible. As for the tooling, I don't actually want to write FFI interfacing code by hand; it can easily be generated automatically.
The only language I know of that actually ticks most of these boxes is daScript, but it is still in an early phase of development, and I don't particularly like its syntax.
[0] https://github.com/miniKanren/sokuza-kanren