Does anyone know if there is any research around designing a compiler + language that produces optimal code? Besides superoptimization, that is.
Specifically, in what way should a language be designed to aid the compiler in finding the optimal code?
For example, for matrix multiplication, why can't the compiler find the optimal code? Why are BLAS libraries written in assembly for each CPU architecture? Isn't this just a question of having a high-level CPU spec (how many ports, instruction latencies, throughput) and then using a constraint solver to find the optimal code?
Compiler optimization is an extremely hard problem, unsolved by our current standards. The compiler usually has less knowledge than the author of the code and has to make assumptions about the hardware the code will run on. Language design feeds heavily into the kinds of optimizations it is even possible to reason about.
So yes, there has been an enormous amount of research into the programming-language side of this equation, the compiler side, and the area in between: probably hundreds of thousands of research papers since the 1950s and 60s.
Compilers and automated systems, in the general case at least, still can't outcompete a domain expert who is familiar with the specific hardware being targeted, has a good profiler, and has the time to fiddle with the raw assembly.
With a typical workflow of “just compile this source code”, this is an impossible task, because the performance of an algorithm's implementation depends on the actual data it processes (think of how SQL engines plan queries based on table statistics). Profile-guided optimization already improves performance significantly, despite merely informing some heuristics. A truly optimal compiler would need to design algorithms for specific data sets (which is basically what people do when they profile code).
However, there are some takes beyond the “Algol 68 on a PDP-11” design of most languages. There's ispc, Intel's SPMD C compiler, which extends the language to make vectorization a first-class language feature rather than an auto-vectorization afterthought.
There is Halide, a language designed specifically for image processing, which separates the algorithm (what to compute) from the schedule used to achieve it (how to order the computation).
This strikes me as akin to asking why an algorithm can't find the lowest-cost path that visits a set of nodes, a.k.a. the traveling salesman problem.
A language that helps the compiler has the usual properties we already know: it provides as much information as possible through types, and it restricts the set of allowable expressions to make the search tractable (think CUDA, ownership, alias analysis, and so on).