
I thought this was the reasoning behind Itanium: the idea that scheduling could be worked out in advance by the compiler (probably profile-guided from tests or something like that), which would reduce the latency and silicon cost of implementations.
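
To make the idea concrete, here's a toy sketch (in Python, nothing like a real Itanium/EPIC compiler): the compiler greedily packs independent operations into fixed-width issue bundles ahead of time, so the hardware never has to discover the parallelism itself. The op/bundle representation is made up for illustration.

    # Toy compile-time scheduler: greedily pack independent ops into
    # fixed-width "bundles", the way an EPIC/VLIW compiler schedules
    # statically instead of the CPU scheduling dynamically at run time.
    # Assumes the dependence graph is acyclic.
    def bundle(ops, width=3):
        # ops: list of (dest, sources) in program order, e.g. ("t1", ["a", "b"])
        not_yet_computed = {d for d, _ in ops}
        bundles, pending = [], list(ops)
        while pending:
            current, deferred = [], []
            for dest, srcs in pending:
                # ready = none of its inputs come from an op that hasn't issued yet
                ready = all(s not in not_yet_computed for s in srcs)
                if ready and len(current) < width:
                    current.append((dest, srcs))
                else:
                    deferred.append((dest, srcs))
            for dest, _ in current:
                not_yet_computed.discard(dest)
            bundles.append(current)
            pending = deferred
        return bundles

    # bundle([("t1", ["a", "b"]), ("t2", ["c", "d"]), ("t3", ["t1", "t2"])])
    # -> bundle 0 issues t1 and t2 together, bundle 1 issues t3.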

However, it wasn't exactly a raging success; the amazing compiler tech that was predicted never materialised, I think. But maybe it is the right answer and the implementation was just wrong? I'm no CPU expert...



Itanium was a really badly designed architecture, which a lot of people skip over when they try to draw analogies to it. It was the worst of three worlds: it was big and hot like an out-of-order, it had the serial dependency issues of an in-order, and it had all the complexity of fancy static scheduling without that fancy scheduling actually working.

There have been a small number of attempts since Itanium, like NVIDIA's Denver, which make for much better baselines. I don't think those are anywhere close to optimal designs, or that they tried hard enough to solve the in-order issues at all, but they at least seem sane.


Would Itanium have been better served with bytecode and a modern JIT? Also, doesn't RISC-V kinda get back on that VLIW track with macro-op fusion, using a very basic instruction set and letting the compiler figure out the best way to order stuff to help the target CPU make sense of it?


Those are all very different things. You could probably argue that a JIT would have solved some of Itanium's compilation issues (for example, it would make it easier to decide where to do software data-miss handling), but I don't think it would have made the hardware fundamentally more sensible, or all that performant. RISC-V isn't really anything like VLIW; it is about as close to a traditional RISC as anything gets nowadays, and macro-op fusion is just a simple front-end trick that doesn't overly influence whether the back end is in-order or out-of-order (or whatever else).
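
For what that front-end trick amounts to, here's a toy illustration (the decoder representation is made up, and this isn't how any particular core implements it): scan adjacent decoded instructions and merge recognised pairs, such as lui+addi materialising a constant or slli+add forming a scaled address, into one internal op before the back end sees them.

    from collections import namedtuple

    # Minimal decoded-instruction record, just enough for the sketch.
    Insn = namedtuple("Insn", "op rd rs1 rs2 imm", defaults=(None, None, None))

    # Merge recognised adjacent pairs into one internal op.  lui+addi and
    # slli+add are commonly cited RISC-V fusion candidates; a real decoder
    # also checks things like whether the intermediate value is dead.
    def fuse(decoded):
        out, i = [], 0
        while i < len(decoded):
            a = decoded[i]
            b = decoded[i + 1] if i + 1 < len(decoded) else None
            if b and a.op == "lui" and b.op == "addi" and a.rd == b.rs1 == b.rd:
                out.append(("load_imm32", b.rd, (a.imm << 12) + b.imm))
                i += 2
            elif b and a.op == "slli" and b.op == "add" and a.rd == b.rs2 == b.rd:
                out.append(("shift_add", b.rd, b.rs1, a.rs1, a.imm))
                i += 2
            else:
                out.append(a)
                i += 1
        return out

Whether the ops coming out of this stage then go to an in-order or an out-of-order back end is a separate decision entirely.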


I heard that the desire to make x86 emulation performant on Itanium made things really bad, compared to a "clean" VLIW architecture.


I'm not sure what happened with Itanium.

I do think a big part of the problem is that people want to distribute binaries that will run on a lot of CPUs that are physically really different inside. But nowadays there's JIT compilation even for JavaScript, so you could distribute something like LLVM, or even (ecch) JavaScript itself, and have the "compiler scheduling" happen at installation time or even at program start.


You can't distribute LLVM for that purpose without defining a stable format like WebAssembly or SPIR-V.


Yes, you can if you use Bitcode, which has been a stable format since around 2015. It is possible to distribute an application as a pure Bitcode binary that can be statically translated into the underlying hardware ISA, unless the source code uses inline assembly – see https://lowlevelbits.org/bitcode-demystified/ for details.
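
As a minimal sketch of that workflow (file names made up, and assuming clang is available on the target machine and the app needs no special link steps), the installer can hand the shipped .bc file to clang and let it generate code for the local CPU:

    import subprocess

    def install(bitcode="app.bc", out="app"):
        # clang accepts LLVM bitcode as input; -march=native tunes codegen
        # for whatever CPU the installer is actually running on (x86 here;
        # other targets use -mcpu instead).
        subprocess.run(["clang", bitcode, "-O2", "-march=native", "-o", out],
                       check=True)

That assumes a single self-contained binary; real apps still need their normal link step against whatever libraries exist on the target.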


What about linking?


Itanium was designed decades ago when compilers were a lot worse, and had other issues. Maybe it's time for another attempt in that direction.



