Hacker News | IainIreland's comments

One clear use case for GC in Rust is for implementing other languages (eg writing a JS engine). When people ask why SpiderMonkey hasn't been rewritten in Rust, one of the main technical blockers I generally bring up is that safe, ergonomic, performant GC in Rust still appears to be a major research project. ("It would be a whole lot of work" is another, less technical problem.)

For a variety of reasons I don't think this particular approach is a good fit for a JS engine, but it's still very good to see people chipping away at the design space.


Would you plug Boehm GC into a first-class JS engine? No? Then you're not using this to implement JS in anything approaching a reasonable manner either.


It looks like Alloy's API was at least designed in such a way that the GC implementation can be swapped out fairly easily down the line, and I really hope they do, because Boehm GC, and conservative GC in general, is much too slow compared to state-of-the-art precise GCs.


It's not an implementation thing. It's fundamental. A GC can't move anything it finds in a conservative root. You can build partly precise hybrid GCs (I've built a few) but the mere possibility of conservative roots complicates implementation and limits compaction potential.

If, OTOH, Alloy is handle based, then maybe there's hope. Still a weird choice to use Rust this way.


We don't exactly want Alloy to have to be conservative, but Rust's semantics allow pointers to be converted to usizes (in safe mode) and back again (in unsafe mode), and this is something code really does. So if we wanted to provide an Rc-like API -- and we found reasonable code really does need it -- there wasn't much choice.

I don't think Rust's design in this regard is ideal, but then again what language is perfect? I designed languages for a long while and made far more, and much more egregious, mistakes! FWIW, I have written up my general thoughts on static integer types, because it's a surprisingly twisty subject for new languages https://tratt.net/laurie/blog/2021/static_integer_types.html


> We don't exactly want Alloy to have to be conservative, but Rust's semantics allow pointers to be converted to usizes (in safe mode) and back again (in unsafe mode), and this is something code really does. So if we wanted to provide an Rc-like API -- and we found reasonable code really does need it -- there wasn't much choice.

You can define a set of objects for which this transformation is illegal --- use something like pin projection to enforce it.


The only way to forbid it would be to forbid creating pointers from `Gc<T>`. That would, for example, preclude a slew of tricks that high performance language VMs need. That's an acceptable trade-off for some, of course, but not all.


Not necessarily. It would just require that deriving these pointers be done using an explicit lease that would temporarily defer GC or lock an object in place during one. You'd still be able to escape from the tyranny of conservative scanning everything.


Once you are generating and running your own machine code, isn't the safety of Rust generally out the window?


Cranelift does not use copy-and-patch. Consider, for example, this file, which implements part of the instruction generation logic for x64: https://github.com/bytecodealliance/wasmtime/blob/main/crane...

Copy-and-patch is a technique for reducing the amount of effort it takes to write a JIT by leaning on an existing AOT compiler's code generator. Instead of generating machine code yourself, you can get LLVM (or another compiler) to generate a small snippet of code for each operation in your internal IR. Then codegen is simply a matter of copying the precompiled snippet and patching up the references.

The more resources are poured into a JIT, the less likely it is to use copy-and-patch: you get more control and flexibility doing codegen yourself.

But see also Deegen for a pretty cool example of trying to push this approach as far as possible: https://aha.stanford.edu/deegen-meta-compiler-approach-high-...


I don't know how JSC handles it, but in SM `eval` has significant negative effects on surrounding code. (We also decline to optimize functions containing `with` statements, but that's less because it's impossible and more because nobody uses them.)


Last I saw (and I admit this is pretty dated), V8 was doing the same thing. At one point, try/catch in V8 would cause the surrounding method to be deoptimized.


Yeah, SM will compile functions with try/catch/finally, but we don't support unwinding directly into optimized code, so the catch block itself will not be optimized.


JSC will still JIT optimize functions that use eval.

It’s true that there are some necessary pessimizations, but nothing as severe as failing to optimize the code at all.


This isn't about languages; it's about hardware. Should hardware be "higher-level" to support higher level languages? The author says no (and I am inclined to agree with him).


this


This is really impressive analysis.


This doesn't read as AI-generated to me at all.

The prose isn't polished enough to be AI. AI generation is unlikely to produce missing spaces like "...which are not readable to humans.SDB uses eBPF ...", or grammatical inaccuracies like "Ensuring Fully Correctness".

As for the data race thing, it seems to me that there's a pretty clear distinction between rbspy's approach (as described in reference 1) and this blog post. rbspy is walking the native stack, which occasionally fails. SDB seems to be looking at Ruby's internals instead, and has some sort of generation-number design to identify cases where there was a data race.

Beyond that, this post just absolutely sounds like what somebody would write if they were trying to describe in prose why they think their multi-threaded code is correct, especially the "Scanning Stacks without the GVL" section.


Copy and paste all four lines at once.


Oops, haha. Yes, now I am getting `1` as the result. Thanks for the tip.


asm.js solves this in the specific case where somebody has compiled their C/C++ code to target asm.js. It doesn't solve it for arbitrary JS code.

asm.js is more like a weird frontend to wasm than a dialect of JS.


No, if you just use the standard JavaScript cast-to-integer incantation, `|0`, V8 will optimize it. asm.js is valid JavaScript.


Sure, but that was essentially my point. If we're trying to compare HotSpot and V8 for similar input code, Java and asm.js seem closer than Java and full-blown JavaScript with its dynamic typing.


The main thing we're doing differently in SM is that all of our ICs are generated using a simple linear IR (CacheIR), instead of generating machine code directly. For example, a simple monomorphic property access (obj.prop) would be GuardIsObject / GuardShape / LoadSlot. We can then lower that IR directly to MIR for the optimizing compiler.

It gives us a lot of flexibility in choosing what to guard, without having to worry as much about getting out of sync between the baseline ICs and the optimizer's frontend. To a first approximation, our CacheIR generators are the single source of truth for speculative optimization in SpiderMonkey, and the rest of the engine just mechanically follows their lead.

There are also some cool tricks you can do when your ICs have associated IR. For example, when calling a method on a superclass, with receivers of a variety of different subclasses, you often end up with a set of ICs that all 1. Guard the different shapes of the receiver objects, 2. Guard the shared shape of the holder object, then 3. Do the call. When we detect that, we can mechanically walk the IR, collect the different receiver shapes, and generate a single stub-folded IC that instead guards against a list of shapes. The cool thing is that stub folding doesn't care whether it's looking at a call IC, or a GetProp IC, or anything else: so long as the only thing that differs is a single GuardShape, you can make the transformation.


> The main thing we're doing differently in SM is that all of our ICs are generated using a simple linear IR (CacheIR)

JSC calls this PolymorphicAccess. It’s a mini IR with a JIT that tries to emit optimal code based on this IR. Register allocation and everything, just for a very restricted IR.

It’s been there since I don’t remember when. I wrote it ages ago and then it has evolved into a beast.


Taking a quick look at the JSC code, the main difference is that CacheIR is more pervasive and load-bearing. Even monomorphic cases go through CacheIR.

The main justification for CacheIR isn't that it enables us to do optimizations that can't be done in other ways. It's just a convenient unifying framework.


We talk about this a bit in our CacheIR paper. Search for "IonBuilder".

https://www.mgaudet.ca/s/mplr23main-preprint.pdf

