The other major issue with the Deref implementation is that `&mut` needs to be a...

ltungv · on July 31, 2024

Yeah true, this breaks all sorts of contracts that we have with Rust xD. But if mark-and-sweep is implemented correctly, then no reference is ever held across the GC. Though there's gonna be lots of pain debugging when you get it wrong, speaking from experience.

Do you have any resources about Rust inlining and the issues it might cause? I'd love to read more about that.

ekidd · on July 31, 2024

I would be really careful with those deref methods. They return references, not pointers, which means you need to follow the Rust rules:

- You can have any number of simultaneous readers,

- Or one writer and no readers.

- But if you ever break these rules, the world may burn.

Using unsafe and violating these rules is one of the cases where the Rust compiler can inflict worlds of misery: Incorrect code generation, CPUs seeing different versions of the same memory location, etc., depending on the exact context. Once you use "unsafe" and break the rules, Rust can be more dangerous than C. Rust even reserves the right to generate code using "noalias" (except doing so often triggers LLVM bugs, so it's usually turned off).

"Undefined behavior" means "anything at all can happen, and some of it is deeply weird and awful and counterintuitive."

You could enforce the borrowing rules at runtime by using std::cell::Cell in your heap objects, which is exactly what it exists for. Or you could package everything inside a tiny core module and audit it extremely carefully.

slaymaker1907 · on July 31, 2024

You would probably want to use RefCell instead of Cell. It allows you to safely upgrade into a &mut using only a constant reference to RefCell, but it dynamically verifies that it's actually safe using ref counting. The ref counting also isn't too expensive since it isn't atomic.
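A minimal sketch of the runtime check described above. `Obj` and the helper functions are hypothetical names made up for illustration; only `RefCell` and its `borrow`, `borrow_mut`, and `try_borrow_mut` methods come from the standard library.

```rust
use std::cell::RefCell;

/// Hypothetical heap object, for illustration only.
pub struct Obj {
    pub len: usize,
}

/// Mutates the object through a shared `&RefCell<Obj>`.
/// `borrow_mut` panics if any other borrow is still alive.
pub fn set_len(cell: &RefCell<Obj>, len: usize) {
    cell.borrow_mut().len = len;
}

/// Reads the object through a shared borrow.
pub fn get_len(cell: &RefCell<Obj>) -> usize {
    cell.borrow().len
}

/// Returns true if a second writer is refused while the first one is alive.
/// `try_borrow_mut` reports the conflict as an `Err` instead of panicking.
pub fn second_writer_is_rejected(cell: &RefCell<Obj>) -> bool {
    let _first = cell.borrow_mut();
    cell.try_borrow_mut().is_err()
}
```

The borrow counter lives in the `RefCell` itself, so a conflicting borrow is caught when it is attempted rather than silently becoming undefined behavior.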

Measter · on Aug 1, 2024

> (except doing so often triggers LLVM bugs, so it's usually turned off).

It's been enabled for a couple years now.

ltungv · on July 31, 2024

I'm aware of all the issues mentioned. But for this particular project, I simply don't care as long as it passes Lox's test suite xD. I went this path just to see how easy it is to get tripped by unsafe while knowing that there's a technique to get safety with Pin<T> that this can get refactored into. I actually implemented this with Cell and RefCell but didn't find that interesting.

chc4 · on July 31, 2024

Mark and sweep doesn't stop you from holding references across GC.

If you write e.g.

```
let obj = some_object();
let len: &mut usize = &mut obj.len; // deref_mut
trigger_gc();
use(*len);
```

then you have held a reference across the GC, and while it is marking and sweeping, the GC creates an aliased `&mut` to `len`.

Inlining was mentioned only because it causes function bodies to be splatted together, which puts code that is fine in isolation into a context where Rust may observe the UB: if `trigger_gc` were inlined, for example, Rust would have more information and code available at once, and might use that knowledge to apply optimizations that are invalid because you caused UB.

Actually, looking at your code the much larger issue is that nothing stops you from doing

```
let obj = some_object();
let obj2 = obj.clone();
let len1 = &mut obj.len;
let len2 = &mut obj2.len;
let n = *len1;
*len2 = 0;
println!("{n}"); // what does this print?
```

because your DerefMut impls just create unconstrained borrows from pointer types that you can copy. This is almost definitely going to cause you bugs in the future, since it opens you up to accidentally writing iterator invalidation and other issues (and even worse than C++ does, because C++ doesn't optimize references as heavily as Rust does borrows).

ltungv · on July 31, 2024

Yeah, it's a self-enforced contract that objects of type Gc<T> are only ever touched by the virtual machine, but I get your point.

adrian17 · on July 31, 2024

The issue is that it's a simple footgun as soon as you start adding non-trivial native methods (say, string methods, array.map etc). The only way to make sure that they don't exist is removing the DerefMut impl entirely.

It's not just that it's possible to break the code, but that the lack of any checks makes the breakage impossible to detect at either compile time or runtime, so the code can "appear to work".

One way to solve it is to remove the DerefMut implementation and have it work more like Rc: the user is forced to write `Gc<Cell<T>>` or `Gc<RefCell<T>>` if they want mutability. This solves the aliasing borrows issue at the cost of extra runtime checks with RefCell (and still doesn't prevent you from unsoundly calling `sweep()` while holding a Gc).

ltungv · on July 31, 2024

I implemented exactly what you were saying here (https://github.com/ltungv/rox/commit/6a611e7acb3b36d0a3a4376...). But where's the fun in that?

galangalalgol · on July 31, 2024

With a borrow checker, what does GC let you do that is more ergonomic? I have never used a borrow-checked and garbage-collected language, so I have no experience to consult.

ltungv · on July 31, 2024

The borrow-checker helps when you're writing Rust code. But when writing an interpreter for another language, you kinda have to support its semantics. In Lox, there are no move semantics, no borrowing, almost everything is a heap-allocated object, variables can alias, etc. Thus, you need a way to manage the memory of the implemented language without the borrow-checker. Here, the borrow-checker can help with implementing the GC safely and correctly, but I didn't utilize it.