Are you sure the translators don't insert code necessary to maintain ordering? I would be shocked if most threaded code works when you throw out the x86 memory model. Managed runtimes like .NET definitely generate code for each target designed to maintain the correct memory model.
> You can also select multi-core settings, as shown here... These settings change the number of memory barriers used to synchronize memory accesses between cores in apps during emulation. Fast is the default mode, but the strict and very strict options will increase the number of barriers. This slows down the app, but reduces the risk of app errors. The single-core option removes all barriers but forces all app threads to run on a single core.
zamadatix interprets this as Microsoft saying that, by default, Windows on ARM runs x86 apps without emulating x86 TSO, and only turns on extra memory barriers via per-app compatibility settings. So if an app relies on TSO but isn't in Windows's compatibility database, it can crash or silently corrupt data.
Sure, but Windows on ARM has to run on many ARM processors, not one specific chip designed by MS. They could detect whether the processor has non-standard TSO support and use it when running x86 apps, but they still have to do something for x86 apps on a standard ARM processor.
An emulator would have zero issues; if it's a direct translation of the assembly (not an emulator), it'd need either hardware support - e.g. Apple's chips - or memory barriers.
The differences between ARM and x86 have been known for 15+ years; there is nothing new about this. Also, concurrency support is one of the major benefits of languages with a proper memory model - Java started it with the JMM [0].
The differences are very well known; it's just still an open problem how to run code built for a stronger memory model on systems with a weaker one at the highest possible performance without explicit hardware support (like Apple's choice of a TSO config bit). A binary compiled for TSO has erased any LoadLoad, LoadStore, and StoreStore barriers, and the emulator has to divine them. The heuristics there are still fraught with peril.
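To make the "erased barriers" point concrete, here is a minimal C++ sketch (my own illustration, not the output of any real emulator) of the classic message-passing idiom. Compiled for x86 it is just plain loads and stores, which TSO already keeps in order; the same sequence translated to ARM needs StoreStore/LoadLoad ordering that is no longer visible anywhere in the binary:

    #include <atomic>
    #include <cassert>
    #include <thread>

    // Shared state, written the way an x86 compiler would emit it:
    // plain (relaxed) stores and loads, because TSO already forbids the
    // StoreStore and LoadLoad reorderings that would break this idiom.
    std::atomic<int>  payload{0};
    std::atomic<bool> ready{false};

    void producer() {
        payload.store(42, std::memory_order_relaxed);  // plain x86 mov
        // TSO keeps these two stores in order; on ARM a StoreStore
        // barrier (dmb ishst) or a store-release would be needed here.
        ready.store(true, std::memory_order_relaxed);  // plain x86 mov
    }

    void consumer() {
        while (!ready.load(std::memory_order_relaxed)) {}  // plain x86 mov
        // TSO keeps these two loads in order; on ARM a LoadLoad barrier
        // (dmb ishld) or a load-acquire would be needed in between.
        int v = payload.load(std::memory_order_relaxed);   // plain x86 mov
        assert(v == 42);  // may fail on a weakly ordered target
    }

    int main() {
        std::thread t1(producer), t2(consumer);
        t1.join();
        t2.join();
    }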
The JVM absolutely did some great work walking this path, both in defining a memory model in the first place and in supporting that model on weak and strong hardware memory models. But the JMM was specifically designed to run cleanly on WMO platforms to begin with (early SPARC), so it doesn't face a lot of the same problems discussed here.
Any emulator that wants to be remotely performance-competitive will do dynamic translation (i.e. a JIT). In fact, ahead-of-time translation is not really feasible.
Memory models and the JVM are not really relevant when discussing how to run binaries built for a different architecture.
The memory models are relevant, as the translation/JIT/whatever has to take them into consideration. TSO is well known, and it's also well known how ARM's weaker memory model needs memory barriers to emulate TSO.
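For reference, the well-known conservative mapping looks roughly like this (a sketch of the general technique, not how any particular translator is implemented): treat every guest load as an acquire and every guest store as a release, which on ARMv8 becomes ldar/stlr and recovers TSO-equivalent ordering at the cost of extra stalls on hot paths:

    #include <atomic>

    std::atomic<int>  payload{0};
    std::atomic<bool> ready{false};

    // Conservative x86-to-ARM translation of a message-passing pair:
    // every store becomes a store-release and every load a load-acquire.
    // Store->load reordering is still allowed, which matches what TSO
    // itself permits.
    void producer_translated() {
        payload.store(42, std::memory_order_release);   // stlr
        ready.store(true, std::memory_order_release);   // stlr
    }

    int consumer_translated() {
        while (!ready.load(std::memory_order_acquire)) {}  // ldar
        return payload.load(std::memory_order_acquire);    // ldar
    }

A hardware TSO mode like Apple's exists precisely so the translator doesn't have to pay this cost on every memory access.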
If there is a JIT, I'd expect it to be able to add read barriers on memory locations allocated by another thread - including allocating bits in the pointers and masking them off on each dereference. If any block appears to be shared, the code that allocated it would need to be recompiled with StoreStore barriers; the reading side would need LoadLoad barriers, and so on. There are quite a few ways to deal with this case, aside from the obvious one - make the hardware compatible.
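A minimal sketch of that pointer-tagging idea (all names here, like kSharedBit and jit_load_u32, are hypothetical; it assumes allocations are at least 4-byte aligned so the low pointer bit is free to use as a flag):

    #include <atomic>
    #include <cstdint>

    // Low pointer bit marks "this allocation may be visible to another
    // thread", so the JIT only pays for an acquire load on tagged pointers.
    constexpr std::uintptr_t kSharedBit = 0x1;

    inline void* tag_shared(void* p) {
        return reinterpret_cast<void*>(
            reinterpret_cast<std::uintptr_t>(p) | kSharedBit);
    }

    inline bool is_shared(void* p) {
        return reinterpret_cast<std::uintptr_t>(p) & kSharedBit;
    }

    inline void* strip_tag(void* p) {
        return reinterpret_cast<void*>(
            reinterpret_cast<std::uintptr_t>(p) & ~kSharedBit);
    }

    // What a JIT-ed read of a 32-bit field might look like: a relaxed load
    // for (presumed) thread-local data, a load-acquire (LoadLoad/LoadStore
    // ordering) once the allocation has been flagged as shared.
    // C++20 std::atomic_ref performs an atomic access on plain storage.
    inline std::uint32_t jit_load_u32(void* tagged_ptr) {
        auto* slot = static_cast<std::uint32_t*>(strip_tag(tagged_ptr));
        std::atomic_ref<std::uint32_t> ref(*slot);
        return is_shared(tagged_ptr)
            ? ref.load(std::memory_order_acquire)
            : ref.load(std::memory_order_relaxed);
    }

The write side would be the mirror image: once an allocation gets tagged as shared, the code that writes to it is recompiled to use store-release (StoreStore ordering).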
Even if, in the end, it's not an easy feat to make up for the stronger memory model's guarantees, correctness should still be a prime goal of an 'emulator'.
It is relevant as an example of how to write a JIT for one memory model that will run on a different architecture with a different memory model. In other words, it's a known issue that has been dealt with successfully for quite a while now.
Ah well, back in the day, prior to the JMM, there was no notion of memory models at all, and no language had one; hence I referenced the original paper that started it all. The point was that this happened a long time ago, and there is nothing new about the current case.