> If the Zba extension is present, sh3add.uw is a single instruction for zero-ex...

camel-cdr · 2025-12-10T11:24:12 1765365852

> It's a shame though that Zcmp extension didn't get into RVA23 even as an optional extension

Zcmp is only for embedded applications without D support.

You wouldn't want an instruction with up to 13 destinations in high performance designs anyways.

If you want load/store pair, we already have that: you can just treat two adjacent 16-bit loads or stores as a single 32-bit instruction.
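
To make the fusion idea a bit more concrete, here is a rough Python sketch of the check a decoder might do on two adjacent compressed memory ops. The `MemOp` tuple, the `can_pair` name and the exact pairing conditions (same op, same base, ascending adjacent offsets, first load must not clobber the base) are my own assumptions for illustration, not something taken from a spec or a real core.

```python
from typing import NamedTuple

class MemOp(NamedTuple):
    """Hypothetical, already-decoded view of a 16-bit load or store."""
    op: str      # "ld" or "sd"
    reg: str     # destination (load) or source (store) register
    base: str    # base address register
    offset: int  # byte offset from the base

def can_pair(a: MemOp, b: MemOp, size: int = 8) -> bool:
    """Return True if a and b could be treated as one fused pair op.

    Assumed conditions: same operation, same base register, and b
    accesses the `size` bytes directly after a. For loads, the first
    load must not overwrite the base and both destinations must differ.
    """
    if a.op != b.op or a.op not in ("ld", "sd"):
        return False
    if a.base != b.base or b.offset - a.offset != size:
        return False
    if a.op == "ld" and (a.reg == a.base or a.reg == b.reg):
        return False
    return True

print(can_pair(MemOp("sd", "s0", "sp", 0), MemOp("sd", "s1", "sp", 8)))
print(can_pair(MemOp("ld", "a0", "a0", 0), MemOp("ld", "a1", "a0", 8)))
```

The second call is rejected because the first load overwrites the base register the second load depends on.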

Joker_vD · 2025-12-11T21:49:55 1765489795

> You wouldn't want an instruction with up to 13 destinations in high performance designs anyways.

Why not? Code density matters even in high-performance designs, although I guess the "millicode routines" can help with that somewhat. Still, the ordering of the stores/loads is undefined, and they are allowed to be re-done however many times, so... it shouldn't be onerous to implement? Expanding it into μops during the decoding stages seems straightforward.
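
As a sketch of what I mean by expansion, here is `cm.push` cracked into plain stores plus one stack-pointer adjustment, following the register walk in the Zcmp pseudo-code as I understand it (highest register first, starting just below `sp`). The function name and the string form of the μops are made up for illustration, and it does not check that the register list is one of the encodable ones.

```python
# Register walk order assumed from the Zcmp cm.push pseudo-code:
# highest-numbered saved register first, ra last.
PUSH_ORDER = ["s11", "s10", "s9", "s8", "s7", "s6", "s5",
              "s4", "s3", "s2", "s1", "s0", "ra"]

def expand_cm_push(reg_list, stack_adj, xlen=64):
    """Expand a cm.push into a list of simple uops (as strings).

    reg_list:  set of register names to save (not validated here).
    stack_adj: positive number of bytes the stack pointer is lowered by.
    """
    nbytes = 4 if xlen == 32 else 8
    store = "sw" if xlen == 32 else "sd"
    uops = []
    offset = -nbytes  # first store goes just below the current sp
    for reg in PUSH_ORDER:
        if reg in reg_list:
            uops.append(f"{store} {reg}, {offset}(sp)")
            offset -= nbytes
    # A single sp update after all the stores.
    uops.append(f"addi sp, sp, {-stack_adj}")
    return uops

for uop in expand_cm_push({"ra", "s0", "s1"}, 32):
    print(uop)
```

With the full list (`ra`, `s0`-`s11`) this produces 13 stores plus the `sp` update.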

camel-cdr · 2025-12-11T22:07:59 1765490879

> Expanding it into μops during the decoding stages seems straightforward.

I wouldn't say so. If you want to be able to crack an instruction into up to N μops, the second instruction could now land in any slot from the 2nd to the (1+N)th, and you have to build huge shuffle hardware to support this.

Apple, for example, can only crack instructions that generate up to 3 μops at decode (or before rename); anything beyond that needs to be microcoded and stalls decoding of other instructions.
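
The slot-placement problem can be sketched in a few lines of Python. Each instruction's first μop lands at the running sum of the μop counts before it, so the set of slots an instruction might start in grows with the crack limit N. The helper names are hypothetical and this is only a counting model, not a description of any real decoder.

```python
from itertools import accumulate

def start_slots(uop_counts):
    """0-based output slot of each instruction's first uop.

    This is an exclusive prefix sum over the per-instruction uop counts.
    """
    return list(accumulate(uop_counts, initial=0))[:-1]

def possible_start_slots(index, max_uops):
    """All 0-based slots where instruction `index` could start.

    Each earlier instruction emits between 1 and max_uops uops, so the
    start slot ranges from index to index * max_uops inclusive.
    """
    return range(index, index * max_uops + 1)

# One decode group: the first instruction cracks into 3 uops.
print(start_slots([3, 1, 2]))
# Candidate start slots for the second instruction (index 1).
print(len(possible_start_slots(1, 3)))
print(len(possible_start_slots(1, 13)))
```

With a limit of 3 the second instruction has 3 candidate slots; with a limit of 13 it has 13, and the range keeps widening for every later instruction in the group.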