> If the Zba extension is present, sh3add.uw is a single instruction for zero-extending idx from 32 bits to 64, multiplying it by sizeof(uint64_t), and adding it to slots.
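For anyone who doesn't read RISC-V daily, here is a rough C model of what that one instruction computes. The function and parameter names are my own (hypothetical); only the arithmetic comes from the Zba definition of sh3add.uw.

```c
#include <stdint.h>

/* Sketch of sh3add.uw rd, rs1, rs2:
 * zero-extend the low 32 bits of rs1, shift left by 3
 * (i.e. multiply by sizeof(uint64_t)), then add rs2. */
static uint64_t sh3add_uw(uint64_t rs1, uint64_t rs2) {
    return rs2 + ((uint64_t)(uint32_t)rs1 << 3);
}

/* The source-level pattern it covers: the address of slots[idx]
 * with a 32-bit unsigned index (hypothetical helper name). */
static uint64_t *slot_address(uint64_t *slots, uint32_t idx) {
    return &slots[idx];
}
```

Whatever sits in the upper 32 bits of rs1 is ignored, which is what lets the compiler skip a separate zero-extension before the address computation.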
Yay, we've got an equivalent of the SIB byte but as three (six?) separate opcodes. Well, sub-opcodes.
It's a shame though that the Zcmp extension didn't get into RVA23 even as an optional extension.
> You wouldn't want an instruction with up to 13 destinations in high performance designs anyways.
Why not? Code density matters even in high-performance designs although I guess the "millicode routines" can help with that somewhat. Still, the ordering of stores/loads is undefined, and they are allowed to be re-done however many times, so... it shouldn't be onerous to implement? Expanding it into μops during the decoding stages seems straightforward.
> Expanding it into μops during the decoding stages seems straightforward.
I wouldn't say so, because if you want to be able to crack an instruction into up to N μops, the second instruction could now land in any slot from the 2nd to the (1+N)th, and you have to build huge shuffle hardware to support this.
Apple, for example, can only crack instructions that generate up to 3 μops at decode (or before rename); anything beyond that needs to be microcoded and stalls decoding of other instructions.
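To make the slot argument concrete, here is a toy C model (not how any real decoder is built; `first_uop_slot` and its parameters are names I made up). Each instruction's first μop lands at the running sum of the μop counts of the instructions before it, so a single wide crack shifts everything after it.

```c
#include <stddef.h>

/* Toy model: uop_count[i] is how many μops instruction i cracks into.
 * slot[i] receives the 0-based output slot of instruction i's first μop,
 * i.e. the exclusive prefix sum of the earlier counts. */
static void first_uop_slot(const unsigned *uop_count, size_t n, unsigned *slot) {
    unsigned next = 0;
    for (size_t i = 0; i < n; i++) {
        slot[i] = next;
        next += uop_count[i];
    }
}
```

With counts {1, 1, 1, 1} the second instruction starts in slot 1; with {13, 1, 1, 1} it starts in slot 13. Hardware has to resolve that position for every decode lane in the same cycle, which is where the shuffle network comes from.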