Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> If the Zba extension is present, sh3add.uw is a single instruction for zero-extending idx from 32 bits to 64, multiplying it by sizeof(uint64_t), and adding it to slots.

Yay, we've got an equivalent of SIB byte but as three (six?) separate opcodes. Well, sub-opcodes.

It's a shame though that Zcmp extension didn't get into RVA23 even as an optional extension.





> It's a shame though that Zcmp extension didn't get into RVA23 even as an optional extension

Zcmp is only for embedded applications without D support.

You wouldn't want an instruction with up to 13 destinations in high performance designs anyways.

If you want load/store pair, we already have that, you can just interpret two adjacent 16-bit load or stores as a single 32-bit instruction.


> You wouldn't want an instruction with up to 13 destinations in high performance designs anyways.

Why not? Code density matters even in high-performance designs although I guess the "millicode routines" can help with that somewhat. Still, the ordering of stores/loads is undefined, and they are allowed to be re-done however many times, so... it shouldn't be onerous to implement? Expanding it into μops during the decoding stages seems straightforward.


> Expanding it into μops during the decoding stages seems straightforward.

I wouldn't say so, because if you want to be able to crack an instruction into up to N uops, now the second instruction could be placed in any slot from the 2nd to the 1+Nth and you now have to create huge shuffle hardware tk support this.

Apple for example can only crack instructions that generate up to 3 μops at decode (or before rename) anything beyond needs to be microcoded and stall decoding other instructions.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: