> They definitely could have made avx512 instructions trigger a switch to p-cores
Not really, no. OS-level schedulers are complicated as is with only P vs E cores to worry about, let alone having to dynamically move tasks because they used a CPU feature (and then moving them back after they don't need them anymore).
> and honestly probably could have supported them completely by splitting the same way AMD does on Zen4 and Zen5 C cores.
The issue with AVX512 is not (just) that you need a very wide vector unit, but mostly that you need an incredibly large register file: you go up from 16 * 256 bit = 4096 bits (AVX2) to 32 * 512 bit = 16384 bits (AVX512), and on top of that you need to add a whole bunch of extra registers for renaming purposes.
> The issue with AVX512 is not (just) that you need a very wide vector unit, but mostly that you need an incredibly large register file
Not necessarily: you only need to behave as if you had that many registers. IMO it would have been way better if the E cores had supported AVX-512 with half of the registers not physically existing, spilled to the L2 cache instead.
Also Zen4C has AVX512 support while being only ~35% bigger than Gracemont (although TSMC node advantage means you should possibly add another 10% or so). This isn't really a fair comparison because Zen4c is a very differently optimized core than Intel's E cores, but I do think it shows that AVX-512 can be implemented with a reasonable footprint.
Or if Intel really didn't want to do that, they needed to get AVX-10 ready for 2020 rather than going back and forth on it for ~8 years.
They could enable it on P cores with a separate enablement check and then leave it up to the developer to schedule their code on a P core. I imagine Linux has some API to do that scheduling (because macOS does), not sure about Windows.
So introduce performance and efficiency profiles for threads at the OS level. Why should CPUs have to be heterogeneous with regard to the ISA and other details?
AHCI (which is how SATA is exposed) doesn't change any of this. The only thing affected is how the loaded OS has to talk to the disk controller. The real thing that loses space is the BPB (60 bytes or so, IIRC), because some BIOSes are broken and require it, and the MBR (but that's only a couple of bytes). At least [bootelf] manages to fit in (without a BPB or an MBR) with 128 bytes to spare, enough for a dummy BPB that makes all BIOSes happy.
Additionally, UEFI's reliability is... sketchy, as far as I know (by the classic logic of "if Windows doesn't use it, does it really matter?"). And GNU-EFI suffers from build-portability troubles, AFAIK.
If you choose AHCI on QEMU, it will require a partition table to be present on the disk, in addition to the magic value, before it recognizes it as a bootable disk. If you do not add the partition table, the disk is not treated as bootable.
Thanks for this comment. It makes me realize this is likely not an AHCI thing, but how SeaBIOS (or QEMU's flavor of it?) handles enumerating disks via AHCI rather than IDE.
If it uses the IDE controller, then it will recognize the boot disk. If I pick AHCI, I need to add the partition table.
UEFI reliability is sketchy, but really, BIOS is incredibly crap, so much more than UEFI.
Windows DOES use EFI/UEFI; how else would it boot on a system with EFI/UEFI firmware? It lets you do Secure Boot, edit EFI variables... Where do you get this classic logic from? (Maybe I am totally missing something, but they interface with it, and should thus use the spec? Even though they might not use GNU-EFI or EDK2, of course :P (EDK2 is still likely...))
Can you show a source for this claim? There exists this codepath in adp_open:
    int adp_open(struct inode *inode, struct file *filp)
    {
        if (current->comm[0] == 'X')
            return -EBUSY;
        return drm_open(inode, filp);
    }
but:
(a) Do you know what "adp" stands for? It stands for Apple Display Pipe, i.e. the Touch Bar.
(b) Back when I was daily-driving Asahi, I actually had to patch that to check for 's' instead of 'X', because sway dies on this exact thing too.
Can you provide any citations for this (extraordinary, if true) claim? My knowledge of operating systems (which is based on experience in designing toy ones, but also studying other operating systems) suggests this is not the case on at least windows, macOS and linux. Feel free to correct me if I am wrong however!
> Or maybe I'm all wrong. I'm not an OS dev, so please someone correct me.
Nope, you are completely right! A while (true) might slow down the system, but even that is not necessarily the case: I wrote a program to test this (based on the pseudocode in the readme) and my system is totally usable, with basically zero lag! This is the power of a well-written operating system that includes a mysterious concept called priorities. There is actually no discernible difference when running the program (other than battery use and temperature going up). In fact, I am running that program as I write this, because the difference is so negligible.
But understand that modern CPUs have hyperthreading, meaning more than one independent thread of execution. A while(true) on one doesn't affect the other(s). So you won't notice much - other apps keep running.
Not because it isn't a resource disaster. But because unless you measure something, you might not notice.
Oh! You did measure something. Battery use and temp. There you go. It's a disaster. Some kind of management of such irresponsible threads is definitely a good idea.
> But understand that modern CPUs have hyperthreading, meaning more than one independent thread of execution.
I know. But spinning up one infinite-loop thread per core is going to get code running on all hardware threads (and indeed, that can be confirmed in `htop`). Also, not all CPUs have hyperthreading: the Apple M1 (which is indeed my CPU) does not support it.
> Oh! You did measure something. Battery use and temp.
So I didn't do any scientific measurements, but the temperature difference doesn't seem to be very significant.
> It's a disaster.
But cooperative scheduling isn't any better. In fact, it's even worse! If an app never yields (and let's face it, no app is perfect), the whole system can easily hang. Developing for such a system without a VM sounds like a nightmare too, to be honest.
> Some kind of management of such irresponsible threads is definitely a good idea.
What kind of management do you propose? I mean, if there is nothing else running on the system, it's probably okay to just let them keep running (it's not like they are harming anything except battery life, and you might want something that can max out all CPU cores, like if you are running a compilation or a video game).