There's plenty of literature on the topic, but you can start with "A fork() in the road" [1], which explains why this Unix feature is long past its best-by date. Another good read is "Dot Dot Considered Harmful" [2]. There are other papers on features that have aged badly, such as signals, but I don't have them on hand.
It's interesting, and I've experienced slow forks myself, which led me to use a tiny companion process to execute programs (before spawn arrived).
I have to say I hate CreateProcess more for taking a string rather than an array of string pointers to arguments like argv. This always made it extra difficult to escape special characters in arguments correctly.
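To make the escaping problem concrete, here is a small sketch in Python. It leans on `subprocess.list2cmdline`, a helper that ships with CPython's `subprocess` module but is not part of its advertised public API, so treat it as an illustration rather than something to build on. It joins an argv-style list into one string using the quoting rules of the Microsoft C runtime. The wrapper name `windows_command_line` and the sample arguments are made up for this example.

```python
import subprocess


def windows_command_line(argv):
    """Join an argv-style list into the single string CreateProcess expects.

    Uses the Microsoft C runtime quoting rules as implemented by CPython's
    subprocess.list2cmdline (not an officially public helper).
    """
    return subprocess.list2cmdline(argv)


if __name__ == "__main__":
    # Hypothetical arguments containing a space, embedded quotes,
    # and a trailing backslash, which are the usual troublemakers.
    argv = ["tool.exe", "hello world", 'say "hi"', "C:\\dir with space\\"]
    print(windows_command_line(argv))
```

Embedded quotes come out backslash-escaped, and a trailing backslash inside a quoted argument gets doubled so it doesn't swallow the closing quote. Note that these rules only hold for programs that parse their command line the way the C runtime does; with an argv array none of this would be needed.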
Another example is the select() API: it’s still in use, but its limits (notably the fixed FD_SETSIZE cap on descriptor numbers) are no longer adequate.
Another example is the ioctl() API for communicating with device drivers. It technically works, but marshaling huge APIs like V4L2 or DRM through a single system call is less than ideal: https://lwn.net/Articles/897202/
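A quick way to see one of those limits from Python on a Unix-like system: `select.select` rejects any descriptor number that doesn't fit in the fixed-size `fd_set` (FD_SETSIZE, commonly 1024) with a `ValueError`, before it even checks whether the descriptor is open. The helper name `select_accepts` is made up for this sketch, and the behaviour described is CPython on Unix-like platforms; Windows handles fd sets differently.

```python
import select


def select_accepts(fd):
    """Return True if the descriptor *number* is usable with select()."""
    try:
        select.select([fd], [], [], 0)
    except ValueError:
        # The number is too large to be represented in an fd_set.
        return False
    except OSError:
        # The number fits, but the descriptor is not open (EBADF).
        return True
    return True


if __name__ == "__main__":
    print(select_accepts(0))        # a small descriptor number
    print(select_accepts(10**6))    # far beyond any common FD_SETSIZE
```

poll() takes an array of descriptors instead of a bitmap, so it has no equivalent ceiling on the descriptor value itself.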
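For anyone who hasn't used it, this is roughly what the ioctl() calling convention looks like, sketched in Python on Linux: one request code plus an opaque buffer whose layout both sides just have to agree on. `FIONREAD` is a real and very small request; the helper name `bytes_waiting` is made up. Something like V4L2 or DRM pushes hundreds of requests with far more complex structs through this same shape.

```python
import fcntl
import struct
import termios


def bytes_waiting(fd):
    """Ask the kernel how many unread bytes are queued on fd (FIONREAD)."""
    # The kernel writes a C int into the buffer; from the caller's side
    # it is just an untyped blob of the right size.
    reply = fcntl.ioctl(fd, termios.FIONREAD, struct.pack("i", 0))
    return struct.unpack("i", reply)[0]
```

Nothing in the signature says what the buffer should contain for a given request, which is the part that scales badly.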
Speaking of select(), a while ago I got a PR merged into SerenityOS [1] that removed it from the kernel and reimplemented it as a compatibility shim on top of poll() inside the C library.
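To give an idea of what such a shim involves, here is a minimal sketch in Python of a select()-style function built on poll(). This is not the SerenityOS code (that lives in their C library); the function name `select_via_poll` is made up, and the event mapping follows what Linux does internally as far as I know, so take it as an assumption.

```python
import select


def select_via_poll(rlist, wlist, xlist, timeout=None):
    """select()-style interface implemented on top of poll()."""
    # Merge the three select() sets into one event mask per descriptor.
    masks = {}
    for fd in rlist:
        masks[fd] = masks.get(fd, 0) | select.POLLIN
    for fd in wlist:
        masks[fd] = masks.get(fd, 0) | select.POLLOUT
    for fd in xlist:
        masks[fd] = masks.get(fd, 0) | select.POLLPRI

    poller = select.poll()
    for fd, mask in masks.items():
        poller.register(fd, mask)

    # select() takes seconds; poll() takes milliseconds (None blocks).
    timeout_ms = None if timeout is None else int(timeout * 1000)
    ready = dict(poller.poll(timeout_ms))

    # Split poll()'s per-descriptor events back into three result lists.
    # Hangup and error conditions count as "readable"/"writable" so that
    # the caller's next read() or write() surfaces them.
    read_bits = select.POLLIN | select.POLLHUP | select.POLLERR
    write_bits = select.POLLOUT | select.POLLERR
    readable = [fd for fd in rlist if ready.get(fd, 0) & read_bits]
    writable = [fd for fd in wlist if ready.get(fd, 0) & write_bits]
    exceptional = [fd for fd in xlist if ready.get(fd, 0) & select.POLLPRI]
    return readable, writable, exceptional
```

Calling `select_via_poll([r], [w], [], 0)` on the two ends of a pipe that has data in it reports `r` as readable and `w` as writable, same as the real thing would.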
You can shove some of the minor cruft from Unix out to the side even on a Unix-like system, but you can't get rid of it all this way.
Well, it's the best design that was implemented inside SerenityOS when I contributed this, as mentioned inside the PR. The event loop still used select() at the time, although it was migrated to poll() a couple of months ago [1].
Polling mechanisms that keep track of sets of file descriptors in-kernel are especially useful when there's a large number of them to watch, because with poll() the kernel has to copy the whole set from userspace on every invocation. Given that SerenityOS is focused on being a Unix-like workstation operating system rather than a high-performance server operating system, there are usually not many file descriptors to poll at once in that context. It's possible that poll() will adequately serve their needs for a long time.
That PR was an exercise in reducing unnecessary code bloat in the kernel. It wasn't a performance optimization.
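The difference in shape is easy to show with Python's `selectors` module, which on Linux normally picks epoll and on the BSDs and macOS kqueue (elsewhere it falls back to poll or select, so the in-kernel part is an assumption about the platform). The interest set is registered once, and each wait call passes only a timeout. The helper name `wait_rounds` is made up for this sketch.

```python
import selectors


def wait_rounds(read_fds, rounds, timeout=0):
    """Register descriptors once, then wait on them several times."""
    sel = selectors.DefaultSelector()
    # One-time registration: with epoll/kqueue the kernel keeps this set.
    for fd in read_fds:
        sel.register(fd, selectors.EVENT_READ)

    results = []
    try:
        for _ in range(rounds):
            # Each wait passes only a timeout, not the descriptor set.
            ready = sel.select(timeout)
            results.append(sorted(key.fd for key, _events in ready))
    finally:
        sel.close()
    return results
```

With poll() the equivalent loop would hand the full descriptor array to the kernel on every iteration, which is the cost that only starts to matter at large descriptor counts.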
Evidence?