When I was young and really didn't understand Unix, my friend and I were summer students at NBS (now NIST), and one fine afternoon we wondered what would happen if you ran fork() forever.
We didn't know, so we wrote the program and ran it.
This was on a PDP-11/45 running v6 or v7 Unix. The printing console (some DECWriter 133 something or other) started burping and spewing stuff about fork failing and other bad things, and a minute or two later one of the folks who had 'root' ran into the machine room with a panic-stricken look because the system had mostly just locked up.
"What were you DOING?" he asked / yelled.
"Uh, recursive forks, to see what would happen."
He grumbled. Only a late 70s hacker with a Unix-class beard can grumble like that, the classic Unix paternal geek attitude of "I'm happy you're using this and learning, but I wish you were smarter about things."
I think we had to hard-reset the system, and it came back with an inconsistent file system which he had to repair by hand with ncheck and icheck, because this was before the days of fsck and that's what real programmers did with slightly corrupted Unix file systems back then. Uphill both ways, in the snow, on a breakfast of gravel and no documentation.
Total downtime, maybe half an hour. We were told nicely not to do that again. I think I was handed one of the illicit copies of Lions Notes a few days later. "Read that," and that's how my introduction to the guts of operating systems began.
> ...a minute or two later one of the folks who had 'root' ran into the machine room with a panic-stricken look because the system had mostly just locked up.
It's kind of weird that, while root has always had e.g. 5% reserved disk space on the rootfs for emergencies, one thing no Unix has ever done is enforce a 5% CPU reservation for root so administrators can "talk over" a cascading failure. I think this is possible just recently in Linux with CPU namespacing, but it's still not something any OS does by default.
It's not specifically the lack of cpu timeslices that crowds out other programs, it's more like exhaustion of all the OS resources (process table fills up, file table fills up, memory runs out, swap death etc).
Sure, if you carefully made everything fork-bomb-resistant, then a CPU quota would be a part of it. Container systems use fork bombs as basic test cases.
I'm surprised that this wasn't one of the primary goals of cgroups: the ability to group "all userspace processes" into one cgroup, and then say that that cgroup can in sum only use so much CPU, so many processes, so many inodes, etc. You know, a control plane/data plane separation, without requiring hypervision.
It is. Cgroup provides limits for memory, CPU time. We already have other accounting mechanisms for processes/threads (rlimits) and for inodes and disk space (disk quota systems). We've had those for ages. I imagine there will be more work to integrate these various accounting mechanisms with cgroup as the work continues.
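As a rough illustration of the rlimit side of this, here is a minimal Python sketch that lowers the per-user process limit (RLIMIT_NPROC) for the current process and anything it forks afterwards. The helper name cap_process_limit and the value 4096 are made up for the example, and on Linux this limit is generally not enforced for root, so it only contains unprivileged fork bombs.

```python
import resource


def cap_process_limit(max_procs):
    """Lower the soft RLIMIT_NPROC to at most max_procs; never raise it.

    cap_process_limit is a hypothetical helper for illustration only.
    Returns the (soft, hard) pair in effect after the change.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    new_soft = max_procs
    # RLIM_INFINITY means "no limit", so only clamp against finite values.
    if hard != resource.RLIM_INFINITY:
        new_soft = min(new_soft, hard)
    if soft != resource.RLIM_INFINITY:
        new_soft = min(new_soft, soft)
    # Lowering the soft limit needs no privileges; the hard limit is untouched.
    resource.setrlimit(resource.RLIMIT_NPROC, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NPROC)


if __name__ == "__main__":
    print(cap_process_limit(4096))
```

Once the user already owns that many processes, further fork() calls fail with EAGAIN instead of filling the process table.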
If you care about such things, the normal method is to have a backup ssh running on a different port with realtime priority. It is not used at any other time, only when some process has gone runaway and you can't do anything else.
No write-up that I know of. I used it in systems I've made in the past. Some of our services were running at a realtime priority, and we needed a way to take care of such a system, mostly in development.
Most Linux distributions assign root processes a better scheduling priority than non-root processes, which should be good enough in most cases. Critical system processes also run at better priorities than other processes. It's not uncommon to see Linux users consciously decide on the priority of a process by using nice or renice.
Totally limiting the CPU utilization of a group of processes requires more overhead than changing the scheduling priority since you must actively account for the CPU usage. CPU cgroups should do just that though and in most cases the overhead should be acceptable.
In your comment's parent, I don't think raw CPU utilization was the issue since kabdib mentioned fork and it was in response to a post about fork failures. The problems caused by a fork bomb are not limited to CPU utilization, see: https://en.wikipedia.org/wiki/Fork_bomb
In any case, there will likely always be some system call you can abuse to totally exhaust some resource of the kernel.
> In any case, there will likely always be some system call you can abuse to totally exhaust some resource of the kernel.
If this is true, I would expect there to exist one or more articles entitled "how I brought down my Heroku host-instance" or something along those lines. Anyone got some links? :)
It would only be possible if a limit were enforced on all non-primary namespaces.
However something that has been /possible/ for a while (but not in practice done) would be to elevate root process priority over other processes. Probably not done due to daemons needing to run as root (which is decreasing as they're able to drop privileges these days).
Root has had the ability to assign negative nice values since long, long ago. Non-root users can only assign positive niceness. The range is -20 to +19.
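To see that asymmetry from a program, here is a small Python sketch. The function names current_niceness and try_adjust_niceness are invented for the example; the calls underneath are the standard os.getpriority and os.nice. It assumes a Unix-like system, and on Linux an unprivileged user may still be allowed a limited decrease if RLIMIT_NICE has been raised.

```python
import os


def current_niceness():
    """Return the niceness of the calling process."""
    return os.getpriority(os.PRIO_PROCESS, 0)


def try_adjust_niceness(delta):
    """Add delta to our niceness; return the new value, or None if refused.

    A positive delta (being nicer) is allowed for anyone. A negative delta
    normally needs root and otherwise raises PermissionError.
    """
    try:
        return os.nice(delta)
    except PermissionError:
        return None


if __name__ == "__main__":
    print("niceness now:", current_niceness())
    print("after asking for -1:", try_adjust_niceness(-1))
```

Run as an ordinary user, the second call typically returns None; run as root, it returns the lowered value.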
In theory this can give higher priority to a process, but if you cannot get into the run-queue at all (fork bomb), or the problem is in kernel space (e.g., I/O access, hang, or a kernel space loop), then it's not going to help you much.
And, sadly, most of the really hard hangs are kernel space. The general fix is to cut off all network requests/incoming jobs, powercycle, dig through logs, and try to shunt a future hang. (Sometimes just cutting incoming jobs will stop the hang, too)
On Linux, nice is not an absolute priority system.
In the old days, the Amiga operating system did use static absolute priorities for its multi-tasking. This meant that if a task with a priority of 1 wanted to use as much CPU as it wanted, then all tasks with a priority of 0 or below would be completely starved. This meant that you could boost a certain process (like, say, a CD writer) and get close to real-time behaviour. I was certainly writing coaster-free CDs on a much less powerful Amiga than a Linux box that constantly made coasters from buffer under-runs.
Linux, however, has virtual memory and "nice", which complicates matters. A process with a niceness of 19 will still take a small amount of CPU in the presence of another process with a niceness of -20. In the presence of a fork bomb, you may have a very large number of processes. If they all (by some miracle) have a niceness of 19, you still have very little CPU time left for a process with a normal or negative niceness. Infinity multiplied by a small number is still infinity. Real-time priorities are the only thing that will save you here.
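The arithmetic behind this can be sketched in a few lines of Python. This is a simplified model, not how the scheduler is implemented: it assumes a single CPU, every process always runnable, and CPU time divided in proportion to the per-nice weights from the kernel's sched_prio_to_weight table (only three entries are copied here). The function name cpu_share is made up for the example.

```python
# Weights for three nice levels, taken from the Linux kernel's
# sched_prio_to_weight table. Other levels are omitted for brevity.
NICE_WEIGHT = {-20: 88761, 0: 1024, 19: 15}


def cpu_share(my_nice, bomb_procs, bomb_nice=19):
    """Fraction of one CPU that a single process at my_nice receives when
    competing with bomb_procs always-runnable processes at bomb_nice.

    Simplified proportional-share model (an assumption of this sketch).
    """
    mine = NICE_WEIGHT[my_nice]
    others = bomb_procs * NICE_WEIGHT[bomb_nice]
    return mine / (mine + others)


if __name__ == "__main__":
    for n in (0, 100, 10_000, 1_000_000):
        print(f"{n:>9} bomb procs: "
              f"nice 0 gets {cpu_share(0, n):.4%}, "
              f"nice -20 gets {cpu_share(-20, n):.4%}")
```

Under this model a nice 0 shell holds most of the CPU against a hundred nice 19 processes, but its share drops below one percent against ten thousand of them, and even nice -20 fades as the count keeps growing.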
You also have the problem of being able to actually change the processes' nicenesses. That requires CPU time, which you no longer have. You would be better off sending a kill signal. You also have a race condition - you obtain (from the OS) a list of processes that are running that you want to renice or kill. By the time you have iterated through each one renicing or killing them, new processes have appeared.
For several years I was a sysadmin for the University of Texas computer sciences department. (This was much later than your story, though.) If I remember correctly, the operating systems class was usually taught in the spring and they got to exploring processes sometime in late March or early April. And for about two weeks, none of our generally available systems would have an uptime of more than a couple of days.
Sure, you could get in and kill a fork-bomb before it did anything bad. But two or three on the same machine? And when you've got a couple hundred machines? It was easier to just reboot and let the victims who were inconvenienced handle explaining to the guilty how what they did was bad.
Then there were the guys who would log into one machine in a lab, fork-bomb it, move to the next machine over and make a change to their program, fork-bomb that machine, and expect to iterate that process until they passed the assignment. Leaving a wake of pitifully flailing workstations behind. Ahh, good times.
Must have been V6; I recall V7 had patches to prevent this, at least to the extent that it wouldn't crater the whole machine. I haven't thought about using ncheck & icheck since fsdb showed up about BSD4.2 or thereabouts. I remember using adb as well to fix buggered filesystems back in the ancient days.
I remember well the day one of the elder neckbeards handed me my own photocopy of the Lions books. It was enlightenment in pure form.