Your computer is a distributed system (catern.com)
228 points by carlesfe on March 30, 2022 | 149 comments


The thing we are still missing is the distributed OS. Kubernetes only exists because Linux is missing the abstractions to do computation, discovery, message passing/IO, and instrumentation over multiple nodes. If you could do ps -A and see all processes on all nodes, or run a program and have it automatically execute on a random node, or if (grumble grumble) systemd unit files could schedule a minimum of X processes across N nodes, most of the K8s ecosystem would become redundant. A lot of the other components, like unified AuthZ for Linux, already exist, as does networking (WireGuard anyone?).


Plan 9 was designed in this way, but never took off.

Rob Pike:

> This is 2012 and we're still stitching together little microcomputers with HTTPS and ssh and calling it revolutionary. I sorely miss the unified system view of the world we had at Bell Labs, and the way things are going that seems unlikely to come back any time soon.


I think Rob is right to call out the problem, but he's being a bit rose-colored about Plan 9.

Plan 9 was definitely ahead of its time, but it's also a far cry from the sort of distributed OS we need today. "Everything is a remote POSIX file" ends up being a really bad abstraction for distributed computing. What people are doing today with warehouse-scale clusters indeed has a ton of layers of crap in there, and I think it's natural to yearn for sweeping that away. But there's no chance you could do that with P9 as it was designed.


"Everything is a file" originally referred to read and write as universal object interfaces. It's similar to Smalltalk's send/receive as an idealized model for object-based programming. Hierarchical filesystem namespaces for object enumeration and acquisition is tangential, though it often works well because most namespaces (DNS, etc) tend to be hierarchical. (POSIX filesystem semantics doesn't really figure into Plan 9 except, perhaps, incidentally.) Filesystem namespacing isn't quite as abstract, though (open, readdir, etc, are much more concrete interfaces), making impedance mismatch more likely.

The abstraction is sound. We ended up with TCP and HTTP instead of IL and 9P (and at scale, URLs instead of file descriptors), because of trust issues, but that's not surprising. Ultimately the interface of read/write sits squarely in the middle of all of them, and most others. Building a distributed system with different primitives at the core, for example send/receive, requires creating significantly stronger constraints on usage and implementation environments. People do that all the time, but in practice they do so by building atop the file interface model. That's what makes the "everything is a file" model so powerful--it's an interoperability sweet spot; an axis around which you can expect most large-scale architectures to revolve at their core, even if the read/write abstraction isn't visible at the point users (e.g. application developers) interact with the architecture.
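To make the "read/write sits in the middle" point concrete, here's a small illustrative sketch (an illustration in Python, not from the comment above): the same read()/write() file-descriptor calls drive a regular file, a pipe, and a socket. The path and buffer sizes are arbitrary, and it assumes a POSIX-ish system.

  # Illustration only: one interface (file descriptors + read/write),
  # three very different resources.
  import os
  import socket

  # 1. A regular file
  fd = os.open("/tmp/demo.txt", os.O_RDWR | os.O_CREAT | os.O_TRUNC)
  os.write(fd, b"hello from a file\n")
  os.lseek(fd, 0, os.SEEK_SET)
  print(os.read(fd, 64))
  os.close(fd)

  # 2. A pipe: two fds, same calls
  r, w = os.pipe()
  os.write(w, b"hello from a pipe\n")
  print(os.read(r, 64))
  os.close(r); os.close(w)

  # 3. A socket pair: a network-style endpoint, still just fds
  a, b = socket.socketpair()
  os.write(a.fileno(), b"hello from a socket\n")
  print(os.read(b.fileno(), 64))
  a.close(); b.close()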


A hierarchical namespace is fine, but the open/read/write/sync/close protocol on byte-based files is definitely inadequate. The constraints on usage you decry are in fact fundamental constraints of distributed computing that are at odds with the filesystem abstraction. And this is exactly what I was getting at in talking about rose-colored glasses with P9. It in no way is a replacement for something like Colossus or Spanner.


> P9 ... in no way is a replacement for something like Colossus or Spanner.

Colossus and Spanner are both proprietary so there's very limited info on them, but both seem to be built for very specialized goals and constraints. So, not really on the same level as a general system interface like 9P, which is most readily comparable to, e.g., HTTP. In Plan 9, 9P servers are routinely used to locally wrap connections to such exotic systems. You can even require the file system interface locally exposed by 9P to be endowed with extra semantics, e.g. via special messages written to a 'control' file. So any level of compatibility or lack thereof with simple *nix bytestreams can be supported.


Or proper graphics programming as well.


This describes more of a Single System Image [0] to me (Wikipedia includes Plan 9 as one, but considering it does not support process migration I find that moot). LinuxPMI [1] seems like a good idea, but it seems to be based on Linux 2.6, so you would have to heavily patch newer kernels. The only things that seem to support process migration with current software / that are still active are CRIU [2] (which doesn't support graphical/Wayland programs) and DragonflyBSD [3] (in their own words, very basic).

[0]: https://en.wikipedia.org/wiki/Single_system_image [1]: http://linuxpmi.org [2]: https://criu.org [3]: https://man.dragonflybsd.org/?command=sys_checkpoint&section...


I don't really see any reason to consider process migration a required feature of either a distributed OS or a single system image. Even on a single computer this isn't always practical or desirable (i.e. you can't 'migrate' a program running on your GPU to your CPU, and you can't trivially migrate a thread from one process to another either).

Not all units of computation are interchangeable, and a system that recognizes this and doesn't try to shoehorn everything down to the lowest common denominator actually gains some expressive power over a uniform system (else we would not have threads).


> This describes more of a Single System Image [0] to me

No, Plan 9 is not an SSI OS. The idea is that all resources are exposed via a single unified file-oriented protocol: 9P. All devices are files, which means all communication happens over fds, meaning you look at your computer like a patch bay of resources, all communicated with via read() and write(). e.g.:

  [physical disk]<-->[kernel: sd(3)]-----< /dev/sdE0/
  [audio card] <---->[kernel: audio(3)]--< /dev/audio
  [keyboard]-------->[kernel: kbd(3)]----< /dev/kbd
Looking above it looks like Unix but with MAJOR differences. First off, the disk is a directory containing partitions, which are just files whose size is the partition's size. You can read or write those files as you please. Since the kernel only cares about exposing hardware as files, the file system on a partition needs to be translated to 9P. We do this with a program that is a file server which interprets e.g. a FAT32 fs and serves it via 9P (dossrv(4)). Your disk-based file system is just a user-space program.

And since files are the interface, you can bind over them to replace them with a different service like mixfs(4). /dev/audio is like the old Linux OSS where only one program could open a sound card at a time. To remedy this on Plan 9 you run mixfs, which opens /dev/audio and then binds itself over /dev, replacing /dev/audio in that namespace with a multiplexed /dev/audio from mixfs. Now you start your window manager and the child programs will see mixfs's /dev/audio instead of the kernel's /dev/audio. Your programs can now play audio simultaneously without changing ANYTHING. Now compare that simplicity to the trash fire Linux audio has been and continues to be, with yet another audio subsystem.

Keyboard keymaps are a filter program sitting between /dev/kbd and your program. All it does is read in key codes and map key presses according to a key map, which is just a file with key->mapping lines. Again, keyboards are files, so a user-space file server can be a keyboard, such as a GUI keyboard that binds itself over /dev/kbd.
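As a rough illustration of that "keymap is just a filter between two files" idea, here's a hypothetical sketch in Python (not Plan 9's actual keyboard code, and the key map format is made up for the example): it reads key codes from one stream, remaps them according to a map file of "key -> mapping" lines, and writes them to another.

  # Hypothetical filter: stdin and stdout stand in for the two "files";
  # on Plan 9 the real thing would sit between /dev/kbd and the program.
  import sys

  def load_keymap(path):
      keymap = {}
      with open(path) as f:
          for line in f:
              line = line.strip()
              if not line or "->" not in line:
                  continue
              key, mapped = (s.strip() for s in line.split("->", 1))
              keymap[key] = mapped
      return keymap

  def main():
      keymap = load_keymap(sys.argv[1])  # path to the key map file
      for line in sys.stdin:
          key = line.strip()
          # Unmapped keys pass through unchanged.
          sys.stdout.write(keymap.get(key, key) + "\n")
          sys.stdout.flush()

  if __name__ == "__main__":
      main()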

Now all those files can be exported or imported to other machines, regardless of CPU architecture.

Unix is an OS built on top of a single machine. Plan 9 is a Unix built on top of a network. It's the closest I can get to computing nirvana where all my resources are available from any machine with simple commands that are part of the base OS which is tiny compared to the rest.


Does Plan 9 have an equivalent to alsa or jackaudio or pipewire, where you can pick a (usually shared memory) ring buffer size, have the kernel or daemon alert you (usually through a poll fd) when there's room in the buffer, and only then perform the necessary calculations (instead of performing a blocking write which adds latency)? What about synchronously receiving blocks of input audio as soon as available, and outputting equally-sized aligned blocks of output audio played back exactly two periods of buffering after the matching input audio? Or chaining three or more apps in a pipeline, all woken in sequence whenever input data is available, and still have only two periods of latency?


This is not supported yet but if you'd like to have a go at it, patches welcome.


Best explanation of Plan 9 I've ever seen


Graphical programs could be checkpointed and restored as long as they don't directly connect to the hardware. (Because the checkpoint/restore system has no idea how to grab the hardware's relevant state or replicate it on restore.) This means running those apps in a hardware-independent way (e.g. using a separate Wayland instance that connects to the system one), but aside from that it ought to be usable.


For CRIU it is not supported: https://criu.org/Integration#Wayland.2FWeston, and in my experience it doesn't work. Are you talking about other software?


It has been done "virtually" by going through e.g. VNC https://criu.org/VNC . Alternately, CRIU apps could be required to use virt-* devices, which CRIU might checkpoint and restore similar to VM's.


For what it's worth, the HPC-standard way of checkpointing/migrating distributed execution (in userspace, unlike CRIU) is https://dmtcp.sourceforge.io/ It supports X via VNC -- I've never tried -- but I guess you could use xpra.


If you yearn for Plan 9 -- I'm not sure I do -- Minnich's current incarnation of the inspiration seems to be https://github.com/u-root/cpu


Meh. Every time a 9p server dies, every client dies. Plan9 is not comparable to k8s.


Are there any good walkthroughs of what a good, distributed Plan 9 setup looks like from either a development or an administration perspective? Particularly with an emphasis on many distributed compute nodes (or CPU servers in Plan 9 parlance).


Glad to see Plan 9 getting some love in the comments even if it didn't make it into the article.


Am I the only one who doesn’t want this?

The entire point of the UNIX philosophy (Which seems to be something they aren’t teaching in software development these days) is to do one thing and do it well. We don’t need Linux operating as a big declarative distributed system with a distributed scheduling system and a million half-baked APIs to interact with it, the way K8s works. If you want that you should build something to your specific requirements, not shove more things into the kernel.


The UNIX philosophy made more sense as an abstraction for a computer when computers were simpler. Computers nowadays (well at least since 2006-ish) have multiple cores executing simultaneously with complicated amounts of background logic, interrupt-driven logic, shared caches, etc. The UNIX philosophy doesn't map to this reality at all. Right now there's no set of abstractions except machine code that exposes the machine's distributed nature in a coherent abstraction. Nothing is stopping someone else from writing a UNIX abstraction atop this though.


The idea of doing one thing, and doing it well, isn't dependent on the simplicity of the underlying system (I imagine that PDP-11 systems seemed impressively complicated in their time, too). The UNIX philosophy is a paradigm for managing complexity. To me, that seems more relevant with modern computers, not less.

> “A program is generally exponentially complicated by the number of notions that it invents for itself. To reduce this complication to a minimum, you have to make the number of notions zero or one, which are two numbers that can be raised to any power without disturbing this concept. Since you cannot achieve much with zero notions, it is my belief that you should base systems on a single notion.” - Ken Thompson


All of this was possible with QNX literally decades ago, and it didn't need whatever strawman argument you're making up in your head in order to accomplish it. QNX was small, fast, lean, real-time, distributed, and very powerful for the time. Don't worry, it even had POSIX support. A modern QNX would be very well received, I think, precisely because taking a distributed-first approach would dramatically simplify the whole system design versus tacking on a distributed layer on top of one designed for single computers.

> Which seems to be something they aren’t teaching in software development these days

This is funny. Perhaps the thing you should have been taught instead is history, my friend.


You mean QNet [0]? That is still alive... It is for LAN use ("Qnet is intended for a network of trusted machines that are all running QNX Neutrino and that all use the same endianness."), so extra care is needed to secure this group of machines when exposed to the internet.

[0] https://www.qnx.com/developers/docs/7.0.0///index.html#com.q...

[1] https://recon.cx/2018/brussels/resources/slides/RECON-BRX-20...


Correct. Though QNet itself is only one possible implementation, in a sense (but obviously the one shipped with QNX). And the more important part of the whole thing is the message-passing API design built into the system, which enables said networking transparency, because it means your programs are abstracted over the underlying transport mechanism.

"LAN use" I think would qualify roughly 95% of the need for a "distributed OS," including a lot of usage of K8s, frankly. Systems with WAN latency impose a different set of challenges for efficient comms at the OS layer. But even then you also have to design your apps themselves to handle WAN-scale latencies, failover, etc too. So it isn't like QNX is going to make your single-executable app magic or whatever bullshit. But it exposes a set of primitives that are much more tightly woven into the core system design and much more flexible for IPC. Which is what a distributed system is; a large chattery IPC system.

The RECON PDF is a very good illustration of where such a design needs to go, though. It doesn't surprise me QNX is simply behind modern OS's exploit mitigations. But on top of that, a modern take on this would have to blend in a better security model. You'd really just need to throw out the whole UNIX permission model frankly, it's simply terrible as far as modern security design is concerned. QNet would obviously have to change as well. You'd at minimum want something like a capability-based RPC layer I'd think. Every "application server" is like an addressable object you can refer to, invoke methods on, etc. (Cap'n Proto is a good way to get a "feel" for this kind of object-based server design without abandoning Linux, if you use its RPC layer.)

I desperately wish someone would reinvent QNX but with all the nice trappings and avoiding the missteps we've accumulated over the past 10 to 15 years. Alas, it's much more profitable to simply re-invent its features poorly every couple of years and sell that instead.

This overview of the QNX architecture (from 1992!) is one of my favorite papers for its simplicity and straightforward prose. Worth a read for anyone who likes OS design.

https://cseweb.ucsd.edu/~voelker/cse221/papers/qnx-paper92.p...


The philosophy that is cargo culted and was never taken seriously by any commercial UNIX.


Moreover, when folks talk about doing only one thing and one thing well, they are referring to command line utilities and pipes. And pipes were an invention of Doug McIlroy, not Ritchie or Thompson.

And the unix command-pipe philosophy is realized much better as ordinary functions in a functional programming language.


Indeed, the UNIX CLI is a poor man's experience compared with a Lisp REPL, Smalltalk transcript or Oberon commands session.


The Unix philosophy was a reasonably good model decades ago. But I think it is over romanticized.

Its binary blob design is no good for security, as opposed to a byte code design like Forth. Its user security model was poor and doesn't help with modern devices like phones. Its multiprocess model was ham-fisted into a multithreading model to compete with Windows NT. Its asynchronous I/O model has always been a train wreck even compared to NT. Its design creates performance issues, especially in multiproc networking code with a needless amount of memcopies. Now folks are rewriting the networking stack in user space. Its software abstraction layer was some simple scheme from the 70s which has fragmented into a crazy number of implementations now. Open source developers still complain about how much easier it is to build a package for Windows, as opposed to Linux. It was never meant to be a distributed system either. Modern enterprise compute cannot scale by treating and managing each individual VM as its own thing, with clusters held together by some sysadmin's batch scripts.


A good paper giving a concrete example of all this is "A fork() in the road", where you can see how an API just like fork(2) has an absolutely massive amount of ramifications on the overall design of the system, to the point "POSIX compliance" resulted in some substantial perversions of the authors' non-traditional OS design, all of which did nothing but add complexity and failure modes ("oh, but I thought UNIX magically gave you simplicity and made everything easy?") It also has significantly diverged from its "simple" original incarnation in the PDP-11 to a massive complex beast. So you can add "CreateProcess(), not fork()" on the list of things NT did better, IMO.

And that's just a single system call, albeit a very important one. People simply vastly underestimate how rose-tinted their glasses are and all the devils in the details, until they actually get into the nitty-gritty of it all.

https://www.microsoft.com/en-us/research/uploads/prod/2019/0...


I agree that fork (a Unix implementation detail) creates issues like overcommit and complicating memory management (took a look at the paper and I won't dispute the issues it points out). I don't agree that farming out in-app functionality into a herd of daemons (d-bus, desktop portals, pipewire, pipewire-pulse, wireplumber, system and user systemd with daemon-level environment variables) is beneficial for system functionality and doesn't create added complexity (each daemon's state, which daemons are running or crashed, reliance on IPC instead of being able to trace each process in isolation) and new failure modes (apps can't find D-Bus to load the desktop portal, and hang instead, if you login to Wayfire unless you login to Xfce first without killing systemd --user).


And yet Linux manages heavy filesystem I/O better than Windows NT.


Because it doesn't provide the abstraction capabilities that NTFS allows for third parties, so naturally it is faster doing less.


Ever checked xattr's and ACL's?


He is talking about the NTFS file filter stack.


https://docs.microsoft.com/en-us/windows-hardware/drivers/if...

Ah, a bit meh. Haiku/BeOS did similar queries with BFS and yet it can be much faster on SSDs.

Ok, no proper permissions/ACLs, but NTFS is on par with ext3 performance, with some additions.

It needs two new filesystems, one for desktops and another for the enterprise. Linux's should be F2FS for flash media and bcachefs for professional storage needs.


It looks like you have zero familiarity with NTFS. NTFS has had a far more fine-grained ACL model since version 1. Perhaps Linux caught up several decades later. I am not really sure.

I am also not sure why you insist on arguing about a topic you are not familiar with at all.


Linux/UNIX does not have to turn into a mess like k8s to be natively distributed. Plan9 was doing it with a tiny codebase in comparison.


Eh I can’t see Linux getting a built-in distributed kv store (etcd) any time soon. Same goes for distributed filesystems. All you have out of the box is nfs which gives you the worst of both worlds: Every nfs server is a SPOF yet these servers don’t take advantage of their position to guarantee even basic consistency (atomic appends) that you get for free everywhere else.

And besides how would you even implement all those features you listed without recreating k8s? A distributed “ps -A” that just runs “for s in servers; ssh user@$s ps; done” and sorts the output would be trivial, but anything more complex (e.g. keeping at least 5 instances of an app running as machines die) requires distributed and consistent state.
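For what it's worth, the trivial version really is about this much code. Here is a sketch of that ssh loop in Python; the node names are placeholders and it assumes passwordless ssh to each machine.

  # Naive distributed "ps -A": no membership, no failure handling,
  # no consistency -- exactly the easy part.
  import subprocess

  SERVERS = ["node1", "node2", "node3"]  # hypothetical hostnames

  def remote_ps(host):
      out = subprocess.run(
          ["ssh", host, "ps", "-A"],
          capture_output=True, text=True, timeout=10,
      ).stdout
      # Prefix each line with the node it came from.
      return [f"{host}: {line}" for line in out.splitlines()]

  def cluster_ps():
      rows = []
      for host in SERVERS:
          rows.extend(remote_ps(host))
      return sorted(rows)

  if __name__ == "__main__":
      print("\n".join(cluster_ps()))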


Fwiw those features existed in Mosix (a Linux SSI patch) 2 decades ago... I feel like we could probably do it again

In terms of CAP, yeah it might not have been technically as reliable. But there's different levels of reliability for different applications; we could implement a lot of it in userland and tailor as needed


I call BS. I can’t find any details about how mosix handled storage, but what I did find suggests nfs semantics. That’s totally unusable which is probably why the project died decades ago. (And apparently you had to recompile every app because they changed the syscall ABI to add a node ID to every inode or something? Guess they were speedrunning obsolescence)

> we could implement a lot of it in userland

Yeah that’s k8s, etcd, ceph, and the distributed database of the week.


You didn't need to recompile programs, that was the whole idea. Distribute any app's compute over many nodes. But shared memory and threading were very hard to distribute and I/O was not distributed except for mfs (a distributed layer on NFS) which did work fine. But obviously NFS is not suitable for all applications, in which case you could use any other form of distributed I/O.

It worked great for forking apps. Trouble was hell would freeze over before the patches got merged and most people thought it wouldn't be widely adopted without shared memory and threads.

But the point is, it did run arbitrary apps across distributed nodes, you could see any node's processes and instrument them, you could see the filesystem of any node. This isn't some advanced mystic sorcery, it was there two decades ago. Clearly we could implement these features again in some new way - not as an SSI, but at least allowing an assortment of system-level RPC and some sort of distributed pluggable VFS.

And also my point is: sure, we have all these 3rd party userland solutions, and that is bad. It means nothing is supported until it's been "integrated". It means we have miles and miles of plumbing that schmucks like me are paid to set up before a JavaScript developer can run their piddly web app across 3 nodes. It should just be baked into the OS, batteries included. A lot less annoying bullshit, a lot more standardization, and the ability to get more shit done with less effort. That is the entire point of operating systems, to make it easier to run programs. Not to make it necessary to add 15 million new abstractions before you can run your programs.


> requires distributed and consistent state

Distributed yes, but not necessarily consistent. You can use CRDTs to manage "partial, flexible" consistency requirements. This might mean, e.g. sometimes having more than 5 instances running, but should come with increased flexibility overall.
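As a toy example of the convergence property being invoked here, the simplest CRDT is a grow-only counter: each node only increments its own entry, and merging takes the element-wise maximum, so replicas agree no matter when or how often they exchange state. This is just an illustration of the idea in Python, not a recipe for managing replica counts.

  class GCounter:
      def __init__(self, node_id):
          self.node_id = node_id
          self.counts = {}  # per-node increment counts

      def increment(self, n=1):
          self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

      def value(self):
          return sum(self.counts.values())

      def merge(self, other):
          # Element-wise max: commutative, associative, idempotent,
          # so replicas converge regardless of message order or repeats.
          for node, count in other.counts.items():
              self.counts[node] = max(self.counts.get(node, 0), count)

  # Two nodes count instance starts independently, then reconcile.
  a, b = GCounter("node-a"), GCounter("node-b")
  a.increment(3)   # node-a started 3 instances
  b.increment(4)   # node-b started 4 (the cluster may overshoot a target of 5)
  a.merge(b); b.merge(a)
  assert a.value() == b.value() == 7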


The abstractions are there in Linux, largely imported from plan 9. And work is ongoing to support further abstractions, such as easy checkpoint/restore of whole containers. Kubernetes is a very new framework intended to support large-scale orchestration and deployment in a mostly automated way, driven by 'declarative' configuration; at some point, these features will be rewritten in a way that's easier to understand and perhaps extend further.


> The abstractions are there in Linux, largely imported from plan 9.

Which abstractions are those?


> to be able to do computation, discovery, message passing/IO, instrumentation over multiple nodes.

Kernel namespaces are the building blocks for this, because an app that accesses all kernel-managed resources via separate namespaces is insulated from the specifics of any single node, and can thus be transparently migrated elsewhere. It enables the kind of location independence that OP is arguing for here.


Linux namespaces don't actually do any of those things though? Like, not even a single one of them are made possible because of namespaces. They're all possible or not possible precisely as much with or without namespaces.

The thing is when comparing plan9 and linux here, you have to recognize that linux has it backwards. On plan9 namespaces are emergent from the distributed structure of the system. On linux they form useful tools to build a distributed system.

But what's possible on plan9 is possible because it really does do "everything is a file," so your namespace is made up of io devices (files) and you can construct or reconstruct that namespace as you need.

Like, this[1] is a description of how to configure plan9's cpu service so you run programs on another node.

[1] https://9p.io/wiki/plan9/Expanding_your_Grid/index.html

Nothing in there makes any sense from a linux containers perspective. You can't namespace the cpu. You can't namespace the gui terminal. All you can namespace is relatively superficial things, and even then opening up that namespacing to unprivileged users has resulted in several linux CVEs over the last year because it's just not built with the right assumptions.


Doesn't Linux create device files in userspace these days, anyway? I thought that's what that udev stuff was all about. So I'm not sure that the Plan9 workflow is inherently unfeasible, there's just no idiomatic support for it just yet.


device nodes are managed in userspace nowadays yes, but they're just special files that identify a particular device id pair and then the OS acts on them in a special way. udev is just the userspace part of things that manages adding and removing them in response to hotplug events. Everything that matters about them is still controlled by the kernel.


That’s not at all what Linux namespaces permit. It’s a side effect of using them that could be leveraged using something like CRIU, sure, but it’s not what they’re for and they’re not a building block for anything mentioned in the portion of their comment you quoted.

Namespaces simply make the kernel lie when asked about sockets and users and such. It’s intended for isolation on a single server. They’re next to useless in distributed work, particularly the kind being discussed here (Plan 9ish). You actually want the opposite: to accomplish that, you want the kernel to lie even harder and make things up in the context of those interfaces, rather than hide things. Namespaces don’t really get you there in their current form.


> That’s not at all what Linux namespaces permit.

Isolating processes from the specifics of the system they're running on is a key feature of the namespace-based model; it seems weird to call it a "side effect only". We should keep in mind that CRIU itself is still a fairly new feature that's only entered mainline recently, and the kernel already has plenty of ways to "make up" more virtual resources that are effectively controlled by userspace. While it may be true that these things are largely ad hoc for now, it's not clear that this will be an obstacle in the future.


I can talk about namespaces in HPC distributed systems, and they don't look anything like Plan 9 to me. They make life harder in various respects, and even dangerous with Linux features that don't take them into account (like at least one of the "zero-copy" add-on modules used by MPI shared memory implementations).


There were older attempts at this stuff, in the 90s with "Beowulf" clusters that had cross-machine process management and whatnot. It's a lot harder than it seems to make this approach make sense in the real world, as the abstraction hides important operational details. The explicit container + orchestration abstraction is probably closer to the ideal than trying to stretch linux/systemd/cgroups across the network "seamlessly". It's clearer what's going on and what the operational trade-offs are.


Imagine a Beowulf cluster of hot grits in soviet Russia with CowboyNeal.


> in the 90s with "Beowulf" clusters

In case of any confusion, that sort of thing wasn't a generic Beowulf feature, but it sounds like Bproc. I don't know if it's still used. (The Sourceforge version is ancient.)

https://updates.penguincomputing.com/clusterware/6/docs/clus... https://sourceforge.net/projects/bproc/

Containers actually only make it harder to "orchestrate" your distributed processes in an HPC system.


Actually, at some point in the 2.4 kernel it was possible to do that, with single-image systems such as openMosix, which handled process discovery, computation and much more. But underneath the simple user interface it was complex and kinda insecure, and so it was eventually abandoned and never ported to newer kernel versions.


Distributed computation with message passing (and RDMA) is the essence of HPC systems. SGI systems supported multi-node Linux single system images up to ~1024 cores a fair few years ago, but they depend on a coherent interconnect (NUMAlink, originally from the MIPS-based systems under IRIX).

However, you don't ignore the distributed nature of even single HPC nodes unless you want to risk perhaps an order of magnitude performance loss. SMP these days doesn't stand for Symmetric Multi-Processing.


Distributed shared memory is feasible in theory even via being provided in-software by the OS. You're right that this would not change the physical reality of message passing, but it would allow a single multi-processor application code to operate seamlessly using either shared memory on a single node, or distributed memory on a large cluster.


I talk about the practice in HPC, not theory, and this stuff is literally standard (remote memory of various types and the same thing running the same, modulo performance and resources, on a 32-core node as on one core each of 32 nodes). However, you still need to consider network non-uniformity at levels from at least NUMA nodes up, at least if you want performance in general.


I think you're looking at the wrong abstraction level. You're thinking on a node (computer) basis. Even on a single computer, many of the things that happen are distributed. DMA controllers, input interrupts, kernel-forced context switches, there's a lot going on there but we still pretend that our computers are just executing sequential code. I agree with the OP and think it's high time we treat the computer as the distributed system it is. Fuchsia and GenodeOS are both making developments in this direction.


I built my startup in Elixir and the Erlang VM provides all of these. It's kind of amazing. Things we've been able to have out of the box:

* a metrics dashboard that gives you a ps -A for all our nodes

* intermachine pubsub without having to set up a third-party message queue

* auto failover for all our microservices

* spinning up a microservice takes about as much effort as adding a controller in rails

* cronjobs where one node can trigger a job on another node in the network. Hell, we have crons scheduled where we don't even know which machine they'll run on. It just gets done.


I was thinking Elixir/Erlang too. I've only been using it for a couple of months but I've quickly come to the conclusion that their claims of robustness are not in line with what I want/need. For example, the pubsub lacks persistence, and if the node that carries that info dies, you lose that data. There is no built-in consensus that tries to maintain state in the face of failure, so you bring in Oban. I've yet to experience the advantages of Elixir. Sure, I can hot-reload a module, but I spend probably an hour a day waiting for things to compile. I prefer k8s and Go, but figure that may be because I'm still new to the ecosystem.


So far we've had little need for persistence in pubsub; for the few places we do, we have used Oban the same way you have. It would be easy to pull in a library like Yggdrasil and abstract it away. For the most part, we just haven't needed it enough to justify setting up RabbitMQ or Kafka. K8s is indeed useful, but the benefit of Elixir here is that I can set up the supervision tree in pure Elixir. By keeping things simple, we've been able to focus on pushing out features instead of worrying about infrastructure.


Abstracting a fleet of machines as a single supercomputer sounds nice. But how about partial failures? It's something that a real stateful distributed system has to deal with all the time but a single host machine almost never deals with (do you worry about a single cacheline failure when writing a program?).


There is a huge amount of research about distributed OSes (really, they were very fashionable in the '90s and early '00s). Plenty of people worked on this problem, and it's basically solved (as in, we don't have an optimal solution, but it won't be a problem on a real system).


It’s “basically solved” in the sense that everyone gave up on distributed OSes and used k8s instead.


K8s is doing distributed OS's on easy mode, supporting basically ephemeral 'webscale' workloads for pure horizontal scaling. Even then it introduces legendary amounts of non-essential complexity in pursuit of this goal. It gets used because "Worse is better" is a thing, not because anyone thinks it's an unusually effective way to address these problems.


I see K8S as Application Servers for everyone, with containers replacing EARs, it certainly gets some WebLogic/WebSphere vibes when looking at those yml files and how we used to setup an Application Server cluster.


K8s just pushes the problem into the application layer, where every developer must solve it independently. It's not like it solves anything.


I very much agree with this and while Kubernetes is better than a poke in the eye, I look forward to the day when there is a true distributed OS available in the way you describe. It's possible Kubernetes could even grow into that somehow.


Kubernetes only exists because people wanted to do Application Servers in any language, and now they are rediscovering them trying to sell us on Kubernetes + WebAssembly, the irony.


I remember the Barrelfish OS was trying to tackle this problem head on https://barrelfish.org/


I think OpenVMS did this... in the 80's.


>[The fact that computers are made of many components separated by communication buses] suggests that it may be possible to abstract away the distributed nature of larger-scale systems.

This is a neat line of thought, but I don't think it can go very far. There is a huge difference in reliability and predictability between small-scale and large-scale systems. One way to see this is to look at power supplies. Two ICs on the same board can be running off of the same 3.3V supply, and will almost certainly have a single upstream AC connection to the mains. When thinking about communications between the ICs, you don't have to consider power failure because a power failure will take down both ICs. Compare this to a WiFi network where two devices could be on separate parts of the power grid!

Other kinds of failures are rare enough to be ignored completely for most applications. An Ethernet cable can be unplugged. A PCB trace can't.

I used to work with a low-level digital communication protocol called I²C. It's designed for communication between two chips on the same board. There is no defined timeout for communication. A single malfunctioning slave device can hang the entire bus. According to the official protocol spec, the recommended way of dealing with this is to reset every device on the bus (which may mean resetting the entire board). If a hardware reset is not available, the recommendation is to power-cycle the system! [1]

Now I²C is a particularly sloppy protocol, and higher-level versions (SMBus and PMBus) do fix these problems, so this is a bit of an extreme example. But the fact that I²C is still commonly used today shows how reliable a small-scale electronic system can be. Even at the PC level, low-level hardware faults are rare enough that they're often indicated only by weird behavior ("My system hangs when the GPU gets hot"), and the solution is often for the user to guess which component is broken and replace it.

[1] Section 3.1.16 of https://www.nxp.com/docs/en/user-guide/UM10204.pdf


> Now I²C is a particularly sloppy protocol

I know this pain as an embedded software monkey. So many edge cases unthought of and no method to gracefully fail. Not to mention you always find some fault with the I²C implementation on the SOC or slave device you want to talk to.

I’ve spent weeks debugging bus lockup problems only to find errata for the silicon, or to find that this particular chip needs an extra 100ms to start up compared to others.

It’s the one communication protocol that I always have problems with. Nothing else has caused me as many headaches.


This is something which worries me with Oxide using I2C (according to their board bring-up podcast), and using an auto-direction-switching level-shifter on the I2C bus. I hope and I am sure they can make it work, but to me that comes closer to violating a (core?) value of determinism and observability. Of course if it is only one well-studied component or two, perhaps out of design/supply necessity, this might be a very small risk in the larger scope.


It is indeed a very small risk in the larger scope -- and besides, using I2C isn't optional: all of the devices on the board (from converters/regulators, temp sensors, clock generators and fan controllers to CPUs, DIMMs, NICs, and NVMe drives) have I2C (or I2C-based) management interfaces. If you'd like more concrete detail, take a look at the Hubris app definition for our Gimlet board.[0]

[0] https://github.com/oxidecomputer/hubris/blob/master/app/giml...


Yes, clearly I2C, properly employed, adds far more to observability than it might, in principle, very rarely take away. And of course: a) Oxide is not the first to use the same components; b) Oxide is past the point of bring-up problems with a previously untried component; c) Oxide will have done all sorts of power cycling and confidence-building; so d) common-mode failures can likely be ruled out; and e) (I presume) any rare bus hang or error can itself be detected and reported.


Are you sure it’s really I2C? Servers often use PMBus for power supply management, which uses the I2C physical layer but the (better-behaved) SMBus link layer.

I’m not familiar with Oxide, but the I2C physical layer is just a pair of open-drain signals, so I wouldn’t worry about a level-shifter too much.


>A PCB trace can't.

Sure, but physical damage can disconnect it.


But in that case we accept the device as "broken" and need to replace or repair it. If you are lucky the PCB trace was only relevant to a single feature, say an LED. If you are unlucky, it's part of the memory bus and everything is toast. But no significant engineering went into making one more resilient than the other.

Whereas in a distributed system, a single broken communication line, even or especially if it's extremely important, still means that the distributed system has to recover somehow, preferably gracefully.


Making things robust to failure is an engineering trade-off. Your comment simply points out that there are cases where handling failures isn't seen as being important.


Clearly it's a trade off as everything always is. So you deal with probabilities of various kinds of failure in various circumstances.

Turning on a computer, compare the probability of an issue with the Ethernet cable (i.e. not plugged in, as suggested) vs a PCB trace being broken. Given those probabilities, do you want to handle one of those gracefully and not fry the machine? Both? Neither? And that's the point of identifying the difference. A thing that is rare, and where everything is b0rked anyway if it occurs, doesn't need to be handled.

The difference in how robust individual components have to be to failure (on what level? It depends...) between distribution via a network connection and distribution via PCB traces is probably the reason we don't usually (but we might) refer to the latter as a distributed system.


Yes but your computer will not gracefully handle CPUs randomly failing or RAM randomly failing. Sure, storage devices can come and go, but that's been the case since forever, and most programs are not written to handle this edge case gracefully. Except for the OS kernel.

The links between the components of your computer are solid and cannot fail like actual computer network connections.

In terms of "CAP" theorom, the system has no Partition tolerance. If one of the the links connecting CPUs/GPUs/RAM breaks, all hell breaks loose. If a single instruction is not processed correctly, all hell might break loose.

So I find the analogy misleading.


I think that TFA gets it exactly backwards. It's not that we will be able to treat multi-node systems as non-distributed it's that single-nodes will have to start being treated like distributed systems.

> The links between the components of your computer are solid and cannot fail like actual computer network connections.

I've personally had this disproven to me on multiple occasions.


>I've personally had this disproven to me on multiple occasions.

That sounds like interesting stories! Can you elaborate?


Accidents on desktop hardware:

Multiple bad disk cables (more common in IDE era, but happened once with SATA). Interestingly enough, Windows would reduce the drive speed on certain errors, so I had a drive that booted up in UDMA/133 and the longer it was running the slower it got, eventually settling in at PIO mode 2. Switching the drive cable fixed it.

A sound card that wasn't screwed in to the case, so if you pushed the phone connector in too hard it would unseat. I still don't know how that happened; it must have been me (unless someone pranked me) but the sound-card hadn't been changed in like 2 years at that point.

A DIMM wasn't fully clipped in, but the system worked fine for weeks until someone bumped into the case.

Things that were actually intentional:

We expect anything plugged in externally (e.g. USB, ethernet, HDMI) to be plugged and unplugged without needing to restart the system. This sounds banal, but wasn't always the case. I had a network card with 3 interfaces (10BASE5 AUI, 10BASE2 BNC, 10BASE-T modular plug) and you needed to power off the system and toggle a DIP switch to change which was in use.

I've seen server and minicomputer hardware with hotpluggable CPUs and RAM

Eurocard type systems (e.g. VME, cPCI) could connect all sorts of things, and could run without restarting. This sort of blurs the line as to what a "node" is. If you have multiple CPUs on the same PCI bus, is that one node or many?

eGPUs have made hotplugging a GPU something that anyone might do today. If you run this setup, then the majority of the computational power in your system can appear and disappear at will, along with multiple GB of RAM.


There have been machines tolerant to CPU and memory failures, and to a certain extent this sorta works on some of the higher-end machines that support RAM/CPU hotplug (historically see hp/tandem/nonstop, sunos/imp, etc).

The problem is that Linux's monolithic model doesn't work well for kernel checkpoint/restore; despite it actually supporting hotplug CPU/RAM, they have to be gracefully removed.

So, this is less about the machine being distributed, and more about the fact that Linux is the opposite of a microkernel/etc. that can isolate and restart its subsystems in the face of failure. It's also sorta funny that while these types of operations tend to need to be designed into the system, the last major OSes designed this way were done in the 1980s.


I know of no OSes that are resilient to CPU cores producing wrong results (or incorrect mem results: I consider ECC a lower level concern that is not part of the OS), whereas a lot of distributed consensus algorithms have this built into their requirements. EDIT: I have heard through the grapevine that something like this might be done for aerospace, but I have no personal experience with that.

I agree with parent. The major reason why programming on a single computer is easier than a distributed system is that we assume total resilience of various components that we cannot assume for a distributed system.

From the article:

> This offers hope that it is possible to some day abstract away the distributed nature of larger-scale systems.

To do this is not a question of software abstractions, but hardware resilience. If we have a network which we can reasonably assume to have 100% uptime and absolutely no corruption between all its components then we can program distributed systems as single computers.


Most distributed consensus algorithms, or distributed systems in general, are not resilient to nodes producing arbitrary wrong results. That's the realm of systems like Bitcoin, which achieve such resilience by paying big performance costs.

So it shouldn't be surprising that computers have the same lack of resilience.


Sorry what? That is exactly the purpose of Byzantine fault tolerant consensus algorithms, which have been around for many years.


Sure, that doesn't contradict what I said. They have been around for years, and they are very expensive and almost all systems do not use them.


They are a lot more efficient than bitcoin. But agreed, not cheap.


The tandems I listed above, originally used lock stepped processors, along with stratus/etc.

edit: Googling yields few results that aren't actual books, Try this

https://books.google.com/books?id=wBuy0oLXEuQC&pg=PA218&lpg=...


Ah well there you go. Had no idea they used lock stepping!


It doesn't count as resilient in the mainframe sense, but in an effort to encourage system management, I ran the Node Health Check system on our "commodity" HPC cluster and found multiple failed DIMMs and a failed socket no-one had noticed. (I'd had enough alerts from that on a cluster I managed.)


You can 'disable' a core in Linux pretty easily, although I'm not sure to what extent you'd consider this graceful (in the sense that you write to a system file and then some magic, which may be arbitrarily complicated I guess, happens in the background. So it doesn't seem equivalent to just yanking a core from the package, if that were possible).
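Concretely, the "system file" here is the sysfs CPU hotplug control, /sys/devices/system/cpu/cpuN/online. A minimal sketch in Python (requires root; the core number is arbitrary, and cpu0 often cannot be taken offline):

  CPU = 3  # arbitrary core chosen for illustration

  def set_cpu_online(cpu, online):
      # Writing "0" asks the kernel to offline the core (it migrates tasks
      # away first); writing "1" brings it back.
      with open(f"/sys/devices/system/cpu/cpu{cpu}/online", "w") as f:
          f.write("1" if online else "0")

  if __name__ == "__main__":
      set_cpu_online(CPU, False)
      set_cpu_online(CPU, True)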


The article also ignores that e.g. the CUDA API looks nothing like a local function call. People are explicitly aware when they are launching GPU kernels.


>Yes but your computer will not gracefully handle CPUs randomly failing or RAM randomly failing

That's incorrect.

There are plenty of machines/OSs which are (or can be) resilient to a CPU failing; Linux, for example. From the OS point of view, you just kill the process that was running on the CPU at the time and move on.

Resilience to spontaneous RAM failures is rarer but possible.


Killing the processes running on the compute element seems not very graceful, right? I'd expect a gracefully handled failure to have some state saved from which the computation can be continued.

Which would be overkill on a single node, given that CPUs don't really fail all that often.


It's up to userspace to do more than that. There are other issues which can cause processes to be spontaneously killed (OOMkiller for example) so it's something you should be tolerant of.


Disagree. An environment that's being reaped by OOMK is not stable enough to make assumptions about. You're in "go down the hall and turn it off and on again" territory.

Attempting to account for such environments in user programs massively inflates their complexity, does little to enhance reliability, and the resulting behavior is typically brittle or outright broken from the get go.

This is why, for example, the C++ committee flirts with making allocation failure a UB condition.


iOS constantly runs an OOM killer and works pretty well.


This was true for several home computers since the late 70's. Atari 8-bit computers had all peripherals connecting via a serial bus, each one with its own little processor, ROM, RAM and IO (the only exception, IIRC, was the cassette drive). Commodores also had a similar design for their disk drives. A couple months back a 1541 drive was demoed running standalone with custom software and generating a valid NTSC signal.



Wow! Reminds me of https://www.rifters.com/crawl/?p=6116

A hydrocephalic demo!


I think that plan hits a wall for heat dissipation and nutrient/oxygen consumption - not sure we have lungs large enough to keep a brain doing 10x more computation oxygenated, nor perspiration glands to keep it cool.

But I'd be totally in to a 10% increase in IQ in exchange to being able to eat 10% more sugar.


Wow, that is cool!


well, it's been true since a wire has been connecting any two bits. The processor, ROM, RAM are all "distributed" systems internally.


That's not what "distributed system" means.


What's your definition of "distributed system" then?

Two flipflops interconnected on one wafer. Two flipflops interconnected on one PCB. Two flipflops interconnected with a cable between two PCBs. These are all "distributed". They're all subject e.g. to the CAP theorem. Sure, the probability of one flipflop failing on the same wafer is quite small. The probability of one flipflop failing on one PCB is slightly larger. But fundamentally all these systems are the same. If you have two computers on a network you can make the probability of failure (e.g. of the network) pretty small.


I start counting them as independent computers when they have their own firmware.


It absolutely is.

distribution of signals within smaller systems (microcontrollers, ASICs, FPGAs, etc) are all distributed systems. Ask anyone doing any kind of circuit design about distributing clocks and clock skew, etc.


If you read the article, you’ll understand it’s about our computers being networks of smaller computers. The SSD, GPU, NIC, and BMC each have their own CPU, memory, and operating system.


Great post called "Achieving 11M IOPS & 66 GB/s IO on a Single ThreadRipper Workstation" [1, 2] that basically walks through step-by-step that your computer is just a bunch of interconnected networks.

Highly recommend the post if you're into this and also sort of amazing how far single systems have come. You can basically do "big data" type things on this single box.

[1] https://tanelpoder.com/posts/11m-iops-with-10-ssds-on-amd-th...

[2] https://news.ycombinator.com/item?id=25956670


I worked on this for my masters thesis! The thesis was for a specific part but the group worked on the problem as a whole, see https://dspace.mit.edu/handle/1721.1/49844

IMO there are two things that make the current abstraction of a computer as a unit make sense:

- You (mostly) don't have to try to handle partial failures within a computer. Partial failures are what make distributed systems hard.

- The difference in communication costs between two cores in a single machine is several orders of magnitude lower than communicating with a separate machine using commodity technologies. So while yes, "it's all just distributed" and you can use a common abstraction, a large enough constant factor difference means that you still will have to look through the abstraction to build a performant system.


The latency involved in communicating with a separate machine is comparable to loading data from disk. So yes, it is slow, but not so slow that you couldn't reuse many of the abstractions involved in programming a single machine.


I think there’s a big difference which is that your computer is allowed to crash when one component breaks whereas a distributed system is typically more fault tolerant.


This is actually what makes handling the distributed system in a single computer easier – everything crashing together makes it an easier problem.

E.g. you have multiple CPU cachelines, caching different values of a main memory location. And there are different cache coherence protocols to keep them sane. But cache coherence protocols never need to worry about the failure mode when one cacheline is temporarily unavailable but the others are.

So yes, there's a distributed system in each multi-core computer, but it's a distributed system with an easier failure mode.

If you like more analogies between CPU caches and distributed systems, https://blog.the-pans.com/cpp-memory-model-as-a-distributed-... :p


Ideally a peripheral crashing should not crash the whole system.


And indeed it does not: Modern operating systems like Linux can perfectly well deal with all kinds of devices crashing or disappearing at runtime. Just like in larger distributed systems.


That's not entirely true. There's usually some level of fault recovery built in but it doesn't extend to the level of allowing any component to fail at any time.


So much of programming language design is about hiding the distributed nature of what the computer is doing on a regular basis. This is somewhat obvious for thread abstractions, where you can get two things happening at once. It is blatant for CUDA-style programming.

As this link points out, it gets a bit more difficult with some of the larger machines we have to keep the abstractions useful. That said, it does mostly work. Despite being able to find and harp on the areas that it fails, it is amazing how well so many of the abstractions have held up.

Would be neat to see explicit handling of what features are basically completely hiding distributed nature of the computer.


The abstractions aren't just for simplicity. In many cases, ensuring that the distributed nature is unknown or unobserved means the system can make different decisions without affecting the program. This leaves room for flexibility in the system design.


Maybe? Alternatively by bringing the distributed nature up front-and-center you can have more flexible designs. If I could timeout my drawing routine when the screen has already refreshed (or context has been stolen from the OS) then I have a lot more flexibility in how to recover instead of pretending to do my best and ending up with a lot of screen tearing when I miss my frame budget.


I'm trying to wrap my head around where this would happen in a way that made sense. Derailing the GPU pipeline from the OS probably doesn't make much sense. If we're talking about the OS halting the CPU side of the render I guess that would maybe be useful? Even on a single core machine it would be equally useful so I don't know if its a case of distribution per se...

But in the abstract, sure. It's a give and take. It's useful to know things and use that knowledge. It's also useful to know a detail is hidden and changeable without consequence.


Yeah I'm thinking the OS halts the CPU side of the render and, say, stuffs an errno into a register after the routine so the CPU can see what happened and recover. If I were writing a program that required a minimum frame rate and I missed multiple frames, it would probably be nicer for the user if I displayed a message that I was just unable to write a frame at the required speed and quit rather than screen tear and frustrate the user.

A similar situation happens if my NIC/kernel buffers are too overloaded to send the packets I need out. Instead I can try in vain to push packets out and have almost no understanding of how many packets the OS is dropping just to keep up. Media standards like RTCP were designed around scenarios like these, but that itself is complexity we wouldn't need if the OS could notify the application when its packet writes failed.

This kind of flexibility right now is really difficult because most OSs try to pretend as hard as possible that everything happens sequentially. This is just about opening up more complete abstractions to the programmer.


Distributed problems that are largely about timing are easy to see in this light. By and large, the whole synchronize-on-a-clock idea is invisible to programmers.

That said, there are times when it isn't hidden, but only taken out of your control. I guess the question is mainly in how to move them to first class objects to reason about?


The distributed nature can never be unobserved, by definition. What a well-designed distributed system can do is offer facilities to enable useful constraints on its operation, that might then be used as necessary via a programming language.


There are lots of good resources in this area:

Occam, the programming language of the transputer https://en.m.wikipedia.org/wiki/Occam_(programming_language)

Bluebottle active objects https://www.research-collection.ethz.ch/bitstream/handle/20.... with some discussion of DMA

Composita components http://concurrency.ch/Content/publications/Blaeser_Component...

Mobile Maude (only a spec) http://maude.sip.ucm.es/mobilemaude/mobile-maude.maude

Kali scheme (atop Scheme48 secure capability OS) https://dl.acm.org/doi/pdf/10.1145/213978.213986

Kali is probably the closest to a distributed OS, supporting secure thread and process migration across local and remote systems (and making that explicit), distributed profiling and monitoring tools, etc. It is basically an OS based on the actor model. It doesn't scale massively, as routing between nodes was out of scope (it connects all nodes on a bus), but that could easily be added.

Extremely small (running in 2 MB of RAM), it covers all of R5RS, and the VM has been adapted to bare metal.

I feel that there is more to do, but a combination of those is probably the right direction.


I think that someone newly interested in this should consider the longer history of distributed OS concepts too. What are you trying to do differently? What set of tradeoffs gives you a well-defined solution space that fills a gap in current approaches?

https://en.wikipedia.org/wiki/Sprite_(operating_system) was a project at UC Berkeley that ran building-scale networks of workstations as a distributed OS. It ended in 1992 but produced innovations such as the log-structured filesystem along the way.

Various commercial products like Domain/OS or SGI Irix also were distributed operating systems at different scales. It seems like these lost out to the more typical HPC solutions with distributed/parallel OS instances limited to each node. Or on the other extreme, mainframe systems continue to be a kind of distributed OS built for high availability and scaling in a very controlled environment.


This proves that conventional wisdom (such as the idea that abstracting distributed computation is unworkable) is often wrong.

What happens is that enough people try to do something and can't quite get it to work right that it eventually becomes assumed that anyone trying that approach is naive. Then people actively avoid trying because they don't want others to think they don't know "best practices".

Remember the post from the other day about magnetic amplifiers? Engineers in the US gave up on them. But for the Russians, mag amps never became "unworkable" and uncool to try, and they eventually solved the hard problems and made them extremely useful.

Technology is much more about trends and psychology than people realize. In some ways, so is the whole world. It seems to me that at some level, most humans never _really_ progress beyond middle-school level.

The starting point for analyzing most things should probably be from the context of teenage primates.


> This is something unique: an abstraction that hides the distributed nature of a system and actually succeeds.

That's not even remotely unique.

OP is grappling with "the map is not the territory" vs. the fact that maps still have many valid uses.

Abstractions can be both not accurate in every context and 100% useful in many, many common contexts.

Also (before you get too excited), abstractions have quality: there are good abstractions -- which are useful in many common contexts -- and bad abstractions -- which overpromise and turn out to be misleading in some or many common contexts.

I'll put it this way: the idea that The Truth exists is a rough (and not particularly useful) abstraction. If you have a problem with that, it just means you have something to learn to engage reality more fruitfully.


Unfortunately, this idea runs up against the principle of least responsibility.

User-level programs all sit at a single level of abstraction, while this distribution is spread across many levels of abstraction.

So in desktop systems, which are mostly successors of business micro machines, access to other levels of abstraction is intentionally locked down for security and reliability. The same thing applies to cloud computing, where VPSes are isolated from the hardware and from other VPSes.

These measures are usually avoided in game systems and embedded systems, but those are not allowed to run multiple programs from independent developers (for security and reliability), and programming them is orders of magnitude more expensive than desktop or even server-side development (yes, you may be surprised, but game console software is in many cases more reliable than military software, and usually far surpasses business software).

To resolve this contradiction we need some totally new paradigms and technologies, maybe something revolutionary, like using general AI to write code.


The Erlang VM (BEAM) can be viewed as a distributed operating system, or at least the beginnings of one.


Agreed, and I could add that ALL Erlang flavors (there are at least four independent implementations, for different environments and different targets) are distributed.

And Erlang's syntax is borrowed from Prolog, which also has cool ideas.


Yes, and your computer is a ball of interconnected microservices too.


Yes, and concurrency is, in fact, an implementation detail. Which is why I think that in most applied scenarios it should be hidden, and taken care of, by the compiler.


Eric Brewer thinks this is a good point of view on such things:

https://codahale.com/you-cant-sacrifice-partition-tolerance/

L1-blockchain entrepreneurs and people who got locked into MongoDB aside, I think most agree.


I recommend people model their apps this way: spin up more threads than needed, one each for API, DB, LB, async, pipelines, etc. You can model an entire stack in one memory space. It's a great way to prototype your complete data model before scaling to the proper solutions, and lots of design constraints are found this way; everything looks great on paper but then falls apart when integrating layers.
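A toy version of that idea (Python; the service names, message shapes, and fake data are made up for illustration): each "service" is just a thread with an inbox queue, all in one process, so the boundaries between layers can be exercised before anything is deployed.

    import threading, queue

    db_inbox, api_inbox = queue.Queue(), queue.Queue()
    FAKE_DB = {"user:1": {"name": "Ada"}}

    def db_service():
        while True:
            key, reply_to = db_inbox.get()
            reply_to.put(FAKE_DB.get(key))        # the "DB" lookup

    def api_service():
        while True:
            key, reply_to = api_inbox.get()
            resp = queue.Queue()
            db_inbox.put((key, resp))             # the API calls the DB over its "bus"
            reply_to.put({"status": 200, "body": resp.get()})

    for svc in (db_service, api_service):
        threading.Thread(target=svc, daemon=True).start()

    reply = queue.Queue()
    api_inbox.put(("user:1", reply))              # a "client" request
    print(reply.get())                            # {'status': 200, 'body': {'name': 'Ada'}}

Swapping a queue for a real network hop later is then mostly a transport change, which is exactly when the integration problems mentioned above tend to surface.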


What is the kernel and the bus for the cloud?


These are now all virtual state machines, which store some state and convert all kernel/bus behavior into interactions with devices connected over the network.

At the moment there are lots of such devices. There are certainly many full-featured ones, like the Raspberry Pi, but there are also network-attached ATA drives, network-connected sensors, RAM, and ROM (flash). IEEE 1394 FireWire is a serial interface that could be used as a networking bus; Ethernet-USB adapters exist (and many commodity devices work well over such a connection), so virtually anything could be considered connected via a network bus. There are even USB 3.0 to PCIe adapters, to use a PCIe device through a USB connection.

And in reality there is a problem: FireWire is so "distributed" that on Macs with a FireWire interface it was possible to read memory over that interface.

So the hardware and software exist, but some steps are needed to make their usage safe.


Once you learn to bias toward thinking in terms of message passing between actors, and toward keeping shared state immutable, a lot of problems become easier to decompose and solve elegantly, especially at scale.
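A minimal illustration of that bias (Python; the Deposit message and the balance actor are made-up examples): the actor owns its state privately, and the messages it receives are frozen so nothing can be mutated in flight.

    import threading, queue
    from dataclasses import dataclass

    @dataclass(frozen=True)        # frozen: the message is immutable
    class Deposit:
        account: str
        amount: int

    inbox: "queue.Queue[Deposit]" = queue.Queue()

    def balance_actor():
        balances = {}              # private state, never shared directly
        while True:
            msg = inbox.get()
            balances[msg.account] = balances.get(msg.account, 0) + msg.amount
            print(msg.account, "->", balances[msg.account])
            inbox.task_done()

    threading.Thread(target=balance_actor, daemon=True).start()
    for amount in (10, 25, 5):
        inbox.put(Deposit("acct-1", amount))
    inbox.join()                   # wait for the actor to drain its mailbox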


You are also a distributed system.


Some more distributed than others


The people are already here, they're just not very evenly distributed.


Your body is a distributed system. Your brain is a distributed system. A live cell is a distributed system. A molecule is a distributed system. In other news, water is wet.



