Nobody ever ported Doom to run on a Cray 1 (twitter.com/id_aa_carmack)
224 points by tosh on Dec 21, 2020 | 121 comments


The Cray computers felt like magic back in the day.

A guy is making a replica. He had some issues finding the operating system and systems software, but he overcame them.

Persistence pays off.

I am sure it would be fun to port Doom to it. Easier, too, since the new version draws considerably less power.

https://gigaom.com/2014/01/14/the-search-for-the-lost-cray-s...


How did he obtain the OS?


Per TFL,

"Andy Gelme, an Australian software developer who once worked for Cray. He too had a disk pack [containing the OS]."

"For the greater part of the last year, [Tantos, a Microsoft electrical engineer] arduously reverse engineered the OS from the [corrupted / incomplete] image. Despite a few remaining bugs, the Cray OS now works."


I ported parts of a TCP stack to the Cray X-MP in the 1980s. It was so fast that at first I thought it was broken. Hit <return> on the compile command line and the prompt would come right back almost immediately.


One of my professors joked that they were so fast, they could run an infinite loop in 2.2 seconds.


I used to program a Y-MP2 around 1990. That had an SSD and was the absolute fastest machine I had ever used at that time.


> 1990 > SSD

Can you elaborate?


All of the early Cray mainframes (even the Cray-1 discussed here, although maybe only the somewhat later Cray-1/S) supported a "Solid State Disk" peripheral, which was basically a refrigerator full of RAM that was accessible via a DMA channel. The intent was for it to be used to swap large data sets in and out of memory - I think even the early ones went up to a gigabyte in size.


Yes, the Cray Solid State Storage Device (SSD) was a standalone unit that looked like a quarter of a Cray, complete with bench. It held up to 2 gigabytes of data in MOS memory and was accessed via an I/O channel at 100 megabytes to 1.25 gigabytes per second. It was used for "temporary storage of datasets" and could "significantly increase data transfer rates." The SSD was separate from the Cray's main memory.

It is described in detail in the manual: http://www.bitsavers.org/pdf/cray/Disk/HR-0031_SolidStateSto...


> was accessed via an I/O channel at 100 megabytes to 1.25 gigabytes per second.

That's even faster than many modern SSDs.


I believe the idea also made its way to microcomputers (or was realized independently) shortly after, like with the SemiDisk:

https://web.archive.org/web/20201112040756/http://www.s100co...


It's my understanding that it wasn't a flash memory based SSD like we would see today, but just a huge amount of RAM (for its era) that could be used as storage. The facilities that typically hosted a Cray could be counted upon to have very reliable power, so the power failure risk of keeping stuff directly in RAM was low.


Rather, you always had a backup generator that would let you shut everything down gracefully in the event of a power outage.

An uncle who worked in that area in the 80s/90s had stories about people who still argued that these newfangled journaling file systems were pointless.


SGI machines had capacitor banks in the power supplies large enough to power the machine long enough for the kernel interrupt handler to park all of the hard drive write heads. It turns out that the voltage required to keep the write head writing is often less than the voltage required to keep RAM from corrupting. At least for a time, XFS journaling correctness relied on the kernel being able to prevent garbage writes on loss of power.

There used to be big caveats to using XFS on non-SGI hardware, though I think those caveats were roughly "XFS guarantees aren't much better than ext3/ext4 guarantees on power loss, on non-SGI hardware".


This isn't about a Cray, but it's roughly contemporaneous. According to https://trs.jpl.nasa.gov/handle/2014/26062, the Cassini space probe had an "SSD," and the contract to build it was awarded in 1992. It sounds like it was pretty much a RAM disk made from DRAM.

I'd imagine it would be practical to build and use something similar for a multi-million dollar supercomputer where cost is secondary to performance.


I have also seen mention of this in spacecraft design discussion. The concept being that for things which are powered from a radioisotope thermoelectric generator, it's safe to treat the RAM as storage.

Because there's absolutely no scenario after launch in which the constant trickle of wattage from the RTG to the onboard DC distribution buses and computers would ever be interrupted.

If the power from the RTG were to ever be cut off it would be a catastrophic mission failure anyways, entirely aside from the power to the onboard computing systems being interrupted.


> I have also seen mention of this in spacecraft design discussion. The concept being that for things which are powered from a radioisotope thermoelectric generator, it's safe to treat the RAM as storage.

I don't buy it. These systems are specifically designed with built-in redundancy in case RAM gets corrupted by radiation in space. So you can't treat it as storage, but for very different reasons.


Presumably the OS would be in some nonvolatile medium and the SSD would only be used for stuff like storing sensor data anyway. In the event of corruption or a system restart you would lose all of your working data but could bootstrap the system again.


While it would make sense to have recovery routines in some nonvolatile memory to handle a system-corruption use case, I'm not sure if it would be necessary for a system restart. IIRC, core-based systems had nonvolatile RAM, and I think at least some of them would continue to execute the same memory image after reset. I think the point of the GP was that one of these DRAM-based recorders is basically nonvolatile for all intents and purposes in this application.


It was at the NSCEE (National Supercomputing Center for Energy and the Environment) installation at UNLV for Yucca Mountain. There were two vector processors and an amazing 5GB SSD. The OS was UNICOS, a System V derivative that succeeded COS (the Cray OS), and a user was actually fully swapped out on context switches.

The SSD wasn't new by any means. A decade earlier, the Cray-1M could optionally have an SSD.


To be clear, we're talking about battery-backed RAM (specifically MOS RAM) there, right?


I don't think it was battery-backed by anything in the Cray, though I could be wrong. The general idea being that the Cray would be installed in a facility that would have a massive whole-circuit, true online UPS, with generators backing that up.


I don't know how the solid state storage was implemented. There are bunches of Cray brochures on the Internet that might have that information.


It's not an SSD; they're referring to their main RAM as solid state storage to differentiate it from core memory.


I think they're talking about the (not battery-backed) SRAM, which would have been referred to as solid-state storage to differentiate it from core memory technology.

The gigabytes of SRAM you could stick in a Cray as main RAM were a key differentiator for the platform (and would be still today).


I guess a RAM drive, where you aren't using mechanical methods for mass storage, can be considered a solid state drive


Even the Amiga had a RAM disk


RAM disk is the one thing I really miss on PC/Mac


There are many software tools to give you a RAM disk these days on pretty much all major operating systems - even ones that give you a disk based off of GPU memory!


Have any of you actually used a ramdisk on a modern GPU recently? The documentation available for doing this on Linux is pretty scarce.


I think /dev/shm is available on most(?) Linux systems by default these days. I tend to use it instead of /tmp when I can.
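For what it's worth, here's a minimal C sketch of a program putting something there via POSIX shared memory (on Linux, shm_open() objects show up under /dev/shm; the name "/hn_demo" is just made up for illustration):

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        int fd = shm_open("/hn_demo", O_CREAT | O_RDWR, 0600); /* -> /dev/shm/hn_demo */
        ftruncate(fd, 4096);                      /* size the region */
        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        strcpy(p, "this never touches a disk");   /* plain memory writes */
        printf("%s\n", p);
        munmap(p, 4096);
        close(fd);
        shm_unlink("/hn_demo");                   /* removes the object */
        return 0;
    }

(Older glibc needs -lrt to link.)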


On some distros, /tmp is a tmpfs (ramdisk) anyway. Which of course I discovered when building a package that was bigger than my installed memory (a really big package, on Arch Linux, which both mounts tmpfs there and uses it as the default makepkg location).


I really hate this hack. It feels like an attempt by distro maintainers to make their distro feel and/or benchmark quicker than others. I don't know if that's the original reason it was done, but I do know that it's a big pain to work around for a program author. If I'm writing to /tmp, I expect that the file might be swapped to disk, and indeed that it will be paged to disk before the OOM killer comes for me. If what I wanted was named shared memory, I'd put it in /dev/shm.


What about /var/tmp ?


Amiga had a nice trick called the RAD disk, which was a RAM disk whose contents were preserved across soft reboots.


On most Linux systems, /dev/shm is a RAM disk. Nothing to install, even.


Has anyone seen a program actually put something here?


You can easily set up a RAM disk on Windows, at least.

These days, with M.2 SSDs, it isn't really necessary, and (as was the case back then) you are usually better off using any spare RAM for programs.


You can still create RAM disks with macOS. See `man diskutil`.


1990 is probably a little early for something flash-based, but battery-backed DRAM- and CCD-based devices existed before that.


Even Sinclair in the 80s had a prototype for the QL, which didn't go into production: http://www.computinghistory.org.uk/det/31270/Sinclair%20QL%2...


...and in red, since it was going so fast.


Or would it be shifted to be more blue??


Only if it was getting closer.


It's the reply from the machine to you. It should be blue.


The message is always heading in the same direction. The relevant variable is the movement between the machine and you. You should probably hope it’s red because if the machine is fast enough that the blue shift is perceivable to the human eye, it might be the last thing you see.


A real working Cray 1 is a bit hard to find today, but in addition to the FPGA version by Chris Fenton that was already mentioned (https://www.chrisfenton.com/homebrew-cray-1a/), there is also a simulator for various Cray models at https://github.com/andrastantos/cray-sim

The biggest problems will be finding a C compiler for the Cray and adding a framebuffer output, I think... but this is a very nice challenge :).


> The biggest problems will be finding a C compiler for the Cray and adding a framebuffer output, I think... but this is a very nice challenge :).

I suppose you could run Doom as a batch job: input is a file with the control input and output is a gameplay video.
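Sketching that idea on top of the doomgeneric port linked in a sibling comment: the batch setup could be as small as dumping raw frames from the DG_DrawFrame() callback and encoding them to video on another machine. The callback names are doomgeneric's; the file-dumping plumbing here is just illustrative:

    #include <stdint.h>
    #include <stdio.h>
    #include "doomgeneric.h"

    static FILE *frames;

    void DG_Init(void) { frames = fopen("frames.raw", "wb"); }

    /* one 32-bit-per-pixel frame appended per call */
    void DG_DrawFrame(void) {
        fwrite(DG_ScreenBuffer, 4, DOOMGENERIC_RESX * DOOMGENERIC_RESY, frames);
    }

    void DG_SleepMs(uint32_t ms) { (void)ms; }    /* batch job: never sleep */

    uint32_t DG_GetTicksMs(void) {                /* fake a steady 35Hz clock */
        static uint32_t fake_ms;
        return fake_ms += 1000 / 35;
    }

    int DG_GetKey(int *pressed, unsigned char *key) {
        (void)pressed; (void)key;
        return 0;                                 /* input comes from the demo file */
    }

    void DG_SetWindowTitle(const char *title) { (void)title; }

Run the game with -timedemo and a .lmp file, then feed frames.raw to an encoder elsewhere.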


Doom can record game sessions to the LMP file format, which is a recording of the inputs on every frame. That's usually how people share runs in the Doom speedrunning community. So yeah, an LMP-to-video converter could be legitimately useful to some people.

https://doom.fandom.com/wiki/Demo#Custom_demos


Perhaps you'd want to reference the wiki that was not abandoned ages ago.

https://doomwiki.org/wiki/Demo

https://doomwiki.org/wiki/Doom_Wiki:Departure_from_Wikia

“LMP-to-video converter” is any Doom port whose output you can record by common (or less common) means. PrBoom+ has built-in demo playback recording (if you consider running the console encoders and feeding them sound and image data “built-in”).

https://github.com/coelckers/prboom-plus/blob/master/prboom2...


Andras Tantos has ssh and X11 working [1] on his simulator, so maybe there's a fairly straightforward path to porting something like this? [2]

Seems possible that this aspect of the platform might drive the need for some rework, though [1]:

    sizeof(unsigned int) = 8; UINT_MAX = 18446744073709551615
    sizeof(unsigned long) = 8; ULONG_MAX =     18446744073709551615
    sizeof(unsigned char) = 1; UCHAR_MAX = 255
    sizeof(unsigned short) = 8; USHRT_MAX = 4294967295
i.e. ints and longs are both 64-bit; shorts hold 32-bit values but occupy a full 64 bits of storage

[1] http://www.modularcircuits.com/blog/articles/the-return-of-t...

[2] https://github.com/ozkl/doomgeneric
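One concrete way those sizes bite a Doom port: the engine overlays on-disk byte layouts with C structs (e.g. mapvertex_t from doomdata.h) and assumes a short is 2 bytes. A quick illustration of the mismatch; the struct is Doom's, and the Cray size is inferred from the output above:

    #include <stdio.h>

    typedef struct {
        short x, y;    /* 4 bytes in the WAD file on disk */
    } mapvertex_t;

    int main(void) {
        /* 4 on a typical PC; 16 where a short occupies a full
           64-bit word, so reading lumps straight into structs
           misparses them - every lump needs explicit unpacking */
        printf("sizeof(mapvertex_t) = %zu\n", sizeof(mapvertex_t));
        return 0;
    }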


CAL was the most programmable (and readable) assembly language I ever got to use, and the machine encoding of the instruction sets of the Cray-1 and Cray-2/-3 were so clean that one could read code straight from an octal dump. I think that something important has been lost since those days.

My favorite anecdote of 25 years on those machines is what we put into the 64-bit word at address 0 on the Cray-2. In ASCII, it read ~Z~E~R~O. If you jumped to it, it worked as four no-op ("PASS") instructions, and at word 1 was a jump to the library's routine that dumped registers and said "hey you jumped to a null function pointer". If you ever saw ~Z~E~R~O in a dump, you knew that a load from null had taken place.

I still wish a modern ISA would implement real vectors; SIMD is still a distant second-best.


You should check out the new ARM Helium and SVE extensions, as well as RISC-V's vector extension.

There's this recently rediscovered idea that maybe the Seymour Cray guy knew what was up.


What’s the difference between NEON and Helium?


Could you elaborate on what you mean by real vectors? It's not immediately obvious looking at the Cray specs what differentiates it from the kind of SIMD we have today.


The Cray vector processors had a set of 8 64-element x 64-bit 'vector' (V) registers, as well as 8 64-bit 'scalar' (S) and 8 24-bit 'address' (A) registers - so it would sort of be similar to 4096-bit wide SIMD. When you did an operation like a vector add, you could do "V0 V1+V2", and it would automatically do 64 consecutive adds, and it would be done in 64 + a few cycles (since the hardware was still only doing 1 add per cycle). As someone else mentioned, it also supported "Vector chaining", so if your next instruction was "V2=V0*V3", it could take the result from the adder and pipe it into the multiplier so now your addition and multiplication are nearly fully overlapped (and you're cruising along at 160 MFLOPS in 1976!). I think it might have supported 3 chains, so you could very briefly peak at 240 MFLOPS, but you couldn't sustain it because of the startup latencies involved.

As a 'practical' example, I was able to write an N-body simulator of Jupiter and 63 of its moons (using the vector registers) orbiting one another in only 127 total instructions!
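In loose C terms, those two chained instructions behave like a fused loop - a software model only, of course; the hardware did this in a pipelined fashion, one result per functional unit per cycle:

    void chained(double v0[64], double v1[64], double v2[64], const double v3[64]) {
        for (int i = 0; i < 64; i++) {
            v0[i] = v1[i] + v2[i];  /* V0 = V1 + V2: element i leaves the adder...  */
            v2[i] = v0[i] * v3[i];  /* V2 = V0 * V3: ...and feeds the multiplier    */
        }
    }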


Thank you! That's very interesting.


Another key feature of these architectures is that they had a vector length register. This allowed you to write strip-mined loops that would move through arbitrary-size vectors in units of the hardware vector lane width, without knowing that width until runtime. This means that, unlike MMX/SSE, the same binary works on machines with different numbers of lanes.

This idea has been resurrected recently with RISC-V and ARM's scalable vector instructions. There the general idea is an instruction that assigns the minimum of an argument value and the hardware vector register length to a register, and sets the masking appropriately if the argument is smaller. This makes for a very straightforward strip mined loop without a branch to check for and handle the remainder in the last iteration.
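A scalar C sketch of a strip-mined loop, with MAXVL standing in for the hardware lane count (on the real machines, the inner loop below is a single vector instruction executed with the VL register set to vl):

    #include <stddef.h>

    #define MAXVL 64   /* e.g. the Cray-1's 64-element vector registers */

    void vadd(double *a, const double *b, const double *c, size_t n) {
        for (size_t i = 0; i < n; i += MAXVL) {
            size_t vl = (n - i < MAXVL) ? n - i : MAXVL;  /* "compute VL" */
            for (size_t j = 0; j < vl; j++)   /* one vector instruction's worth */
                a[i + j] = b[i + j] + c[i + j];
        }
    }

No separate scalar remainder loop: the last trip just runs with a shorter VL.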


A few things: Vector operations were controlled by a VL (vector length) register, so the length (simd "width") was dynamic. On the Cray-2/-3, you could set the length to zero, and turn vector operations into no-ops. So vectorization of a loop with an unknown length generated a "strip-mined" loop in which each iteration performed 1-64 (later 128) iterations, and there was no epilogue problem as with SIMD. The last proprietary vector ISA from Cray had a "compute VL" instruction that attempted to smooth out the lengths of the final iterations.

The Cray-1 line could "chain" the results of one vector operation into operand(s) of another without waiting for the first to complete. (On the Cray-1, the later operation had to issue at the exact "chain slot" cycle at which the first result element appeared, so scheduling was fun; on the X-MP and later, "flexible chaining" was possible). Scheduling vector code involved grouping operations into "chimes" that would run as parallel chained operations, and so long as you could pack more vector instructions into a chime without causing synchronization due to register use or blocking on a functional unit busy, you won. Getting a 3-chime loop down to 2 chimes was fun puzzle solving, and if the loop used (say) the floating adder twice, you knew you could stop optimizing.

The Cray-2 didn't chain, but the Cray-3 had "tailgating", which was kind of the opposite -- a new vector result could start writing to a vector register that was in use as an operand without having to wait for that operand use to complete.

It helps to think about these vector machines as being pipelined (which they were). A single chime sequence was basically flowing data from memory to functional units and back to memory without really needing to use the vector registers per se for anything unless an interrupt arrived in the middle of the sequence.


Typically from an ISA perspective something like

- vector length register, to avoid loop epilogues.

- scatter-gather and strided memory ops

- support for predication / masking

- looser alignment restrictions, ideally as small as the element size rather than the entire vector width.

Look into Arm SVE and the RISC-V V extension for modern incarnations of this. Though the x86 world is slowly getting closer too.
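For a taste of the modern version, here's a sketch with the ARM SVE C intrinsics as I understand the ACLE (compile with something like -march=armv8-a+sve). The whilelt predicate absorbs the tail, so the same binary runs unchanged on any hardware vector width and needs no epilogue loop:

    #include <arm_sve.h>

    void vadd(float *a, const float *b, const float *c, int64_t n) {
        for (int64_t i = 0; i < n; i += svcntw()) {       /* lanes per vector  */
            svbool_t pg = svwhilelt_b32_s64(i, n);        /* mask off i >= n   */
            svfloat32_t vb = svld1_f32(pg, b + i);
            svfloat32_t vc = svld1_f32(pg, c + i);
            svst1_f32(pg, a + i, svadd_f32_x(pg, vb, vc));
        }
    }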


Support for vector chaining, I imagine.


I've seen these in at least two museums (Computer History Museum in Mountain View, CA and the Smithsonian Air & Space Museum in DC). They definitely make for good museum pieces, and you could probably pull enough parts together to get at least one working if you really wanted to pull this off.

Incidentally, a Cray-1 uses 115 kW of power when running. Would that be a new power consumption record for a device that's just running Doom?


I saw one last summer at the Musée des Arts et Métiers in Paris. I was more fascinated than I expected: https://commons.wikimedia.org/wiki/File:Musée_des_Arts_et_Mé...

They looked more like abstract art pieces or 70s science fiction props.


To this day, it is a dream of mine to sit on the bench of a Cray-1.


Interesting. Is that huge transparent block there for aesthetics only?


I gather it was used to regulate the temperature of the liquid coolant - but instead of just some pipes or whatever, the designers turned it into a nice "waterfall" feature. I think it even lit up.


The first one ever (I believe) is about a block up the street from here. Before Coronavirus, I would stop in and say hello to it, every couple of months.

A museum piece now. Would be great to see it running again. Not likely.


Depends; we could always create a cryptocurrency that renders and plays Doom in a distributed manner. Reward people with "DoomCoins".


Was there ever a GUI for Crays? I wouldn't bet on it, and I think there's no image to run Doom on GCP or AWS for the same reason: a keyboard is not among the requirements, but a graphical display is, for void portDoomWithoutReason(); to be called from OnFoundCapableComputerDoomPlayable(machine Box)


One of the purposes of X was to run the work on a super computer and display it back to your desktop. Generally it was expected that time on the super computer was too expensive for interactive use, but you might display back a running simulation to see status.


Ah that seems so obvious now that you mention it - a sort of "thin client" model - but I was genuinely confused finding an X11 binary on a company AS400 the other day. Granted, that machine is older than I am. At least that binary makes sense now.


There was a third party frame buffer. Apple had one and used it to prototype advanced UX on their Cray.


That sounds like an interesting story. Do you have any further info about Apple using a Cray for prototyping?


There is an often cited quote from Seymour Cray when asked about Apple using Cray computers to design the next Apple: "That's interesting, because I'm designing the next Cray with an Apple!"

https://wiki.c2.com/?AppleCrayComputer


Some people take this seriously, not realizing that Seymour was being sarcastic. He used pencil and paper to design his computers, not even an Apple II.


Interestingly enough, Jean-Louis Gassée briefly mentions something about using a Cray to develop what was to be an Apple-designed processor decades before it was fully realized https://mondaynote.com/joining-apple-40-years-ago-805114536a... (See a little more than halfway through the article)


For those interested in the details of the cancelled Apple processor, here is the manual:

https://archive.org/details/scorpius_architecture


Thermal design, fluid dynamics. Apple said as much in advertising at the time, IIRC


If all else fails, a vnc server might be an option...


Well... since this is on HN, I wonder how long it's going to be before the new Cray-1 port of Doom comes out.


I'm disappointed they never released a Cray Z computer.


Has anyone got one lying around?

(It draws a mere 100kW at the wall, I'm guessing no)


It actually has a 150kW 208V 3-phase generator set that provides power to the power distribution cabinet, which then uses variacs to transform the voltage down to 5V. The genset has a theoretical current output of 415A; pretty serious power!

https://www.edn.com/cray-1-super-computer-the-power-supply/


P = V × I says 5.2V @ 770A is only 4.004kW. With one of these on each of the three phases, that's a total of 12.012kW.

So where did the rest of the power go? Into cooling? Perhaps it was a linear power supply instead of a switcher? That would make for lots of inefficiency.


The Cray-1 has 36 power supplies: 20 that supply -5.2 volts, and 16 that supply -2.0 volts. (The weird negative voltages are for ECL, the fast logic family used in the Cray.) The bench around the Cray holds the power supplies.

Interestingly, the power supplies themselves are unregulated; regulation is done by the motor-generator unit.


Just plug it in at a supercharger.


Looks like most superchargers supply exactly 150kW (newer ones supplying 250kW) [1].

1: https://en.wikipedia.org/wiki/Tesla_Supercharger


Yeah, that was the joke :) It helped put the power draw in perspective for me.


For me too, thanks :) Interesting given the decades lag.


Sounds like a good thing to not touch by accident...


It's not Cray if the bowels; don't touch. Merry Christmas. May the light shine upon yous. < 3



Man, this guy has some great articles.

http://www.chrisfenton.com/cray-1-digital-archeology/

http://www.chrisfenton.com/cos-recovery/

I just wish he'd taken higher-resolution pictures.

He's also seriously hardcore (or maybe I just feel out of my depth because I'm a software guy):

> I built a robot that would manually move the head forward 1/5200th of an inch at a time (there are 400 data tracks per inch, so this gives me a whopping 13 steps per data track!), while a high-speed analog-to-digital converter would take the analog signal straight from the drive’s read amplifier and buffer it into an FPGA at a blistering 80 million samples-per-second (like I said earlier, the theme here was overkill…the data was only changing at ~10 MHz or so)

Related: http://www.modularcircuits.com/blog/articles/the-cray-files/


Lol - the 'overkill' approach was mostly one of expediency, since I had spent most of the summer I had to work on this project just trying to get my hands on the disk drive and then powering it on without blowing a fuse. I only had 2-3 weeks to throw everything together, so the 'robot' was basically the Z-stage of a Makerbot 3D printer with some acrylic widgets to push the read-head, then the signal was fed into an analog comparator chip and then into a shift register in my FPGA board. The craziest part is that it worked, and I was able to recover the data!


Very nice.

The FPGA thing isn't that inaccessible to normies like us (even though some FPGA dev boards easily cost thousands and thousands), but the ecosystem is a complete clusterfuck, so getting started is the hard bit. Obviously you need to know electronics too, but 80MSPS will let you get away with a lot of bad circuitry (the term is signal integrity), as I can attest to...

Verilog isn't too bad as long as you treat it as a description of a circuit (it's a language from the past; not even mentioning VHDL...)


The Computer History Museum had one when I stopped by about 10 years ago. Not certain if it was operational as anything other than a cozy bench. Anyone know?


No, the Cray is not operational. Only a few things at the CHM are operational: the PDP-1, the IBM 1401 lab, and the RAMAC disk drive.


When I was in high school we had a PDP-10 and a PDP-11, where I took a programming course in PDP-11 assembly language. It makes me feel old knowing these weren't antiquated at the time.


Wow! Did you ever meet Charles Babbage? ;)


No, but something nearly as bad: typing on punch cards and having to feed them into a card reader to get the code into the computer. And all but one of the terminals were TTYs at 110 baud that printed on paper. There was a single DEC VT100/102 terminal you had to get in early to use.


They used Freon coolant, so it's possible they can't/won't.


Couldn't they use one of the freon substitutes like R-407c?


Not my area of expertise but if I had a literally priceless old supercomputer, I wouldn't necessarily want to risk running it outside of its intended operating conditions (especially after 40 years)



Wouldn't that make Doom much harder? In-game adversaries would take advantage and execute vectorized attack patterns supported by cached-ammunition flow. Would Doomguy be able to survive at all?


You should be okay with your Reciprocal BFG as a force multiplier.

EDIT for those who may not have understood my comment: you would be hard-pressed to find a Cray that performed division; you would instead multiply by the reciprocal, computed in constant time.


How do you compute the reciprocal without division? A precomputed lookup table?


Newton-Raphson method to calculate X = 1/D:

1) Make a guess at the reciprocal X(0) (this can come from a small LUT, but a static "guess" can also work)

2) Calculate an improved reciprocal X(n) iteratively using the relation:

    X(i+1) = X(i)*(2-X(i)*D)
Repeat (2) until X(i) is accurate "enough". This is relatively fast, since the # of correct bits in X() doubles for each iteration.
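In C, the whole thing is just a few lines. The 48/17 - 32/17·D seed is the textbook first guess for D scaled into [1, 2), as a normalized floating-point fraction is; I don't know what seed Cray hardware actually used:

    /* reciprocal without a divide instruction
       (the divisions below are compile-time constants) */
    double recip(double d) {                      /* assumes 1.0 <= d < 2.0 */
        double x = 48.0/17.0 - (32.0/17.0) * d;   /* step 1: first guess    */
        for (int i = 0; i < 5; i++)               /* step 2, repeated       */
            x = x * (2.0 - x * d);                /* correct bits double    */
        return x;
    }

Five iterations are overkill for a double; four already get you past 53 correct bits from that seed.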

There are several other methods: https://en.wikipedia.org/wiki/Division_algorithm#Fast_divisi...


On the other hand, I can't wait to see E1M1 in <4s.


What's E1M1?


Episode 1, map 1. The first level in Doom.


The Computer History Museum in Mountain View has a Cray-1. The last time I saw it, it was at the left side of the lobby, unmarked, and some caterers were using the padded bench seats on its power supplies to stack their stuff.

Has anyone been able to emulate Tandem systems and their OS? That was an interesting high-reliability system, still worth attention. The hardware just cost too much back when it was a product.


There are two at the Computer History Museum: one on the left side, which you can sit on :-), and one in the actual museum area, which is opened to show the insides.


> Has anyone been able to emulate Tandem systems and their OS? That was an interesting high-reliability system, still worth attention. The hardware just cost too much back when it was a product.

HPE still sells the Tandem OS, NonStop, and they have ported it to run on x86. Unfortunately, unlike OpenVMS, they’ve never (to the best of my knowledge) run a hobbyist program for it.


I sat on the bench seats of the Texas A&M Cray back in the 80s.


I remember porting my text editor to UNICOS (someone loaned me an account on a Y-MP, I think): it really only had 64-bit word pointers. The byte offset in such pointers was stuffed into the upper, unused three bits. This meant that address arithmetic had better be done with pointers, not with pointers cast to integers.

Also, I'm pretty sure ints were 64 bits... it's possible that shorts were also 64 bits, but I don't remember.
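A contrived sketch of the trap, assuming the layout described above (byte offset in the top three bits, word address below): integer arithmetic on a casted pointer bumps the word address, not the byte.

    char buf[16];

    void demo(void) {
        char *p = buf + 1;        /* byte 1 of word 0: offset bits set   */
        long  n = (long)p + 1;    /* increments the WORD address...      */
        char *q = (char *)n;      /* ...so q is byte 1 of word 1,        */
                                  /* not byte 2 of word 0 as intended    */
        char *r = p + 1;          /* pointer arithmetic gets it right    */
        (void)q; (void)r;
    }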


I remember finding one of its boards in a drawer at my university. It instantly felt like a special piece of electronic history, even before I realized it was from a Cray! (I ran to the old Cray on display in the hall, and indeed it was a board for it :)

The guy at the computer museum was happy.


I remember a story (perhaps apocryphal) about Seymour Cray showing off the Cray-1 and a hardware bug was discovered during the demo. While the attendees ate lunch, Cray redid the wire-wrap to fix the bug.


Are there any functioning Cray-1s left?


just picturing the Jurassic Park scene.....

"This is E1M1. I know this!!"


Until now?



