This is an old article from the early 90s, and I believe it may have been the first public mention of this fact about the x86 encoding, although no doubt many had independently "discovered" it before --- especially in the times when microcomputer programming consisted largely of writing down assembly on paper and then hand-assembling the bytes into memory using something like a hex keypad.

All of these are features inherited from the 8080/8085/Z80.

Here are the corresponding opcode tables in octal:

https://dercuano.github.io/notes/8080-opcode-map.html

http://www.righto.com/2013/02/8085-instruction-set-octal-tab...

http://www.z80.info/decoding.htm



Thanks for these links - very interesting.

Astonishing to think that we can see traces of the 8008 still today, and that it wasn't actually an Intel-designed ISA (it came from CTC / Datapoint).


The Datapoint 2200, the source of the 8008 instruction set, is an interesting machine. The CPU was built from TTL chips. To decode instructions, they used BCD-to-decimal decoder chips, specifically the 7442. But they'd use them as octal decoders, only using 8 of the 10 outputs.

The Datapoint 2200 documentation gave the opcodes in octal, so they were clearly thinking in octal. The 8008 documentation, however, used neither octal nor hexadecimal: the opcodes were given in binary, but grouped in 3 bits, octal style, e.g. 10 111 010. I think the 8008 came right at the time when octal was on the way out and hexadecimal was taking over. (The 8008 assembler manual uses both octal and hexadecimal, but primarily hexadecimal.)

The Intel 8080 still specified the instruction set in binary, not octal or hexadecimal. A 1983 manual for the 8085 also gave the opcodes in binary, but now split with a line into 4-bit chunks (i.e. hexadecimal-style), and an appendix gave the opcodes in hexadecimal.

(Just some random history.)


One thing I forgot to mention: the 6502 microprocessor also uses groups of 3 bits in its instructions. However, they group them in the "wrong" way, aaabbbcc, so looking at the instructions in octal doesn't help you at all.
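
To make the mismatch concrete, here's a minimal sketch (Python, with names I made up) of pulling out the three 6502 fields; the octal digits of the opcode cut across them:

    def fields_6502(op):
        # 6502 opcodes are laid out aaabbbcc
        aaa = (op >> 5) & 0b111  # operation group
        bbb = (op >> 2) & 0b111  # addressing mode
        cc = op & 0b11
        return aaa, bbb, cc

    # LDA absolute is 0xAD = 0o255: the fields are aaa=0b101,
    # bbb=0b011, cc=0b01, but the octal digits come from bits
    # 76/543/210, so none of them line up with a field.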

Also, after using the Xerox Alto, which uses 16-bit words, I realized that octal is terrible. The problem is that if you're looking at two bytes in a word, the values make no sense in octal. For example, the characters "AB" form the hex word 0x4142, while "BA" forms 0x4241; the two letters are clear. But in octal, "AB" is 0o40502 and "BA" is 0o41101; the two letters turn into unrecognizable numbers.
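
For anyone who wants to reproduce those numbers, a quick Python session:

    >>> w = int.from_bytes(b"AB", "big")
    >>> hex(w), oct(w)
    ('0x4142', '0o40502')
    >>> w = int.from_bytes(b"BA", "big")
    >>> hex(w), oct(w)
    ('0x4241', '0o41101')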


With 12-bit, 18-bit, or 36-bit words, octal is pretty great. It just sucks with 8-bit bytes being grouped into 16-bit or 32-bit words.


That's just a consequence of us sticking to 8-bit bytes (and derivative word sizes), no? Octal would have made a lot more sense if it was, say, 12-bit.


“Us sticking to 8-bit bytes” is a consequence of having preferred BCD to octal in the past, so the causation is reversed (“12-bit words would have made a lot more sense if it was, say, octal.”) [Edited: actually 12-bit words would make sense in either case, as that's three BCD digits or four octal digits]

The Intel 4004 used four bits to manipulate a single BCD digit, and the 8086 had BCD instructions. There were many reasons for preferring BCD when designing computer architectures, though my favourite, which was already becoming less relevant by the 8086's time, is that it meant a full column on a punchcard wouldn't be “all holes”, which reduced the likelihood of the cards tearing.


I'm thinking of earlier times, before microprocessors in general. 6-bit bytes were a thing for a while - fairly logical, too, given that it was just enough bits to encode the entirety of ITA2 without needing any control codes to switch between character banks.


I implemented an 8086 emulator using the 1981 "iAPX 86, 88 USER'S MANUAL". It gives the opcodes as bit patterns, so add is specified as "0 0 0 0 0 0 d w | mod reg r/m", where d is the direction (memory to register or register to memory) and w is the width (byte or word). Since this kind of pattern is used across many instructions, it makes the code fairly easy to comprehend (to me, at least).

Extracting the ALU function from bits 4-6 means that you can group the implementations of add, or, adc, sbb, and, sub, and xor (to and from memory, for bytes and words) into one function, as sketched below.

The code's not as fast as the "one code block per instruction" approach of something like DOSBox, but at least it doesn't fill me with dread to look at.
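
Not my actual emulator code, but a minimal Python sketch of that grouping (the helper name is made up; cmp completes the group of eight):

    ALU_OPS = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]

    def decode_alu(opcode, modrm):
        # ALU group: 0 0 f f f 0 d w | mod reg r/m
        func = (opcode >> 3) & 7  # middle octal digit: ALU function
        d = (opcode >> 1) & 1     # direction: 1 = r/m to reg
        w = opcode & 1            # width: 1 = word, 0 = byte
        mod = (modrm >> 6) & 3    # addressing mode
        reg = (modrm >> 3) & 7    # middle octal digit: register
        rm = modrm & 7            # low octal digit: r/m
        return ALU_OPS[func], d, w, mod, reg, rm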


In case anyone is curious like I was and wants to look at the documentation, Bitsavers has some[1], including the octal opcode reference[2].

[1] http://bitsavers.org/pdf/datapoint/2200/

[2] http://bitsavers.org/pdf/datapoint/2200/2200_Programmers_Man...


Do you know if Federico Faggin copied the logic design of the 2200 or implemented the ISA using his own design?


The implementation of the 8008 is completely different from the 2200 (as is Texas Instruments' forgotten TMX 1795 implementation). It would be extremely inefficient to copy the TTL implementation, since that depended on what chips were available. But the biggest difference is that the Datapoint 2200 was a serial machine that used serial shift-register memory while the 8008 had a "normal" 8-bit datapath.


Thanks - even in 1972 we had multiple radically different implementations of the same ISA!


His own. There would have been no way to fit the logic design of the 2200 on a chip, both because of the lack of metal layers and because the 2200 was designed to be frugal in how many off-the-shelf chips it used, which is very different from being frugal in the number of transistors when you have the freedom to lay out each one by hand.


I thought the 8008 manuals used hex, but, as kens points out in https://news.ycombinator.com/item?id=30409889, the Datapoint manuals used octal.

My notes on the 8080 are at https://dercuano.github.io/notes/8080-opcode-map.html.


I kind of doubt it was actually the first. It's definitely an interesting/cute thing to notice and write up, but I think the weird longevity of this particular piece owes more to its pioneering clickbaity framing (x86 is not really an octal machine; it's not surprising that people 'hadn't noticed', because obviously they had; etc.) than to the observations themselves.


The sibling comments disagree with you


I’m not seeing any disagreement in sibling or other comments


The comment by kens explains how the Datapoint 2200 both documented the ISA in octal and decoded instructions in groups of three bits at the TTL chip level.

https://news.ycombinator.com/item?id=30409100#30409889


The fact that instructions have some 'octal' structure doesn't make the thing an 'octal machine' and as importantly, a Datapoint 2200 is not an x86. The x86 is not an octal machine.


> The fact that instructions have some 'octal' structure doesn't make the thing an 'octal machine'

Basic concepts like the 8 GPRs are rooted in its octal-decoding origins. MOD/RM is still decoded octally, SIB is still decoded octally, etc. These fields aren't just three bits long; they're also aligned to three-bit boundaries within the byte being decoded.

> and as importantly, a Datapoint 2200 is not an x86. The x86 is not an octal machine.

The x86 traces its lineage to that, and the points still hold. For instance, even when x86_64 added more registers, they're still addressed as a three-bit field, with a new prefix bit simply selecting the top or bottom 8-register bank out of the now 16 registers. There are some awkward places where you can't address different 8-register banks the way you'd want to from an encoding perspective, because of these continued restrictions going back to the Datapoint 2200.

Having written the HDL for a simple x86_64 decoder, it is very much still an octal machine.
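
To illustrate (a hypothetical Python helper, not lifted from that HDL): the ModRM fields are the three octal digits of the byte, and REX.R/REX.B each just prepend a fourth bit to the same digits:

    def decode_modrm(modrm, rex=0):
        # REX is 0100WRXB; R extends reg, B extends r/m
        # (ignoring the SIB case, where B applies to the base)
        rex_r = (rex >> 2) & 1
        rex_b = rex & 1
        mod = (modrm >> 6) & 3
        reg = (rex_r << 3) | ((modrm >> 3) & 7)
        rm = (rex_b << 3) | (modrm & 7)
        return mod, reg, rm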


They are still bits, not octdigits or whatever. When 4 bits are used, nobody calls these 'hex machines'. We can definitely spend a lot of time pedanti-digging into the details but at the end of the day, it's just an early example of 'viral title'.


> They are still bits, not octdigits or whatever.

Once again, it's not just that they're groups of three bits; the fields are also three-bit aligned.

> When 4 bits are used, nobody calls these 'hex machines'.

I mean, most ISAs don't align their fields on clear, repeated boundaries the same way. The only other one I can think of (the SH series) I for one have absolutely called a hex machine, because you can read most of the machine instructions directly from the 4-bit nybbles. A four-bit opcode and three-address RISC instructions over 16 GPRs mean you can read the hex just about as easily as the asm.

The fact that most other machines correctly take a more bit-level, almost Huffman-coding route doesn't make x86 any less octal-derived at its base.


On a machine where opcodes were easily decodeable by just looking at their individual nybbles, calling the machine a “hex machine” would be relatively natural. Obviously the term is to be seen in context, and not (necessarily) the defining characteristic of the machine.


Breaking instructions into 3-bit groups is awfully convenient (and you find it elsewhere, e.g. in portions of the THUMB encoding).

Datapoint did it consistently. And in a way that aligns with octal encoding. And then used octal in their documentation.

In turn, their instruction set essentially became the 8008's, which influenced the 8080 and then the 8086/8088. As a result, we still have this structure in x86 today: the instructions read quite nicely in octal.



