James Routley

Last week’s adventures with the Exidy Sorcerer led me to write a Z80 version of the LZ4 decompressor I’d previously used on the SNES, the CoCo, and the Genesis. At this point, this has become generic enough that I took my previous implementations and broke them out into their own little library directory so I could use them in other projects. The SNES implementation turned out to be too tightly tied to the project it was in to be made generic, but the other three were just fine.

However, while I was looking at those implementations, I rapidly found that I had not three implementations, but six; the Z80 implementation inspired versions for the earlier Intel 8080 and the later Intel 8086, and I felt the lack of a dedicated 6502 version and wrote a decompressor there as well.

My original plan was to present all four new implementations side by side as a way of highlighting the similarities and differences between the various CPUs, with maybe a little historical scene-setting at the start to map out the relationships between all the various chips. This got entirely out of hand so I’ve split it in two; this week’s article is just about comparing the CPUs in their context, and next week I’ll dig into the implementations as a worked example of why these differences matter.

The Z80 and the 8080

The Z80 is a direct descendant of the 8080; it was designed as a binary-compatible upgrade. This means that the 8080, despite not being directly used by any famous 80s-era machines, would still be very familiar to an 80s-era developer; we can describe it very well by simply treating it as a cut-down Z80. It’s pretty easy to characterize what’s missing, too:

Relative jumps. No JR or DJNZ instructions.
Shadow registers. The EX AF,AF' and EXX instructions are missing.
All instructions with multibyte (prefixed) opcodes. That removes the index registers and indirect access to I/O ports, drastically restricts our options for interrupt modes and bitwise operations, and modestly restricts our ability to do 16-bit math directly on register pairs.

Interestingly, we get to keep our conditional procedure calls and returns. If you ever wondered why JR had fewer possibilities for what flag to branch on compared to JP, CALL or RET, that’s why; there weren’t enough unused opcodes in the 8080 to fit in all the possibilities somewhere that kept instruction decoding reasonable.

Speaking of instruction decoding, the 8080’s native assembly language format was very explicitly designed to closely match its instruction encoding. The full instruction set is described on its Wikipedia page, and its table makes it very clear how the opcodes directly map to the arguments the syntax permits. This is a sharp contrast to Zilog’s assembly syntax, which instead revolves around what the instructions actually do. Thus the 8080’s INR and INX instructions become the 8- and 16-bit versions of INC on the Z80, while the LD instruction on the Z80 side absorbs an absolutely dizzying number of 8080 instructions: MOV, MVI, LXI, LDA, STA, LDAX, STAX, LHLD, SHLD, SPHL, and potentially PCHL.
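To make the "syntax mirrors encoding" point concrete, here is a minimal Python sketch of how three 8080 mnemonics pack their register operands into opcode bits. The function names (encode_mov, encode_inr, encode_mvi) are hypothetical helpers invented for this illustration; only the bit layouts and register numbers come from the 8080's documented encoding.

```python
# 3-bit register field values in 8080 opcodes.
# "M" is the pseudo-register meaning "the byte at the address in HL".
REG = {"B": 0, "C": 1, "D": 2, "E": 3, "H": 4, "L": 5, "M": 6, "A": 7}

def encode_mov(dst, src):
    """MOV dst,src is encoded as 01 ddd sss (hypothetical helper)."""
    if dst == "M" and src == "M":
        # That bit pattern (0x76) is used for HLT instead.
        raise ValueError("MOV M,M does not exist")
    return 0x40 | (REG[dst] << 3) | REG[src]

def encode_inr(reg):
    """INR reg is encoded as 00 ddd 100 (hypothetical helper)."""
    return (REG[reg] << 3) | 0x04

def encode_mvi(reg, value):
    """MVI reg,value is 00 ddd 110 followed by the immediate byte."""
    return bytes([(REG[reg] << 3) | 0x06, value & 0xFF])

print(hex(encode_mov("A", "M")))   # 0x7e; Z80 syntax: LD A,(HL)
print(hex(encode_inr("A")))        # 0x3c; Z80 syntax: INC A
print(encode_mvi("A", 26).hex())   # 3e1a; Z80 syntax: LD A,26
```

Each 8080 mnemonic here corresponds to one fixed bit pattern with register fields dropped in, which is why three of these collapse into differently-shaped LD or INC operands on the Zilog side.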

That collapse of instruction mnemonics on the Z80 side allowed other parts of the syntax to be more regular as well. Following the 8008, for instance, the 8080 has a notional “memory” register M that represents the byte loaded from the memory address stored in HL; the 8080’s LDAX and STAX instructions expand that capability to get the address out of BC or DE instead. Zilog followed the 6502 syntax in using parentheses to represent indirection: M becomes (HL), and the new instructions simply became places where the operands (BC) or (DE) were also legal. Less happily, it also introduces the same parsing anomaly we saw recently on the 6502; here it manifests as the instruction LD A,(20+6) translating to 8080 syntax as LDA 26 but LD A, (20)+6 translating to MVI A, 26 instead. The fixes are identical; there’s nothing about the processors that causes the problem.

Toolkits for 8080 development

Since the Z80 is binary-compatible, it is entirely possible to program the 8080 with a Z80 assembler as long as one avoids the instructions that the 8080 does not support. The Pasmo assembler includes a --w8080 command-line option that generates a warning if you use incompatible instructions, so it’s probably the best option for this approach. Meanwhile, WLA-DX and ASMX support traditional 8080 syntax directly.

The Z80 vs the 8086

The Z80 came out two years after the 8080, and the 8086 came out two years after the Z80. The 8086 is a full-fledged 16-bit CPU, and as such it’s in more of a class of its own compared to the 8080 or Z80. Still, the relationship is pretty clear; while the Z80 answered the question “how can we improve the 8080?”, the 8086 attacks the question “what do we get if we design a 16-bit CPU on the same principles as the 8080?”

The 8080 offers an 8-bit accumulator, supplemented by six less capable registers that pair up into three 16-bit “register pairs” that can also be used as pointers to access memory. It also provides a 16-bit stack pointer that allows only limited programmatic control. The Z80 supplements this with more extensive control of the stack pointer, along with two new 16-bit index registers (IX and IY) that cannot be split, and which can serve as a base address for a hardcoded displacement. It’s a little odd that the “index” register is the base address, but it’s worth noting that the Motorola 6800 (released the same year as the 8080) works the same way.

The 8086, on the other hand, has a 16-bit accumulator AX alongside three 16-bit registers named BX, CX, and DX, each of which can be split into two 8-bit registers (AX becomes AH and AL, and the other registers split similarly). These are supplemented with two index registers named SI and DI (“source” and “destination” index), and a BP “Base Pointer” to supplement the stack pointer. None of these can be split.

Memory access is far more sophisticated in the 8086. Any of BX, BP, SI, or DI may be used directly as a pointer or with a hard-coded displacement, like Z80 index registers, but also the base registers may be added to an index register, meaning the indices are themselves proper, full 16-bit indices. Also, while the 8080 and Z80 both generally could only do arithmetic operations on A or HL, the x86 is considerably freer about what may be used where.

The 8086 can also address 1MB of memory space instead of the 64KB of the 8-bits. It manages this using a banking system similar to what we saw on the 65816; special secondary registers (CS, DS, ES, and SS, the code, data, extra, and stack segments) hold the extra necessary bits of each address. Unlike the 65816, these segments are multiplied only by 16 instead of 65536 before being added to the main 16-bit address. This allows segments to partially overlap, and results in the 16-byte “paragraph” becoming an important unit of memory measurement in 16-bit x86 systems. This segmentation system is easily the most reviled aspect of the entire 8086 design, but I must admit that I find it enormously preferable to the 65816’s bank system. The primary advantages are twofold: segment overrides may be applied to any pointer, which means that it is less necessary to juggle segment values the way it is necessary to juggle the 65816’s data bank pointer, and, even more crucially, the 8086 has two simultaneous data bank pointers (DS and ES), which allow accessing two “far” pointers simultaneously without any register juggling at all. This makes it much, much easier to write code that consumes an input buffer and produces an output buffer while working in larger memory spaces. The finer grain of the 8086’s paragraphs also means that while it shares a 64KB pointer-offset limit with the 65816, it doesn’t have to care about buffers hitting bank boundaries nearly as much. A 64KB buffer on the 8086 is trivially fully accessible using only a 16-bit pointer as long as it’s 16-byte aligned in physical memory.
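The segment arithmetic is easy to sketch in Python. The function names physical and normalize below are hypothetical, chosen just for this illustration; the formula itself (segment times 16 plus offset, truncated to 20 bits on the original 8086) is the one described above.

```python
def physical(segment, offset):
    """20-bit physical address: segment * 16 + offset.

    The mask models the original 8086, whose address bus is only
    20 bits wide, so sums past 1MB wrap around to low memory.
    """
    return ((segment << 4) + offset) & 0xFFFFF

def normalize(segment, offset):
    """Rewrite a far pointer so that its offset is in the range 0..15."""
    addr = physical(segment, offset)
    return addr >> 4, addr & 0xF

# Two different segment:offset pairs can name the same byte,
# because segments overlap every 16 bytes (one paragraph).
print(hex(physical(0x1234, 0x0010)))   # 0x12350
print(hex(physical(0x1235, 0x0000)))   # 0x12350

# A paragraph-aligned 64KB buffer is fully reachable from one segment.
base = 0x2ABC0                          # assumed example address
seg = base >> 4
assert physical(seg, 0x0000) == base
assert physical(seg, 0xFFFF) == base + 0xFFFF
```

The last two assertions are the "16-byte aligned" claim in miniature: once the segment register points at the start of the buffer, every byte of it is a plain 16-bit offset away.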

There are a few other Z80-like features the 8086 has picked up as well, including a more generic version of its LDIR family of instructions; this is where we get to see more strict assumptions from the instruction set about what registers are for. By analogy with the Z80, DS:SI serves the role of HL as the bulk source pointer; ES:DI serves the role of DE as the bulk destination pointer, and CX or CL work like BC (or just B) in holding the 16- or 8-bit counters for instructions that repeat in an LDIR-like manner.
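The register roles line up closely enough that both bulk copies can be modeled with nearly the same loop. This is a simplified Python sketch, not an emulator: the function names are hypothetical, memory is a single flat bytearray (so the 8086 segments are ignored), and the 8086 direction flag is assumed clear.

```python
def ldir(mem, hl, de, bc):
    """Model of Z80 LDIR: copy from (HL) to (DE), ascending, BC times.

    The counter is tested after each byte, so BC = 0 on entry
    means 65536 iterations on real hardware.
    """
    while True:
        mem[de] = mem[hl]
        hl = (hl + 1) & 0xFFFF
        de = (de + 1) & 0xFFFF
        bc = (bc - 1) & 0xFFFF
        if bc == 0:
            break
    return hl, de, bc

def rep_movsb(mem, si, di, cx):
    """Model of 8086 REP MOVSB with DF clear; CX = 0 copies nothing."""
    while cx != 0:
        mem[di] = mem[si]
        si = (si + 1) & 0xFFFF
        di = (di + 1) & 0xFFFF
        cx -= 1
    return si, di, cx

# Overlapping forward copy: the source runs into bytes the copy
# itself just wrote, so a short pattern gets replicated.
mem = bytearray(b"ab" + bytes(6))
ldir(mem, 0, 2, 6)
print(bytes(mem))   # b'abababab'
```

The byte-at-a-time behavior in the overlapping case is the kind of thing an LZ-style match copy can lean on, and the differing treatment of a zero count is a detail worth checking when porting between the two.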

Toolkits for 8086 development

The IBM PC, its compatible clones, and the DOS it used from 1981-1995 were kind of a big deal so there’s no shortage of toolkits for developing on the chip for any language under the sun. However, for assembly language development in the modern world, I will strongly suggest NASM over all alternatives; it uses a simplified and more regular assembler syntax than many of the period assemblers, and it felt no need to stay compatible with the older, messier systems the way that other modern tools such as MASM did.

The Motorola 6800 and 6809

The 6800 came out in 1974, the same year as the 8080, but struggled to compete with it on price. One year later, some of its designers released the 6502 as a much cheaper competing product, and this largely turned the 6800 into a footnote. It did serve as the basis of the 6809, though, which was enormously more expensive than the 6502 or Z80 but was also quite likely the most powerful 8-bit microprocessor of its era. The 6809 saw some success, especially in arcade machines, but it did not steamroll the world the way the 6502 and Z80 did.

The 6800’s design is markedly different from the 8080 and its descendants. It offers two 8-bit accumulators (A and B), both of which may participate fully in arithmetic operations and which can interoperate in some limited ways. Unlike the 8080 design, where values are loaded out of memory into registers and then operated upon, the 6800 prefers to take memory locations directly as operands. As part of that focus, it also includes a 16-bit index register X which offers very similar capabilities to the Z80’s IX or IY register, right down to the in-instruction displacements being limited to 8 bits, thus meaning that once again our “index” register is really more properly understood as a base register.

The 6809 expands this design considerably. The two accumulators can now be glued together into a 16-bit accumulator D which can do 16-bit math. An additional index register Y and an additional stack register U supplement the pre-existing X and S. Stack registers gain all the capabilities of index registers, and index registers themselves may now take full 16-bit offsets when dereferenced. A final addition is a “direct page” register; the 6800 used shorter instructions to refer to memory addresses where the high byte was zero (the “zero page”); in the 6809, the high byte for these shorter instructions was taken from the Direct Page register.

Less immediately visible to someone working at the assembly language level instead of the machine code one is that relative addressing is much more common on the 6809, meaning that it’s significantly more viable to write position-independent code on it than any of the other chips we’ve looked at here. Only the 8086 comes close, and it achieves it by using its segment registers as a de facto relocation base.

The biggest improvement, in my experience with the 6809, is the more numerous and powerful index registers. Having both the register and the in-instruction displacement be the size of the entire 16-bit address space means that either value may be used as a base or index, which grants the programmer considerable freedom. Being able to use the stack pointers as index registers is also very important to modern practice; it lets us finally set up stack frames with local variables in the ways we can on the 8086 or on more modern systems.

Toolkits for the 6800 and 6809

I used asm6809 for my own 6809 work. I haven’t actually done anything with the 6800 yet, but of my usual stable of assemblers it looks like WLA-DX supports it.

The MOS Technology 6502

Like the Z80, the 6502 was intended to compete with the chip that inspired it. Unlike the Z80, the 6502 is a much more distinct design and it is not a strict upgrade. Compared to the 6800, we’ve lost one of our accumulators, and while we’ve gained a new index register Y, both X and Y are only 8 bits wide; these cannot be sensibly used as base registers. This obliges us to lean more heavily on the indexed modes, and since no register is large enough to hold a pointer, any kind of memory indirection has to be vectored through the zero page. Even the stack pointer is only 8-bit; the stack is hardcoded to the range $0100–$01FF.
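What "vectored through the zero page" means in practice can be sketched in a few lines of Python. This models the address calculation of an LDA (zp),Y style access; the function name and the example addresses are hypothetical, and cycle timing is ignored entirely.

```python
def read_indirect_indexed(mem, zp, y):
    """Model of the 6502's (zp),Y access.

    The 16-bit base pointer lives in two consecutive zero-page bytes,
    low byte first; the 8-bit Y register is then added to it.
    """
    lo = mem[zp & 0xFF]
    hi = mem[(zp + 1) & 0xFF]      # the pointer fetch stays in page zero
    base = lo | (hi << 8)
    return mem[(base + y) & 0xFFFF]

mem = bytearray(0x10000)
mem[0x10], mem[0x11] = 0x00, 0x20  # pointer at $10/$11 holds $2000
mem[0x2000:0x2004] = b"\x01\x02\x03\x04"

# Walking an array means stepping Y, not the pointer itself.
total = sum(read_indirect_indexed(mem, 0x10, y) for y in range(4))
print(total)   # 10
```

Since Y is only 8 bits wide, a single pointer reaches at most 256 bytes this way; going further means rewriting the pointer bytes in memory.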

This is essentially the opposite of how the Z80 treated its predecessor, but it does make sense. The 6800 struggled; it was very difficult to manufacture and it was far too expensive. The 6502’s design emerges by identifying a set of core features, leaning very hard into them, and then using them as an excuse to remove or simplify other aspects of the design.

As long as we have every instruction able to use memory arguments, we don’t really need two accumulators.
The accumulator is 8-bit, so it’s kind of silly for the other registers to be 16-bit.
We’ve leaned heavily into operands being in memory; let’s put the pointers there too.
With the pointers in memory, we can bring the index registers back as indices, since they aren’t being used as base pointers anymore.
We can strip down the instruction set; no need for an “add ignoring carry” instruction when we have an “add with carry” instruction and a “clear carry” instruction.
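The last point in that list is worth a small illustration. The Python sketch below models binary-mode ADC only (decimal mode and the other flags are ignored), and the helper names adc and add16 are hypothetical. It shows why a single add-with-carry instruction plus a clear-carry instruction covers both the plain add and the multi-byte add.

```python
def adc(a, operand, carry):
    """8-bit add with carry in; returns (result, carry_out)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8

def add16(x, y):
    """16-bit add built from two 8-bit ADCs."""
    carry = 0                                    # CLC
    lo, carry = adc(x & 0xFF, y & 0xFF, carry)   # low bytes: a plain add
    hi, carry = adc(x >> 8, y >> 8, carry)       # high bytes: carry chains in
    return lo | (hi << 8), carry

print(hex(add16(0x12FF, 0x0001)[0]))   # 0x1300
```

Only the first ADC needs the carry cleared beforehand; every later byte wants the carry from the one before it, so a dedicated carry-less add would save one instruction per addition at most.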

The end result is a system where there are a lot of moving parts but where the most obvious way to express a computation within it is usually a practical one. There really isn’t an equivalent on the 6502 to the way that getting up to speed on the Z80 includes learning that OR A is how you clear the carry bit or that XOR A is better at setting A to zero than LD A,0 is. The subtleties tend to be quirks like how carry interacts with subtraction, or bugs like how Jump Indirect can’t have its address cross a page boundary.
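The Jump Indirect bug is simple to model. This is a Python sketch of the original NMOS 6502 behavior (the 65C02 fixed it); the function name and the addresses used are hypothetical examples.

```python
def jmp_indirect_target(mem, ptr):
    """Target of JMP (ptr) on an NMOS 6502.

    The high byte is fetched from ptr + 1 with no carry into the
    page number, so a pointer at $xxFF reads its high byte from $xx00.
    """
    lo = mem[ptr]
    hi_addr = (ptr & 0xFF00) | ((ptr + 1) & 0x00FF)
    hi = mem[hi_addr]
    return lo | (hi << 8)

mem = bytearray(0x10000)
mem[0x30FF] = 0x80   # low byte of the intended target $5080
mem[0x3100] = 0x50   # high byte, sitting across the page boundary
mem[0x3000] = 0x40   # what the CPU actually reads as the high byte

print(hex(jmp_indirect_target(mem, 0x30FF)))   # 0x4080
```

The usual workaround is simply to make sure no jump vector is placed at an address ending in $FF.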

Revisiting My Earlier Advice

The vast majority of my 8-bit work has been on either the 6502 or Z80, and as we’ve seen here, they’re from very different traditions. I do not remember where I first saw the claim, but back when I was learning either the 68000 or the Z80 one of the books I was studying divided the assembly language programming community of the time into the “sixers,” who preferred the 65xx and 68xx chips, and “eighters,” who preferred the 8080 and Z80. There’s a pretty obvious source for this split, looking at the different lineages and design decisions here.

I started out this blog very firmly as a sixer. The 6502 was the first assembly language I’d ever learned; I didn’t pick up 8086 until much later and, broadly speaking, under duress. However, a lot of the systems I’ve come to care about over the course of this blog are Z80-based, and this has obliged me to get myself up to speed with those. It’s really only in the last year or so that I think I’ve reached parity; at this point I’m confident in my ability to “switch-hit” between the sixes and the eights as needed and produce decent-to-good code in either.

My stock set of maxims for programming the two chips, however, comes from several years before I think I got there. Is there anything I’d correct?

A solid chunk of the advice, in retrospect, turns out to just be characterizing the primary differences between the sixer and eighter programming model. I think all of these leap immediately out of our design comparison above.

6502: When evaluating complex expressions, decompose the operations into two-address code, with the destination as the accumulator if possible or a memory location if not. Fixed memory locations can be used freely as operands. This was a design consequence that was clearly inherited from the 6800.
Z80: Computation should keep all of its data either in 8-bit registers or accessible via pointer dereference through a 16-bit register. This, again, falls directly out of what the instruction set lets you directly ask the CPU to do. We see this style not merely in the Z80 but in the 8080 as well.
6502: 6502 code does not want to work with pointers. It wants to work with arrays, and if the process involves accessing multiple arrays at once, it wants to work with corresponding elements in each of those arrays. This was completely unique to the 6502, as it turns out; the 6809 is much more comfortable working with pointers thanks to its 16-bit index registers, and the 6800’s X register was effectively an HL register that didn’t break apart.
Z80: Memory access is overwhelmingly through unindexed pointer dereference. Looping through arrays will rely on mutating the pointers themselves. We see this on the 8080 as well.
6502: The stack on the 6502 is extremely weak, and is best for caching temporary values at the beginning and end of computations to make scratch space available. This is also pretty 6502-specific. The 6809 can actually use stack frames and local variables at a level we’d consider acceptable today.
Z80: Rely on the stack when you can. It is ideally suited for stashing temporaries of all kinds. The Z80, on the other hand, much like the 8080, cannot match the 6809 here and is pretty much restricted to stashing temporaries. I think my experience since then has underlined that the EX (SP),HL instruction (equivalent to the 8080’s XTHL) does a very significant amount of work in making the Z80’s stack more usable mid-function than the 6502’s. We don’t see proper stack frames amongst the eights until the 8086, and even that requires jumping through a few hoops that the 386 ultimately made unnecessary.

Beyond that were some extremely specific pieces of advice for each chip. For the 6502 I had these:

Inner loop variables in 6502 code should generally be held in index registers, with memory taking a distant second place and the accumulator being most unsuitable. This still holds up pretty easily; it’s a direct consequence of the instruction set and the amount of time each instruction takes.
When working through a set of arrays, the innermost loop variable should be held in the Y register. This one, however, I would add some caveats to. The idea here is that the indirect-indexed mode requires the Y register to be its index, and you’ll be much more efficient if that’s also your loop variable. That’s great when you can get it, but you can’t always, and when you can’t, the earlier advice becomes more important. In those cases, use X as the innermost loop variable and leave Y to control the indexing.

For the Z80, I offered these:

The Z80 provides a grab-bag of specialized capabilities, so it’s often worth reviewing the instruction set to see if it’s got something for you to use. This is less advice, I think, and more a warning that more work has to be done to really get the whole design properly in your head. There’s just a bunch more opcodes and a larger number of total moving parts. To the extent that the design is more chaotic (for example, JP allowing branches on more conditions than JR), it’s pretty clearly due to the need to fit the new capabilities around the existing ones of the 8080.
When assigning registers to tasks, prefer HL for standalone or source pointers, DE for destination pointers, and BC for count values. When using 8-bit loop counters, one of them should use B so that it may take advantage of the DJNZ instruction. This is solid advice as far as it goes, but it boils down to “your own ABIs should play nice with DJNZ and LDIR.”

I think I would add one new guideline here: Z80 code is often faster when it’s also 8080 code. The fastest instructions are single-byte instructions that only touch registers, and the only Z80 instructions that do this that aren’t also in the 8080’s instruction set are the shadow-register instructions EX AF,AF' and EXX. In particular, I find I need to carefully justify any use of the Z80-specific bitwise logic instructions; 8080-friendly rephrasings are very often smaller and faster once we look at the final sizes and cycle counts.

Next week: we put theory to practice and render the same algorithm on four chips at once.

Comparing the Z80 and 6502 to Their Relatives