6502 assembly language: a quick overview with Commodore 64 programming examples

This article is a short overview of 6502 Assembly Language, intended for those with little knowledge of 6502/6510 assembly or for those who need to brush up on some topics. It does not explain all there is to know about machine language, but it covers some aspects that will be very useful for a better understanding of the ML (machine language) programs I will hopefully post in the future. Examples are coded for the Commodore 64 retrocomputer.

Machine Language is the language the computer actually “speaks”. It is based on the operation codes (opcodes) of the CPU. Opcodes are numeric instructions that are shaped around the hardware of the CPU.

The 6510 processor of the Commodore 64 is a 6502 compatible CPU. There are some special “memory locations”, inside the CPU, called registers. On a 6502/6510 CPU, registers that are currently of interest to us are:

  • Accumulator (A), used to store a value (an 8 bit number, from 0 to 255 decimal, or $00 to $ff hex). It is possible to load a value from memory into the accumulator. It is also possible to load a given value directly into the accumulator. These load operations are performed by the instruction LDA (LoaD the Accumulator), in its different forms.
    We can also transfer the value held by the accumulator to memory. This is called “storing” to memory, and it is performed by the instruction STA (STore the Accumulator), in its different forms.
    The Accumulator is the only register that can be used for additions and subtractions.
  •  X register, Y register. These registers are similar to the accumulator: they hold an 8 bit value, they can be loaded with the content of a memory location or with a given value, and their content can be stored to a memory location. The instructions LDX, LDY, STX, STY are designed for those purposes, and work just like LDA and STA.
    Furthermore, LDA, STA, LDX and LDY instructions can be indexed. For example, an instruction such as LDA $800 will load the accumulator with the content of $800. If X holds the value $01, the indexed instruction LDA $800,X will load the accumulator with the content of location $801 ($800+X = $801). This is indexing. An instruction such as LDX $800,Y is also acceptable. LDY $800,X will also work properly. So, X and Y registers are called Indexing Registers, since they act as indexes (like subscripts on arrays). STX and STY can be indexed too, but only with zero page addresses ($00 to $ff), as in STX $80,Y and STY $80,X.
    Instructions such as LDX $800,X or LDY $800,Y, instead, are not allowed: a register cannot be indexed by itself.
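As a quick illustration of the forms described above, the following fragment shows which load and store combinations the 6502 accepts. It is only a sketch: the addresses $800 and $900 are arbitrary choices made for this example.

```asm
        ldx #$01        ; X = 1 (the value itself, not a location)
        ldy #$02        ; Y = 2
        lda $800        ; A = content of $800
        lda $800,x      ; A = content of $801 ($800 + X)
        lda $800,y      ; A = content of $802 ($800 + Y)
        sta $900,x      ; content of A goes to $901 ($900 + X)
        ldx $800,y      ; allowed: X is loaded, Y is the index
        ldy $800,x      ; allowed: Y is loaded, X is the index
```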

 

Status register flags

When instructions are executed, some flags may change. Flags are single bits of the CPU status register. For example, if the instruction LDA #$00 is executed, the zero flag is set to on (please note that the “#” sign means: load the value itself, not the content of a location). Also, after running the instruction LDA #$ff, the negative flag is set to on. LDA #$ff also sets the zero flag to off, as it loads a value different from 0 into the accumulator.

Please note that each flag can only be on (1) or off (0).

An instruction such as LDA #$10 will turn the negative flag to off. This is because, for machine language, the following convention may be used when needed: on each byte, positive numbers are from 0 to 127 decimal, and negative numbers are from 128 to 255 decimal. Please note that on negative numbers, bit 7 is always ON. So, negative numbers are just those in which bit 7 is set to on. We will make important use of this feature.
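The effect of a load on the zero (Z) and negative (N) flags can be summed up with a few instructions. The values below are just examples chosen to show each case:

```asm
        lda #$00        ; Z=1 (value is zero),     N=0 (bit 7 is off)
        lda #$ff        ; Z=0 (value is not zero), N=1 (bit 7 is on)
        lda #$10        ; Z=0,                     N=0
        lda #$80        ; Z=0, N=1: $80 (128) is the first "negative" value
```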

The carry flag is used for mathematics. There are instructions allowing us to set it to on (SEC, SEt Carry) or to off (CLC, CLear Carry) at will. This flag is of course used by the processor to perform additions and subtractions.

In order to do a quick recap of what we have said so far, some important facts are as follows:

  • The Accumulator is a register used to store a value from 0 to 255 decimal (or 0 to $ff hex).
  • LDA $800 loads the content of location $800 in the accumulator.
  • LDA #$01 loads the VALUE $01 on the accumulator.
  • STA $C000 stores the value currently held by the accumulator in location $C000.
  • LDX $800 loads the content of location $800 in the X register.
  • LDY $800 loads the content of location $800 in the Y register.
  • STX $A00 stores the content of the X register on location $A00.
  • STY $A00 stores the content of the Y register on location $A00.
  • LDA $800,X loads the accumulator with the content of the location whose address is $800 plus the content of the X register. So, a program such as:
    LDX #$02
    LDA $800,X

    will load the accumulator with the content of location $802 ($800+X). The X register acts here as an index register. Please note that the content of the index is added to the memory location address, not to its content.

  • There is no POKE equivalent on 6502 assembly language. It is not possible to load a memory location with a value within a single instruction. So, to change the border color to black, the BASIC statement POKE 53280,0 must be replaced with:
    LDA #$00
    STA $d020

    Again, please note the “#” sign before $00, as we want the accumulator to be loaded with the value 00, not with the content of location $00. On the STA instruction, as we want to store the content of A on a memory location, the sign “#” must NOT be used.

Here is a simple program we can have a look at:

;assembly language program 01 (this is just a comment)
LDA #$00
STA $d020
LDA $c000
STA $d021
RTS

It sets the border to black, and it sets the background color according to the content of location $C000.

RTS means: return control to the caller. If we call this program from BASIC, once it has been executed BASIC will be “reactivated” from the exact point where it stopped to call the machine language program.

The BASIC language equivalent is as follows:

 5 rem program 01
10 poke 53280,0
20 poke 53281,peek(49152)

Please note that LDA, STA etc. are mnemonics: words that let you remember CPU instructions more easily. In memory, however, those instructions are stored as numbers. Also, LDA (followed by a memory address) and LDA # are actually two different instructions.

Each number that represents an instruction is called an “operation code” (opcode).

The opcode of LDA # is 169 decimal. The opcode of STA (as used in the above program) is 141 decimal. The opcode of LDA (as used in the above program) is 173 decimal. The opcode of RTS is 96 decimal.

Each number in memory can be only from 0 to 255 decimal. That’s no problem for the first instruction: LDA #$00 is stored in memory as the following two bytes:

169, 00     (decimal values)

STA $d020 is a problem. $d020 cannot be stored on a single RAM location. So, two RAM locations are used, the first one with the low byte, and the second with the high byte. So, STA $d020 is represented in memory by the following bytes:

141, 32, 208     (decimal values)

141 is the opcode for STA (as used in the above program), while 32 and 208 represent the location 53280. In fact:

high byte = int (53280/256) = 208

low byte = 53280 - high byte *256 = 32

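The high byte tells how many times the value 256 is contained in a 16 bit number. The low byte holds the “remainder”.

On the 6502, addresses are stored in the format low byte, high byte. This was a design choice made for speed: the CPU fetches the low byte first, so with indexed instructions it can start adding the index while the high byte is being fetched.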

53280 decimal is $d020 hex. Please note that the low byte is made up of the last two digits ($20), and the high byte is made up of the first two digits ($d0). So, given a hex address, obtaining its low byte and its high byte is quite straightforward.
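With an assembler there is usually no need to split an address by hand. DASM, like many other assemblers, offers the operators “<” (low byte of the following value) and “>” (high byte of the following value). The fragment below is a small sketch; the zero page locations $fb and $fc are an arbitrary choice for this example (they are normally free on the Commodore 64).

```asm
border  = $d020

        lda #<border    ; low byte of $d020:  A = $20 (32 decimal)
        sta $fb
        lda #>border    ; high byte of $d020: A = $d0 (208 decimal)
        sta $fc         ; $fb/$fc now hold the address, low byte first
```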

So, all the above instructions can easily be represented in memory by “splitting” their arguments when needed (the memory addresses such as $C000 and $D021). As a result, the above assembly language program (which uses mnemonics) can be translated to the following machine language program (using only numbers).

Machine Language program 01:
169,0,141,32,208,173,0,192,141,33,208,96

The two programs can be shown together for the sake of clarity:

LDA #$00            169,00 
STA $d020           141,32,208
LDA $c000           173,00,192  
STA $d021           141,33,208 
RTS                 96

(Note: on each line, the first number is the opcode.)

The program on the left is Assembly Language, that is, machine language written with mnemonics; it is what assemblers accept and it is more readable to humans.

The program on the right is Machine Language, that is, a program written as a sequence of numbers, exactly as it is stored in memory.

However, informally the terms “machine language” and “assembly language” are used interchangeably.

Once the program has been translated by hand from mnemonics to numbers, we can put it in memory directly from BASIC, for example by using this BASIC program:

5 m=4096
10 read a:if a=-1 then end
20 poke m+c,a:c=c+1:goto 10
30 data 169,0,141,32,208,173,0,192,141
40 data 33,208,96,-1

After running the BASIC program, the machine language program can be run with the BASIC instruction SYS 4096. BASIC will regain control thanks to the instruction RTS inside the ML program.

Doing this by hand is not practical, which is why tools called assemblers exist.

Tools required

As we have seen, machine language programs can be put in memory with POKEs. This approach is of course really tedious. Very early monitors made the task easier, but they only allowed you to enter numbers. Most common monitors, instead, let you enter instructions as mnemonics.

An assembler is the best tool for machine language coding, as it allows you to use labels and symbols. Labels are names for the memory locations where pieces of code are placed, and symbols are names for values. This way, there is no need to always refer to real locations while coding. Monitors, instead, require that you always enter real addresses in your code.

I tend to prefer cross-platform assemblers, so that I can write and assemble the code on a PC. This is faster and easier than using a real Commodore 64.

I use the cross-platform assembler DASM. You can write your assembly language program by using a simple text editor. Once the file has been saved (as a .txt file for example), it can be assembled by using a simple command (for example, from the Windows Command Prompt).

The above simple program can be written in assembler format as follows:

  processor 6502    ; needed by DASM: selects the CPU type
* = $1000

  LDA #$00
  STA $d020
  LDA $c000
  STA $d021

  RTS

The code:

* = $1000

is a notation for assemblers and means: assemble the program starting from $1000 (4096 decimal).

Suppose you have saved your source code with the name “myprog.txt”. Then, you can assemble it with the command:

dasm myprog.txt -omyprog.prg
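To illustrate the labels and symbols mentioned above, the same program could also be written as follows. This is only a sketch: the names “border”, “backgr”, “source” and “start” are made up for this example, and any other names would do.

```asm
        processor 6502  ; DASM directive: selects the CPU type
* = $1000

border  = $d020         ; symbol: a name for the value $d020
backgr  = $d021         ; symbol: a name for the value $d021
source  = $c000         ; symbol: the location that holds the color

start   lda #$00        ; "start" is a label for location $1000
        sta border
        lda source
        sta backgr
        rts
```

The assembler replaces each name with its value, so the bytes produced are the same as before.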

Decision making, branches and jumps

The flags of the status register change their state depending on the results of the execution of some instructions. Those results may be used for decision making. Each flag may be either off or on, zero or one.

Branch instructions are similar to the IF… THEN GOTO BASIC instruction. The condition upon which the branch is taken depends on a particular flag.

  • BEQ performs the branch if the zero flag is set to 1.
  • BNE will perform the branch if the zero flag is set to 0.
  • BPL will perform the branch if the negative flag is set to 0.
  • BMI will perform the branch if the negative flag is set to 1.
  • BCS will perform the branch if the carry flag is set to 1.
  • BCC will perform the branch if the carry flag is set to 0 (or clear).

For instance, the instruction:

BEQ $1100

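will branch to location $1100 if the zero flag is set to 1. Branches are relative: the target must be within about 128 bytes of the branch instruction. The assembler works out the distance by itself.

As we have seen, flags can be changed by LDA, LDX and LDY instructions. But there are some instructions, designed for decision making, that change the flags as well.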

  • CMP will compare the content of the accumulator with a value (CMP #value) or with the content of a location (CMP location). If the result is “equal”, the zero flag will be set (Z=1). If the result is “not equal”, the zero flag will be cleared (Z=0).
  • CPX will compare the content of the X register with a value (CPX #value) or with the content of a location (CPX location). If the result is “equal”, the zero flag will be set (Z=1). If the result is “not equal”, the zero flag will be cleared (Z=0).
  • CPY will compare the content of the Y register with a value (CPY #value) or with the content of a location (CPY location). If the result is “equal”, the zero flag will be set (Z=1). If the result is “not equal”, the zero flag will be cleared (Z=0).
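A compare instruction is normally followed by a branch. The fragment below is a minimal sketch of something like the BASIC statement IF PEEK(49152)=5 THEN POKE 53280,1 (otherwise the border is set to black). The location $c000, the value 5 and the label “setcol” are arbitrary choices for this example.

```asm
        ldx #$00        ; start by assuming "not equal": black
        lda $c000       ; read the value to test
        cmp #$05        ; compare A with the value 5
        bne setcol      ; Z=0: the values differ, X stays 0
        ldx #$01        ; Z=1: the values are equal, use white
setcol  stx $d020       ; set the border color
        rts
```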

We can also jump to another part of a program without doing any test. This is an unconditional jump (like a GOTO instruction in BASIC). It is done by the instruction JMP, followed by a memory address.

JSR is the equivalent of the BASIC instruction GOSUB: it jumps to another part of the program, then it goes back to the instruction after the JSR itself. The return to the caller is performed, in BASIC, when the instruction RETURN is encountered. In machine language, likewise, the return to the caller is performed when the instruction RTS is encountered.
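The following fragment is a small sketch of JSR and RTS working together. The subroutine name “setcol” is made up for this example.

```asm
        lda #$00        ; black
        jsr setcol      ; call the subroutine (like GOSUB)
        rts             ; execution resumes here, then returns to the caller

setcol  sta $d020       ; subroutine: A goes to the border
        sta $d021       ; and to the background
        rts             ; back to the instruction after the JSR (like RETURN)
```

Note that the same RTS instruction is used both to end the subroutine and to end the whole program: in each case, control goes back to whoever made the call.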

 

Increments, decrements and transfers

There are instructions specifically designed to increment by one or decrement by one some registers or a memory location.

  • INX will increment the X register by one.
  • INY will increment the Y register by one.
  • DEX will decrement the X register by one.
  • DEY will decrement the Y register by one.

The above instructions have no argument.

  • INC will increment the content of a memory address by one. INC $1000 increments the content of location $1000 by one.
  • DEC will decrement the content of a memory address by one. DEC $1000 decrements the content of location $1000 by one.

So, INC and DEC do have an argument instead, which is a memory location.

Those instructions affect the zero and negative flags and can be used in conjunction with compare instructions and branch instructions as well. Furthermore, they can be used along with indexed instructions, so that it is possible to create loops where for example an index grows on each iteration.
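A loop of this kind can be sketched as follows. The fragment writes 40 “A” characters to the first row of the screen. It assumes the default screen memory at $0400 and that color RAM already holds a visible color; the label “fill” is an arbitrary name.

```asm
        ldx #$00        ; X is the index, starting from 0
        lda #$01        ; screen code 1 is the letter "A"
fill    sta $0400,x     ; write to screen location $0400 + X
        inx             ; the index grows by one
        cpx #40         ; 40 characters done (one screen row)?
        bne fill        ; not yet: repeat
        rts
```

This is similar to a FOR… NEXT loop in BASIC, with X as the loop variable.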

Now, there are simple instructions which make it possible to transfer the content of a register to another one. Those instructions are:

  • TAX: transfers the content of A to the X register;
  • TAY: transfers the content of A to the Y register;
  • TXA: transfers the content of the X register to A;
  • TYA: transfers the content of the Y register to A.

 

Program example

A simple program using most instructions we have been talking about may be useful at this stage. The following program will show the colors of the Commodore 64 by changing background color in a loop. A pause will be placed between each color switching.

Furthermore, before changing the color, we will wait for the rasterbeam to be on the bottom border, so that no flicker will happen while setting background color. To do that, we will use the instructions:

loop      lda $d011
          bpl loop

This is the same as the BASIC statement:

WAIT 53265,128

We already know this instruction from the simple BASIC programs I posted earlier on this blog (for example, you may refer to this article). In machine language, the negative flag is set to 1 when the loaded value has bit 7 set to 1, so when this happens the branch “BPL LOOP” is not taken and the program goes on. Bit 7 of location $d011 (53265 decimal) is the highest bit of the raster line counter, so it is set when the rasterbeam has reached line 256 or beyond, which is below the text area. Just what we need.

Again, you may wonder why unsigned numbers can be seen as either positive or negative in machine language. It’s just a convention: for the negative flag of the 6510, numbers from 0 to 127 are positive, numbers from 128 to 255 are negative (as we already know). This is used for signed number math. Note that on negative numbers, bit 7 is always set.

It is also important to notice that, in machine language, if a location (or a register) contains the value 255 ($ff) and you increment it by one, you get no error. As each location can only hold 8 bit numbers, if you increment 255 by one you obtain 0. If you increment that location again, you get 1, then 2 and so on. So, counting on a single byte is from 0 to 255, then it just wraps around.
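The wrap-around can be seen with a few instructions. The same thing happens in the other direction when decrementing:

```asm
        ldx #$ff        ; X = 255
        inx             ; X = 0 (wraps around), Z=1
        inx             ; X = 1, Z=0
        ldy #$00        ; Y = 0
        dey             ; Y = 255 (wraps around the other way), N=1
```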

Source code – background color change.

PRG file. Use SYS 4096 to run the program.
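For readers who cannot get the linked file, here is a minimal sketch of how such a program could look. It is not the original listing: the labels, the symbol names and the length of the pause (50 frames) are assumptions made for this example.

```asm
        processor 6502
* = $1000

bgcolor = $d021
raster  = $d011

        ldx #$00        ; X = current color (0 to 15)
setcol  lda raster
        bpl setcol      ; wait until bit 7 of $d011 is set
        stx bgcolor     ; change the background color
        ldy #50         ; pause: let 50 frames go by
wait0   lda raster
        bmi wait0       ; wait until bit 7 is clear (a new frame began)
wait1   lda raster
        bpl wait1       ; wait until bit 7 is set again
        dey
        bne wait0       ; repeat for each frame of the pause
        inx             ; next color
        cpx #16
        bne setcol      ; loop until all 16 colors have been shown
        rts
```

The pause is built by counting frames with the same raster test, so no separate delay routine is needed.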

An example on indexing

As we have mentioned earlier, instructions such as LDA and STA can be indexed with either X or Y registers. X or Y can hold any value from 0 to 255 decimal.

We may print a message on the screen, then we can shift/scroll it to the right by one character. To do so, indexing is really helpful, and you can see how it works with both LDA and STA. Of course, the text must be no longer than 256 characters.

Shift message (source code)

Shift Message (prg file)
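Here is a minimal sketch of this idea, in case the linked file is not at hand. It is not the original listing: the message, the labels and the symbol names are made up for this example. It assumes the default screen memory at $0400 and that color RAM already holds a visible color.

```asm
        processor 6502
* = $1000

screen  = $0400         ; assumption: default screen memory
length  = 5             ; number of characters in the message

        ldx #$00
print   lda message,x   ; indexed LDA: read character X of the message
        sta screen,x    ; indexed STA: write it to screen position X
        inx
        cpx #length
        bne print

        ldx #length     ; shift right, starting from the last character
shift   lda screen-1,x  ; read the character at position X-1
        sta screen,x    ; and copy it to position X
        dex
        bne shift
        lda #$20        ; screen code for a space
        sta screen      ; blank the position left empty
        rts

message dc.b 8,5,12,12,15   ; "HELLO" in screen codes
```

The shift loop counts down, so that each character is copied before it gets overwritten.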

 

Self-modifying code

It is very handy to use self-modifying code in machine language. Assemblers allow us to easily find the low byte and the high byte of an address used as the argument of a 6502 instruction.

For instance, the following program sets the border to white:

* = $1000
border = $d020

          lda #$01
modify    sta border
          rts

As we know, “modify” is a label that represents the location the statement “sta border” starts from. By now, we know that such a statement is stored in memory with three bytes. One byte contains the opcode for STA. Another byte contains the low byte of address border ($20 or 32 decimal). One more byte contains the high byte of that address ($d0 or 208 decimal).

So, location “modify” holds the opcode of STA, “modify+1” holds the low byte of address $d020, “modify+2” holds the high byte of address $d020. If we change the content of memory locations “modify+1” and “modify+2”, the statement “STA BORDER” will operate on a different address.

With that in mind, the following program will change border and background color to white:

Change border self-modify (source code)

Prg file (SYS 4096)
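A minimal sketch of such a program is shown below, for readers without access to the linked file. It is not the original listing; in particular, placing the modified instruction in a small subroutine is a choice made for this example.

```asm
        processor 6502
* = $1000

border  = $d020

        lda #<border    ; initialize the code that gets modified,
        sta modify+1    ; so that every run starts from $d020
        lda #$01        ; white
        jsr modify      ; first call: stores to $d020 (border)
        inc modify+1    ; low byte $20 becomes $21: address is now $d021
        jsr modify      ; second call: stores to $d021 (background)
        rts

modify  sta border      ; the address inside this instruction is changed
        rts
```

INC does not touch the accumulator, so A still holds the value 1 when the subroutine is called the second time.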

Each time you run the program, it must behave the same way. It may seem obvious, but with self-modifying code this must definitely be kept in mind. The first thing such programs must do is initialize any code that will be modified later. This way, the program can be run several times without issues. The code at the beginning of the program:

lda #<border
sta modify+1

performs such initialization. Please note that the “<” operator means “take the low byte of the following address”. So, the above code loads the value $20 into the accumulator and stores it in location modify+1, which is just the part of the code that will be modified later.

This may not seem very useful here. However, this kind of code allows you to access any memory location, without being limited by the maximum index value of $ff. You can accomplish this even without self-modifying code of course, but I think this kind of approach is very handy.

Now, let’s look at another example. The following program will fill the text screen with “A” characters. As you can see from the code, it uses an index to reach 250 adjacent locations. Then, it modifies the code so that the address used by the statement “sta screen,x” is incremented by 250. This way, after the value 250 has been added to the address used by that statement, it is possible to reach with the X index the following 250 locations, and so on. This program actually reaches 1000 locations by accessing 4 blocks of 250 bytes, in sequence.

Filling with A (source code)

Filling with A (prg file)
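The following is a minimal sketch of the program just described, for readers who cannot get the linked file. It is not the original listing: the labels and the use of the Y register as a block counter are assumptions. It also assumes the default screen memory at $0400 and that color RAM already holds a visible color.

```asm
        processor 6502
* = $1000

screen  = $0400         ; assumption: default screen memory

        lda #<screen    ; initialize the address that gets modified
        sta modify+1
        lda #>screen
        sta modify+2
        ldy #4          ; 4 blocks of 250 bytes = 1000 locations
block   ldx #$00
        lda #$01        ; screen code for "A"
modify  sta screen,x
        inx
        cpx #250
        bne modify
        clc             ; add 250 to the address inside "sta screen,x"
        lda #250
        adc modify+1
        sta modify+1
        lda #00         ; high byte of value 250
        adc modify+2
        sta modify+2
        dey
        bne block
        rts
```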

As you can see, the program makes use of mathematics. The value 250 is added to the current screen address in order to reach the next 250 locations with the index. Addition is performed by using the instruction ADC (ADd with Carry). ADC adds the value following it, plus the carry flag, to the current content of the accumulator. For example:

CLC
LDA #$01
ADC #$03
STA result

This program performs the addition 1 + 3 and stores the result in the location “RESULT”.

The instruction CLC (clear carry) must be performed first. That is because the carry is ALWAYS added by ADC. Since this is a plain 8 bit addition, with no carry coming from a previous one, the carry must be 0.
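To see why the carry matters, the same addition can be tried with the carry set and with the carry clear:

```asm
        sec             ; carry = 1
        lda #$01
        adc #$03        ; A = 1 + 3 + carry = 5
        clc             ; carry = 0
        lda #$01
        adc #$03        ; A = 1 + 3 + 0 = 4
```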

The “filling with A” program, instead, performs a 16 bit addition. In this case, the carry is important.

 clc
 lda #250
 adc modify+1
 sta modify+1
 lda #00       ;high byte of value 250
 adc modify+2
 sta modify+2

As you can see, the value 250 is added to the address made up of locations modify+1 and modify+2. A 16 bit addition of two 16 bit numbers is performed by clearing the carry, adding together the two low bytes, then adding together the two high bytes. 250 is actually an 8 bit number, but it can be seen as a 16 bit number with a 0 high byte.

At the beginning, the carry is cleared. But, before adding the high bytes together, please note that the carry flag is NOT cleared. This is because the first addition can produce a carry, and we must take it into account if we want to get a correct final result.
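The same scheme works for any two 16 bit numbers stored in memory. The fragment below is a sketch: the zero page locations $fb to $fe and the names “num1” and “num2” are arbitrary choices for this example.

```asm
num1    = $fb           ; $fb/$fc: first number, low byte first
num2    = $fd           ; $fd/$fe: second number, low byte first

        clc             ; clear the carry before the low bytes
        lda num1
        adc num2
        sta num1        ; low byte of the sum
        lda num1+1
        adc num2+1      ; the carry from the low bytes is added here
        sta num1+1      ; high byte of the sum
```

For example, adding $00fa (250) to $04fa gives $fa + $fa = $1f4 on the low bytes, so the low byte of the result is $f4 and the carry is set. The high bytes then give $04 + $00 + 1 = $05, and the final result is $05f4.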

 
