One important part of a BASIC interpreter is string handling. This can be done in two ways: either by using static strings or by using dynamic strings.
On Commodore 64 BASIC 2.0, strings are dynamic. That means, they don’t have a defined fixed length. Instead, any string can be from 0 (null string) to 255 characters long, and can be changed at any time within a program. That allows for flexibility, but it does have a rather important disadvantage, as we will see in a moment.
Other BASICs, like Sinclair BASIC or Atari BASIC, use static strings. That means that strings, once created, have a fixed length. For instance, on Sinclair BASIC, it is possible to create a string variable of virtually any length. But, once it is created, it stays that length. The same concept applies to string arrays.
On Commodore 64 BASIC, even if an array has already been created, it is still possible to re-assign to any element a new string longer than the old one. No truncation occurs. On static string arrays instead, the common maximum length for elements must be carefully selected, so that it allows for shorter and longer strings to be put on the array. If a string happens to be longer than that limit, it will be truncated.
So, apparently, the use of dynamic strings may seem better than static ones, but this is definetely not always the case. String handling on the Commodore 64 BASIC does have a big design flaw. This problem is called slow garbage collection. This is not a Commodore 64 only issue, but it is related to different 8 bit computer models using Microsoft BASIC. Commodore fixed the issue at some point, and newer versions of BASIC (BASIC 3.5, 4.0 and 7.0) do not suffer from this problem.
When using dynamic strings, space for them must be carefully allocated. As their length may vary, this is not straightforward. When a string variable is created, its name, its length and a pointer are stored in the variables storage area. Name, length and pointer make together a descriptor. The pointer tells the BASIC interpreter where the content of the string lies in memory. If within a program, we have an instruction such as B$ = “HOME”, the pointer will make reference just to the part of the program code containing the word HOME. Instead, if a program has an instruction such as B$= CHR$(65), a string containing the “A” character will be created and stored in memory. The pointer of B$ will point on a location of a storage area specifically designed to store strings. That’s were the “A” character, the content of B$, will be put.
String operators such as CHR$ and STR$ may produce garbage, that is, strings that are created at the moment, stored in the string storage area and no longer used. On the other hand, if a variable is re-assigned, a new shorter or longer string will be created and the pointer of that variable will make reference to that string. The old string will become unused, but it will be still on memory. No space for it will be reclaimed at this stage. It will be not replaced and it will become garbage.
The string storage area is created starting from the Top of BASIC (TOB). Low byte and High byte of the Top of BASIC memory address are stored on location 55 and 56 decimal on the Commodore 64. The string storage area grows backwards from that location. This area is between the TOB and the End of Variables address (Actually, this address is called the End of Arrays).
The End of Arrays is the address at the end of the whole Variable Storage Area (the area for storing numeric variables/arrays, and string variables-string array descriptors).
Please note that when creating a string array, two things happen. When this array is dimensioned, descriptors for all the array elements are created in the variable storage area (on the area for arrays). When elements of this array are assigned, strings are created in the string storage area. Each pointer in the array descriptors will make reference to them.
So, when an array is created and its elements keep being assigned, the bottom of the String Storage area is moving down, towards the end of arrays. Please remember this later.
When many strings are created, either as a result of string variables / arrays assignments or strings manipulations resulting in garbage, the String Storage area becomes bigger and bigger, going towards the end of arrays.
At a certain point, memory is going to be full, and some action to reclaim memory space used by garbage must be taken. BASIC 2.0 contains a routine specifically designed for this, called the “Garbage collector”. It performs the so called “Garbage collection” process. In short words, by scanning pointers of existing string variables and arrays, it makes a “defragmentation” of the string storage area, so that useful strings will be adjacent to one another, removing the garbage. The process must be done with extreme care, so that no useful string is harmed.
Sadly, this routine can be very slow. Since it has to scan each string variable and array element, if there are many string variables and arrays in memory it will become very slow. On some cases, it may take minutes or even hours to complete its task. When strings are involved, very carefully programming is in order.
Evaluating speed of the garbage collection
Suppose we have the following program:
5 dim a$(800) 10 for t = 1 to 800 20 a$(t) = "a" 30 next t 40 print "garbage collection..." 50 print fre(0) 60 print "done."
The instruction PRINT FRE(0) is used to print free memory, but it actually starts the garbage collection routine. An instruction such as X = FRE(0) will start that routine as well.
If you run the above program, you will see that the garbage collection process takes no time. That’s because this program has produced no garbage. In line 20, the assignment A$ = “A” produces no garbage because the content of A$ is taken directly within the program, so no garbage string must be created.
However, if you change line 20 in:
20 a$(t) = chr$(65)
you will notice that, this time, the garbage collection routine will take some time. This is due to the fact that the string operator CHR$ creates a string that is stored in the String Storage Area and becomes garbage. This happens on each iteration, so the above FOR… NEXT cycle produces many garbage strings.
If you reduce the size of the array, let’s say to 400, you will see that the garbage collection is much faster. Despite array size has been halved, it actually takes a quarter of the time previously required. That means, as string variables in memory grow, the garbage collector becomes slower by a reason of the square of the number of string variables. So, when manipulating big string arrays, the time for garbage collection will more than likely become enormous.
How to fix the problem
The following little program will clearly show the problems that may arise due to a slow garbage collection routine.
5 rem not working program 10 dim a$(4600) 20 for t = 1 to 4600 30 a$(t) = "#" + str$(t) 40 print t : next
This program is actually a killer routine. A string manipulation inside a long loop is very likely to create a lot of garbage. In this case, line 20 produces garbage. Garbage is created by the STR$ string operator, which converts the number hold by T to a string. Then, a string with the result of the concatenation is created so that it can be assigned to A$(T). These strings are put in the String Storage Area, on that order. So, on each loop, this routine creates garbage and a good string. As the program is being executed, the string storage area becomes populated of scattered good strings with as well scattered garbage strings. This leads to a high level of defragmentation.
At a certain point, the String Storage Area becomes very big, and its bottom gets very near to the End of Array Address. At this point, the garbage collector is activated, but the machine will seem just dead. There are thousands of array elements to scan, and to make things even worst the String Storage area is highly defragmented due to scattered garbage. The program would take hours to complete, not to mention that it will in addition generate an ?OUT OF MEMORY ERROR, due to the fact that the array is too big for the available BASIC memory.
The STR$ operator adds a space in front of the number when producing the string. This space adds an additional useless byte to each array element. Truncating that space is possible to reduce the memory required by the array, but that requires an additional string operation inside the loop. That would just add even more garbage.
Trying to repeat the garbage collection within the program is of no help either. It just slows down things even more! In facts, strings keep growing as the program is running, so on each call the garbage collection will be slower and slower, on a square of the number of elements rule!
It may seem that it is not possible to create such an array with Commodore 64 BASIC. Array strings and descriptors seem to take too much memory, the garbage produced by strings manipulation eventually fills the memory, and the slow garbage collection routine just makes a disaster. While doing garbage collection, the RUN/STOP key is inactive. To escape that program, you can at least press RUN/STOP + RESTORE.
A fix does exist however, and I managed to create this array with a little fast BASIC program. The code will be shown at the end of this article.
However, program performance may be greatly improved by performing several local garbage collections.
Local garbage collection
The garbage collection routine will scan string variables and string arrays only as long as there are strings in the String Storage Area. If we make it to appear that there are little variables and garbage just before calling the garbage collection routine, it will be executed in a very short time.
Let’s have another look at the following example:
5 dim a$(800) 10 for t = 1 to 800 20 a$(t) = chr$(65) 30 next t 40 print "garbage collection..." 50 print fre(0) 60 print "done."
There is an important vector that we must know about. It is the pointer/index to the String Storage area. It keeps track of the current available address were a string must be stored from. Low byte is on location 51 decimal, high byte is in location 52 decimal.
Now, the used string Storage Area extends from the Top of BASIC to the address this pointer makes reference to. So, if we temporary replace the Top of BASIC with the address contained on the string area pointer, the garbage routine if called would just think no garbage is present, and it will run instantly. If we then create a few strings and garbage, if the routine is called it will just think only those strings exist, and it will perform its job very fast. And this will happen no matter how many strings were in memory before changing the TOB pointer. After operations are done, the TOB must be replaced to its previous value, so that it is possible to use all the variables that are in memory.
The example given above can be modified so that it performs two faster garbage collections:
100 dim a$(800) 110 for j=1 to 400 120 a$(j)=chr$(65) 130 next j 140 print "garbage collection..." 150 print fre(0) 160 print "done" 210 a1=peek(55):a2=peek(56) 220 poke55,peek(51):poke56,peek(52) 240 forj=401to800:a$(j)=chr$(65):next 250 print"garbage collection...":print fre(0):print"done." 260 poke 55,a1:poke56,a2
As you can see, array elements are created with two iteration loops, 400 elements for each.
The first time, the garbage collector is called when only 400 elements are in memory. So, the collection is faster than for 800 elements. This is also true for the second time, but before calling the collector, the TOB pointer is modified. This way, the second collection will take the very same amount of time of the first collection, even if 400 array elements were already created. By changing the TOB pointer assigning it to the String Pointer, the previously created 400 array elements and the garbage produced by the previous iterations are just hidden to the collector.
At the end of the program, the TOB is restored so that all variables are accessible.
If you take a look at the total amount of time required by the program, you will notice that some seconds are earned with respect to the previous program version performing global garbage collection.
If we split the garbage collection even more, we will end up having several fast garbage collections which will result in an overall greatly faster program execution.
Let’s try to apply this principle to the above “killer routine” creating 4600 elements and let’s see what happens. For now, I reduced the number of elements to 4000 to be sure that the array will fit on available BASIC memory. More elements are actually possible, however.
50 ti$="000000" 100 rem multi garbage split 110 dim a$(4000) 120 a1=0:a2=0:k1=1 125 a1=peek(55):a2=peek(56) 130 for si=1to40 135 poke 55,peek(51):poke56,peek(52) 140 for t=k1tok1+99:a$(t)="#"+str$(t):next 150 print fre(0):k1=k1+100:next 160 poke 55,a1:poke56,a2 170 print ti$
As you can see, the program is still slow, but it is much, much faster than the previous “killer” program. 4000 elements are created in “only” 19 minutes. Of course, you can run this program in VICE and set the warp mode for your convinience.
All garbage collections require the same time to be executed, because by the time each call is made, the garbage collector thinks only 100 array string elements and only a fraction of the garbage are present.
After executing the program, you can print the array content by entering this command line in direct mode:
for t = 1 to 4000:print a$(t),:next
As you can see, all elements are on their place.
Now, in order to create 4600 elements like the original killer program, we could try to truncate the additional space created by the STR$ operator. This way, a byte for each array element could be saved.
50 ti$="000000" 100 rem multi garbage split 110 dim a$(4600) 120 a1=0:a2=0:k1=1 125 a1=peek(55):a2=peek(56) 130 for si=1to46 135 poke 55,peek(51):poke56,peek(52) 140 for t=k1tok1+99:a$(t)="#"+right$(str$(t),len(str$(t))-1):next 150 print fre(0):k1=k1+100:next 160 poke 55,a1:poke56,a2 170 print ti$
An additional string manipulation has been introduced inside the loop, but the splitted local garbage collection scheme works anyway. Still, this routine takes 25 minutes to be executed… that’s indeed slow. But much better than hours.
The problem here is that we still have to use the slow garbage collection routine. In order to obtain a faster routine, we must find a way that allows us to perform the job without any garbage collection.
Static garbage storage area
Despite the slow garbage collection issue, the following program will create the infamous 4600 elements string array in just 4 minutes and 29 seconds.
You can download the code here: STATIC GARBAGE ROUTINE
As you can see, the program never calls the garbage collection routine. The BASIC interpreter never calls that routine either.
The principle here is as follows: since garbage strings are used only once within each iteration, why leaving them unused in memory all the time? If we have no garbage strings, there is no need for collecting them either.
This program sets up a static garbage storage area, which is used indipendently from the String Storage Area. This is done in BASIC with no additional utilities. This way, garbage strings no longer waste memory. Those strings are just written again and again on this small restricted memory area. For this program, setting up a 20 characters static garbage storage area was enough.
The program works this way. At the very beginning of the program, actual string index is stored (on variables G1 and G2). Then, a dummy string manipulation instruction is issued (see line 10 of the above program). This instruction is useless on its own, but it does actually one important thing: it creates the static storage area for garbage, by creating a garbage string (the 20 spaces). Please note that at the moment the string index is just below the static storage area, and it is ready to write other strings in memory. From this point on, we will be writing USEFUL strings (those that will be referenced by the array), so we must save current index pointer. This is done in line 15, before the FOR… NEXT loop. S1 and S2 hold the current index for useful variables storage.
Inside the iteration, the initial value of the string index is retrieved. This way, garbage strings that will be created from now on will just be put on the static garbage storage area. Just what we wanted.
After the following garbage-making instructions are performed:
B$ = STR$(T) B$ = RIGHT$(B$, LEN(B$)-1)
the string index is assigned the value contained in S1 and S2. This way, the string index now points to the area below the static string storage area, which will be used to write USEFUL strings into.
Now, current string index value must be saved again. So, variables S1 and S2 are updated with that value, so that when useful strings will be written again on the next iteration, the string index can be set so that those are written just below previous useful strings.
On the next iteration, the instructions:
POKE 51,G1: POKE 52,G2
change the string index so that it points to the garbage static area again. From now on, the cycle continues like above described, until all iterations are performed.
Please note that G1 and G2 stay the same during all the iterations: garbage strings are written again and again on the same 20 characters long static garbage area. S1 and S2 are instead updated, so that the strings of the array can be written in memory, one after the other.
At the end of the program, we end up with the desidered array in memory and with no garbage.
Such coding do require careful use. Pointers must be set properly. But, despite of this, this code is quite efficient. This principle can be used on any program, for instance changing the string index when needed by the use of simple subroutines, reducing the risk of programming mistakes.
We now have an advantage over static strings based interpreters. We can add to some array elements even lengthy strings without any problem. Once we have setup such an array with all those numbers converted to strings, it is now reasonable to add some notes near some numbers for instance.
If we write the instruction A$(2600) = A$(2600) + ” I LIKE MY ATARI 2600″, the string will be successfully assigned.
We can off course add other notes as well to other numbers, as long as there will be free memory.
On a machine with static strings, that longer string would have been just truncated. And to be able to write such a note, we would have to make ALL the array elements that long. As it is easy to note, that would be a big waste of memory space, and furthermore that space may be just impossible to have on some if not most 8 bit computers.
On those classic systems, the static strings approach seems to be better for handling large arrays containing very short strings of a common fixed size. For medium sized arrays containing strings of different sizes, using dynamic strings offers the flexibility required. For instance, a task like storing a little vocabulary may be performed better by using a dynamic string array, as the various words may have very different lengths.
For further information on the topic, you may have a look here:
On the Compute! magazine you will find excellent articles by Jim Butterfield on the topic.