How to use the interpreter
By writing BASIC code for the boot sector, I, in fact, transform my PC into an analogue of old home computers, like Commodore 64 or ZX Spectrum: this programming language was hardcoded in their RAM, and users could start programming in BASIC immediately after the boot-up.
For convenience purposes, I formulate the terms of reference (i.e. requirements to my interpreter) as a user manual.
The interpreter should support two modes: (1) interactive mode; and (2) command line mode. In the interactive mode, commands are executed immediately after the input.
In the command line mode, you input the program code into the memory and then execute the
If you need to delete a string from the program code, simply enter its number in the command line.
How does the interpreter know what mode to use for processing of the entered text? If the string starts with a digit, the interpreter processes it in the command line mode; if not, in the interactive mode.
The maximum program size is 999 strings. The maximum string length is 19 symbols. Note that Backspace key operates as usual – even though the characters don’t disappear on the screen, in the buffer everything is OK.
The programmer operates with:
- three commands:
list(display program code on the screen), and
- 26 variables (from
z): two-byte integer numbers without signs;
- expressions that can include numbers, four arithmetic operations, brackets, and variables;
- three operators: :
- two functions:
The interpreter understands the following language constructs:
var=expr: assign the
exprvalue to the
exprvalue and move the cursor to the next line;
exprvalue and leave the cursor on the current line;
input: read value from the keyboard and place it into the
goto: jump to the specified string in the program; and
expr1 goto expr2
expr1is not equal to 0, jump to the line
expr2; otherwise go to the line specified after
if (if c-5 is not equal to 0, jump to line 2).
Coding the interpreter
First, I define memory areas to be used by the program:
- buffer for text entered in the command line;
- buffer storing the program code;
- data array to store variables (from
- pointer to the current program line.
I start from pointing all segment registers to
CS. Then I remove the direction flag so that the lines are processed only from left to right (even when the program calls instructions like
stosb). I fill the buffer intended for the program source code by the
0x0D symbol (the carriage return character also known as the Enter key).
The program code written in BASIC will be processed as a two-dimensional character array 1000 × 20 in size.
If the user enters more than 19 symbols in one line, this line will extend to the next one. In the current interpreter version, this bug is not fixed. Just remember not to enter more than 19 symbols in one line.
Launching the main loop
First, I restore the stack pointer (
SP register) – in case the program in BASIC crashes due to an error.
Then I discard the
running pointer (current line of the program) and call the
input_line subprogram, which is waiting until the programmer types something. The subprogram saves the inputted line in the
Next, I check whether the line starts from a digit or not. If it does, it must be written to the buffer allocated for the program code. To do this, I first calculate the address to write the string to. The
find_address subprogram is responsible for this (it saves the result to the
DI register). After finding the required address, I copy the string there:
If there is no digit in the beginning of the string, it is executed immediately:
Processing program strings
Program strings are processed as follows: I take the first word from the string and sequentially compare it with each record in the
@@statements table (see the final piece of code in the end of the article). This table stores commands, operators, and functions understandable for the interpreter.
Note the heuristics I use to save bytes on processing the conditional operator: before the entry point
execute_statement, I add an additional entry to the same subprogram:
As a result, I avoid the need to write an additional handler for constructs like
if . If the expression result (in this particular case
a-2) equals zero, the program does not go into
if, i.e. ignores the rest of the string (
if. Then I process other commands, operators, and functions. I start from skipping extra spaces added by the programmer for convenience purposes. If a string contains nothing but spaces, it is ignored.
If the string is not empty, I examine it in more detail. First, I go through the
@@statements table and compare the string with each of its records. To do this, I read the string size (in case of
run, it is 3) and then perform the comparison using
repe / .
If a match is detected, then the
DI register starts pointing to the address of the respective handler, and I jump there:
jmp [. To get a better understanding of how it works, check the end of the article and review the
@@statements table. Hint: labels starting with
@@ are the handlers’ addresses.
If the entire table has been checked and no matches identified, then the current string of the program is neither a command, nor an operator, nor a function. Perhaps, this is a variable’s name? I jump to
@@to_get_var to check this.
I scroll through the
DI register to the next record of the table by adding
CX (length of the name of the current command, operator, or function plus two more bytes (the handler’s address)). Then I restore the value of the
SI register (
rep has scrolled it forward) so that it points again to the beginning of the string compared with the table of operators.
DI points to the next record in the table. If this is a nonzero record, I jump to
@@next_entry to compare the program string (to be specific, its beginning) with this record.
If the entire table has been checked and no matches identified, then the current string of the program is neither a command, nor an operator, nor a function. If so, this is likely an assignment construct (e.g.
var=expr). In theory, no more variants are possible (unless the source code contains a syntax error).
Now I have to calculate the
expr expression and save the result at the address linked with the
var variable. The
get_variable subprogram calculates the required address and pushes it into the stack.
After finding the address, I check whether an assignment operator exists after the variable name. If yes, it must be executed. But in order to save a few bytes, I won’t do this right here.
Later in the code, I will have to implement an assignment inside the
input function. So, I jump to that piece of code:
@@assign. I don’t need the entire
input function, only its final part. And I won’t return to
execute_statement anymore: the
input function will perform the required
If there is no assignment symbol, I print the error notification and terminate the program (i.e. jump to
@@main_loop). In the main loop, the interpreter will restore the stack pointer and continue its operation despite the encountered syntax error.
The “list” command
list command displays on the screen the program listing saved in the buffer.
First, I reset to zero the current string number in the program:
xor . Then I use the string number to calculate the address from where to read the program string:
find_address. After finding the string address, I compare its first symbol with
0x0D (i.e. check whether the string is empty). If it’s empty, I move to the next string.
If the string is not empty, I display it on the screen. First, I display its number:
output_number. Then I read the string character-by-character from the buffer until I stumble upon
0x0D. In a similar way, I display it on the screen character-by-character.
After that, I move to the next string of the program (
inc ) and repeat the same operations until I reach
max_line (i.e. 1000).
The “input” function
input function enables you to enter a number from the keyboard. It works as follows.
First, the interpreter calculates the address of the variable specified in the program. Then it displays a question mark and waits until the user prints something.
In case the user inputs not just a number but a sophisticated expression, the interpreter processes the inputted string with
process_expr and saves the result at the address calculated in the beginning of the subprogram. The
stosw instruction will assist me in this.
A three-level processing scheme will be applied to expressions:
- Addition and subtraction;
- Multiplication and division; and
- Nested expressions (in brackets), names of variables, and numbers.
At the first level (whose priority is the lowest), I immediately pass the control to the second level by calling
expr2_left. After processing higher-priority operations, I look at the next symbol. If this is an addition or subtraction sign, I process it.
First, I save the left value:
push . Then I pass the right part of the expression to the second level:
expr2_right. After performing higher-priority operations, I take the left value (
pop ) and perform the required operation with the right value:
Finally, I jump to
@@next_sub_add and launch a cycle to calculate expressions like
At the second level, I perform the same operations as described above. First, I pass the control to the higher-priority level by calling
expr3_left. Then I look at the next symbol. If this is a multiplication or division sign, I process it. Similarly to the previous level, I launch a cycle (
@@next_div_mul) in the end to process expressions like
Important: thanks to the three-level structure described above, the interpreter automatically takes into account the priority of operations. For instance,
5*6+3*4 is calculated as
At the third layer, I process brackets (nested expressions), names of variables, and numbers.
First, I remove spaces from the inputted string. Then I look at its left symbol. If this is an opening bracket, I start recursion by calling
process_expr. After processing the nested expression, I check whether it has a closing bracket. If not, I display the error message. If the closing bracket exists, I skip spaces going after it and execute
But what if some other character (i.e. not an opening bracket) is present on the left? Perhaps, this is a variable’s name? Checking this:
If my assumption is confirmed, I start reading the variable’s value. If not, then the current symbol is a fragment of a number. I make a step back (
dec ) and call
dec_str_to_number to read the number.
Subprogram: calculate variable’s address on the basis of its name
This subprogram has two entry points. At the first entry point (
get_var), I read the variable’s letter using
lodsb; at the second one (
get_var_2), I use the variable’s name transmitted via the
AL register. Yes, this is a pretty twisted way, but it saves me a few bytes.
Now that I have the variable’s name, I should calculate the address it’s linked with. This operation is performed in three steps:
- I execute
andto get the variable’s number; at this point, ASCII values
0x61-0x7Aare converted into
- I multiply the result by 2 because each variable uses a 16-bit word in the memory; and
- I add to the address its upper part:
ah, vars>> 8
Voila! I know the memory cell the variable is linked with.
Note that this code is interlinked with the subprogram that skips spaces. The subprogram also has two entry points. The first one (
skip_spaces) skips spaces located after the variable; the second one (
skip_spaces_2) performs the same operation but leaves the first space intact.
Subprogram: print decimal numbers
The subprogram prints on the screen numbers transmitted via the
AX register. It works as follows:
AX is recursively divided by 10. After each operation, the remainder in division is pushed into the stack. After reaching zero, I start sequentially exiting the recursion. Prior to each exit, I pop the respective reminder from the stack and display it on the screen.
A few examples. If
AX , then after the division by 10,
AX will contain 0, and the
output_number won’t start a recursion. Instead, it will just display the reminder (i.e. 4), and that’s it.
AX , then after the division by 10, AX` will contain 1. Accordingly, the subprogram starts a recursion. It displays 1 there, then passes from the internal call into the main one, and prints figure 5 there.
jmp is present in the end; and (2) the program doesn’t have its own
ret instruction. This is another hack enabling me to save a few bytes. I don’t call the subprogram displaying the character but jump to it instead. Then this subprogram executes
Subprogram: from decimal string to hexadecimal number
The subprogram translates a string containing a decimal number (the
SI register points to it) and writes the result to the
A decimal string is converted into a hexadecimal number as follows. I use a cycle to go through all digits from left to right. Every time I multiply the accumulator by 10 (at the first time, it contains zero) and add the current digit to it. If I encounter in the string a character that is not a digit, an error occurs, and I exit.
Note that two operations are present between
cmp and the conditional transfer. So, the final result is always written to the
AX register, even in the case of an error.
goto commands launch the program saved in the buffer.
Run launches it from the beginning, while
goto, from a certain line. In the interpret, both commands are processed by the same subprogram.
When the interpreter sees the
run command, it converts it into
goto first and then handles it similarly to
goto. To convert
goto , the interpreter resets the
AX register to zero and jumps to
I start executing the
goto command from calculating the expression written after
goto. The result (i.e. number of the line to jump to) is written to
Then I use the line number to calculate its address:
Note that the
goto command can be used both in the interactive and command line modes (i.e. it can be used as an operator or entered from the keyboard for immediate execution).
A comparison instruction called
cmp checks whether I have got here from the interactive mode or from the command line mode.
If I got there from the command line mode, I write in the
running variable the address from where the program execution should continue. At the next cycle, the interpreter will retrieve the required string from this address and execute it.
But if I get there from the interactive mode (the programmer uses
run, to enter the program), then I jump to the program string specified in
Below is a subprogram that calculates the string address in the program code by its number. The string number is saved in
AX; I multiply it by the standard string length and add the address of the first string in the program. This is how the required address is calculated.
Subprogram: input strings from the keyboard
The interpreter must be able to accept program strings inputted from the keyboard. The
input_line subprogram is responsible for this. At the input, it receives the command line symbol (
>) via the
First, I display the command line. Then I point to the buffer where the inputted text should be saved. Next, I read the character inputted from the keyboard and save it in the buffer (provided that this is not the Backspace key (
0x08)). If this is Backspace, I reduce the
DI register by one so that it points to the previous character.
The “print” function
The first comparison,
cmp , checks whether this is the first syntax variant (no arguments).
The second comparison,
cmp , checks whether this is one of the two syntax variants used to print text. In this case, the program displays all characters one-by-one until it stumbles upon a closing quote or
0x0D. If it encounters
0x0D, this means that the programmer forgot to enter the second quote. No error messages are displayed in this situation.
Then I check whether a semicolon is present after the quote. If not, the cursor moves to the next line.
At the label
@@no_quote, the program processes the two syntax variants used to print numbers. If there is an expression instead of a number, it is calculated. Then the program prints its result and checks whether a semicolon is present after it. If not, the cursor moves to the next line.
Subprograms: input and print characters
input_key subprogram reads the pressed key using BIOS interrupt
0x00. The ASCII code of the pressed key is written to the
Important: this program overlaps with the next one; therefore, it does not have the
ret instruction in the end. Why use such tricks? To display the character received from the keyboard without spending extra bytes.
output_char subprogram first checks whether the Enter key is pressed:
cmp . If yes, then
0xA is printed in addition to the
0xD character. Without
0xA, the cursor moves left without jumping to the next line. The characters are printed using BIOS: interrupt
Table of commands, functions, and operators
The next step is to create a table of commands, functions, and operators understandable to the interpreter. Each record consists of three fields: string length; command/function/operator name; and entry point for the processing subprogram. Zero in the end of the table notifies the interpreter that the table is over.
The above structure makes it easy to add new commands, functions, and operators. You just add one more line, then insert the processing subprogram in a suitable location, and that’s it. No more changes have to be made in the code.
The last two bytes notify BIOS that the code written in the boot sector is not some nonsense – but an interpretable program (in this particular case, a BASIC interpreter).
Testing the interpreter: the Pascal’s triangle program
Pascal’s triangle is a triangular array of binomial coefficients. The upper row contains figure 1. Each entry of each subsequent row is constructed by adding the number above and to the left with the number above and to the right. Below is the source code to be fed to the interpreter.
If you have done everything right, the interpreter should display something like this.
The interpreter works correctly? If yes, congrats! You have just created a fully featured BASIC interpreter. You may also want to expand its capacity by adding new functions, for instance:
- add the
clscommand (clear the screen);
returnoperators to be able to write subprograms; and
- implement the support of data arrays.