How to use the interpreter
By writing BASIC code for the boot sector, I, in fact, transform my PC into an analogue of old home computers, like Commodore 64 or ZX Spectrum: this programming language was hardcoded in their RAM, and users could start programming in BASIC immediately after the boot-up.
For convenience purposes, I formulate the terms of reference (i.e. requirements to my interpreter) as a user manual.
The interpreter should support two modes: (1) interactive mode; and (2) command line mode. In the interactive mode, commands are executed immediately after the input.
In the command line mode, you input the program code into the memory and then execute the run
command.
If you need to delete a string from the program code, simply enter its number in the command line.
How does the interpreter know what mode to use for processing of the entered text? If the string starts with a digit, the interpreter processes it in the command line mode; if not, in the interactive mode.
The maximum program size is 999 strings. The maximum string length is 19 symbols. Note that Backspace key operates as usual – even though the characters don’t disappear on the screen, in the buffer everything is OK.
The programmer operates with:
- three commands:
run
(run program),list
(display program code on the screen), andnew
(delete program); - 26 variables (from
a
toz
): two-byte integer numbers without signs; - expressions that can include numbers, four arithmetic operations, brackets, and variables;
- three operators: :
if
,goto
, and=
; and - two functions:
print
andinput
.
The interpreter understands the following language constructs:
-
var=expr
: assign theexpr
value to thevar
variable (froma
toz
); -
print
: display theexpr expr
value and move the cursor to the next line; -
print
: display theexpr; expr
value and leave the cursor on the current line; -
print
: print the line and move the cursor to the next line;"][ ello" -
print
: print the line and leave the cursor on the current line;"][ ello"; -
input
: read value from the keyboard and place it into thevar var
variable (froma
toz
); -
goto
: jump to the specified string in the program; andexpr -
if
: ifexpr1 goto expr2 expr1
is not equal to 0, jump to the lineexpr2
; otherwise go to the line specified afterif
.
Example if
(if c-5 is not equal to 0, jump to line 2).
Coding the interpreter
First, I define memory areas to be used by the program:
- buffer for text entered in the command line;
- buffer storing the program code;
- data array to store variables (from
a
toz
); and - pointer to the current program line.
I start from pointing all segment registers to CS
. Then I remove the direction flag so that the lines are processed only from left to right (even when the program calls instructions like stosb
). I fill the buffer intended for the program source code by the 0x0D
symbol (the carriage return character also known as the Enter key).
The program code written in BASIC will be processed as a two-dimensional character array 1000 × 20 in size.
If the user enters more than 19 symbols in one line, this line will extend to the next one. In the current interpreter version, this bug is not fixed. Just remember not to enter more than 19 symbols in one line.
Launching the main loop
First, I restore the stack pointer (SP
register) – in case the program in BASIC crashes due to an error.
Then I discard the running
pointer (current line of the program) and call the input_line
subprogram, which is waiting until the programmer types something. The subprogram saves the inputted line in the SI
register.
Next, I check whether the line starts from a digit or not. If it does, it must be written to the buffer allocated for the program code. To do this, I first calculate the address to write the string to. The find_address
subprogram is responsible for this (it saves the result to the DI
register). After finding the required address, I copy the string there: rep
.
If there is no digit in the beginning of the string, it is executed immediately: execute_statement
.
Processing program strings
Program strings are processed as follows: I take the first word from the string and sequentially compare it with each record in the @@statements
table (see the final piece of code in the end of the article). This table stores commands, operators, and functions understandable for the interpreter.
Note the heuristics I use to save bytes on processing the conditional operator: before the entry point execute_statement
, I add an additional entry to the same subprogram: @@if_handler
.
As a result, I avoid the need to write an additional handler for constructs like if
. If the expression result (in this particular case a-2
) equals zero, the program does not go into if
, i.e. ignores the rest of the string (goto
).
Done with if
. Then I process other commands, operators, and functions. I start from skipping extra spaces added by the programmer for convenience purposes. If a string contains nothing but spaces, it is ignored.
If the string is not empty, I examine it in more detail. First, I go through the @@statements
table and compare the string with each of its records. To do this, I read the string size (in case of run
, it is 3) and then perform the comparison using repe /
.
If a match is detected, then the DI
register starts pointing to the address of the respective handler, and I jump there: jmp [
. To get a better understanding of how it works, check the end of the article and review the @@statements
table. Hint: labels starting with @@
are the handlers’ addresses.
If the entire table has been checked and no matches identified, then the current string of the program is neither a command, nor an operator, nor a function. Perhaps, this is a variable’s name? I jump to @@to_get_var
to check this.
I scroll through the DI
register to the next record of the table by adding CX
(length of the name of the current command, operator, or function plus two more bytes (the handler’s address)). Then I restore the value of the SI
register (rep
has scrolled it forward) so that it points again to the beginning of the string compared with the table of operators.
Now DI
points to the next record in the table. If this is a nonzero record, I jump to @@next_entry
to compare the program string (to be specific, its beginning) with this record.
If the entire table has been checked and no matches identified, then the current string of the program is neither a command, nor an operator, nor a function. If so, this is likely an assignment construct (e.g. var=expr
). In theory, no more variants are possible (unless the source code contains a syntax error).
Now I have to calculate the expr
expression and save the result at the address linked with the var
variable. The get_variable
subprogram calculates the required address and pushes it into the stack.
After finding the address, I check whether an assignment operator exists after the variable name. If yes, it must be executed. But in order to save a few bytes, I won’t do this right here.
Later in the code, I will have to implement an assignment inside the input
function. So, I jump to that piece of code: @@assign
. I don’t need the entire input
function, only its final part. And I won’t return to execute_statement
anymore: the input
function will perform the required ret
instruction.
If there is no assignment symbol, I print the error notification and terminate the program (i.e. jump to @@main_loop
). In the main loop, the interpreter will restore the stack pointer and continue its operation despite the encountered syntax error.
The “list” command
The list
command displays on the screen the program listing saved in the buffer.
First, I reset to zero the current string number in the program: xor
. Then I use the string number to calculate the address from where to read the program string: find_address
. After finding the string address, I compare its first symbol with 0x0D
(i.e. check whether the string is empty). If it’s empty, I move to the next string.
If the string is not empty, I display it on the screen. First, I display its number: output_number
. Then I read the string character-by-character from the buffer until I stumble upon 0x0D
. In a similar way, I display it on the screen character-by-character.
After that, I move to the next string of the program (inc
) and repeat the same operations until I reach max_line
(i.e. 1000).
The “input” function
The input
function enables you to enter a number from the keyboard. It works as follows.
First, the interpreter calculates the address of the variable specified in the program. Then it displays a question mark and waits until the user prints something.
In case the user inputs not just a number but a sophisticated expression, the interpreter processes the inputted string with process_expr
and saves the result at the address calculated in the beginning of the subprogram. The stosw
instruction will assist me in this.
Processing expressions
A three-level processing scheme will be applied to expressions:
- Addition and subtraction;
- Multiplication and division; and
- Nested expressions (in brackets), names of variables, and numbers.
At the first level (whose priority is the lowest), I immediately pass the control to the second level by calling expr2_left
. After processing higher-priority operations, I look at the next symbol. If this is an addition or subtraction sign, I process it.
First, I save the left value: push
. Then I pass the right part of the expression to the second level: expr2_right
. After performing higher-priority operations, I take the left value (pop
) and perform the required operation with the right value: add
or sub
.
Finally, I jump to @@next_sub_add
and launch a cycle to calculate expressions like 1+2+5-4
.
At the second level, I perform the same operations as described above. First, I pass the control to the higher-priority level by calling expr3_left
. Then I look at the next symbol. If this is a multiplication or division sign, I process it. Similarly to the previous level, I launch a cycle (@@next_div_mul
) in the end to process expressions like 3*4*2/
.
Important: thanks to the three-level structure described above, the interpreter automatically takes into account the priority of operations. For instance, 5*6+3*4
is calculated as (
.
At the third layer, I process brackets (nested expressions), names of variables, and numbers.
First, I remove spaces from the inputted string. Then I look at its left symbol. If this is an opening bracket, I start recursion by calling process_expr
. After processing the nested expression, I check whether it has a closing bracket. If not, I display the error message. If the closing bracket exists, I skip spaces going after it and execute ret
.
But what if some other character (i.e. not an opening bracket) is present on the left? Perhaps, this is a variable’s name? Checking this: cmp
.
If my assumption is confirmed, I start reading the variable’s value. If not, then the current symbol is a fragment of a number. I make a step back (dec
) and call dec_str_to_number
to read the number.
Subprogram: calculate variable’s address on the basis of its name
This subprogram has two entry points. At the first entry point (get_var
), I read the variable’s letter using lodsb
; at the second one (get_var_2
), I use the variable’s name transmitted via the AL
register. Yes, this is a pretty twisted way, but it saves me a few bytes.
Now that I have the variable’s name, I should calculate the address it’s linked with. This operation is performed in three steps:
- I execute
and
to get the variable’s number; at this point, ASCII valuesal, 0x1F 0x61-0x7A
are converted into0x01-0x1A
; - I multiply the result by 2 because each variable uses a 16-bit word in the memory; and
- I add to the address its upper part:
mov
.ah, vars>> 8
Voila! I know the memory cell the variable is linked with.
Note that this code is interlinked with the subprogram that skips spaces. The subprogram also has two entry points. The first one (skip_spaces
) skips spaces located after the variable; the second one (skip_spaces_2
) performs the same operation but leaves the first space intact.
Subprogram: print decimal numbers
The subprogram prints on the screen numbers transmitted via the AX
register. It works as follows: AX
is recursively divided by 10. After each operation, the remainder in division is pushed into the stack. After reaching zero, I start sequentially exiting the recursion. Prior to each exit, I pop the respective reminder from the stack and display it on the screen.
A few examples. If AX
, then after the division by 10, AX
will contain 0, and the output_number
won’t start a recursion. Instead, it will just display the reminder (i.e. 4), and that’s it.
If AX
, then after the division by 10, AX` will contain 1. Accordingly, the subprogram starts a recursion. It displays 1 there, then passes from the internal call into the main one, and prints figure 5 there.
Important: (1) jmp
is present in the end; and (2) the program doesn’t have its own ret
instruction. This is another hack enabling me to save a few bytes. I don’t call the subprogram displaying the character but jump to it instead. Then this subprogram executes ret
.
Subprogram: from decimal string to hexadecimal number
The subprogram translates a string containing a decimal number (the SI
register points to it) and writes the result to the AX
register.
A decimal string is converted into a hexadecimal number as follows. I use a cycle to go through all digits from left to right. Every time I multiply the accumulator by 10 (at the first time, it contains zero) and add the current digit to it. If I encounter in the string a character that is not a digit, an error occurs, and I exit.
Note that two operations are present between cmp
and the conditional transfer. So, the final result is always written to the AX
register, even in the case of an error.
“Run/goto” commands
Run
и goto
commands launch the program saved in the buffer. Run
launches it from the beginning, while goto
, from a certain line. In the interpret, both commands are processed by the same subprogram.
When the interpreter sees the run
command, it converts it into goto
first and then handles it similarly to goto
. To convert run
into goto
, the interpreter resets the AX
register to zero and jumps to @@goto_handler
.
I start executing the goto
command from calculating the expression written after goto
. The result (i.e. number of the line to jump to) is written to AX
.
Then I use the line number to calculate its address: find_address
.
Note that the goto
command can be used both in the interactive and command line modes (i.e. it can be used as an operator or entered from the keyboard for immediate execution).
A comparison instruction called cmp
checks whether I have got here from the interactive mode or from the command line mode.
If I got there from the command line mode, I write in the running
variable the address from where the program execution should continue. At the next cycle, the interpreter will retrieve the required string from this address and execute it.
But if I get there from the interactive mode (the programmer uses goto
, not run
, to enter the program), then I jump to the program string specified in goto
.
Below is a subprogram that calculates the string address in the program code by its number. The string number is saved in AX
; I multiply it by the standard string length and add the address of the first string in the program. This is how the required address is calculated.
Subprogram: input strings from the keyboard
The interpreter must be able to accept program strings inputted from the keyboard. The input_line
subprogram is responsible for this. At the input, it receives the command line symbol (>
) via the AL
register.
First, I display the command line. Then I point to the buffer where the inputted text should be saved. Next, I read the character inputted from the keyboard and save it in the buffer (provided that this is not the Backspace key (0x08
)). If this is Backspace, I reduce the DI
register by one so that it points to the previous character.
The “print” function
The print
function must be able to understand various syntax variants:
-
print
moves cursor to the next line; -
print
prints text and moves cursor to the next line;"Hello" -
print
prints text and leaves cursor on the current line;"Hello; " -
print
prints number and moves cursor to the next line; and5 -
print
prints number and leaves cursor on the current line.5+2;
The first comparison, cmp
, checks whether this is the first syntax variant (no arguments).
The second comparison, cmp
, checks whether this is one of the two syntax variants used to print text. In this case, the program displays all characters one-by-one until it stumbles upon a closing quote or 0x0D
. If it encounters 0x0D
, this means that the programmer forgot to enter the second quote. No error messages are displayed in this situation.
Then I check whether a semicolon is present after the quote. If not, the cursor moves to the next line.
At the label @@no_quote
, the program processes the two syntax variants used to print numbers. If there is an expression instead of a number, it is calculated. Then the program prints its result and checks whether a semicolon is present after it. If not, the cursor moves to the next line.
Subprograms: input and print characters
The input_key
subprogram reads the pressed key using BIOS interrupt 0x16
, function 0x00
. The ASCII code of the pressed key is written to the AL
register.
Important: this program overlaps with the next one; therefore, it does not have the ret
instruction in the end. Why use such tricks? To display the character received from the keyboard without spending extra bytes.
The output_char
subprogram first checks whether the Enter key is pressed: cmp
. If yes, then 0xA
is printed in addition to the 0xD
character. Without 0xA
, the cursor moves left without jumping to the next line. The characters are printed using BIOS: interrupt 0x10
, function 0x0E
.
Table of commands, functions, and operators
The next step is to create a table of commands, functions, and operators understandable to the interpreter. Each record consists of three fields: string length; command/function/operator name; and entry point for the processing subprogram. Zero in the end of the table notifies the interpreter that the table is over.
The above structure makes it easy to add new commands, functions, and operators. You just add one more line, then insert the processing subprogram in a suitable location, and that’s it. No more changes have to be made in the code.
The last two bytes notify BIOS that the code written in the boot sector is not some nonsense – but an interpretable program (in this particular case, a BASIC interpreter).
Testing the interpreter: the Pascal’s triangle program
Pascal’s triangle is a triangular array of binomial coefficients. The upper row contains figure 1. Each entry of each subsequent row is constructed by adding the number above and to the left with the number above and to the right. Below is the source code to be fed to the interpreter.
If you have done everything right, the interpreter should display something like this.
The interpreter works correctly? If yes, congrats! You have just created a fully featured BASIC interpreter. You may also want to expand its capacity by adding new functions, for instance:
- add the
cls
command (clear the screen); - add
gosub
andreturn
operators to be able to write subprograms; and - implement the support of data arrays.
Important:
The article is in English but unfortunately the comments in the pictures are in Russian 🙁
did you mean to publish the source code or not. I couldn’t find the link. anyway great article!
It was really a huge huge help. Thanks for sharing.
Any source to this?