How to extend BASIC

The easy way - the usr functions, calls and variables

There are a few predefined data structures and functions to extend BASIC. They are useful when specific hardware features need to be addressed. PORT access on an Arduino comes to mind as a use case.

The user variable @U

@U is a predefined variable. It can be accessed in BASIC by assigning or reading values from it.

@U=10

A=@U

would be typical commands. Every time @U is read the function getusrvar() is called. Every time @U is written setusrvar() is called. You can add any functions you like here.
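A minimal sketch of what the two hooks behind @U could look like. The type number_t and both signatures are assumptions made for illustration; check basic.h for the declarations your version uses. The static variable usr_register is hypothetical and stands in for a hardware register.

```c
#include <stdint.h>

/* Assumption: stand-in for the interpreter's numeric type. */
typedef int number_t;

/* Hypothetical backing store; on an Arduino this could be a PORT register. */
static uint8_t usr_register = 0;

/* Called whenever BASIC reads @U (signature assumed). */
number_t getusrvar(void) {
    return (number_t) usr_register;
}

/* Called whenever BASIC writes @U (signature assumed).
   Only the low 8 bits are kept, as a register would do. */
void setusrvar(number_t v) {
    usr_register = (uint8_t) (v & 0xff);
}
```

With this in place, @U=10 followed by A=@U would leave 10 in A.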

The user array @U()

This array works much like the variable @U. Every time an array element is accessed, the functions getusrarray() and setusrarray() are called. Accessing memory outside BASIC's own memory would be a possible use case.
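As a sketch of the array hooks, the following maps @U() onto a small block of memory outside BASIC. The types number_t and address_t, the signatures, and the buffer usrmem are assumptions for illustration only.

```c
#include <stdint.h>

typedef int number_t;        /* assumption: interpreter's numeric type */
typedef uint16_t address_t;  /* assumption: interpreter's index type */

#define USRMEMSIZE 16        /* hypothetical size of the external memory */
static uint8_t usrmem[USRMEMSIZE];

/* Called when BASIC reads @U(i); out-of-range reads return 0. */
number_t getusrarray(address_t i) {
    if (i >= USRMEMSIZE) return 0;
    return usrmem[i];
}

/* Called when BASIC writes @U(i)=v; out-of-range writes are ignored. */
void setusrarray(address_t i, number_t v) {
    if (i >= USRMEMSIZE) return;
    usrmem[i] = (uint8_t) (v & 0xff);
}
```

Checking the index inside the hook keeps a faulty BASIC program from writing past the buffer.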

The user string @U$

This string is read only. Every time the string is read, makeusrstring() is called. A maximum of 32 characters can be transported from any low level function to BASIC this way.

makeusrstring() cannot be used to trigger I/O operations because it may be called several times during one string operation. The CALL or USR mechanisms should be used for this.

The user CALL

CALL is a low level call function. The BASIC syntax is

CALL n

with no arguments. Values from 0 to 31 are reserved for BASIC internal work. Currently only 0 is implemented. All values greater than 31 are passed to usrcall(). Put your own code here. One possible use case is to trigger a low level I/O operation and then read the result with @U$.
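A possible shape for usrcall() is a switch over the call number. The signature and the two state variables are assumptions for illustration; the real low level operations would go where the variables are changed.

```c
#include <stdint.h>

/* Hypothetical state changed by the low level operations. */
static uint8_t led_state = 0;
static uint16_t sample_count = 0;

/* Receives every CALL n with n greater than 31 (signature assumed). */
void usrcall(int n) {
    switch (n) {
    case 32: led_state = 1; break;     /* CALL 32: switch something on */
    case 33: led_state = 0; break;     /* CALL 33: switch it off again */
    case 34: sample_count++; break;    /* CALL 34: trigger one measurement */
    default: break;                    /* unknown numbers are ignored */
    }
}
```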

The user function USR

USR() is the low level function call. Unlike CALL it takes an argument and returns a value. The BASIC syntax is

A=USR(n, v)

n is the function number and v is a value. n is interpreted as an unsigned integer while v is a number. n values from 0 to 31 are reserved. All USR calls with n greater than 31 are passed to usrfunction(). You can put your own code here.
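A sketch of usrfunction() with two example functions. The types and the signature are assumptions; the function numbers 32 and 33 and what they compute are chosen freely for illustration.

```c
#include <stdint.h>

typedef int number_t;        /* assumption: interpreter's numeric type */
typedef uint16_t address_t;  /* assumption: type used for n */

/* Receives USR(n, v) for all n that are not reserved (signature assumed). */
number_t usrfunction(address_t n, number_t v) {
    switch (n) {
    case 32: return v * v;               /* USR(32, v): square of v */
    case 33: return v < 0 ? -v : v;      /* USR(33, v): absolute value */
    default: return 0;                   /* unknown function numbers */
    }
}
```

In BASIC, A=USR(32, 5) would then put 25 into A.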

Stability and yielding

BASIC is very robust. You can do what you like in the user functions as long as you take a few precautions.

Platforms like the ESP8266 and some other systems using networking need enough time for background tasks. Otherwise they crash. Always call byield() in any tight loop, for example while waiting for I/O for a long time. As a rule of thumb, byield() should be called every 32 ms. On platforms that don't need this, byield() is a no-op.
Never change the core interpreter variables handling the token stream, like token, here, himem and top, if you don't know what you are doing.
Always access the BASIC memory through memread2() and memwrite2(). Never use mem[] or memread() unless you have understood the memory interface logic.
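The first precaution can be sketched as a wait loop that calls byield() on every pass. So that the example compiles on its own, byield() is replaced by a counting stand-in and data_ready() is a hypothetical device poll; in the interpreter byield() is provided by BASIC.

```c
/* Stand-ins for this sketch only. */
static unsigned yield_calls = 0;
static unsigned polls_until_ready = 5;

void byield(void) { yield_calls++; }

/* Hypothetical device poll: reports data after a few calls. */
static int data_ready(void) {
    if (polls_until_ready == 0) return 1;
    polls_until_ready--;
    return 0;
}

/* Wait for the device and give background tasks a chance on every pass.
   Returns 1 on success, 0 if max_polls passes went by without data. */
int wait_for_data(unsigned max_polls) {
    while (max_polls-- > 0) {
        if (data_ready()) return 1;
        byield();
    }
    return 0;
}
```

Bounding the loop with max_polls avoids hanging the interpreter forever if the device never answers.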

Dirty tricks

Two data structures in BASIC are well suited to pass arguments the quick and dirty way.

Single letter variables are stored in the array vars[]. vars[0] is variable A and so forth. This array can be accessed safely in any of the user functions without the complexities of heap access.

The input buffer variable ibuffer can be used to pass strings to a user function. This only works in programs and not interactively. A typical program would look like this:

10 @$="hello world"

20 CALL 32

The first line puts the string "hello world" into the ibuffer variable. The first byte of ibuffer is set to the string length, followed by the characters. The CALL command then invokes usrcall(), which can take the argument from ibuffer.

Please keep in mind that ibuffer is used for almost all BASIC I/O operations. The call has to come immediately after the assignment.
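A sketch of how usrcall() could pick up the length-prefixed string from ibuffer. The buffer size, the usrcall() signature and the helper fetchargument() are assumptions; ibuffer is declared here only as a stand-in so that the example compiles on its own.

```c
#include <string.h>

#define BUFSIZE 128             /* assumption: size of the input buffer */
char ibuffer[BUFSIZE];          /* stand-in for the interpreter's buffer */

static char usrtext[BUFSIZE];   /* hypothetical private copy as a C string */

/* Copy the length-prefixed string from ibuffer into usrtext. */
static void fetchargument(void) {
    unsigned char len = (unsigned char) ibuffer[0];
    if (len > BUFSIZE - 1) len = BUFSIZE - 1;
    memcpy(usrtext, ibuffer + 1, len);   /* payload starts at index 1 */
    usrtext[len] = 0;                    /* terminate for C string use */
}

void usrcall(int n) {
    if (n == 32) fetchargument();
}
```

Copying the string right away is the safe choice because the next BASIC I/O operation will overwrite ibuffer.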

The slightly harder way - add your own commands and functions

Let's assume you would like to add a new command or function MYCOMMAND to the interpreter.

Step 1: Create the tokens

To add your own command you first need to create a token. Review the definition section of basic.h. Tokens are numbered from -127 for NUMBER upward to -24 for SLEEP. This may change in the future as I will add a few more commands later on. To add another command MYCOMMAND after SLEEP, first add a token definition line.

#define TMYCOMMAND -23

Token values have to be consecutive. It has to be -23 in our example. I used the convention that keyword tokens begin with a T.

Step 2: Create a keyword string

You then will have to add the keyword string as

const char smycommand[] PROGMEM = "MYCOMMAND";

If you add it outside any #ifdef ... #endif definition, it will always be there. Otherwise you can add it to any of the language sets and it will only be there if the language set is activated.

Step 3: Add the keyword string to the string storage

The keyword has to be added at the end of

const char* const keyword[] PROGMEM = {..., smycommand, 0}

Again, order matters here. Mind the #ifdef ... #endif language set sections. Place the keyword either inside the same section as the other definitions or outside of all sections, but do so consistently. The keyword for -23 has to come after the keyword for -24.

Step 4: Add the token to the token array

The token then needs to be added to the token array, again in the right order.

const signed char tokens[] PROGMEM = {..., TMYCOMMAND, 0}

Again, mind the #ifdefs.

After these four steps the lexical analyser will recognise the keyword and convert the string MYCOMMAND to the token -23 whenever it is found in the input stream. The mechanism is this involved because the code is written for Arduino PROGMEM. Arduinos are Harvard machines: the address space of the flash memory is different from the RAM address space.

The lexer has a little peculiarity. It matches strings and tokenises as soon as it has found a match. This means that no command string may be the beginning of another one. You could not add the commands MYCOMMAND and MYCOMMANDA to the interpreter as the latter would be tokenised to MYCOMMAND A.
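The parallel keyword and token arrays and the first-match behaviour can be modelled in a few lines. This is a simplified model for a PC without PROGMEM and not the interpreter's lexer; the function matchkeyword() is hypothetical.

```c
#include <string.h>

#define TSLEEP -24
#define TMYCOMMAND -23

/* Both arrays must list their entries in the same order. */
static const char* const keyword[] = { "SLEEP", "MYCOMMAND", 0 };
static const signed char tokens[] = { TSLEEP, TMYCOMMAND, 0 };

/* Return the token of the first keyword that matches the start of the
   input, or 0 if none matches. *len receives the characters consumed. */
signed char matchkeyword(const char *in, int *len) {
    for (int i = 0; keyword[i] != 0; i++) {
        size_t l = strlen(keyword[i]);
        if (strncmp(in, keyword[i], l) == 0) {
            *len = (int) l;
            return tokens[i];
        }
    }
    *len = 0;
    return 0;
}
```

Feeding "MYCOMMANDA" into this model returns TMYCOMMAND after nine characters and leaves the trailing A for the next round, which is the effect described above.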

Step 5a: Create a command worker function and connect it.

For a new command, create a worker function and then add the worker function to the statement loop. Worker functions are defined in basic.c and prototyped in basic.h.

Your worker function that takes no argument and simply prints "Hello World" would look like this:

void xmycommand() { outs("Hello World\n"); nexttoken(); }

It would appear in the statement() function like this:

case TMYCOMMAND: xmycommand(); break;

Worker functions are always void and get no argument. They take all their information from the interpreter state variables. Look at xset() or xget() to see how this works.

Worker functions always need to call nexttoken() before returning to keep the interpreter going. A fresh token needs to be stored in token for the statement loop not to get trapped in an infinite loop.

There is one pitfall. The function expression() and functions using it like parsearguments() already have nexttoken() called at the end. If you use expression() in a worker function you should not call nexttoken() at the end of the worker function.

Step 5b: Create a function worker function and connect it.

If you want to create a function to be evaluated in expression, you need to add it to factor().

This would look like this

case TMYCOMMAND: parsefunction(xmycommand, 1); break;

Unlike commands in statement(), functions do not call nexttoken() before returning, as they are part of the recursive descent parser and follow their own rules.

A function that would return two times the argument would look like:

void xmycommand(){ x=pop()*2; push(x); }

Functions used in factor always need to get all their arguments from the stack and then keep exactly one argument on the stack as a result.

A function with two numerical arguments would be connected like this:

case TMYCOMMAND: parsefunction(xmycommand, 2); break;

It would then have to do something like this with the stack for a function that calculates f(x,y)=2*y+x.

void xmycommand(){ x=pop()*2; x=x+pop(); push(x); } /* two statements: C does not define which pop() in one expression runs first */

Two pop(), one push(). A variable number of arguments is possible but a little bit more complicated to implement.
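The stack discipline can be tried out with a small stand-alone model. The stack, push(), pop() and er below are simplified stand-ins and not the interpreter's implementation; it is also assumed that the arguments are pushed from left to right, so y is on top of the stack.

```c
typedef int number_t;    /* assumption: interpreter's numeric type */

#define STACKSIZE 15     /* hypothetical stack depth */
static number_t stack[STACKSIZE];
static int sp = 0;
static int er = 0;       /* non-zero after a stack error */

void push(number_t v) {
    if (sp < STACKSIZE) stack[sp++] = v; else er = 1;
}

number_t pop(void) {
    if (sp > 0) return stack[--sp];
    er = 1;
    return 0;
}

/* f(x,y) = 2*y + x: two pop(), one push(). */
void xmycommand(void) {
    number_t y = pop();  /* pushed last, so popped first */
    number_t x = pop();
    push(2 * y + x);
}
```

Popping into separate variables makes the argument order explicit and leaves exactly one result on the stack.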

Debugging the interpreter after adding a command, error handling, segmentation faults, and other nasty things.

Using the command

set 0,1

switches on the built-in debug mode of the statement loop. The token stream is logged to the console after each command.

If the interpreter hangs after a command, it may be that nexttoken() is not called before returning. Command functions must always make sure that a new token is created before returning. If commands are swallowed and not executed, nexttoken() is called too often. Functions called in factor() should not generate new tokens by calling nexttoken(). The recursive descent parser code takes care of this.

Errors have to be reported by calling the error() function. It takes a valid error number as an argument. EUNKNOWN is a good choice for a start. This throws a syntax error. The global variable er contains the error state. It has to be checked after calling functions that can generate errors. If an error was produced deeper down, simply return. statement() resets the error status after each iteration of the loop and then changes the interpreter state to interactive.
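The error handling pattern, sketched with stand-ins. error(), er, push() and pop() are simplified models so that the example compiles on its own, and the value of EUNKNOWN is made up; only the control flow in xmycommand() is the point.

```c
#define EUNKNOWN 1       /* assumption: stand-in value for the error code */
#define STACKSIZE 8      /* hypothetical stack depth */

static int er = 0;       /* error state, 0 means no error */
static int stack[STACKSIZE];
static int sp = 0;

void error(int e) { er = e; }

void push(int v) {
    if (sp < STACKSIZE) stack[sp++] = v; else error(EUNKNOWN);
}

int pop(void) {
    if (sp > 0) return stack[--sp];
    error(EUNKNOWN);
    return 0;
}

/* Doubles a non-negative argument and rejects negative ones. */
void xmycommand(void) {
    int x = pop();
    if (er != 0) return;                      /* error from deeper down */
    if (x < 0) { error(EUNKNOWN); return; }   /* report our own error */
    push(2 * x);
}
```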

Segmentation fault crashes usually mean that the keyword array is messed up. Keyword and token array have to match exactly. Token values need to be consecutive. A second source for segmentation faults is memory access and wrong string code.

Stack overrun and underrun errors usually mean that you have pop()ed or push()ed too many or too few arguments. All functions need to keep the stack clean. There is no stack reset mechanism except after error() when going to interactive.

Crashes can also happen when there is not enough heap memory. Commands should use a minimal set of local variables and should not use the C stack for recursion. If you need more memory for the heap, either change the offset parameters in the freememory mechanism or set MEMSIZE by hand.

Doing I/O - create your own I/O stream.

BASIC I/O is done through I/O streams. They are numbered from 1 to 127. 0 is currently not assigned and should not be used. All numbers less than 32 are reserved. From 32 onward you can define your own I/O subsystems.

Here is how this is done. BASIC does (almost) all I/O through the functions inch(), outch(), checkch(), availch(), ins(), and outs(). All of them are essentially big switch-case statements. For a new I/O subsystem, methods are to be implemented for these functions.

Three steps are needed for this:

Step 1: define a stream number

To create a stream, define a stream number. In basic.h there is a section with the stream numbers

#define OSERIAL 1

#define ODSP 2

#define OPRT 4

#define OWIRE 7

#define ORADIO 8

#define OMQTT 9

#define OFILE 16

#define ISERIAL 1

#define IKEYBOARD 2

#define ISERIAL1 4

#define IWIRE 7

#define IRADIO 8

#define IMQTT 9

#define IFILE 16

Add an identifier for the output and input and give it a number. If your stream is MYIO and you want to use the stream number 32, define

#define IMYIO 32

#define OMYIO 32

This may seem a bit redundant as we define the same constant twice.

Step 2: Implement stream functions for byte streams

If your stream is byte oriented I/O, implement the following functions in hardware-arduino.h and hardware-posix.h:

void myiobegin(): start the I/O stream

mem_t myioread(): read one byte

myiowrite(mem_t): write one byte

mem_t myiocheckch(): check if there is a byte, return it but keep it in the buffer (this function is optional; if it is not needed it can be empty but must return 0)

mem_t myioavailable(): the number of available characters. It must return 1 if BASIC is supposed to read something.

Byte oriented I/O means that the stream can be read byte by byte like a serial interface. Add these functions to inch(), outch(), checkch(), availch() using OMYIO and IMYIO in the case statements. If your I/O stream needs starting, i.e. myiobegin() is not empty, add the function to ioinit(). myiocheckch() is only used if the stream is a primary input stream like a keyboard.

Now you are good to go. Compile BASIC and you have your own I/O stream which can be accessed with the identifier &32. INPUT will read characters and wait for a newline to return. PRINT will send data and terminate it with a newline. PUT and GET will write or read individual bytes from the stream. AVAIL will show the number of free bytes.

If there is an I/O error, the function should set ert to a value other than zero and then return with a zero result without calling the error() function.
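As a sketch of the five byte stream functions, here is a loopback stream built on a small ring buffer: whatever is written can be read back. The type mem_t, the type of ert and the buffer layout are assumptions for illustration, and in this sketch a write to a full buffer is treated as the I/O error.

```c
#include <stdint.h>

typedef int8_t mem_t;    /* assumption: stand-in for BASIC's byte type */
int ert = 0;             /* stand-in for the I/O error variable */

#define MYIOBUFSIZE 16   /* hypothetical ring buffer size */
static mem_t myiobuf[MYIOBUFSIZE];
static uint8_t myiohead = 0, myiotail = 0;

void myiobegin(void) { myiohead = myiotail = 0; }

/* Number of bytes waiting to be read. */
mem_t myioavailable(void) {
    return (mem_t) ((myiohead - myiotail + MYIOBUFSIZE) % MYIOBUFSIZE);
}

void myiowrite(mem_t c) {
    uint8_t next = (uint8_t) ((myiohead + 1) % MYIOBUFSIZE);
    if (next == myiotail) { ert = 1; return; }   /* full: flag an error */
    myiobuf[myiohead] = c;
    myiohead = next;
}

/* Peek at the next byte without removing it, 0 if there is none. */
mem_t myiocheckch(void) {
    return myioavailable() ? myiobuf[myiotail] : 0;
}

mem_t myioread(void) {
    mem_t c;
    if (!myioavailable()) return 0;
    c = myiobuf[myiotail];
    myiotail = (uint8_t) ((myiotail + 1) % MYIOBUFSIZE);
    return c;
}
```

None of these functions calls error(); problems are only signalled through ert, as required above.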

Step 3: Implement stream functions for block input

Not all I/O operations are done as a byte stream. Wire is a good example. It reads and writes blocks of data at once. If you want to integrate a stream like this, you need to implement

myioins(char *b, uint8_t l): read a number of characters, maximum is l

myioouts(char *b, uint8_t l): write l characters

as well. These functions have a buffer as argument. It contains the data. l tells the function the maximum buffer size or the number of characters to be written. The myioins() method must set the first element of the buffer to the number of bytes received. Payload begins at index 1. It also must leave this value in z.a. The myioouts() method can start to output from the first buffer element onwards and must write all l bytes before it returns.

If there is an I/O error, the function should set ert to a value other than zero and then return without calling the error() function. Leave zero in the buffer's first element on read and set z.a to zero.

Always keep the buffer unchanged on write. The buffer is a direct pointer into BASIC memory.

Hook the two functions into ins() and outs() case statements with IMYIO and OMYIO respectively.
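A sketch of the two block functions for a hypothetical device that stores one block. z and ert are stand-ins so that the example compiles on its own. As a conservative assumption, l is treated in myioins() as the total buffer size including the length byte, so at most l-1 payload bytes are stored.

```c
#include <stdint.h>
#include <string.h>

static struct { int a; } z;   /* stand-in for the interpreter's z.a */
int ert = 0;                  /* stand-in for the I/O error variable */

#define DEVSIZE 32            /* hypothetical device block size */
static char device[DEVSIZE];
static uint8_t devlen = 0;

/* Write l bytes starting at b[0]; b itself is never modified. */
void myioouts(char *b, uint8_t l) {
    if (l > DEVSIZE) { ert = 1; return; }   /* flag error, no error() call */
    memcpy(device, b, l);
    devlen = l;
}

/* Read a block: b[0] receives the count, the payload starts at b[1]. */
void myioins(char *b, uint8_t l) {
    uint8_t n = devlen;
    if (l == 0) { ert = 1; z.a = 0; return; }  /* no room for the count */
    if (n > l - 1) n = (uint8_t) (l - 1);
    memcpy(b + 1, device, n);
    b[0] = (char) n;
    z.a = n;
}
```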

If your I/O subsystem has a buffer that needs flushing, implement a myioclose() command and hook it to the BASIC xclose() and xcall() functions just where the file I/O close is implemented. If your I/O subsystem needs an open command, implement a myioopen() function and hook it to the xopen() command just where the file I/O is.

Rules of the game

If you wait in a tight loop for I/O, always call byield() in the loop. This is absolutely needed on ESP systems no matter if they do network or not. All background functions are tied to byield().
Implement interrupt functions for I/O where needed. Look at the wire or picoserial streams to see how it is done.
Never change buffers handed down from BASIC for write. They point directly into BASIC memory.
Never change any interpreter state variable in an I/O operation. State variables are all the global variables defined in basic.h. As long as you keep them intact, BASIC will continue to run stably.
Keep in mind that the heap in BASIC is very shallow. Do not use deep recursion or large temporary buffers in I/O functions. If you need them, define them as static global variables instead.

Integrate a new filesystem

For the integration of new file systems, please consult the filesystem page of the wiki.

Integrate non BASIC code

If BASIC is to be combined with non BASIC code, loop() cannot be used to do this. The Arduino loop() function is used to control BASIC's run modes and will block most of the time. Use bloop() at the end of the code for all non BASIC code that needs to run in the background. bloop() is called at least every 20 microseconds. Please note the comments in the code about the use of bloop(). No BASIC functions should be called there in order to avoid an infinite loop and stack overflow. Communication with BASIC has to happen through variables and USR functions.

bsetup() can be used to start non BASIC subsystems and allocate memory. It is called after BASIC I/O init and before BASIC memory allocation. No BASIC I/O subsystem should be restarted there. If BASIC uses Serial, SPI or Wire, it starts it. Only non BASIC I/O subsystems should be started there.
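A sketch of how background code and BASIC could be tied together. The signatures of bsetup(), bloop() and usrfunction() are assumptions, and the tick counter is a hypothetical background task. Note that bloop() only touches its own variables and calls no BASIC function.

```c
/* Hypothetical state of a non BASIC subsystem. */
static unsigned long ticks = 0;
static int subsystem_started = 0;

/* Start the non BASIC subsystem; no BASIC I/O is touched here. */
void bsetup(void) {
    subsystem_started = 1;
}

/* Background work, called regularly by BASIC. */
void bloop(void) {
    if (subsystem_started) ticks++;
}

/* BASIC reads the result with A=USR(32, 0) (signature assumed). */
int usrfunction(unsigned n, int v) {
    (void) v;                             /* argument not needed here */
    if (n == 32) return (int) (ticks & 0x7fff);
    return 0;
}
```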
