Whitespace tutorial

This is not intended to be a language reference, but rather an informal introduction to the whitespace language. The closest thing to a formal specification is the implementation itself! Look in VM.hs in the distribution for the operational semantics.

The only lexical tokens in the whitespace language are Space (ASCII 32), Tab (ASCII 9) and Line Feed (ASCII 10). By only allowing line feed as a token, CR/LF problems are avoided across DOS/Unix file conversions. (Um, not sure. Maybe we'll sort this in a later version.).

The language itself is an imperative, stack based language. Each command consists of a series of tokens, beginning with the Instruction Modification Parameter (IMP). These are listed in the table below.

IMPMeaning
[Space]Stack Manipulation
[Tab][Space]Arithmetic
[Tab][Tab]Heap access
[LF]Flow Control
[Tab][LF]I/O

The virtual machine on which programs run has a stack and a heap. The programmer is free to push arbitrary width integers onto the stack (only integers, currently there is no implementation of floating point or real numbers). The heap can also be accessed by the user as a permanent store of variables and data structures.

Many commands require numbers or labels as parameters. Numbers can be any number of bits wide, and are simply represented as a series of [Space] and [Tab], terminated by a [LF]. [Space] represents the binary digit 0, [Tab] represents 1. The sign of a number is given by its first character, [Space] for positive and [Tab] for negative. Note that this is not twos complement, it just indicates a sign.

Labels are simply [LF] terminated lists of spaces and tabs. There is only one global namespace so all labels must be unique.

Stack Manipulation (IMP: [Space])

Stack manipulation is one of the more common operations, hence the shortness of the IMP [Space]. There are four stack instructions.

CommandParametersMeaning
[Space]NumberPush the number onto the stack
[LF][Space]-Duplicate the top item on the stack
[Tab][Space]NumberCopy the nth item on the stack (given by the argument) onto the top of the stack
[LF][Tab]-Swap the top two items on the stack
[LF][LF]-Discard the top item on the stack
[Tab][LF]NumberSlide n items off the stack, keeping the top item

The copy and slide instructions are an extension implemented in Whitespace 0.3 and are designed to facilitate the implementation of recursive functions. The idea is that local variables are referred to using [Space][Tab][Space], then on return, you can push the return value onto the top of the stack and use [Space][Tab][LF] to discard the local variables.

Arithmetic (IMP: [Tab][Space])

Arithmetic commands operate on the top two items on the stack, and replace them with the result of the operation. The first item pushed is considered to be left of the operator.

CommandParametersMeaning
[Space][Space]-Addition
[Space][Tab]-Subtraction
[Space][LF]-Multiplication
[Tab][Space]-Integer Division
[Tab][Tab]-Modulo

Heap Access (IMP: [Tab][Tab])

Heap access commands look at the stack to find the address of items to be stored or retrieved. To store an item, push the address then the value and run the store command. To retrieve an item, push the address and run the retrieve command, which will place the value stored in the location at the top of the stack.

CommandParametersMeaning
[Space]-Store
[Tab]-Retrieve

Flow Control (IMP: [LF])

Flow control operations are also common. Subroutines are marked by labels, as well as the targets of conditional and unconditional jumps, by which loops can be implemented. Programs must be ended by means of [LF][LF][LF] so that the interpreter can exit cleanly.

CommandParametersMeaning
[Space][Space]LabelMark a location in the program
[Space][Tab]LabelCall a subroutine
[Space][LF]LabelJump unconditionally to a label
[Tab][Space]LabelJump to a label if the top of the stack is zero
[Tab][Tab]LabelJump to a label if the top of the stack is negative
[Tab][LF]-End a subroutine and transfer control back to the caller
[LF][LF]-End the program

I/O (IMP: [Tab][LF])

Finally, we need to be able to interact with the user. There are IO instructions for reading and writing numbers and individual characters. With these, string manipulation routines can be written.

The read instructions take the heap address in which to store the result from the top of the stack.

CommandParametersMeaning
[Space][Space]-Output the character at the top of the stack
[Space][Tab]-Output the number at the top of the stack
[Tab][Space]-Read a character and place it in the location given by the top of the stack
[Tab][Tab]-Read a number and place it in the location given by the top of the stack

Annotated Example

Here is an annotated example of a program which counts from 1 to 10, outputting the current value as it goes.

[Space][Space][Space][Tab][LF] Put a 1 on the stack
[LF][Space][Space][Space][Tab][Space][Space] [Space][Space][Tab][Tab][LF] Set a Label at this point
[Space][LF][Space]Duplicate the top stack item
[Tab][LF][Space][Tab]Output the current value
[Space][Space][Space][Tab][Space][Tab][Space][LF] Put 10 (newline) on the stack...
[Tab][LF][Space][Space]...and output the newline
[Space][Space][Space][Tab][LF]Put a 1 on the stack
[Tab][Space][Space][Space]Addition. This increments our current value.
[Space][LF][Space]Duplicate that value so we can test it
[Space][Space][Space][Tab][Space][Tab][Tab][LF] Push 11 onto the stack
[Tab][Space][Space][Tab]Subtraction. So if we've reached the end, we have a zero on the stack.
[LF][Tab][Space][Space][Tab][Space][Space] [Space][Tab][Space][Tab][LF]If we have a zero, jump to the end
[LF][Space][LF][Space][Tab][Space] [Space][Space][Space][Tab][Tab][LF] Jump to the start
[LF][Space][Space][Space][Tab][Space] [Space][Space][Tab][Space][Tab][LF] Set the end label
[Space][LF][LF]Discard our accumulator, to be tidy
[LF][LF][LF]Finish

What could be simpler? The source code for this program is available here. Have fun!


e.c.brady@dur.ac.uk