WsAsm

Turns Whitespace assembler into executable Whitespace.

Writing Whitespace is hard. This assembler makes it a little bit easier.

Usage

java WsAsm [options] myfile.wsa ...

...or, if you use the scripts in the bin directory...

wsa [options] myfile.wsa

This will write the assembled Whitespace to myfile.ws.

You can assemble multiple files.

Options

-s #!/path/exe
Adds a shebang line so that your Whitespace program runs correctly when launched from a suitable shell. Don't include any spaces in the shebang. If you need spaces, point the shebang at a script instead: #!/path/scriptName.
-x
Enables extensions to the original Whitespace language.

Examples

Simple example

# hash is a comment, blank lines are ok

# push two numbers on to the stack
push 1
push 2

# Add them up and print the result
add
printn

end # The tutorial says you should always end the program

Heap usage example

This example program will print "!" (in a round-about way).

# A symbol for the character to print (just to show how symbols are used)
@pling '!'

# A macro to simplify writing a message
$print "About to print \"!\"...\n"

# Now the powerful business logic
push 1      # the size of the requested memory block
$malloc     # allocates a block, gets the address of the first element
dup         # duplicate the address - we'll need it twice
push @pling # add character to stack (actually, its ASCII value)
store       # write the character to the allocated address
            # Note: the character is now on the heap, and does not exist in the stack
retrieve    # get the character back (why did we bother writing it to the heap? because it's an example!)
printc      # print the character
end         # the spec says to always do this

# Oops - we didn't $free after $malloc
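
As the closing comment admits, the example above never releases its block. A minimal sketch of the same program that checks the $malloc result with jumpn and calls $free before ending might look like this. The label :nomem is made up for illustration, and the sketch assumes that $free consumes the address it is given.

```
@pling '!'

push 1          # the size of the requested memory block
$malloc         # address of the block, or -1 if nothing was allocated
dup
jumpn :nomem    # a negative result means the allocation failed
dup             # keep one copy of the address for $free
dup             # ...and one for retrieve
push @pling
store           # write the character to the block
retrieve        # read it back
printc
$free           # hand the block back for re-use
end

:nomem
drop            # discard the -1
$print "Allocation failed\n"
end
```

With a literal size of 1 the failure branch is never taken, but the same check is useful when the size is computed at run time.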

Counting example

This is the example program from the original Whitespace tutorial (with the original comments) but expressed as WsAsm assembler. It counts to 10.

push 1   # Put a 1 on the stack
:C       # Set a Label at this point
dup      # Duplicate the top stack item
printn   # Output the current value
push 10  # Put 10 (newline) on the stack...
printc   # ...and output the newline
push 1   # Put a 1 on the stack
add      # Addition. This increments our current value.
dup      # Duplicate that value so we can test it
push 11  # Push 11 onto the stack
sub      # Subtraction. So if we've reached the end, we have a zero on the stack.
jumpz :E # If we have a zero, jump to the end
jump :C  # Jump to the start
:E       # Set the end label
drop     # Discard our accumulator, to be tidy
end      # Finish

Fibonacci example

Note: this generates the Whitespace example given elsewhere.

#-----------------------------------
# Calculates some Fibonacci numbers
#-----------------------------------

# Store counter to heap
push 1      # [1]
push 0      # [0,1]
store       # [] {1=0}

# Put starter number on stack & print it
push 1      # [3]
dup         # [3,3]
call :prn   # [3]

# Put second number on stack
push 1      # [5,3]

:next
# Print the newest number
dup         # [5,5,3]
call :prn   # [5,3]

# Add the last two numbers
swap        # [3,5]
copyn 1     # [5,3,5]
add         # [8,5]

# Get the counter
push 1      # [1,8,5]
dup         # [1,1,8,5]
retrieve    # [4,1,8,5]

# Increment it & check counter
push 1      # [1,4,1,8,5]
add         # [5,1,8,5]
dup         # [5,5,1,8,5]
# Arbitrarily stop after 16 numbers
# Try 100 or 1000  8~)
push 16     # [16,5,5,1,8,5]
swap        # [5,16,5,1,8,5]
sub         # [11,5,1,8,5]
jumpn :done # [5,1,8,5]
store       # [8,5]
jump :next

:done
end

:prn
printn
push '\n'
printc
return

Assembler reference

Encoding

The assembler assumes that source files are encoded in UTF-8 (and therefore 7 bit ASCII will also work).

Comments and white space

Comments begin with # and end at a line break.

You can place comments on a line of their own, or after an instruction.

Blank lines are OK - they are ignored.

Standard operations

The standard operations of the Whitespace language are implemented as the following assembler instructions. The descriptions are copied from the original Whitespace tutorial, with added notes. The abbreviation IMP is from the original tutorial, and stands for Instruction Modification Parameter.

Stack manipulation IMP

push n
Push n onto the stack
Note: n is a decimal number or a character in single quotes. Here "character" means any Unicode code point, encoded in UTF-8.
dup
Duplicate the top item on the stack
copyn n
Copy the nth item on the stack (given by the argument) onto the top of the stack
Note: n is a decimal number.
swap
Swap the top two items on the stack
drop
Discard the top item on the stack
slide n
Slide n items off the stack, keeping the top item
Note: n is a decimal number.

Arithmetic IMP

add
Addition
Replaces the top two items on the stack with their sum.
sub
Subtraction
Replaces the top two items on the stack with their difference (first pushed minus second).
mult
Multiplication
Replaces the top two items on the stack with their product.
div
Integer Division
Replaces the top two items on the stack with their quotient (first pushed divided by second).
Note: this uses "floored" division, not "truncated" division (see Wikipedia discussion)
which, for negative inputs, produces results different to the / operator in some languages.
See also $truncdiv
mod
Modulo
Replaces the top two items on the stack with their division remainder.
Note: this uses "floored" division, not "truncated" division (see Wikipedia discussion)
which, for negative inputs, produces results different to the % operator in some languages.
See also $truncmod
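
To see the difference between the two kinds of division, here is a small sketch that divides -111 by 10 both ways. It builds -111 by subtraction rather than assuming that negative literals are accepted, and it assumes that $truncdiv takes its operands in the same order as div and leaves its result on the stack.

```
push 0
push 111
sub         # 0 - 111 = -111
dup         # keep a copy for the second division
push 10
div         # floored division: -12
printn
push '\n'
printc
push 10
$truncdiv   # truncated division: -11
printn
end
```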

Heap access IMP

store
Store
Writes an integer to the heap
Push the address, then the value, then use this instruction.
The address and the value are removed from the stack.
retrieve
Retrieve
Fetches an integer from the heap
Push the address, then use this instruction.
The address on the stack is replaced by the value.

If extensions are enabled, an extra instruction is available.

x-dump
Prints the instruction pointer, the stack and the heap for debugging purposes.

Flow IMP

:label
Mark a location in the program
Note: Labels are prefixed with a colon (:). In the position immediately after the colon, $ and % are reserved for macros. Any ASCII character is allowed in a label.
call :label
Call a subroutine
Note: Labels are prefixed with : when used, just the same as when defined.
jump :label
Jump unconditionally to a label
jumpz :label
Jump to a label if the top of the stack is zero
jumpn :label
Jump to a label if the top of the stack is negative
return
End a subroutine and transfer control back to the caller
end
End the program
If extensions are enabled, this instruction takes a parameter which is the result to return from the program.

I/O IMP

printc
Output the character at the top of the stack
The character is removed from the stack.
See also $print, $printstr
printn
Output the number at the top of the stack
The number is removed from the stack.
readc
Read a character and place it in the location given by the top of the stack
Push an address, then use this instruction.
When the end of the input stream has been reached, using this instruction will cause an error, unless extensions are enabled, in which case it will return -1.
See also $readstr
readn
Read a number and place it in the location given by the top of the stack
Push an address, then use this instruction.
When the end of the input stream has been reached, using this instruction will cause an error.
See also $readstr, $str2int and $str2fl
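
Because readn writes to the heap rather than the stack, reading a value takes two steps. The sketch below reads a number into heap address 0 (chosen only because it lies in the safe 0 to 255 range; the symbol name @input is made up) and then fetches it back to the stack.

```
# heap address used to hold the number that is read
@input 0

push @input
readn           # reads a number into heap address 0
push @input
retrieve        # copy it from the heap to the stack
push 2
mult
printn          # prints twice the number that was read
end
```

The $readntostack macro described under Input wraps the same read-then-retrieve pattern.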

If extensions are enabled, extra instructions are available.

x-args
Read command line parameters as an integer stream
The first integer is the number of parameters. Subsequent integers represent the characters of the parameters, with zero as a separator/terminator.
If you read too many times, you will get an error.
See $x-args which simplifies the use of this instruction.
x-readfile
Read from a file
Behaves differently depending on the top of the stack.
x-writefile
Write to a file
Behaves differently depending on the top of the stack.
x-closefile
Close a file
Push a file handle, then use this instruction.
If you try to close an invalid file handle, this will be reported to stderr, but the program will continue.

Built-in macros

There are some built-in macros which generate some code for you. The generated code is standard Whitespace. We're just making some things a bit easier.

When you use them they look like other instructions, but begin with $.

Although these macros have been tested and should work as described, bugs may lurk.

Output

$printc
Prints a char from the stack, without losing it from the stack, followed by a line break.
May be helpful as debug output.
$printn
Prints a number from the stack, without losing it from the stack, followed by a line break.
May be helpful as debug output.
$print "String\n"
Prints a literal String. The trailing line break is optional.
$printstr
Prints a string from the heap.
Push the address, then use this macro.
See strings section for how to create and manipulate strings.
$printfl
Prints a floating point number.
Push the address, then use this macro.
See floating point numbers section for how to create and manipulate floats.
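
As a sketch of how the non-destructive $printn differs from the printn instruction, the following fragment prints an intermediate result and then carries on using it:

```
push 6
push 7
mult
$printn     # prints 42 and a line break; 42 is still on the stack
push 1
add
printn      # prints 43 and removes it from the stack
end
```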

Input

$readctostack
The readc instruction reads to the heap. This is a shortcut to put the value on the stack instead.
$readntostack
The readn instruction reads to the heap. This is a shortcut to put the value on the stack instead.
$readstr
Reads an input string to the heap.
Leaves its address on the stack.
The string read ends at a line break.
When the end of the input stream has been reached, using this macro will cause an error, unless extensions are enabled, in which case it will return -1.
See strings section for how to manipulate strings.
$readfl
Reads an input floating point number to the heap.
If successful, leaves the new address and a zero on the stack.
If unsuccessful, leaves a non-zero value on the stack.
So you should test for success with jumpz.
See floating point numbers section for how to manipulate floats.
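
A common pattern is to read a line with $readstr and parse it with $str2int (described under Strings). The sketch below assumes that the line break is not stored as part of the string, and that $str2int leaves its zero on top of the result, which is what the jumpz advice implies. The label :parsed is made up for illustration.

```
$print "Enter a whole number: "
$readstr            # leaves the address of the input string
$str2int            # success: result plus a zero; failure: a non-zero value
jumpz :parsed       # jumpz consumes the status value
$print "That was not a number\n"
end

:parsed
printn              # the parsed integer is now on top of the stack
end
```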

If extensions are enabled, an extra macro is available.

$x-args
Leaves on the stack the address of an array of addresses of the arguments. This array and the arguments it points to are all in newly allocated memory.

Strings

Strings cannot be put on the stack. The stack can only ever hold the address of a string on the heap.

Note: see also $readstr and $printstr

$str2int
Parses an integer from a string.
Push the string address, then use this macro.
If successful, leaves the result and a zero on the stack.
If unsuccessful, leaves a non-zero value on the stack.
So you should test for success with jumpz.
$int2str
Turns an integer into a string in decimal.
Push the integer, then use this macro.
Leaves the address of the string on the stack.
$strlen
Gets the length of a string.
Push the string address, then use this macro.
Leaves the length on the stack.
$strlens
Gets the length of a string up to a separator character.
Push the string address first, then the separator, then use this macro.
Leaves the length on the stack.
$storestr "String"
Puts a literal string onto the heap.
Leaves its address on the stack.
$strcmp
Compares two strings.
Push two addresses, then use this macro.
If they match, leaves zero on the stack, or non-zero otherwise.
$strcpy
Copies a string to another place in the heap.
Push the target address, then the source address, then use this macro.
Leaves the target address on the stack.
Danger! No check is made that there is enough space at the target address.
$strncpy
Copies a fixed number of characters from one string to another.
Push the target address, then the source address, then the number of characters, then use this macro.
Leaves the target address on the stack.
If the source is shorter than the requested number of characters, it is right-padded with 0. Otherwise, no null terminator is copied.
Danger! No check is made that there is enough space at the target address.
$strcat
Writes the concatenation of two strings to the heap.
Push the address of the first string, then the second, then use this macro.
Leaves the new address on the stack. The resulting string is in newly allocated memory.
$strtok
Splits a string into tokens.
Push the address of the string, then the separator character, then use this macro.
Leaves on the stack the address of an array of addresses of the tokens. This array and the tokens it points to are all in newly allocated memory.
$startswith
Tests whether one string matches the start of another.
Push the string to compare, then the prefix to test, then use this macro.
Leaves zero on the stack if the prefix does not start the string, or non-zero if it does start it.
$strhash
Calculates a hash value for a string.
Push the address of the string, then use this macro.
Leaves a hash value on the stack.
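
As a rough sketch of how the string macros combine, the following fragment joins two literal strings, prints the result and then prints its length. The dup keeps a spare copy of the address, so the fragment does not depend on whether $printstr consumes the one it is given.

```
$storestr "Hello, "     # address of the first string
$storestr "world!"      # address of the second string
$strcat                 # address of a newly allocated "Hello, world!"
dup
$printstr
push '\n'
printc
$strlen                 # 13
printn
end
```

The two source strings stay on the heap after $strcat, so a longer-running program would keep their addresses and release them later.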

Heap management

$malloc
Allocates a block of memory.
Push the size of a heap block, then use this macro.
Leaves the address of the first element of the block on the stack. The rest of the block follows sequentially.
If the allocation succeeded, all the allocated heap elements are set to 0.
If you try to $malloc a block of less than one, nothing is allocated, and the result on the stack is -1. You can test for this with jumpn.
$blocksz
Gets the size of a block that was previously allocated with $malloc.
Push the address of the block, then use this macro.
Leaves the block size on the stack.
$free
Frees a block of memory previously allocated with $malloc, making it available for re-use by $malloc.
Push the address of the block, then use this macro.
$freearr
Frees an array and its contents (all previously allocated with $malloc), making them all available for re-use by $malloc.
This is an array of the type created by $strtok, consisting of a null-terminated list of addresses which point to other memory blocks.
Push the address of the block, then use this macro.
$lstnew
Creates a new list.
The list created works with the other list macros here.
Leaves the address of the list on the stack.
The list structure can hold either integers or heap addresses of other data structures (e.g. strings, floats, etc).
$lstsz
Gets the size of a list.
Push the address of the list, then use this macro. Leaves the size on the stack.
$lstadd
Adds an item to the end of a list.
Push the address of the list, then the new item (integer or heap address), then use this macro.
$lstget
Gets the nth item from a list.
Push the address of the list and the index of the desired item, then use this macro.
If the supplied index is out of range, leaves -1 on the stack. Otherwise, leaves the list item and a zero on the stack. You should test for this with jumpz.
$arr2lst
Makes a list that contains values copied from an array.
Push the address of an array, then use this macro.
Leaves the address of the new list on the stack.
Danger! The array elements are shallow-copied, so after using this macro you shouldn't use $freearr on the source array.
$lst2arr
Makes an array that contains values copied from a list.
Push the address of a list, then use this macro.
Leaves the address of the new array on the stack.
Danger! The array elements are shallow-copied, so after using this macro you shouldn't use $freelst on the source list. Consider $freelstlite instead.
$lstdup
Makes a copy of a list.
Push the address of a list, then use this macro.
Leaves the address of the new list on the stack.
Danger! The list elements are shallow-copied, so they are shared between the old and new lists, and after using this macro you shouldn't use $freelst on either list. Consider $freelstlite instead.
$lstrev
Reverses the elements in a list.
Push the address of a list, then use this macro.
The elements of the list are reversed in place rather than in a new list.
$freelst
Frees the memory used by a list and its elements.
Push the address of the list, then use this macro.
Danger! This macro assumes that the elements of the list are heap addresses of strings, and frees those addresses too. If your list elements are not strings (e.g. integers, addresses of other structures, etc) this won't work correctly and may do bad things to your heap. Consider $freelstlite instead.
$freelstlite
Frees the memory used by a list, but not its elements.
Push the address of the list, then use this macro.
If you have a list of integers, this is what you should use. If you have a list of strings, you might need $freelst instead.
$mapnew
Creates a new map of string keys to whatever values you like.
Leaves the address of the new empty map on the stack.
$mapsz
Gets the number of mappings in a map.
Push the address of the map, then use this macro.
Leaves the size of the map on the stack.
$mapput
Puts a key/value pair into a map. If the key already maps to something, the previous value is replaced. Otherwise a new mapping is added.
Push the addresses of the map, the key and the value, then use this macro.
$mapget
Gets a value that is mapped to a given key in the map.
Push the addresses of the map and the key, then use this macro.
Leaves the value on the stack, or -1 if the key was not mapped.
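
The list macros can be sketched in use as follows. To avoid depending on what each macro leaves behind on the stack, this sketch keeps the list address in heap cell 0 (a made-up choice within the safe 0 to 255 range) and fetches it before every call. It also assumes that list indexes start at 0, which the reference above does not confirm. The symbol @lst and the label :found are invented for illustration.

```
# heap cell that holds the address of the list
@lst 0

push @lst
$lstnew
store               # heap[0] = address of a new, empty list

push @lst
retrieve
push 10
$lstadd             # list now holds 10

push @lst
retrieve
push 20
$lstadd             # list now holds 10, 20

push @lst
retrieve
push 1
$lstget             # item plus a zero, or -1 if the index is out of range
jumpz :found
$print "index out of range\n"
end

:found
printn              # prints 20 if indexes start at 0
push @lst
retrieve
$freelstlite        # the elements are integers, so free only the list itself
end
```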

Heap value adjusters, etc.

$incr
Increments an integer on the heap by one.
Push the address, then use this macro.
$incrn
Increments an integer on the heap by some amount.
Push the address, then push the amount, then use this macro.
$decr
Decrements an integer on the heap by one.
Push the address, then use this macro.
$decrn
Decrements an integer on the heap by some amount.
Push the address, then push the amount, then use this macro.
$mult10^n
Multiplies an integer on the heap by 10 to the given power.
Push the address, then push the power of 10, then use this macro.
$abs
Gets the absolute value of a number on the stack, and puts the result on the stack.
Push the number, then use this macro.
$truncdiv
Divides two values, truncating any remainder.
This differs from the div instruction in that it will always round towards zero. For example, -111 divided by 10 is -12 if you use the div instruction, but -11 if you use this macro.
Push the numbers, then use this macro.
$truncmod
Calculates a remainder using truncated division.
This differs from the mod instruction only for negative inputs. For example, for -4 and 3 the result is 2 if you use the mod instruction, but -1 if you use this macro.
Push the numbers, then use this macro.
$mag
Gets the magnitude (number of decimal digits) of an integer on the heap.
Push the address, then use this macro.
Leaves the result on the stack.
$32bcrop
Crops an integer to within the range of a 32 bit twos-complement integer. The effect is the same as down-casting to a 32 bit integer type in C, Java, etc., except that there is no limit on the size of the input value.
Push an integer, then use this macro.
Leaves the result on the stack.
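
A short sketch of the heap adjusters and $32bcrop follows. Heap cell 0 serves as a counter (the symbol name @counter is made up), and the address is pushed afresh before each macro so the result does not depend on what the macros leave on the stack.

```
# heap cell used as a counter
@counter 0

push @counter
push 5
store           # counter = 5

push @counter
$incr           # counter = 6

push @counter
push 10
$incrn          # counter = 16

push @counter
retrieve
printn          # prints 16
push '\n'
printc

push 4294967301 # 2^32 + 5, too large for a 32 bit integer
$32bcrop
printn          # prints 5
end
```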

Floating point numbers

The Whitespace language has only one data type: integer. These macros simulate floating point numbers.

One difference is that integers can be put on the stack or the heap, but floating point numbers can only exist on the heap and the stack can only ever hold the address of a floating point number.

Note: see also $readfl and $printfl

$storefl "x.y"
Stores a literal floating point number to the heap.
The parameter x.y must be a string that parses to a float. If not, the program will end - you should give a valid float!
Leaves the new address of the result on the stack.
$str2fl
Parses a floating point number from a string.
Push the string address, then use this macro.
If successful, leaves the new address of the new float and a zero on the stack.
If unsuccessful, leaves a non-zero value on the stack.
So you should test for success with jumpz.
$fl2str
Translates a floating point number to a string.
Push the float address, then use this macro.
Leaves the new address of the string on the stack.
$addfl
Adds two floating point numbers.
Push both addresses, then use this macro.
Leaves the new address of the result on the stack.
$subfl
Subtracts two floating point numbers.
Push the first address, push the second address, then use this macro.
Leaves the new address of the result (first minus second) on the stack.
$multfl
Multiplies two floating point numbers.
Push both addresses, then use this macro.
Leaves the new address of the result on the stack.
$divfl
Divides two floating point numbers.
Push the first number, push the second number, then use this macro.
Leaves the new address of the result (first divided by second) on the stack.
Note: unlike most other floating point macros, this one has limited precision because division may result in recurring decimals. After 19 significant figures, further digits may be truncated (if in decimal places).
$modfl
Returns the remainder of a division of two floating point numbers using floored division.
Push the first number, push the second number, then use this macro.
Leaves the new address of the result on the stack.
$truncmodfl
Returns the remainder of a division of two floating point numbers using truncated division.
Push the first number, push the second number, then use this macro.
Leaves the new address of the result on the stack.
$floorfl
Returns the largest (furthest from negative infinity) integer that is less than or equal to the given floating point number.
Push the address of the number, then use this macro.
Leaves the new address of the result on the stack.
$truncfl
Returns the integer portion of the given floating point number, with the fractional part truncated.
Push the address of the number, then use this macro.
Leaves the new address of the result on the stack.
$sqrtfl
Returns an approximation to the square root of a floating point number.
Push the address of the number, then use this macro.
If successful, leaves the new address and a zero on the stack.
If unsuccessful (e.g. because the input was negative), leaves a non-zero value on the stack.
So you should test for success with jumpz.
Note: unlike most other floating point macros, this one has limited precision because (a) most square roots are irrational (an endless series of decimal places) and (b) the process of finding the root is iterative and stops after enough iterations to find a close approximation. In comparison with languages that use 64-bit IEEE 754 floating point numbers and return the nearest one to a square root, this implementation should give a little more accuracy.

Just for fun, I calculated the square root of 2 with this macro and compared with some other languages that I had to hand.
| Language | Example | Result | Squared result | Comment |
|---|---|---|---|---|
| WsAsm | $storefl "2", then $sqrtfl | 1.41421356237309504880 | 1.9999999999999999999952235666390743814400 | |
| Java | Math.sqrt(2) | 1.4142135623730951 | 2.00000000000000014481069235364401 | Possibly the same C library being used? |
| Python | import math; math.sqrt(2) | 1.4142135623730951 | 2.00000000000000014481069235364401 | |
| CLisp | (sqrt 2) | 1.4142135 | 1.99999982358225 | Looks like a 32 bit result. Maybe I have a 32 bit version? |
| awk | BEGIN {printf \"%.15f\n\", sqrt(2)} | 1.414213562373095 | 1.999999999999999861967979879025 | GNU awk 5.1.1 would give me 52 decimals, but after 15 they were wrong. |

$expfl
Returns an approximation to a base raised to a power, where both are floating point numbers.
Push the address of the base, then the address of the power, then use this macro.
Note: if the base is negative and the exponent is negative and not an integer, exponentiation is not defined. This macro will complain and the Whitespace program will stop.
Note: unlike most other floating point macros, this one has limited precision. Currently it produces rather more decimals than it has any right to believe are accurate.

Just for fun, I calculated 2.3^3.2 with this macro and compared with some other languages that I had to hand. I didn't reverse the calculation. That involves another floating point exponentiation, and while I think the Whitespace results are accurate, I'm not sure exactly how many decimals to believe (which is why I haven't truncated them, and why there are far too many). Reversing in other languages produced 2.3 exactly, and I'm confident there's a bunch of rounding happening there.
| Language | Example | Result | Comment |
|---|---|---|---|
| WsAsm | $storefl "2.3", $storefl "3.2", then $expfl | 14.3723927079205007341980003311860100320555587 | Over-claiming accuracy with all those decimals. My guess is that this is slightly less accurate than the other answers ending in ...499 but I'm not sure. |
| Java | Math.pow(2.3,3.2) | 14.372392707920499 | Possibly the same C library being used? |
| Python | import math; math.pow(2.3,3.2) | 14.372392707920499 | |
| CLisp | (expt 2.3 3.2) | 14.372393 | Looks like a 32 bit result. Maybe I have a 32 bit version? |
| awk | BEGIN {printf \"%.15f\n\", 2.3 ^ 3.2} | 14.372392707920499 | GNU awk 5.1.1 would give me 49 decimals, but after 15 I'm not sure that they were right. |
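
As a minimal sketch of the floating point macros in use, the following program adds two literals and then takes a square root, testing the status value with jumpz as recommended above. The label :rooted is made up, and the exact text that $printfl produces is not shown because the reference does not specify it.

```
$storefl "1.5"
$storefl "2.25"
$addfl              # address of a new float holding the sum
$printfl
push '\n'
printc

$storefl "2"
$sqrtfl             # success: new address plus a zero; failure: a non-zero value
jumpz :rooted
$print "no square root\n"
end

:rooted
$printfl
end
```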

User-defined macros

You can define your own macros in external files. When you use them they look like other instructions, but begin with %. Here's an example.

In your wsa file, put this:

$print "Listen to the beep!\n"
%beep

The instruction %beep is a macro that must be defined in %beep.wsam, perhaps like this:

# :%beep is an implied label here
push 7
printc
return

Obviously your own macros could be more complex.

Your macro will be called as a subroutine, so it should end with return.

The name of your macro will be used as a label to call it, so don't use that label elsewhere in your macro. And because only ASCII characters are allowed in labels, this means that your macro name can only contain ASCII characters.

A user-defined macro can include:

A user-defined macro can call itself recursively.

Labels in macros

Whitespace has only one name space for labels. It would be easy to accidentally re-use a label with unintended consequences. To reduce the chance of this, it is recommended to begin labels with % in user-defined macros.

Bad
:foo
Better
:%foo
Best
:%MyMacroFoo
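
Putting the advice together, here is a sketch of a hypothetical macro file, %countdown.wsam, that takes a number from the stack and prints it, then each smaller number down to 1. Its internal labels carry the macro name as a prefix so that they cannot clash with labels elsewhere. The macro name and labels are invented for illustration.

```
# %countdown.wsam
# :%countdown is an implied label here
:%countdownLoop
dup
jumpz :%countdownDone   # stop when the counter reaches zero
dup
printn
push '\n'
printc
push 1
sub
jump :%countdownLoop

:%countdownDone
drop                    # discard the spent counter
return
```

In the wsa file that uses it, the argument is passed on the stack:

```
push 3
%countdown
end
```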

Symbol definitions

To save repeating literal values in different places, and to ensure that literal values are consistently used, you can replace them with symbols.

Symbol names begin with @ and are followed by a value. For example, to enumerate animals:

@cat 1
@dog 2
@fish 3

Then code that handles dogs can use the symbol @dog instead of the value 2. The Whitespace language has no notion of symbols, and the assembler will replace the name of the symbol with its value, but you can be confident that it's the right value.

The value of a symbol is used as the parameter to an instruction, so it could be an integer, a string in double quotes, or a character in single quotes.

@varieties 57
@YesNo "Please reply yes or no."
@LF '\n'

Some examples of using symbols in place of literal values:

push @varieties
$print @YesNo

Recommendations

Although symbols can be defined anywhere, they probably shouldn't be. There's only one name space for symbols, and symbols are not constants. If you define a symbol name a second time, its value changes from there onward. So to avoid accidental redefinition of a name, define symbols in a block at the start.
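
A tiny illustration of the redefinition hazard, using a made-up symbol name:

```
@limit 10
push @limit     # pushes 10

@limit 20
push @limit     # pushes 20: the symbol changed from here onward
```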

Similarly, defining symbols in user-defined macros may lead to name space issues and accidental redefinitions, so such symbol names should be prefixed with % and their macro name.

@%myMacroName:Gallium 31

Symbols defined in user defined macros will not be available in the code that uses the macro.

Built-in macro data structures

string/array
An array is a sequence of values followed by a null terminator (a value of zero), stored in a contiguous block of heap addresses. Where this structure is generated by built-in macros, the values are all non-zero heap addresses.

Strings have the same structure as arrays, but the values are Unicode characters.

float
A float is stored on the heap as a pair of integers, a value and a scale, where the number represented is value × 10^scale. The float macros make this work as a floating point number.
list
A list structure has a variable length: you can append more items. Internally, it consists of a pointer to a block of addresses and a length for the list. The block of addresses contains the list data. The addresses may not all be in use. This block is managed by the list macros: when you append to the list, the block may change to a new larger block. This is transparent to the caller, provided that you use the list macros to interact with the list.
map
A map structure allows values to be found by name.
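
To make the string layout concrete, this sketch builds the string "Hi" by hand in a block from $malloc and prints it with $printstr. It relies on $malloc zero-filling the block, so the third element already holds the null terminator.

```
push 3
$malloc         # three zeroed elements: two characters plus the terminator
dup
push 'H'
store           # block[0] = 'H'
dup
push 1
add             # address of block[1]
push 'i'
store           # block[1] = 'i'
$printstr       # prints Hi
end
```

In practice $storestr "Hi" does the same job in one line. The manual version is only meant to show what ends up on the heap.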

Heap address usage

Using a negative heap address is not a good idea in Whitespace. Negative addresses would work fine in this implementation, but in other implementations, which don't explicitly prevent them, they can crash the program. So this implementation throws an exception if you use a negative heap address.

WsAsm imposes a further restriction because of the way $malloc and other macros are implemented. For safe heap usage, stick to addresses 0 to 255, or use $malloc to get you a chunk of memory.