Henceforth VM: Difference between revisions
m (→typo) |
|||
Line 762: | Line 762: | ||
===HFB=== | |||
[[FMF:HFB]] | [[FMF:HFB]] | ||
Revision as of 09:21, 15 December 2008
The Henceforth Virtual Machine (HVM, Henceforth VM) is used to dynamically interpret Henceforth (HF, Hamsterforth) scripts. The OHRRPGCE FMF does not use Hamsterspeak's traditional HSX/HSZ files; rather, it opts for a stack-based alternative. Henceforth files are stored with the .HF extension.
We are currently in the process of reviewing the entire design specification, writing a toy interpreter, and polishing up (and segmenting) this document.
We'll gladly take your suggestions on the "Discussion" page, but please do not modify, distribute, or trust this document. It will soon be as open to the community as everything else on this Wiki, but it'll take a few weeks to do the final review.
Thanks~
/Seth
Acknowledgements
A big thanks to both James and Ralph for their well-founded suggestions regarding the HVM, and to Mike for clarifying countless issues. If you start reading into the minute details of the HVM, you'll find that a whole lot of it is lean, clean, and sensible. This is almost entirely due to James & Ralph's suggestions and experience, which I truly appreciate.
Introduction to Henceforth
Henceforth is a simple, stack-based programming language which the OHRRPGCE FMF uses as a crossover language from traditional Hamsterspeak. Like all of the lumps exclusive to this project, .HF lumps containing Henceforth code are cross-compiled automatically for you when you convert your game to XRPG Format. However, all developers should consider learning basic Henceforth syntax to aid them in writing run-time debug scripts and compile-time optimized versions of their most time-critical Hamsterspeak scripts.
Henceforth is a stack-based language, designed to obviate register allocation problems and minimize script size. You should read James's excellent analogy of a stack, if you aren't familiar with the concept. Hamsterspeak uses the stack for ordering scripts; Henceforth, however, uses the stack for storing data.
A Henceforth Primer
Although Henceforth is not a compiled language, you can pretend that "source" Henceforth is compiled into "binary" HFZ lumps, which the game engine runs, and only one person will cringe and correct you. So let's have a look at some Henceforth source code, shall we? Any text after a hash mark until the next newline is considered a comment, and whitespace is used to separate commands. Consider the following script.
#One way to separate commands is by hitting "Enter" after each command. 4 #Push 4 5 #Push 5 add #Pop the top two elements and add them. Push the result. @result #Pop the top element and store it in the variable "result". #Another way to separate commands is using spaces. 4 5 add @result
Most programmers will use a mix of spaces and newlines to keep their scripts clear and sensible. For example, "4 5 add" should probably always be on the same line. Regardless of what format you use, manipulation of the stack is the same. The following diagram shows how the stack (and the variables list) changes as the HVM reads each command, from left to right.
The "Pure" Virtual Machine
From a theoretical point of view, Henceforth does not require a location to store variables. Rather, we could simply add block labels and a delayed execution (prepending) command:
- Surrounding some source with \name{} will label that block with "name"; if name later appears as a command, this block will be inserted in-place. Later writing /name will un-define "name". If multiple blocks are labeled similarly, the most-recent label is considered valid.
- Pre-pending "&" to a command will push the result of that command into the source code itself, at the time that command is first encountered.
Block labels allow functions to be written very naturally:
#The "modulo" function expects two arguments to be # on the stack; it transforms (X Y) to (X%Y) \modulo { swap dup rot dup rot swap #Now we have (X Y X Y) div mult sub #The heart of our algorithm }
There's no need to "return" anything; the labeled block operates in a way that leaves one item on the stack. Here's some code to use this block as a function:
#We want to compute 7%3 7 3 modulo #After this, "1" is left on the stack
This is almost the exact syntax for writing a function in Henceforth. Now, let's cover prepending:
#Avoiding variables with named blocks. 4 5 add \result{ #Label the start of the "result" block &pop #This is tricky: the first time we encounter this # (i.e., when "\result" is defined), it executes "pop", # resulting in "9", which is pushed into the source. } #Closes the "\result" label. result #Calling "result" will now push "9".
Unlike named blocks, prepending is not actually implemented in Henceforth; you can ignore that last bit of code if you don't understand it. Which brings us to our main point:
Henceforth was designed around a "pure" VM, but it was then modified to run efficiently on a phone, and to allow an easy mapping from existing Hamsterspeak scripts.
It is crucial that you keep the ideal virtual machine in mind, or else you will start asking about "Henceforth's registers" (which don't exist) or you'll start using variables way too much. You also need to be cognizant of the dirty details required to kludge Hamsterspeak onto a phone; otherwise, things like script-level subroutines and our extravagant caching mechanism will seem redundant and senseless.
(Note to readers: At this point, if you're only interested in the Henceforth language, you can skip down to the section on Henceforth Syntax.)
Where the HVM Fits Into the Game Loop
Let's pretend that on a certain map in your game, a certain NPC is set up to "run script 4" when the user walks up to it and presses "Accept". Upon pressing "Accept", the game engine checks if this NPC has a script ID assigned to its "accept" behavior. The classic OHR performs this same task. If one exists, it is initialized and cached. At this point, the game engine continues with its tasks for that cycle (e.g., message boxes, vehicles). This (including user input) constitutes the "Update" phase of the game loop. Following this is the "Paint" phase, which shows one frame's worth of updates to the user. After painting, the tick count is incremented, and the loop repeats. The first thing that happens then is the "scripts" phase, during which scripts are actually executed (see below). The loop repeats indefinitely.
Virtual Hardware Specification
The Wikipedia page for Virtual Machines focuses on their use in emulation. In fact, there is no hardware version of the HVM, just like there is no hardware equivalent of the Java Virtual Machine. Rather, Henceforth defines an unambiguous virtual hardware specification which can be ported to any platform with guaranteed interoperability. Currently, the only platform with a working HVM is the OHRRPGCEFMF, due to the bias of the HVM's design towards mobile devices.
Anything in the world can be described by its nouns and verbs --the things it has and what they do. So let's inspect the pieces of the HVM and how they interact.
VM Data Structures
Green and red lines in this diagram indicate references from one item to another, although they are not drawn for every single reference (because that would be craaaaaaazy).
What These Do: Top-Down Description
One way of learning how these data structures work is by discussing what each piece does on its own, and then assuming their combination. This is like studying one of those apple machines; we say that this piece pierces the apple and holds it, that piece slices the skin off, and that blade cores the apple.
Here's the lowdown on each data structure and its elements:
Global Variable Table | Contains global variables indexed by ID. |
ID | ID of the variable (implicit). |
Value | Value of the variable (signed integer). |
Running Scripts Table | Contains a script UDT and related data for any script which is held in the HVM (running or waiting). |
Script | The script UDT for a running script. |
Wait | True if the script is not ready to continue execution; false otherwise. |
Wait On | A series of bitflags signifying what this script is waiting on, concatenated to the number of cycles left to wait, for wait_cycles. The flags are:
wait_cycles
|
Script UDT | Wraps several fields necessary for a running script. |
Local Stack | With the exception of the "push/pop local" commands, all stack operations will take place on the local stack, allowing for easy multitasking. Conceptually "on top" of the local variable array. |
Program Counter (PC) | Current index into the "Bytecode" array (see: "Script Cache Pointer") of the bytecode being executed. At the end of every cycle, the PC is incremented by 1. |
ID | Index of this script in the "Script Lookup Table". |
Script Cache Pointer | A reference to this script's cached data in the "Script Cache" table. If this is null (or -1), this script UDT's cached data has somehow been lost, and must be re-loaded. |
Parent Script Pointer | A reference to the script which called this script. When this script terminates, it will "wake" the parent script. |
Return Address Stack | When a script calls a local subroutine, it pushes the current PC to this stack, and pops it upon returning. |
Local Variable Array | Stores the first seven parameters and the return value for a given script. Push and pop operations on these affect the stack of the calling script. Indexing this array at -1 will push an effective "return value", to use standard terminology. Could theoretically be used to clobber the stack (but please don't). |
Local Variable Hash | Local variables after seven are stored here, and the same push/pop behavior applies. Named variables such as "x" or "test" are also stored here, through the use of a hash table. Although stack-minded gurus detest local stores, they are allowed into Henceforth for the purpose of clarity when debugging a script at run-time. |
Do/It Stack | A stack of do UDT entries. One is pushed each time a dostart or ifstart imperative is encountered, and popped when the corresponding doend or ifend is found. |
Local Function Start Indices | Each script can define local subroutines, which are usually numbered starting from zero, increasing by one monotonically. For example, "\[2] { dup }" defines a local subroutine with ID 2 that serves to duplicate the top of the stack. The interpreter caches the start indices of these script-local subroutines in a small table for fast lookup. |
String Table | Contains a list of Unicode-encoded strings, ordered by ID, for use by this script. |
Do UDT | A pair of integers used to manage do loops and if statement. |
Start Index | The value of the PC when a do or if statement is encountered will be unique within a script. |
Else Index | Contains the index of "else_start" in an if block. (See below). When an entry is pulled from the cache, an end/else index of -1 implies that scanning must occur. -2 indicates that this is a "do" block, not an "if block"; any other number is a valid index in the bytecode array. |
End Index | Given a do loop's start index, the break command will jump to the next "doend" imperative (taking care to discard any nested "doend"s). Since bytecode is never modified at run-time, this scanning only needs to happen once; the PC index of the "doend" statement can be cached for later. If statements follow the same rule. When an entry is pulled from the cache, an end/else index of -1 implies that scanning must occur; any other number is a valid index in the bytecode array. |
Script Lookup Table | A full list of all static scripts contains additional data to help retrieve script source from the cache or from the file system. |
ID | This field is implicit; all scripts from ID 0 to ID N appear in order in this table. If auto-numbered scripts are used, the Henceforth cross-compiler will renumber them so that there are no gaps in this table. |
Name | The name of this script is stored as a string. The Henceforth cross-compiler will always call a static script by its ID, but developers may choose to call a script by name when debugging. |
Chunk | Scripts are stored compressed, but compressing a 10-byte file saves you nothing. Hence, there are several scripts "chunked" together in a file. If this value is "5" for example, then the lump "SCRIPTS_5.HF" should be loaded. Please see the relevant documentation for information on how to extract a particular script from its chunk. |
Cache | A reference to this script's entry in the Script Cache, null or -1 if no entry exists. If several copies of a script exist, they pull only one entry into the cache, and checking this value in the Script Lookup Table is faster than scanning each entry in the Running Scripts Table. Scanning must still occur when an entry is ejected from the Script Cache, but phones with a reasonable amount of memory will rarely have to eject entries. |
Script Name Multiplexer | A table which holds definitions for named scripts (those defined at runtime). |
Name | Unlike static scripts, runtime scripts are identified by name alone. Stored in hash table, no two scripts may have the same name (see below). |
Definitions | A stack of script cache entries corresponding to each definition, stored by reference. If, say, the script "exp" is declared twice, the second entry takes precedence until it is undeclared. Calling a static script by ID can get around this, but that is a discouraged usage of script IDs. |
Script Cache | A table which contains large amounts of recycleable data, to allow our scripts to run on machines with very little memory. Also contains a few flags. |
New Name | Although probably un-necessary, we store the name of the script (or the empty string, for static scripts) to help with debugging. |
ID | Stores the ID of a static script, or -1 if this is a runtime script. |
Pinned | A flag: if true, this script cannot be removed from memory without throwing the HVM into an invalid state. An example of this would be a runtime-declared scrit. Theoretically, a static script that is constantly being reloaded from memory or a script that contains a massive Do Table might be manually pinned by the HVM. |
In Use | Set to zero initially; incremented by 1 each time a script references this entry. Decremented by 1 each time a script terminates. An entry with an in-use count of 0 is only removed from the Script Cache if more space is needed, since the Do Table is also cached. |
Bytecode | An array for the script's Format B bytecode. This is static data --after being defined once, it can only be re-loaded or re-defined, never modified. |
Do Table | A cache for this script's Do UDTs. When a Script UDT pops its do/it stack, the value is saved here; if the same loop or if statement is encountered again later, a considerable amount of scanning can be skipped. |
What These Do: Bottom-Up Description
If you've ever seen an apple machine like the one just mentioned, you're probably aware that describing each element does absolutely nothing to help you understand how the thing works --you have to actually use it on an apple (preferably very slowly) to see how the magic is done. This is a bottom-up description: describe how certain parts interact at critical moments in the operation of the whole.
We will now list some critical operations performed by the HVM and how it uses the data structures to track these operations. Before reading this section, you should probably read the syntax dictionary, so that you have an idea of Henceforth's intended functionality. Otherwise, this section might not make a lot of sense.
Functional Primitives | |
Trigger
dup
|
Response
|
Numbers | |
Trigger
The HVM encounters a "short integer" or a "full-width integer" |
Response
Push that number to the "local stack" of the currently-running script. |
Script-Local Subroutines | |
Trigger
The HVM encounters a "define local script" with id x. |
Response
|
Trigger
The HVM encounters a "call local script" with id x. |
Response
|
Trigger
The HVM encounters an "end define" and the "return address stack" is not empty. |
Response
|
Global Variables | |
Trigger
The HVM encounters a "retrieve global variable" with id x. |
Response
|
Trigger
The HVM encounters a "set global variable" with id x. |
Response
|
Call Subroutines | |
Trigger
The HVM, while running a script with id y in the Running Scripts Table, encounters a "call user-defined function by id" with id x. |
Response
|
Trigger
The HVM, while running a script with id y in the Running Scripts Table, encounters a "call named subroutine" with name exp. |
Response
|
Trigger
The HVM encounters a "hamsterspeak api call" with id x. |
Response
|
Local Variables | |
Trigger
The HVM encounters a "push local variable by id" with id x. |
Response
|
Trigger
The HVM encounters a "pop local variable by id" with id x. |
Response
|
Trigger
The HVM encounters a "push local variable by name" with name test. |
Response
|
Trigger
The HVM encounters a "pop local variable by name" with name test. |
Response
|
If Statements | |
Trigger
The HVM encounters an "if_start" |
Response
|
Trigger
The HVM encounters an "if_end" |
Response
|
Trigger
The HVM encounters an "else_start" (This will only happen if the else branch is not taken) |
Response
|
Loops & Flow Control | |
Trigger
The HVM encounters a "do_start" |
Response
|
Trigger
The HVM encounters a "do_end" |
Response
|
Trigger
The HVM encounters a "break" or "break_x" |
Response
|
Trigger
The HVM encounters a "continue" or "continue_x" |
Response
|
Subroutines & Strings | |
Trigger
The HVM encounters a "string define" with string test string. |
Response
|
Trigger
The HVM encounters a "begin define" with string my_function. |
Response
|
Trigger
The HVM encounters an "un-define named subroutine" with string my_function. |
Response
|
Trigger
The HVM encounters an "end_define", or a "return" command in a user-defined script. |
Response
|
Some Technical Considerations
When a script is triggered by any means, it suspends the currently-executing script; see "The Stack" in "How does plotscripting work?". This makes sense; if your script calls another script, one expects the new script to take over. It also means that if you step on several map tiles which each have (long) scripts, the last maptile's script will finish first. (Threading will probably change this.)
But wait, there's more! Several small but important details of script initialization include:
- If you trigger the same script twice (e.g., by stepping on two copies of an NPC one after the other) the script loader will detect that you are trying to load a script with the same id as the currently executing script. If the "Permit double-triggering of scripts" general bitset is on (or if the script is explicitly called by another script) then this is allowed, and a second instance of the script is initialized.
- When a script instance is first loaded, its "delay" counter is set to 2 ticks. This is not the same thing as instructing the script to "wait", because it remains the active script while delaying. If another script pre-empts this one, the delay counter does not decrement until the new script finishes. (When the new script finishes, the old one immediately resumes what it was doing, be it running or decrementing the "delay" counter.)
- To prepare for parallel scripts, the active script is pushed to the top of the "waiting" stack, and is manipulated by a reference ID. This way, all scripts are kept in one place, and GAME_FMF can just scan through this list once each cycle, updating all scripts until they enter a waiting state.
HF Source Code & Formats
Strictly speaking, Henceforth has three different formats for its source code, and there is no such thing as "compiled Henceforth". These three formats are Format T (text-based), Format B (bytecode), and Format HF (compressed). Practically, most user code is generated in Format T, which is then converted to Format B and compressed (to Format HF) for storage. The HVM reads all three formats, but in doing so it converts T & HF to B.
That said, we shall begin our study of Henceforth syntax by exhaustively reviewing all Henceforth commands in "source code" format (Format T), and then describe the mappings to other formats.
Henceforth source is a rather slim language. Numbers and strings are supported, but concatenations like "Hello"+" world" and decimals like 2.4 are not. A syntax exists for "do" and "if" statements, and for declaring functions and variables. Several smaller hacks exist for efficiency's sake, as described earlier. And that's it. Anything else is either a "primitive" like add, or a "block label", subject to the polymorphic lookup that characterizes Forth-like languages. Henceforth is not based explicitly on Classic Forth, but it hopes to learn from the mistakes of this peculiar family of languages. One of the main disadvantages to ANSI Forth is its pandering to C++ fixations, overcomplicating a language which, at its heart, always championed simplicity and elegance.
Henceforth Syntax Dictionary
Numbers
Numbers are represented by signed integers such as 2, 18, -35, or 43. When the HVM encounters a number, it pushes it to the stack. That's all for numbers, so let's cover stack effect diagrams briefly; they're necessary to understand more complicated features. Consider:
1 2 3 | 7 | 1 2 3 7 |
Note: Stack-effect diagrams are oriented with the top of the stack on the right. This means that the stack starts with the elements 1, 2, and 3 already in it, is then given the command "7", and the resulting stack is now 1, 2, 3, then 7. These is an example of the prototype style of learning; to teach you how to use a command, I give you a real-world example of how it works.
If I want to write a stack effect diagram without the fancy table, I might use parentheses and the "--" symbol to mean "the operator we're talking about". For example: "if the interpreter encounters -5, the resulting stack-effect diagram would be (1 2 3 -- 1 2 3 -5)".
And now we're way beyond numbers.
Primitives
When the HVM interprets a literal string like exp, mycode, or abs, it searches for the relevant subroutine. The primitives are exceptions to this; they are always executed immediately. A primitive consists of lowercase letters and underscores, and must contain at least one letter.
Here is a list of primitives, and their stack-effect diagrams:
Category | Primitive | Stack Effect Diagram | Comments |
Stack Manipulation | dup | ( 1 2 3 -- 1 2 3 3 ) | "Duplicate" the top of the stack. |
swap | ( 1 2 3 -- 1 3 2 ) | "Swap" the top two elements of the stack. | |
drop | ( 1 2 3 -- 1 2 ) | "Drop" the top element off the stack. | |
over | ( 1 2 3 -- 1 2 3 2 ) | Pull the penultimate element "over" the top element. | |
rot | ( 1 2 3 4 -- 1 4 3 2 ) | "Rotate" the top three elements of the stack. | |
Arithmetic | add | ( 1 2 3 -- 1 5 ) | Addition |
sub | ( 7 3 2 -- 7 1 ) | Subtraction | |
mult | ( 1 2 3 -- 1 6 ) | Multiplication | |
div | ( 1 7 3 -- 1 2 ) | Division | |
random | ( 1 4 8 -- 1 6 ) | Random Number | |
Bitwise Operators | b_xor | ( 1 52 37 -- 1 17 ) | Exclusive Bitwise "Or" |
b_and | ( 1 52 37 -- 1 36 ) | Bitwise "And" | |
Logical Operators | eq | ( 1 2 2 -- 1 1 ) | Logical "Equals" |
lt | ( 1 5 2 -- 1 0 ) | Logical "Less Than" | |
not | ( 1 5 1 -- 1 5 0 ) | Logical "Not" | |
and | ( 1 5 1 1 -- 1 5 1 ) | Logical "And" | |
xor | ( 1 5 1 1 -- 1 5 0 ) | Exclusive Logical "Or" |
There is no boolean type; the number zero is considered "false", and any other number is considered "true". For clarity, the number "1" is the preferred way to represent "true".
A number of additional keywords such as break and continue will be discussed in the section on If statements and Loops.
Blocks (Subroutines)
Labeled blocks, also called subroutines, are the core of Henceforth's modularization. They are strikingly similar to primitives, except that they can be defined, undefined, and overloaded. Labels contain letters and underscores, just like primitives.
Defining a Subroutine | |||||||||
Syntax \NAME{CODE} |
Example \mod{ swap dup rot dup rot swap div mult sub } |
Comments This creates a function called "mod", which performs a series of stack manipulations and basic arithmetic. |
Calling a Subroutine | |||||||||
Syntax NAME |
Example 7 3 mod |
Comments Traditionally, we push the "parameters" to the subroutine in increasing order, and expect it to remove these arguments (and replace them with a return value, if any) during the course of its operation. |
Overriding a Subroutine | |||||||||
Syntax \NAME{CODE} |
Example \mod{ swap dup rot dup rot swap div mult sub } \mod{ drop drop 0 } |
Comments After executing this snipped, "mod" will have two definitions. Calling "mod" will result in the latter being evoked, which simply clears the parameters and returns zero. |
Un-defining a Subroutine | |||||||||
Syntax /NAME |
Example /mod |
Comments If we execute the code from "Overriding a Subroutine", and then execute this code, "mod" will be left with a single definition: the first one. |
Any labeled block remains in effect even if the script which defined it finishes execution; make sure to un-define any subroutines you do not wish to linger. The HVM capitalizes on this behavior to limit the number of built-in primitives. The modulus operator described earlier is an example, as is "less than or equal to", which can be defined compositely. Here is a list of all the subroutines the HVM loads by default, with their stack-effect diagrams.
Category | Subroutine | Stack Effect Diagram | Comments |
Arithmetic | exp | ( 1 2 3 -- 1 8 ) | Compute x raised to the y ("exponent") |
mod | ( 1 10 4 -- 1 2 ) | Calculate the "modulus" | |
Bitwise Operators | b_or | ( 1 45 52 -- 1 61 ) | Inclusive Bitwise "Or" |
Logical Operators | neq | ( 1 45 52 -- 1 1 ) | Logical "Not Equal To" |
lte | ( 1 45 45 -- 1 1 ) | Logical "Less Than or Equal To" | |
gt | ( 1 45 52 -- 1 0 ) | Logical "Greater Than" | |
gte | ( 1 63 45 -- 1 1 ) | Logical "Greater Than or Equal To" | |
or | ( 1 0 1 -- 1 1 ) | Logical "Inclusive Or" |
Variables
Shuffling the stack beyond the fourth element can get messy. As such, subroutines can allocate local variables which stay in scope as long as that script is running. Note that variables follow a similar naming convention to subroutines. They are accessed using the @VAR and VAR@ commands. You can remember them easily by putting an imaginary "stack" next to them:
- (stack)@VAR takes a value from the stack and throws it at VAR. This is the same as setting VAR = pop().
- VAR@(stack) takes the value in VAR and throws it at the stack. This is the same as pushing VAR.
Here is some sample source for the lte function, written using variables. Note that, in reality, lte only uses the stack. This is much more efficient for the HVM, which only initializes the local variable store if it detects your script trying to access it.
Using Variables | |||||||||
Code \lte{ #Pop the stack twice to get our local variables. #We assume that we are testing @lhs <= @rhs @rhs @lhs #Push our variables, do the first test (<) and # store it in @res1 lhs@ rhs@ lt @res1 #Push our variables, do the second test (==) and # store it in @res2 lhs@ rhs@ eq @res2 #Push our two tests and do an or test. Leave it on # the stack to be returned res1@ res2@ or } |
Comments As an exercise, you should try to convert this to a stack-based alternative. It should end up more concise, and also faster. |
Do/If & Loops
Most Forth-like languages have an "if" statement, and Henceforth is no exception. However, rather than encapsulating loops into labeled blocks (like Classic Forth), it adds another keyword: "do". This serves two purposes: first, it allows for better inlining than subroutines, and second, it provides a locus of control for jump commands. The second rationale is sometimes written as "otherwise, we'd have to use goto".
Inside of a do loop, the commands break and continue also take effect; you should read the HSZ documentation for an example of how these work conceptually. The mechanism for break and continue is different in the HVM, but the effect is the same.
The standard Hamsterspeak engine also allows breaking out of multiple loops at once. In order to keep the syntax clean in Henceforth, we introduce two bytecodes for that purpose: break_x and continue_x. These take the element from the top of the stack, and break out of that many loops. The break command is effectively equivalent to "1 break_x".
Thus, we add five more keywords to the list of Henceforth primitives: do, if, else, break, and continue. Brackets are used to denote code blocks where relevant. We will cover the syntax in simple examples. The first is a user-defined subroutine, "divide_safe", which avoids dividing by zero, and the second is an example of the Fibonacci sequence using "do" to achieve the effect of a "while" loop.
Save Divide-by-Zero Handling | |||||||||
Code \divide_safe{ #Do a zero-check on the divisor dup 0 eq #Now, branch on that if{ pop pop 0 }else{ div } } |
Comments Note that returning zero is just one way of handling the erroneous case. Also, take note of the two "pop" commands; these are required! |
Fibonacci Numbers | |||||||||
Code \fibonacci{ @count #Must be >=1 0 @fprev #fib(n-1) 1 @fnow #fib(n) do { #Test-and-decrement count@ 0 eq if { break }else{ count@ -1 add @count } #Iterate & Accumulate fprev@ fnow@ add fnow@ @fprev @fnow #Re-do the loop continue } fnow@ } |
Comments This subroutine fails given zero as input. Modify it by adding a simple if statement before the main loop. |
This second example is far more interesting; it answers the question "How do I Loop in Henceforth?"
A Note About Recursion: Although Henceforth fully supports recursion, the HVM is not designed for heavily-recursive functions. When possible, please use an iterative solution.
HSpeak API Calls & User Scripts
The HVM has knowledge of most "built-in" functions that Hamsterspeak provides. These are called like any other subroutine, with one exception: most format converters create specialized subroutine calls out of them. In other words, if I write a call to wait_npc in my (Format T) code, the converter will create (Format B) source that points explicitly to the definition of wait_npc in the HVM; it is not possible to overload wait_npc. This breaks our promise of fully-interchangeable source formats, but it is hardly noticeable to most users, and keeps our global function lookup table from becoming too cluttered.
See FMF:Plotdict for a listing of all API calls and their syntax in Henceforth.
User-defined scripts are treated exactly like API calls; they are assigned IDs and called explicitly.
Strings
Strings in Classic Forth are handled by simply pushing a series of integers (for each letter) to the stack. Eventually, Henceforth may go this route, but for now the API calls to strings are so simple that we chose a much simpler implementation:
- Strings are a series of letters between double-quotes, like "Hello World", "Going?", or "Unicode: \u1000". As hinted by the last example, the HVM supports Unicode.
- When a string is encountered (usually at the top of a script) it is pushed to the String Table for that script. Strings are pushed in order, cannot be removed or easily manipulated, and have IDs in increasing order.
This is an ad hoc implementation of strings, until we get a clearer picture where the standard OHR's development is going with these ideas.
Using a String | |||||||||
Code "Hello world!" 0 show_str |
Comments Defining "Hello world!" will place this string at index zero; pushing 0 and then calling the API function show_str will reference that string explicitly. |
Script-Local Subroutines
Here's where Henceforth starts to really get picky. When a Hamsterspeak script is converted to Henceforth, it is basically converted into a series of labeled blocks that call each other. Recalling that user-level "scripts" are effectively subroutines, the keen observer will note that a lot of resources are wasted supporting "subroutines" that are, actually, just glorified "goto" blocks. For example, a definition such as "\plus_two{ 2 add }" doesn't need its own stack space --it's just a snippet of code!
With that in mind, subroutines are allowed to define "script-local" subroutines which operate along a single thread of control within a named block. These are defined with a given ID, and called with an array-like syntax. They cannot be un-defined or overloaded, although theoretically they could be replaced (by re-defining them).
Defining a Script-Local Subroutine | |||||||||
Syntax \[ID]{CODE} |
Example \[5]{ 2 plus div mult sub } |
Comments Our "plus two" example from before, assigned to index 5. |
Calling a Script-Local Subroutine | |||||||||
Syntax [ID] |
Example 16 [5] |
Comments This pushes 16, and calls local block 5. The result is a stack containing the number 18, if we define [5] as listed above. |
Local Parameters & Return Values
An astute reader will have noticed that scripts currently have no way of passing values to other scripts. Calling "5 6 my_func" won't work, because my_func will get its own independent stack. This problem is circumvented through the use of local parameters, which also provide a slight speedup for small, frequently called subroutines. The local parameters are accessed by ID, with -1 a special ID that refers to the script's return value.
- Executing @[5] will "pop the (parent) stack to local parameter 5"
- Executing [5]@ will "push local parameter 5 onto the (current) stack
For the return value, it's backwards:
- Executing @[-1] will "pop the (current) stack to the return value"
- Executing [-1]@ will "push the return value onto the (parent) stack", unless the return value is set to invalid (INT_MIN, for now).
The only point of interest is that last bit. The problem is, the HSpeak command "return" sets the return value, and "exitscript" will exit and return that value. Some scripts simply aren't intended to use their return values, which would leave an extra element on the stack. We could always just pop the return value, but we don't want to punish a highly modularized script with an un-necessary drag on performance. This is the alternative we came up with.
Comments
Henceforth supports comments, although it's important to realize that Formats B and HF will drop your comments; they can't be encoded. A comment begins with a hash mark (#) and ends with a line feed.
Global Variables
All that's left are global variables, which are manipulated with syntax like @[65.G] and [65.G]@. We realize this is a bit awkward, and we're open to a cleaner syntax (once we actually get a release going).
The Three Formats of Henceforth
As mentioned, Henceforth source can appear in three different formats: T, B, and HF. Here's a side-by-side comparison of the three.
Type | Use | |
Format T | text | All human interaction (coding, discussing ideas, etc.) occurs in this format. |
Format B | bytecode (binary) | The HVM executes Format B code. The HSP2HF cross-compiler outputs this format. |
Format HF | compressed | The XRPG specification requires all Format B code to be compressed into format HF lumps, which are then stored locally on the user's phone. |
HFB
The Cross-Compiler
Fortunately, most of you will never have to touch Henceforth if you don't want to. (But who wouldn't want to?) The HSP2HF cross-compiler is used to automatically convert your HamsterSpeak into Henceforth bytecode. It is part of the RPG2XRPG conversion process.
How the Cross-Compiler Works: Motivations
Some parts of the OHRRPGCE FMF project are slightly incompatible with the standard OHRRPGCE, but for scripting this is simply unacceptable. Unlike things like, say, a slightly jittery enemy HP value -which is immediately apparent when you first load your game on a phone- a silent error in a script is worth hours of headache-inducing debugging, and probably not worth anything at all in the end.
So, the goal of the cross-compiler is script-level compatibility. Efficiency and conciseness, while important, take a backseat to this driving need.
How the Cross-Compiler Works: Naive Compiler
The Henceforth cross-compiler benefits from HamsterSpeak's tree-like structure; a naive conversion can simply convert each node to a local Henceforth function. Consider the following script (from Wandering Hamster, loaded in the HSP2HF utility)
Typical to most HSZ scripts, "setnpcspeed" contains a head "do" node. This node happens to contain only one element, which takes three arguments, each of which is a simple number or variable. Clicking "cross-compile" will invoke the naive converter, which produces the following code:
\LOC14{ 1 L[]@ } \LOC12{ 3 } \LOC10{ 0 L[]@ } \LOC4{ \LOC10() \LOC12() \LOC14() \BLT78() } #Init local variables @L[1] @L[0] #Main script loop do_start \LOC4() do_end
Let's start from the local variables section:
@L[1]
This is a shorthand syntax; it basically calls "local store", a function deep within the script interpreter which does something like this:
void localStore(int arg) { local_variables[arg] = pop_parent(); }
Next, we have the main loop. The "do_start" and "do_end" primitives are there to help the "break" and "continue" primitives to function properly. The meat of the main loop is the call to:
\LOC4()
...which is simply a function call. We use the back-slash to emphasize that LOC4() is stored locally, but this syntax will likely be revised, since HF treats functions, primitives, and built-ins nearly the same way.
The {} syntax to define functions will likely remain, so the following code defines LOC4()
\LOC4{ \LOC10() \LOC12() \LOC14() \BLT78() }
...as three local calls(LOC10, LOC12, and LOC14) and one built-in function call (BLT78, alterNPC). The remaining local functions are equally easy to understand. For example, "1 L[]@" pushes 1, and then calls "local load", defined internally as:
void localLoad() { int arg1 = pop(); push(local_variables[arg1]); }
How the Cross-Compiler Works: Reasonable Inlining
Simple functions like "setnpcspeed" are very easy to inline, just by copying the leaf nodes' source into their parents. The previous script can be re-written as:
#Init local variables @[1] @[0] #Main script loop do_start 0 L[]@ 3 1 L[]@ \BLT78() do_end
...which is much more concise. Due to the nature of HF bytecode, inlining usually improves both performance and storage efficiency. (When I have time to profile, I hope to collect some facts to back this up.) However, inlining everything is often either impossible or unwise, which is why one needs a policy for inlining. At the moment, the OHRRPGCE FMF's cross-compiler uses the following algorithm to determine what to inline:
1) Start by doing a naive conversion, and retain the tree structure. Mark all nodes as "not dead" and with a "count" of 1. 2) Loop until all nodes are inlined or dead. At each iteration, do the following. 3) Determine which nodes are leaf nodes. (Leaf nodes have no children, or only dead children). a) If this node cannot be inlined (e.g., it's self-referential, or checks b/c/d/e below fail), mark it as "dead". b) If this node's count is "1", inline it. (Copy its corresponding source into any node that references it, incrementing their "count" value by 1, and then delete it.) c) If this node's count is "2", and it is referenced 8 times or less, inline it. d) If this node's count is "3", and it is referenced 4 times or less, inline it. e) If this node's count is "4" or more, only inline it if it is referenced exactly once.
We are still discussing what makes a node impossible to inline. Technically, the problem is difficult, but Hamsterspeak byte-code is fairly structured in nature, which means we can probably define a few simple criteria for exclusion.
Primitives & HSPEAK->HF Snippits
The cross-compiler inserts snippets of HF code when it encounters an HSPEAK node of a given type. For example, at node 10, given a "number" node with value 75, it inserts:
\loc_10 { 75 }
Henceforth, we shall refer to a node of ID n as: \loc_n {} --this allows us to generalize HSPEAK nodes into simple templates. Here are the snippets used by the cross-compiler; we repeat numbers for the sake of completeness:
Numbers | |||
HSpeak Parameters | Henceforth Snippet | ||
Kind | ID | args[] | |
number | value | \loc_n { value } |
Do Loops | |||
HSpeak Parameters | Henceforth Snippet | ||
Kind | ID | args[] | |
flow | do | node_x node_y ... |
\loc_n { do{ loc_x loc_y ... } } |
If Statements | |||
HSpeak Parameters | Henceforth Snippet | ||
Kind | ID | args[] | |
flow | if | conditional_x then_y else_z |
\loc_n{ loc_x if { loc_y } else { loc_z } } |
Then/Else Loops | |||
HSpeak Parameters | Henceforth Snippet | ||
Kind | ID | args[] | |
flow | then/else | node_x node_y ... |
\loc_n { loc_x loc_y ... } |
Break/Continue | |||
HSpeak Parameters | Henceforth Snippet | ||
Kind | ID | args[] | |
flow | command | amount if amount==1 then: skip amount else: append "_x" to command |
\loc_n { [amount] command } |
Returning | |||
HSpeak Parameters | Henceforth Snippet | ||
Kind | ID | args[] | |
flow | return | value | \loc_n { value @[-1] } |
flow | exitscript At a given depth |
\loc_n { invalid @[-1] depth break_x } | |
flow | exitscript At a given depth |
value | \loc_n { value @[-1] depth break_x } |
While/For | |||
HSpeak Parameters | Henceforth Snippet | ||
Kind | ID | args[] | |
flow | while | conditional_x do_y |
\loc_n { do { loc_x not if { break } inline_y{ y_command_1 y_command_2 ... y_command_z } } } |
flow | for | count_id_x counter_start_s count_end_e counter_step_w do_y |
#Template: -2 4 set_var will set local variable 1 to value 4 \set_var { swap dup 0 lt if { 1 add -1 mult @[] } else { @[.G] } } #Template: 2 get_var will return the contents of global variable 2 \get_var { dup 0 lt if { 1 add -1 mult []@ } else { [.G]@ } } \loc_n { loc_x loc_s set_var do { loc_w 0 gt loc_x get_var loc_e gt xor not loc_x get_var loc_e neq and if { break } inline_y{ y_command_1 y_command_2 ... y_command_z } loc_x get_var loc_w add } } |
Note 1: The block inline_y simply unrolls the do_y block into the body of loc_n. This is done so that break and continue will function properly. Note 2: The upshot of this is that do_y will be instantly culled from the source, unless another node references it (which would be a bit of a hack, in my opinion. Regardless, this is fine, and will not affect program validity in any way. |
Switch | |||
HSpeak Parameters | Henceforth Snippet | ||
Kind | ID | args[] | |
flow | switch | ??? | This is not yet documented in HSZ, so we will deal with it later. |
Variable Access | |||
HSpeak Parameters | Henceforth Snippet | ||
Kind | ID | args[] | |
global | variable_x | \loc_n { loc_x [.G]@ } | |
local | variable_x | \loc_n { loc_x []@ } |
Math Functions | |||
HSpeak Parameters | Henceforth Snippet | ||
Kind | ID | args[] | |
math | set_variable | lhs_l rhs_r |
\loc_n { loc_l loc_r set_var # As defined above } |
math | increment_variable | lhs_l rhs_r |
\loc_n { loc_l get_var # As defined above loc_r add set_var # As defined above } |
math | decrement_variable | lhs_l rhs_r |
\loc_n { loc_l get_var # As defined above loc_r sub set_var # As defined above } |
math | not | lhs_l | \loc_n { loc_l not } |
math | operand If the operand is listed above, use that code block, not this one. |
lhs_l rhs_r |
\loc_n { loc_l loc_r operand } |
Built-In and User-Defined Functions | |||
HSpeak Parameters | Henceforth Snippet | ||
Kind | ID | args[] | |
built-in | func_id_x | \loc_n { hspeak_api_call_x } | |
user-script | func_id_x | \loc_n { user_script_call_x } |
Final Considerations
(later)