Revision as of 09:21, 15 December 2008

This article is about the OHRRPGCE FMF project, which is an alternate implementation of the OHRRPGCE for Java mobile phones. Technical implementation details discussed here should not be confused with those of the RPG format

The Henceforth Virtual Machine (HVM, Henceforth VM) is used to dynamically interpret Henceforth (HF, Hamsterforth) scripts. The OHRRPGCE FMF does not use Hamsterspeak's traditional HSX/HSZ files; rather, it opts for a stack-based alternative. Henceforth files are stored with the .HF extension.

Warning: This document is complete, except for the "final considerations" section, which will be trivial to add later.
We are currently in the process of reviewing the entire design specification, writing a toy interpreter, and polishing up (and segmenting) this document.
We'll gladly take your suggestions on the "Discussion" page, but please do not modify, distribute, or trust this document. It will soon be as open to the community as everything else on this Wiki, but it'll take a few weeks to do the final review.

Thanks~
/Seth

Acknowledgements

A big thanks to both James and Ralph for their well-founded suggestions regarding the HVM, and to Mike for clarifying countless issues. If you start reading into the minute details of the HVM, you'll find that a whole lot of it is lean, clean, and sensible. This is almost entirely due to James & Ralph's suggestions and experience, which I truly appreciate.

Introduction to Henceforth

Henceforth is a simple, stack-based programming language which the OHRRPGCE FMF uses as a crossover language from traditional Hamsterspeak. Like all of the lumps exclusive to this project, .HF lumps containing Henceforth code are cross-compiled automatically for you when you convert your game to XRPG Format. However, all developers should consider learning basic Henceforth syntax to aid them in writing run-time debug scripts and compile-time optimized versions of their most time-critical Hamsterspeak scripts.

Henceforth is a stack-based language, designed to obviate register allocation problems and minimize script size. You should read James's excellent analogy of a stack, if you aren't familiar with the concept. Hamsterspeak uses the stack for ordering scripts; Henceforth, however, uses the stack for storing data.

A Henceforth Primer

Although Henceforth is not a compiled language, you can pretend that "source" Henceforth is compiled into "binary" HFZ lumps, which the game engine runs, and only one person will cringe and correct you. So let's have a look at some Henceforth source code, shall we? Any text after a hash mark until the next newline is considered a comment, and whitespace is used to separate commands. Consider the following script.

   #One way to separate commands is by hitting "Enter" after each command.
   4        #Push 4
   5        #Push 5
   add      #Pop the top two elements and add them. Push the result.
   @result  #Pop the top element and store it in the variable "result".
   
   #Another way to separate commands is using spaces.
   4   5   add  @result

Most programmers will use a mix of spaces and newlines to keep their scripts clear and sensible. For example, "4 5 add" should probably always be on the same line. Regardless of what format you use, manipulation of the stack is the same. The following diagram shows how the stack (and the variables list) changes as the HVM reads each command, from left to right.

The "Pure" Virtual Machine

From a theoretical point of view, Henceforth does not require a location to store variables. Rather, we could simply add block labels and a delayed execution (prepending) command:

Surrounding some source with \name{} will label that block with "name"; if name later appears as a command, this block will be inserted in-place. Later writing /name will un-define "name". If multiple blocks are labeled similarly, the most-recent label is considered valid.
Pre-pending "&" to a command will push the result of that command into the source code itself, at the time that command is first encountered.

Block labels allow functions to be written very naturally:

   #The "modulo" function expects two arguments to be
   # on the stack; it transforms (X  Y)  to (X%Y)
   \modulo {
    swap  dup  rot  dup  rot  swap     #Now we have (X Y X Y)
    div  mult  sub                     #The heart of our algorithm
   }

There's no need to "return" anything; the labeled block operates in a way that leaves one item on the stack. Here's some code to use this block as a function:

   #We want to compute 7%3
   7  3  modulo    #After this, "1" is left on the stack

This is almost the exact syntax for writing a function in Henceforth. Now, let's cover prepending:

   #Avoiding variables with named blocks.
   4   5   add  
   \result{       #Label the start of the "result" block
     &pop         #This is tricky: the first time we encounter this
                  # (i.e., when "\result" is defined), it executes "pop",
                  # resulting in "9", which is pushed into the source.
   }              #Closes the "\result" label.
   
   result         #Calling "result" will now push "9".

Unlike named blocks, prepending is not actually implemented in Henceforth; you can ignore that last bit of code if you don't understand it. Which brings us to our main point:

Henceforth was designed around a "pure" VM, but it was then modified to run efficiently on a phone, and to allow an easy mapping from existing Hamsterspeak scripts.

It is crucial that you keep the ideal virtual machine in mind, or else you will start asking about "Henceforth's registers" (which don't exist) or you'll start using variables way too much. You also need to be cognizant of the dirty details required to kludge Hamsterspeak onto a phone; otherwise, things like script-level subroutines and our extravagant caching mechanism will seem redundant and senseless.

(Note to readers: At this point, if you're only interested in the Henceforth language, you can skip down to the section on Henceforth Syntax.)

Where the HVM Fits Into the Game Loop

Let's pretend that on a certain map in your game, a certain NPC is set up to "run script 4" when the user walks up to it and presses "Accept". Upon pressing "Accept", the game engine checks if this NPC has a script ID assigned to its "accept" behavior. The classic OHR performs this same task. If one exists, it is initialized and cached. At this point, the game engine continues with its tasks for that cycle (e.g., message boxes, vehicles). This (including user input) constitutes the "Update" phase of the game loop. Following this is the "Paint" phase, which shows one frame's worth of updates to the user. After painting, the tick count is incremented, and the loop repeats. The first thing that happens then is the "scripts" phase, during which scripts are actually executed (see below). The loop repeats indefinitely.

Virtual Hardware Specification

The Wikipedia page for Virtual Machines focuses on their use in emulation. In fact, there is no hardware version of the HVM, just like there is no hardware equivalent of the Java Virtual Machine. Rather, Henceforth defines an unambiguous virtual hardware specification which can be ported to any platform with guaranteed interoperability. Currently, the only platform with a working HVM is the OHRRPGCEFMF, due to the bias of the HVM's design towards mobile devices.

Anything in the world can be described by its nouns and verbs --the things it has and what they do. So let's inspect the pieces of the HVM and how they interact.

VM Data Structures

Green and red lines in this diagram indicate references from one item to another, although they are not drawn for every single reference (because that would be craaaaaaazy).

What These Do: Top-Down Description

One way of learning how these data structures work is by discussing what each piece does on its own, and then assuming their combination. This is like studying one of those apple machines; we say that this piece pierces the apple and holds it, that piece slices the skin off, and that blade cores the apple.

Here's the lowdown on each data structure and its elements:

Global Variable Table	Contains global variables indexed by ID.
ID	ID of the variable (implicit).
Value	Value of the variable (signed integer).

Running Scripts Table	Contains a script UDT and related data for any script which is held in the HVM (running or waiting).
Script	The script UDT for a running script.
Wait	True if the script is not ready to continue execution; false otherwise.
Wait On	A series of bitflags signifying what this script is waiting on, concatenated to the number of cycles left to wait, for wait_cycles. The flags are: wait_cycles wait_hero wait_npcs wait_anykey wait_pan wait_textbox Note that, in some cases, these flags can all be zero but the script may still be waiting on something.

Script UDT	Wraps several fields necessary for a running script.
Local Stack	With the exception of the "push/pop local" commands, all stack operations will take place on the local stack, allowing for easy multitasking. Conceptually "on top" of the local variable array.
Program Counter (PC)	Current index into the "Bytecode" array (see: "Script Cache Pointer") of the bytecode being executed. At the end of every cycle, the PC is incremented by 1.
ID	Index of this script in the "Script Lookup Table".
Script Cache Pointer	A reference to this script's cached data in the "Script Cache" table. If this is null (or -1), this script UDT's cached data has somehow been lost, and must be re-loaded.
Parent Script Pointer	A reference to the script which called this script. When this script terminates, it will "wake" the parent script.
Return Address Stack	When a script calls a local subroutine, it pushes the current PC to this stack, and pops it upon returning.
Local Variable Array	Stores the first seven parameters and the return value for a given script. Push and pop operations on these affect the stack of the calling script. Indexing this array at -1 will push an effective "return value", to use standard terminology. Could theoretically be used to clobber the stack (but please don't).
Local Variable Hash	Local variables after seven are stored here, and the same push/pop behavior applies. Named variables such as "x" or "test" are also stored here, through the use of a hash table. Although stack-minded gurus detest local stores, they are allowed into Henceforth for the purpose of clarity when debugging a script at run-time.
Do/It Stack	A stack of do UDT entries. One is pushed each time a dostart or ifstart imperative is encountered, and popped when the corresponding doend or ifend is found.
Local Function Start Indices	Each script can define local subroutines, which are usually numbered starting from zero, increasing by one monotonically. For example, "\[2] { dup }" defines a local subroutine with ID 2 that serves to duplicate the top of the stack. The interpreter caches the start indices of these script-local subroutines in a small table for fast lookup.
String Table	Contains a list of Unicode-encoded strings, ordered by ID, for use by this script.

Do UDT	A pair of integers used to manage do loops and if statement.
Start Index	The value of the PC when a do or if statement is encountered will be unique within a script.
Else Index	Contains the index of "else_start" in an if block. (See below). When an entry is pulled from the cache, an end/else index of -1 implies that scanning must occur. -2 indicates that this is a "do" block, not an "if block"; any other number is a valid index in the bytecode array.
End Index	Given a do loop's start index, the break command will jump to the next "doend" imperative (taking care to discard any nested "doend"s). Since bytecode is never modified at run-time, this scanning only needs to happen once; the PC index of the "doend" statement can be cached for later. If statements follow the same rule. When an entry is pulled from the cache, an end/else index of -1 implies that scanning must occur; any other number is a valid index in the bytecode array.

Script Lookup Table	A full list of all static scripts contains additional data to help retrieve script source from the cache or from the file system.
ID	This field is implicit; all scripts from ID 0 to ID N appear in order in this table. If auto-numbered scripts are used, the Henceforth cross-compiler will renumber them so that there are no gaps in this table.
Name	The name of this script is stored as a string. The Henceforth cross-compiler will always call a static script by its ID, but developers may choose to call a script by name when debugging.
Chunk	Scripts are stored compressed, but compressing a 10-byte file saves you nothing. Hence, there are several scripts "chunked" together in a file. If this value is "5" for example, then the lump "SCRIPTS_5.HF" should be loaded. Please see the relevant documentation for information on how to extract a particular script from its chunk.
Cache	A reference to this script's entry in the Script Cache, null or -1 if no entry exists. If several copies of a script exist, they pull only one entry into the cache, and checking this value in the Script Lookup Table is faster than scanning each entry in the Running Scripts Table. Scanning must still occur when an entry is ejected from the Script Cache, but phones with a reasonable amount of memory will rarely have to eject entries.

Script Name Multiplexer	A table which holds definitions for named scripts (those defined at runtime).
Name	Unlike static scripts, runtime scripts are identified by name alone. Stored in hash table, no two scripts may have the same name (see below).
Definitions	A stack of script cache entries corresponding to each definition, stored by reference. If, say, the script "exp" is declared twice, the second entry takes precedence until it is undeclared. Calling a static script by ID can get around this, but that is a discouraged usage of script IDs.

Script Cache	A table which contains large amounts of recycleable data, to allow our scripts to run on machines with very little memory. Also contains a few flags.
New Name	Although probably un-necessary, we store the name of the script (or the empty string, for static scripts) to help with debugging.
ID	Stores the ID of a static script, or -1 if this is a runtime script.
Pinned	A flag: if true, this script cannot be removed from memory without throwing the HVM into an invalid state. An example of this would be a runtime-declared scrit. Theoretically, a static script that is constantly being reloaded from memory or a script that contains a massive Do Table might be manually pinned by the HVM.
In Use	Set to zero initially; incremented by 1 each time a script references this entry. Decremented by 1 each time a script terminates. An entry with an in-use count of 0 is only removed from the Script Cache if more space is needed, since the Do Table is also cached.
Bytecode	An array for the script's Format B bytecode. This is static data --after being defined once, it can only be re-loaded or re-defined, never modified.
Do Table	A cache for this script's Do UDTs. When a Script UDT pops its do/it stack, the value is saved here; if the same loop or if statement is encountered again later, a considerable amount of scanning can be skipped.

What These Do: Bottom-Up Description

If you've ever seen an apple machine like the one just mentioned, you're probably aware that describing each element does absolutely nothing to help you understand how the thing works --you have to actually use it on an apple (preferably very slowly) to see how the magic is done. This is a bottom-up description: describe how certain parts interact at critical moments in the operation of the whole.

We will now list some critical operations performed by the HVM and how it uses the data structures to track these operations. Before reading this section, you should probably read the syntax dictionary, so that you have an idea of Henceforth's intended functionality. Otherwise, this section might not make a lot of sense.

Functional Primitives
Trigger dup swap drop over rot add sub mult div random b_xor b_and eq lt not and xor	Response Pop a pre-defined number of arguments Perform the desired math/logic Push the returned value, if applicable

Numbers
Trigger The HVM encounters a "short integer" or a "full-width integer"	Response Push that number to the "local stack" of the currently-running script.

Script-Local Subroutines
Trigger The HVM encounters a "define local script" with id x.	Response In the "local function start indices" array, at index x, insert (PC+1) Advance the PC without executing commands until you reach an "end define". Advance the PC by one.
Trigger The HVM encounters a "call local script" with id x.	Response Push (PC+1) to the "return address stack" Set the PC to the "local function start indices" array value at index x.
Trigger The HVM encounters an "end define" and the "return address stack" is not empty.	Response Pop the "return address stack" and set the PC to this value.

Global Variables
Trigger The HVM encounters a "retrieve global variable" with id x.	Response Push the value of that variable from the Global Variable Table.
Trigger The HVM encounters a "set global variable" with id x.	Response Pop the stack. Store the popped value at location x in the Global Variable Table.

Call Subroutines
Trigger The HVM, while running a script with id y in the Running Scripts Table, encounters a "call user-defined function by id" with id x.	Response Retrieve a new entry in the script cache Get entry x from the script lookup table; if "Cache" == -1, then load the source from the appropriate "Chunk" and add a new entry to the script_cache: New Name = null ID = x Pinned = false In Use = 0 Bytecode = loadFromChunk() Do Table = new Array[do_udt] For the appropriate script cache entry, set InUse = true Make a new Script UDT: Local Stack = new Stack[] Program Counter = 0 ID = x Script Cache Pointer = id of the script cache entry we retrieved Parent Script Pointer = y Return Address Stack = new Stack[] Local Variable Array = new Array[max+1] Set Local Variable Array to -1 at index 0 Local Variable Hash = new Hash{} Do/It Stack = new Stack[do_UDT] Local Function Start Indices = new Array[]* String Table = new Array[]* *Note: The HVM can leave "optional" components like this un-initialized until they are actually used. Make a new entry in the Running Scripts table: Script = our new Script UDT Wait = false Wait On = 0 Insert this new Running Scripts entry one step ahead of script Y; this ensures it will be run next. Set script Y's "Wait" flag to "true".
Trigger The HVM, while running a script with id y in the Running Scripts Table, encounters a "call named subroutine" with name exp.	Response The same as the previous call, with the following exceptions: Load from the script cache the last definition in the script name multiplexer for exp script_udt.id = -1
Trigger The HVM encounters a "hamsterspeak api call" with id x.	Response Pop the agreed-upon number of arguments Call API function x with said arguments Push the return variable, if applicable

Local Variables
Trigger The HVM encounters a "push local variable by id" with id x.	Response Pop the stack Store the popped value in script_udt.local_variable_array[x+1]
Trigger The HVM encounters a "pop local variable by id" with id x.	Response Retrieve the value in script_udt.local_variable_array[x+1] Push the retrieved value to the stack
Trigger The HVM encounters a "push local variable by name" with name test.	Response Pop the stack Create a key in script_udt.local_variable_hash for "test" Store the popped value in script_udt.local_variable_hash{"test"}
Trigger The HVM encounters a "pop local variable by name" with name test.	Response If the local variable hash has no key at "test", push -1 and return Retrieve the value in script_udt.local_variable_hash{"test"} Push the retrieved value to the stack

If Statements
Trigger The HVM encounters an "if_start"	Response Push a new do_udt with startIndex = PC to the do/it stack If the script_cache entry for this script contains a do_udt with the same startIndex, set the elseIndex of this do_udt to that value. Else, set it to -1. Do the same for the endIndex. Pop the stack. If this value is not 0, increment the PC. If 0, then set the PC to the elseIndex. (If the elseIndex is -1, scan & dispose bytecodes until the else_start command is found, and then update the script_cache.) Increment the PC.
Trigger The HVM encounters an "if_end"	Response Pop the do/if stack. If that entry's endIndex is -1, set endIndex = PC and update the script_cache.
Trigger The HVM encounters an "else_start" (This will only happen if the else branch is not taken)	Response Pop the do/if stack. If that entry's elseIndex is -1, set elseIndex = PC and update the script_cache. Set the PC = endIndex. (If endIndex is -1, scan & dispose bytecodes until the if_end command is found, and then update the script cache.) Increment the PC

Loops & Flow Control
Trigger The HVM encounters a "do_start"	Response Push a new do_udt with startIndex = PC to the do/it stack If the script_cache entry for this script contains a do_udt with the same startIndex, set the endIndex of this do_udt to that value. Else, set it to -1. Set elseIndex = -2. Increment the PC
Trigger The HVM encounters a "do_end"	Response Pop the do/if stack. If that entry's endIndex is -1, set endIndex = PC and update the script_cache.
Trigger The HVM encounters a "break" or "break_x"	Response Set x = 1 if this is a "break" command, or x = stack.pop() if this is "break_x" while x > 0, do the following: Pop the do/if stack until you encounter a do_udt with elseIndex == -2. Pop that as well. Set PC = endIndex of this do_udt. (If endIndex is -1, scan & discard bytecodes until a do_end is found. Then update the script_cache.) Increment the PC. Decrement x
Trigger The HVM encounters a "continue" or "continue_x"	Response Set x = 1 if this is a "continue" command, or x = stack.pop() if this is "continue_x" Pop the do/if stack until you have removed x do_udts with elseIndex == -2. Push the last entry back to the do/if stack. Set PC = do/if_stack.top().startIndex + 1

Subroutines & Strings
Trigger The HVM encounters a "string define" with string test string.	Response Push test string to the end of the string table
Trigger The HVM encounters a "begin define" with string my_function.	Response Look for my_function in the script name multiplexer. If it's not there, add it and set definitions = new Array[int] Create a new entry in the script_cache: New Name = "my_function" ID = -1 Pinned = true In Use = 0 Do Table = new Array[do_UDT] Scan all bytecodes up to and including an end_define into the "Bytecode" field for this new script cache entry. Push into script_name_multiplexer{"my_function"} the id of the script cache entry you just created.
Trigger The HVM encounters an "un-define named subroutine" with string my_function.	Response Pop the last definition from the script_name_multiplexer{"my_function"} into variable x. If script_name_multiplexer{"my_function"} has no more definitions, remove it. Set script_cache[x] to be a null row; remove it from the array if you can do so without introducing inconsistencies. At the very least, de-allocate all its data.
Trigger The HVM encounters an "end_define", or a "return" command in a user-defined script.	Response If there is a return value in script_udt.local_variable_array[0], push it to the parent script's stack. Set this script's parent's entry in the running scripts table to Wait = false Remove this script from the running scripts table. Decrement the in_use field for the script_cache entry for this script_udt Delete this script_udt

Some Technical Considerations

When a script is triggered by any means, it suspends the currently-executing script; see "The Stack" in "How does plotscripting work?". This makes sense; if your script calls another script, one expects the new script to take over. It also means that if you step on several map tiles which each have (long) scripts, the last maptile's script will finish first. (Threading will probably change this.)

But wait, there's more! Several small but important details of script initialization include:

If you trigger the same script twice (e.g., by stepping on two copies of an NPC one after the other) the script loader will detect that you are trying to load a script with the same id as the currently executing script. If the "Permit double-triggering of scripts" general bitset is on (or if the script is explicitly called by another script) then this is allowed, and a second instance of the script is initialized.
When a script instance is first loaded, its "delay" counter is set to 2 ticks. This is not the same thing as instructing the script to "wait", because it remains the active script while delaying. If another script pre-empts this one, the delay counter does not decrement until the new script finishes. (When the new script finishes, the old one immediately resumes what it was doing, be it running or decrementing the "delay" counter.)
To prepare for parallel scripts, the active script is pushed to the top of the "waiting" stack, and is manipulated by a reference ID. This way, all scripts are kept in one place, and GAME_FMF can just scan through this list once each cycle, updating all scripts until they enter a waiting state.

HF Source Code & Formats

Strictly speaking, Henceforth has three different formats for its source code, and there is no such thing as "compiled Henceforth". These three formats are Format T (text-based), Format B (bytecode), and Format HF (compressed). Practically, most user code is generated in Format T, which is then converted to Format B and compressed (to Format HF) for storage. The HVM reads all three formats, but in doing so it converts T & HF to B.

That said, we shall begin our study of Henceforth syntax by exhaustively reviewing all Henceforth commands in "source code" format (Format T), and then describe the mappings to other formats.

Henceforth source is a rather slim language. Numbers and strings are supported, but concatenations like "Hello"+" world" and decimals like 2.4 are not. A syntax exists for "do" and "if" statements, and for declaring functions and variables. Several smaller hacks exist for efficiency's sake, as described earlier. And that's it. Anything else is either a "primitive" like add, or a "block label", subject to the polymorphic lookup that characterizes Forth-like languages. Henceforth is not based explicitly on Classic Forth, but it hopes to learn from the mistakes of this peculiar family of languages. One of the main disadvantages to ANSI Forth is its pandering to C++ fixations, overcomplicating a language which, at its heart, always championed simplicity and elegance.

Henceforth Syntax Dictionary

Numbers

Numbers are represented by signed integers such as 2, 18, -35, or 43. When the HVM encounters a number, it pushes it to the stack. That's all for numbers, so let's cover stack effect diagrams briefly; they're necessary to understand more complicated features. Consider:

1 2 3 7 1 2 3 7

Note: Stack-effect diagrams are oriented with the top of the stack on the right. This means that the stack starts with the elements 1, 2, and 3 already in it, is then given the command "7", and the resulting stack is now 1, 2, 3, then 7. These is an example of the prototype style of learning; to teach you how to use a command, I give you a real-world example of how it works.

If I want to write a stack effect diagram without the fancy table, I might use parentheses and the "--" symbol to mean "the operator we're talking about". For example: "if the interpreter encounters -5, the resulting stack-effect diagram would be (1 2 3 -- 1 2 3 -5)".

And now we're way beyond numbers.

Primitives

When the HVM interprets a literal string like exp, mycode, or abs, it searches for the relevant subroutine. The primitives are exceptions to this; they are always executed immediately. A primitive consists of lowercase letters and underscores, and must contain at least one letter.

Here is a list of primitives, and their stack-effect diagrams:

Category	Primitive	Stack Effect Diagram	Comments
Stack Manipulation	`dup`	`( 1 2 3 -- 1 2 3 3 )`	"Duplicate" the top of the stack.
	`swap`	`( 1 2 3 -- 1 3 2 )`	"Swap" the top two elements of the stack.
	`drop`	`( 1 2 3 -- 1 2 )`	"Drop" the top element off the stack.
	`over`	`( 1 2 3 -- 1 2 3 2 )`	Pull the penultimate element "over" the top element.
	`rot`	`( 1 2 3 4 -- 1 4 3 2 )`	"Rotate" the top three elements of the stack.
Arithmetic	`add`	`( 1 2 3 -- 1 5 )`	Addition
	`sub`	`( 7 3 2 -- 7 1 )`	Subtraction
	`mult`	`( 1 2 3 -- 1 6 )`	Multiplication
	`div`	`( 1 7 3 -- 1 2 )`	Division
	`random`	`( 1 4 8 -- 1 6 )`	Random Number
Bitwise Operators	`b_xor`	`( 1 52 37 -- 1 17 )`	Exclusive Bitwise "Or"
Bitwise Operators	`b_and`	`( 1 52 37 -- 1 36 )`	Bitwise "And"
Logical Operators	`eq`	`( 1 2 2 -- 1 1 )`	Logical "Equals"
	`lt`	`( 1 5 2 -- 1 0 )`	Logical "Less Than"
	`not`	`( 1 5 1 -- 1 5 0 )`	Logical "Not"
	`and`	`( 1 5 1 1 -- 1 5 1 )`	Logical "And"
	`xor`	`( 1 5 1 1 -- 1 5 0 )`	Exclusive Logical "Or"

There is no boolean type; the number zero is considered "false", and any other number is considered "true". For clarity, the number "1" is the preferred way to represent "true".

A number of additional keywords such as break and continue will be discussed in the section on If statements and Loops.

Blocks (Subroutines)

Labeled blocks, also called subroutines, are the core of Henceforth's modularization. They are strikingly similar to primitives, except that they can be defined, undefined, and overloaded. Labels contain letters and underscores, just like primitives.

Defining a Subroutine
Syntax `\NAME{CODE}`	Example `\mod{ swap dup rot dup rot swap div mult sub }`	Comments This creates a function called "mod", which performs a series of stack manipulations and basic arithmetic.

Calling a Subroutine
Syntax `NAME`	Example `7 3 mod`	Comments Traditionally, we push the "parameters" to the subroutine in increasing order, and expect it to remove these arguments (and replace them with a return value, if any) during the course of its operation.

Overriding a Subroutine
Syntax `\NAME{CODE}`	Example `\mod{ swap dup rot dup rot swap div mult sub } \mod{ drop drop 0 }`	Comments After executing this snipped, "mod" will have two definitions. Calling "mod" will result in the latter being evoked, which simply clears the parameters and returns zero.

Un-defining a Subroutine
Syntax `/NAME`	Example `/mod`	Comments If we execute the code from "Overriding a Subroutine", and then execute this code, "mod" will be left with a single definition: the first one.

Any labeled block remains in effect even if the script which defined it finishes execution; make sure to un-define any subroutines you do not wish to linger. The HVM capitalizes on this behavior to limit the number of built-in primitives. The modulus operator described earlier is an example, as is "less than or equal to", which can be defined compositely. Here is a list of all the subroutines the HVM loads by default, with their stack-effect diagrams.

Category	Subroutine	Stack Effect Diagram	Comments
Arithmetic	`exp`	`( 1 2 3 -- 1 8 )`	Compute x raised to the y ("exponent")
Arithmetic	`mod`	`( 1 10 4 -- 1 2 )`	Calculate the "modulus"
Bitwise Operators	`b_or`	`( 1 45 52 -- 1 61 )`	Inclusive Bitwise "Or"
Logical Operators	`neq`	`( 1 45 52 -- 1 1 )`	Logical "Not Equal To"
	`lte`	`( 1 45 45 -- 1 1 )`	Logical "Less Than or Equal To"
	`gt`	`( 1 45 52 -- 1 0 )`	Logical "Greater Than"
	`gte`	`( 1 63 45 -- 1 1 )`	Logical "Greater Than or Equal To"
	`or`	`( 1 0 1 -- 1 1 )`	Logical "Inclusive Or"

Variables

Shuffling the stack beyond the fourth element can get messy. As such, subroutines can allocate local variables which stay in scope as long as that script is running. Note that variables follow a similar naming convention to subroutines. They are accessed using the @VAR and VAR@ commands. You can remember them easily by putting an imaginary "stack" next to them:

(stack)@VAR takes a value from the stack and throws it at VAR. This is the same as setting VAR = pop().
VAR@(stack) takes the value in VAR and throws it at the stack. This is the same as pushing VAR.

Here is some sample source for the lte function, written using variables. Note that, in reality, lte only uses the stack. This is much more efficient for the HVM, which only initializes the local variable store if it detects your script trying to access it.

Using Variables
Code \lte{ #Pop the stack twice to get our local variables. #We assume that we are testing @lhs <= @rhs @rhs @lhs #Push our variables, do the first test (<) and # store it in @res1 lhs@ rhs@ lt @res1 #Push our variables, do the second test (==) and # store it in @res2 lhs@ rhs@ eq @res2 #Push our two tests and do an or test. Leave it on # the stack to be returned res1@ res2@ or }	Comments As an exercise, you should try to convert this to a stack-based alternative. It should end up more concise, and also faster.

Do/If & Loops

Most Forth-like languages have an "if" statement, and Henceforth is no exception. However, rather than encapsulating loops into labeled blocks (like Classic Forth), it adds another keyword: "do". This serves two purposes: first, it allows for better inlining than subroutines, and second, it provides a locus of control for jump commands. The second rationale is sometimes written as "otherwise, we'd have to use goto".

Inside of a do loop, the commands break and continue also take effect; you should read the HSZ documentation for an example of how these work conceptually. The mechanism for break and continue is different in the HVM, but the effect is the same.

The standard Hamsterspeak engine also allows breaking out of multiple loops at once. In order to keep the syntax clean in Henceforth, we introduce two bytecodes for that purpose: break_x and continue_x. These take the element from the top of the stack, and break out of that many loops. The break command is effectively equivalent to "1 break_x".

Thus, we add five more keywords to the list of Henceforth primitives: do, if, else, break, and continue. Brackets are used to denote code blocks where relevant. We will cover the syntax in simple examples. The first is a user-defined subroutine, "divide_safe", which avoids dividing by zero, and the second is an example of the Fibonacci sequence using "do" to achieve the effect of a "while" loop.

Save Divide-by-Zero Handling
Code `\divide_safe{ #Do a zero-check on the divisor dup 0 eq #Now, branch on that if{ pop pop 0 }else{ div } }`	Comments Note that returning zero is just one way of handling the erroneous case. Also, take note of the two "pop" commands; these are required!

Fibonacci Numbers
Code \fibonacci{ @count #Must be >=1 0 @fprev #fib(n-1) 1 @fnow #fib(n) do { #Test-and-decrement count@ 0 eq if { break }else{ count@ -1 add @count } #Iterate & Accumulate fprev@ fnow@ add fnow@ @fprev @fnow #Re-do the loop continue } fnow@ }	Comments This subroutine fails given zero as input. Modify it by adding a simple if statement before the main loop.

This second example is far more interesting; it answers the question "How do I Loop in Henceforth?"

A Note About Recursion: Although Henceforth fully supports recursion, the HVM is not designed for heavily-recursive functions. When possible, please use an iterative solution.

HSpeak API Calls & User Scripts

The HVM has knowledge of most "built-in" functions that Hamsterspeak provides. These are called like any other subroutine, with one exception: most format converters create specialized subroutine calls out of them. In other words, if I write a call to wait_npc in my (Format T) code, the converter will create (Format B) source that points explicitly to the definition of wait_npc in the HVM; it is not possible to overload wait_npc. This breaks our promise of fully-interchangeable source formats, but it is hardly noticeable to most users, and keeps our global function lookup table from becoming too cluttered.

See FMF:Plotdict for a listing of all API calls and their syntax in Henceforth.

User-defined scripts are treated exactly like API calls; they are assigned IDs and called explicitly.

Strings

Strings in Classic Forth are handled by simply pushing a series of integers (for each letter) to the stack. Eventually, Henceforth may go this route, but for now the API calls to strings are so simple that we chose a much simpler implementation:

Strings are a series of letters between double-quotes, like "Hello World", "Going?", or "Unicode: \u1000". As hinted by the last example, the HVM supports Unicode.
When a string is encountered (usually at the top of a script) it is pushed to the String Table for that script. Strings are pushed in order, cannot be removed or easily manipulated, and have IDs in increasing order.

This is an ad hoc implementation of strings, until we get a clearer picture where the standard OHR's development is going with these ideas.

Using a String
Code `"Hello world!" 0 show_str`	Comments Defining "Hello world!" will place this string at index zero; pushing 0 and then calling the API function show_str will reference that string explicitly.

Script-Local Subroutines

Here's where Henceforth starts to really get picky. When a Hamsterspeak script is converted to Henceforth, it is basically converted into a series of labeled blocks that call each other. Recalling that user-level "scripts" are effectively subroutines, the keen observer will note that a lot of resources are wasted supporting "subroutines" that are, actually, just glorified "goto" blocks. For example, a definition such as "\plus_two{ 2 add }" doesn't need its own stack space --it's just a snippet of code!

With that in mind, subroutines are allowed to define "script-local" subroutines which operate along a single thread of control within a named block. These are defined with a given ID, and called with an array-like syntax. They cannot be un-defined or overloaded, although theoretically they could be replaced (by re-defining them).

Defining a Script-Local Subroutine
Syntax `\[ID]{CODE}`	Example `\[5]{ 2 plus div mult sub }`	Comments Our "plus two" example from before, assigned to index 5.

Calling a Script-Local Subroutine
Syntax `[ID]`	Example `16 [5]`	Comments This pushes 16, and calls local block 5. The result is a stack containing the number 18, if we define [5] as listed above.

Local Parameters & Return Values

An astute reader will have noticed that scripts currently have no way of passing values to other scripts. Calling "5 6 my_func" won't work, because my_func will get its own independent stack. This problem is circumvented through the use of local parameters, which also provide a slight speedup for small, frequently called subroutines. The local parameters are accessed by ID, with -1 a special ID that refers to the script's return value.

Executing @[5] will "pop the (parent) stack to local parameter 5"
Executing [5]@ will "push local parameter 5 onto the (current) stack

For the return value, it's backwards:

Executing @[-1] will "pop the (current) stack to the return value"
Executing [-1]@ will "push the return value onto the (parent) stack", unless the return value is set to invalid (INT_MIN, for now).

The only point of interest is that last bit. The problem is, the HSpeak command "return" sets the return value, and "exitscript" will exit and return that value. Some scripts simply aren't intended to use their return values, which would leave an extra element on the stack. We could always just pop the return value, but we don't want to punish a highly modularized script with an un-necessary drag on performance. This is the alternative we came up with.

Comments

Henceforth supports comments, although it's important to realize that Formats B and HF will drop your comments; they can't be encoded. A comment begins with a hash mark (#) and ends with a line feed.

Global Variables

All that's left are global variables, which are manipulated with syntax like @[65.G] and [65.G]@. We realize this is a bit awkward, and we're open to a cleaner syntax (once we actually get a release going).

The Three Formats of Henceforth

As mentioned, Henceforth source can appear in three different formats: T, B, and HF. Here's a side-by-side comparison of the three.

	Type	Use
Format T	text	All human interaction (coding, discussing ideas, etc.) occurs in this format.
Format B	bytecode (binary)	The HVM executes Format B code. The HSP2HF cross-compiler outputs this format.
Format HF	compressed	The XRPG specification requires all Format B code to be compressed into format HF lumps, which are then stored locally on the user's phone.

HFB

FMF:HFB

The Cross-Compiler

Fortunately, most of you will never have to touch Henceforth if you don't want to. (But who wouldn't want to?) The HSP2HF cross-compiler is used to automatically convert your HamsterSpeak into Henceforth bytecode. It is part of the RPG2XRPG conversion process.

How the Cross-Compiler Works: Motivations

Some parts of the OHRRPGCE FMF project are slightly incompatible with the standard OHRRPGCE, but for scripting this is simply unacceptable. Unlike things like, say, a slightly jittery enemy HP value -which is immediately apparent when you first load your game on a phone- a silent error in a script is worth hours of headache-inducing debugging, and probably not worth anything at all in the end.

So, the goal of the cross-compiler is script-level compatibility. Efficiency and conciseness, while important, take a backseat to this driving need.

How the Cross-Compiler Works: Naive Compiler

The Henceforth cross-compiler benefits from HamsterSpeak's tree-like structure; a naive conversion can simply convert each node to a local Henceforth function. Consider the following script (from Wandering Hamster, loaded in the HSP2HF utility)

Typical to most HSZ scripts, "setnpcspeed" contains a head "do" node. This node happens to contain only one element, which takes three arguments, each of which is a simple number or variable. Clicking "cross-compile" will invoke the naive converter, which produces the following code:

 \LOC14{
   1
   L[]@
 }
 \LOC12{
   3
 }
 \LOC10{
   0
   L[]@
 }
 \LOC4{
   \LOC10()
   \LOC12()
   \LOC14()
   \BLT78()
 }
 
 #Init local variables
 @L[1]
 @L[0]
 
 #Main script loop
 do_start
 \LOC4()
 do_end

Let's start from the local variables section:

 @L[1]

This is a shorthand syntax; it basically calls "local store", a function deep within the script interpreter which does something like this:

 void localStore(int arg) {
   local_variables[arg] = pop_parent();
 }

Next, we have the main loop. The "do_start" and "do_end" primitives are there to help the "break" and "continue" primitives to function properly. The meat of the main loop is the call to:

 \LOC4()

...which is simply a function call. We use the back-slash to emphasize that LOC4() is stored locally, but this syntax will likely be revised, since HF treats functions, primitives, and built-ins nearly the same way.

The {} syntax to define functions will likely remain, so the following code defines LOC4()

 \LOC4{
   \LOC10()
   \LOC12()
   \LOC14()
   \BLT78()
 }

...as three local calls(LOC10, LOC12, and LOC14) and one built-in function call (BLT78, alterNPC). The remaining local functions are equally easy to understand. For example, "1 L[]@" pushes 1, and then calls "local load", defined internally as:

 void localLoad() {
   int arg1 = pop();
   push(local_variables[arg1]);
 }

How the Cross-Compiler Works: Reasonable Inlining

Simple functions like "setnpcspeed" are very easy to inline, just by copying the leaf nodes' source into their parents. The previous script can be re-written as:

 #Init local variables
 @[1]
 @[0]
 
 #Main script loop
 do_start
 0
 L[]@
 3
 1
 L[]@
 \BLT78()
 do_end

...which is much more concise. Due to the nature of HF bytecode, inlining usually improves both performance and storage efficiency. (When I have time to profile, I hope to collect some facts to back this up.) However, inlining everything is often either impossible or unwise, which is why one needs a policy for inlining. At the moment, the OHRRPGCE FMF's cross-compiler uses the following algorithm to determine what to inline:

 1) Start by doing a naive conversion, and retain the tree structure. 
    Mark all nodes as "not dead" and with a "count" of 1.
 2) Loop until all nodes are inlined or dead. At each iteration, do the following.
 3) Determine which nodes are leaf nodes. (Leaf nodes have no children, or only dead children).
    a) If this node cannot be inlined (e.g., it's self-referential, or checks b/c/d/e below fail), 
       mark it as "dead".
    b) If this node's count is "1", inline it. (Copy its corresponding source into any node that 
       references it, incrementing their "count" value by 1, and then delete it.)
    c) If this node's count is "2", and it is referenced 8 times or less, inline it.
    d) If this node's count is "3", and it is referenced 4 times or less, inline it.
    e) If this node's count is "4" or more, only inline it if it is referenced exactly once.

We are still discussing what makes a node impossible to inline. Technically, the problem is difficult, but Hamsterspeak byte-code is fairly structured in nature, which means we can probably define a few simple criteria for exclusion.

Primitives & HSPEAK->HF Snippits

The cross-compiler inserts snippets of HF code when it encounters an HSPEAK node of a given type. For example, at node 10, given a "number" node with value 75, it inserts:

 \loc_10 {
   75
 }

Henceforth, we shall refer to a node of ID n as: \loc_n {} --this allows us to generalize HSPEAK nodes into simple templates. Here are the snippets used by the cross-compiler; we repeat numbers for the sake of completeness:

Numbers
HSpeak Parameters			Henceforth Snippet
Kind	ID	args[]
number	value		\loc_n { value }

Do Loops
HSpeak Parameters			Henceforth Snippet
Kind	ID	args[]
flow	do	node_x node_y ...	\loc_n { do{ loc_x loc_y ... } }

If Statements
HSpeak Parameters			Henceforth Snippet
Kind	ID	args[]
flow	if	conditional_x then_y else_z	\loc_n{ loc_x if { loc_y } else { loc_z } }

Then/Else Loops
HSpeak Parameters			Henceforth Snippet
Kind	ID	args[]
flow	then/else	node_x node_y ...	\loc_n { loc_x loc_y ... }

Break/Continue
HSpeak Parameters			Henceforth Snippet
Kind	ID	args[]
flow	command	amount if amount==1 then: skip amount else: append "_x" to command	\loc_n { [amount] command }

Returning
HSpeak Parameters			Henceforth Snippet
Kind	ID	args[]	Henceforth Snippet
flow	return	value	\loc_n { value @[-1] }
flow	exitscript At a given depth		\loc_n { invalid @[-1] depth break_x }
flow	exitscript At a given depth	value	\loc_n { value @[-1] depth break_x }

While/For
HSpeak Parameters			Henceforth Snippet
Kind	ID	args[]	Henceforth Snippet
flow	while	conditional_x do_y	\loc_n { do { loc_x not if { break } inline_y{ y_command_1 y_command_2 ... y_command_z } } }
flow	for	count_id_x counter_start_s count_end_e counter_step_w do_y	#Template: -2 4 set_var will set local variable 1 to value 4 \set_var { swap dup 0 lt if { 1 add -1 mult @[] } else { @[.G] } } #Template: 2 get_var will return the contents of global variable 2 \get_var { dup 0 lt if { 1 add -1 mult []@ } else { [.G]@ } } \loc_n { loc_x loc_s set_var do { loc_w 0 gt loc_x get_var loc_e gt xor not loc_x get_var loc_e neq and if { break } inline_y{ y_command_1 y_command_2 ... y_command_z } loc_x get_var loc_w add } }
Note 1: The block inline_y simply unrolls the do_y block into the body of loc_n. This is done so that break* and continue will function properly.* Note 2: The upshot of this is that do_y will be instantly culled from the source, unless another node references it (which would be a bit of a hack, in my opinion. Regardless, this is fine, and will not affect program validity in any way.

Switch
HSpeak Parameters			Henceforth Snippet
Kind	ID	args[]
flow	switch	???	This is not yet documented in HSZ, so we will deal with it later.

Variable Access
HSpeak Parameters			Henceforth Snippet
Kind	ID	args[]	Henceforth Snippet
global	variable_x		\loc_n { loc_x [.G]@ }
local	variable_x		\loc_n { loc_x []@ }

Math Functions
HSpeak Parameters			Henceforth Snippet
Kind	ID	args[]	Henceforth Snippet
math	set_variable	lhs_l rhs_r	\loc_n { loc_l loc_r set_var # As defined above }
math	increment_variable	lhs_l rhs_r	\loc_n { loc_l get_var # As defined above loc_r add set_var # As defined above }
math	decrement_variable	lhs_l rhs_r	\loc_n { loc_l get_var # As defined above loc_r sub set_var # As defined above }
math	not	lhs_l	\loc_n { loc_l not }
math	operand If the operand is listed above, use that code block, not this one.	lhs_l rhs_r	\loc_n { loc_l loc_r operand }

Built-In and User-Defined Functions
HSpeak Parameters			Henceforth Snippet
Kind	ID	args[]	Henceforth Snippet
built-in	func_id_x		\loc_n { hspeak_api_call_x }
user-script	func_id_x		\loc_n { user_script_call_x }

Final Considerations

(later)

This article or section is not complete. If you wish, you may finish it, and remove this tag when you are done.

Revision as of 09:20, 15 December 2008 (edit) Sorlok reaves (talk \| contribs) m (→‎WIP: Moving HFB text to HFB lump page.... will re-write later.) ← Older edit		Revision as of 09:21, 15 December 2008 (edit) (undo) Sorlok reaves (talk \| contribs) m (→‎typo) Newer edit →
Line 762:		Line 762:


			===HFB===
	[[FMF:HFB]]		[[FMF:HFB]]

Henceforth VM: Difference between revisions

Revision as of 09:21, 15 December 2008

Contents

Acknowledgements

Introduction to Henceforth

A Henceforth Primer

The "Pure" Virtual Machine

Where the HVM Fits Into the Game Loop

Virtual Hardware Specification

VM Data Structures

What These Do: Top-Down Description

What These Do: Bottom-Up Description

Some Technical Considerations

HF Source Code & Formats

Henceforth Syntax Dictionary

Numbers

Primitives

Blocks (Subroutines)

Variables

Do/If & Loops

HSpeak API Calls & User Scripts

Strings

Script-Local Subroutines

Local Parameters & Return Values

Comments

Global Variables

The Three Formats of Henceforth

HFB

The Cross-Compiler

How the Cross-Compiler Works: Motivations

How the Cross-Compiler Works: Naive Compiler

How the Cross-Compiler Works: Reasonable Inlining

Primitives & HSPEAK->HF Snippits

Final Considerations

Navigation menu

Henceforth VM: Difference between revisions

Revision as of 09:21, 15 December 2008

Acknowledgements

Introduction to Henceforth

A Henceforth Primer

The "Pure" Virtual Machine

Where the HVM Fits Into the Game Loop

Virtual Hardware Specification

VM Data Structures

What These Do: Top-Down Description

What These Do: Bottom-Up Description

Some Technical Considerations

HF Source Code & Formats

Henceforth Syntax Dictionary

Numbers

Primitives

Blocks (Subroutines)

Variables

Do/If & Loops

HSpeak API Calls & User Scripts

Strings

Script-Local Subroutines

Local Parameters & Return Values

Comments

Global Variables

The Three Formats of Henceforth

HFB

The Cross-Compiler

How the Cross-Compiler Works: Motivations

How the Cross-Compiler Works: Naive Compiler

How the Cross-Compiler Works: Reasonable Inlining

Primitives & HSPEAK->HF Snippits

Final Considerations

Navigation menu

Search