Henceforth VM: Difference between revisions

From OHRRPGCE-Wiki
Jump to navigation Jump to search
Line 762: Line 762:




 
===HFB===
[[FMF:HFB]]
[[FMF:HFB]]



Revision as of 09:21, 15 December 2008

Mobile-phone.png
This article is about the OHRRPGCE FMF project, which is an alternate implementation of the OHRRPGCE for Java mobile phones. Technical implementation details discussed here should not be confused with those of the RPG format

The Henceforth Virtual Machine (HVM, Henceforth VM) is used to dynamically interpret Henceforth (HF, Hamsterforth) scripts. The OHRRPGCE FMF does not use Hamsterspeak's traditional HSX/HSZ files; rather, it opts for a stack-based alternative. Henceforth files are stored with the .HF extension.


Warning: This document is complete, except for the "final considerations" section, which will be trivial to add later.
We are currently in the process of reviewing the entire design specification, writing a toy interpreter, and polishing up (and segmenting) this document.
We'll gladly take your suggestions on the "Discussion" page, but please do not modify, distribute, or trust this document. It will soon be as open to the community as everything else on this Wiki, but it'll take a few weeks to do the final review.

Thanks~
/Seth


Acknowledgements

A big thanks to both James and Ralph for their well-founded suggestions regarding the HVM, and to Mike for clarifying countless issues. If you start reading into the minute details of the HVM, you'll find that a whole lot of it is lean, clean, and sensible. This is almost entirely due to James & Ralph's suggestions and experience, which I truly appreciate.


Introduction to Henceforth

Henceforth is a simple, stack-based programming language which the OHRRPGCE FMF uses as a crossover language from traditional Hamsterspeak. Like all of the lumps exclusive to this project, .HF lumps containing Henceforth code are cross-compiled automatically for you when you convert your game to XRPG Format. However, all developers should consider learning basic Henceforth syntax to aid them in writing run-time debug scripts and compile-time optimized versions of their most time-critical Hamsterspeak scripts.

Henceforth is a stack-based language, designed to obviate register allocation problems and minimize script size. You should read James's excellent analogy of a stack, if you aren't familiar with the concept. Hamsterspeak uses the stack for ordering scripts; Henceforth, however, uses the stack for storing data.


A Henceforth Primer

Although Henceforth is not a compiled language, you can pretend that "source" Henceforth is compiled into "binary" HFZ lumps, which the game engine runs, and only one person will cringe and correct you. So let's have a look at some Henceforth source code, shall we? Any text after a hash mark until the next newline is considered a comment, and whitespace is used to separate commands. Consider the following script.

   #One way to separate commands is by hitting "Enter" after each command.
   4        #Push 4
   5        #Push 5
   add      #Pop the top two elements and add them. Push the result.
   @result  #Pop the top element and store it in the variable "result".
   
   #Another way to separate commands is using spaces.
   4   5   add  @result

Most programmers will use a mix of spaces and newlines to keep their scripts clear and sensible. For example, "4 5 add" should probably always be on the same line. Regardless of what format you use, manipulation of the stack is the same. The following diagram shows how the stack (and the variables list) changes as the HVM reads each command, from left to right.

Fmf stack diagram.png


The "Pure" Virtual Machine

From a theoretical point of view, Henceforth does not require a location to store variables. Rather, we could simply add block labels and a delayed execution (prepending) command:

  • Surrounding some source with \name{} will label that block with "name"; if name later appears as a command, this block will be inserted in-place. Later writing /name will un-define "name". If multiple blocks are labeled similarly, the most-recent label is considered valid.
  • Pre-pending "&" to a command will push the result of that command into the source code itself, at the time that command is first encountered.

Block labels allow functions to be written very naturally:

   #The "modulo" function expects two arguments to be
   # on the stack; it transforms (X  Y)  to (X%Y)
   \modulo {
    swap  dup  rot  dup  rot  swap     #Now we have (X Y X Y)
    div  mult  sub                     #The heart of our algorithm
   }

There's no need to "return" anything; the labeled block operates in a way that leaves one item on the stack. Here's some code to use this block as a function:

   #We want to compute 7%3
   7  3  modulo    #After this, "1" is left on the stack

This is almost the exact syntax for writing a function in Henceforth. Now, let's cover prepending:

   #Avoiding variables with named blocks.
   4   5   add  
   \result{       #Label the start of the "result" block
     &pop         #This is tricky: the first time we encounter this
                  # (i.e., when "\result" is defined), it executes "pop",
                  # resulting in "9", which is pushed into the source.
   }              #Closes the "\result" label.
   
   result         #Calling "result" will now push "9".

Unlike named blocks, prepending is not actually implemented in Henceforth; you can ignore that last bit of code if you don't understand it. Which brings us to our main point:

Henceforth was designed around a "pure" VM, but it was then modified to run efficiently on a phone, and to allow an easy mapping from existing Hamsterspeak scripts.

It is crucial that you keep the ideal virtual machine in mind, or else you will start asking about "Henceforth's registers" (which don't exist) or you'll start using variables way too much. You also need to be cognizant of the dirty details required to kludge Hamsterspeak onto a phone; otherwise, things like script-level subroutines and our extravagant caching mechanism will seem redundant and senseless.

(Note to readers: At this point, if you're only interested in the Henceforth language, you can skip down to the section on Henceforth Syntax.)


Where the HVM Fits Into the Game Loop

Let's pretend that on a certain map in your game, a certain NPC is set up to "run script 4" when the user walks up to it and presses "Accept". Upon pressing "Accept", the game engine checks if this NPC has a script ID assigned to its "accept" behavior. The classic OHR performs this same task. If one exists, it is initialized and cached. At this point, the game engine continues with its tasks for that cycle (e.g., message boxes, vehicles). This (including user input) constitutes the "Update" phase of the game loop. Following this is the "Paint" phase, which shows one frame's worth of updates to the user. After painting, the tick count is incremented, and the loop repeats. The first thing that happens then is the "scripts" phase, during which scripts are actually executed (see below). The loop repeats indefinitely.

Hf flowchart.png


Virtual Hardware Specification

The Wikipedia page for Virtual Machines focuses on their use in emulation. In fact, there is no hardware version of the HVM, just like there is no hardware equivalent of the Java Virtual Machine. Rather, Henceforth defines an unambiguous virtual hardware specification which can be ported to any platform with guaranteed interoperability. Currently, the only platform with a working HVM is the OHRRPGCEFMF, due to the bias of the HVM's design towards mobile devices.

Anything in the world can be described by its nouns and verbs --the things it has and what they do. So let's inspect the pieces of the HVM and how they interact.


VM Data Structures

Fmf hvm data structures.png

Green and red lines in this diagram indicate references from one item to another, although they are not drawn for every single reference (because that would be craaaaaaazy).


What These Do: Top-Down Description

One way of learning how these data structures work is by discussing what each piece does on its own, and then assuming their combination. This is like studying one of those apple machines; we say that this piece pierces the apple and holds it, that piece slices the skin off, and that blade cores the apple.

Here's the lowdown on each data structure and its elements:


Global Variable Table Contains global variables indexed by ID.
ID ID of the variable (implicit).
Value Value of the variable (signed integer).


Running Scripts Table Contains a script UDT and related data for any script which is held in the HVM (running or waiting).
Script The script UDT for a running script.
Wait True if the script is not ready to continue execution; false otherwise.
Wait On A series of bitflags signifying what this script is waiting on, concatenated to the number of cycles left to wait, for wait_cycles. The flags are:

   wait_cycles
    wait_hero
    wait_npcs
    wait_anykey
    wait_pan
    wait_textbox
Note that, in some cases, these flags can all be zero but the script may still be waiting on something.


Script UDT Wraps several fields necessary for a running script.
Local Stack With the exception of the "push/pop local" commands, all stack operations will take place on the local stack, allowing for easy multitasking. Conceptually "on top" of the local variable array.
Program Counter (PC) Current index into the "Bytecode" array (see: "Script Cache Pointer") of the bytecode being executed. At the end of every cycle, the PC is incremented by 1.
ID Index of this script in the "Script Lookup Table".
Script Cache Pointer A reference to this script's cached data in the "Script Cache" table. If this is null (or -1), this script UDT's cached data has somehow been lost, and must be re-loaded.
Parent Script Pointer A reference to the script which called this script. When this script terminates, it will "wake" the parent script.
Return Address Stack When a script calls a local subroutine, it pushes the current PC to this stack, and pops it upon returning.
Local Variable Array Stores the first seven parameters and the return value for a given script. Push and pop operations on these affect the stack of the calling script. Indexing this array at -1 will push an effective "return value", to use standard terminology. Could theoretically be used to clobber the stack (but please don't).
Local Variable Hash Local variables after seven are stored here, and the same push/pop behavior applies. Named variables such as "x" or "test" are also stored here, through the use of a hash table. Although stack-minded gurus detest local stores, they are allowed into Henceforth for the purpose of clarity when debugging a script at run-time.
Do/It Stack A stack of do UDT entries. One is pushed each time a dostart or ifstart imperative is encountered, and popped when the corresponding doend or ifend is found.
Local Function Start Indices Each script can define local subroutines, which are usually numbered starting from zero, increasing by one monotonically. For example, "\[2] { dup }" defines a local subroutine with ID 2 that serves to duplicate the top of the stack. The interpreter caches the start indices of these script-local subroutines in a small table for fast lookup.
String Table Contains a list of Unicode-encoded strings, ordered by ID, for use by this script.


Do UDT A pair of integers used to manage do loops and if statement.
Start Index The value of the PC when a do or if statement is encountered will be unique within a script.
Else Index Contains the index of "else_start" in an if block. (See below). When an entry is pulled from the cache, an end/else index of -1 implies that scanning must occur. -2 indicates that this is a "do" block, not an "if block"; any other number is a valid index in the bytecode array.
End Index Given a do loop's start index, the break command will jump to the next "doend" imperative (taking care to discard any nested "doend"s). Since bytecode is never modified at run-time, this scanning only needs to happen once; the PC index of the "doend" statement can be cached for later. If statements follow the same rule. When an entry is pulled from the cache, an end/else index of -1 implies that scanning must occur; any other number is a valid index in the bytecode array.


Script Lookup Table A full list of all static scripts contains additional data to help retrieve script source from the cache or from the file system.
ID This field is implicit; all scripts from ID 0 to ID N appear in order in this table. If auto-numbered scripts are used, the Henceforth cross-compiler will renumber them so that there are no gaps in this table.
Name The name of this script is stored as a string. The Henceforth cross-compiler will always call a static script by its ID, but developers may choose to call a script by name when debugging.
Chunk Scripts are stored compressed, but compressing a 10-byte file saves you nothing. Hence, there are several scripts "chunked" together in a file. If this value is "5" for example, then the lump "SCRIPTS_5.HF" should be loaded. Please see the relevant documentation for information on how to extract a particular script from its chunk.
Cache A reference to this script's entry in the Script Cache, null or -1 if no entry exists. If several copies of a script exist, they pull only one entry into the cache, and checking this value in the Script Lookup Table is faster than scanning each entry in the Running Scripts Table. Scanning must still occur when an entry is ejected from the Script Cache, but phones with a reasonable amount of memory will rarely have to eject entries.


Script Name Multiplexer A table which holds definitions for named scripts (those defined at runtime).
Name Unlike static scripts, runtime scripts are identified by name alone. Stored in hash table, no two scripts may have the same name (see below).
Definitions A stack of script cache entries corresponding to each definition, stored by reference. If, say, the script "exp" is declared twice, the second entry takes precedence until it is undeclared. Calling a static script by ID can get around this, but that is a discouraged usage of script IDs.


Script Cache A table which contains large amounts of recycleable data, to allow our scripts to run on machines with very little memory. Also contains a few flags.
New Name Although probably un-necessary, we store the name of the script (or the empty string, for static scripts) to help with debugging.
ID Stores the ID of a static script, or -1 if this is a runtime script.
Pinned A flag: if true, this script cannot be removed from memory without throwing the HVM into an invalid state. An example of this would be a runtime-declared scrit. Theoretically, a static script that is constantly being reloaded from memory or a script that contains a massive Do Table might be manually pinned by the HVM.
In Use Set to zero initially; incremented by 1 each time a script references this entry. Decremented by 1 each time a script terminates. An entry with an in-use count of 0 is only removed from the Script Cache if more space is needed, since the Do Table is also cached.
Bytecode An array for the script's Format B bytecode. This is static data --after being defined once, it can only be re-loaded or re-defined, never modified.
Do Table A cache for this script's Do UDTs. When a Script UDT pops its do/it stack, the value is saved here; if the same loop or if statement is encountered again later, a considerable amount of scanning can be skipped.

What These Do: Bottom-Up Description

If you've ever seen an apple machine like the one just mentioned, you're probably aware that describing each element does absolutely nothing to help you understand how the thing works --you have to actually use it on an apple (preferably very slowly) to see how the magic is done. This is a bottom-up description: describe how certain parts interact at critical moments in the operation of the whole.

We will now list some critical operations performed by the HVM and how it uses the data structures to track these operations. Before reading this section, you should probably read the syntax dictionary, so that you have an idea of Henceforth's intended functionality. Otherwise, this section might not make a lot of sense.


Functional Primitives
Trigger

dup
swap
drop
over
rot
add
sub
mult
div
random
b_xor
b_and
eq
lt
not
and
xor

Response
  • Pop a pre-defined number of arguments
  • Perform the desired math/logic
  • Push the returned value, if applicable


Numbers
Trigger

The HVM encounters a "short integer" or a "full-width integer"

Response

Push that number to the "local stack" of the currently-running script.


Script-Local Subroutines
Trigger

The HVM encounters a "define local script" with id x.

Response
  • In the "local function start indices" array, at index x, insert (PC+1)
  • Advance the PC without executing commands until you reach an "end define".
  • Advance the PC by one.
Trigger

The HVM encounters a "call local script" with id x.

Response
  • Push (PC+1) to the "return address stack"
  • Set the PC to the "local function start indices" array value at index x.
Trigger

The HVM encounters an "end define" and the "return address stack" is not empty.

Response
  • Pop the "return address stack" and set the PC to this value.


Global Variables
Trigger

The HVM encounters a "retrieve global variable" with id x.

Response
  • Push the value of that variable from the Global Variable Table.
Trigger

The HVM encounters a "set global variable" with id x.

Response
  • Pop the stack.
  • Store the popped value at location x in the Global Variable Table.


Call Subroutines
Trigger

The HVM, while running a script with id y in the Running Scripts Table, encounters a "call user-defined function by id" with id x.

Response
  • Retrieve a new entry in the script cache
    • Get entry x from the script lookup table; if "Cache" == -1, then load the source from the appropriate "Chunk" and add a new entry to the script_cache:
      • New Name = null
      • ID = x
      • Pinned = false
      • In Use = 0
      • Bytecode = loadFromChunk()
      • Do Table = new Array[do_udt]
    • For the appropriate script cache entry, set InUse = true
  • Make a new Script UDT:
    • Local Stack = new Stack[]
    • Program Counter = 0
    • ID = x
    • Script Cache Pointer = id of the script cache entry we retrieved
    • Parent Script Pointer = y
    • Return Address Stack = new Stack[]
    • Local Variable Array = new Array[max+1]
    • Set Local Variable Array to -1 at index 0
    • Local Variable Hash = new Hash{}
    • Do/It Stack = new Stack[do_UDT]
    • Local Function Start Indices = new Array[]*
    • String Table = new Array[]*
    • *Note: The HVM can leave "optional" components like this un-initialized until they are actually used.
  • Make a new entry in the Running Scripts table:
    • Script = our new Script UDT
    • Wait = false
    • Wait On = 0
  • Insert this new Running Scripts entry one step ahead of script Y; this ensures it will be run next.
  • Set script Y's "Wait" flag to "true".
Trigger

The HVM, while running a script with id y in the Running Scripts Table, encounters a "call named subroutine" with name exp.

Response
  • The same as the previous call, with the following exceptions:
    • Load from the script cache the last definition in the script name multiplexer for exp
    • script_udt.id = -1
Trigger

The HVM encounters a "hamsterspeak api call" with id x.

Response
  • Pop the agreed-upon number of arguments
  • Call API function x with said arguments
  • Push the return variable, if applicable


Local Variables
Trigger

The HVM encounters a "push local variable by id" with id x.

Response
  • Pop the stack
  • Store the popped value in script_udt.local_variable_array[x+1]
Trigger

The HVM encounters a "pop local variable by id" with id x.

Response
  • Retrieve the value in script_udt.local_variable_array[x+1]
  • Push the retrieved value to the stack
Trigger

The HVM encounters a "push local variable by name" with name test.

Response
  • Pop the stack
  • Create a key in script_udt.local_variable_hash for "test"
  • Store the popped value in script_udt.local_variable_hash{"test"}
Trigger

The HVM encounters a "pop local variable by name" with name test.

Response
  • If the local variable hash has no key at "test", push -1 and return
  • Retrieve the value in script_udt.local_variable_hash{"test"}
  • Push the retrieved value to the stack


If Statements
Trigger

The HVM encounters an "if_start"

Response
  • Push a new do_udt with startIndex = PC to the do/it stack
  • If the script_cache entry for this script contains a do_udt with the same startIndex, set the elseIndex of this do_udt to that value. Else, set it to -1. Do the same for the endIndex.
  • Pop the stack. If this value is not 0, increment the PC. If 0, then set the PC to the elseIndex. (If the elseIndex is -1, scan & dispose bytecodes until the else_start command is found, and then update the script_cache.) Increment the PC.
Trigger

The HVM encounters an "if_end"

Response
  • Pop the do/if stack. If that entry's endIndex is -1, set endIndex = PC and update the script_cache.
Trigger

The HVM encounters an "else_start" (This will only happen if the else branch is not taken)

Response
  • Pop the do/if stack. If that entry's elseIndex is -1, set elseIndex = PC and update the script_cache.
  • Set the PC = endIndex. (If endIndex is -1, scan & dispose bytecodes until the if_end command is found, and then update the script cache.) Increment the PC


Loops & Flow Control
Trigger

The HVM encounters a "do_start"

Response
  • Push a new do_udt with startIndex = PC to the do/it stack
  • If the script_cache entry for this script contains a do_udt with the same startIndex, set the endIndex of this do_udt to that value. Else, set it to -1. Set elseIndex = -2.
  • Increment the PC
Trigger

The HVM encounters a "do_end"

Response
  • Pop the do/if stack. If that entry's endIndex is -1, set endIndex = PC and update the script_cache.
Trigger

The HVM encounters a "break" or "break_x"

Response
  • Set x = 1 if this is a "break" command, or x = stack.pop() if this is "break_x"
  • while x > 0, do the following:
  • Pop the do/if stack until you encounter a do_udt with elseIndex == -2. Pop that as well.
  • Set PC = endIndex of this do_udt. (If endIndex is -1, scan & discard bytecodes until a do_end is found. Then update the script_cache.) Increment the PC.
  • Decrement x
Trigger

The HVM encounters a "continue" or "continue_x"

Response
  • Set x = 1 if this is a "continue" command, or x = stack.pop() if this is "continue_x"
  • Pop the do/if stack until you have removed x do_udts with elseIndex == -2. Push the last entry back to the do/if stack.
  • Set PC = do/if_stack.top().startIndex + 1


Subroutines & Strings
Trigger

The HVM encounters a "string define" with string test string.

Response
  • Push test string to the end of the string table
Trigger

The HVM encounters a "begin define" with string my_function.

Response
  • Look for my_function in the script name multiplexer. If it's not there, add it and set definitions = new Array[int]
  • Create a new entry in the script_cache:
    • New Name = "my_function"
    • ID = -1
    • Pinned = true
    • In Use = 0
    • Do Table = new Array[do_UDT]
  • Scan all bytecodes up to and including an end_define into the "Bytecode" field for this new script cache entry.
  • Push into script_name_multiplexer{"my_function"} the id of the script cache entry you just created.
Trigger

The HVM encounters an "un-define named subroutine" with string my_function.

Response
  • Pop the last definition from the script_name_multiplexer{"my_function"} into variable x.
  • If script_name_multiplexer{"my_function"} has no more definitions, remove it.
  • Set script_cache[x] to be a null row; remove it from the array if you can do so without introducing inconsistencies. At the very least, de-allocate all its data.
Trigger

The HVM encounters an "end_define", or a "return" command in a user-defined script.

Response
  • If there is a return value in script_udt.local_variable_array[0], push it to the parent script's stack.
  • Set this script's parent's entry in the running scripts table to Wait = false
  • Remove this script from the running scripts table.
  • Decrement the in_use field for the script_cache entry for this script_udt
  • Delete this script_udt

Some Technical Considerations

When a script is triggered by any means, it suspends the currently-executing script; see "The Stack" in "How does plotscripting work?". This makes sense; if your script calls another script, one expects the new script to take over. It also means that if you step on several map tiles which each have (long) scripts, the last maptile's script will finish first. (Threading will probably change this.)

But wait, there's more! Several small but important details of script initialization include:

  1. If you trigger the same script twice (e.g., by stepping on two copies of an NPC one after the other) the script loader will detect that you are trying to load a script with the same id as the currently executing script. If the "Permit double-triggering of scripts" general bitset is on (or if the script is explicitly called by another script) then this is allowed, and a second instance of the script is initialized.
  2. When a script instance is first loaded, its "delay" counter is set to 2 ticks. This is not the same thing as instructing the script to "wait", because it remains the active script while delaying. If another script pre-empts this one, the delay counter does not decrement until the new script finishes. (When the new script finishes, the old one immediately resumes what it was doing, be it running or decrementing the "delay" counter.)
  3. To prepare for parallel scripts, the active script is pushed to the top of the "waiting" stack, and is manipulated by a reference ID. This way, all scripts are kept in one place, and GAME_FMF can just scan through this list once each cycle, updating all scripts until they enter a waiting state.

HF Source Code & Formats

Strictly speaking, Henceforth has three different formats for its source code, and there is no such thing as "compiled Henceforth". These three formats are Format T (text-based), Format B (bytecode), and Format HF (compressed). Practically, most user code is generated in Format T, which is then converted to Format B and compressed (to Format HF) for storage. The HVM reads all three formats, but in doing so it converts T & HF to B.

That said, we shall begin our study of Henceforth syntax by exhaustively reviewing all Henceforth commands in "source code" format (Format T), and then describe the mappings to other formats.

Henceforth source is a rather slim language. Numbers and strings are supported, but concatenations like "Hello"+" world" and decimals like 2.4 are not. A syntax exists for "do" and "if" statements, and for declaring functions and variables. Several smaller hacks exist for efficiency's sake, as described earlier. And that's it. Anything else is either a "primitive" like add, or a "block label", subject to the polymorphic lookup that characterizes Forth-like languages. Henceforth is not based explicitly on Classic Forth, but it hopes to learn from the mistakes of this peculiar family of languages. One of the main disadvantages to ANSI Forth is its pandering to C++ fixations, overcomplicating a language which, at its heart, always championed simplicity and elegance.


Henceforth Syntax Dictionary

Numbers

Numbers are represented by signed integers such as 2, 18, -35, or 43. When the HVM encounters a number, it pushes it to the stack. That's all for numbers, so let's cover stack effect diagrams briefly; they're necessary to understand more complicated features. Consider:

1 2 3 7 1 2 3 7

Note: Stack-effect diagrams are oriented with the top of the stack on the right. This means that the stack starts with the elements 1, 2, and 3 already in it, is then given the command "7", and the resulting stack is now 1, 2, 3, then 7. These is an example of the prototype style of learning; to teach you how to use a command, I give you a real-world example of how it works.

If I want to write a stack effect diagram without the fancy table, I might use parentheses and the "--" symbol to mean "the operator we're talking about". For example: "if the interpreter encounters -5, the resulting stack-effect diagram would be (1 2 3 -- 1 2 3 -5)".

And now we're way beyond numbers.


Primitives

When the HVM interprets a literal string like exp, mycode, or abs, it searches for the relevant subroutine. The primitives are exceptions to this; they are always executed immediately. A primitive consists of lowercase letters and underscores, and must contain at least one letter.

Here is a list of primitives, and their stack-effect diagrams:

Category Primitive Stack Effect Diagram Comments
Stack Manipulation dup ( 1 2 3 -- 1 2 3 3 ) "Duplicate" the top of the stack.
swap ( 1 2 3 -- 1 3 2 ) "Swap" the top two elements of the stack.
drop ( 1 2 3 -- 1 2 ) "Drop" the top element off the stack.
over ( 1 2 3 -- 1 2 3 2 ) Pull the penultimate element "over" the top element.
rot ( 1 2 3 4 -- 1 4 3 2 ) "Rotate" the top three elements of the stack.
Arithmetic add ( 1 2 3 -- 1 5 ) Addition
sub ( 7 3 2 -- 7 1 ) Subtraction
mult ( 1 2 3 -- 1 6 ) Multiplication
div ( 1 7 3 -- 1 2 ) Division
random ( 1 4 8 -- 1 6 ) Random Number
Bitwise Operators b_xor ( 1 52 37 -- 1 17 ) Exclusive Bitwise "Or"
b_and ( 1 52 37 -- 1 36 ) Bitwise "And"
Logical Operators eq ( 1 2 2 -- 1 1 ) Logical "Equals"
lt ( 1 5 2 -- 1 0 ) Logical "Less Than"
not ( 1 5 1 -- 1 5 0 ) Logical "Not"
and ( 1 5 1 1 -- 1 5 1 ) Logical "And"
xor ( 1 5 1 1 -- 1 5 0 ) Exclusive Logical "Or"

There is no boolean type; the number zero is considered "false", and any other number is considered "true". For clarity, the number "1" is the preferred way to represent "true".

A number of additional keywords such as break and continue will be discussed in the section on If statements and Loops.


Blocks (Subroutines)

Labeled blocks, also called subroutines, are the core of Henceforth's modularization. They are strikingly similar to primitives, except that they can be defined, undefined, and overloaded. Labels contain letters and underscores, just like primitives.

Defining a Subroutine
Syntax
  \NAME{CODE}
Example
  \mod{
    swap dup rot dup rot swap
    div mult sub
  }
Comments
  This creates a function called "mod", which performs a series of stack manipulations and basic arithmetic.


Calling a Subroutine
Syntax
  NAME
Example
  7 3 mod
Comments
  Traditionally, we push the "parameters" to the subroutine in increasing order, and expect it to remove these arguments (and replace them with a return value, if any) during the course of its operation.


Overriding a Subroutine
Syntax
  \NAME{CODE}
Example
  \mod{
    swap dup rot dup rot swap
    div mult sub
  }
 
  \mod{
    drop drop 0
  }
Comments
  After executing this snipped, "mod" will have two definitions. Calling "mod" will result in the latter being evoked, which simply clears the parameters and returns zero.


Un-defining a Subroutine
Syntax
  /NAME
Example
  /mod
Comments
  If we execute the code from "Overriding a Subroutine", and then execute this code, "mod" will be left with a single definition: the first one.


Any labeled block remains in effect even if the script which defined it finishes execution; make sure to un-define any subroutines you do not wish to linger. The HVM capitalizes on this behavior to limit the number of built-in primitives. The modulus operator described earlier is an example, as is "less than or equal to", which can be defined compositely. Here is a list of all the subroutines the HVM loads by default, with their stack-effect diagrams.


Category Subroutine Stack Effect Diagram Comments
Arithmetic exp ( 1 2 3 -- 1 8 ) Compute x raised to the y ("exponent")
mod ( 1 10 4 -- 1 2 ) Calculate the "modulus"
Bitwise Operators b_or ( 1 45 52 -- 1 61 ) Inclusive Bitwise "Or"
Logical Operators neq ( 1 45 52 -- 1 1 ) Logical "Not Equal To"
lte ( 1 45 45 -- 1 1 ) Logical "Less Than or Equal To"
gt ( 1 45 52 -- 1 0 ) Logical "Greater Than"
gte ( 1 63 45 -- 1 1 ) Logical "Greater Than or Equal To"
or ( 1 0 1 -- 1 1 ) Logical "Inclusive Or"


Variables

Shuffling the stack beyond the fourth element can get messy. As such, subroutines can allocate local variables which stay in scope as long as that script is running. Note that variables follow a similar naming convention to subroutines. They are accessed using the @VAR and VAR@ commands. You can remember them easily by putting an imaginary "stack" next to them:

  • (stack)@VAR takes a value from the stack and throws it at VAR. This is the same as setting VAR = pop().
  • VAR@(stack) takes the value in VAR and throws it at the stack. This is the same as pushing VAR.

Here is some sample source for the lte function, written using variables. Note that, in reality, lte only uses the stack. This is much more efficient for the HVM, which only initializes the local variable store if it detects your script trying to access it.


Using Variables
Code
  \lte{
    #Pop the stack twice to get our local variables.
    #We assume that we are testing @lhs <= @rhs
    @rhs  @lhs
  
    #Push our variables, do the first test (<) and
    # store it in @res1
    lhs@  rhs@  lt  @res1
  
    #Push our variables, do the second test (==) and
    # store it in @res2
    lhs@  rhs@  eq  @res2
  
    #Push our two tests and do an or test. Leave it on
    # the stack to be returned
    res1@  res2@  or
  }
Comments
  As an exercise, you should try to convert this to a stack-based alternative. It should end up more concise, and also faster.


Do/If & Loops

Most Forth-like languages have an "if" statement, and Henceforth is no exception. However, rather than encapsulating loops into labeled blocks (like Classic Forth), it adds another keyword: "do". This serves two purposes: first, it allows for better inlining than subroutines, and second, it provides a locus of control for jump commands. The second rationale is sometimes written as "otherwise, we'd have to use goto".

Inside of a do loop, the commands break and continue also take effect; you should read the HSZ documentation for an example of how these work conceptually. The mechanism for break and continue is different in the HVM, but the effect is the same.

The standard Hamsterspeak engine also allows breaking out of multiple loops at once. In order to keep the syntax clean in Henceforth, we introduce two bytecodes for that purpose: break_x and continue_x. These take the element from the top of the stack, and break out of that many loops. The break command is effectively equivalent to "1 break_x".

Thus, we add five more keywords to the list of Henceforth primitives: do, if, else, break, and continue. Brackets are used to denote code blocks where relevant. We will cover the syntax in simple examples. The first is a user-defined subroutine, "divide_safe", which avoids dividing by zero, and the second is an example of the Fibonacci sequence using "do" to achieve the effect of a "while" loop.

Save Divide-by-Zero Handling
Code
  \divide_safe{
    #Do a zero-check on the divisor
    dup  0  eq
    
    #Now, branch on that
    if{
      pop  pop  0
    }else{
      div
    }
  }
Comments
  Note that returning zero is just one way of handling the erroneous case. Also, take note of the two "pop" commands; these are required!


Fibonacci Numbers
Code
  \fibonacci{
    @count     #Must be >=1
    0  @fprev  #fib(n-1)
    1  @fnow   #fib(n)
    do {
      #Test-and-decrement
      count@  0  eq
      if {
        break
      }else{
        count@  -1  add  @count
      }
      
      #Iterate & Accumulate
      fprev@  fnow@   add
      fnow@   @fprev
      @fnow
      
      #Re-do the loop
      continue
    }
    
    fnow@
  }
Comments
  This subroutine fails given zero as input. Modify it by adding a simple if statement before the main loop.

This second example is far more interesting; it answers the question "How do I Loop in Henceforth?"

A Note About Recursion: Although Henceforth fully supports recursion, the HVM is not designed for heavily-recursive functions. When possible, please use an iterative solution.

HSpeak API Calls & User Scripts

The HVM has knowledge of most "built-in" functions that Hamsterspeak provides. These are called like any other subroutine, with one exception: most format converters create specialized subroutine calls out of them. In other words, if I write a call to wait_npc in my (Format T) code, the converter will create (Format B) source that points explicitly to the definition of wait_npc in the HVM; it is not possible to overload wait_npc. This breaks our promise of fully-interchangeable source formats, but it is hardly noticeable to most users, and keeps our global function lookup table from becoming too cluttered.

See FMF:Plotdict for a listing of all API calls and their syntax in Henceforth.

User-defined scripts are treated exactly like API calls; they are assigned IDs and called explicitly.


Strings

Strings in Classic Forth are handled by simply pushing a series of integers (for each letter) to the stack. Eventually, Henceforth may go this route, but for now the API calls to strings are so simple that we chose a much simpler implementation:

  • Strings are a series of letters between double-quotes, like "Hello World", "Going?", or "Unicode: \u1000". As hinted by the last example, the HVM supports Unicode.
  • When a string is encountered (usually at the top of a script) it is pushed to the String Table for that script. Strings are pushed in order, cannot be removed or easily manipulated, and have IDs in increasing order.

This is an ad hoc implementation of strings, until we get a clearer picture where the standard OHR's development is going with these ideas.

Using a String
Code
  "Hello world!"
  0 show_str
Comments
  Defining "Hello world!" will place this string at index zero; pushing 0 and then calling the API function show_str will reference that string explicitly.


Script-Local Subroutines

Here's where Henceforth starts to really get picky. When a Hamsterspeak script is converted to Henceforth, it is basically converted into a series of labeled blocks that call each other. Recalling that user-level "scripts" are effectively subroutines, the keen observer will note that a lot of resources are wasted supporting "subroutines" that are, actually, just glorified "goto" blocks. For example, a definition such as "\plus_two{ 2 add }" doesn't need its own stack space --it's just a snippet of code!

With that in mind, subroutines are allowed to define "script-local" subroutines which operate along a single thread of control within a named block. These are defined with a given ID, and called with an array-like syntax. They cannot be un-defined or overloaded, although theoretically they could be replaced (by re-defining them).

Defining a Script-Local Subroutine
Syntax
  \[ID]{CODE}
Example
  \[5]{
    2  plus
    div mult sub
  }
Comments
  Our "plus two" example from before, assigned to index 5.


Calling a Script-Local Subroutine
Syntax
  [ID]
Example
  16  [5]
Comments
  This pushes 16, and calls local block 5. The result is a stack containing the number 18, if we define [5] as listed above.


Local Parameters & Return Values

An astute reader will have noticed that scripts currently have no way of passing values to other scripts. Calling "5 6 my_func" won't work, because my_func will get its own independent stack. This problem is circumvented through the use of local parameters, which also provide a slight speedup for small, frequently called subroutines. The local parameters are accessed by ID, with -1 a special ID that refers to the script's return value.

  • Executing @[5] will "pop the (parent) stack to local parameter 5"
  • Executing [5]@ will "push local parameter 5 onto the (current) stack

For the return value, it's backwards:

  • Executing @[-1] will "pop the (current) stack to the return value"
  • Executing [-1]@ will "push the return value onto the (parent) stack", unless the return value is set to invalid (INT_MIN, for now).

The only point of interest is that last bit. The problem is, the HSpeak command "return" sets the return value, and "exitscript" will exit and return that value. Some scripts simply aren't intended to use their return values, which would leave an extra element on the stack. We could always just pop the return value, but we don't want to punish a highly modularized script with an un-necessary drag on performance. This is the alternative we came up with.

Comments

Henceforth supports comments, although it's important to realize that Formats B and HF will drop your comments; they can't be encoded. A comment begins with a hash mark (#) and ends with a line feed.


Global Variables

All that's left are global variables, which are manipulated with syntax like @[65.G] and [65.G]@. We realize this is a bit awkward, and we're open to a cleaner syntax (once we actually get a release going).


The Three Formats of Henceforth

As mentioned, Henceforth source can appear in three different formats: T, B, and HF. Here's a side-by-side comparison of the three.

  Type Use
Format T text All human interaction (coding, discussing ideas, etc.) occurs in this format.
Format B bytecode (binary) The HVM executes Format B code. The HSP2HF cross-compiler outputs this format.
Format HF compressed The XRPG specification requires all Format B code to be compressed into format HF lumps, which are then stored locally on the user's phone.


HFB

FMF:HFB

The Cross-Compiler

Fortunately, most of you will never have to touch Henceforth if you don't want to. (But who wouldn't want to?) The HSP2HF cross-compiler is used to automatically convert your HamsterSpeak into Henceforth bytecode. It is part of the RPG2XRPG conversion process.

How the Cross-Compiler Works: Motivations

Some parts of the OHRRPGCE FMF project are slightly incompatible with the standard OHRRPGCE, but for scripting this is simply unacceptable. Unlike things like, say, a slightly jittery enemy HP value -which is immediately apparent when you first load your game on a phone- a silent error in a script is worth hours of headache-inducing debugging, and probably not worth anything at all in the end.

So, the goal of the cross-compiler is script-level compatibility. Efficiency and conciseness, while important, take a backseat to this driving need.

How the Cross-Compiler Works: Naive Compiler

The Henceforth cross-compiler benefits from HamsterSpeak's tree-like structure; a naive conversion can simply convert each node to a local Henceforth function. Consider the following script (from Wandering Hamster, loaded in the HSP2HF utility)

FMF cross compiler screenshot.png

Typical to most HSZ scripts, "setnpcspeed" contains a head "do" node. This node happens to contain only one element, which takes three arguments, each of which is a simple number or variable. Clicking "cross-compile" will invoke the naive converter, which produces the following code:

 \LOC14{
   1
   L[]@
 }
 \LOC12{
   3
 }
 \LOC10{
   0
   L[]@
 }
 \LOC4{
   \LOC10()
   \LOC12()
   \LOC14()
   \BLT78()
 }
 
 #Init local variables
 @L[1]
 @L[0]
 
 #Main script loop
 do_start
 \LOC4()
 do_end

Let's start from the local variables section:

 @L[1]

This is a shorthand syntax; it basically calls "local store", a function deep within the script interpreter which does something like this:

 void localStore(int arg) {
   local_variables[arg] = pop_parent();
 }

Next, we have the main loop. The "do_start" and "do_end" primitives are there to help the "break" and "continue" primitives to function properly. The meat of the main loop is the call to:

 \LOC4()

...which is simply a function call. We use the back-slash to emphasize that LOC4() is stored locally, but this syntax will likely be revised, since HF treats functions, primitives, and built-ins nearly the same way.

The {} syntax to define functions will likely remain, so the following code defines LOC4()

 \LOC4{
   \LOC10()
   \LOC12()
   \LOC14()
   \BLT78()
 }

...as three local calls(LOC10, LOC12, and LOC14) and one built-in function call (BLT78, alterNPC). The remaining local functions are equally easy to understand. For example, "1 L[]@" pushes 1, and then calls "local load", defined internally as:

 void localLoad() {
   int arg1 = pop();
   push(local_variables[arg1]);
 }


How the Cross-Compiler Works: Reasonable Inlining

Simple functions like "setnpcspeed" are very easy to inline, just by copying the leaf nodes' source into their parents. The previous script can be re-written as:

 #Init local variables
 @[1]
 @[0]
 
 #Main script loop
 do_start
 0
 L[]@
 3
 1
 L[]@
 \BLT78()
 do_end

...which is much more concise. Due to the nature of HF bytecode, inlining usually improves both performance and storage efficiency. (When I have time to profile, I hope to collect some facts to back this up.) However, inlining everything is often either impossible or unwise, which is why one needs a policy for inlining. At the moment, the OHRRPGCE FMF's cross-compiler uses the following algorithm to determine what to inline:

 1) Start by doing a naive conversion, and retain the tree structure. 
    Mark all nodes as "not dead" and with a "count" of 1.
 2) Loop until all nodes are inlined or dead. At each iteration, do the following.
 3) Determine which nodes are leaf nodes. (Leaf nodes have no children, or only dead children).
    a) If this node cannot be inlined (e.g., it's self-referential, or checks b/c/d/e below fail), 
       mark it as "dead".
    b) If this node's count is "1", inline it. (Copy its corresponding source into any node that 
       references it, incrementing their "count" value by 1, and then delete it.)
    c) If this node's count is "2", and it is referenced 8 times or less, inline it.
    d) If this node's count is "3", and it is referenced 4 times or less, inline it.
    e) If this node's count is "4" or more, only inline it if it is referenced exactly once.

We are still discussing what makes a node impossible to inline. Technically, the problem is difficult, but Hamsterspeak byte-code is fairly structured in nature, which means we can probably define a few simple criteria for exclusion.


Primitives & HSPEAK->HF Snippits

The cross-compiler inserts snippets of HF code when it encounters an HSPEAK node of a given type. For example, at node 10, given a "number" node with value 75, it inserts:

 \loc_10 {
   75
 }

Henceforth, we shall refer to a node of ID n as: \loc_n {} --this allows us to generalize HSPEAK nodes into simple templates. Here are the snippets used by the cross-compiler; we repeat numbers for the sake of completeness:

Numbers
HSpeak Parameters Henceforth Snippet
Kind ID args[]
number value   \loc_n {
  value
}


Do Loops
HSpeak Parameters Henceforth Snippet
Kind ID args[]
flow do node_x
node_y
...
\loc_n {
  do{
    loc_x
    loc_y
    ...
  }
}


If Statements
HSpeak Parameters Henceforth Snippet
Kind ID args[]
flow if conditional_x
then_y
else_z

\loc_n{
  loc_x
  if {
    loc_y
  } else {
    loc_z
  }
}


Then/Else Loops
HSpeak Parameters Henceforth Snippet
Kind ID args[]
flow then/else node_x
node_y
...
\loc_n {
  loc_x
  loc_y
  ...
}


Break/Continue
HSpeak Parameters Henceforth Snippet
Kind ID args[]
flow command amount
if amount==1 then:
  skip amount
else:
  append "_x" to command
\loc_n {
  [amount]
  command
}


Returning
HSpeak Parameters Henceforth Snippet
Kind ID args[]
flow return value \loc_n {
  value
  @[-1]
}
flow exitscript
At a given depth
  \loc_n {
  invalid
  @[-1]
  depth
  break_x
}
flow exitscript
At a given depth
value \loc_n {
  value
  @[-1]
  depth
  break_x
}


While/For
HSpeak Parameters Henceforth Snippet
Kind ID args[]
flow while conditional_x
do_y
\loc_n {
  do {
    loc_x
    not if {
      break
    }
    inline_y{
      y_command_1
      y_command_2
      ...
      y_command_z
    }
  }
}
flow for count_id_x
counter_start_s
count_end_e
counter_step_w
do_y
#Template: -2 4 set_var will set local variable 1 to value 4
\set_var {
  swap
  dup
  0 lt if {
    1 add -1 mult
    @[]
  } else {
    @[.G]
  }
}
 
#Template: 2 get_var will return the contents of global variable 2
\get_var {
  dup
  0 lt if {
    1 add -1 mult
    []@
  } else {
    [.G]@
  }
}
 
\loc_n {
  loc_x
  loc_s  
  set_var
  do {
    loc_w
    0 gt
    loc_x
    get_var
    loc_e
    gt xor not
    loc_x  
    get_var
    loc_e
    neq and
    if {
      break
    }
    inline_y{
      y_command_1
      y_command_2
      ...
      y_command_z
    }
    loc_x
    get_var
    loc_w
    add
  }
}
Note 1: The block inline_y simply unrolls the do_y block into the body of loc_n. This is done so that break and continue will function properly.
Note 2: The upshot of this is that do_y will be instantly culled from the source, unless another node references it (which would be a bit of a hack, in my opinion. Regardless, this is fine, and will not affect program validity in any way.


Switch
HSpeak Parameters Henceforth Snippet
Kind ID args[]
flow switch ??? This is not yet documented in HSZ, so we will deal with it later.


Variable Access
HSpeak Parameters Henceforth Snippet
Kind ID args[]
global variable_x   \loc_n {
  loc_x
  [.G]@
}
local variable_x   \loc_n {
  loc_x
  []@
}


Math Functions
HSpeak Parameters Henceforth Snippet
Kind ID args[]
math set_variable lhs_l
rhs_r
\loc_n {
  loc_l
  loc_r
  set_var # As defined above
}
math increment_variable lhs_l
rhs_r
\loc_n {
  loc_l
  get_var # As defined above
  loc_r
  add
  set_var # As defined above
}
math decrement_variable lhs_l
rhs_r
\loc_n {
  loc_l
  get_var # As defined above
  loc_r
  sub
  set_var # As defined above
}
math not lhs_l \loc_n {
  loc_l
  not
}
math operand
If the operand is listed
above, use that code
block, not this one.
lhs_l
rhs_r
\loc_n {
  loc_l
  loc_r
  operand
}


Built-In and User-Defined Functions
HSpeak Parameters Henceforth Snippet
Kind ID args[]
built-in func_id_x   \loc_n {
  hspeak_api_call_x
}
user-script func_id_x   \loc_n {
  user_script_call_x
}

Final Considerations

(later)




Revise.png
This article or section is not complete. If you wish, you may finish it, and remove this tag when you are done.