Henceforth VM

From OHRRPGCE-Wiki
Jump to navigation Jump to search
Mobile-phone.png
This article is about the OHRRPGCE FMF project, which is an alternate implementation of the OHRRPGCE for Java mobile phones. Technical implementation details discussed here should not be confused with those of the RPG format

The Henceforth Virtual Machine (HVM, Henceforth VM) is used to dynamically interpret Henceforth (HF, Hamsterforth) scripts. The OHRRPGCE FMF does not use Hamsterspeak's traditional HSX/HSZ files; rather, it opts for a stack-based alternative. Henceforth files are stored with the .HF extension.


Warning: This document is complete, except for the "final considerations" section, which will be trivial to add later.
We are currently in the process of reviewing the entire design specification, writing a toy interpreter, and polishing up (and segmenting) this document.
We'll gladly take your suggestions on the "Discussion" page, but please do not modify, distribute, or trust this document. It will soon be as open to the community as everything else on this Wiki, but it'll take a few weeks to do the final review.

Thanks~
/Seth


Acknowledgements

A big thanks to both James and Ralph for their well-founded suggestions regarding the HVM, and to Mike for clarifying countless issues. If you start reading into the minute details of the HVM, you'll find that a whole lot of it is lean, clean, and sensible. This is almost entirely due to James & Ralph's suggestions and experience, which I truly appreciate.


Introduction to Henceforth

Henceforth is a simple, stack-based programming language which the OHRRPGCE FMF uses as a crossover language from traditional Hamsterspeak. Like all of the lumps exclusive to this project, .HF lumps containing Henceforth code are cross-compiled automatically for you when you convert your game to XRPG Format. However, all developers should consider learning basic Henceforth syntax to aid them in writing run-time debug scripts and compile-time optimized versions of their most time-critical Hamsterspeak scripts.

Henceforth is a stack-based language, designed to obviate register allocation problems and minimize script size. You should read James's excellent analogy of a stack, if you aren't familiar with the concept. Hamsterspeak uses the stack for ordering scripts; Henceforth, however, uses the stack for storing data.


A Henceforth Primer

Although Henceforth is not a compiled language, you can pretend that "source" Henceforth is compiled into "binary" HFZ lumps, which the game engine runs, and only one person will cringe and correct you. So let's have a look at some Henceforth source code, shall we? Any text after a hash mark until the next newline is considered a comment, and whitespace is used to separate commands. Consider the following script.

   #One way to separate commands is by hitting "Enter" after each command.
   4        #Push 4
   5        #Push 5
   add      #Pop the top two elements and add them. Push the result.
   @result  #Pop the top element and store it in the variable "result".
   
   #Another way to separate commands is using spaces.
   4   5   add  @result

Most programmers will use a mix of spaces and newlines to keep their scripts clear and sensible. For example, "4 5 add" should probably always be on the same line. Regardless of what format you use, manipulation of the stack is the same. The following diagram shows how the stack (and the variables list) changes as the HVM reads each command, from left to right.

Fmf stack diagram.png


The "Pure" Virtual Machine

From a theoretical point of view, Henceforth does not require a location to store variables. Rather, we could simply add block labels and a delayed execution (prepending) command:

  • Surrounding some source with \name{} will label that block with "name"; if name later appears as a command, this block will be inserted in-place. Later writing /name will un-define "name". If multiple blocks are labeled similarly, the most-recent label is considered valid.
  • Pre-pending "&" to a command will push the result of that command into the source code itself, at the time that command is first encountered.

Block labels allow functions to be written very naturally:

   #The "modulo" function expects two arguments to be
   # on the stack; it transforms (X  Y)  to (X%Y)
   \modulo {
    swap  dup  rot  dup  rot  swap     #Now we have (X Y X Y)
    div  mult  sub                     #The heart of our algorithm
   }

There's no need to "return" anything; the labeled block operates in a way that leaves one item on the stack. Here's some code to use this block as a function:

   #We want to compute 7%3
   7  3  modulo    #After this, "1" is left on the stack

This is almost the exact syntax for writing a function in Henceforth. Now, let's cover prepending:

   #Avoiding variables with named blocks.
   4   5   add  
   \result{       #Label the start of the "result" block
     &pop         #This is tricky: the first time we encounter this
                  # (i.e., when "\result" is defined), it executes "pop",
                  # resulting in "9", which is pushed into the source.
   }              #Closes the "\result" label.
   
   result         #Calling "result" will now push "9".

Unlike named blocks, prepending is not actually implemented in Henceforth; you can ignore that last bit of code if you don't understand it. Which brings us to our main point:

Henceforth was designed around a "pure" VM, but it was then modified to run efficiently on a phone, and to allow an easy mapping from existing Hamsterspeak scripts.

It is crucial that you keep the ideal virtual machine in mind, or else you will start asking about "Henceforth's registers" (which don't exist) or you'll start using variables way too much. You also need to be cognizant of the dirty details required to kludge Hamsterspeak onto a phone; otherwise, things like script-level subroutines and our extravagant caching mechanism will seem redundant and senseless.

(Note to readers: At this point, if you're only interested in the Henceforth language, you can skip down to the section on Henceforth Syntax.)


Where the HVM Fits Into the Game Loop

Let's pretend that on a certain map in your game, a certain NPC is set up to "run script 4" when the user walks up to it and presses "Accept". Upon pressing "Accept", the game engine checks if this NPC has a script ID assigned to its "accept" behavior. The classic OHR performs this same task. If one exists, it is initialized and cached. At this point, the game engine continues with its tasks for that cycle (e.g., message boxes, vehicles). This (including user input) constitutes the "Update" phase of the game loop. Following this is the "Paint" phase, which shows one frame's worth of updates to the user. After painting, the tick count is incremented, and the loop repeats. The first thing that happens then is the "scripts" phase, during which scripts are actually executed (see below). The loop repeats indefinitely.

Hf flowchart.png


Virtual Hardware Specification

The Wikipedia page for Virtual Machines focuses on their use in emulation. In fact, there is no hardware version of the HVM, just like there is no hardware equivalent of the Java Virtual Machine. Rather, Henceforth defines an unambiguous virtual hardware specification which can be ported to any platform with guaranteed interoperability. Currently, the only platform with a working HVM is the OHRRPGCEFMF, due to the bias of the HVM's design towards mobile devices.

Anything in the world can be described by its nouns and verbs --the things it has and what they do. So let's inspect the pieces of the HVM and how they interact.


VM Data Structures

Fmf hvm data structures.png

Green and red lines in this diagram indicate references from one item to another, although they are not drawn for every single reference (because that would be craaaaaaazy).


What these do, top-down

FMF:The_HVM,_From_the_Top_Down

What These Do: Bottom-Up Description

FMF:The_HVM,_From_the_Bottom_Up

Some Technical Considerations

When a script is triggered by any means, it suspends the currently-executing script; see "The Stack" in "How does plotscripting work?". This makes sense; if your script calls another script, one expects the new script to take over. It also means that if you step on several map tiles which each have (long) scripts, the last maptile's script will finish first. (Threading will probably change this.)

But wait, there's more! Several small but important details of script initialization include:

  1. If you trigger the same script twice (e.g., by stepping on two copies of an NPC one after the other) the script loader will detect that you are trying to load a script with the same id as the currently executing script. If the "Permit double-triggering of scripts" general bitset is on (or if the script is explicitly called by another script) then this is allowed, and a second instance of the script is initialized.
  2. When a script instance is first loaded, its "delay" counter is set to 2 ticks. This is not the same thing as instructing the script to "wait", because it remains the active script while delaying. If another script pre-empts this one, the delay counter does not decrement until the new script finishes. (When the new script finishes, the old one immediately resumes what it was doing, be it running or decrementing the "delay" counter.)
  3. To prepare for parallel scripts, the active script is pushed to the top of the "waiting" stack, and is manipulated by a reference ID. This way, all scripts are kept in one place, and GAME_FMF can just scan through this list once each cycle, updating all scripts until they enter a waiting state.

HF Source Code & Formats

Strictly speaking, Henceforth has three different formats for its source code, and there is no such thing as "compiled Henceforth". These three formats are Format T (text-based), Format B (bytecode), and Format HF (compressed). Practically, most user code is generated in Format T, which is then converted to Format B and compressed (to Format HF) for storage. The HVM reads all three formats, but in doing so it converts T & HF to B.

That said, we shall begin our study of Henceforth syntax by exhaustively reviewing all Henceforth commands in "source code" format (Format T), and then describe the mappings to other formats.

Henceforth source is a rather slim language. Numbers and strings are supported, but concatenations like "Hello"+" world" and decimals like 2.4 are not. A syntax exists for "do" and "if" statements, and for declaring functions and variables. Several smaller hacks exist for efficiency's sake, as described earlier. And that's it. Anything else is either a "primitive" like add, or a "block label", subject to the polymorphic lookup that characterizes Forth-like languages. Henceforth is not based explicitly on Classic Forth, but it hopes to learn from the mistakes of this peculiar family of languages. One of the main disadvantages to ANSI Forth is its pandering to C++ fixations, overcomplicating a language which, at its heart, always championed simplicity and elegance.


Henceforth Syntax Dictionary

Numbers

Numbers are represented by signed integers such as 2, 18, -35, or 43. When the HVM encounters a number, it pushes it to the stack. That's all for numbers, so let's cover stack effect diagrams briefly; they're necessary to understand more complicated features. Consider:

1 2 3 7 1 2 3 7

Note: Stack-effect diagrams are oriented with the top of the stack on the right. This means that the stack starts with the elements 1, 2, and 3 already in it, is then given the command "7", and the resulting stack is now 1, 2, 3, then 7. These is an example of the prototype style of learning; to teach you how to use a command, I give you a real-world example of how it works.

If I want to write a stack effect diagram without the fancy table, I might use parentheses and the "--" symbol to mean "the operator we're talking about". For example: "if the interpreter encounters -5, the resulting stack-effect diagram would be (1 2 3 -- 1 2 3 -5)".

And now we're way beyond numbers.


Primitives

When the HVM interprets a literal string like exp, mycode, or abs, it searches for the relevant subroutine. The primitives are exceptions to this; they are always executed immediately. A primitive consists of lowercase letters and underscores, and must contain at least one letter.

Here is a list of primitives, and their stack-effect diagrams:

Category Primitive Stack Effect Diagram Comments
Stack Manipulation dup ( 1 2 3 -- 1 2 3 3 ) "Duplicate" the top of the stack.
swap ( 1 2 3 -- 1 3 2 ) "Swap" the top two elements of the stack.
drop ( 1 2 3 -- 1 2 ) "Drop" the top element off the stack.
over ( 1 2 3 -- 1 2 3 2 ) Pull the penultimate element "over" the top element.
rot ( 1 2 3 4 -- 1 4 3 2 ) "Rotate" the top three elements of the stack.
Arithmetic add ( 1 2 3 -- 1 5 ) Addition
sub ( 7 3 2 -- 7 1 ) Subtraction
mult ( 1 2 3 -- 1 6 ) Multiplication
div ( 1 7 3 -- 1 2 ) Division
random ( 1 4 8 -- 1 6 ) Random Number
Bitwise Operators b_xor ( 1 52 37 -- 1 17 ) Exclusive Bitwise "Or"
b_and ( 1 52 37 -- 1 36 ) Bitwise "And"
Logical Operators eq ( 1 2 2 -- 1 1 ) Logical "Equals"
lt ( 1 5 2 -- 1 0 ) Logical "Less Than"
not ( 1 5 1 -- 1 5 0 ) Logical "Not"
and ( 1 5 1 1 -- 1 5 1 ) Logical "And"
xor ( 1 5 1 1 -- 1 5 0 ) Exclusive Logical "Or"

There is no boolean type; the number zero is considered "false", and any other number is considered "true". For clarity, the number "1" is the preferred way to represent "true".

A number of additional keywords such as break and continue will be discussed in the section on If statements and Loops.


Blocks (Subroutines)

Labeled blocks, also called subroutines, are the core of Henceforth's modularization. They are strikingly similar to primitives, except that they can be defined, undefined, and overloaded. Labels contain letters and underscores, just like primitives.

Defining a Subroutine
Syntax
  \NAME{CODE}
Example
  \mod{
    swap dup rot dup rot swap
    div mult sub
  }
Comments
  This creates a function called "mod", which performs a series of stack manipulations and basic arithmetic.


Calling a Subroutine
Syntax
  NAME
Example
  7 3 mod
Comments
  Traditionally, we push the "parameters" to the subroutine in increasing order, and expect it to remove these arguments (and replace them with a return value, if any) during the course of its operation.


Overriding a Subroutine
Syntax
  \NAME{CODE}
Example
  \mod{
    swap dup rot dup rot swap
    div mult sub
  }
 
  \mod{
    drop drop 0
  }
Comments
  After executing this snipped, "mod" will have two definitions. Calling "mod" will result in the latter being evoked, which simply clears the parameters and returns zero.


Un-defining a Subroutine
Syntax
  /NAME
Example
  /mod
Comments
  If we execute the code from "Overriding a Subroutine", and then execute this code, "mod" will be left with a single definition: the first one.


Any labeled block remains in effect even if the script which defined it finishes execution; make sure to un-define any subroutines you do not wish to linger. The HVM capitalizes on this behavior to limit the number of built-in primitives. The modulus operator described earlier is an example, as is "less than or equal to", which can be defined compositely. Here is a list of all the subroutines the HVM loads by default, with their stack-effect diagrams.


Category Subroutine Stack Effect Diagram Comments
Arithmetic exp ( 1 2 3 -- 1 8 ) Compute x raised to the y ("exponent")
mod ( 1 10 4 -- 1 2 ) Calculate the "modulus"
Bitwise Operators b_or ( 1 45 52 -- 1 61 ) Inclusive Bitwise "Or"
Logical Operators neq ( 1 45 52 -- 1 1 ) Logical "Not Equal To"
lte ( 1 45 45 -- 1 1 ) Logical "Less Than or Equal To"
gt ( 1 45 52 -- 1 0 ) Logical "Greater Than"
gte ( 1 63 45 -- 1 1 ) Logical "Greater Than or Equal To"
or ( 1 0 1 -- 1 1 ) Logical "Inclusive Or"


Variables

Shuffling the stack beyond the fourth element can get messy. As such, subroutines can allocate local variables which stay in scope as long as that script is running. Note that variables follow a similar naming convention to subroutines. They are accessed using the @VAR and VAR@ commands. You can remember them easily by putting an imaginary "stack" next to them:

  • (stack)@VAR takes a value from the stack and throws it at VAR. This is the same as setting VAR = pop().
  • VAR@(stack) takes the value in VAR and throws it at the stack. This is the same as pushing VAR.

Here is some sample source for the lte function, written using variables. Note that, in reality, lte only uses the stack. This is much more efficient for the HVM, which only initializes the local variable store if it detects your script trying to access it.


Using Variables
Code
  \lte{
    #Pop the stack twice to get our local variables.
    #We assume that we are testing @lhs <= @rhs
    @rhs  @lhs
  
    #Push our variables, do the first test (<) and
    # store it in @res1
    lhs@  rhs@  lt  @res1
  
    #Push our variables, do the second test (==) and
    # store it in @res2
    lhs@  rhs@  eq  @res2
  
    #Push our two tests and do an or test. Leave it on
    # the stack to be returned
    res1@  res2@  or
  }
Comments
  As an exercise, you should try to convert this to a stack-based alternative. It should end up more concise, and also faster.


Do/If & Loops

Most Forth-like languages have an "if" statement, and Henceforth is no exception. However, rather than encapsulating loops into labeled blocks (like Classic Forth), it adds another keyword: "do". This serves two purposes: first, it allows for better inlining than subroutines, and second, it provides a locus of control for jump commands. The second rationale is sometimes written as "otherwise, we'd have to use goto".

Inside of a do loop, the commands break and continue also take effect; you should read the HSZ documentation for an example of how these work conceptually. The mechanism for break and continue is different in the HVM, but the effect is the same.

The standard Hamsterspeak engine also allows breaking out of multiple loops at once. In order to keep the syntax clean in Henceforth, we introduce two bytecodes for that purpose: break_x and continue_x. These take the element from the top of the stack, and break out of that many loops. The break command is effectively equivalent to "1 break_x".

Thus, we add five more keywords to the list of Henceforth primitives: do, if, else, break, and continue. Brackets are used to denote code blocks where relevant. We will cover the syntax in simple examples. The first is a user-defined subroutine, "divide_safe", which avoids dividing by zero, and the second is an example of the Fibonacci sequence using "do" to achieve the effect of a "while" loop.

Save Divide-by-Zero Handling
Code
  \divide_safe{
    #Do a zero-check on the divisor
    dup  0  eq
    
    #Now, branch on that
    if{
      pop  pop  0
    }else{
      div
    }
  }
Comments
  Note that returning zero is just one way of handling the erroneous case. Also, take note of the two "pop" commands; these are required!


Fibonacci Numbers
Code
  \fibonacci{
    @count     #Must be >=1
    0  @fprev  #fib(n-1)
    1  @fnow   #fib(n)
    do {
      #Test-and-decrement
      count@  0  eq
      if {
        break
      }else{
        count@  -1  add  @count
      }
      
      #Iterate & Accumulate
      fprev@  fnow@   add
      fnow@   @fprev
      @fnow
      
      #Re-do the loop
      continue
    }
    
    fnow@
  }
Comments
  This subroutine fails given zero as input. Modify it by adding a simple if statement before the main loop.

This second example is far more interesting; it answers the question "How do I Loop in Henceforth?"

A Note About Recursion: Although Henceforth fully supports recursion, the HVM is not designed for heavily-recursive functions. When possible, please use an iterative solution.

HSpeak API Calls & User Scripts

The HVM has knowledge of most "built-in" functions that Hamsterspeak provides. These are called like any other subroutine, with one exception: most format converters create specialized subroutine calls out of them. In other words, if I write a call to wait_npc in my (Format T) code, the converter will create (Format B) source that points explicitly to the definition of wait_npc in the HVM; it is not possible to overload wait_npc. This breaks our promise of fully-interchangeable source formats, but it is hardly noticeable to most users, and keeps our global function lookup table from becoming too cluttered.

See FMF:Plotdict for a listing of all API calls and their syntax in Henceforth.

User-defined scripts are treated exactly like API calls; they are assigned IDs and called explicitly.


Strings

Strings in Classic Forth are handled by simply pushing a series of integers (for each letter) to the stack. Eventually, Henceforth may go this route, but for now the API calls to strings are so simple that we chose a much simpler implementation:

  • Strings are a series of letters between double-quotes, like "Hello World", "Going?", or "Unicode: \u1000". As hinted by the last example, the HVM supports Unicode.
  • When a string is encountered (usually at the top of a script) it is pushed to the String Table for that script. Strings are pushed in order, cannot be removed or easily manipulated, and have IDs in increasing order.

This is an ad hoc implementation of strings, until we get a clearer picture where the standard OHR's development is going with these ideas.

Using a String
Code
  "Hello world!"
  0 show_str
Comments
  Defining "Hello world!" will place this string at index zero; pushing 0 and then calling the API function show_str will reference that string explicitly.


Script-Local Subroutines

Here's where Henceforth starts to really get picky. When a Hamsterspeak script is converted to Henceforth, it is basically converted into a series of labeled blocks that call each other. Recalling that user-level "scripts" are effectively subroutines, the keen observer will note that a lot of resources are wasted supporting "subroutines" that are, actually, just glorified "goto" blocks. For example, a definition such as "\plus_two{ 2 add }" doesn't need its own stack space --it's just a snippet of code!

With that in mind, subroutines are allowed to define "script-local" subroutines which operate along a single thread of control within a named block. These are defined with a given ID, and called with an array-like syntax. They cannot be un-defined or overloaded, although theoretically they could be replaced (by re-defining them).

Defining a Script-Local Subroutine
Syntax
  \[ID]{CODE}
Example
  \[5]{
    2  plus
    div mult sub
  }
Comments
  Our "plus two" example from before, assigned to index 5.


Calling a Script-Local Subroutine
Syntax
  [ID]
Example
  16  [5]
Comments
  This pushes 16, and calls local block 5. The result is a stack containing the number 18, if we define [5] as listed above.


Local Parameters & Return Values

An astute reader will have noticed that scripts currently have no way of passing values to other scripts. Calling "5 6 my_func" won't work, because my_func will get its own independent stack. This problem is circumvented through the use of local parameters, which also provide a slight speedup for small, frequently called subroutines. The local parameters are accessed by ID, with -1 a special ID that refers to the script's return value.

  • Executing @[5] will "pop the (parent) stack to local parameter 5"
  • Executing [5]@ will "push local parameter 5 onto the (current) stack

For the return value, it's backwards:

  • Executing @[-1] will "pop the (current) stack to the return value"
  • Executing [-1]@ will "push the return value onto the (parent) stack", unless the return value is set to invalid (INT_MIN, for now).

The only point of interest is that last bit. The problem is, the HSpeak command "return" sets the return value, and "exitscript" will exit and return that value. Some scripts simply aren't intended to use their return values, which would leave an extra element on the stack. We could always just pop the return value, but we don't want to punish a highly modularized script with an un-necessary drag on performance. This is the alternative we came up with.

Comments

Henceforth supports comments, although it's important to realize that Formats B and HF will drop your comments; they can't be encoded. A comment begins with a hash mark (#) and ends with a line feed.


Global Variables

All that's left are global variables, which are manipulated with syntax like @[65.G] and [65.G]@. We realize this is a bit awkward, and we're open to a cleaner syntax (once we actually get a release going).


The Three Formats of Henceforth

As mentioned, Henceforth source can appear in three different formats: T, B, and HF. Here's a side-by-side comparison of the three.

  Type Use
Format T text All human interaction (coding, discussing ideas, etc.) occurs in this format.
Format B bytecode (binary) The HVM executes Format B code. The HSP2HF cross-compiler outputs this format.
Format HF compressed The XRPG specification requires all Format B code to be compressed into format HF lumps, which are then stored locally on the user's phone.


HFB

FMF:HFB

The Cross-Compiler

HSP2HF

Final Considerations

(later)




Revise.png
This article or section is not complete. If you wish, you may finish it, and remove this tag when you are done.