HSP2HF
Most of you will never have to touch Henceforth if you don't want to. (But who wouldn't want to?) The HSP2HF cross-compiler is used to automatically convert your HamsterSpeak into Henceforth bytecode. It is part of the RPG2XRPG conversion process.
How the Cross-Compiler Works: Motivations
Some parts of the OHRRPGCE FMF project are slightly incompatible with the standard OHRRPGCE, but for scripting this is simply unacceptable. Unlike, say, a slightly jittery enemy HP value (which is immediately apparent when you first load your game on a phone), a silent error in a script costs hours of headache-inducing debugging, and that effort is probably not worth anything at all in the end.
So, the goal of the cross-compiler is script-level compatibility. Efficiency and conciseness, while important, take a backseat to this driving need.
How the Cross-Compiler Works: Naive Compiler
The Henceforth cross-compiler benefits from HamsterSpeak's tree-like structure; a naive conversion can simply convert each node to a local Henceforth function. Consider the following script (from Wandering Hamster, loaded in the HSP2HF utility)
As is typical of most HSZ scripts, "setnpcspeed" contains a head "do" node. This node happens to contain only one element, which takes three arguments, each of which is a simple number or variable. Clicking "cross-compile" will invoke the naive converter, which produces the following code:
\[14]{ [1]@ }
\[12]{ 3 }
\[10]{ [0]@ }
\[4]{ [10]() [12]() [14]() [HS:78]() }
#Init local variables
@[1]
@[0]
#Main script loop
do_start
[4]()
do_end
Let's start from the local variables section:
@[1]
This is a shorthand syntax; it basically calls "local store", a function deep within the script interpreter which does something like this:
void localStore(int arg) { local_variables[arg] = pop_parent(); }
Next, we have the main loop. The "do_start" and "do_end" primitives are there to help the "break" and "continue" primitives to function properly. The meat of the main loop is the call to:
[4]()
...which is simply a call to a script-local subroutine.
The following code defines script-local subroutine 4:
\[4]{ [10]() [12]() [14]() [HS:78]() }
...as three script-local calls ([10], [12], and [14]) and one built-in function call (HamsterSpeak:78, alterNPC). The remaining local functions are equally easy to understand. For example, "[1]@" calls "local load" with "1" as an argument. Local load is defined internally as:
void localLoad(int arg1) { push(local_variables[arg1]); }
The HSP2HF utility is also an excellent example of a place where recommended syntax is ignored. Although alter_npc is more readable to humans, [HS:78]() is a lot easier for a compiler to parse. Likewise, do_start and do_end are cleaner machine syntax than do{ ... }.
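The naive conversion described in this section can be sketched in a few lines of Python. This is only an illustration, not the HSP2HF source: the tuple layout (node_id, kind, value, children) and the function name naive_convert are assumptions made for the example, and only the node kinds needed for "setnpcspeed" are handled. Children are visited in reverse order purely so that the output matches the listing above.

```python
# Hypothetical node layout (not the real HSZ structure):
#   (node_id, kind, value, children)
def naive_convert(node, out=None):
    """Emit one script-local subroutine per tree node, children first."""
    if out is None:
        out = []
    node_id, kind, value, children = node
    # Reverse order is a choice of this sketch, to mirror the listing above.
    for child in reversed(children):
        naive_convert(child, out)
    if kind == "number":
        body = str(value)
    elif kind == "local":
        body = f"[{value}]@"              # "local load" of variable `value`
    elif kind == "builtin":
        calls = " ".join(f"[{c[0]}]()" for c in children)
        body = f"{calls} [HS:{value}]()".strip()
    else:
        raise ValueError(f"unhandled node kind: {kind}")
    out.append(f"\\[{node_id}]{{ {body} }}")
    return out


# The single element inside the head "do" node of "setnpcspeed".
tree = (4, "builtin", 78, [
    (10, "local", 0, []),
    (12, "number", 3, []),
    (14, "local", 1, []),
])

for line in naive_convert(tree):
    print(line)
```

Each node becomes exactly one subroutine, which is why the naive output is verbose: even the literal 3 gets its own definition.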
How the Cross-Compiler Works: Reasonable Inlining
Simple functions like "setnpcspeed" are very easy to inline, just by copying the leaf nodes' source into their parents. The previous script can be re-written as:
#Init local variables
@[1]
@[0]
#Main script loop
do_start
[0]@ 3 [1]@ [HS:78]()
do_end
...which is much more concise. Due to the nature of HF bytecode, inlining usually improves both performance and storage efficiency. (When I have time to profile, I hope to collect some facts to back this up.) However, inlining everything is often either impossible or unwise, which is why one needs a policy for inlining. At the moment, the OHRRPGCE FMF's cross-compiler uses the following algorithm to determine what to inline:
1) Start by doing a naive conversion, and retain the tree structure. Mark all nodes as "not dead" and with a "count" of 1.
2) Loop until all nodes are inlined or dead. At each iteration, do the following.
3) Determine which nodes are leaf nodes. (Leaf nodes have no children, or only dead children).
   a) If this node cannot be inlined (e.g., it's self-referential, or checks b/c/d/e below fail), mark it as "dead".
   b) If this node's count is "1", inline it. (Copy its corresponding source into any node that references it, incrementing their "count" value by 1, and then delete it.)
   c) If this node's count is "2", and it is referenced 8 times or less, inline it.
   d) If this node's count is "3", and it is referenced 4 times or less, inline it.
   e) If this node's count is "4" or more, only inline it if it is referenced exactly once.
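Checks (a) through (e) amount to a small decision function. The Python sketch below is one way to write it down; the function name should_inline and its parameters are hypothetical, and self-reference is the only "cannot be inlined" criterion modelled, since the full list of exclusions is still undecided.

```python
def should_inline(count, references, self_referential=False):
    """Return True if a leaf node passes checks (b) through (e).

    count            -- the node's "count" (starts at 1 after naive conversion)
    references       -- how many times other nodes reference this node
    self_referential -- stand-in for "cannot be inlined" in check (a)
    """
    if self_referential:
        return False              # (a): mark as dead instead
    if count == 1:
        return True               # (b)
    if count == 2:
        return references <= 8    # (c)
    if count == 3:
        return references <= 4    # (d)
    return references == 1        # (e): count of 4 or more


# A leaf that fails the check would be marked "dead" by step (a).
for count, refs in [(1, 20), (2, 8), (2, 9), (3, 4), (4, 1), (4, 2)]:
    verdict = "inline" if should_inline(count, refs) else "dead"
    print(count, refs, verdict)
```

The thresholds shrink as the count grows, so the larger a node's body has become through earlier inlining, the fewer copies of it the policy tolerates.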
We are still discussing what makes a node impossible to inline. Technically, the problem is difficult, but HamsterSpeak bytecode is fairly structured in nature, which means we can probably define a few simple criteria for exclusion.
Primitives & HSPEAK->HF Snippets
The cross-compiler inserts snippets of HF code when it encounters an HSPEAK node of a given type. For example, at node 10, given a "number" node with value 75, it inserts:
\[10] { 75 }
Henceforth, we shall refer to a node of ID n as: \[n] {}; this allows us to generalize HSPEAK nodes into simple templates. Just a reminder: \[n] {} represents a script-local subroutine; it is not valid Format T syntax.
The following templates are loaded when the HVM initializes, so the cross-compiler makes use of them to simplify syntax:
Templates
#Template: -2 4 set_var will set local variable 1 to value 4
\set_var {
  swap dup 0 lt
  if { 1 add -1 mult @[] }
  else { @[.G] }
}
#Template: 2 get_var will return the contents of global variable 2
\get_var {
  dup 0 lt
  if { 1 add -1 mult []@ }
  else { [.G]@ }
}
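The ID convention used by both templates (negative IDs address locals, all other IDs address globals) is easier to see outside of stack notation. The Python sketch below models it with two dictionaries; the function signatures and the dictionary storage are assumptions for illustration, not the HVM's internal layout.

```python
def local_index(var_id):
    """Mirror of "1 add -1 mult": map a negative ID to a local slot."""
    return -(var_id + 1)


def set_var(var_id, value, local_vars, global_vars):
    """Store value in a local (var_id < 0) or a global (otherwise)."""
    if var_id < 0:
        local_vars[local_index(var_id)] = value
    else:
        global_vars[var_id] = value


def get_var(var_id, local_vars, global_vars):
    """Load from a local (var_id < 0) or a global (otherwise)."""
    if var_id < 0:
        return local_vars[local_index(var_id)]
    return global_vars[var_id]


local_vars, global_vars = {}, {2: 99}
set_var(-2, 4, local_vars, global_vars)         # "-2 4 set_var"
print(local_vars)                               # local variable 1 now holds 4
print(get_var(2, local_vars, global_vars))      # "2 get_var" reads global 2
```

Under this mapping, ID -1 is local 0, ID -2 is local 1, and so on, which leaves 0 and all positive IDs free for globals.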
Here are the snippets used by the cross-compiler; we repeat numbers for the sake of completeness:
Numbers
HSpeak Kind | HSpeak ID | HSpeak args[] | Henceforth Snippet
number | value | | \[n] { value }
Do Loops
HSpeak Kind | HSpeak ID | HSpeak args[] | Henceforth Snippet
flow | do | node_x node_y ... | \[n] { do{ [x]() [y]() ... } }
If Statements
HSpeak Kind | HSpeak ID | HSpeak args[] | Henceforth Snippet
flow | if | conditional_x then_y else_z | \[n]{ [x]() if { [y]() } else { [z]() } }
Then/Else Loops
HSpeak Kind | HSpeak ID | HSpeak args[] | Henceforth Snippet
flow | then/else | node_x node_y ... | \[n] { [x]() [y]() ... }
Break/Continue
HSpeak Kind | HSpeak ID | HSpeak args[] | Henceforth Snippet
flow | command | amount (if amount==1, skip amount; otherwise append "_x" to command) | \[n] { [amount] command }
Returning
HSpeak Kind | HSpeak ID | HSpeak args[] | Henceforth Snippet
flow | return | value | \[n] { value @[-1] }
flow | exitscript (at a given depth) | | \[n] { invalid @[-1] depth break_x }
flow | exitscript (at a given depth) | value | \[n] { value @[-1] depth break_x }
While/For
HSpeak Kind | HSpeak ID | HSpeak args[] | Henceforth Snippet
flow | while | conditional_x do_y | \[n] { do { [x]() not if { break } inline_y{ y_command_1 y_command_2 ... y_command_z } } }
flow | for | count_id_x counter_start_s count_end_e counter_step_w do_y | \[n] { [x]() [s]() set_var do { [w]() 0 gt [x]() get_var [e]() gt xor not [x]() get_var [e]() neq and if { break } inline_y{ y_command_1 y_command_2 ... y_command_z } [x]() get_var [w]() add } }
Note 1: The block inline_y simply unrolls the do_y block into the body of [n](). This is done so that break and continue will function properly.
Note 2: The upshot of this is that do_y will be instantly culled from the source, unless another node references it (which would be a bit of a hack, in my opinion). Regardless, this is fine, and will not affect program validity in any way.
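The break test in the "for" snippet is dense in stack form, so here is the same boolean expression in Python. The function names are hypothetical, and run_for assumes the incremented counter is stored back into the counter variable at the end of each iteration. A step of 0 is not guarded against and would loop forever in this sketch.

```python
def for_should_break(counter, end, step):
    """Mirror of: [w]() 0 gt  counter [e]() gt  xor not  counter [e]() neq  and"""
    moving_up = step > 0
    above_end = counter > end
    # "xor not" is true when the counter sits on the far side of `end`
    # for the direction of travel (or exactly on it when counting down).
    past_or_at_end = not (moving_up ^ above_end)
    # "neq and" keeps the loop running when counter == end (inclusive end).
    return past_or_at_end and counter != end


def run_for(start, end, step):
    """Collect the counter values a 'for' loop body would see."""
    values = []
    counter = start                      # [x]() [s]() set_var
    while not for_should_break(counter, end, step):
        values.append(counter)           # inline_y body would run here
        counter += step                  # assumed to be stored back
    return values


print(run_for(1, 5, 1))     # counts up, end value included
print(run_for(5, 1, -2))    # counts down, end value included
print(run_for(3, 1, 1))     # start already past the end: body never runs
```

Folding both directions into one xor expression lets a single snippet serve positive and negative steps, at the cost of readability.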
Switch
HSpeak Kind | HSpeak ID | HSpeak args[] | Henceforth Snippet
flow | switch | ??? | This is not yet documented in HSZ, so we will deal with it later.
Variable Access
HSpeak Kind | HSpeak ID | HSpeak args[] | Henceforth Snippet
global | variable_x | | \[n] { [x]() [.G]@ }
local | variable_x | | \[n] { [x]() []@ }
Math Functions
HSpeak Kind | HSpeak ID | HSpeak args[] | Henceforth Snippet
math | set_variable | lhs_l rhs_r | \[n] { [l]() [r]() set_var }
math | increment_variable | lhs_l rhs_r | \[n] { [l]() get_var [r]() add set_var }
math | decrement_variable | lhs_l rhs_r | \[n] { [l]() get_var [r]() sub set_var }
math | not | lhs_l | \[n] { [l]() not }
math | and | lhs_l rhs_r | \[n] { [l]() if { [r]() } else { False } }
math | or | lhs_l rhs_r | \[n] { [l]() not if { [r]() } else { True } }
math | operand (if the operand is listed above, use that code block, not this one) | lhs_l rhs_r | \[n] { [l]() [r]() operand }
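The "and" and "or" snippets differ from the generic operand form in that they only call [r]() when its result is needed. A Python sketch of that short-circuit behaviour follows; hf_and, hf_or and the call-recording helper are hypothetical names, with each operand passed as a callable standing in for a script-local subroutine.

```python
def hf_and(lhs, rhs):
    """Mirror of: [l]() if { [r]() } else { False }"""
    return rhs() if lhs() else False


def hf_or(lhs, rhs):
    """Mirror of: [l]() not if { [r]() } else { True }"""
    return rhs() if not lhs() else True


calls = []

def make_operand(name, result):
    """Build an operand that records when it is evaluated."""
    def operand():
        calls.append(name)
        return result
    return operand


print(hf_and(make_operand("l", False), make_operand("r", True)), calls)
calls.clear()
print(hf_or(make_operand("l", False), make_operand("r", True)), calls)
```

In the first call only "l" is recorded, because a false left-hand side already decides the result of "and".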
Built-In and User-Defined Functions
HSpeak Kind | HSpeak ID | HSpeak args[] | Henceforth Snippet
built-in | func_id_x | | \[n] { hspeak_api_call_x }
user-script | func_id_x | | \[n] { user_script_call_x }
Resolution Engine
After cross-compiling, the HSP2HF utility is basically left with a sequence of bytecodes for each script. The final step is to lump these together into HF lumps. The size of a script, along with its potential to call other scripts, is used to properly group several scripts into one HF lump. To gather this information, a single pass is made over each script. During the same pass, the resolution engine scans scripts to find simple optimizations which can be performed in place. A list of these optimizations follows.
Found | Replaced By | Reasoning
number get_var | [number.G]@ if number is >= 0; [-(number+1)]@ otherwise | The parameter ID is often known.
number value set_var | value @[number.G] if number is >= 0; value @[-(number+1)] otherwise | Ditto to the above.
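These two replacements behave like a small peephole pass over a script's token stream. The Python sketch below is a simplified illustration, not the resolution engine itself: the function names are hypothetical, tokens are assumed to be whitespace-separated strings, the set_var case is only rewritten when the value is itself a literal number, and the sign convention follows the set_var and get_var templates (negative IDs are locals).

```python
def is_int(token):
    """True if the token is a literal integer such as "3" or "-2"."""
    try:
        int(token)
        return True
    except ValueError:
        return False


def direct_ref(var_id):
    """Bracketed reference for a known ID: local if negative, else global."""
    return f"[{-(var_id + 1)}]" if var_id < 0 else f"[{var_id}.G]"


def resolve_var_access(tokens):
    """Replace 'number get_var' and 'number value set_var' in one pass."""
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        if is_int(tok) and tokens[i + 1:i + 2] == ["get_var"]:
            out.append(direct_ref(int(tok)) + "@")
            i += 2
        elif (is_int(tok) and tokens[i + 2:i + 3] == ["set_var"]
              and is_int(tokens[i + 1])):
            out.extend([tokens[i + 1], "@" + direct_ref(int(tok))])
            i += 3
        else:
            out.append(tok)        # anything else passes through unchanged
            i += 1
    return out


print(" ".join(resolve_var_access("-2 4 set_var 2 get_var".split())))
```

Because the ID is a literal, the sign test that the templates perform at run time can be done once at conversion time, which is the reasoning given in the table.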