FMF:HFB
The HFB lump is never encountered in the wild, but its progeny (HF lumps) are used all over the place in the HVM. Moreover, understanding the "Format B" data stored in HFB lumps is crucial to understanding how the Henceforth Virtual Machine works.
An HFB lump is simply a full list of "Format B" bytecodes, with no additional data or headers. Format B is designed to be quick to parse, and relatively conservative on space. Most bytecodes are exactly one word (sixteen bits), although some are much longer.
These lumps are rarely named. If a name is required, consider basing the prefix on either the script ID or the script name, for example 34.HFB or calculate_exp.HFB.
Legend
In the following charts, numbers in green represent those already accounted for, and numbers in red represent those currently being described. A star (*) stands for "any digit". Blue digits represent data used by the current block: a period (.), a question mark (?), and a percent sign (%) stand for one bit, four bits, and sixteen bits respectively. Numbers in black can be ignored when considering a specific block. Various highlights are marked and explained on the charts.
A Vanilla implementation of the HVM need implement the white blocks only; any Pistachio-colored blocks can simply be scanned and ignored. The Pistachio-colored blocks must be implemented by the OHRRPGCE FMF's HVM, and by any HVM that wishes to mimic the game engine.
The First 2 Bits
The first 2 bits of each word explain how the remainder of that word is allocated.
- 00 is the simplest and fastest to parse; the command is contained within this single word.
- 01 implies that this word specifies how many words follow, either by its nature ("I am a two-word integer") or by stating its length as a field in the first word.
- 10 is the hardest to parse; this "variable-width" type requires that parsing continue until a special "magic number" is reached. Although this allows strings to be of arbitrary length (which is better than imposing an artificial limit), it complicates construction of a bare-bones interpreter.
- 11 currently has no use, and will cause a parse error.
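The dispatch on the first 2 bits can be sketched as follows. This is only an illustration, not the HVM's actual code: the names `WordKind` and `classify_word` are hypothetical, and the sketch assumes that "the first 2 bits" are the two most significant bits of the 16-bit word.

```python
from enum import Enum


class WordKind(Enum):
    """Hypothetical names for the three prefixes that are in use."""
    SINGLE_WIDTH = 0b00
    FIXED_WIDTH = 0b01
    VARIABLE_WIDTH = 0b10


def classify_word(word):
    """Classify a 16-bit word by its first 2 bits.

    Assumption: the "first" bits are the most significant ones.
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("a Format B word must fit in sixteen bits")
    prefix = word >> 14
    if prefix == 0b11:
        # Prefix 11 currently has no use and is a parse error.
        raise ValueError("parse error: prefix 11 is unused")
    return WordKind(prefix)
```

A parser would call `classify_word` on the first word of each bytecode and then hand off to a decoder for that kind.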
Single-Width: The Next 4 Bits
Single-width bytecodes are relatively simple; they use 4 bits for a control code, and 8 to 10 bits for data or identification.
- 0000 implies the commonly-used "short integer". The right-most remaining 8 bits store the actual value of the integer, from 0 to 255. Negative numbers set the first bit after the control code to 1; this is called the "sign bit".
- 0001 implies a Henceforth primitive command, like "pop" or "dup". The remaining 10 bits identify the command; see the table below for the specifics.
- 0010 implies a Hamsterspeak API call. The remaining 10 bits are used for the ID of the API call, as listed here.
- 0011 implies a user-defined function call; the remaining 10 bits of this bytecode specify the ID of that function (which is stored in an HF lump somewhere). If the user has more than 1023 functions, they can also be specified with a fixed-width bytecode (see below).
- 0100 implies a script-local subroutine definition. The remaining 10 bits are used for the ID of this routine.
- 0101 implies a call to a script-local subroutine. The remaining 10 bits are used for the ID of the routine to call.
- 0110 implies a global variable usage. The remaining 10 bits are the ID of the global variable to push to the stack. If all 10 bits are 1, the ID is popped from the stack.
- 0111 implies a global variable instantiation. The stack is popped, and stored in the global variable whose id is specified by the remaining 10 bits. If all 10 bits are 1, the ID is popped from the stack.
- 1000 implies a local variable (parameter). The next bit is 0 to push, 1 to pop, and the following bit is a sign bit. The 8 bits following identify the parameter (or return value, if it's negative 1). If all 8 bits and the sign bit are 1, the stack is popped, and that value is used as the ID of the parameter. Popping and pushing may occur on the parent stack, as described here.
- **** (i.e., "anything else") incurs an error from the parser.
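As a rough sketch of the field layout described above, a single-width word can be split into its 4-bit control code and 10-bit payload, and the payload then interpreted per control code. All function names here are hypothetical. The sketch also assumes most-significant-bit-first ordering and a sign-magnitude encoding for negative values, neither of which is spelled out on this page.

```python
def split_single_width(word):
    """Return (control_code, payload) for a single-width word (prefix 00)."""
    if not 0 <= word <= 0xFFFF or word >> 14 != 0b00:
        raise ValueError("not a single-width bytecode")
    return (word >> 10) & 0xF, word & 0x3FF


def decode_short_int(payload):
    """Control 0000: the sign bit comes first, the value sits in the low 8 bits.

    Assumption: sign-magnitude, so the sign bit simply negates the value.
    """
    magnitude = payload & 0xFF
    return -magnitude if payload & 0x200 else magnitude


def decode_global_id(payload):
    """Controls 0110 and 0111: None means "pop the ID from the stack"."""
    return None if payload == 0x3FF else payload


def decode_local(payload):
    """Control 1000: return (is_pop, parameter_id).

    parameter_id is None when the ID must be popped from the stack,
    and -1 for the return value.
    """
    is_pop = bool(payload & 0x200)    # 0 = push, 1 = pop
    negative = bool(payload & 0x100)  # sign bit
    ident = payload & 0xFF
    if negative and ident == 0xFF:
        return is_pop, None
    return is_pop, (-ident if negative else ident)
```

For instance, `split_single_width(0x0402)` yields control code 1 (a primitive) with payload 2, which the table of primitives lists as "swap".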
The list of primitives is below:
ID | primitive | ID | primitive | ID | primitive |
1 | dup | 11 | b_xor | 21 | else_start |
2 | swap | 12 | b_and | 22 | if_end |
3 | drop | 13 | eq | 23 | end_define |
4 | over | 14 | lt | 24 | break |
5 | rot | 15 | not | 25 | continue |
6 | add | 16 | and | 26 | break_x |
7 | sub | 17 | xor | 27 | continue_x |
8 | mult | 18 | do_start | 28 | b_not |
9 | div | 19 | do_end | 29 | b_or |
10 | random | 20 | if_start | 30 | or |
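For quick experiments, the table above can be turned into a lookup structure. The sketch below is only illustrative: `PRIMITIVES` and `encode_primitive` are hypothetical names, and the encoder assumes the single-width layout described earlier (prefix 00, control code 0001, then a 10-bit ID, most significant bits first).

```python
# Primitive names in ID order, copied from the table (IDs start at 1).
_PRIMITIVE_NAMES = (
    "dup swap drop over rot add sub mult div random "
    "b_xor b_and eq lt not and xor do_start do_end if_start "
    "else_start if_end end_define break continue break_x continue_x "
    "b_not b_or or"
).split()

# Map each ID to its primitive name, and each name back to its ID.
PRIMITIVES = {i: name for i, name in enumerate(_PRIMITIVE_NAMES, start=1)}
PRIMITIVE_IDS = {name: i for i, name in PRIMITIVES.items()}


def encode_primitive(name):
    """Build the single-width word for a primitive (control code 0001)."""
    return (0b0001 << 10) | PRIMITIVE_IDS[name]
```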
Fixed-Width: The Next 6 Bits
The next six bits identify the command:
- 000000 implies a double-word integer value follows this word. The next bit is a sign bit, and the two words following contain the magnitude of this integer, from 0 to 4294967295 (the largest value that fits in 32 bits).
- 000001 implies a single-word ID follows this word, which is the ID of a user-defined function to call. Although this can be used to call any function, the single-width version of this bytecode is preferred for smaller IDs; this is intended as an overflow bytecode.
It should go without saying that anything besides these two bytecodes crashes the system.
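The two fixed-width bytecodes could be decoded along these lines. This is a sketch under stated assumptions rather than the engine's real parser: `decode_fixed_width` is a hypothetical name, bits are taken most significant first, and the two magnitude words are assumed to be stored high word first, which this page does not specify.

```python
def decode_fixed_width(words, pos):
    """Decode the fixed-width bytecode starting at words[pos].

    Returns ((kind, value), next_pos), where kind is "int" or "call".
    """
    head = words[pos]
    if head >> 14 != 0b01:
        raise ValueError("not a fixed-width bytecode")
    command = (head >> 8) & 0x3F  # the six bits after the prefix

    if command == 0b000000:
        negative = bool(head & 0x80)  # the bit right after the command
        # Assumption: the high word comes first, then the low word.
        magnitude = (words[pos + 1] << 16) | words[pos + 2]
        return ("int", -magnitude if negative else magnitude), pos + 3

    if command == 0b000001:
        # One following word holds the ID of the user-defined function.
        return ("call", words[pos + 1]), pos + 2

    raise ValueError("unknown fixed-width bytecode")
```

Returning the next position alongside the decoded value lets the caller keep walking the lump without knowing each bytecode's length in advance.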
Variable-Width: The Next 6 Bits
Variable-width bytecodes are interesting. They are specified by the 6 bits following the first 2. The current variable-width bytecodes all revolve around strings, with a distinction for Unicode-free strings designed to save space. (In fact, all strings are converted to Unicode-strings within the HVM).
- 000000 implies an ASCII-style string, although the font in use by the OHR may wildly affect which character is associated with each letter.
- 000011 implies a Unicode string. Don't use these just yet; they're nowhere near ready for deployment (mainly because of fonts).
The second byte in the first word is a "control" field, for describing what to do with this string.
- string_define — Declare this as a string; push it into the string table.
- begin_define — Start the definition of a subroutine with this string as the label.
- call_subroutine — Call a subroutine whose label is equal to this string.
- un-define — Un-define the latest definition of a subroutine with this string as its label.
- push_variable — Push a local variable to the stack; the string is the name of this variable.
- pop_variable — Pop the stack and store the result in a variable with this string for its name.
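The first word of a variable-width bytecode can be taken apart as sketched below. The name `split_variable_header` is hypothetical, and the sketch assumes that the "second byte" is the low 8 bits of the word. Because this page gives neither the numeric values of the control field nor the terminating "magic number", the control byte is returned raw and the string body is not parsed. Rejecting unlisted string kinds is also an assumption.

```python
# The two string kinds listed above, keyed by their 6-bit code.
STRING_KINDS = {0b000000: "ascii", 0b000011: "unicode"}


def split_variable_header(word):
    """Return (string_kind, control_byte) for a variable-width first word."""
    if not 0 <= word <= 0xFFFF or word >> 14 != 0b10:
        raise ValueError("not a variable-width bytecode")
    kind = (word >> 8) & 0x3F  # the six bits after the prefix
    if kind not in STRING_KINDS:
        # Assumption: kinds not listed on this page are rejected.
        raise ValueError("unknown variable-width bytecode")
    # Assumption: the control field is the low byte of the word.
    return STRING_KINDS[kind], word & 0xFF
```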