Talk:Henceforth VM

From OHRRPGCE-Wiki
Jump to navigation Jump to search

S'orlok Reaves: Thanks! I was wondering how to do numbered lists, but didn't have time to look it up. :D

Stacks[edit]

In the article, you mention that "Currently, the OHR is not threaded, but future versions may be, and some other script might use the stack while you are "wait"ing."

Wouldn't it be better to maintain independent stacks for each thread? I believe that's how processes work on real CPUs. This way, a misbehaving script (buggy, or perhaps corrupted somehow) can't corrupt the whole stack, only its own.

If memory usage is a concern, then perhaps analyzing the scripts at compile time would be in order. It is possible to determine how much stack each script uses, and then basically add it up to find the net difference. Then, you only have to allocate just enough for the stack usage (although, allowing for the stack to grow, in case the compiler can't figure it out properly, such as for recursive scripts).

Or, just start the stack out small, and grow it as necessary.

Bob the Hamster: yes, I do believe we will maintain a separate stack for each script thread. The grow-as-necessary approach seems easiest so we will probably do that first.


S'orlok Reaves: That's a very good suggestion, and avoids a lot of hackishness. I like it. One of my worries with having separate stacks was that (good) stack-based languages have implicit tail recursion by ignoring the keyword "return". This makes tight recursive loops much faster, and I was worried that having separate stacks would require copying some data on a "return". That was before I'd started reading the HSZ specs; I think we can now have separate stacks, and allow "return", "exitscript", and "exitreturning" to affect the parent stack. That way, we get the benefits of having separate stacks, and still retain the tail-recursion of any locally defined scripts.

I will change the documentation after I finish implementing the naive cross-compiler; I need a clear picture of this in my head before I start writing. Script-local stacks will definitely be an added feature, though. Thanks for mentioning it!

Mike C.: Wait, why would you need to copy anything? The stack used by a thread is used by all scripts executing in that thread. I.e., if we start with A, which calls B, and B returns something, it just leaves it on the stack for A. The point of having separate stacks is so that Foo, an infinite loop that does something every frame, doesn't affect A and B.

In other words, there is no "parent stack", just a single stack per thread.


S'orlok Reaves: Oh, I wish I'd read this earlier; I seem to have massively mis-understood threads. My idea was to have multitasking on the script level. Anyway, when multitasking is added to the classic OHR, I'll just narrow the HVM to match. Thanks for your clarification, though, I'll make sure to leave room in the design for that.


Moving Forward[edit]


The Mad Cacti: That native compiler/HS analyser tool looks cool! I think that it would be generally useful (to devs (ie. me)) outside of Henceforth. I sometimes want a tool to examine compiled HS scripts.

This page has inspired an idea. As I've probably mentioned, I want to rewrite the script interpreter for vanilla OHR too, also translating (at run time?) the HSZ format to something more suitable for interpretation. (Though, I have to watch out for certain games with 800kb script files - scripts will have to be translated on use.)

So, here's the idea: RPG files could continue to contain scripts in HSZ format (and hopefully won't contain duplicate copies in any other format). Instead, we can repurpose HSX as a formatfor describing scripts in a way convienent for tools like HSP2HF and HSDECMPL rather than intended for direct interpretation, and containing extra metadata. I think that the tree structure is good for this. I want to add arrays (with lots of complexity) in the near future, and also add debugging info to scripts, like line numbers and references to the script (the original text script files could be added to the RPG file for this purpose) and more. It would be nice to be able to do this without worrying about hacking/breaking the HSZ format to avoid sever penalties of direct interpretation. For example, to support hss file references, I could add a new "comment" type (say, #8) which wraps a command to state its text file location. HSZ interpreters which don't want that information could ignore it. Eg:

Offset 0  ::  Type: 2 flow,    Id: 0 do,      Args: 1, Arg0: (cmd) 4
Offset 4  ::  Type: 8 comment, Id: 1 hss pos, Args: 3, Arg0: (line no) 13, Arg1: (column no) 15, Arg2: (wrapped cmd) 10
Offset 10 ::  Type: 6 builtin, Id: 1 wait,    Args: 1, Arg0: (amount) 17
Offset 13 ::  Type: 1 int,     Id: 345   (line no)
Offset 15 ::  Type: 1 int,     Id: 2     (col no)
Offset 17 ::  Type: 1 int,     Id: 10    (wait amount)

Actually, this is just an example, prehaps it would be too bloaty and we need an alternative (binary?) format for hss references. But other features like arrays could work in a similar way. What do you think?

Your inlining rule seems pretty reasonable, I thought about it for a long while and couldn't think of any failure cases. I note that it might be possible to inline starting from the root rather than the leaves too.

You seem to have an awful lot of strings and hashmaps. Won't this eat memory? And, I really can't understand why the first 7 variables would be stored in the "Local Variable Array" and the rest in the "Local Variable Hash". Does this mean that local variables are numbered anyway?

What's this business about runtime modifiable/generated scripts? How are these created? I'm guessing that that information hasn't been added yet.


S'orlok Reaves: This is all good stuff! Let me see if I can cover it all:

1) I've read some of your emails about re-writing the script interpreter. Translating scripts the first time they're loaded makes sense to me, even for <800KB-scripted games. Henceforth is my attempt at just such an improvement; however, I think the standard OHR could do much better. For example, the HF bytecode is needed because I can't inject Java bytecode into J2ME apps. You guys, on the other hand, could optionally enable some kind of "inline threading" to really get some performance gains.

2) I like your idea for comments, and I'm sure you'll refactor it to perfection. Here's two ideas: a) You might keep "decorations" (file line numbers, etc.) in a separate file that's only loaded when debugging is enabled. b) You might do the "Color Forth" thing; make even-numbered Hamsterspeak nodes contain code, and odd-numbered ones contain the comments for the previous line. This keeps comments in the same file, and has no noticeable affect on the algorithm you already use ("load address 0, run it and jump to labeled addresses").

3) I think it'd be great if Hamsterspeak was extended with arrays. Keep me posted. (I really don't know enough about the internals of Hamsterspeak to have an informed opinion on arrays, sorry.)

4) As for (in)efficiency in the HVM, I'm being cautiously optimistic (and profiling, of course). If hashes bog it down, I'll give it my old classic: a) If the hash size is <20 elements or so, just store it in an array & search. :P (It's worked before). What do you mean by "a lot of strings"?

5) The first seven local variables ("function parameters", to you guys) are treated specially; this way, I can give them their own load/store bytecodes. This gives a useful boost to small scripts that are called often and have to store their variables, then load them several times. If a script takes more than seven parameters, it is probably so big that it won't notice the hit in performance from storing additional parameters in a hash. But, of course, I'll profile it and change it if I'm wrong; this is just a first approximation. Finally, local variables which are named are not numbered; declaring "@test" creates a test variable with no ID.

6) Runtime-modifiable scripts are those that can be "injected" at runtime. Let's say you want to constantly track the value of global variable 5. In this case, you can over-ride the HVM's "draw" routine to poll at GLOBAL[5] every tick. I still haven't fleshed-out the HF API for drawing yet, and it will probably have to wait until after the first release of the HVM.

7) Inlining from the leaves makes the most sense to me, although I see your point. Once I get a working prototype, the fun of comparing inlining policies will really begin.

Thanks for commenting on this article; I need all the input I can get. PS: The Script Inspector tool is open source, although it currently only handles HSZ files. I'd encourage you to wait for the first release of the HVM, and then bug me to bundle the Inspector separately; right now it's not fully capable.

The Mad Cacti: My very very late reply (actually, it's early in the morning)

1) Hmm, I am doing direct threading, however I could try out inline threading too. A register or two could be used to store intermediate results to pass between opcodes, moving considerably towards the speed of fully optimised JIT without having to do any instruction-level optimisation. It sounds like there are many potential problems though, like alignment and compiler instruction shuffling.

Would be actually be possible to translate HS scripts to Java bytecode? How would wait commands be possible?

2) Yes, I think it's more sensible to keep stuff like that in a totally separate lump (in the .HSP).

3) Got to get around to posting stuff on the mailing list about it.

4) Well, to be honest I'm not following you on the implementation details of the interpreter. All this support for dynamic runtime lookups for everything, apparently implemented with string opcodes (1000 0000) seems like it makes HVM much more complicated than the OHR's standard interpreter. But I guess that runtime lookups will actually not occur in bytecode translated from HSZ?

5) Ah, individual bytecodes for the nth variable! I have to look into stealing that idea for inner loops.

6) I suppose this is a big idea in Forth. I guess that might be an easy way for to have debugging support in the HVM.

7) Yup.

Some new stuff I notice while reading through the page, although it is so big that I just skimmed over it, and the fact that the textual format stuff is a great big sidetrack which was confusing me:

How are integers being stored? What I can make out seems rather bizarre:

Single-Width: The Next 4 Bits:

Short Integer: So are storing a 9 bit integer in 10 bits? Is it stored in one's complement? I mean, is the range from -256 to 256 or -257 to 256?

Parameters: Again, are you implying a one's complement storage format?

Global Variables: Oh no! 10 bits is only enough for 1024 global variables!

Fixed-Width: The Next 6 Bits:

Full-Width Integer: Huh? Your integers are 17-bit? (Again, one's or two's complement?) How do you store 32 bit integers???

Also, Re:Local Parameters & Return Values: "Some scripts simply aren't intended to use their return values, which would leave an extra element on the stack."

Watch out: the OHR's interpreter pushes a value onto the stack for every statement, so flow control statement like then() and do() have to pop a set of garbage values when they finish. A script which returns a value (containing return or exit returning) may or may not have its return value actually used in an expression. Likewise, do ( 3 ) is valid (though it generates a warning from HSpeak).


S'orlok Reaves: I always seem to catch your replies uber-late; I just found this one by chance, since I'm re-reading all y'all's comments before updating the page. (I'll split the HVM page into separate sections, to make it easier to digest.)


>>(TMC):Would be... possible to translate HS scripts to Java bytecode? How would wait commands be possible?

From my limited understanding, you have two options: 1) Only translate "blocks" of HS which translate easily; don't translate wait commands. If you have some "set hero hp" command, you can translate it to:

 game_data[CONS_HEROES][CONS_HP] = 23

...or something like that, for example. 2) Set up your system to represent scripts as Threads. (The HVM does NOT do this). Then, you can translate "wait" into "Thread.sleep(100)", or something equivalent.


>>(TMC): Got to get around to posting stuff on the mailing list about it.

Why not make a wiki page on it? That way, you can design a bit of the system, then post to the mailing list for comments. I'd really like to see where you're going with this, given that I'm quite new to interpreter design.


>>(TMC): But I guess that runtime lookups will actually not occur in bytecode translated from HSZ?

Basically, every subroutine must be defined before it is called. So, when you define "exp" as a subroutine, it sets a hash value:

 {"exp" --> [,,,,,]}

...where [,,,,,] is just a cache of the subroutine's bytecode. So, the lookup is roughly constant-time.

This system will have to change a bit, since OHR user-defined subroutines have so much extra baggage, but I'll update the main page (hopefully) today with that info.


>>(TMC): I have to look into stealing that idea for inner loops.

Please, it's all yours. Since I've got a full bytecode, I'll probably raise the limit from seven to "whatever", since 7's a bit arbitrary.


>>(TMC): Is it stored in one's complement?

Nope; if you want to get technical, one's and two's compliment have no sign bit, although the first bit functions similar to a sign bit.
It's very simple: take the 9-bit integer value, and multiply it by (sign_bit*-1).
Two's compliment is superior to one's compliment, and I don't like either one very much.


>>(TMC): 10 bits is only enough for 1024 global variables

Ah, dang it. I'll add a multi-width integer format as well. Thanks for noticing this.


>>(TMC): Watch out: the OHR's interpreter pushes a value onto the stack for every statement

Oh, wow, I totally missed that! Hm.... so a script that just contains do(3) will just return 3 to the parent script? I think I can handle that; I'll just use a $_ operator, like Perl, and push that as the return value.

Thanks for pointing that out! That would have ground everything to a halt later!

Think I covered everything... I'll be re-writing the specs later today, time permitting.


Missing operator?[edit]

Mike C.: I notice that there's no primitive "or" operator. Is this by design, or an oversight? Or, are we expected to call "not", and then "and"? Same thing for the missing "b_or" operator.

S'orlok Reaves: A valid point. If the interpreted "or" is too slow, I'll just add it to the primitives list. But I'd like to try keeping the core list of primitives minimal; do "or"s in script-heavy games actually take a significant amount of processing time?

Mike C.: Oh, I just looked down the list, and saw or in the list of subroutines. I'm curious to know how is is implemented. Is it something like this:

x
0
eq
not
y
0
eq
not
and

Still, I think it would be worthwhile to include it as a primitive, since OR is an instruction on all CPUs that I'm aware of, and should be optimized, rather than introducing a layer or two of indirection...


S'orlok Reaves: I was thinking:

 x  y  xor
 x  y  and
 xor

...since this logic applies to b_xor too. As far as custom hardware instructions go, I'll start with a minimalist set of operands; the rest defined in software, and then reduce them to primitives as I detect slowdown.

Perhaps I've been doing too much Python programming; it's given me a "don't optimize unnecessarily" attitude.

Mike C.: That's a good attitude, certainly. For the most part, I'm not particularly concerned about speed optimization (except for particularly egregious cases), preferring code elegance (although, that's debatable).

The only remaining reason to have "or" as a primitive is that as a subroutine, it's overridable. Not particularly important, except that "and" and "xor" are not overridable. Still not important, except that the pedant in me is screaming "WHY?!!"

PS: I just realized that there is no b_not operator, for realz. Being able to flip bits is important.

The Mad Cacti: I wonder whether it takes more memory in Java to implement b_or as a subroutine, or as an opcode. That's the only decent reason to do things like this.

Also, I'd like to point out that logor and lognot are shortcut evaluating. I see your 'or' and 'and' aren't, you could keep these if you want (as I see you've used them elsewhere), but I think it would be a good idea to stay fully compatible with HS, by converting logor and logand to equivalent if chains.


S'orlok Reaves: Allow me to mix replies & paraphrase:

>>(Mike): code elegance... (is) debatable
Point taken; as a kind of "Henceforth API", the "or" command's not really so elegant.

>>(Mike): there is no b_not operator
Thanks, I must have missed it. I'll add it to the spec before the 1.0 release.

>>(Mike): "or" ... (is) overridable
Oh, yes. I'd prefer to make all the primitives overridable, but I don't need to profile to know how slow that would be. (An indirect lookup to commonly used functionality like "add"? No thank you, sir.)

>>(TMC): whether it takes more memory in Java to implement b_or as a subroutine, or as an opcode
b_or as a subroutine will take slightly more memory, partly because the subroutine data structures in Henceforth are too "heavy" (having their own string tables, etc.). Even keeping the un-necessary structures null until used, we've still got a 32-bit pointer for each of them, and these add up. I have two thoughts on how to approach this:

  1. Don't bother; the difference is so minute
  2. In the far future, consider having a means of bytecode injection. Thus, slow-moving games can "compile" Henceforth into the JAR itself.

...but the second is years ahead, if ever.

>>(TMC): by converting logor and logand to equivalent if chains
Yes, I agree. You and Mike both put forth some good ideas, but this is the deal-breaker: I am fairly certain the Java version of this will be significantly faster & more elegant if we start getting into if statements and long code. I'll fix the specs before 1.0

Finally....
Before I get around to implementing the 1.0 spec, I need to add your changes, but I also need to fix a glaring weakness in the current spec. Essentially, some sub-routines are "Forth-like" --they are naturally responsible enough to operate on their parent's stack without bothering with any messy "return values". Recall from earlier that Forth-like HSpeak conversion is unwise, since "wait" commands could screw up multi-tasking stack sharing. But, for things like "exp" and "guess_x", eschewing creation of a child stack is a big win. I want this in 1.0, but right now the spec's too big for me. (I get confused easily.... and I've never written a real interpreter before.) So, I'm going to write a simple text-editor version of Henceforth, to get myself acquainted with the language I'm designing. Then, I'll tidy up 1.0 and release it with the next OHRRPGCE-FMF release ("Idiot Savant" or "Broasty", depending on popularity).

Signing out....

Mike C. 16:36, 23 November 2008 (UTC): Hmm, I never really thought about the stack sharing issue for the cooperative multi-uni-tasking we have. Perhaps the best idea would be to spawn threads when triggering new scripts (eg, on step scripts), and keep all the scripts on another, separate stack, and only execute the top script. That way, we keep everyone's stacks clean and apart, and retain the "only the last script executes" behaviour.

(Actually, I would like to see this happen to the OHR itself, but we'll take it one step at a time)

To switch from this to proper cooperative multitasking, you need only turn the stack into an array, an iterate over it. All the hard work was done before.

Also:

In the far future, consider having a means of bytecode injection.

Can you do that? I know you could write a custom class loader and thus generate classes at run time, but I was unaware that you could modify loaded classes like that.


S'orlok Reaves: I like your multi-threaded idea for the standard OHR, but I feel it's a bit of an overkill for the HVM. The goal is not to prevent stack mashing --I already do that by giving each subroutine its own stack, as you suggested earlier. Rather, the goal is to tighten constraints a bit and get a performance boost for small subroutines that get called often. In other words, if people write Henceforth code directly for their game, I want it to be fast & modularizeable. But, I'll consider the separate thread idea --I haven't sat down and really diagrammed it out yet.

Bytecode injection at runtime is slow, and it involves injecting code just before you finish loading the class. I'm looking for a way to inject bytecode statically; that way, you could take a game like "Trailblazers", and "compile" it with the OHRRPGCE-FMF as a library, instead of the other way around. This could be done by simply generating java source and re-compiling the entire project, but I think injecting would be simpler and faster. Again, though, I haven't fully looked into it yet.