User talk:Zzo38: Difference between revisions

From OHRRPGCE-Wiki

Revision as of 23:45, 20 October 2021

Hi

I responded to your RELOAD comments including the textual format at Talk:RELOAD.

To run game.exe faster/slower, use Ctrl+Plus/Minus. That changes number of milliseconds per frame (shown in corner). You can also use the --runfast arg if you want no waiting at all for something like testcases.

You can give negative experience with the giveexperience command. I guess that there's no reason not to allow negative experience from enemies as well.

All of the suggestions are excellent and I'll add any that aren't already there to my TODO list. Several of them will be pretty easy to add. --TMC (talk) 05:56, 29 November 2014 (PST)


Hi, good to hear from you again! I very recently changed and cleaned up how ohrrpgce_config.ini works (I simplified it a lot as there were too many problematic edgecases) so am likely to start adding many more settings. You have a long list of requests here I could pick through for ones worth the time. In the current wip version you can now put game.gfx.fullscreen = no in ohrrpgce_config.ini to override the default fullscreen setting in all games.

BTW, I've just started on moving to UTF8. I've also designed my own space-efficient 8-bit-extended-ASCII-compatible character encoding to be used in existing fixed-length string fields, to allow more than 256 characters in the font without switching those fields to UTF8. (Previously we discussed at Talk:RELOAD#Comments_about_file_format.) Have you ever used characters 1-31 in a game, aside from \n and \t? I want to repurpose those for codes such as "switch codepage". Also, I agree that it's better not to move icons in existing fonts out of the C1 control area, but need to think about what to do about characters 161-255 used for icons. --TMC (talk) 19:40, 14 October 2021 (PDT)

I have used them (with PC fonts imported into OHRRPGCE), although the games have not been released yet. However, hopefully it shouldn't be too difficult to be a global definition (perhaps by the format of the font lump) to disable the new feature if it is necessary to do so; if not, the game can be changed easily enough to support the new format I suppose. ASCII codes 14 ("shift out") and 15 ("shift in") are standard codes for switching code pages, so it may be a good idea to use those (if they are enough; they might not be). Escapes could also be used (including to display characters in the 0-31 positions of fonts, possibly). For existing fonts that do not use 0-31, there is probably nothing to change and it should just work (at least for display). --Zzo38 (talk) 23:27, 14 October 2021 (PDT)
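To make the Shift Out/Shift In suggestion concrete, here is a minimal Python sketch of a locking-shift decoder. It is only an illustration: the two-page layout and the name `decode_shifted` are hypothetical and are not taken from OHRRPGCE or from TMC's encoding.

```python
SO = 0x0E  # ASCII "shift out": switch to the alternate page
SI = 0x0F  # ASCII "shift in": switch back to the default page

def decode_shifted(data):
    """Decode bytes into (page, index) pairs using locking shifts.

    Page 0 is the default page and page 1 the alternate page. What each
    page contains would depend on the font (an assumption of this sketch).
    """
    page = 0
    out = []
    for byte in data:
        if byte == SO:
            page = 1          # stays in effect until the next SI
        elif byte == SI:
            page = 0
        else:
            out.append((page, byte))
    return out
```

With only two codes, only two pages can be selected this way, which is why they might not be enough on their own.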
I forgot that those two ASCII codes exist; I've never seen them used. But my encoding so far uses 12 control codes (it's not finalised) in order to be more compact, plus more for compressed text markup codes such as to change font or colour. Shift In/Shift Out don't match how my encoding works. Aside from \n and \t I'll also avoid \r since some OHR file formats use it, and the first few characters since there are fixed icons there in all OHR fonts, used by Custom - I think I've used those in a game at least once.
Disabling the new encoding if the game's font is in the old .fnt format seems simple enough that I can do that. I didn't really want to add an explicit setting for it, because it would be useless for almost everyone, dangerous to change, adds more code, etc. --TMC (talk) 05:11, 15 October 2021 (PDT)
OK, then do that; use twelve codes if that is how many you need. What is your encoding so far? If we can see, then it can be discussed. About disabling the new encoding when the old .fnt format is used, that seems reasonable, and is what I meant. --Zzo38 (talk) 09:51, 15 October 2021 (PDT)
Here is a draft decoder in FB. I'm not happy with the scheme yet. I'll create an article for it soon and write more. (I realised I should implement the decoder/encoder in C instead so other projects can reuse it.) I found two Unicode encoding/compressing schemes that are very similar to mine, SCSU and UTF-C. Neither is popular. Despite being published on unicode.org, it seems almost no implementations of SCSU exist. Plus, SCSU is pretty complex and even UTF-C is more complex than I want, so I think I will use my own. Here is FB code for a decoder.
I tested the scheme out on the FF6 Advance Japanese script (version with kanji). After removing the line prefixes, the characters are 21% ASCII, 39% Hiragana, 16% Katakana, 25% other (Kanji). Without using the "non-jump" codes, the text encodes to 1.706 bytes per character on average (with the additional codes compression will be slightly better). But 10-bit relative characters provide almost no additional benefit over 9-bit (1.707 Bpc) or even 8-bit ones (1.715 Bpc), so I should probably switch to 9-bit ones. --TMC (talk) 06:35, 18 October 2021 (PDT)
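For comparison, the character mix quoted above can be used to estimate what plain UTF-8 would cost for the same text. This is a rough Python sketch, not a measurement: it assumes every non-ASCII character is in the Basic Multilingual Plane at U+0800 or above (3 bytes in UTF-8), and it normalises the rounded percentages, which sum to 101.

```python
# Character mix reported for the script (rounded percentages; they sum to 101).
mix = {"ascii": 21, "hiragana": 39, "katakana": 16, "kanji_other": 25}

# UTF-8 bytes per character under the assumption stated above.
utf8_bytes = {"ascii": 1, "hiragana": 3, "katakana": 3, "kanji_other": 3}

total = sum(mix.values())
utf8_avg = sum(mix[k] * utf8_bytes[k] for k in mix) / total
print(round(utf8_avg, 2))
```

Under those assumptions UTF-8 comes out at roughly 2.6 bytes per character, against the 1.706 reported for the draft scheme.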
I think that treating the text fundamentally as Unicode is a mistake. Although many programs do this, I think that it is a bad idea (and my own projects aren't using Unicode, even when I do want to allow Japanese and other languages). Better would be just to use the font with multiple pages; they would be aligned as appropriate for the kinds of characters being used. A single shifting scheme can be used for the purpose of OHRRPGCE, although what each page means may vary depending on the font in use. (If translation between character sets/encodings is needed, another lump could be added to translate, although most of the program will probably not need such a thing.) (I believe that no one character set can ever be suitable for all uses, neither Unicode nor TRON nor anything else.) So, you probably won't need as much code space as Unicode (although I don't know if some uses might need as much or more, but probably OHRRPGCE needs less), and may wish to distinguish single-byte and double-byte pages (you can use double-byte pages for kanji, perhaps). --Zzo38 (talk) 11:04, 18 October 2021 (PDT)
Using unicode does have some disadvantages: it has a large space of codepoints which leads to less efficient encoding schemes (such as utf8), it has modifier/combining codepoints and other control codes, which we wouldn't try to support aside from simply drawing accents over the previous character, and it has multiple forms for the same character using different combining forms (I already have partially_normalise_unicode() in unicode.c to clean these up). But ultimately, the engine receives unicode input (as the user types, or when importing files) and aside from normalising composite characters it seems like extra complication to invent our own character set, which is ultimately just going to be based on unicode codepoints anyway, so it would be more correct to call it a subset of unicode. Fonts will still be organised into pages though. So you can add a Hiragana codepage, which provides characters for the unicode Hiragana block, initialised to defaults from an external font file (e.g. ttf), and customise them. As for the inefficiency of utf8, I don't like it, but it matters extremely little, and it's the standard and therefore the sensible thing to use for variable length strings. --TMC (talk) 17:13, 18 October 2021 (PDT)
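The "multiple forms for the same character" problem can be seen with Python's standard `unicodedata` module. The sketch below uses standard NFC normalisation as a stand-in; partially_normalise_unicode() in unicode.c is not shown here and may behave differently.

```python
import unicodedata

decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE

# NFC composes base + combining mark into the precomposed form when one exists.
composed = unicodedata.normalize("NFC", decomposed)

# 'x' + acute has no precomposed code point, so the combining mark survives.
no_precomposed = unicodedata.normalize("NFC", "x\u0301")
```

The second case is the kind that would still need either combining-character support or an extra precomposed character in the font.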
OK, but what I am saying is to internally not use Unicode for text handling and use only the encoding of the fonts, and for translation to be performed separately when receiving input (and when exporting if needed). One option can be used to enable or disable character code translation when reading/writing external files (in case you do not want to use Unicode; I will not want to use Unicode). I am not trying to say that OHRRPGCE should define its own character set; it specifically should not, and also should not care (although it would have its own encoding scheme to select pages, similar to what you wrote). Perhaps, when editing fonts, you can specify the number that the game encoding uses, and optionally specify the corresponding Unicode code point (or pair of Unicode code points, which may sometimes be necessary); the Unicode numbers would then be used only for input and not for internal handling. I think that your idea of shift coding is mostly good (although with perhaps a few differences like I mentioned); your scheme is better than SCSU and UTF-C. (I try to write my suggestion/arguments but if you dislike it then you can do it your own way, I suppose. My ideas are merely just that--ideas, only. Note also that I have not worked with OHRRPGCE recently (and don't know when or if I will in future, although one thing I have done recently is to add OHRRPGCE and some related file formats into Just Solve the File Format Problem wiki)) --Zzo38 (talk) 18:07, 18 October 2021 (PDT)
Suggestions are welcome. Internal (in-memory) encoding of text and on-disk encoding are two separate issues. Given that we want fonts with more than 256 characters, only variable-length encodings are feasible; UTF-16 (or UCS2) is not an option for internal usage because it would be too much work to update all existing code, and not for on-disk usage in existing non-RELOAD file formats either because it's not compatible. UTF8 has various nice properties such as being stateless that make it good among variable-length encodings as an in-memory encoding (while my scheme is not), except that it is more complex to iterate over characters. Your shifted codepage idea makes iteration a little easier but is stateful (Edit: wait, being stateful makes iteration more complex), so I'm not sure it or anything else is a better alternative. (However we should use only normalised Unicode and avoid or even completely disallow combining characters; they are indeed a nightmare. I haven't thought enough about this.) Also, using the same on-disk encoding for all games instead of each game defining its own makes it easier to process game data files -- however admittedly the font editor already has a toggle between ASCII and Latin-1, so perhaps that boat has sailed.
Also, thanks for adding those pages to JSTFFP! It's a great project.
> I think that your idea of shift coding is mostly good (although with perhaps a few differences like I mentioned)
I'm confused about what differences you mean.
I agree that UTF-16 is a bad idea. I agree UTF-8 does have nice properties (even though I think Unicode isn't very good, UTF-8 is not a bad design). I agree also that each game can use the same on disk encoding, although the character sets of different games will not necessarily match even if the encoding does (this is already the case; this is simply extending it to work with large character sets); it is only guaranteed to be a superset of ASCII (and not necessarily a subset of Unicode; the characters that aren't in Unicode (if any) then just can't be input directly). You could use temporary shifts only if permanent shifts make it difficult (with only temporary shifts, it will be stateless); the font used in the game would encode the languages more commonly used with smaller internal numbers in order to have shorter encoding sequences, I suppose. About combining characters, yes it is one of many messy things in Unicode. It is one reason why I suggested doing translation independently from internal handling (which might simplify some things at the cost of making others more complicated), and allowing a character in the game's font to optionally correspond to a sequence of up to two Unicode code points (or maybe more, but more than two would likely overcomplicate it too much), in case the game uses a character that is not available precomposed in Unicode, it would still allow that character to be translated correctly (if possible). (Other possibilities are to use private characters (which is possible in Unicode), or to abandon Unicode compatibility with characters not available in Unicode.) --Zzo38 (talk) 22:20, 18 October 2021 (PDT)
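The idea of a font character corresponding to one or two Unicode code points, used only when translating input, might look something like this Python sketch. The table contents, the game code numbers, and the name `from_unicode` are all invented for illustration; nothing here describes an existing OHRRPGCE format.

```python
# Hypothetical table: game font code -> tuple of one or two Unicode code points.
GAME_TO_UNICODE = {
    0x41: (0x0041,),          # 'A', same number as ASCII
    0x100: (0x00E9,),         # e with acute, precomposed in Unicode
    0x101: (0x0078, 0x0301),  # x with acute: no precomposed form in Unicode
}

# Reverse table, used only when receiving Unicode input.
UNICODE_TO_GAME = {seq: code for code, seq in GAME_TO_UNICODE.items()}

def from_unicode(text):
    """Translate Unicode input to game codes, preferring two-code-point matches.

    Characters with no entry are returned as None (they cannot be input).
    """
    cps = [ord(ch) for ch in text]
    out = []
    i = 0
    while i < len(cps):
        pair = tuple(cps[i:i + 2])
        if len(pair) == 2 and pair in UNICODE_TO_GAME:
            out.append(UNICODE_TO_GAME[pair])
            i += 2
        elif (cps[i],) in UNICODE_TO_GAME:
            out.append(UNICODE_TO_GAME[(cps[i],)])
            i += 1
        else:
            out.append(None)
            i += 1
    return out
```

After this translation step, the rest of the program would see only game codes and never the combining sequence.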
You've been using "encoding" and "character set" more carefully than me; I've been sloppy. I haven't totally made up my mind about any of this, because I don't entirely like any of the options. I agree that UTF8 is not so bad an encoding, and that a custom 1-2 byte encoding (e.g. one byte for 7-bit ASCII or two bytes a, b with 128<=a<b for up to 8256 characters) is possible if we use custom character sets, which could have nearly all the advantages of UTF8. But assuming that everyone already has easy access to utf8 encoding/decoding routines I don't think it's worth the extra work. Also, not having as large a range of codepoints as Unicode is a problem, because I wanted to allow people to use characters that aren't in their font (which would display as '?'), and use the same character set (assignment of codepoints) for all games, for various reasons: you can have multiple fonts and the plan is to fall back from one to another (e.g. the editor font falls back to the game default font); so fonts from different games are compatible with each other and people can share them (this is the most important consideration of all); and also to make it easier to export data such as slices and textboxes, from one game to another without conversions. And I agree that we need to use a character set which differs more or less from Unicode, but I wanted to use the same codepoints as Unicode when possible to avoid needing to convert. I wanted to let people add pages of glyphs in their font for whatever (Unicode) codepoints they want. I also thought about adding extra precomposed characters to avoid combining characters (there are unofficial assignments of some precomposed characters to Private Use Areas) but it seems like the sort of feature no one will ever use. We already have icons that exist outside Unicode, I was going to put these in the C1 block and 0xE000 PUA, plus allow people to keep using 0xA1-0xFF for icons if they want.
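The figure of 8256 can be checked directly: there are 128 single-byte codes, plus one code for each byte pair (a, b) with 128 <= a < b <= 255, of which there are 128*127/2 = 8128. The Python sketch below counts them and shows one possible mapping. The lexicographic ordering of pairs and the names `encode_char` and `decode_char` are assumptions of the sketch, since the comment above does not say how pairs would be assigned.

```python
from itertools import combinations

# All byte pairs (a, b) with 128 <= a < b <= 255, in lexicographic order.
PAIRS = list(combinations(range(128, 256), 2))
PAIR_INDEX = {pair: n for n, pair in enumerate(PAIRS)}

# 128 single-byte codes plus one code per pair.
CAPACITY = 128 + len(PAIRS)

def encode_char(code):
    """Encode a character number in range(CAPACITY) as one or two bytes."""
    if not 0 <= code < CAPACITY:
        raise ValueError("code out of range")
    if code < 128:
        return bytes([code])
    return bytes(PAIRS[code - 128])

def decode_char(data):
    """Inverse of encode_char for a single encoded character."""
    if len(data) == 1 and data[0] < 128:
        return data[0]
    return 128 + PAIR_INDEX[(data[0], data[1])]
```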
I am planning to translate from Unicode codepoints to the actual character pages+char indices that exist in the font at the point that a string is drawn to the screen, rather than storing them in-memory in that way. --TMC (talk) 18:15, 20 October 2021 (PDT)
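A draw-time lookup of that kind, with fallback between fonts and '?' for missing glyphs, could be sketched in Python like this. The 256-character page size, the dict-of-sets font model, and the name `locate_glyph` are assumptions for illustration, not the engine's actual data structures.

```python
PAGE_SIZE = 256  # assumed page size; the real font format may differ

def locate_glyph(codepoint, fonts):
    """Find a glyph for a Unicode code point in a chain of fonts.

    Each font is modelled as a dict mapping page number -> set of character
    indices present on that page. Returns (font_number, page, index), or
    None if no font in the chain has the glyph (the caller would draw '?').
    """
    page, index = divmod(codepoint, PAGE_SIZE)
    for font_number, font in enumerate(fonts):
        if index in font.get(page, ()):
            return (font_number, page, index)
    return None

# Hypothetical fonts: an editor font with printable ASCII only, and a game
# font that also has the page covering the Unicode Hiragana block.
editor_font = {0: set(range(32, 127))}
game_font = {0: set(range(32, 127)), 0x30: set(range(0x41, 0x97))}
```

Here a hiragana character would be missed by the editor font and found in the game default font, while a kanji would fall through both and be drawn as '?'.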
(reset indent below)

You mention some things, and this will take a while for me to figure out all of my replies. So, I will write some now, and may write some more later once I think of it. (I am also writing in paragraphs this time.)

I fail to see why you would want to use characters that aren't in the font. However, about sharing fonts and exporting slices/textboxes, those do seem like good reasons to have a universal character set (although there will still need to be a private area for characters not in the set).

Maybe 8256 characters might be enough; it is more than JIS X 0208, but less than JIS X 0213, so I don't know; but I would think that 65536 characters is probably more than enough. There will be both characters in Unicode undesirable for OHRRPGCE (such as combining characters, and probably complex scripts) and characters that would need to be added (including some icons and some precomposed characters not present in Unicode). Some characters may need encoding as multiple Unicode code points, or a character might be ambiguous in some cases. There are also characters in Unicode that are the same as each other but semantically different and shouldn't need to be different (e.g. Greek uppercase Alpha), etc. (I really believe that no one character set or encoding can ever be suitable for all uses, no matter how much you try.) Because of this, using the same code point numbers as Unicode might be unnecessary (except ASCII, which is the same as Unicode and should use the same code point numbers).

Perhaps the characters to encode might include:

  • ASCII (with the same ASCII numbers; it already is)
  • Characters in PC character set and ISO-8859-1 character set (many of these overlap each other and below things; they should not be duplicated)
  • Latin (including the non-Latin letters used in some Germanic languages, such as "þ")
  • Cyrillic
  • Greek
  • Runes
  • Japanese (including hiragana, katakana, kanji, and punctuation)
  • Common icons used in OHRRPGCE
  • Some lowercase letters with apostrophe before (some of these are used in Pokemon)
  • Accented letters (some are present in Unicode; some aren't)
  • Other precomposed characters (some of which might be present in Unicode; others might not be)
  • Private area
  • Unassigned area

(I suppose, the above (or a variant of this) is then the "RPG game character set") However, I am unsure. Maybe Unicode does help, but maybe not. (There are advantages and disadvantages either way.)

There is also the question of what font sizes will be supported, and if they will be fixed, variable, or narrow/wide. In the case of variable, letters preceded by apostrophes might not need their own code points. In the case of narrow/wide, Unicode is especially bad; some widths may be ambiguous (especially private characters), and they may change with newer versions of Unicode, meaning the program won't work properly unless you use a character set and encoding meant to support this use (one thing I had partially designed some time ago is such a character set and encoding specifically for narrow/wide fix pitch text without complex rendering). Fixed is probably simplest, though.
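The width ambiguity can be seen in the East Asian Width property that Unicode assigns to each code point, which Python exposes through the standard `unicodedata` module. In the sketch below, "Na" means narrow, "W" means wide, and "A" means ambiguous (the width depends on context). The choice of sample characters is mine.

```python
import unicodedata

samples = {
    "ascii_a": "a",
    "hiragana_a": "\u3042",
    "greek_alpha": "\u03b1",
    "private_use": "\ue000",
}

# Look up the East Asian Width class of each sample character.
widths = {name: unicodedata.east_asian_width(ch) for name, ch in samples.items()}
```

Both the Greek letter and the private-use character are classed as ambiguous, so a narrow/wide fixed-pitch renderer cannot take their width from Unicode alone.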

Now that I think more about it, I think that I may very well be wrong about many of the things in this message (but am still very unsure).

--Zzo38 (talk) 21:49, 20 October 2021 (PDT)