StringCodes

From GRFSpecs
Revision as of 19:23, 12 June 2011 by Orudge (talk | contribs) (13 revisions)

Description of characters in TTD's character set

String Codes

Texts in TTD are mostly in the Latin-1 (ISO-8859-1) character set (except when using UTF-8 encoding; see below); however, a few characters are different. Also, some characters have special meaning. These are explained in the following table.

||Range, hex|Meaning

00..1F|Control characters, unused except for the following:

|01 X offset in next byte of string (variable space)

|0D New line

|0E Set small font size

|0F Set large font size

|1F X and Y offsets in next two bytes of string

20..7A| Latin-1 characters, from space " " up to lower case "z"

7B..87| Formatting instructions, all take their argument from the stack if not otherwise specified

|7B Print dword

|7C Print signed word

|7D Print signed byte

|7E Print unsigned word

|7F Print dword in currency units

|80 Print substring (text ID from stack)

|81 Print substring (text ID in next 2 bytes of string)

|82 Print date (day, month, year)

|83 Print month and year

|84 Print signed word in speed units

|85 Discard next word from stack

|86 Rotate down top 4 words on stack

|87 Print signed word in litres

88..98|Colour codes

|88 Blue

|89 Light Gray

|8A Light Orange ("Gold")

|8B Red

|8C Purple

|8D Gray-Green

|8E Orange

|8F Green

|90 Yellow

|91 Light Green

|92 Red-Brown

|93 Brown

|94 White

|95 Light Blue

|96 Dark Gray

|97 Mauve (grayish purple)

|98 Black

99|Switch to company colour that follows in next byte (enabled by enhancegui)

9A|Extended format code in next byte:

|00 -or- 01 Display 64-bit value from stack in currency units

|02 Ignore next colour byte. Multiple instances will skip multiple colour bytes.

|03 WORD Push WORD onto the textref stack

|04 BYTE Un-print the previous BYTE characters.

|05 For internal use only. Not valid in GRF files.

|06 Print byte in hex (since TTDPatch r2007)

|07 Print word in hex (since TTDPatch r2007)

|08 Print dword in hex (since TTDPatch r2007)

|09 For internal use only. Usage in NewGRFs will most likely crash TTDPatch. (since TTDPatch r2128)

|0A For internal use only. Usage in NewGRFs will most likely crash TTDPatch. (since TTDPatch r2128)

|0B Print 64-bit value in hex (since TTDPatch r2178)

|0C Print name of station with id in next textrefstack word (since TTDPatch r2178)

|0D Print signed word in tonnes (since OpenTTD r21086)

|0E Set gender of string, NewGRF internal ID in next byte. Must be first in a string (since OpenTTD r21209) (*)

|0F Select case for next substring, NewGRF internal ID in next byte (since OpenTTD r21209) (*)

|10 Begin choice list value, NewGRF internal ID in next byte (since OpenTTD r21211) (**)

|11 Begin choice list default (since OpenTTD r21211) (**)

|12 End choice list (since OpenTTD r21211) (**)

|13 Begin gender choice list, stack offset of substring to get gender from in next byte (since OpenTTD r21211) (**)

|14 Begin case choice list (since OpenTTD r21211) (**)

|15 Begin plural choice list, stack offset of value to get plural for in next byte (since OpenTTD r21216) (***)

9B..9D|Reserved

9E..FF|Latin-1 characters, except for the following:

|9E Euro character "€"

|9F Capital Y umlaut "Ÿ"

|A0 Scroll button up

|AA Scroll button down

|AC Tick mark

|AD X mark

|AF Scroll button right

|B4 Train symbol

|B5 Truck symbol

|B6 Bus symbol

|B7 Plane symbol

|B8 Ship symbol

|B9 Superscript -1

|BC Small scroll button up

|BD Small scroll button down||

The formatting instructions must not be used except in strings that expect them, and then they may not be out of order (with the possible exception of code 86 shuffling the internal stack). When used improperly, they will most likely crash TTD. Code 81 is always safe to use (provided that the referenced text ID uses no unsafe formatting instructions either), and will insert the given text ID (e.g. "\81\3D\A0" will insert text ID A03D, "\98Refit Aircraft"). Note however that if you want to include e.g. ID D000/D400, the 00 byte will be considered the end of string, and this will therefore break if additional texts are supposed to follow in the action 4. DCxx IDs must not be included; neither codes 80 nor 81 correctly access DCxx IDs.
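As a minimal sketch of the byte layout described above, the following Python helper builds a code 81 reference with the text ID stored low byte first. The function name is hypothetical and not part of any GRF tool. It is deliberately conservative: it rejects every ID containing a 00 byte, although the text above only says this breaks when further texts follow in the same action 4.

```python
def encode_substring_ref(text_id: int) -> bytes:
    """Build the 3-byte sequence for code 81: 81, low byte, high byte."""
    if not 0 <= text_id <= 0xFFFF:
        raise ValueError("text ID must fit in one word")
    low, high = text_id & 0xFF, text_id >> 8
    if high == 0xDC:
        raise ValueError("DCxx IDs must not be referenced with code 81")
    if low == 0x00 or high == 0x00:
        # A 00 byte would be read as the end of the string.
        raise ValueError("text ID contains a 00 byte")
    return bytes([0x81, low, high])


print(encode_substring_ref(0xA03D).hex(" "))  # 81 3d a0
```

This reproduces the "\81\3D\A0" sequence for text ID A03D mentioned above.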

Each formatting instruction removes its argument from the stack, so that the next one receives the following bytes as its arguments. Code 86 takes the top four words from the stack, call them W1 through W4, and reorders them as W4 W1 W2 W3. This is used for languages in which industries or stations should be named not "Flinfingbury Power Plant" but "Power Plant Flinfingbury".
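The reordering done by code 86 can be illustrated with a short Python sketch. The list representation, with index 0 as the top of the stack, and the function name are assumptions made for illustration only.

```python
def rotate_down(stack: list) -> list:
    """Apply code 86: the top four words W1 W2 W3 W4 become W4 W1 W2 W3."""
    if len(stack) < 4:
        raise ValueError("code 86 needs at least four words on the stack")
    w1, w2, w3, w4 = stack[:4]
    # Words below the top four are left untouched.
    return [w4, w1, w2, w3] + stack[4:]


print(rotate_down(["W1", "W2", "W3", "W4", "W5"]))
# ['W4', 'W1', 'W2', 'W3', 'W5']
```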

(*) Maps a NewGRF internal gender or case ID to an OpenTTD gender or case. The internal ID is resolved to the appropriate OpenTTD gender or case at load time by means of the mapping. The first internal ID in the mapping that matches the ID from the string and has an existing OpenTTD gender or case is taken, i.e. the list of mappings is filtered by internal ID and existence of the OpenTTD gender/case, and then the top element is used. When the gender or case ID is not known, or there is no existing OpenTTD gender or case with the mapped names, the whole mapping is ignored and the default gender or case is taken.

-=Example=-

~pp~// Gender translation table

// Current OpenTTD German translation uses m, w, n and p but

// support a (fictitious) previous version that used masculine,

// feminine, neuter and plural as gender names.

 0 * 56     00 08 01 01 02

    13

       01 "m" 00

       01 "masculine" 00

       02 "w" 00

       02 "feminine" 00

       03 "n" 00

       03 "neuter" 00

       04 "p" 00

       04 "plural" 00

       00

// Brauerei is a feminine noun in German; this sets its gender to feminine.

 1 * 40     04 0A 82 01 73 DC C3 9E 9A 0E 02 "Brauerei" 00

~/pp~

In this case OpenTTD would look for NewGRF internal ID 2 in the gender table. This would yield "w" and "feminine" as OpenTTD gender names. In current OpenTTD this would match "w"; in the fictitious older version of OpenTTD it would match "feminine".
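The lookup described for (*) can be sketched in Python as follows. This is an illustration of the filtering rule only, not OpenTTD's actual code; the function name and the list-of-pairs representation of the translation table are assumptions.

```python
def resolve_gender(internal_id, mapping, known_genders):
    """Return the first mapped name for internal_id that OpenTTD knows, else None."""
    for mapped_id, name in mapping:
        if mapped_id == internal_id and name in known_genders:
            return name
    return None  # the caller would fall back to the default gender


# The translation table from the example above, as (internal ID, name) pairs.
MAPPING = [(1, "m"), (1, "masculine"), (2, "w"), (2, "feminine"),
           (3, "n"), (3, "neuter"), (4, "p"), (4, "plural")]

print(resolve_gender(2, MAPPING, {"m", "w", "n", "p"}))  # w
print(resolve_gender(2, MAPPING, {"masculine", "feminine", "neuter", "plural"}))  # feminine
```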

(**) Maps an OpenTTD gender or case to the NewGRF internal gender or case ID. The mapping is resolved at load time by going through all cases or genders that OpenTTD's translation knows and mapping these to NewGRF internal IDs. This happens by filtering the mapping on the gender or case name and then using the NewGRF internal ID of the top element. If no mapping is found, the default choice list item is chosen.

The choice list string codes are related and must be used in a specific manner:

Genders: 9A 13 <offset> (9A 10 <index> <string>)+ 9A 11 <default> 9A 12

Cases: 9A 14 (9A 10 <index> <string>)+ 9A 11 <default> 9A 12

Plurals: 9A 15 <offset> (9A 10 <index> <string>)+ 9A 11 <default> 9A 12

The offset is the stack location of the substring/value you want to get the gender/plural for. This is the real offset plus 80, i.e. an offset of 0 becomes 80 and an offset of 1 becomes 81 in the NFO.
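To make the layout and the "offset plus 80" rule concrete, here is a small Python sketch that assembles the bytes of a gender choice list. The helper name is hypothetical, Latin-1 text is assumed for the substrings, and the 0..7F offset limit is an assumption derived only from the result having to fit in one byte.

```python
def gender_choice_list(offset, choices, default):
    """Encode 9A 13 <offset> (9A 10 <index> <string>)+ 9A 11 <default> 9A 12."""
    if not 0 <= offset <= 0x7F:
        raise ValueError("offset + 0x80 must fit in one byte")
    out = bytearray([0x9A, 0x13, 0x80 + offset])  # stored offset = real offset + 80 (hex)
    for index, text in choices:
        out += bytes([0x9A, 0x10, index]) + text.encode("latin-1")
    out += bytes([0x9A, 0x11]) + default.encode("latin-1")
    out += bytes([0x9A, 0x12])
    return bytes(out)


encoded = gender_choice_list(0, [(1, "er"), (3, "as")], "ie")
print(encoded.hex(" "))
```

For a plural choice list the same structure applies with 15 in place of 13; a case choice list uses 14 and has no offset byte.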

-=Example=-

~pp~// Assuming the translation table of the previous example

// A string with a gender choice list and a stack item that gets resolved

 2 * 29     04 0A FF 01 1A DC "D" 9A 13 80 9A 10 1 "er" 9A 10 3 "as" 9A 11 "ie" 9A 12 " " 80 00

~/pp~

Imagine the "Brauerei" from the previous example being, as substring, on the stack. Then this string would resolve to "Die Brauerei".

What happens in OpenTTD is that whenever the "begin gender choice list" string code is found, it resolves the string at the given stack location. The first character of that resolved string is compared to the "set gender" string code, and if it matches, the (mapped) OpenTTD gender is retrieved. When there is no "set gender" string code, the first OpenTTD gender is used. After resolving the OpenTTD gender, that gender is reverse mapped to a NewGRF internal ID. If that NewGRF internal ID exists in one of the "choice list values", that (sub)string is taken (up to the next choice list value/default). If there is no reverse mapping, the string at the "choice list default" string code is used, up to the "end choice list" string code. Further processing of the string happens after the choice list, i.e. the (sub)strings in the choice list may not contain any special string codes except colour codes.

Case choice lists work in a similar manner, except that instead of resolving a case from a (sub)string we "are" the substring; the string that includes this substring has set a case using the "select case" string code. As such no offset has to be given to the choice list, but the rest works in the same way as gender choice lists.

(***) The plural list works like a gender list, however you have to choose one "mapping" from value to plural index by setting the plural form using Action0GeneralVariables property 15.

If, for example, plural form 0 is chosen using the Action0GeneralVariables property 15, then there are 2 plural indices. If the value at the stack with the given offset equals 1 you get plural index 1, otherwise plural index 2. These plural indices are the indices that are used in the choice lists.

||Plural form|Plural index|Description

0|Two forms:

|1|1

|2|rest

1|Only one form:

|1|every form

2|Two forms:

|1|0 or 1

|2|rest

3|Three forms:

|1|ending in 1, but not ending in 11

|2|0

|3|rest

4|Five forms:

|1|1

|2|2

|3|3-6

|4|7-10

|5|rest

5|Three forms:

|1|ending in 1, but not ending in 11

|2|ending in 2-9, but not ending in 1[2-9]

|3|rest

6|Three forms:

|1|ending in 1, but not ending in 11

|2|ending in 2-4, but not ending in 1[2-4]

|3|rest

7|Three forms:

|1|0

|2|ending in 2-4, but not ending in 1[2-4]

|3|rest

8|Four forms:

|1|ending in 01

|2|ending in 02

|3|ending in 03 or ending in 04

|4|rest

9|Two forms:

|1|ending in 1, but not ending in 11

|2|rest

10|Three forms:

|1|1

|2|2-4

|3|rest

11|Two forms:

|1|ending in 0, 1, 3, 6, 7 and 8

|2|ending in 2, 4, 5 and 9

12|Four forms:

|1|1

|2|0 or ending in 02-10

|3|ending in 11-19

|4|rest||

-=Example=-

~pp~// Plural form selection

// Set the plural type to type 0

 0 * 7     00 08 01 01 02 15 00

// In case of the first stack item being 1 use "Tonne", otherwise use "Tonnen"

 1 * 34     04 0B 82 01 1A DC C3 9E "\UE07C Tonne" 9A 15 80 9A 10 01 "" 9A 11 "n" 9A 12 " Sand" 00

~/pp~
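The value-to-index mapping can be sketched in Python for a few of the plural forms. This is a transcription of the table rows for forms 0, 2, 6 and 10 only, assuming non-negative values; the function names are hypothetical and the other forms are deliberately left out.

```python
def plural_index(form: int, n: int) -> int:
    """Return the 1-based plural index for n, for a few forms of the table."""
    if form == 0:                       # two forms: 1 / rest
        return 1 if n == 1 else 2
    if form == 2:                       # two forms: 0 or 1 / rest
        return 1 if n in (0, 1) else 2
    if form == 6:                       # three forms
        if n % 10 == 1 and n % 100 != 11:
            return 1
        if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
            return 2
        return 3
    if form == 10:                      # three forms: 1 / 2-4 / rest
        if n == 1:
            return 1
        return 2 if 2 <= n <= 4 else 3
    raise NotImplementedError("form not covered by this sketch")


def tonnes(n: int) -> str:
    # Mirrors the example: index 1 selects "", the default selects "n".
    suffix = "" if plural_index(0, n) == 1 else "n"
    return f"{n} Tonne{suffix} Sand"


print(tonnes(1))  # 1 Tonne Sand
print(tonnes(5))  # 5 Tonnen Sand
```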

UTF-8 support

Since 2.0.1 alpha 68, TTDPatch supports UTF-8 encoded input strings. Use action 12 to define glyphs for the characters which do not exist in TTD's .grf files (possible since 2.0.1 alpha 73).

To indicate that a given string is in UTF-8 encoding, start it with a capital thorn (U+00DE, "Þ"), encoded in UTF-8 as usual with the bytes C3 9E.  Everything in that string is then assumed to be in UTF-8 encoding, with the following exception: if characters appear that are not valid UTF-8 sequences, they are assumed to be one of the above control codes. This way, it is still possible to write, e.g. "ÞCapacity: " 87 "litres", without encoding the 87 in UTF-8 (which would instead refer to a character installed at codepoint U+0087).

In addition, this allows using the non-Unicode characters 9E, 9F, A0, AA, AC, AD, AF, B4..B9, BC and BD from the above list, which when encoded with UTF-8 would refer to their respective Unicode characters instead. To use the TTD characters, simply do not encode them using UTF-8 but enter them directly as bytes. This causes them to be an invalid UTF-8 sequence and allows TTDPatch to use the correct symbol from TTD's fonts.

Alternatively, these symbols (in fact, the TTD character set from 20 to FF) are also mapped into the Unicode Private Use Area at U+E0xx, so to encode the truck symbol, you may use character U+E0B5 as well, although this will probably be an unprintable character in most text editors.
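As a quick illustration of the Private Use Area mapping, the following Python sketch computes the UTF-8 bytes for a TTD character xx by encoding U+E0xx. The function name is hypothetical.

```python
def ttd_char_to_utf8(code: int) -> bytes:
    """Encode TTD character xx (20..FF) as the UTF-8 bytes of U+E0xx."""
    if not 0x20 <= code <= 0xFF:
        raise ValueError("only TTD characters 20..FF are mapped to U+E0xx")
    return chr(0xE000 + code).encode("utf-8")


print(ttd_char_to_utf8(0xB5).hex(" "))  # ee 82 b5 (truck symbol)
```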

Finally, characters 7B..7F no longer function as the above formatting instructions, but will display regular glyphs instead (provided they are installed; by default TTD has none at these codepoints). Instead, to use these formatting instructions in UTF-8 mode, you need to use their Private Use Area codepoint at U+E0xx.

Basically there are three possibilities:

  1. Characters U+E020..U+E0FF  in the E0xx Private Use Area do what their respective character xx would do in TTD, as do the control characters below U+0020
  2. All other valid UTF-8 sequences display actual glyphs, if they are available
  3. All invalid UTF-8 sequences do what their individual bytes would do in TTD

To summarize, here's a handy table:

||Character|Encoding in UTF-8 mode|Meaning

7E|7E|Unicode Character 'TILDE' (~)

82|82 (invalid UTF-8)|Print date (day, month, year)

82|C2 82|Display glyph for U+0082

AC|AC (invalid UTF-8)|Tick mark

AC|C2 AC|Unicode Character 'NOT SIGN' (¬)

E07E|EE 81 BE|Print unsigned word

E082|EE 82 82|Print date (day, month, year)

E0AC|EE 82 AC|Tick mark||
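The three possibilities can be sketched as a small Python classifier for the bytes that follow the leading thorn. This is only an illustration of the rules above, not TTDPatch's decoder: the function name and the ("ttd", code) / ("glyph", codepoint) tuples are assumptions, and argument bytes that follow codes such as 81 or 9A are not treated specially.

```python
def classify_utf8_mode(data: bytes) -> list:
    """Classify each unit of a UTF-8 mode string as a TTD code or a glyph."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            length = 1
        elif 0xC2 <= lead <= 0xDF:
            length = 2
        elif 0xE0 <= lead <= 0xEF:
            length = 3
        elif 0xF0 <= lead <= 0xF4:
            length = 4
        else:
            length = 0  # stray continuation byte or invalid lead byte
        cp = None
        if length:
            try:
                cp = ord(data[i:i + length].decode("utf-8"))
            except UnicodeDecodeError:
                cp = None  # truncated or malformed sequence
        if cp is None:
            out.append(("ttd", lead))          # invalid UTF-8: raw TTD byte
            i += 1
        elif cp < 0x20 or 0xE020 <= cp <= 0xE0FF:
            out.append(("ttd", cp & 0xFF))     # control char or U+E0xx mapping
            i += length
        else:
            out.append(("glyph", cp))          # ordinary glyph
            i += length
    return out


print(classify_utf8_mode(b"\x82\xc2\x82\xee\x82\x82"))
# [('ttd', 130), ('glyph', 130), ('ttd', 130)]
```

The three entries correspond to the 82, C2 82 and EE 82 82 rows of the table.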