Difference between revisions of "StringCodes"

From GRFSpecs

Revision as of 16:58, 15 August 2009

Description of characters in TTD's character set

String Codes

Texts in TTD are mostly in the Latin-1 (ISO-8859-1) character set (except when using UTF-8 encoding; see below); however, a few characters differ from Latin-1, and some characters have special meaning. These are explained in the following table.

||Range, hex|Meaning

00..1F|Control characters, unused except for the following:

|01 X offset in next byte of string (variable space)

|0D New line

|0E Set small font size

|0F Set large font size

|1F X and Y offsets in next two bytes of string

20..7A| Latin-1 characters, from space " " up to lower case "z"

7B..87| Formatting instructions, all take their argument from the stack if not otherwise specified

|7B Print dword

|7C Print signed word

|7D Print signed byte

|7E Print unsigned word

|7F Print dword in currency units

|80 Print substring (text ID from stack)

|81 Print substring (text ID in next 2 bytes of string)

|82 Print date (day, month, year)

|83 Print month and year

|84 Print signed word in speed units

|85 Discard next word from stack

|86 Rotate down top 4 words on stack

|87 Print signed word in litres

88..98|Colour codes

|88 Blue

|89 Light Gray

|8A Light Orange ("Gold")

|8B Red

|8C Purple

|8D Gray-Green

|8E Orange

|8F Green

|90 Yellow

|91 Light Green

|92 Red-Brown

|93 Brown

|94 White

|95 Light Blue

|96 Dark Gray

|97 Mauve (grayish purple)

|98 Black

99|Switch to company colour that follows in next byte (enabled by enhancegui)

9A|Extended format code in next byte:

|00 -or- 01 Display 64-bit value from stack in currency units

|02 Ignore next colour byte. Multiple instances will skip multiple colour bytes.

|03 WORD Push WORD onto the textref stack

|04 BYTE Un-print the previous BYTE characters.

|05 For internal use only. Not valid in GRF files.

|06 Print byte in hex (since TTDPatch r2007)

|07 Print word in hex (since TTDPatch r2007)

|08 Print dword in hex (since TTDPatch r2007)

|09 For internal use only. Usage in NewGRFs will most likely crash TTDPatch. (since TTDPatch r2128)

|0A For internal use only. Usage in NewGRFs will most likely crash TTDPatch. (since TTDPatch r2128)

|11 Print 64-bit value in hex (since TTDPatch r2178)

|12 Print name of station with id in next textrefstack word (since TTDPatch r2178)

9B..9D|Reserved

9E..FF|Latin-1 characters, except for the following:

|9E Euro character "€"

|9F Capital Y umlaut "Ÿ"

|A0 Scroll button up

|AA Scroll button down

|AC Tick mark

|AD X mark

|AF Scroll button right

|B4 Train symbol

|B5 Truck symbol

|B6 Bus symbol

|B7 Plane symbol

|B8 Ship symbol

|B9 Superscript -1

|BC Small scroll button up

|BD Small scroll button down||

The formatting instructions must only be used in strings that expect them, and then only in the expected order (with the possible exception of code 86, which shuffles the internal stack). When used improperly, they will most likely crash TTD. Code 81 is always safe to use (provided that the referenced text ID uses no unsafe formatting instructions either), and inserts the given text ID; e.g. "\81\3D\A0" inserts text ID A03D, "\98Refit Aircraft". As this example shows, the text ID is stored low byte first. Note, however, that if you want to include an ID such as D000 or D400, the 00 byte will be treated as the end of the string, which breaks the action 4 if additional texts are supposed to follow. DCxx IDs must not be included; neither code 80 nor code 81 accesses DCxx IDs correctly.
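The byte layout of a code 81 reference can be sketched in a few lines of Python. This is only an illustration of the rules in the paragraph above; the names `embed_text_id` and `has_zero_byte` are hypothetical helpers and not part of any GRF tool.

```python
def has_zero_byte(text_id: int) -> bool:
    """True if either byte of the 16-bit text ID is 00.

    Such a byte would be read as the end of the string, which matters
    when further texts are supposed to follow in the same action 4.
    """
    return (text_id & 0xFF) == 0 or (text_id >> 8) == 0


def embed_text_id(text_id: int) -> bytes:
    """Return code 81 followed by the text ID, low byte first."""
    if not 0 <= text_id <= 0xFFFF:
        raise ValueError("text ID must fit in 16 bits")
    if (text_id >> 8) == 0xDC:
        # Neither code 80 nor code 81 accesses DCxx IDs correctly.
        raise ValueError("DCxx IDs must not be included")
    return bytes([0x81, text_id & 0xFF, text_id >> 8])
```

Calling `embed_text_id(0xA03D)` yields the three bytes 81 3D A0 from the example, while `has_zero_byte(0xD000)` flags the problematic case described above.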

Each formatting instruction removes its argument from the stack, so that the next one receives the following bytes as its argument. Code 86 takes the top four words from the stack, let's call them W1 through W4, and reorders them as W4 W1 W2 W3. This is used for languages in which industries or stations should be named not "Flinfingbury Power Plant" but "Power Plant Flinfingbury".
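As a rough model of code 86, the reordering can be written as a list operation. The sketch below assumes the stack is a Python list of words with the top of the stack at index 0, so that W1 is the word the next formatting instruction would consume; that orientation and the name `rotate_top_four` are assumptions made for illustration.

```python
def rotate_top_four(stack: list) -> list:
    """Reorder the top four words W1 W2 W3 W4 as W4 W1 W2 W3.

    The top of the stack is assumed to be index 0; words below the
    top four are left untouched.
    """
    if len(stack) < 4:
        raise ValueError("code 86 needs at least four words on the stack")
    return [stack[3], *stack[:3], *stack[4:]]
```

With `["W1", "W2", "W3", "W4", "W5"]` as input, the result is `["W4", "W1", "W2", "W3", "W5"]`: the fourth word moves to the front so it is printed first.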

UTF-8 support

Since 2.0.1 alpha 68, TTDPatch supports UTF-8 encoded input strings. Use action 12 to define glyphs for the characters which do not exist in TTD's .grf files (possible since 2.0.1 alpha 73).

To indicate that a given string is in UTF-8 encoding, start it with a capital thorn (U+00DE, "Þ"), encoded in UTF-8 as usual with the bytes C3 9E.  Everything in that string is then assumed to be in UTF-8 encoding, with the following exception: if characters appear that are not valid UTF-8 sequences, they are assumed to be one of the above control codes. This way, it is still possible to write, e.g. "ÞCapacity: " 87 "litres", without encoding the 87 in UTF-8 (which would instead refer to a character installed at codepoint U+0087).

In addition, this allows using the non-Unicode characters 9E, 9F, A0, AA, AC, AD, AF, B4..B9, BC and BD from the above list, which when encoded with UTF-8 would refer to their respective Unicode characters instead. To use the TTD characters, simply do not encode them using UTF-8 but enter them directly as bytes. This causes them to be an invalid UTF-8 sequence and allows TTDPatch to use the correct symbol from TTD's fonts.

Alternatively, these symbols (in fact, the TTD character set from 20 to FF) are also mapped into the Unicode Private Use Area at U+E0xx, so to encode the truck symbol, you may use character U+E0B5 as well, although this will probably be an unprintable character in most text editors.

Finally, characters 7B..7F no longer function as the above formatting instructions, but will display regular glyphs instead (provided they are installed; by default TTD has none at these codepoints). Instead, to use these formatting instructions in UTF-8 mode, you need to use their Private Use Area codepoint at U+E0xx.
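The Private Use Area mapping lends itself to a small encoding helper. The following Python sketch builds a UTF-8 mode string in which TTD codes are written as their U+E0xx codepoints; the function names and the convention of passing codes as integers are hypothetical, chosen only for this example.

```python
THORN = "\u00de"  # capital thorn, marks the string as UTF-8 encoded


def pua(code: int) -> str:
    """Map a TTD character or code in 20..FF to its U+E0xx codepoint."""
    if not 0x20 <= code <= 0xFF:
        raise ValueError("only codes 20..FF are mapped to U+E0xx")
    return chr(0xE000 + code)


def utf8_string(*parts) -> bytes:
    """Join text (str) and TTD codes (int) into a UTF-8 mode string."""
    body = "".join(pua(p) if isinstance(p, int) else p for p in parts)
    return (THORN + body).encode("utf-8")
```

For instance, `utf8_string("Capacity: ", 0x87, "litres")` starts with the bytes C3 9E and encodes the litres instruction as U+E087 rather than as a raw 87 byte. The same helper covers 7B..7F, which have no raw-byte form in UTF-8 mode.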

Basically there are three possibilities:

  1. Characters U+E020..U+E0FF in the E0xx Private Use Area do what their respective character xx would do in TTD, as do the control characters below U+0020
  2. All other valid UTF-8 sequences display actual glyphs, if they are available
  3. All invalid UTF-8 sequences do what their individual bytes would do in TTD
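
The three possibilities can be expressed as a small classifier. This Python sketch is not TTDPatch's actual parser: it ignores argument bytes that follow codes such as 81 or 1F, relies on Python's strict UTF-8 decoder to decide what counts as a valid sequence, and uses the made-up labels "ttd" and "glyph".

```python
def _seq_len(lead: int) -> int:
    """Length of the UTF-8 sequence a lead byte announces, 0 if none."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


def classify(data: bytes) -> list:
    """Split a UTF-8 mode string into ("ttd", code) and ("glyph", codepoint)."""
    if not data.startswith(b"\xc3\x9e"):
        raise ValueError("string does not start with a capital thorn")
    out, i = [], 2
    while i < len(data):
        n = _seq_len(data[i])
        try:
            cp = ord(data[i:i + n].decode("utf-8")) if n else None
        except UnicodeDecodeError:
            cp = None
        if cp is None:
            # Possibility 3: invalid UTF-8, the single byte acts as in TTD.
            out.append(("ttd", data[i]))
            i += 1
            continue
        if 0xE020 <= cp <= 0xE0FF:
            out.append(("ttd", cp - 0xE000))  # possibility 1, Private Use Area
        elif cp < 0x20:
            out.append(("ttd", cp))           # possibility 1, control character
        else:
            out.append(("glyph", cp))         # possibility 2, regular glyph
        i += n
    return out
```

Feeding it the thorn prefix followed by a raw AC byte gives `[("ttd", 0xAC)]` (the tick mark), whereas the bytes C2 AC give `[("glyph", 0xAC)]` (the Unicode not sign).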

To summarize, here's a handy table:

||Character|Encoding in UTF-8 mode|Meaning

7E|7E|Unicode Character 'TILDE' (~)

82|82 (invalid UTF-8)|Print date (day, month, year)

82|C2 82|Display glyph for U+0082

AC|AC (invalid UTF-8)|Tick mark

AC|C2 AC|Unicode Character 'NOT SIGN' (¬)

E07E|EE 81 BE|Print unsigned word

E082|EE 82 82|Print date (day, month, year)

E0AC|EE 82 AC|Tick mark||