Conversion Tables

When opening a file with Simredo, you generally convert it from its original format (encoding) to Simredo's internal format (Unicode), and when saving it, you convert back to the original format. However, you can choose a different format for saving. By doing this, you can convert between various formats.
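Simredo's own source is not part of this page. As a rough illustration of the open-in-one-format, save-in-another idea, here is a minimal Java sketch using the standard `java.nio.charset` API. The class name `ReEncode` and its command-line arguments are hypothetical. Note that Java replaces malformed input and unmappable characters with substitutes rather than failing.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReEncode {
    // "Open": decode bytes in the original encoding into a Java String (Unicode).
    // "Save": encode that String into the chosen target encoding.
    static byte[] convert(byte[] input, Charset from, Charset to) {
        String text = new String(input, from);
        return text.getBytes(to);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 4) {
            System.err.println("usage: ReEncode <in> <from-charset> <out> <to-charset>");
            return;
        }
        byte[] in = Files.readAllBytes(Path.of(args[0]));
        byte[] out = convert(in, Charset.forName(args[1]), Charset.forName(args[3]));
        Files.write(Path.of(args[2]), out);
    }
}
```

For example, `java ReEncode notes.txt UTF-8 notes16.txt UTF-16BE` would rewrite a UTF-8 file as UTF-16 Big Endian.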

Another way of doing conversions is by using Simredo's Conversion Table feature. This feature allows you to easily define your own conversions.


How to Use Conversion Tables

Select Conversion from the 'Other' menu. A dialogue window similar to the one at right will appear.

There are three pull-down menus. The one at the top right selects the conversion table; in the screenshot shown here, the Vietnamese table is selected. The second and third menus (From and To) select the formats to convert from and to, respectively.

VIRQ, which is shown selected in the From menu, is a standard method for representing Vietnamese accents when only ASCII characters are available. One or more ASCII symbols, such as ^, ~, +, etc., are placed after a vowel to indicate diacritical marks (e.g., e^' for ế), and dd stands for đ. Suppose a text file contains the following:

Ca'i ne^'t dda'nh che^'t kho^ng chu+`a.

Clicking on the Convert button will convert it to Unicode:

Cái nết đánh chết không chừa.

How to Create Conversion Tables

The Vietnamese conversions are defined in a file called Vietnamese.kon. This file contains a line corresponding to VIRQ and another corresponding to Unicode, as shown below. (The actual lines in the file are very long, so only the first few characters are shown here.)

VIRQ,A^',a^',A^`,a^`,A^?,a^?,A^~,a^~,A^.,a^.,A(',a(',A(`,a(`,

Unicode,Ấ,ấ,Ầ,ầ,Ẩ,ẩ,Ẫ,ẫ,Ậ,ậ,Ắ,ắ,Ằ,ằ,

When the conversion function is executed, each letter sequence in the VIRQ list is converted to the corresponding character in the Unicode list (except for the first item, which is the name of the list).
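The page does not say how Simredo scans the text, so the following Java sketch is only one plausible approach: a left-to-right scan that tries the longest matching entry first. Longest-match matters when one entry is a prefix of another (for instance, if a table also contained a^ for â, then a^' must win over a^). The class `TableConverter` and the short word list in `main` are hypothetical, not Simredo's actual code or data.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TableConverter {
    private final Map<String, String> map = new HashMap<>();
    private int maxLen = 0;

    // 'from' and 'to' are two lines of a table with the list names already removed.
    TableConverter(List<String> from, List<String> to) {
        if (from.size() != to.size()) throw new IllegalArgumentException("lists differ in length");
        for (int i = 0; i < from.size(); i++) {
            String key = from.get(i);
            if (key.isEmpty()) continue; // an empty entry cannot be matched
            map.put(key, to.get(i));
            maxLen = Math.max(maxLen, key.length());
        }
    }

    // At each position, try the longest possible entry first; copy unmatched characters.
    String convert(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int matched = 0;
            for (int len = Math.min(maxLen, text.length() - i); len > 0; len--) {
                String replacement = map.get(text.substring(i, i + len));
                if (replacement != null) {
                    out.append(replacement);
                    matched = len;
                    break;
                }
            }
            if (matched == 0) {
                out.append(text.charAt(i));
                i++;
            } else {
                i += matched;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A tiny hypothetical excerpt of VIRQ/Unicode pairs, just enough for the sample sentence.
        List<String> virq = List.of("a'", "e^'", "dd", "o^", "u+`", "a^");
        List<String> unicode = List.of("\u00e1", "\u1ebf", "\u0111", "\u00f4", "\u1eeb", "\u00e2");
        TableConverter t = new TableConverter(virq, unicode);
        System.out.println(t.convert("Ca'i ne^'t dda'nh che^'t kho^ng chu+`a."));
    }
}
```

Run as-is, it should print the Unicode sentence from the example above (provided the console can display Vietnamese).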

Conversion table files must be in UTF-16 Big Endian format. As with keymap files, empty lines and lines beginning with a space are ignored. (Putting a space at the start of a line is useful for entering comments.) Character strings are separated by commas. If you need to convert the comma itself, use the backslash-u code \u002c to represent it. (In fact, the backslash-u code can represent any Unicode character.)
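To make these rules concrete, here is a minimal Java sketch of a reader for a .kon file. It applies only the rules stated above (UTF-16 Big Endian, skipped blank and space-prefixed lines, comma separators, backslash-u codes, first item as the list name). Stripping a byte-order mark and tolerating a trailing comma are assumptions, and `KonParser` is a hypothetical name, not part of Simredo.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KonParser {
    // A backslash, the letter u, then exactly four hex digits.
    private static final Pattern BACKSLASH_U = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replace every backslash-u code with the character it names.
    static String decodeEscapes(String s) {
        Matcher m = BACKSLASH_U.matcher(s);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Turn the text of a .kon file into a map from list name to list entries.
    static Map<String, List<String>> parse(String text) {
        Map<String, List<String>> lists = new LinkedHashMap<>();
        for (String line : text.split("\r?\n")) {
            if (line.isEmpty() || line.startsWith(" ")) continue; // blank line or comment
            List<String> items = new ArrayList<>();
            // Split first, then decode, so an escaped comma never acts as a separator.
            // split(",") also drops the empty strings left by a trailing comma.
            for (String field : line.split(",")) items.add(decodeEscapes(field));
            if (items.isEmpty()) continue;
            String name = items.remove(0); // the first item is the list's name
            lists.put(name, items);
        }
        return lists;
    }

    static Map<String, List<String>> load(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_16BE);
        if (text.startsWith("\uFEFF")) text = text.substring(1); // drop a byte-order mark, if present
        return parse(text);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: KonParser <file.kon>");
            return;
        }
        load(Path.of(args[0])).forEach((name, items) ->
            System.out.println(name + ": " + items.size() + " entries"));
    }
}
```

Decoding after splitting is the key design choice: it is what lets \u002c stand for a literal comma.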

Conversion table files must have the extension .kon (for example, Armenian.kon or Vietnamese.kon) and must be located in Simredo's program folder, together with Simredo4.jar, the keymap files, etc.

The Show Character Set feature can be useful when creating new conversion tables. Characters can be copied from the character field and pasted into the new conversion table.


Escape Codes

Programmers and HTML designers may find the conversion table EscCodes particularly useful. This conversion table enables you to convert between the following codes:

Unicode characters above 127, e.g., Ĝ
Backslash-u, e.g., \u011c
Decimal HTML, e.g., &#284;
Hexadecimal HTML, e.g., &#x11c;

The above conversions could not be implemented with simple comma-separated lists, so they are hard-coded into Simredo. The file EscCodes.kon contains only the names of each kind of escape code.
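Since these conversions are hard-coded and their code is not shown, the following Java sketch only illustrates what the four forms involve. Its assumptions: backslash-u output uses lowercase hex and one code per UTF-16 unit (as Java source does), the HTML forms use whole code points, and only characters above 127 are escaped. `EscCodes` here is a hypothetical class name borrowed from the table's name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscCodes {
    // Backslash-u form: each UTF-16 unit above 127 becomes a backslash, u, and four hex digits.
    static String toBackslashU(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c > 127) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.toString();
    }

    // HTML numeric character references, decimal or hexadecimal, for code points above 127.
    static String toHtml(String s, boolean hex) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp > 127) sb.append(hex ? String.format("&#x%x;", cp) : "&#" + cp + ";");
            else sb.appendCodePoint(cp);
        });
        return sb.toString();
    }

    // Matches any of the three escape forms.
    private static final Pattern ESCAPE = Pattern.compile(
        "\\\\u([0-9a-fA-F]{4})|&#([0-9]{1,7});|&#[xX]([0-9a-fA-F]{1,6});");

    // Convert every escape back to the character it names.
    static String toUnicode(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            int cp;
            if (m.group(1) != null) cp = Integer.parseInt(m.group(1), 16);
            else if (m.group(2) != null) cp = Integer.parseInt(m.group(2));
            else cp = Integer.parseInt(m.group(3), 16);
            // Leave the escape untouched if it does not name a valid code point.
            String rep = Character.isValidCodePoint(cp) ? new String(Character.toChars(cp)) : m.group();
            m.appendReplacement(sb, Matcher.quoteReplacement(rep));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String word = "\u011cojo";
        System.out.println(toHtml(word, false)); // prints &#284;ojo
        System.out.println(toHtml(word, true));  // prints &#x11c;ojo
        System.out.println(toBackslashU(word));
    }
}
```

Because the HTML forms work on code points while backslash-u works on UTF-16 units, a character outside the Basic Multilingual Plane becomes one HTML reference but two backslash-u codes.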


Esperanto

The conversion routines for Esperanto's accented letters are quite sophisticated. As with escape code conversions, they could not be implemented with simple lists, so they are also hard-coded.

When you convert from the h-method (in which ĉ, ĝ, ĥ, ĵ and ŝ are written ch, gh, hh, jh and sh), Simredo looks for the Esperanto spelling dictionary. If the dictionary is available, Simredo uses it to decide whether each letter pair ending in h should be converted. In this way, a sentence such as 'Shi iris al la flughaveno.' is correctly converted to 'Ŝi iris al la flughaveno.', with the gh in flughaveno left unchanged.
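This page does not explain how Simredo consults the dictionary, so the following Java sketch is a hypothetical illustration of one way it could be done: for each word, try every combination of converting or keeping its h-pairs, from "convert all" down to "convert none", and keep the first spelling found in the dictionary; if none is found, convert every pair. The word-list `Set`, the class `HMethod`, and this search strategy are all assumptions. The sketch also ignores ŭ, which the h-method writes as plain u.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HMethod {
    private static final String PLAIN = "cghjsCGHJS";
    // Accented counterparts of PLAIN, in the same order.
    private static final String ACCENTED = "\u0109\u011d\u0125\u0135\u015d\u0108\u011c\u0124\u0134\u015c";
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    // Start positions of h-method pairs (ch, gh, hh, jh, sh, in any case) in a word.
    static List<Integer> digraphs(String w) {
        List<Integer> pos = new ArrayList<>();
        for (int i = 0; i + 1 < w.length(); i++) {
            char next = w.charAt(i + 1);
            if (PLAIN.indexOf(w.charAt(i)) >= 0 && (next == 'h' || next == 'H')) {
                pos.add(i);
                i++; // skip the h so pairs never overlap
            }
        }
        return pos;
    }

    // Spelling in which only the pairs selected by the bits of 'mask' are converted.
    static String apply(String w, List<Integer> pos, int mask) {
        StringBuilder sb = new StringBuilder();
        int last = 0;
        for (int k = 0; k < pos.size(); k++) {
            if ((mask & (1 << k)) != 0) {
                int p = pos.get(k);
                sb.append(w, last, p).append(ACCENTED.charAt(PLAIN.indexOf(w.charAt(p))));
                last = p + 2;
            }
        }
        return sb.append(w.substring(last)).toString();
    }

    static String convertWord(String w, Set<String> dictionary) {
        List<Integer> pos = digraphs(w);
        if (pos.isEmpty()) return w;
        if (pos.size() <= 10) {
            for (int mask = (1 << pos.size()) - 1; mask >= 0; mask--) {
                String candidate = apply(w, pos, mask);
                if (dictionary.contains(candidate.toLowerCase(Locale.ROOT))) return candidate;
            }
        }
        return apply(w, pos, -1); // no dictionary hit: -1 sets every bit, so convert all pairs
    }

    static String convert(String text, Set<String> dictionary) {
        Matcher m = WORD.matcher(text);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(convertWord(m.group(), dictionary)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // A toy lowercase word list standing in for the real spelling dictionary.
        Set<String> dictionary = Set.of("\u015di", "iris", "al", "la", "flughaveno");
        System.out.println(convert("Shi iris al la flughaveno.", dictionary));
    }
}
```

With this toy word list, `main` should print "Ŝi iris al la flughaveno.": Shi becomes Ŝi because ŝi is listed, while flughaveno is kept because only the unconverted spelling is listed.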

The lists in Esperanto.kon are provided for reference. They are not actually read by Simredo. If you want to create a new set of conversions for Esperanto, I suggest that you copy Esperanto.kon to Esperanto2.kon, and make changes in the new file.

