Description

cin(5): CIN files for input methods (nosh file formats)

console-input-method(1) is table-driven, controlled by data files containing tables of mappings from the original ASCII composition element sequences to the unconverted and converted characters to display. These are CIN files, which are compatible with OpenVanilla, the Chinese Open Desktop, xcin, gcin, hime, OkidoKey.app, and MacOS.

More general specifications for CIN files can be found at the Chinese MAC WWW site and in the Chinese MAC discussion forum. This is a subset of the full file specification. In particular, this subset does not encompass alternative file encodings. Files are always to be encoded in UTF-8.

CIN files comprise a sequence of lines, which divide into three types:

comment lines
    Comment lines begin with a # character as their first character. They are wholly ignored.

directive lines
    Directive lines begin with a % character as their first character. The following directives are recognized:

    %keep_key_case
        In the absence of this directive, the ASCII source parts of mappings are considered to be case-insensitive. Case-insensitivity is the appropriate default for East Asian languages; but for some of them, and for specialist mappings for European scripts where the target parts can be in differing cases, case sensitivity is more useful.

    %keyname
        %keyname begin and %keyname end directives enclose a block of data lines that define the "engravings" for each ASCII letter. The "engraving" controls how an ASCII character is displayed on-screen by console-input-method(1) when it is in its unconverted form. The first field must be a single ASCII character, and the second field is its UTF-8 display string.

    %chardef
        %chardef begin and %chardef end directives enclose a block of data lines that define the mappings from a sequence of ASCII characters, given in the first field, to a UTF-8 string, given in the second field.
        It is an error for the first field to contain anything other than ASCII characters.

        Neither source forms nor what they are mapped to may contain whitespace. In console-input-method(1) the SPC character, when input, acts as a separator between multiple mappable strings that are not intended to be treated as one single larger source string, and so cannot be part of the source form of a mapping in any case. (For example, the input sequence TH OR N does not match a data line that maps the input sequence THORN, only the individual sequences TH, OR, and N.)

data lines
    Data lines begin with any other character as their first character. These comprise 2 whitespace-separated fields. It is an error for there to be anything other than exactly 2 fields per line.

    Leading whitespace is treated as extraneous separator characters and stripped. So to encode data lines where the first field begins with a # or a % character, simply prepend leading whitespace.

Examples

The Hangul CIN file from unicon exemplifies how the "engravings" cause ASCII input sequences to be displayed as Jamo when unconverted:

    A ㄼ
    b ㅠ
    B ㅟ

Its mappings block exemplifies how mapping files can need to be case-sensitive:

    rn 구
    rN 궤

The romaji-X11 CIN file exemplifies both long mappings (that do not match per ) and how data lines beginning with % characters are written. Note the leading whitespace on the lines whose first field begins with a % character:

    percent %
    permil ‰
     %o ‰
     %0 ‰
     %: ‰
    perlakh ‱
     %.. ‱

The latin-letters CIN file from gcin exemplifies case-sensitive mappings:

    THORN Þ
    thorn þ

Author

Jonathan de Boyne Pollard
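
The line classification rules given in the Description can be illustrated with a short Python sketch. It is not part of console-input-method(1) or of any CIN-reading tool; the function name parse_cin is hypothetical. Several behaviours that this page does not specify are assumptions of the sketch: blank lines are skipped, unrecognized directives are ignored, data lines outside a %keyname or %chardef block are rejected, case-insensitivity is modelled by lowercasing the source field of mappings, and multiple targets for one source are collected in a list.

```python
def parse_cin(text):
    """Parse CIN text (already decoded from UTF-8).

    Returns (keynames, chardefs, keep_key_case).
    """
    keynames = {}          # single ASCII character -> display string
    chardefs = {}          # ASCII source sequence -> list of target strings
    keep_key_case = False
    block = None           # None, "keyname", or "chardef"

    for number, line in enumerate(text.splitlines(), start=1):
        # Comment lines: '#' as the very first character; wholly ignored.
        if line.startswith("#"):
            continue
        # Directive lines: '%' as the very first character.
        if line.startswith("%"):
            words = line.split()
            if words[0] == "%keep_key_case":
                keep_key_case = True
            elif words[0] in ("%keyname", "%chardef") and len(words) == 2:
                if words[1] == "begin":
                    block = words[0][1:]
                elif words[1] == "end":
                    block = None
            # Assumption: any other directive is silently ignored.
            continue
        # Data lines: split() discards leading whitespace, so a first
        # field starting with '#' or '%' can be written after a space.
        fields = line.split()
        if not fields:
            continue       # Assumption: blank lines are skipped.
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected exactly 2 fields")
        source, target = fields
        if not source.isascii():
            raise ValueError(f"line {number}: first field must be ASCII")
        if block == "keyname":
            if len(source) != 1:
                raise ValueError(f"line {number}: keyname needs 1 character")
            keynames[source] = target
        elif block == "chardef":
            chardefs.setdefault(source, []).append(target)
        else:
            # Assumption: data lines outside a block are an error.
            raise ValueError(f"line {number}: data line outside a block")

    if not keep_key_case:
        # Assumption: case-insensitive sources are folded to lowercase.
        folded = {}
        for source, targets in chardefs.items():
            folded.setdefault(source.lower(), []).extend(targets)
        chardefs = folded
    return keynames, chardefs, keep_key_case
```

Under these assumptions, parsing a %chardef block that contains the THORN and thorn lines without %keep_key_case merges both targets under the single source thorn, whereas with %keep_key_case they remain two distinct mappings.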