+---------------------------+
| Chapter 7 INPUT AUTOMATON |
+---------------------------+


+--------------+
| 7.1 OVERVIEW |
+--------------+

The input automaton, also known as the user input automaton, converts the user's input
into the standard internal representation used by the system. The conversion is
automatic, and the input automaton is configured through the environment setup. This
chapter explains how to set up the input automaton.

We first introduce the structure of the cWnn input automaton, and then describe how to
configure it.


+----------------------------------+
| 7.2 STRUCTURE OF INPUT AUTOMATON |
+----------------------------------+

Figure 7.1 below outlines the structure of the input automaton. "Input" refers to the
actual user input from the keyboard, and "Output" refers to the final input received by
the system after initial processing in the input automaton.

The input automaton performs the mapping from "Input" to "Output", but the mapping rules
themselves are defined in the "Environment Setting". Different environment settings can
therefore define different input automaton mappings.

        +---------------------------------------------------------+
        |                                                         |
        |                  +-----------+                          |
        |    Input  ------>|   Input   |------> Output            |
        |           +----->| Automaton |                          |
        |           |      +-----------+                          |
        |           |         |     ^                             |
        |  feedback |         v     |                             |
        |           |    +---------------------+                  |
        |           +----| Environment Setting |                  |
        |                +---------------------+                  |
        |                                                         |
        +---------------------------------------------------------+
                Figure 7.1 : Brief Structure of Input Automaton

When a front-end processor starts up, it reads the initial environment setting of the
input automaton from the default path. The user can then enter text with the help of
this input automaton. The cWnn system provides a default input automaton environment,
but users can define their own input environment through the "Environment Setting".

The "Environment Setting" is written in a simple language similar to Lisp and is stored
in the system as source files. At startup, the front-end processor first reads the
"Environment Setting" files and then converts them to the binary format used by the
system. The characteristics of the input automaton depend entirely on the "Environment
Setting". Thus, from the user's viewpoint, the "Environment Setting" is the input
automaton.

Examples of "Environment Setting" are given in Section 7.5.
We now describe the components of the input automaton and its settings.

1. Components of the Input Automaton
   ---------------------------------
An input automaton consists of a "mode definition table" and several "mapping tables",
collectively known as the "conversion table".

The mode definition table describes the different input modes and the relationships
among them. Each input mode provides one input method at the cWnn user interface.
Refer to Section 7.3 for details on the mode definition table.

The mapping tables describe the following:
        (1) The mapping relation from "Input" to "Output" shown in Figure 7.1
        (2) The feedback input via the "Environment Setting" in Figure 7.1
        (3) The operating state of the mode variables defined in the mode definition
            table

Figure 7.2 below shows the components of an input automaton. As the figure shows, the
mapping tables are divided into initial, intermediate and final mapping tables. Refer to
Section 7.4 for details on mapping tables.

        Input automaton --+-- (1) Mode definition table
                          |
                          +-- (2) Mapping table --+-- Initial mapping table
                                                  +-- Intermediate mapping table
                                                  +-- Final mapping table

                Figure 7.2   Components of Input Automaton

2. User Input Environment
   ----------------------
(a) Phonetic input
    Through the settings of the input automaton, all Pinyin input can be standardized.
    That is, every user input is first processed by the automaton before it is passed
    to the system.
    Each Pinyin has a standard internal representation in the system. When a user
    enters a Pinyin (external representation), the automaton converts it to the
    internal representation, and this internal code is what the system uses.
    Hence, whichever type of Pinyin input is used (Quanpin, Erpin or Sanpin, together
    with the four tones), the system always receives the standardized internal form of
    Pinyin, which is treated as a single unit.

(b) Encoded input
    Through the settings of the input automaton, different types of encoded input,
    such as Wubi and Cangjie, can be set up.

    Other encoded inputs, such as internal code input, Quwei input and Guobiao input,
    as well as other Hanzi input methods, are also possible.

3. Setting of Input Automaton
   --------------------------
When the front-end processor "cuum" starts up, it reads the input automaton setting
file. The file is searched for in the following order (in descending priority):

(a) The "-r" option of the "cuum" command is used at startup (refer to Section 3.2).
    The path given with the "-r" option is the directory where all the input automaton
    files are stored. The system reads the input automaton files from this directory,
    starting from the mode definition file "mode".

(b) If (a) is not used, the path specified by the "setrkfile" command in the
    initialization file "uumrc" is read. This path is the directory where all the
    input automaton files are stored. The system reads the input automaton files from
    this directory, starting from the mode definition file "mode".

(c) If neither (a) nor (b) is set, or the file does not exist, the default file for
    the front-end processor "cuum" is read.
    The default input automaton files are as follows:

    /usr/local/lib/wnn/zh_CN/rk/mode    (for the combined Pinyin and Zhuyin input
                                         environment)
    /usr/local/lib/wnn/zh_CN/rk_p/mode  (for the Pinyin-centred input environment)
    /usr/local/lib/wnn/zh_CN/rk_z/mode  (for the Zhuyin-centred input environment)

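The search order in (a) to (c) can be sketched in Python as follows. This is only an
illustration of the priority rule, not the code of "cuum": the function name
find_mode_file and its parameter names are hypothetical, and the sketch assumes that a
directory without a "mode" file simply falls through to the next level.

```python
import os

# Default directory for the combined Pinyin and Zhuyin environment (listed above).
DEFAULT_RK_DIR = "/usr/local/lib/wnn/zh_CN/rk"


def find_mode_file(r_option_dir=None, setrkfile_dir=None, default_dir=DEFAULT_RK_DIR):
    """Return the path of the "mode" file that would be read, by priority.

    r_option_dir  -- directory given with the "-r" option, or None (hypothetical name)
    setrkfile_dir -- directory given by "setrkfile" in "uumrc", or None (hypothetical name)
    """
    for directory in (r_option_dir, setrkfile_dir):
        if directory is None:
            continue
        candidate = os.path.join(directory, "mode")
        if os.path.isfile(candidate):
            return candidate
    # Neither (a) nor (b) applies, or the file does not exist: use the default.
    return os.path.join(default_dir, "mode")
```

Calling find_mode_file() with no arguments returns the default path, which corresponds
to case (c).
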

+---------------------------+
| 7.3 MODE DEFINITION TABLE |
+---------------------------+

The mode definition table consists of mode variable definitions and input mode
expressions. It defines the mode variables and the input modes, and describes the
relationships among the different input modes.
The default filename of the mode definition table in cWnn is "mode", under one of the
default directories:
    "/usr/local/lib/wnn/zh_CN/rk"    (for the combined Pinyin and Zhuyin input environment)
    "/usr/local/lib/wnn/zh_CN/rk_p"  (for the Pinyin-centred input environment)
    "/usr/local/lib/wnn/zh_CN/rk_z"  (for the Zhuyin-centred input environment)

The mode definition table is made up of the following three types of expressions:
        (1) Search path of mapping table
        (2) Mode variable definition
        (3) Input mode expression

1. Search Path of Mapping Table
   ----------------------------
A mapping table filename that appears in the mode definition table is assumed to be in
the same directory as the mode definition table. If it is in a different directory, that
directory can be set in the mode definition table with the "search" command as follows:

* Format
        (search <pathname> ... ...)

  - <pathname> is the path of the mapping table.
    Several pathnames may be given, separated by spaces.
  - The search path must be specified before the mapping tables.

2. Mode Variable Definition
   ------------------------
The mode variables are defined here.

* Format
        (defmode <mode_name> <initial_state>)

  - <mode_name> is the name defined for each input mode.
    It begins with a letter and may consist of letters and digits. A mode variable has
    one of two values: ON or OFF.
  - <initial_state> may be ON or OFF. It gives the initial state of the mode variable.
    The default state is OFF.
  - A mode variable must be defined before it can be used.

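As an illustration of the defmode format, the following Python sketch collects the
initial states from a piece of mode definition text. It is not the parser used by cWnn.
The name parse_defmodes is hypothetical, and the pattern also accepts an underscore in
names, an assumption made because names such as ban_jiao appear in the sample tables of
this chapter.

```python
import re

# One (defmode <mode_name> [<initial_state>]) expression.
DEFMODE = re.compile(
    r"\(\s*defmode\s+([A-Za-z][A-Za-z0-9_]*)\s*(on|off)?\s*\)", re.IGNORECASE
)


def parse_defmodes(text):
    """Return {mode_name: True for ON, False for OFF} for every defmode in text."""
    modes = {}
    for name, state in DEFMODE.findall(text):
        # A missing <initial_state> means the default state, OFF.
        modes[name] = state.lower() == "on"
    return modes
```

For example, parse_defmodes("(defmode PY on) (defmode ZY)") gives PY as ON and ZY as
OFF.
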
3. Input Mode Expression
   ---------------------
An input mode expression takes one of the following three forms:

* Format
        <control_table>
        ( if condition <mapping_table> [<mapping_table>...] <mode_indicator> )
        ( when condition <mapping_table> [<mapping_table>...] <mode_indicator> )

  - <control_table> is a special mapping table that allows the user to switch among
    the input modes.
  - <mapping_table> names a mapping file for the input mode. In the mode definition
    table, the names of initial mapping tables begin with "1", for example 1B_BS,
    1B_TOLOWER and 1B_TOUPPER.
    Names of intermediate mapping tables begin with "2" and names of final mapping
    tables begin with "3", for example 2P_RongCuo, 2Z_tail_ma and 3B_quanjiao.
    Several mapping tables are allowed, but they must follow the sequence initial,
    intermediate, final.
  - <mode_indicator> is a character string quoted in " " that indicates the current
    input mode to the user. If a mode expression contains more than one mode
    indicator, only the last one is valid.
  - Both "if" and "when" are conditional statements, but they differ as follows.
    For an "if" statement, if the condition is true, the remaining part of the "if"
    statement is evaluated, and the next statement is not evaluated. If the condition
    is false, evaluation leaves the current "if" statement and proceeds to the next
    statement.
  - For a "when" statement, if the condition is true, the remaining part of the "when"
    statement is evaluated; otherwise it is not evaluated. In either case, the
    statement after the "when" statement is evaluated next.

The condition definition is explained in detail below.

   Condition Definition
   --------------------
   The "condition" above can be expressed in the following ways:

   +---------------------------+--------------------------------------------------+
   | Condition                 | Value                                            |
   +---------------------------+--------------------------------------------------+
   | Mode variable name        | True when ON, false when OFF                     |
   +---------------------------+--------------------------------------------------+
   | (and condition condition) | True when both conditions are true               |
   |                           |                                                  |
   | (or condition condition)  | True when at least one of the two conditions is  |
   |                           | true                                             |
   |                           |                                                  |
   | ( not condition )         | True when the condition is false                 |
   |                           |                                                  |
   | ( false )                 | False                                            |
   |                           |                                                  |
   | ( true )                  | True                                             |
   +---------------------------+--------------------------------------------------+

The following is an example of an input mode expression.
$ £ ¢ represent conditions, and A B C D E represent conversion tables.
Assume that the conditions $ £ ¢ are all true in this example.

   Example
   -------
        (when $ A (if £ B) C) (if ¢ D) E

   Reading from left to right, we first consider (when $ A (if £ B) C).
   Since $ is true, we proceed to A (if £ B) C. A is selected first, then (if £ B) is
   evaluated. Since £ is true, B is selected. Because the condition of this "if" is
   true, the remainder of A (if £ B) C is not processed, so C is not selected.

   Since (when $ A (if £ B) C) is a "when" statement within
   (when $ A (if £ B) C) (if ¢ D) E , the rest of the expression must still be
   processed.

   Lastly, for (if ¢ D), as ¢ is true, D is selected. Because this is an "if"
   statement whose condition is true, the rest of
   (when $ A (if £ B) C) (if ¢ D) E , namely E, is not processed. As a result,
   A, B and D are selected after the evaluation of this expression.

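The evaluation rules for "if" and "when" can be checked with a small Python sketch. It
is an illustration of the rules as described in this section, not the cWnn
implementation. Statements are written as nested tuples, the function names are
hypothetical, and the three conditions of the example are named c1, c2 and c3 here.

```python
def eval_condition(cond, modes):
    """Evaluate a condition from the Condition Definition table."""
    if isinstance(cond, str):          # mode variable name: true when ON
        return modes[cond]
    op, *args = cond
    if op == "and":
        return all(eval_condition(a, modes) for a in args)
    if op == "or":
        return any(eval_condition(a, modes) for a in args)
    if op == "not":
        return not eval_condition(args[0], modes)
    if op == "true":
        return True
    if op == "false":
        return False
    raise ValueError("unknown condition: %r" % (op,))


def select(statements, modes):
    """Return the items selected from a sequence of statements."""
    selected = []
    for stmt in statements:
        if isinstance(stmt, str):      # a table name or a mode indicator
            selected.append(stmt)
            continue
        keyword, cond, *body = stmt
        truth = eval_condition(cond, modes)
        if truth:
            selected.extend(select(body, modes))
        if keyword == "if" and truth:
            break                      # a true "if" ends the enclosing sequence
    return selected


# (when c1 A (if c2 B) C) (if c3 D) E
expression = [("when", "c1", "A", ("if", "c2", "B"), "C"), ("if", "c3", "D"), "E"]
print(select(expression, {"c1": True, "c2": True, "c3": True}))  # ['A', 'B', 'D']
```

With all three conditions true the sketch selects A, B and D, which agrees with the
walk-through of the example.
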
4. Example of Mode Definition Table
   --------------------------------
        (search /usr/local/lib/wnn/zh_CN/rk)

        (defmode YIN on)
        (defmode PY on) (defmode ZY)
        (defmode ASCII )
        (defmode ban_jiao on) (defmode quan_jiao)

        2A_CTRL
        (if YIN
                (if PY 1B_TOUPPER 2P_QuanPin 2P_RongCuo 2Z_tail_pin "全拼:P")
                (if ZY 1Z_ZhuYin 1B_TOUPPER 2Z_ZhuYin 2Z_tail "注音:Z")
        )
        (if ASCII
                (if ban_jiao "半角:")
                (if quan_jiao 3B_quanjiao "全角:")
        )


+--------------------+
| 7.4 MAPPING TABLES |
+--------------------+

In any input mode, the relation between the "Input" and "Output" of the input automaton
(as in Figure 7.1) is represented by the mapping tables. The mapping tables consist of
the initial, intermediate and final mapping tables (refer to Figure 7.3).
In the whole process, the intermediate mapping plays the main role; the initial mapping
prepares its input and the final mapping touches up its result.

During the input automaton mapping process, the input characters first undergo the
initial mapping, as shown in Figure 7.3. The result (output-1) is then passed to the
intermediate mapping table as input, where it undergoes a character string mapping.
Subsequently, output-2 is passed as input to the final mapping. Output-3 is the final
output of the input automaton.
The feedback shown in the figure is treated as input to the intermediate mapping.

              Initial             Intermediate             Final
              mapping               mapping               mapping
             +-------+            +---------+            +-------+
    User     | e   E |  output-1  | EU   Eu |  output-2  | I  Ch |  output-3
    Input -->| u   U |----------->| .    .  |----------->| U  Sh |----------->
             | .   . |       +--->| .    .  |---+        | V  Zh |  Final Output
             +-------+       |    +---------+   |        +-------+  of Automaton
                             |     feedback     |
                             +------------------+

                     Figure 7.3   Input Automaton Process

The initial mapping can only map characters to characters, for example "e" to "E" as in
Figure 7.3. The intermediate mapping can map character strings to character strings,
for example "EU" to "Eu". The final mapping can map a character to a character or to a
character string, for example "I" to "Ch" during Erpin input.
In addition, the intermediate mapping can provide feedback input.

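The three stages can be pictured with a toy pipeline in Python. This is a sketch under
stated assumptions, not the cWnn implementation: the tables hold only the few entries
mentioned in the text (the entry "i" to "I" is added for the illustration), the
intermediate stage is assumed to prefer the longest matching string, and feedback is
left out.

```python
# Toy tables built from the examples in the text; real cWnn tables are far larger.
INITIAL = {"e": "E", "u": "U", "i": "I"}      # character -> character
INTERMEDIATE = {"EU": "Eu"}                   # string -> string
FINAL = {"I": "Ch"}                           # character -> string


def map_chars(text, table):
    """Replace each character that has an entry in table; keep the others."""
    return "".join(table.get(ch, ch) for ch in text)


def map_strings(text, table):
    """Scan left to right, replacing the longest matching key at each position."""
    keys = sorted(table, key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            out.append(text[i])               # no entry: pass the character through
            i += 1
    return "".join(out)


def run_automaton(user_input):
    """Apply the initial, intermediate and final mappings in that order."""
    output_1 = map_chars(user_input, INITIAL)
    output_2 = map_strings(output_1, INTERMEDIATE)
    output_3 = map_chars(output_2, FINAL)
    return output_3
```

With these toy tables, run_automaton("eu") yields "Eu" and run_automaton("i") yields
"Ch".
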
We now describe the variable definitions in each of the mapping tables.

1. Variable Definition
   -------------------
By defining and using variables, similar mapping relations can be described easily and
effectively.

Each type of mapping table has its own format of definitions:

(a) Initial mapping table
    ---------------------
    In the initial mapping table, the definitions consist of the following:
    - Format
        (defvar <variable_name> (list <character> ... ...))   -----(a)
        (defvar <variable_name> (all))                        -----(b)
        <variable_name> [<variable_representation>]

      Either (a) or (b) may be used.
      In (a), <variable_name> can be any of the characters in "list".
      In (b), <variable_name> can be any character.

    - Format Pattern Description
      The format has the following pattern:
        Character_Variable_Definition
        Input_Character_Representation [Output_Character_Representation]

      If the character input by the user matches the character in
      "Input_Character_Representation", the input automaton converts it to the
      character in "Output_Character_Representation".

      (a) and (b) above are the two types of Character_Variable_Definition.
      The examples below show conversion relations of this kind.

    - Example
        eg1 :   (defvar bs (list '\x08'))
                (bs) R

        eg2 :   (defvar up (all))
                (up) (tozenhira (tolower(up)))

(b) Intermediate mapping table
    --------------------------
    In the intermediate mapping table, the definitions consist of the following:
    - Format
        (defvar <variable_name> (list <character> ... ...))   -----(a)
        (defvar <variable_name> (all))                        -----(b)
        <input_variable> [<output_variable>] [<feedback_variable>]
        <variable_condition> <operation>

      Either (a) or (b) may be used. In (a), <variable_name> can be any of the
      characters in "list". In (b), <variable_name> can be any character.

    - Format Pattern Description
      The format above has the following pattern:
        Character_Variable_Definition
        Input_Character_String_Representation [Output_Character_String_Representation]
                [Feedback_Character_String_Representation]      ---(*)
        Input_Character_String_Representation Operation         ---(@)

      In (*), if the input character string matches the character string in
      "Input_Character_String_Representation", the input automaton converts it to
      the character string in "Output_Character_String_Representation".
      During output, the "Feedback_Character_String_Representation" is treated as
      new input to the intermediate mapping.

      In (@), if the input character string matches the character string in
      "Input_Character_String_Representation", the input automaton performs the
      specified operation on the mode variables.

    - Example
        eg1 :   (defvar A (list B C D) )
                (A)A (A)aタ

        eg2 :   (defvar str (list 0 1 2 3 4 5 \
                        6 7 8 9 ))
                (if strk0)(str) (str) 'A' ;feedback
                'A' (off strk0)(on strk1)

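How feedback re-enters the intermediate mapping can be sketched in Python as follows.
This is a simplified model, not the cWnn implementation: each rule maps an input string
to an (output, feedback) pair, longest-match selection is an assumption, and the names
run_intermediate and max_steps are hypothetical. The step limit only protects the sketch
against rules that feed back forever.

```python
def run_intermediate(text, rules, max_steps=1000):
    """Apply rules to text; rules maps an input string to (output, feedback)."""
    keys = sorted(rules, key=len, reverse=True)
    out = []
    for _ in range(max_steps):
        if not text:
            return "".join(out)
        for key in keys:
            if text.startswith(key):
                output, feedback = rules[key]
                out.append(output)
                # The feedback string becomes new input, ahead of the unread text.
                text = feedback + text[len(key):]
                break
        else:
            out.append(text[0])          # no rule matches: pass the character through
            text = text[1:]
    raise RuntimeError("feedback did not terminate")


# Loosely modelled on eg2: a digit is output and 'A' is fed back; a second rule then
# consumes 'A' (in cWnn that rule would change mode variables and output nothing).
rules = {"1": ("1", "A"), "A": ("", "")}
print(run_intermediate("1x", rules))     # 1x
```

The fed-back 'A' never reaches the output because the second rule consumes it as input.
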
(c) Final mapping table
    -------------------
    In the final mapping table, the definitions consist of the following:
    - Format
        (defvar <variable_name> (list <character> ... ...))   -----(a)
        (defvar <variable_name> (all))                        -----(b)
        <variable_name> [<variable_representation>]

      Either (a) or (b) may be used. In (a), <variable_name> can be any of the
      characters in "list". In (b), <variable_name> can be any character.

    - Format Pattern Description
      The format above has the following pattern:
        Character_Variable_Definition
        Input_Character_Representation [Output_Character_String_Representation]

      If the input character matches the character in "Input_Character_Representation",
      the input automaton converts it to the character string in
      "Output_Character_String_Representation".

    - Example
        eg1 :   (defvar a (all))
                (a) (tozenalpha (a))


NOTE:
  - The parts in [ ] in the "Format Pattern Description" above are optional.
  - An expression should be written on one line. If there is not enough space for
    the expression, it can be continued on the following line by using the \.
  - Anything after a semicolon ";" in a line is treated as a comment.


SUPPLEMENT:
  In the definition and use of variables:
  (a) A variable must be defined before it is used.
  (b) A variable definition is valid only in the current mapping table, and not in
      other tables.
  (c) Occurrences of a variable in the same line have the same value. For example:

        (defvar a1 (list A B))
        (a1) (tolower(a1)) 3

      When the input is [Aa] or [Bb], the result is 3. However, there is no match
      when the input is [Ab] or [Ba].

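Rule (c) of the supplement, that a variable keeps one value within a line, can be
illustrated with a small Python matcher. The pattern encoding and the name match_rule
are hypothetical and exist only for this sketch; cWnn reads the Lisp-like text directly.

```python
def match_rule(pattern, text, variables):
    """Match text against pattern; every use of a variable shares one value.

    pattern   -- list of ("var", name), ("lower", name) or ("lit", char) items
    variables -- {name: allowed characters}, as given by (defvar name (list ...))
    """
    if len(text) != len(pattern):
        return False
    bound = {}
    for (kind, arg), ch in zip(pattern, text):
        if kind == "lit":
            if ch != arg:
                return False
        elif kind == "var":
            # The first use binds the variable; later uses must repeat that value.
            if ch not in variables[arg] or bound.setdefault(arg, ch) != ch:
                return False
        elif kind == "lower":            # (tolower (name)): name must be bound already
            if arg not in bound or ch != bound[arg].lower():
                return False
        else:
            raise ValueError("unknown pattern item: %r" % (kind,))
    return True


variables = {"a1": "AB"}                       # (defvar a1 (list A B))
pattern = [("var", "a1"), ("lower", "a1")]     # (a1) (tolower(a1))
print(match_rule(pattern, "Aa", variables))    # True
print(match_rule(pattern, "Ab", variables))    # False
```
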
2. Evaluation of Characters
   ------------------------
The evaluation result of a character representation must be a character. Such a
character may be a single-byte character or a multi-byte character. For example,
        a       中      '\x9f'
        b       国      '\x9f'

The formats for representing characters and functions are given below.

(a) Character representation
    ------------------------
    Certain characters cannot be represented by themselves. The following table shows
    the formats for writing characters:

   +----------------+------------------------------------------------------------+
   | Format         | Description                                                |
   +----------------+------------------------------------------------------------+
   | Character      | Character other than ( ) ' " \ ; SP                        |
   | 'Character'    | Character other than ' \ ^                                 |
   | '^Character'   | Indicates a control character <control + character>. The   |
   |                | character must have a code between 64 and 95, or be a      |
   |                | lower-case letter.                                         |
   | '\Character'   | Indicates a special character. Generally, '\character'     |
   |                | refers to the character after [\].                         |
   |                | In addition, '\n', '\t', '\b', '\r', '\f' have the same    |
   |                | meaning as the escape sequences in the C language;         |
   |                | '\e', '\E' represent ESC;                                  |
   |                | and '\8 ...', '\o...', '\d ...', '\x ...' represent        |
   |                | octal, decimal and hexadecimal respectively.               |
   +----------------+------------------------------------------------------------+

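The quoted formats in the table can be decoded along the following lines in Python. This
is a sketch, not the cWnn reader: the name parse_quoted is hypothetical, only the
'\o', '\d' and '\x' numeric forms are handled (the '\8' form is left out because the
table does not make its base explicit), and numeric forms are assumed to run to the
closing quote, as in '\x08'.

```python
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "b": "\b", "r": "\r", "f": "\f",
                  "e": "\x1b", "E": "\x1b"}
NUMERIC_BASES = {"o": 8, "d": 10, "x": 16}


def parse_quoted(token):
    """Return the character denoted by a quoted token such as '^H' or '\\x08'."""
    if len(token) < 3 or token[0] != "'" or token[-1] != "'":
        raise ValueError("not a quoted character: %r" % token)
    body = token[1:-1]
    if body[0] == "^" and len(body) == 2:
        ch = body[1]
        if not (64 <= ord(ch) <= 95 or "a" <= ch <= "z"):
            raise ValueError("invalid control character: %r" % token)
        # Codes 64-95 and lower-case letters both map onto control codes 0-31.
        return chr(ord(ch) & 0x1F)
    if body[0] == "\\" and len(body) >= 2:
        rest = body[1:]
        if rest[0] in NUMERIC_BASES and len(rest) > 1:
            return chr(int(rest[1:], NUMERIC_BASES[rest[0]]))
        if len(rest) == 1:
            # Known escapes first; otherwise the character after the backslash.
            return SIMPLE_ESCAPES.get(rest, rest)
    if len(body) == 1:
        return body
    raise ValueError("unsupported form: %r" % token)
```

Under these assumptions '^H' and '\x08' both decode to the backspace character
(code 8).
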
(b) Function representation
    -----------------------
    The automaton provides some special functions, which can be used directly. The
    table below summarizes them.

    - Representation Format <1>:
        (<function_name> <operand>)

    - Representation Format <2>:
        (<function_name> <operand> <operand>)

   +---------------+--------+----------------------------------------------------+
   | Function Name | Format | Function Description                               |
   +---------------+--------+----------------------------------------------------+
   | toupper       |  <1>   | If the operand is a lower-case letter, the         |
   |               |        | upper-case letter is used. For example,            |
   |               |        | ( toupper a ) will produce A                       |
   +---------------+--------+----------------------------------------------------+
   | tolower       |  <1>   | If the operand is an upper-case letter, the        |
   |               |        | lower-case letter is used. For example,            |
   |               |        | ( tolower B ) will produce b                       |
   +---------------+--------+----------------------------------------------------+
   | toupdown      |  <1>   | If the operand is an upper-case (lower-case)       |
   |               |        | letter, the corresponding lower-case (upper-case)  |
   |               |        | letter is used.                                    |
   +---------------+--------+----------------------------------------------------+
   | tozenalpha    |  <1>   | If the operand is an ASCII character, the          |
   |               |        | corresponding wide ASCII character is used.        |
   |               |        | For example, ( tozenalpha A ) will produce Ａ      |
   +---------------+--------+----------------------------------------------------+
   | value         |  <1>   | Gives the internal code value of the operand.      |
   |               |        | For example,                                       |
   |               |        |   (value 0 ) will produce '\x0'                    |
   |               |        |   (value A ) will produce '\xa'                    |
   +---------------+--------+----------------------------------------------------+
   | +             |  <2>   | Addition of the two operands.                      |
   |               |        | For example,                                       |
   |               |        |   ( + A 0x20 ) will produce a                      |
   |               |        |   ( + 0 ( value 3 ) ) will produce 3               |
   +---------------+--------+----------------------------------------------------+
   | -             |  <2>   | Subtraction of the two operands                    |
   +---------------+--------+----------------------------------------------------+
   | *             |  <2>   | Multiplication of the two operands                 |
   +---------------+--------+----------------------------------------------------+
   | /             |  <2>   | Division of the two operands                       |
   +---------------+--------+----------------------------------------------------+

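The examples in the table can be reproduced with the following Python sketch. These are
stand-ins written for illustration, not the cWnn functions: operands are modelled as
one-character strings, "wide ASCII" is modelled with Unicode fullwidth forms, value is
assumed to accept hexadecimal digits only (as its two examples suggest), and add is a
hypothetical name for the "+" function.

```python
def toupper(ch):
    return ch.upper() if "a" <= ch <= "z" else ch


def tolower(ch):
    return ch.lower() if "A" <= ch <= "Z" else ch


def toupdown(ch):
    # Swap the case of an ASCII letter; leave everything else unchanged.
    return ch.swapcase() if ch.isascii() and ch.isalpha() else ch


def tozenalpha(ch):
    # Assumption: the wide form is the Unicode fullwidth form (offset 0xFEE0).
    return chr(ord(ch) + 0xFEE0) if "!" <= ch <= "~" else ch


def value(ch):
    # Digit value of a hexadecimal digit, as in (value A) -> '\xa'.
    return chr(int(ch, 16))


def add(a, b):
    # (+ a b): add the character codes of the two operands.
    return chr(ord(a) + ord(b))


print(add("A", chr(0x20)))      # a, as in ( + A 0x20 )
print(add("0", value("3")))     # 3, as in ( + 0 ( value 3 ) )
```
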
3. Evaluation of Character String
   ------------------------------
A character string representation is a sequence of character representations, which
were described in 2 (Evaluation of Characters). The evaluation result of a character
string representation is also a character string, whose characters may be single-byte
or multi-byte characters.

The formats for representing character strings, functions and mode operations are given
below.

(a) Character representation
    ------------------------
    The same as the character representation and evaluation in 2 (Evaluation of
    Characters).


(b) Function representation
    -----------------------
    - Representation Format <1>:
        <function>  last=
      If the last character of the most recently mapped character string matches
      the function parameter, the function evaluates to an empty string.

    - Representation Format <2>:
        <function>  todigit
      Converts the code given by the first parameter to the value in the base given
      by the code of the second parameter.

(c) Mode operation and evaluation
    -----------------------------
    The following table shows the functions available for mode operation.
    For example, these functions are used in the mode control file "2A_CTRL" in cWnn.

   +---------------+--------------------------------------------------------+
   | Function Name | Function Description                                   |
   +---------------+--------------------------------------------------------+
   | if            | Evaluates the state of the mode variable.              |
   |               | If ON, it is treated as an empty character string.     |
   |               |                                                        |
   | unless        | Evaluates the state of the mode variable.              |
   |               | If OFF, it is treated as an empty character string.    |
   |               |                                                        |
   | on            | Sets the state of the mode variable to ON.             |
   |               |                                                        |
   | off           | Sets the state of the mode variable to OFF.            |
   |               |                                                        |
   | switch        | Switches the state of the mode variable.               |
   |               | In other words, if the state is ON, it is set to OFF   |
   |               | and vice versa.                                        |
   |               |                                                        |
   | allon         | Sets all modes to ON.                                  |
   |               |                                                        |
   | alloff        | Sets all modes to OFF.                                 |
   |               |                                                        |
   | (error)       | Error handling for input keys that cannot be mapped.   |
   |               |                                                        |
   | (restart)     | Reads in a new mode definition table and re-defines    |
   |               | the conversion. If the new conversion table contains   |
   |               | an error, an error message is given and the system     |
   |               | returns to the settings of the original conversion     |
   |               | table.                                                 |
   +---------------+--------------------------------------------------------+

    NOTE: - "if" and "unless" can only be used in the Input Character String
            Representations;
          - "on", "off" and "switch" can only be used in the Output Character
            String Representations;
          - "allon", "alloff" and "(error)" can only be used in the Output
            Character String Representations of intermediate mapping tables;
          - "(restart)" is used by itself.


+-----------------------------------+
| 7.5 AN EXAMPLE OF INPUT AUTOMATON |
+-----------------------------------+

The preceding sections introduced the input automaton. We now give an example of a
simple input automaton, using the front-end processor "cuum".
Note that some of the definitions are DIFFERENT from the standard definitions in cWnn.
For example, the encoded input definitions are not included in this sample input
automaton.

The mode definition table and the mode control table are shown in full, but only some
of the mapping tables are shown. Users who are interested in the input automaton can
refer directly to the files under the default path.

1. Mode Definition Table (mode)
   ----------------------------
This is the "mode definition table" described in Section 7.3. It defines the
relationship between the mode variables and the input modes.

* Default Path
        /usr/local/lib/wnn/zh_CN/rk/mode

* Content
        (defmode YIN on)
        (defmode PY on) (defmode ZY)
        (defmode ASCII )
        (defmode ban_jiao on) (defmode quan_jiao)
        2A_CTRL
        (if YIN
                (if PY 1B_TOUPPER 2P_QuanPin 2P_RongCuo 2Z_tail_pin "全拼:P")
                (if ZY 1Z_ZhuYin 1B_TOUPPER 2Z_ZhuYin 2Z_tail "注音:Z")
        )
        (if ASCII
                (if ban_jiao "半角:")
                (if quan_jiao 3B_quanjiao "全角:")
        )

* Description
        The above mode definition table defines the "Pinyin", "Banjiao" character
        and "Quanjiao" input modes.

        Initially:
          - YIN mode is set to ON
          - PY mode under YIN is set to ON
          - ZY is set to OFF
          - ASCII mode is set to OFF
          - ban_jiao mode under ASCII is set to ON
          - quan_jiao mode under ASCII is set to OFF

        According to the ASCII definitions in the mode definition table, the input
        automaton in the ASCII mode initially receives Banjiao input. Notice that
        the ban_jiao state under the ASCII state has no mapping table; this means
        that the user input is received directly by the system.

        For the YIN definition, to input Pinyin the user needs to change the mode to
        PY (under YIN). The way of changing the mode is defined in the mapping
        table 2A_CTRL (see 2. Mode Control Table). Here, we assume that we are
        already in the PY mode and that the input automaton receives Pinyin input.
        The automaton first follows the definition of mapping table 1B_TOUPPER
        to convert the actual user input to upper-case letters. Subsequently,
        the automaton creates the "actual final input received by the system"
        based on mapping tables 2P_QuanPin, 2P_RongCuo and 2Z_tail_pin.
        A simple example follows:

          - When a user inputs "Zhong" or "ZHONG", according to the
            definitions in 2P_QuanPin, the input automaton outputs "Zhongタ".

          - When a user inputs "JA" by mistake, the automaton automatically
            corrects this error to "Jia" based on the definitions in mapping
            table 2P_RongCuo.

          - As for punctuation, mapping table 2Z_tail_pin defines the mapping
            relation between ASCII "." and Chinese "。".
            Hence, when the user enters an ASCII ".", the automaton outputs the
            Chinese "。".

        Similarly, the user can change to Wubi mode to input Hanzi, or change to
        Quanjiao mode to input wide ASCII characters.

        NOTE: Wubi mode is not described in this example. However, the definitions
              are similar. Refer to the system standard files for examples.
              (/usr/local/lib/wnn/zh_CN/rk/)

2. Mode Control Table (2A_CTRL)
   ----------------------------
This is the <control_table> of the "Input Mode Expression" mentioned in Section 7.3.
It controls the mode variables and allows the user to switch among different input
modes.

2A_CTRL is referenced in the mode definition file "mode". The key codes defined in
"uumkey" are used in this table.

* Default Path
        /usr/local/lib/wnn/zh_CN/rk/2A_CTRL

* Content
        (defvar pf1 (list '\x81') )
        (defvar pf3 (list '\x83') )

        (unless YIN)(pf1)       (on YIN)(off BX)(off ASCII)
        (if YIN)(pf1)           (switch PY)(switch ZY)

        (unless ASCII)(pf3)     (on ASCII)(off YIN)(off BX)
        (if ASCII)(pf3)         (switch quan_jiao)(switch ban_jiao)

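The effect of the four rules in 2A_CTRL on the mode variables can be traced with a short
Python sketch. It is a model written for illustration, not the cWnn implementation: the
rule encoding and the name press are hypothetical, the keys are referred to by their
variable names pf1 and pf3, and it is assumed that only the first matching rule is
applied to each key press.

```python
# Each rule: (guard function, guard mode, key, list of (operation, mode)).
RULES = [
    ("unless", "YIN",   "pf1", [("on", "YIN"), ("off", "BX"), ("off", "ASCII")]),
    ("if",     "YIN",   "pf1", [("switch", "PY"), ("switch", "ZY")]),
    ("unless", "ASCII", "pf3", [("on", "ASCII"), ("off", "YIN"), ("off", "BX")]),
    ("if",     "ASCII", "pf3", [("switch", "quan_jiao"), ("switch", "ban_jiao")]),
]


def press(key, modes):
    """Apply the first rule whose guard and key match; return True if one did."""
    for guard, guard_mode, rule_key, operations in RULES:
        is_on = modes.get(guard_mode, False)
        guard_ok = is_on if guard == "if" else not is_on
        if key != rule_key or not guard_ok:
            continue
        for op, mode in operations:
            if op == "on":
                modes[mode] = True
            elif op == "off":
                modes[mode] = False
            else:                       # "switch": flip the current state
                modes[mode] = not modes.get(mode, False)
        return True
    return False


# Initial states as given by the defmode lines of the sample mode definition table.
modes = {"YIN": True, "PY": True, "ZY": False,
         "ASCII": False, "ban_jiao": True, "quan_jiao": False}
press("pf1", modes)   # YIN is ON, so PY and ZY are switched
press("pf3", modes)   # ASCII is OFF, so ASCII goes ON and YIN goes OFF
```

After these two key presses the model has ZY and ASCII set to ON and YIN and PY set to
OFF. A further pf3 would switch between ban_jiao and quan_jiao.
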

3. Quanpin Mapping Table (2P_QuanPin)
   ----------------------------------
This is the mapping table for Quanpin input.

* Default Path
        /usr/local/lib/wnn/zh_CN/rk/2P_QuanPin

* Content
        (defvar A  (list B C D F G H K L M N P S T W Y Z ))
        (defvar AI (list B C D G H K L M N P S T W Z ))
        (defvar AN (list B C D F G H K L M N P R S T W Y Z )) ;ANG
        (defvar AO (list B C D G H K L M N P R S T W Y Z ))
        (defvar E  (list B C D G H K L M N R S T Y Z ))
                :                       :
                :                       :
        (A)A            (A)aタ
        (A)A1           (A)。
        (A)A2           (A)「
        (A)A3           (A)」
        (A)A4           (A)、
        (AI)AI          (AI)ai
        (AI)AI1         (AI)。i
                :                       :
                :                       :

4. Pinyin Error Correction Mapping Table (2P_RongCuo)
   --------------------------------------------------
* Default Path
        /usr/local/lib/wnn/zh_CN/rk/2P_RongCuo

* Content
        The auto-correcting definitions for Pinyin input:

        (defvar A (list J Q X ))
        (A)A            (A)iaタ
        (A)A1           (A)i。タ
        (A)A2           (A)i「タ
        (A)A3           (A)i」タ
        (A)A4           (A)i、タ
        (A)AI           (A)iaタ
                :                       :
                :                       :
        (A)EN           (A)inタ
        (A)EN1          (A)ゥnタ
        (A)EN2          (A)ェnタ
        (A)EN3          (A)ォnタ
        (A)EN4          (A)ャnタ
                :                       :
        (A)OU           (A)iuタ
        (A)OU1          (A)iアタ
        (A)OU2          (A)iイタ
        (A)OU3          (A)iウタ
        (A)OU4          (A)iエタ
                :                       :
                :                       :


5. Mapping Table (1B_TOUPPER)
   --------------------------
This mapping table converts the input characters into upper-case letters.

* Default Path
        /usr/local/lib/wnn/zh_CN/rk/1B_TOUPPER

* Content
        (defvar low (all))
        (low) (toupper (low))

6. Mapping Table (3B_quanjiao)
   ---------------------------
This mapping table converts the input characters to wide ASCII characters.

* Default Path
        /usr/local/lib/wnn/zh_CN/rk/3B_quanjiao

* Content
        (defvar a (all))
        (a) (tozenalpha (a))