view cWnn/manual.en/chap6 @ 2:b605a0e60f5b

- reverted jdata.h - fixed the bug which occurred on changing length of bunsetsu.
author Yoshiki Yazawa <yaz@cc.rim.or.jp>
date Thu, 13 Dec 2007 17:42:01 +0900
parents bbc77ca4def5
children
line wrap: on
line source


		*************************************************
		*	Chapter 6	INPUT AUTOMATION	*
		*************************************************




   6.1 OVERVIEW   
   ============

	The input automaton, also known as user input automaton, is used for converting 
the  user's input  into the standard  internal representation  used by the system.  The 
conversion is done  automatically, and the definition of the input automaton is set via 
the environment. 


1. Structure of Input Automaton 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	The  diagram  below shows  the structure  of the input automaton.  The "input" 
refers to  the  user input from  the keyboard, and "output" refers to the final output
received  by the system.  The  mapping from  input to output is performed by the input 
automaton.  Environment setting defines the mapping process from input to output.  
Through  environment setting, different types of input automaton mapping relationships
can be defined.  At the same time, the input automaton is capable of perform automaton 
input by providing  appropriate feedback  characters  which are sent back to the input 
automaton automatically.

        	+----------------------------------------------------+
	        |                                                    |
	        |                    +--------------+                |
	        |     Input    ----->|    Input     |-----> Output   |
	        |              +---->|  Automaton   |                |
	        |              |     +--------------+                |
	        |              |          ^  |                       |
		|	       |          |  V			     |
	        |    feedback  |       +--+-------------------+      |
	        |              +-------| Environment Setting  |      |
	        |                      +----------------------+      |
	        |                                                    |
	        +----------------------------------------------------+
			Figure 6.1 : Structure of Input Automaton
			~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~






					- 6-1 -
	During  the startup of  cuum (client), the initial environment  setting of the
input automaton is read via the standard path.  After this, the user can input through 
the input automaton.  There is a  default  input automaton  environment in the system.  
However, the  user is  able to  define his  own input automaton environment.  In other 
words, he can set his individual input environment via the "environment setting". 

	"Environment setting" is done by using a simple language similar to "Lisp". 
This is stored in the system  as source files.  During cuum startup, it first reads in 
all the environment  files, and subsequently  convert them to internal format  used by 
the system.  The  characteristics of the input automaton is entirely dependent  on the 
"environment setting".   Thus, from the user's viewpoint, the "environmnet setting" is 
the input automaton.


2. User Input Environment
~~~~~~~~~~~~~~~~~~~~~~~~~
(a) Phonetic input
	Definition  of  Pinyin input  is possible  through the definition of the input 
	automaton.   Hence  for all  Pinyin  input  (including  Quanpin, Erpin, Sanpin 
	together  with  the four  tones),  the system will always receive the standard 
	Pinyin. 

	The input automaton also performs a checking for legal pinyin input.
	When user inputs a Pinyin (external representation), the automaton converts it 
	to its internal representation.  This internal representation is treated as a 
	unit by the system.  This has also made Pinyin tone error tolerance possible.
 
(b) Radical input
	Through the  definition of  the input  automaton,  different types  of radical 
	input  can be defined.  Besides, internal  code input, Quwei input, Guobiao as 
	well as other inputs of Hanzi are also possible.

	Similarly,  the "environment setting"  for  radical input  is done  by using a
	simple language  similar to  "Lisp".   This is  stored in the system as source 
	files. During the startup of the client, it first reads in all the environment 
	files,  and  subsequently convert  them to internal format used by the system.  
	The characteristics of the input automaton is entirely dependent on the 
	"environment setting".  Thus, from the user's viewpoint, the 
	"environmnet setting" is the input automaton. 










					- 6-2 -
3. Setting of Input Automaton
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
During startup of front-end processor "cuum", the mode definition file is searched in 
the following order :

(a) During startup, the path of input automaton is set using the "-r" option of cuum.
    If the path indicated is a directory name, the system will read in the file "mode" 
    under the directory.

(b) In the initialization files "uumrc" of cuum, the path is set via the "setautofile" 
    command.  If the path is a directory name, the system will read in the file "mode" 
    under the directory.
 
(c) If (a) and (b) are not set or the file does not exist, the respective standard 
    path for cuum will be read.  The standard path is as follows :

	/usr/local/lib/wnn/zh_CN/rk/mode           
































					- 6-3 -

   6.2 CONVERSION METHOD    
   =====================

	An input  automaton consists  of a mode definition table and several mapping
tables,  collectively known  as the  conversion table.  Figure 6.3 shows the logical 
structure of input automaton conversion tables.  The mode definition table describes 
the mapping tables of the input automaton and relationships among them.
	One input mode provides one input method at the user interface.  The mapping 
tables show (1) Mapping from the input to output shown in Figure 6.1
	    (2)	Feedback input via environment setting
	    (3) State of the mode variables

The mapping  table describes the variable definitions and the mapping relationship.  
It can be divided  into initial mapping table, intermediate mapping table and final 
mapping table as shown below :

   	+---------------------------------------------------------------------+
   	|                                              		      	      |
   	|                   +--- (1) Mode definition table       	      |
   	| Input automaton --| 					      	      |
   	|                   +--- (2) Mapping 				      |
   	|			      table --+--- Initial mapping table      |
   	|                                     |--- Intermediate mapping table |
   	|                                     +--- Final mapping table        |
   	|                                              		      	      |
   	+---------------------------------------------------------------------+
	    Figure 6.2 Structure of Input Automaton Description Language 
	    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	



















					- 6-4 -
	During the input automaton mapping process, the input first undergoes the 
initial mapping as shown in Figures 6.3.  The result (output-1) is then passed to 
the intermediate  mapping table  as input, and subsequently output-2 is passed as 
input  for final mapping.  Output-3  is the  final output of the input automaton.  
The feedback shown in diagram is treated as input to the intermediate mapping.

    +---------------------------------------------------------------------------+
    |           Initial           Intermediate            Final                	|
    |           mapping             mapping              mapping               	|
    |          +-------+        +-----------+        +---------+            	|
    |  Input ->| e  E  |output-1| EU   Eu   |output-2| E   Ch  | output-3   	|
    |          | u  U  |------->|  .    .   |------->| A   Sh  |----->      	|
    |          | .  .  |    +-->|  .    .   |        | V   Zh  |            	|
    |          +-------+    |   +-----------+        +---------+            	|
    |       		    |		| feedback                             	|
    |			    |		V					|
    |            	    +-----------+                             		|
    +---------------------------------------------------------------------------+
			Figure 6.3 Input Automaton Process
			~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
The mode definition table consists of the mode variable definitions and the input 
mode expressions.  In mode  definition table, the setting sequence of the mapping 
tables must be initial, intermediate, and followed by final mapping.

























					- 6-5 -

   6.3 MODE DEFINITION TABLE    
   =========================

	The  mode definition table describes the definition of the mode variables, 
input modes, as well as the relation among the different input modes. The table is 
made up of the following three types of expressions :


1. Mode Variable Definition 
~~~~~~~~~~~~~~~~~~~~~~~~~~~
(defmode  <mode name>  [initial state])

 * Mode name
  	Begins  with  an alphabet.  Consists  of numbers and alphabets.  The mode 
	variable may have two values : ON and OFF.

 * Initial state
  	The initial state can be ON or OFF.  This indicates  the initial state of 
	the mode variable.  Default is OFF.

 * A mode variable must be defined before it can be used. 
 

2. Search Path of Mapping Table 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mapping tables  which appear in the mode definition table are assumed to have the
same path as the  mode definition table.  If the path is different, the directory
name has to be set as follows :

	(search pathname ... ...)

  	* Several pathnames may be set, each separated by a separator.
	* The path must be defined before the mapping tables.















					- 6-6 -
3. Input Mode Expression
~~~~~~~~~~~~~~~~~~~~~~~~
The definition of the input mode can be done in the following three ways :
  	(a)  Mapping table [mapping table ...] mode indicator
  	(b)  ( if    condition   mapping table  [mapping table ...]  Mode indicator )
  	(c)  ( When  condition   mapping table  [mapping table ...]  Mode indicator )

"mapping table" is an  identifier that  specifies an  actual file in the current search 
path.  The file describes the input condition under the current input mode.
Both "if" and "when" are conditional statements, with some differences between them.  
For "if" statements, if the condition is ture, the remaining part of the "if" statement
will  be evaluated,  and the next statement will not be evaluated.  If the condition is
false, leave the current "if" statement and proceed to evaluate the next statement.
For  "when"  statements,  if the  condition is true, the remaining  part  of the "when"
statement will be evaluated; otherwise the remaining part will not be evaluated. In any
case, the next statement after the "when" statement will be evaluated.

In the mode definition table, the identifier of initial mapping tables begin with a "1".
Intermediate mapping tables begin with a "2"and final mapping tables begin with a "3".  

The mapping tables must follow the sequence of initial, intermediate and final.
There may be several mapping tables in each stage (initial, intermediate, final). 

The  "mode indicator"  can be represented  by a string of characters quoted in " ", to 
indicate the current input mode to the user.  If there are more than one mode indicator 
in the mode expression, only the last indicator is valid.



Condition Definition 
~~~~~~~~~~~~~~~~~~~~
The "condition" above can be expressed in the following ways :

   +-----------------------------------------------------------------------------+ 
   |    Mode variable name     | 	  ON when true, OFF when false 		 |
   |---------------------------+-------------------------------------------------|
   |(and condition condition)  |True when both conditions are true.              |
   |			       |					         |
   |(or condition condition)   |True when at least one of the two conditions is  |
   |			       |true.					         |
   |			       |					         |
   |( not condition )          |True when the condition is false  	         |
   |			       |					         |
   |( false )  	       	       |False					         |
   |			       |						 |
   |( true )   	               |True					   	 |
   +-----------------------------------------------------------------------------+ 


					- 6-7 -

<Table-c-6.1>















































					- 6-8 -

   6.4 MAPPING TABLES   
   ==================
 
	The relation between input and output of the input automaton in any input mode 
is  represented  in  the  mapping  tables.  The mapping  tables consist of the initial, 
intermediate and final mapping tables.  In the whole process, the intermediate mapping 
plays the main role, with the initial and final mapping  acting as the preparation and 
touchings respectively. 

Each process table has its own representation as shown below: 
(a) Initial mapping table 
  	(a0) Character variable definition
	(a1) Input character representation  [output character representation]

(b) Intermediate mapping table 
  	(b0) Character variable definition
  	(b1) Input character string representation  [output character representation]
 	     [feedback character string representation] 
	(b2) Input character string representation    operation

(c) Final mapping table 
  	(c0) Character variable definition
  	(c1) Input character representation  [output character string representation]


	The initial mapping can only perform mapping between characters.  Intermediate 
mapping is  able to perform  mapping between  character strings, and final mapping can 
perform  mappings from  character  and character string.  Besides, feedback  input can 
only be provided by the intermediate mapping. 

In the above (a1) -- if the input character matches the character in
		     "input character representation", the input automaton converts it 
		     to the character in "output character representation".

	     (b1) -- if the  input character  string  matches  the character string in
		     "input  character  string  representation",  the  input automaton 
		     converts  it to  the character string in "output character string 
	 	     representation".
		     During output, the "feedback character string representation" will 
		     be treated as new input to intermediate mapping.








					- 6-9 -
	     (b2) -- if the input character matches the character in 
		     "input character representation", the input automaton performs the 
		     operation on the mode variables.

	     (c1) -- if the input character matches the character in 
		     "input character representation", the input automaton converts it 
		     to the character string in "output character string representation".
 
 	Note:	
        ~~~~
	* In the above (a) (b) and (c), parts in [ ] are options.
	* One expression should be in the same line.  If there is not enough space for 
	  the expression, it can be continued in the following line by using the \.
	* Anything after a semicolon ";" in a line is treated as comment. 



































					- 6-10 -
1. Variable Definition 
~~~~~~~~~~~~~~~~~~~~~~
Through  definitions and  the use of  variables,  similar mapping relations can be
described easily and effectively.  The format of variable definition is as follows:

 (a) (defvar  variable name  (list  character  ...  ...))
 (b) (defvar  variable name  (all))

In (a), variable  name can  be any of the characters in list.  In (b), the variable 
name can be any character.  The example below show the similar conversion relations. 

 * example 1:
	(defvar A (list B C D) )
	(A)A       (A)a

 * example 2:
	BA    Ba
	CA    Ca
	DA    Da

During the definition and use of variables,
 (a) The variable must be defined before it is used. 
 (b) The variable definition is valid for all the current mapping tables.
 (c) Besides the current mapping tables, the variable definition is not valid for 
     other tables.
 (d) Variables in the same line have the same value.  For example :

		(defvar a1 (list A B))
		(a1) (tolower(a1))  3

	When input [Aa] or [Bb], the result will be 3.  However, there is no match 
	when input is [Ab] or [Ba]. Besides, variable that occurs in the remaining 
	character string must also appear at the same input character string. 
















					- 6-11 -
2. Evaluation of Characters 
~~~~~~~~~~~~~~~~~~~~~~~~~~~
The format of representing  a character is shown below.  This format is different 
from the character string representation.  The evaluation result of the character 
representation must be a  character,  and character can be a single character and 
multi-characters.

 (a) Character representation
	Character   	--  Character other than ---- ( ) ' \ " ; SP 
	'Character' 	--  Character other than ----  ' \ ^  
	'^Character' 	--  Indicates control character <ctrl + character>.  The 
			    character must be between 64-95 or lower case alphabets.
	'\Character' 	--  Indicates special characters.  Generally, '\character'
			    refers to the character after [\].  Besides,
			    '\n', '\t', '\b', '\r', '\f' having the same 
			    meanings as C language;  '\e', '\E' represent ESC;  
			    and '\8 ...' '\o...', '\d ...' ,'\x ...' represent 
			    octal, decimal and hexadecimal repsectively.































					- 6-12 -
 (b) Function representation
	There are some special functions in the automaton.  These functions can be used 
	in direct representation.  The table below gives a summary of the functions.


<Table-c-6.2>


































        Function representation format :
               < 1 > (function name   operand)
               < 2 > (function name   operand   operand)






					- 6-13 -
3. Evaluation of Character String 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	The character string representation is a sequence of character representations, 
which has been described in 2 (Evaluation of characters).  The evaluation result of the 
character string  representation  is also a character string, can be a single character 
as well as multi-characters.

 (a) Character representation 
	Similar  to  the  character  representation  and evaluation in 2 (Evaluation of 
	characters).

 (b) Function representaiton 
  	* function last=
	  If the last character of the most recently mapped character string matches the
	  function parameter, the function evaluates to an empty string. 

  	* function todigit
	  Convert the code given by the first parameter to the value in the base of the
	  code given by the second parameter.






























					- 6-14 -
 (c) Mode operation and evaluation 
	
	Function name			Function
	~~~~~~~~~~~~~       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  	 if 			To evaluate the state of mode operation.  
				If ON, it will be treated as empty character string.

 	 unless 		To evaluate the state of mode operation.  
				If OFF, it will be treated as empty character string.

  	 on 			To set the state of mode operation to ON.

 	 off 			To set the state of mode operation to OFF.

  	 switch 		To switch the mode operation state.  
				In other words, if the state is ON, set it to OFF
				and vice versa.

  	 allon 			Set all modes to ON.

 	 alloff 		Set all modes to OFF.

 	 (error) 		Error handling for input keys that cannot be mapped.

  	 (restart) 		To read  in new  mode definition table and re-define 
				the conversion. If error exists in the new conversion 
				table, an error message will be given and the system 
				returns to the  settings  of the original conversion 
				table.

	Note :	* Function "if" and  "unless" can only be used in the input character 
		  string representations;  
		* "on",  "off" and  "switch" can only be used in the output character
		  string representations; 
		* "allon" and  "alloff" and  "(error)" can only be used in the output 
		  character string representations of intermediate mapping tables.  
		* "(restart)" is used by itself.

 










					- 6-15 -

   6.5 EXAMPLE OF MODE DEFINITION   
   ==============================
 
	We have introduced the input automaton in the above sections.  We will now
give an example of a simple input automaton, using front-end processor "cuum".
Take note that some of the definitions are different from the standard definition.
For example, only two Pinyin input definitions are given here and the Bixing input
definitions are not included.
	Users  who are interested in the input automaton can refer directly to the
files under the standard path.


1. Mode Definition Table (mode)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Standard path
	/usr/local/lib/wnn/zh_CN/rk/rk/mode

* Content
	Relation between mode variables and input mode.


<Table-c-6.3>


























					- 6-16 -
2. Mode Control Table (2A_CTRL)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Standard path
	/usr/local/lib/wnn/zh_CN/rk/2A_CTRL

* Content
	Control of mode variables 

	(defvar pf1 (list '\x81') )
        (defvar pf2 (list '\x82') )

        (unless PIN_YIN)(pf1) (on PIN_YIN)(off ASCII)
        (if PIN_YIN)(pf1)     (switch QUAN_PIN)(switch ER_PIN)

        (unless ASCII)(pf2)   (on ASCII)(off PIN_YIN)
        (if ASCII)(pf2)       (switch QUAN_JIAO)(switch BAN_JIAO)


3. Quanpin Mapping Table (2P_QuanPin)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Standard path
	/usr/local/lib/wnn/zh_CN/rk/2P_QuanPin

* Content
	Mapping table of Quanpin input 

	(defvar A      (list B C D F G H   K L M N P     S T W   Y Z  ))
        (defvar AI     (list B C D   G H   K L M N P     S T W     Z  ))
        (defvar AN     (list B C D F G H   K L M N P   R S T W   Y Z  )) ;ANG
        (defvar AO     (list B C D   G H   K L M N P   R S T W   Y Z  ))
        (defvar E      (list B C D   G H   K L M N     R S T     Y Z  ))
         . . .           . . .

           
<Table-c-6.4>














					- 6-17 -
4. Erpin Mapping Table (ER_PIN)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Standard path
	/usr/local/lib/wnn/zh_CN/rk/2P_ErPin

* Content
	Mapping table of Erpin input

	(defvar A     (list B C E D F G H   K L M N P     S A T W   Y Z V))
        (defvar AI    (list B C E D   G H   K L M N P     S A T W     Z V))
        (defvar AN    (list B C E D F G H   K L M N P   R S A T W   Y Z V));ANG
        (defvar AO    (list B C E D   G H   K L M N P   R S A T W   Y Z V))
        (defvar E     (list B C E D   G H   K L M N     R S A T     Y Z V))
        . . .            . . .


<Table-c-6.5>
































					- 6-18 -
5. Pinyin Error Correction Table
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Standard path
	/usr/local/lib/wnn/zh_CN/rk/2P_RongCuo

* Content
	The auto-correcting definition in the Pinyin input


<Table-c-6.6>







































					- 6-19 -
6. Other Mapping Tables
~~~~~~~~~~~~~~~~~~~~~~~

(a) 1B_TOUPPER mapping table
    	Convert the input characters into upper case alphabets.

	(defvar low (all))
        (low)   (toupper (low))

(b) 2P_Tail mapping table

<Table-c-6.7>








(c) 3B_QuanJiao mapping table
    	Convert the input characters to wide ASCII characters.

	(defvar a (all))
        (a)     (toQalpha (a))


	The above mode definition table defines the "Pinyin" , "Erpin", "Banjiao" 
	character and "Quanjiao" input modes.  

	Initially  -- PIN_YIN mode is set to OFF 
		   -- QUAN_PIN mode under PIN_YIN is set to ON 
		   -- ER_PIN is set to OFF
		   -- ASCII mode is set to ON
		   -- BAN_JIAO mode under ASCII is set to ON
		   -- QUAN_JIAO mode under ASCII is set to OFF


	From the above definitions in the mode definition table, during the initial 
state, the input automaton receives  Banjiao input.  Notice that the BAN_JIAO state 
under  the  ASCII  state has  no mapping  table, this  means that the user input is 
received directly by the system.  To input Pinyin, user needs to change the mode to 
QUAN_PIN (under PIN_YIN).  The way of  changing the  mode is defined in the mapping 
table 2A_CTRL  ( see next paragraph ).  Here, we assume that  we are already in the 
QUAN_PIN mode, and the input automaton receives Pinyin input.




					- 6-20 -
	Note that  from the  above mode  definition table, the automaton will first 
follow the  definition of mapping table 1B_TOUPPER to convert the actual user input 
to  upper case  alphabets.  Subsequently, the  automaton  creates the "actual input
received by the system" based on  mapping tables 2P_QuanPin, 2P_RongCuo and 2P_Tail.
We will now show a simple example.


<Table-c-6.8>






Similarly,  the user  is able to change  the mode to Erpin mode to input Hanzi, or 
change to Quanjiao mode to input wide ASCII characters.

































					- 6-21 -