annotate cWnn/manual/chap4 @ 13:778894f4449f

fixed typos
author Yoshiki Yazawa <yaz@cc.rim.or.jp>
date Sun, 02 Mar 2008 19:47:09 +0900
parents bbc77ca4def5
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
1 ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
2 ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
3 ┃ Chapter 4 PINYIN-HANZI CONVERSION ┃
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
4 ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
5 ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
6
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
7
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
8 ┏━━━━━━━━┓
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
9 ┃ 4.1 OVERVIEW ┃
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
10 ┗━━━━━━━━┛
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
11
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
12 In Chapter 3, we have described Pinyin input. In this chapter, we will explain
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
13 in greater details on the Pinyin input environment, and how the input is being processed
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
14 in the system. General concepts on Pinyin-Hanzi conversion are also explained, as well
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
15 as the conversion methods used in cWnn.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
16
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
17
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
18
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
19 ┏━━━━━━━━━━━━━━━┓
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
20 ┃ 4.2 PINYIN INPUT ENVIRONMENT ┃
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
21 ┗━━━━━━━━━━━━━━━┛
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
22
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
23 As described in Chapter 3, Pinyin can be input via three methods: Quanpin, Erpin
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
24 and Sanpin. The implementation of these methods is not performed internally, but is
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
25 through some definitions in an external environment of the system. The function in the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
26 external environment which allows such definitions is known as Input Automaton. Refer
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
27 to Chapter 7 for details.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
28 Input automaton provides different input environments for different users. For
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
29 example, a user who needs to input Pinyin may use the Pinyin centred input environment,
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
30 as explained in Chapter 3.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
31 However, besides the input automaton, user may specify their own Pinyin input
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
32 methods in certain environment files.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
33
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
34
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
35
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
36
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
37
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
38
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
39
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
40
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
41
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
42
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
43
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
44
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
45
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
46
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
47
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
48
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
49
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
50 - 4-1 -
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
51 Internal/External Representations of Chinese Pronuncations
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
52 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
53 Pinyin is the external representation of Chinese pronunciation. When a user inputs a
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
54 Pinyin at the user interface, the input will first be processed in the input automaton.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
55 Through the input automaton, it will be converted into the standard Pinyin as defined in
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
56 the system. This standard Pinyin is known as the internal representation (音码).
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
57 For example:
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
58
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
59 ┌────────────────────────────────────┐
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
60 │ The phrase 汉语语音在计算机中的表现 during user input will be: │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
61 │ han4yu3yu3yin1zai4ji4suan4ji1zhong1debiao3xian4 │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
62 │ │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
63 │ However, the system will automatically convert these to the standard │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
64 │ Pinyin defined in the system, that is: │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
65 │ H帳n Y幊 Y幊 Y帺n Z帳i J幀 Su帳n J帺 Zh幁ng De Bi帲o Xi帳n │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
66 └────────────────────────────────────┘
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
67
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
68 Within the system, each Pinyin is represented by an individual internal code as defined
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
69 in the system. Before the process of Hanzi conversion, the user Pinyin input will first
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
70 be converted into its corresponding internal representations.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
71
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
72 You may observe from the above that the system does not require the user to segment the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
73 Pinyin input string. The user only needs to input the correct Pinyin, and the system
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
74 will perform the segmentation on the input. For example, the input "han4yu3yu3yin1"
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
75 will be segmented to "H帳n Y幊 Y幊 Y帺n" automatically by the system. To the system, one
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
76 Pinyin is represented as an individual unit. For example, 汉 is not considered as four
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
77 characters "h", "a", "n", "4", but is represented as a single unit "H帳n". Hence, the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
78 input string will be segmented and displayed as a more readable form to the user.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
79
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
80 The Pinyin input interface is an editor by itself. Besides having the input feature
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
81 mentioned above, facilities such as cursor movement, inserting and deleting operations
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
82 on the input string are also provided.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
83
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
84
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
85
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
86
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
87
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
88
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
89
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
90
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
91
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
92
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
93
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
94
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
95
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
96
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
97
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
98
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
99
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
100 - 4-2 -
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
101 ┏━━━━━━━━━━━━━━━━━━━━┓
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
102 ┃ 4.3 CONCEPT ON PINYIN-HANZI CONVERSION ┃
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
103 ┗━━━━━━━━━━━━━━━━━━━━┛
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
104
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
105 A good Pinyin input system should provide users with a good Pinyin input environment
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
106 and a Pinyin-Hanzi conversion mechanism with high accuracy. Pinyin-Hanzi conversion
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
107 refers to converting the input from Pinyin to the expected Chinese character(Hanzi).
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
108
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
109 There are 3 categories of conversion mechanism:
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
110 (a) Conversion based on character
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
111 (b) Conversion based on word
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
112 (c) Conversion based on phrase or any arbitrary Pinyin string
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
113
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
114 (a) Conversion based on character
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
115 This conversion mechanism only allows one Pinyin to be input. The conversion result
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
116 is a Chinese character (Hanzi), which has the same pronunciation as the Pinyin input.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
117 We must take note that there are several Hanzi that have the same pronunciation.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
118 This would mean that the Pinyin that has been input will correspond to many Hanzi.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
119 In order to obtain the correct Hanzi, it has to be selected manually among all the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
120 candidates. For example,
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
121
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
122 The Pinyin "Zh幁ng幚" corresponds to Hanzi 中, 钟, 仲 ..etc. Hence, if the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
123 user wants the word 中国, then he has to select the Hanzi 中.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
124
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
125 This mechanism of conversion is time consuming and is not a convenient way of
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
126 conversion.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
127
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
128
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
129 (b) Conversion based on word
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
130 In this conversion mechanism, more than one Pinyin is allowed. This Pinyin input
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
131 will correspond to the expected Chinese word. A word may consist of more than
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
132 one character. Hence, by having word based conversion mechanism, the number of
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
133 candidates is much reduced. For example,
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
134
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
135 The word 中国 consist of characters 中 and 国. If the user wants this word,
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
136 he only needs to input "Zh幁ng幚Gu幃幚" and the conversion result will be 中国.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
137
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
138 We can see from the above example that the number of candidate selections is reduced.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
139 However, user must have the concept of word during input, and we need to take note
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
140 that only words that are registered in the system are valid. Hence, the need of
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
141 candidate selection still exists.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
142
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
143
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
144
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
145
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
146
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
147
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
148
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
149
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
150 - 4-3 -
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
151 (c) Conversion based on phrase or any arbitrary Pinyin string
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
152 For this conversion, the user is able to input any arbitrary length of Pinyin. That
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
153 is, the user terminates the Pinyin input string whenever he thinks is suitable. The
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
154 system will analyse the input string, performs the necessary grammatical analysis
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
155 and word segmentation, and subsequently produces a more accurate conversion output.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
156 The number of conversions is very much reduced than in (b).
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
157
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
158 cWnn system makes use of this mechanism of conversion, hence provides a more flexible
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
159 user input interface. The diagram below shows the conversion process for the entire
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
160 cWnn system.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
161
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
162
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
163 ↓ Input Output ↑
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
164 ━┿━━━ ━━┿━━ User Interface
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
165 ┌────┼────────────┼──────────────────┐
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
166 │ │ │ │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
167 │ │ ↓ Internal / External │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
168 │ │ │ │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
169 │ ┏━┷━━━━┓ ┏━┷━━┓ ┏━━━━━━┓ │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
170 │ ┃ Input ┠──↓──┨ Editor ┠────┨ Conversion ┃ │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
171 │ ┃ Automaton ┃External/ ┗━━━━┛ ┃ Mechanism ┃ │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
172 │ ┗━┯━━━━┛Internal ┗━━━┯━━┛ │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
173 │ │ │ │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
174 │ ┏━┷━━━━━┓ ┏━━━┷━━━┓ │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
175 │ ┃ Input ┃ ┃ Conversion ┃ │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
176 │ ┃ Environment ┃ ┃ Environment ┃ │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
177 │ ┗━━━━━━━┛ ┗━━━━━━━┛ │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
178 │ │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
179 └────────────────────────────────────┘
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
180
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
181
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
182
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
183
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
184
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
185
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
186
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
187
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
188
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
189
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
190
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
191
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
192
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
193
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
194
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
195
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
196
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
197
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
198
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
199
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
200 - 4-4 -
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
201 ┏━━━━━━━━━━━━━━━━━━━┓
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
202 ┃ 4.4 PINYIN-HANZI CONVERSION IN CWNN ┃
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
203 ┗━━━━━━━━━━━━━━━━━━━┛
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
204
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
205 In cWnn system, there are two ways of conversion: (1) Forward conversion (正向变换)
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
206 (2) Reverse conversion (逆向变换)
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
207 Forward conversion refers to Pinyin-Hanzi conversion, whereas reverse conversion
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
208 refers to Hanzi-Pinyin conversion. In Pinyin-Hanzi conversion, Pinyin is the input and
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
209 the conversion result is the corresponding Hanzi. Vice versa for Hanzi-Pinyin conversion.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
210
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
211 As mentioned in Section 4.3, no conversion mechanism is able to perform a 100%
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
212 accuracy conversion. Hence, besides providing a multi-phrase conversion mechanism, cWnn
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
213 also provides facilities to perform re-editing, re-conversion as well as to allow the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
214 user to segment the words and phrases manually.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
215
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
216 We will now explain the Pinyin-Hanzi conversion mechanism, as well as the assessment
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
217 formula for the multi-phrase conversion in cWnn system.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
218
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
219
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
220 1. Conversion Mechanism
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
221 ━━━━━━━━━━━━
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
222 Pinyin-Hanzi conversion includes the following five conversions. The first three listed
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
223 below are most commonly used. The last two conversions are meant for system developers
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
224 to check on grammatical analysis.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
225
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
226 (a) Multi-phrase conversion
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
227 The concept of multi-phrase conversion has been mentioned in Section 4.3. In
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
228 cWnn, once a Pinyin input string is sent for conversion, the system will perform
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
229 the conversion based on the current environment (refer to Chapter 5) as well as
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
230 the conversion parameters of the current environment. After conversion, the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
231 result will appear on the input line, with the cursor positioned at the first
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
232 word of the sentence. If a re-conversion is required (done by pressing the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
233 conversion 变换 key again), the conversion method as in (c) will be performed.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
234
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
235 (b) Word conversion
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
236 The concept of word conversion has been mentioned in Section 4.3. In cWnn, the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
237 portion of the input string indicated by the cursor is treated as a word and
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
238 conversion is performed based on this word. The candidate word that has the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
239 highest assessment value is output as the result.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
240 For example, Pinyin "Shi幚Yong幚" corresponds to Hanzi such as 使用, 适用, 施用,
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
241 实用, 食用 and 试用 ...etc. However, 使用 has the highest assessment value in
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
242 the system. Hence, 使用 will be the initial conversion result.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
243
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
244 (c) Word candidates extraction
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
245 Treat the portion of the input string indicated by the cursor as a word and
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
246 perform word conversion. Output all the possible word candidates.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
247 From the above example of (a), if 使用 is not the word that you want, press the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
248 conversion 变换 key again to get all the possible candidates such as 适用, 施用,
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
249 实用, 食用 and 试用 ...etc. and select accordingly.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
250 - 4-5 -
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
251 (d) Phrase conversion
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
252 Treat the portion of the input string indicated by the cursor as a phrase and
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
253 perform phrase conversion. Output the candidate phrase that has the highest
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
254 assessment value as result.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
255
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
256 (e) Phrase candidates extraction
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
257 Treat the portion of the input string indicated by the cursor as a phrase and
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
258 perform phrase conversion. Output all the possible phrase candidates.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
259
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
260
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
261
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
262 2. Manual Word Segmentation
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
263 ━━━━━━━━━━━━━━
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
264 Automatic character segmentation has been mentioned in Section 4.2. cWnn also performs
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
265 word segmentation. However, in Pinyin-Hanzi conversion, automatic word segmentation may
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
266 not be 100% accurate. Hence, when the conversion result is incorrect, the user needs to
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
267 segment the words manually by using the segmentation keys (^O or ^I). The word indicated
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
268 by the cursor will be segmented. To complete the manual segmentation process, press the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
269 conversion key again. For example,
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
270 ┎────────────────────────────────────┐
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
271 │ In the phrase 今天天气正好, 今天 ,天气 and 正好 are treated as words. │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
272 │ We know that 正好 is not the correct word. Hence we need to segment │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
273 │ the word 正好 to individual characters, then perform a re-conversion. │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
274 │ (1) First, move the cursor to 正好, then press ^I to separate the │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
275 │ word. │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
276 │ (2) The word will be converted back to individual Pinyin. You may now │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
277 │ do a re-conversion be pressing the conversion 变换 key again. │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
278 └────────────────────────────────────┘
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
279
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
280 Similarly, to ^O may be used to combine characters into one unit. For exmaple,
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
281 ┎────────────────────────────────────┐
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
282 │ 你好 is treated as separate characters. In order to make these two │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
283 │ characters as one unit. You may do the following: │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
284 │ (1) Place the cursor at 你, then press ^O to combine 你 and 好. │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
285 │ (2) The characters will be converted back to Pinyin. You may now do a │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
286 │ re-conversion be pressing the conversion 变换 key again. If no │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
287 │ candidate for this word exists, a message will be displayed │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
288 │ " 侯补1个也没有 (怎么办)". This means that the word does not │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
289 │ exist in the current dictionaries. │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
290 └────────────────────────────────────┘
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
291
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
292 NOTE: Another way to perform ^O is by using the 文节→ key.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
293 Another way to perform ^I is by using the 文节← key.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
294
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
295
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
296
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
297
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
298
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
299
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
300 - 4-6 -
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
301 3. Assessment Formula for Multi-Phrase Conversion
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
302 ━━━━━━━━━━━━━━━━━━━━━━━━━
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
303 The multi-phrase conversion plays a major role in the Pinyin-Hanzi conversion. The level
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
304 of accuracy for this conversion has a direct effect on the effectiveness of the system.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
305 There are several factors that affect the conversion result of Pinyin-Hanzi conversion,
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
306 each differs according to different conditions. The followings are the assessment
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
307 formula for multi-phrase conversion. Users are able to change the corresponding
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
308 conversion parameters in order to obtain the most suitable conversion environment.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
309
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
310 (a) Assessment parameters
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
311 parameter (0) Number of phrase "n"
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
312 During the assessment process, this is the maximum number of
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
313 phrases that can be assessed at one time.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
314 The default "n" value in cWnn is "1".
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
315
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
316 parameter (1) Number of words "m"
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
317 During the assessment process, this is the maximum number of
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
318 words that can be in a phrase.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
319 The default "m" value in cWnn is "5".
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
320
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
321 (b) Word assessment parameters
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
322 parameter (2) Usage frequency weight
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
323 A usage frequency is given to each word in the dictionary.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
324 When a user uses the dictionary, the system will create as
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
325 well as manage a usage frequency file for the user. As the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
326 user uses the system, the usage frequency of each word in
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
327 the dictionary will be updated according to how often the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
328 user uses each word. Hence, each user will have his
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
329 individual usage frequency file.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
330 The default value in cWnn is "2".
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
331
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
332 parameter (3) Word length weight
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
333 Word length refers to the number of characters in a word.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
334 The default value in cWnn is "750".
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
335
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
336 parameter (4) Tone correctness weight
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
337 This gives higher assessment values to words entered with
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
338 correct four tones, although cWnn allows input with or
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
339 without four tones. The default value in cWnn is "10".
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
340
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
341 parameter (5) Last used weight
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
342 Last used refers to the most recently used word for a Pinyin.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
343 By increasing the weight of this parameter, the assessment
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
344 value of recently used word can be increased.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
345 The default value in cWnn is "80".
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
346
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
347
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
348
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
349
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
350 - 4-7 -
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
351 parameter (6) Dictionary priority weight
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
352 Each dictionary has a priority defined by the environment.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
353 By changing this value, assessment values may be biased
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
354 towards certain dictionaries.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
355 The default value in cWnn is "10".
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
356
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
357 (c) Phrase assessment parameters
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
358 parameter (7) Average word assessment value weight
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
359 A phrase consists of several words, and each word has its
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
360 own word assessment value as described above. The average
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
361 of these values is the average word assessment value.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
362 The default value in cWnn is "5".
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
363
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
364 parameter (8) Phrase length weight
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
365 Phrase length refers to the number of characters in a phrase.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
366 The default value in cWnn is "1000".
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
367
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
368 parameter (9) Number of words in phrase weight
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
369 This refers to the the number of words in a phrase. Larger
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
370 number of words in a phrase shows greater grammatical
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
371 certainty among the words, and hence higher reliability.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
372 The default value in cWnn is "50".
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
373
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
374 (d) Other paramters
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
375 Characters other than Hanzi that appear at the input line have their own
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
376 individual weights. The followings are the parameters:
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
377
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
378 parameter (10) Usage frequency of numerals
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
379 The default value in cWnn is "0".
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
380
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
381 parameter (11) Usage frequency of alphabets
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
382 The default value in cWnn is "-200".
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
383
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
384 parameter (12) Usage frequency of symbols
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
385 The default value in cWnn is "0".
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
386
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
387 parameter (13) Usage frequency of open parentheses
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
388 The default value in cWnn is "0".
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
389
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
390 parameter (14) Usage frequency of close parentheses
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
391 The default value in cWnn is "0".
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
392
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
393 parameter (16) Maximum number of candidates allowed during conversion
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
394 The default value in cWnn is "16".
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
395
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
396
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
397
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
398
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
399
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
400 - 4-8 -
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
401 (e) Assessment formula for multi-phrase conversion
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
402 Pinyin-Hanzi conversion in cWnn is based on an assessment formula. We can see
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
403 from the above that each parameter has its value. By increasing their values,
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
404 their weightage in the conversion process will increase.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
405
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
406 The formulae shown below determine the assessment values for a word, a phrase
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
407 and the total assessment value for candidates of a phrase.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
408
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
409 Assessment value for word :
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
410 f = (c1 x frequency) + (c2 x word length)
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
411 + (c3 x tone correctness) + (c4 x last used)
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
412 + (c5 x dictionary priority)
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
413
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
414 Assessment value for phrase :
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
415 F = k1 x avg( f1, f2, ..fm ) + (k2 x phrase length)
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
416 + (k3 x number of words in phrase)
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
417
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
418 Total assessment value for candidates of a phrase :
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
419 Vi = avg( Fi1 + Fi2 + ... + Fin )
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
420
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
421 Best assessment value for a phrase :
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
422 MAX( V1, V2, ... Vk )
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
423
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
424
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
425 NOTE:
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
426 * c1 = parameter (2)
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
427 c2 = parameter (3)
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
428 c3 = parameter (4)
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
429 c4 = parameter (5)
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
430 c5 = parameter (6)
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
431
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
432 * k1 = parameter (7)
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
433 k2 = parameter (8)
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
434 k3 = parameter (9)
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
435
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
436
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
437 The above mentioned parameter values are the default values set in cWnn. These
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
438 default values may be set in a environment file. Refer to Section 5.3.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
439
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
440 The default values may be changed dynamically by using the environment operation
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
441 functions. Refer to Section 5.2 for explanations.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
442
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
443
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
444
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
445
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
446
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
447
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
448
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
449
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
450 - 4-9 -