annotate cWnn/manual.en/chap7 @ 10:fc3022f61fc7

tiny clean up
author Yoshiki Yazawa <yaz@cc.rim.or.jp>
date Fri, 21 Dec 2007 17:23:36 +0900
parents bbc77ca4def5
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
1 *************************************************
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
2 * Chapter 7 CWNN FILE MANAGMENT *
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
3 *************************************************
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
4
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
5
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
6
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
7
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
8
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
9 7.1 OVERVIEW
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
10 =============
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
11
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
12 In cWnn system, the server plays an important role in managing the different
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
13 resources and files.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
14 The files to be read in during cserver startup, as well as when requested by
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
15 certain client, are shown as follows :
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
16
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
17 (1) Dictionary files
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
18 (2) Usage frequency files
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
19 (3) Grammar files
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
20 (4) Pinyin error tolerance files
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
21
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
22 The above cWnn files are explained in this chapter.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
23
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
24
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
25
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
26
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
27
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
28
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
29
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
30
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
31
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
32
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
33
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
34
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
35
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
36
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
37
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
38
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
39
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
40
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
41
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
42
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
43
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
44
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
45
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
46
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
47
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
48
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
49
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
50 - 7-1 -
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
51
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
52 7.2 DICTIONARY FILES
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
53 =====================
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
54
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
55 Dictionary is classified into two categories : (1) Text format
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
56 (2) Binary format
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
57 Text format dictionary can be read but binary format dictionary cannot be
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
58 read. The text dictionary is converted to binary format using the "atod" utility
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
59 (please refer to Section 4.6). Only the binary format dictionary is used by cWnn
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
60 system. However, the binary form of dictionary can be converted back to text form
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
61 using the "dtoa" utility (please refer to Section 4.7). The number of words which
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
62 can be stored in a dictioanry is between 0 to 65535.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
63
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
64
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
65 1. Dictionary in Text Format
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
66 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
67 The format of the text dictionary is shown below. The first two lines can be
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
68 omitted. However, the text dictionary obtained from "dtoa" has the following format.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
69
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
70 +-------------------------------------------------------+
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
71 | \comment Comment <CR> |
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
72 | \total Total frequency <CR> |
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
73 | Pinyin word Cixing Frequency <CR> |
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
74 | Pinyin word Cixing Frequency <CR> |
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
75 | Pinyin word Cixing Frequency <CR> |
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
76 | : : : : |
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
77 | : : : : |
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
78 | : : : : |
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
79 | |
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
80 | (EOF) |
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
81 +-------------------------------------------------------+
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
82
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
83 * Comment : Comments can be added in a dictionary
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
84 * Total frequency : This is the total usage frequency for all the conversion
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
85 performed using this dictionary.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
86 * Pinyin : For the Pinyin-Hanzi conversion dictionary, the Pinyin
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
87 here refers to the pronunciation for each word.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
88 For Bianma dictionary, the Pinyin refers to the code for
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
89 each word.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
90 The maximum length for Pinyin is 256 characters.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
91 * Word : Each word should not exceed 256 characters.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
92 If a space, carriage return or other special characters
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
93 need to be added in the word, it can be done by appending
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
94 them in octal after "\0".
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
95 If characters other than "0" is appended after the "\",
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
96 it will refer to the character itself.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
97 For example, "\\" refers to "\" itself.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
98
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
99
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
100 - 7-2 -
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
101 * Cixing : Refers to the part of speech defined in the grammar file,
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
102 such as noun and pronoun. For details, please refer to
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
103 grammar files explained in 7.4.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
104 * Frequency : The frequency of usage for each word.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
105 In the text format dictionary, the range value of
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
106 frequency is between 0 and 2400.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
107
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
108
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
109 2. Dictionary in Binary Format
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
110 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
111 This is the dictionary used by the cWnn system. While reading in a file, the server is
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
112 able to determine if the file is a dictionary via the binary format. Once a dictionary
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
113 is accessed by the server, its contents may be changed. During the termination of
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
114 server, the updated dictionary will be written back to the file.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
115
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
116 All the words in the dictionary have serial numbers. The serial numbers are for the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
117 purpose of matching between the words in the dictionary and the usage frequency file.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
118
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
119
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
120 3. System Dictionary and User Dictionary
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
121 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
122 System dictionary refers to the dictionary provided by the system itself. For the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
123 Pinyin input and Zhuyin input environment, the following dictionary files are used :
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
124
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
125 /usr/local/lib/wnn/zh_CN/dic/sys/level-1.dic
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
126 /usr/local/lib/wnn/zh_CN/dic/sys/level-2.dic
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
127 /usr/local/lib/wnn/zh_CN/dic/sys/basic-1.dic
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
128 /usr/local/lib/wnn/zh_CN/dic/sys/basic-2.dic
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
129
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
130 User dictionary refers to dictionary that is created by the user. The user is able to
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
131 add or delete his own words into or from the dictionary. The dictionary structure is
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
132 similar to that of the system dictionary. The user dictionary has the standard path
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
133 "/usr/local/lib/wnn/zh_CN/dic/usr/@USR/ud".
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
134
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
135 "ud" is the standard filename for user dictionary.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
136 Both system and user dictionaries can be added or removed through the setting of the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
137 system environment. Please refer to chapter 5 for details.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
138
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
139
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
140 4. Logical Dictionary and Dictionary Files
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
141 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
142 In the cWnn system, several clients are connected to one server, and all the resources
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
143 managed by the server are used by the different clients. Each dictionary file may have
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
144 several different usage frequency files. Hence, we may say that there are several
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
145 dictionaries existing logically.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
146 In addition, a dictionary can be used for reverse conversion, such as Pinyin-Hanzi
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
147 conversion and Hanzi-Pinyin conversion. Hence, there are two dictionaries logically.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
148 For details, please refer to "wnnstat" in Section 4.6.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
149
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
150 - 7-3 -
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
151
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
152 7.3 USAGE FREQUENCY FILES
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
153 ==========================
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
154
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
155 Usage frequency files are attached to dictionaries. In every dictionary, there
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
156 exists information on each word's usage frequency. This information represents the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
157 standard usage frequency for each word in the dictionary. The standard usage frequency
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
158 is obtained from statistical results by analying large amount of Chinese articles.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
159 Since the usage frequency information is included in the text form of dictionary, there
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
160 is no explicit system usage frequency file.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
161
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
162 Note that the standard usage frequency defined by the system may not be suitable
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
163 for all users (client). Hence, besides the standard frequency, the server will create a
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
164 user usage frequency file for each user. The initial file is a copy of the standard
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
165 file, and it is created when the user executes the client for the first time. As the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
166 system is being used by the user, the usage frequency of each word will be changed
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
167 according to how often a word is being used. Therefore, this user frequency file is
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
168 accustomed to the individual user. During the termination of the server or environment,
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
169 the user usage frequency file will be updated. This file will be read in by the server
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
170 when the same user activates the client again, instead of creating a new frequency file.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
171
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
172 The usage frequency of each word in the dictionary plays a part in the Hanzi
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
173 conversion. Hence, the weight for usage frequency can be changed to adjsut its impact
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
174 on the conversion process so as to obtain a more accurate conversion result. In the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
175 conversion evaluation, there is a "last used" information which also resides in the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
176 usage frequency file.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
177
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
178 The standard path for the usage frequency file is
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
179 "/usr/local/lib/wnn/zh_CN/dic/usr/@USR/dictionary_name.h".
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
180
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
181
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
182
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
183
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
184
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
185
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
186
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
187
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
188
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
189
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
190
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
191
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
192
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
193
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
194
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
195
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
196
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
197
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
198
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
199
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
200 - 7-4 -
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
201
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
202 7.4 GRAMMAR FILES AND CIXING FILES
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
203 ===================================
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
204
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
205 The definition of the grammar files and Cixing file are dependent of the system.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
206 Substantial knowledge on Chinese grammar and the Pinyin-Hanzi conversion process of this
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
207 system are required to understand them. We will now give only some necessary and brief
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
208 explanations.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
209
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
210
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
211 1. Cixing File
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
212 ~~~~~~~~~~~~~~
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
213 Cixing file defines a set of grammatical attributes, which is based upon to define
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
214 Chinese grammar. The grammatical attributes of all the words in the dictionary must be
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
215 in this set.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
216
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
217 Standard path : /usr/local/lib/wnn/zh_CN/dic/cixing.data
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
218
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
219 The Cixing file is intepreted line by line. Whatever that comes after a semicolon ";"
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
220 in a line is regarded as comments, and a backslash "\" means it will be continued on
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
221 the following line.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
222
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
223 The Cixing file is divided into three portions :
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
224
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
225
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
226 <Table-c-7.1>
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
227
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
228
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
229
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
230
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
231
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
232
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
233
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
234
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
235
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
236
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
237
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
238
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
239
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
240
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
241
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
242
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
243
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
244
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
245
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
246
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
247
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
248
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
249
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
250 - 7-5 -
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
251 2. Grammar Files
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
252 ~~~~~~~~~~~~~~~~
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
253 Based on the defined set of Cixing, the grammar files define a grammar for Chinese.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
254 The text format of the grammar files is readable but the binary format cannot be read.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
255 The conversion from readable grammar files to binary format is through the "atoc"
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
256 utility (refer to Section 4.8). Only this binary form of grammar file can be used by
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
257 the server.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
258
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
259 Standard path : /usr/local/lib/wnn/zh_CN/dic/full.con
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
260
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
261 When the server reads the binary form of grammar file, it is able to determine whether
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
262 it is a grammar file. Two or more grammar files can be managed by the server.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
263 Different user environment can make use of different grammar files, and a user is also
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
264 able to change the grammar file dynamically.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
265
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
266 For related grammar file in text form, please refer to "CWNN APPLICATION DEVELOPMENT
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
267 MANUAL".
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
268
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
269
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
270
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
271
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
272
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
273
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
274
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
275
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
276
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
277
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
278
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
279
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
280
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
281
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
282
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
283
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
284
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
285
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
286
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
287
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
288
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
289
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
290
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
291
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
292
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
293
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
294
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
295
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
296
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
297
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
298
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
299
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
300 - 7-6 -