annotate cWnn/manual/chap8 @ 17:a3fdd8ad75dc

2ch dictionary has been added
author Yoshiki Yazawa <yaz@cc.rim.or.jp>
date Mon, 14 Apr 2008 17:01:14 +0900
parents bbc77ca4def5
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
1 ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
2 ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
3 ┃ Chapter 8 CWNN FILE MANAGEMENT ┃
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
4 ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
5 ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
6
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
7
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
8 ┏━━━━━━━━┓
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
9 ┃ 8.1 OVERVIEW ┃
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
10 ┗━━━━━━━━┛
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
11
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
12 In cWnn system, the cserver plays an important role in managing the different
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
13 resources and files.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
14
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
15 Resource files are read in during cserver startup. If the files are not read,
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
16 they will be read in by the cserver subsequently when requested by certain front-end
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
17 processors during their startup.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
18
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
19 There are three categories of files in cWnn, namely:
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
20
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
21 (1) Dictionary files
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
22 (2) Usage frequency files
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
23 (3) Grammar files
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
24
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
25 We will now explain in details each of the three cWnn file types.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
26
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
27
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
28
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
29
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
30
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
31
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
32
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
33
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
34
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
35
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
36
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
37
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
38
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
39
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
40
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
41
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
42
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
43
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
44
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
45
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
46
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
47
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
48
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
49
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
50 - 8-1 -
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
51 ┏━━━━━━━━━━━━┓
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
52 ┃ 8.2 DICTIONARY FILES ┃
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
53 ┗━━━━━━━━━━━━┛
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
54
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
55 Dictionary is classified into two categories : (1) Text format
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
56 (2) Binary format
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
57 Text format dictionary is readable, but binary format dictionary is not
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
58 readable. The text format dictionary is converted to binary format using the "catod"
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
59 utility (refer to Section 6.7). Only the binary format dictionary is used by cWnn
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
60 system. The binary format dictionary may be converted back to text format via the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
61 the "cdtoa" utility (refer to Section 6.8).
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
62
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
63 The maximum number of words allowed in a dictionary is 70,000.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
64
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
65
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
66 1. Dictionary in Text Format
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
67 ━━━━━━━━━━━━━━
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
68 The format of the text dictionary is shown below.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
69
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
70 The text format is as follows:
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
71 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
72 ┌───────────────────────────┐
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
73 │ \comment <Comment> <CR> │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
74 │ \total <Total_frequency> <CR> │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
75 │ \cixing <Dict_cixing> <CR> │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
76 │ \Pinyin <CR> │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
77 │ │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
78 │ pinyin word Cixing Frequency <CR> │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
79 │ pinyin word Cixing Frequency <CR> │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
80 │ pinyin word Cixing Frequency <CR> │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
81 │ : : : : │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
82 │ : : : : │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
83 │ (EOF) │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
84 └───────────────────────────┘
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
85
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
86 Description:
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
87 - comment : These are comments in a dictionary.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
88 - total : This is the total number of times a dictionary is used
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
89 for conversion, ie, the usage frequency of a dictionary.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
90 - cixing : This specifies the part of speech used by THIS particular
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
91 dictionary ONLY. The format of the part of speech here
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
92 is the same as that in the system standard cixing file
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
93 (cixing.data). Refer to Section 8.4.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
94 If the part of speech is NOT specified here, the default
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
95 file will be "/usr/local/lib/wnn/zh_CN/cixing.data".
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
96 - Pinyin : This determines the type of dicionary. It can be "Zhuyin"
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
97 or "Bixing", depending on the dictionary itself.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
98
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
99
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
100 - 8-2 -
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
101 - pinyin : For the Pinyin-Hanzi conversion dictionary, the Pinyin
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
102 here refers to the pronunciation for each character/word.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
103 For encoded input, the Pinyin refers to the code of each
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
104 character/word.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
105 The maximum length for Pinyin is 256 characters.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
106
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
107 - word : This refers to the actual Chinese character/word. Each
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
108 character or word should not exceed 256 characters.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
109 If a space, carriage return or other special characters
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
110 are needed to be added to the character/word, it can be
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
111 done by appending them in octal after "\0".
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
112 If characters other than "0" is appended after the "\",
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
113 it will refer to the character itself.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
114 For example, "\\" refers to "\" itself.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
115 - Cixing : This refers to the part of speech defined in the grammar
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
116 file, such as noun, pronoun etc. For details, refer to
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
117 grammar files explained in 8.4.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
118 - Frequency : The usage frequency for each word.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
119
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
120
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
121 Example of a text format dictionary:
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
122 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
123 ┌───────────────────────────┐
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
124 │ \comment This is a Pinyin dictionary │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
125 │ \total 0 │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
126 │ \cixing │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
127 │ \Pinyin │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
128 │ │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
129 │ W幆幚 我 人称代 1200 <CR> │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
130 │ R帵n幚 人 单位量 10 <CR> │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
131 │ De幚 的 语气 30 <CR> │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
132 │ : : : : │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
133 │ : : : : │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
134 └───────────────────────────┘
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
135
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
136
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
137 2. Dictionary in Binary Format
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
138 ━━━━━━━━━━━━━━━
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
139 This is the binary format dictionary used by the cWnn system. While reading in a file,
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
140 the cserver is able to determine whether the file is a dictionary via the binary format.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
141 Once a dictionary is accessed by the cserver, its contents may be changed. During the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
142 termination of cserver, the updated dictionary will be written back to the file.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
143
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
144 Each tuple (词条) in a dictionary has a serial number. The serial number is used for
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
145 matching the tuples in a dictionary with those in the usage frequency file.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
146
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
147
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
148
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
149
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
150 - 8-3 -
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
151 3. System Dictionary and User Dictionary
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
152 ━━━━━━━━━━━━━━━━━━━━
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
153 System dictionary refers to the dictionary provided by the system itself. There are
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
154 two types of system dictionaries. One consists of only characters, while the other
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
155 consists of words. For the Pinyin input and Zhuyin input environments, the following
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
156 dictionary files are used :
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
157
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
158 (1) level_1.dic - consists of only characters (单字). These are the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
159 Chinese characters that are more commonly used.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
160 (2) level_2.dic - consists of Chinese characters that are not so
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
161 commonly used.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
162 (3) basic.dic - This is a word dictionary ie. it consists of single
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
163 character word (单字词) and multi-character words
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
164 (多字词).
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
165
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
166 User dictionary refers to dictionary that is created by the user. This dictionary
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
167 allows the user to register or delete his own words. The dictionary structure is
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
168 similar to that of the system dictionary.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
169
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
170
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
171 4. Assess of Dictionary Files
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
172 ━━━━━━━━━━━━━━━
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
173 Both system and user dictionaries can be added or removed through the settings of the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
174 environment files.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
175
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
176 It may be set via the "setdic" command in the initialization file "cserverrc" (refer
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
177 to Section 5.3) or in the initialization file "wnnenvrc" (refer to Section 5.5).
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
178 Similar settings need to be done for the reverse initialization file "wnnenvrc_R"
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
179 (refer to Section 5.6).
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
180
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
181 Default path for system dictionary : /usr/local/lib/wnn/zh_CN/dic/sys/
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
182 ".dic" is the default filename extension for dictionary. For example, level_1.dic
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
183
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
184 Default path for user dictionary : /usr/local/lib/wnn/zh_CN/dic/usr/@USR/
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
185 "ud" is the default filename for user dictionary.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
186
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
187
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
188 5. Logical Dictionary and Dictionary Files
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
189 ━━━━━━━━━━━━━━━━━━━━━
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
190 In the cWnn system, several front-end processors are connected to the cserver, and all
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
191 the resources managed by cserver are utilized by the different front-end processors.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
192 Each dictionary file may combine with several different usage frequency files. Hence,
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
193 each combination will form different dictionary logically.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
194
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
195 A dictionary may also be used for both forward and reverse conversion, such as Pinyin-
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
196 Hanzi conversion and Hanzi-Pinyin conversion. Hence, they form two separate logical
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
197 dictionaries. For details, refer to "cwnnstat" in Section 6.4.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
198
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
199 NOTE: ONE default dictionary may form several logical dictionaries.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
200 - 8-4 -
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
201 ┏━━━━━━━━━━━━━━┓
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
202 ┃ 8.3 USAGE FREQUENCY FILES ┃
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
203 ┗━━━━━━━━━━━━━━┛
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
204
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
205 Usage frequency files are attached to a dictionary. In every dictionary, there
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
206 are information on the usage frequency of each word. This information represents the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
207 default usage frequency for each word in the dictionary. The default usage frequency
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
208 is obtained from statistical results by analysing large amount of Chinese articles.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
209
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
210 Since the usage frequency information of each word is already included in the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
211 text format dictionary, there is NO need for an explicit text format of usage frequency
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
212 file. Refer to the example of text format dictionary in Section 8.2 above.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
213
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
214
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
215 1. System Usage Frequency File and User Usage Frequency File
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
216 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
217 Note that the default usage frequency defined by the system may not be suitable for all
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
218 users. Hence, besides the default usage frequency, the cserver will create a user usage
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
219 frequency file for each user. The initial file is a copy of the default file, and it
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
220 is created when the user starts the front-end processor for the first time. As the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
221 system is being used by the user, the usage frequency of each word will be changed
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
222 according to how often a word is being used. Therefore, this user frequency file is
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
223 accustomed to the individual user. During the termination of the cserver, or during the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
224 termination of those environments using the frequency file, the user usage frequency file
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
225 will be updated. When the same user activates the front-end processor again, instead of
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
226 creating a new user usage frequency file, the updated frequency file will be read in by
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
227 cserver.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
228
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
229 The usage frequency of each word in the dictionary plays a part in the Hanzi conversion.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
230 Hence, the weight for usage frequency of each word may be changed to adjust its impact on
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
231 the conversion process so as to obtain a more accurate conversion result.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
232
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
233 In the conversion evaluation, there is a "last used" information which also resides in
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
234 the usage frequency file.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
235
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
236
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
237 2. Assess of Usage Frequency Files
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
238 ━━━━━━━━━━━━━━━━━
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
239 Usage Frequency Files is specified in the initialization file "wnnenvrc" (refer to
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
240 Section 5.5) and "wnnenvrc_R" (refer to Section 5.6).
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
241
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
242 Default path for usage frequency file: /usr/local/lib/wnn/zh_CN/dic/usr/@USR/
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
243 ".h" is the default filename extension for usage frequency file. For example, basic.h,
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
244 level_1.h, level_2.h.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
245
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
246
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
247
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
248
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
249
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
250 - 8-5 -
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
251 ┏━━━━━━━━━━━━━━━━━━━┓
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
252 ┃ 8.4 GRAMMAR FILES AND CIXING FILES ┃
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
253 ┗━━━━━━━━━━━━━━━━━━━┛
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
254
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
255 The definition of the grammar(词法) files and part of speech(词性) file are
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
256 dependent of the system. Substantial knowledge on Chinese grammar and the Pinyin-Hanzi
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
257 conversion process of this system are required in order to understand them. We will now
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
258 only give some necessary and brief explanations on the grammar used in cWnn.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
259
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
260 NOTE: We will now refer part of speech as Cixing (词性).
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
261
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
262
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
263 1. Cixing File in Text Format
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
264 ━━━━━━━━━━━━━━━
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
265 Cixing file defines a set of grammatical attributes, which is based upon to define
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
266 the Chinese grammar. The grammatical attributes of all the words in the dictionary must
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
267 be in this Cixing file.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
268
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
269 The content in the Cixing file is intepreted line by line. Whatever that comes after
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
270 a semicolon ";" in a line is regarded as comments. A backslash "\" means it will be
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
271 continued on the following line. Refer to cWnn default Cixing file for example.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
272
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
273 The Cixing file is divided into three portions:
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
274 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
275 (a) Tree structure : During a word add operation via the front-end processor,
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
276 the user needs to choose the appropriate grammatical
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
277 attribute for the word to be added.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
278 This tree structure will be searched accordingly until
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
279 the user has chosen the required grammatical attribute.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
280 For example :
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
281 普通名词/|普通名:人名—:事物名—
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
282
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
283 This means that 普通名词 can be further classified into
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
284 普通名, 人名 or 事物名. Only the leaves are the actual
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
285 Cixing that can be attached to words.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
286
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
287 (b) Cixing definitions : These are Cixing that may include Chinese characters,
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
288 such as 普通名 and 单字.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
289 "@" refers to a null Cixing, and "@" may replace any
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
290 new Cixing to be appended, without affecting the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
291 compatibility with the existing dictionary and grammar
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
292 files.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
293
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
294 (c) Combined Cixing : This defines the combined Cixing that contain two of
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
295 more grammatical definition attributes. Combined
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
296 Cixing can be assigned to single word and they reduce
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
297 the number of tuples (词条) having the same Chinese
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
298 characters.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
299
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
300 - 8-6 -
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
301 Example of a text format Cixing file:
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
302 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
303 ┌──────────────────────────────┐
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
304 │ ;;;; 词性的树型结构: │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
305 │ 名词/|普通名词/:抽象名:时间名:处所名:方位名词/:表人特殊名/ │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
306 │ 普通名词/|普通名:人名—:事物名— │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
307 │ 方位名词/|单纯方位名:合成方位名 │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
308 │ 表人特殊名/|百家性—:称谓名— │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
309 │ : │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
310 │ : │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
311 │ │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
312 │ ;;;; 词性的定义: │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
313 │ 终止 ;;; 0 终止, 作为文节的终止 │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
314 │ 数字 ;;; 1 数字 │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
315 │ @ ;;; 11 │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
316 │ 单字 ;;; 13 │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
317 │ 普通名 │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
318 │ 人名— │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
319 │ : │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
320 │ : │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
321 │ │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
322 │ ;;;; 复合词性定义: │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
323 │ 姓名词-$普通名:百家姓— │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
324 │ 表人物量-$表人量:表物量 │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
325 │ 行为动词-$及物动—:不及物动— │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
326 │ : │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
327 │ : │
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
328 └──────────────────────────────┘
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
329
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
330
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
331
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
332
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
333
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
334
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
335
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
336
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
337
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
338
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
339
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
340
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
341
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
342
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
343
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
344
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
345
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
346
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
347
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
348
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
349
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
350 - 8-7 -
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
351 2. Grammar Files in Text Format
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
352 ━━━━━━━━━━━━━━━━
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
353 Based on the defined set of Cixing, a set of grammar rules for Chinese is defined in
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
354 the grammar file. This grammar file is a database and is read during the startup of
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
355 the cserver.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
356
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
357 The text format grammar files are as follow:
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
358 (1) con.master
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
359 (2) con.masterR
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
360 (3) con.attr
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
361 (4) con.jirattr
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
362 (5) con.jircon
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
363 (6) con.shuutan
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
364 (7) con.shuutanR
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
365 These files may be found under the directory "/cdic" in the cWnn source.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
366
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
367 The binary format grammar file may be created using the "catof" utility (refer to
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
368 Section 6.9). This binary format grammar file will be used by the cserver.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
369 In order to create the binary grammar file, the Cixing text file is also needed in
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
370 addition to the seven text format grammar files listed above.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
371
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
372 When cserver reads in the grammar file, it is able to determine whether it is a
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
373 grammar file by analysing the binary format. Two or more grammar files can be
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
374 managed by the cserver. Different user environments may make use of different
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
375 grammar files. A user is also able to change the grammar file dynamically via the
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
376 operation function (文法变更). Refer to Section 5.2.
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
377
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
378
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
379 3. Assess of Grammar File and Cixing File
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
380 ━━━━━━━━━━━━━━━━━━━━━
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
381 Default path of Cixing text file: /usr/local/lib/wnn/zh_CN/
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
382 The default filename for the text format Cixing file in cWnn is "cixing.data"
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
383
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
384 Default path of grammar binary file: /usr/local/lib/wnn/zh_CN/dic/sys/
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
385 The default filename for the binary format grammar file in cWnn is "full.con"
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
386 and "full.conR".
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
387
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
388
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
389
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
390
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
391
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
392
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
393
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
394
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
395
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
396
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
397
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
398
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
399
bbc77ca4def5 initial import
Yoshiki Yazawa <yaz@cc.rim.or.jp>
parents:
diff changeset
400 - 8-8 -