annotate README @ 0:d9b6ff839eab

initial import
author Yoshiki Yazawa <yaz@cc.rim.or.jp>
date Fri, 30 Nov 2007 19:34:51 +0900
libguess is derived from Gauche-0.8.3, a Scheme interpreter by Shiro
Kawai.

int dfa_validate_utf8(const char *buf, int buflen)

This function checks whether the given string is valid UTF-8.

buf: string to be validated.

buflen: length of the string to be validated.

return: 1 if buf is UTF-8, 0 if it is not.
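To make the return contract concrete (1 for UTF-8, 0 otherwise), here is a minimal standalone sketch. It is not the DFA-based code in libguess: sketch_validate_utf8 is a hypothetical name, and the byte ranges it checks are the standard UTF-8 well-formedness rules, which this README does not spell out.

```c
#include <stddef.h>

/* Hypothetical stand-in with the same shape as dfa_validate_utf8():
 * returns 1 if buf[0..buflen) is well-formed UTF-8, 0 otherwise.
 * This is NOT the libguess implementation. */
int sketch_validate_utf8(const char *buf, int buflen)
{
    const unsigned char *p = (const unsigned char *)buf;
    int i = 0;

    if (buf == NULL || buflen < 0)
        return 0;

    while (i < buflen) {
        unsigned char c = p[i];
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range of 2nd byte */
        int need;                           /* continuation bytes expected */
        int k;

        if (c <= 0x7F) { i++; continue; }             /* ASCII */
        else if (c >= 0xC2 && c <= 0xDF) need = 1;
        else if (c == 0xE0) { need = 2; lo = 0xA0; }  /* reject overlongs */
        else if (c >= 0xE1 && c <= 0xEC) need = 2;
        else if (c == 0xED) { need = 2; hi = 0x9F; }  /* reject surrogates */
        else if (c == 0xEE || c == 0xEF) need = 2;
        else if (c == 0xF0) { need = 3; lo = 0x90; }  /* reject overlongs */
        else if (c >= 0xF1 && c <= 0xF3) need = 3;
        else if (c == 0xF4) { need = 3; hi = 0x8F; }  /* at most U+10FFFF */
        else return 0;        /* 0x80..0xC1 and 0xF5..0xFF cannot lead */

        if (buflen - i <= need)
            return 0;         /* sequence is cut off by the end of buf */
        if (p[i + 1] < lo || p[i + 1] > hi)
            return 0;
        for (k = 2; k <= need; k++)
            if (p[i + k] < 0x80 || p[i + k] > 0xBF)
                return 0;
        i += need + 1;
    }
    return 1;
}
```

Under these rules, the three bytes E3 81 82 (HIRAGANA LETTER A in UTF-8) are accepted, while the EUC-JP bytes A4 A2 for the same character are rejected.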
const char *guess_jp(const char *buf, int buflen)

Detects the character encoding of a given Japanese string.

buf: string to be checked.

buflen: length of the string to be checked.

return: encoding name that can be fed to g_convert() or iconv().

The encoding name is one of the following: UTF-16, ISO-2022-JP, EUC-JP, SJIS, UTF-8.

The returned string is constant, so you MUST NOT free it.
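As a rough illustration of feeding the returned name to iconv(), the sketch below converts a buffer to UTF-8 using an encoding name supplied by the caller. The helper name sketch_to_utf8 and its error convention are assumptions made for this example; only iconv_open(), iconv() and iconv_close() are real POSIX calls.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert `in` (inlen bytes, encoded as `from_enc`)
 * to UTF-8 in `out`. `from_enc` is meant to be a name such as the one
 * guess_jp() returns, e.g. "EUC-JP". Returns the number of bytes written,
 * or -1 on failure (unsupported encoding, invalid or incomplete input,
 * or `out` too small). */
long sketch_to_utf8(const char *from_enc, const char *in, size_t inlen,
                    char *out, size_t outcap)
{
    iconv_t cd;
    char *src = (char *)in;  /* iconv() takes char **, but does not modify the input bytes */
    char *dst = out;
    size_t inleft = inlen;
    size_t outleft = outcap;
    size_t rc;

    cd = iconv_open("UTF-8", from_enc);  /* (to, from) argument order */
    if (cd == (iconv_t)-1)
        return -1;

    rc = iconv(cd, &src, &inleft, &dst, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;
    return (long)(outcap - outleft);
}
```

With libguess linked in, a call would look something like sketch_to_utf8(guess_jp(buf, buflen), buf, (size_t)buflen, out, sizeof out). The name returned by guess_jp() is passed straight through and is never freed.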

If the given string is not long enough to distinguish between encodings,
guess_jp takes the order list into account to determine the encoding.

For instance, the order for Japanese is defined as

#define ORDER_JP &utf8, &sjis, &eucj

The leftmost encoding has the highest priority. The order is applied even if
only two encodings are still alive.

If utf8 and sjis remain, guess_jp will return utf8.

If sjis and eucj remain, sjis will be returned.

This means that if the remaining encodings have the same score, the leftmost
of them in the order list is returned.
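The tie-breaking rule can be sketched as follows. This is only an illustration of "leftmost wins on equal score": struct sketch_candidate and sketch_pick are hypothetical names, and the real libguess DFA structures and scoring are not shown in this README.

```c
#include <stddef.h>

/* Hypothetical candidate record; libguess's real structures differ. */
struct sketch_candidate {
    const char *name;  /* encoding name to return */
    int alive;         /* nonzero while the input is still valid in it */
    double score;      /* higher is better */
};

/* Returns the name of the best candidate that is still alive.
 * `order` is listed by priority, leftmost first. The strict '>' means a
 * later entry replaces the current best only when it scores strictly
 * higher, so ties go to the leftmost entry. Returns NULL if no candidate
 * is alive. */
const char *sketch_pick(const struct sketch_candidate *order, size_t n)
{
    const struct sketch_candidate *best = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (!order[i].alive)
            continue;
        if (best == NULL || order[i].score > best->score)
            best = &order[i];
    }
    return best ? best->name : NULL;
}
```

With candidates listed as UTF-8, SJIS, EUC-JP and all scores equal, this picks UTF-8. Once UTF-8 is no longer alive it picks SJIS, matching the two cases described above.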

const char *guess_tw(const char *buf, int buflen)

const char *guess_cn(const char *buf, int buflen)

const char *guess_kr(const char *buf, int buflen)

Although guess_xx() can distinguish UCS-2BE and UCS-2LE, g_convert()
cannot