comparison README @ 0:d9b6ff839eab

initial import
author Yoshiki Yazawa <yaz@cc.rim.or.jp>
date Fri, 30 Nov 2007 19:34:51 +0900
libguess is derived from Gauche-0.8.3, a Scheme interpreter by Shiro
Kawai.


int dfa_validate_utf8(const char *buf, int buflen)

This function checks whether the given string is valid UTF-8.

buf: string to be validated.

buflen: length of the string to be validated.

return: 1 if buf is UTF-8, 0 if it is not.


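The contract above (1 for UTF-8, 0 otherwise) can be illustrated with a
small stand-alone checker. This is only a sketch: sketch_validate_utf8 is a
hypothetical name, it is not libguess's DFA, and the exact set of byte
sequences that dfa_validate_utf8 accepts may differ (this version also
rejects overlong forms, surrogates, and code points above U+10FFFF).

```c
/* Hypothetical stand-in for illustration; NOT libguess's implementation.
 * Returns 1 if buf[0..buflen) is well-formed UTF-8, 0 otherwise. */
int sketch_validate_utf8(const char *buf, int buflen)
{
    const unsigned char *p = (const unsigned char *)buf;
    int i = 0;

    while (i < buflen) {
        unsigned char c = p[i];
        int need;           /* continuation bytes expected after the lead byte */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point allowed for this length */

        if (c < 0x80) {     /* plain ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;       /* stray continuation byte or invalid lead byte */
        }

        if (i + need >= buflen)
            return 0;       /* sequence is cut off by the end of the buffer */

        for (int k = 1; k <= need; k++) {
            unsigned char cc = p[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;   /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return 0;       /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;       /* UTF-16 surrogate range */
        if (cp > 0x10FFFF)
            return 0;       /* beyond the Unicode range */

        i += need + 1;
    }
    return 1;
}
```

With libguess itself, the call has the same shape:
dfa_validate_utf8(buf, buflen) with the same two arguments.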
const char *guess_jp(const char *buf, int buflen)

Detects the character encoding of a given Japanese string.

buf: string to be checked.

buflen: length of the string to be checked.

return: encoding name that can be fed to g_convert() or iconv().

The encoding name is one of the following: UTF-16, ISO-2022-JP, EUC-JP, SJIS, UTF-8.

The returned string is constant, so you MUST NOT free it.

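As a rough illustration of feeding the returned name to iconv(), the sketch
below converts a buffer to UTF-8 given an encoding name. The helper name
sketch_to_utf8 is hypothetical and not part of libguess; in real use the
from_enc argument would be the string returned by guess_jp(). Whether a
particular name (for example SJIS) is accepted depends on the iconv
implementation on the system.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper, not part of libguess.
 * Converts in[0..inlen) from from_enc to UTF-8 into out (capacity outcap).
 * Returns the number of bytes written, or -1 on any failure. */
long sketch_to_utf8(const char *from_enc, const char *in, size_t inlen,
                    char *out, size_t outcap)
{
    iconv_t cd = iconv_open("UTF-8", from_enc);
    if (cd == (iconv_t)-1)
        return -1;                  /* encoding name not supported here */

    char *inp = (char *)in;         /* iconv() wants char **; input is only read */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outcap;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft);  /* flush any shift state */

    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;                  /* invalid input or output buffer too small */

    return (long)(outcap - outleft);
}
```

Because the name returned by guess_jp() is a constant string, it can be
passed straight through as from_enc and must not be freed afterwards.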
If the given string is not long enough to distinguish the encoding,
guess_jp takes the order list into account to determine it.

For instance, the order for Japanese is defined as

#define ORDER_JP &utf8, &sjis, &eucj

The leftmost encoding has the highest priority. The order is applied even if
only two encodings are still alive.

If utf8 and sjis remain, guess_jp returns utf8.

If sjis and eucj remain, sjis is returned.

This means that if the score of each remaining encoding is the same,
the leftmost one in the order list is returned.


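The order-list rule can be sketched as picking the leftmost candidate that
is still alive. The struct and function names below are hypothetical and
chosen only for illustration; libguess's actual data structures are not
described in this README.

```c
#include <stddef.h>

/* Hypothetical candidate record, for illustration only. */
struct sketch_candidate {
    const char *name;   /* encoding name to report */
    int alive;          /* non-zero while the input has not ruled it out */
};

/* order[] is listed with the highest priority first, like ORDER_JP.
 * Returns the name of the leftmost candidate still alive, or NULL if
 * every candidate has been ruled out. */
const char *sketch_pick_by_order(const struct sketch_candidate *order, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (order[i].alive)
            return order[i].name;
    }
    return NULL;
}
```

With the candidates listed as UTF-8, SJIS, EUC-JP, this returns UTF-8 when
UTF-8 and SJIS are both alive, and SJIS when only SJIS and EUC-JP are alive,
matching the two cases described above.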
const char *guess_tw(const char *buf, int buflen)

const char *guess_cn(const char *buf, int buflen)

const char *guess_kr(const char *buf, int buflen)


Although guess_xx() can distinguish UCS-2BE and UCS-2LE, g_convert()
cannot