README @ 0:d9b6ff839eab (initial import)
author: Yoshiki Yazawa <yaz@cc.rim.or.jp>
date: Fri, 30 Nov 2007 19:34:51 +0900

libguess is derived from Gauche-0.8.3, a Scheme interpreter by Shiro
Kawai.

int dfa_validate_utf8(const char *buf, int buflen)

This function checks whether the given string is valid UTF-8.

buf: string to be validated.

buflen: length of the string to be validated.

return: 1 if buf is UTF-8, 0 if it is not.

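The README does not say how dfa_validate_utf8() works internally. The sketch below only illustrates the idea. It is a small byte-level DFA that follows the same contract: it returns 1 for valid UTF-8 and 0 otherwise. The function name sketch_validate_utf8() and the state names are hypothetical. The sketch applies the strict RFC 3629 rules: no overlong forms, no UTF-16 surrogates, and nothing above U+10FFFF. Whether libguess's own tables are equally strict is an assumption, not something stated here.

```c
/* Hypothetical DFA states. ACCEPT means "between complete characters";
   the CONTn states expect n more continuation bytes (0x80-0xBF);
   the *_2ND states restrict the second byte of a few lead bytes. */
enum u8state { ACCEPT, CONT1, CONT2, CONT3,
               E0_2ND, ED_2ND, F0_2ND, F4_2ND, REJECT };

/* One DFA transition on byte c. */
static enum u8state u8_step(enum u8state s, unsigned char c)
{
    switch (s) {
    case ACCEPT:
        if (c <= 0x7F) return ACCEPT;                  /* ASCII */
        if (c >= 0xC2 && c <= 0xDF) return CONT1;      /* 2-byte lead */
        if (c == 0xE0) return E0_2ND;                  /* reject overlongs */
        if ((c >= 0xE1 && c <= 0xEC) || c == 0xEE || c == 0xEF)
            return CONT2;                              /* 3-byte lead */
        if (c == 0xED) return ED_2ND;                  /* reject surrogates */
        if (c == 0xF0) return F0_2ND;                  /* reject overlongs */
        if (c >= 0xF1 && c <= 0xF3) return CONT3;      /* 4-byte lead */
        if (c == 0xF4) return F4_2ND;                  /* cap at U+10FFFF */
        return REJECT;                 /* 0x80-0xC1, 0xF5-0xFF */
    case CONT1:  return (c >= 0x80 && c <= 0xBF) ? ACCEPT : REJECT;
    case CONT2:  return (c >= 0x80 && c <= 0xBF) ? CONT1  : REJECT;
    case CONT3:  return (c >= 0x80 && c <= 0xBF) ? CONT2  : REJECT;
    case E0_2ND: return (c >= 0xA0 && c <= 0xBF) ? CONT1  : REJECT;
    case ED_2ND: return (c >= 0x80 && c <= 0x9F) ? CONT1  : REJECT;
    case F0_2ND: return (c >= 0x90 && c <= 0xBF) ? CONT2  : REJECT;
    case F4_2ND: return (c >= 0x80 && c <= 0x8F) ? CONT2  : REJECT;
    default:     return REJECT;
    }
}

/* Hypothetical stand-in with the documented contract:
   1 if buf[0..buflen) is well-formed UTF-8, 0 otherwise. */
int sketch_validate_utf8(const char *buf, int buflen)
{
    enum u8state s = ACCEPT;
    for (int i = 0; i < buflen && s != REJECT; i++)
        s = u8_step(s, (unsigned char)buf[i]);
    return s == ACCEPT;   /* a truncated final character is not ACCEPT */
}
```

For example, sketch_validate_utf8("\xE3\x81\x82", 3) returns 1. The truncated input "\xE3\x81" (length 2) returns 0, because the DFA stops outside the accepting state.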
const char *guess_jp(const char *buf, int buflen)

Detects the character encoding of a given Japanese string.

buf: string to be checked.

buflen: length of the string to be checked.

return: encoding name which can be fed to g_convert() or iconv().

The encoding name is one of the following: UTF-16, ISO-2022-JP, EUC-JP,
SJIS, UTF-8.

The returned string is constant, so you MUST NOT free it.

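The returned name is meant to be passed to iconv(), so a caller will typically use it as the source encoding for iconv_open(). The sketch below assumes a POSIX iconv implementation. On glibc, iconv lives in libc; other platforms may need -liconv. The helper name to_utf8() is hypothetical and is not part of libguess. In real code the first argument would be the string returned by guess_jp(buf, buflen), which must not be freed.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: converts buflen bytes of buf from encoding `from`
   (e.g. the constant string returned by guess_jp()) into a newly
   allocated, NUL-terminated UTF-8 string. Returns NULL on failure.
   The caller frees the result, but never frees `from`. */
char *to_utf8(const char *from, const char *buf, size_t buflen)
{
    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return NULL;                     /* unknown encoding name */

    size_t outsize = buflen * 4 + 1;     /* generous bound plus NUL */
    char *out = malloc(outsize);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *inp = (char *)buf;             /* iconv() wants char ** input */
    size_t inleft = buflen;
    char *outp = out;
    size_t outleft = outsize - 1;        /* reserve room for the NUL */

    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1) {
        free(out);                       /* invalid or truncated input */
        iconv_close(cd);
        return NULL;
    }
    *outp = '\0';
    iconv_close(cd);
    return out;
}
```

For instance, to_utf8("EUC-JP", "\xA4\xA2", 2) yields the UTF-8 bytes E3 81 82 (U+3042, HIRAGANA LETTER A). The output buffer allows four bytes per input byte. That is a generous bound for the encodings guess_jp() reports, since none of them expands a single input byte to more than three UTF-8 bytes.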
If the given string is not long enough to distinguish between encodings,
guess_jp takes an order list into account to determine the encoding.

For instance, the order for Japanese is defined as

    #define ORDER_JP &utf8, &sjis, &eucj

The leftmost encoding has the highest priority. The order is applied even
when only two encodings remain alive.

If utf8 and sjis remain, guess_jp returns utf8.

If sjis and eucj remain, sjis is returned.

This means that if the scores of the remaining encodings are the same,
the one listed first in the order wins.

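The tie-breaking rule can be illustrated with a small, self-contained sketch. The struct candidate type, its alive and score fields, and pick_by_order() are all hypothetical. The sketch also assumes that a higher score means a better match; libguess's actual data structures may differ. Candidates are stored in order-list order. Only a strictly greater score replaces the current best, so the leftmost surviving encoding wins any tie.

```c
#include <stddef.h>

/* Hypothetical candidate record: encoding name, whether its DFA is
   still alive (has not rejected the input), and its accumulated score. */
struct candidate {
    const char *name;
    int alive;
    double score;
};

/* Returns the name of the best alive candidate, or NULL if none survive.
   Candidates are scanned in order-list order; a later candidate replaces
   the current best only with a strictly greater score, so ties go to
   the leftmost encoding. */
const char *pick_by_order(const struct candidate *c, size_t n)
{
    const struct candidate *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (!c[i].alive)
            continue;
        if (best == NULL || c[i].score > best->score)
            best = &c[i];
    }
    return best ? best->name : NULL;
}
```

Take the case where UTF-8 and SJIS are both alive with equal scores: the sketch returns "UTF-8". If SJIS and EUC-JP are alive with equal scores, it returns "SJIS". Both results match the two examples in the order-list description.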
const char *guess_tw(const char *buf, int buflen)

const char *guess_cn(const char *buf, int buflen)

const char *guess_kr(const char *buf, int buflen)

Although guess_xx() can distinguish UCS-2BE and UCS-2LE, g_convert()
cannot