comparison README @ 0:d9b6ff839eab
initial import
| author | Yoshiki Yazawa <yaz@cc.rim.or.jp> |
|---|---|
| date | Fri, 30 Nov 2007 19:34:51 +0900 |
| parents | |
| children | |
| -1:000000000000 | 0:d9b6ff839eab |
|---|---|
| 1 libguess is derived from Gauche-0.8.3, a Scheme interpreter by Shiro | |
| 2 Kawai. | |
| 3 | |
| 4 | |
| 5 | |
| 6 | |
| 7 | |
| 8 int dfa_validate_utf8(const char *buf, int buflen) | |
| 9 | |
| 10 This function checks whether the given string is valid UTF-8. | |
| 11 | |
| 12 buf: string | |
| 13 | |
| 14 buflen: length of a string to be validated. | |
| 15 | |
| 16 return: 1 if buf is utf8, 0 if not utf8. | |
| 17 | |
| 18 | |
| 19 const char *guess_jp(const char *buf, int buflen) | |
| 20 | |
| 21 Detects the character encoding of a given Japanese string. | |
| 22 | |
| 23 buf: string to be checked. | |
| 24 | |
| 25 buflen: length of a string to be checked. | |
| 26 | |
| 27 return: encoding name which can be passed to g_convert() or iconv(). | |
| 28 | |
| 29 Encoding name is one of the following: UTF-16, ISO-2022-JP, EUC-JP, SJIS, UTF-8. | |
| 30 | |
| 31 The returned string is constant, so you MUST NOT free it. | |
| 32 | |
| 33 If the given string is not long enough to distinguish, guess_jp takes | |
| 34 the order list into account to determine the encoding. | |
| 35 | |
| 36 For instance, the order for Japanese is defined as | |
| 37 | |
| 38 #define ORDER_JP &utf8, &sjis, &eucj | |
| 39 | |
| 40 The leftmost encoding has the highest priority. The order is applied even if | |
| 41 only two encodings are still alive. | |
| 42 | |
| 43 If utf8 and sjis remain, guess_jp returns utf8. | |
| 44 | |
| 45 If sjis and eucj remain, sjis is returned. | |
| 46 | |
| 47 This means that if the score of each encoding is the same, | |
| 48 | |
| 49 | |
| 50 | |
| 51 | |
| 52 const char *guess_tw(const char *buf, int buflen) | |
| 53 | |
| 54 | |
| 55 const char *guess_cn(const char *buf, int buflen) | |
| 56 | |
| 57 const char *guess_kr(const char *buf, int buflen) | |
| 58 | |
| 59 | |
| 60 Although guess_xx() can distinguish UCS-2BE and UCS-2LE, g_convert() | |
| 61 cannot |
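
The priority rule described for ORDER_JP can be sketched in plain C. This is only an illustration of the tie-breaking idea, not libguess's actual implementation: the `candidate` struct, its `alive` and `score` fields, and the helpers `set_alive()` and `pick_encoding()` are hypothetical names introduced here, and the README does not show how scores are really computed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical candidate record; this is NOT libguess's internal DFA state. */
typedef struct {
    const char *name;  /* encoding name handed back to the caller */
    int alive;         /* nonzero while the input is still valid in this encoding */
    int score;         /* higher means a better match (assumed scoring scheme) */
} candidate;

/* Mark the candidate called `name` as alive (1) or eliminated (0).
 * Returns 1 if the name was found, 0 otherwise. */
static int set_alive(candidate *order, size_t n, const char *name, int alive)
{
    assert(order != NULL && name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(order[i].name, name) == 0) {
            order[i].alive = alive;
            return 1;
        }
    }
    return 0;
}

/* `order` is sorted by priority, leftmost entry first.
 * The strict '>' comparison keeps the earlier (higher-priority) entry
 * whenever two surviving candidates have the same score.
 * Returns NULL if no candidate is alive. */
static const char *pick_encoding(const candidate *order, size_t n)
{
    const candidate *best = NULL;

    assert(order != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!order[i].alive)
            continue;
        if (best == NULL || order[i].score > best->score)
            best = &order[i];
    }
    return best != NULL ? best->name : NULL;
}
```

With an array ordered as UTF-8, SJIS, EUC-JP where all entries are alive and have equal scores, `pick_encoding()` returns "UTF-8"; after `set_alive(order, 3, "UTF-8", 0)` it returns "SJIS", matching the two cases listed in the README. As in the README, the returned pointer refers to a constant string and must not be freed.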
