diff README @ 0:d9b6ff839eab
initial import
author | Yoshiki Yazawa <yaz@cc.rim.or.jp> |
---|---|
date | Fri, 30 Nov 2007 19:34:51 +0900 |
parents | |
children | |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/README Fri Nov 30 19:34:51 2007 +0900
@@ -0,0 +1,61 @@
+libguess is derived from Gauche-0.8.3, a Scheme interpreter by Shiro
+Kawai.
+
+
+
+
+
+int dfa_validate_utf8(const char *buf, int buflen)
+
+This function checks whether the given string is valid UTF-8.
+
+buf: string to be validated.
+
+buflen: length of the string to be validated.
+
+return: 1 if buf is UTF-8, 0 if not.
+
+
+const char *guess_jp(const char *buf, int buflen)
+
+Detects the character encoding of a given Japanese string.
+
+buf: string to be checked.
+
+buflen: length of the string to be checked.
+
+return: encoding name which can be passed to g_convert() or iconv().
+
+The encoding name is one of the following: UTF-16, ISO-2022-JP, EUC-JP, SJIS, UTF-8.
+
+The returned string is constant, so you MUST NOT free it.
+
+If the given string is not long enough to distinguish the encodings, guess_jp
+takes the order list into account to determine the encoding.
+
+For instance, the order for Japanese is defined as
+
+#define ORDER_JP &utf8, &sjis, &eucj
+
+The leftmost encoding has the highest priority. This order is applied even if
+only two encodings remain as candidates.
+
+If utf8 and sjis remain, guess_jp will return utf8.
+
+If sjis and eucj remain, sjis will be returned.
+
+This means that if the score of each encoding is the same,
+
+
+
+
+const char *guess_tw(const char *buf, int buflen)
+
+
+const char *guess_cn(const char *buf, int buflen)
+
+const char *guess_kr(const char *buf, int buflen)
+
+
+Although guess_xx() can distinguish UCS-2BE and UCS-2LE, g_convert()
+cannot