view README @ 6:c61a7765c8f5 default tip
added COPYING to make the licence and copyrights clear.
author | Yoshiki Yazawa <yaz@honeyplanet.jp> |
---|---|
date | Thu, 08 Mar 2012 11:08:07 +0900 |
parents | d9b6ff839eab |
children |
libguess is derived from Gauche-0.8.3, a Scheme interpreter by Shiro Kawai.

int dfa_validate_utf8(const char *buf, int buflen)
  Checks whether the given string is valid UTF-8.
  buf: string to be validated.
  buflen: length of the string to be validated.
  return: 1 if buf is UTF-8, 0 if it is not.

const char *guess_jp(const char *buf, int buflen)
  Detects the character encoding of a given Japanese string.
  buf: string to be checked.
  buflen: length of the string to be checked.
  return: encoding name that can be fed to g_convert() or iconv().
    The encoding name is one of the following:
    UTF-16, ISO-2022-JP, EUC-JP, SJIS, UTF-8.
    The returned string is constant, so you MUST NOT free it.

  If the given string is not long enough to distinguish the encodings,
  guess_jp takes an order list into account to determine the encoding.
  For instance, the order for Japanese is defined as

    #define ORDER_JP &utf8, &sjis, &eucj

  The leftmost encoding has the highest priority. The order is applied
  even if only two encodings are still alive: if utf8 and sjis remain,
  guess_jp returns utf8; if sjis and eucj remain, sjis is returned.
  This means that if the encodings score the same, the one listed
  further left is returned.

const char *guess_tw(const char *buf, int buflen)
const char *guess_cn(const char *buf, int buflen)
const char *guess_kr(const char *buf, int buflen)

Although guess_xx() can distinguish UCS-2BE and UCS-2LE, g_convert() cannot
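
The order-list tie-break described for guess_jp can be illustrated with a minimal sketch. This is not libguess's actual implementation: struct candidate, its fields, and pick_by_order() are hypothetical names introduced here, and the integer score with "higher is better" is an assumed convention. The only behaviour taken from the README is that, among the encodings still alive, the leftmost entry wins when scores are equal.

```c
#include <stddef.h>

/* Hypothetical candidate record, for illustration only.
 * libguess's real internal types are not described in this README. */
struct candidate {
    const char *name; /* encoding name handed back to the caller */
    int alive;        /* nonzero while the encoding is not ruled out */
    int score;        /* assumed convention: higher is better */
};

/* Return the name of the best candidate that is still alive.
 * 'order' is listed left to right, highest priority first.
 * On equal scores the earlier (leftmost) entry is kept.
 * Returns NULL when no candidate is alive. */
const char *pick_by_order(const struct candidate *order, size_t n)
{
    const struct candidate *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (!order[i].alive)
            continue;
        /* Strict '>' means a later entry must beat, not merely match,
         * the current best, so ties go to the leftmost entry. */
        if (best == NULL || order[i].score > best->score)
            best = &order[i];
    }
    return best ? best->name : NULL;
}
```

With an array ordered as UTF-8, SJIS, EUC-JP and all scores equal, this sketch returns "UTF-8"; once UTF-8 is marked as not alive, it returns "SJIS", which matches the two cases spelled out above.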