changeset 8561:2d4ccd94e298

[gaim-migrate @ 9305]
"In the irc tooltip, there's a line "Channel:". In 0.75, this seems to have been merged with the "_Channel:" line. In English, this works because underscores in the tooltip are removed before being displayed. However, in Chinese and Japanese, the translation of "_Channel:" looks like "Channel (_C):" and this translated text does not make any sense in the tooltip. The tooltip thus should not use the "_Channel:" string. Otherwise the tooltip output would look very strange in certain locales (at least in Chinese and Japanese)." --Ambrose C. LI

who continues:

"This second patch should be better. It correctly undoes the space character typically present before the left parenthesis, and added some checks so that it should not corrupt multibyte utf-8 characters. However, this has not been tested a lot. UTF8 handling is also not an area I am familiar with. I don't know whether the C library has existing functions to handle the utf8 things."

I'm assuming we have time to test this before 0.77.

committer: Tailor Script <tailor@pidgin.im>
author Luke Schierer <lschiere@pidgin.im>
date Fri, 02 Apr 2004 06:18:14 +0000
parents 832fd9b754d0
children e3c059c3d92d
files src/util.c
diffstat 1 files changed, 34 insertions(+), 2 deletions(-)
--- a/src/util.c	Fri Apr 02 06:06:45 2004 +0000
+++ b/src/util.c	Fri Apr 02 06:18:14 2004 +0000
@@ -2475,6 +2475,7 @@
 {
 	char *out;
 	char *a;
+	char *a0;
 	const char *b;
 
 	g_return_val_if_fail(in != NULL, NULL);
@@ -2483,16 +2484,47 @@
 	a = out;
 	b = in;
 
+	a0 = a; /* The last non-space char seen so far, or the first char */
+
 	while(*b) {
 		if(*b == '_') {
-			if(*(b+1) == '_') {
+			if(a > out && b > in && *(b-1) == '(' && *(b+1) && !(*(b+1) & 0x80) && *(b+2) == ')') {
+				/* Detected CJK style shortcut (Bug 875311) */
+				a = a0;	/* undo the left parenthesis */
+				b += 3;	/* and skip the whole mess */
+			} else if(*(b+1) == '_') {
 				*(a++) = '_';
 				b += 2;
+				a0 = a;
 			} else {
 				b++;
 			}
+		/* We don't want to corrupt the middle of UTF-8 characters */
+		} else if (!(*b & 0x80)) {	/* other 1-byte char */
+			if (*b != ' ')
+				a0 = a;
+			*(a++) = *(b++);
 		} else {
-			*(a++) = *(b++);
+			/* Multibyte utf8 char, don't look for _ inside these */
+			int n = 0;
+			int i;
+			if ((*b & 0xe0) == 0xc0) {
+				n = 2;
+			} else if ((*b & 0xf0) == 0xe0) {
+				n = 3;
+			} else if ((*b & 0xf8) == 0xf0) {
+				n = 4;
+			} else if ((*b & 0xfc) == 0xf8) {
+				n = 5;
+			} else if ((*b & 0xfe) == 0xfc) {
+				n = 6;
+			} else {		/* Illegal utf8 */
+				n = 1;
+			}
+			a0 = a; /* unless we want to delete CJK spaces too */
+			for (i = 0; i < n && *b; i += 1) {
+				*(a++) = *(b++);
+			}
 		}
 	}
 	*a = '\0';
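
The behaviour described in the commit message can be tried in isolation. The following is a minimal standalone sketch of the same idea, not the committed code: the names `utf8_seq_len` and `strip_mnemonic_sketch` are hypothetical, it uses `malloc` instead of GLib, and, as an assumption of this sketch, it trims spaces before the removed "(" directly instead of tracking `a0`. Note that in well-formed UTF-8 every continuation byte lies in the range 0x80 to 0xBF, so an ASCII `_` cannot occur inside a multibyte character; copying whole sequences mainly keeps the loop from splitting them.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: byte length of the UTF-8 sequence that starts with
 * lead byte c, using the same bit masks as the diff. Stray continuation
 * bytes and other invalid lead bytes count as 1. */
static int utf8_seq_len(unsigned char c)
{
	if (c < 0x80) return 1;            /* ASCII */
	if ((c & 0xe0) == 0xc0) return 2;
	if ((c & 0xf0) == 0xe0) return 3;
	if ((c & 0xf8) == 0xf0) return 4;
	if ((c & 0xfc) == 0xf8) return 5;  /* legacy form, not valid per RFC 3629 */
	if ((c & 0xfe) == 0xfc) return 6;  /* legacy form, not valid per RFC 3629 */
	return 1;                          /* illegal lead byte */
}

/* Hypothetical name. Returns a malloc'd copy of `in` with mnemonic
 * underscores removed; the caller frees the result. */
static char *strip_mnemonic_sketch(const char *in)
{
	char *out, *a;
	const char *b = in;

	if (in == NULL)
		return NULL;
	out = malloc(strlen(in) + 1);
	if (out == NULL)
		return NULL;
	a = out;

	while (*b) {
		if (*b == '_') {
			if (b > in && b[-1] == '(' && b[1] && !(b[1] & 0x80) && b[2] == ')') {
				/* CJK style "(_C)" suffix: drop the '(' already copied,
				 * plus (sketch assumption) any spaces before it. */
				a--;
				while (a > out && a[-1] == ' ')
					a--;
				b += 3;	/* skip "_C)" */
			} else if (b[1] == '_') {
				*a++ = '_';	/* "__" is an escaped literal underscore */
				b += 2;
			} else {
				b++;	/* plain mnemonic marker */
			}
		} else {
			/* Copy one whole character, stopping early at a NUL. */
			int n = utf8_seq_len((unsigned char)*b);
			while (n-- > 0 && *b)
				*a++ = *b++;
		}
	}
	*a = '\0';
	return out;
}
```

With this sketch, both the English "_Channel:" and a translated "Channel (_C):" reduce to "Channel:", which is the tooltip text the commit message is after.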