changeset 89064:2ac5108292b2

*** empty log message ***
author Kenichi Handa <handa@m17n.org>
date Tue, 03 Sep 2002 04:11:28 +0000
parents 2054397a36be
children 62aa2d4f3773
files lisp/ChangeLog src/ChangeLog src/charset.h
diffstat 3 files changed, 115 insertions(+), 1 deletions(-)
--- a/lisp/ChangeLog	Tue Sep 03 04:10:19 2002 +0000
+++ b/lisp/ChangeLog	Tue Sep 03 04:11:28 2002 +0000
@@ -1,3 +1,8 @@
+2002-09-03  Kenichi Handa  <handa@etl.go.jp>
+
+	* international/mule-conf.el: Don't define the charset iso-8859-1
+	here, just set up its properties.
+
 2002-08-21  Kenichi Handa  <handa@etl.go.jp>
 
 	* international/mule-conf.el (utf-8): Give :mime-charset property.
--- a/src/ChangeLog	Tue Sep 03 04:10:19 2002 +0000
+++ b/src/ChangeLog	Tue Sep 03 04:11:28 2002 +0000
@@ -1,3 +1,111 @@
+2002-09-03  Kenichi Handa  <handa@etl.go.jp>
+
+	The following changes (and some of 2002-08-20 changes of mine) are
+	for handling syntax, category, and case conversion for unibyte
+	characters by converting them to multibyte on the fly.  With these
+	changes, we don't have to set up syntax and case tables for unibyte
+	characters in each language environment.
+
+	* abbrev.c (Fexpand_abbrev): Convert a unibyte character to
+	multibyte if necessary.
+
+	* bytecode.c (Fbyte_code): Likewise.
+
+	* character.h (LEADING_CODE_LATIN_1_MIN)
+	(LEADING_CODE_LATIN_1_MAX): New macros.
+	(unibyte_to_multibyte_table): Extern it.
+	(unibyte_char_to_multibyte): New macro.
+	(MAKE_CHAR_MULTIBYTE): Use unibyte_to_multibyte_table.
+	(CHAR_LEADING_CODE): New macro.
+	(FETCH_STRING_CHAR_AS_MULTIBYTE_ADVANCE): New macro.
+
+	* character.c (unibyte_to_multibyte_table): New variable.
+	(unibyte_char_to_multibyte): Moved to character.h and defined as
+	a macro.
+	(multibyte_char_to_unibyte): If C is an eight-bit character,
+	convert it to the corresponding byte value.
+
+	* charset.c (Fset_unibyte_charset): If the dimension of CHARSET is
+	not 1, signal an error.  Update the elements of
+	unibyte_to_multibyte_table.
+	(init_charset_once): Initialize unibyte_to_multibyte_table.
+	(syms_of_charset): Define the charset `iso-8859-1'.
+
+	* casefiddle.c (casify_object): Fix previous change.
+
+	* cmds.c (internal_self_insert): In a multibyte buffer, insert C
+	as is without converting it to unibyte.  In a unibyte buffer,
+	convert C to multibyte before checking the syntax.
+
+	* lisp.h (unibyte_char_to_multibyte): Extern deleted.
+
+	* minibuf.c (Fminibuffer_complete_word): Use the macro
+	FETCH_STRING_CHAR_AS_MULTIBYTE_ADVANCE.
+
+	* regex.h (struct re_pattern_buffer): New member target_multibyte.
+
+	* regex.c (RE_TARGET_MULTIBYTE_P): New macro.
+	(GET_CHAR_BEFORE_2): Check target_multibyte, not multibyte.  If
+	that is zero, convert an eight-bit char to multibyte.
+	(MAKE_CHAR_MULTIBYTE, CHAR_LEADING_CODE): New dummy macros for
+	the non-emacs case.
+	(PATFETCH): Convert an eight-bit char to multibyte.
+	(HANDLE_UNIBYTE_RANGE): New macro.
+	(regex_compile): Set up the compiled pattern for multibyte chars
+	even if the given regex string is unibyte.  Use PATFETCH_RAW
+	instead of PATFETCH in many places.  To handle `charset'
+	specification of unibyte, call HANDLE_UNIBYTE_RANGE.  Use bitmap
+	only for ASCII chars.
+	(analyse_first) <exactn>: Simplified because the compiled pattern
+	is multibyte.
+	<charset_not>: Set up fastmap from bitmap only for ASCII chars.
+	<charset>: Use CHAR_LEADING_CODE to get leading codes.
+	<categoryspec>: If multibyte, set up fastmap only for ASCII chars
+	here.
+	(re_compile_fastmap) [emacs]: Call analyse_first with the arg
+	multibyte always 1.
+	(re_search_2): In emacs, set the local variable multibyte to 1,
+	otherwise to 0.  New local variable target_multibyte.  Check it
+	to decide the multibyteness of STR1 and STR2.  If
+	target_multibyte is zero, convert unibyte chars to multibyte
+	before translating and checking fastmap.
+	(TARGET_CHAR_AND_LENGTH): New macro.
+	(re_match_2_internal): In emacs, set the local variable multibyte
+	to 1, otherwise to 0.  New local variable target_multibyte.  Check
+	it to decide the multibyteness of STR1 and STR2.  Use
+	TARGET_CHAR_AND_LENGTH to fetch a character from D.
+	<charset, charset_not>: If multibyte is nonzero, check fastmap
+	only for ASCII chars.  Call bcmp_translate with
+	target_multibyte, not with multibyte.
+	<begline>: Declare the local variable C as `unsigned'.
+	(bcmp_translate): Change the last arg name to target_multibyte.
+
+	* search.c (compile_pattern_1): Don't adjust the multibyteness of
+	the regexp pattern and the matching target.  Set cp->buf.multibyte
+	to the multibyteness of the regexp pattern.  Set
+	cp->buf.target_multibyte to the multibyteness of the matching
+	target.
+	(wordify): Use FETCH_STRING_CHAR_AS_MULTIBYTE_ADVANCE instead of
+	FETCH_STRING_CHAR_ADVANCE.
+	(Freplace_match): Convert unibyte chars to multibyte.
+
+	* syntax.c (char_quoted): Use FETCH_CHAR_AS_MULTIBYTE to convert
+	unibyte chars to multibyte.
+	(back_comment): Likewise.
+	(scan_words): Likewise.
+	(skip_chars): The arg syntaxp is deleted, and the code for
+	handling syntaxes is moved to skip_syntaxes.  Callers changed.
+	Fix the case where the multibyteness of STRING and the current
+	buffer don't match.
+	(skip_syntaxes): New function.
+	(SYNTAX_WITH_MULTIBYTE_CHECK): Check C by ASCII_CHAR_P, not by
+	SINGLE_BYTE_CHAR_P.
+	(Fforward_comment): Use FETCH_CHAR_AS_MULTIBYTE to convert unibyte
+	chars to multibyte.
+	(scan_lists): Likewise.
+	(Fbackward_prefix_chars): Likewise.
+	(scan_sexps_forward): Likewise.
+
 2002-08-23  Kenichi Handa  <handa@etl.go.jp>
 
 	* xfaces.c (QCfontset): New variable.
--- a/src/charset.h	Tue Sep 03 04:10:19 2002 +0000
+++ b/src/charset.h	Tue Sep 03 04:11:28 2002 +0000
@@ -511,11 +511,12 @@
 extern Lisp_Object Qascii, Qunicode;
 extern int charset_ascii, charset_eight_bit;
 extern int charset_iso_8859_1;
-extern int charset_unibyte;
 extern int charset_jisx0201_roman;
 extern int charset_jisx0208_1978;
 extern int charset_jisx0208;
 
+extern int charset_unibyte;
+
 extern struct charset *char_charset P_ ((int, Lisp_Object, unsigned *));
 extern Lisp_Object charset_attributes P_ ((int));