view etc/BABYL @ 29005:b396df3a5181

(ONE_MORE_BYTE, TWO_MORE_BYTES): Set coding->resutl to CODING_FINISH_INSUFFICIENT_SRC if there's not enough source. (ONE_MORE_CHAR, EMIT_CHAR, EMIT_ONE_BYTE, EMIT_TWO_BYTE, EMIT_BYTES): New macros. (THREE_MORE_BYTES, DECODE_CHARACTER_ASCII, DECODE_CHARACTER_DIMENSION1, DECODE_CHARACTER_DIMENSION2): These macros deleted. (CHECK_CODE_RANGE_A0_FF): This macro deleted. (detect_coding_emacs_mule): Use UNIBYTE_STR_AS_MULTIBYTE_P to check the validity of multibyte sequence. (decode_coding_emacs_mule): New function. (encode_coding_emacs_mule): New macro. (detect_coding_iso2022): Use ONE_MORE_BYTE to fetch a byte from the source. (DECODE_ISO_CHARACTER): Just return a character code. (DECODE_COMPOSITION_START): Set coding->result instead of result. (decode_coding_iso2022, decode_coding_sjis_big5, decode_eol): Use EMIT_CHAR to produced decoded characters. Exit the loop only by macros ONE_MORE_BYTE or EMIT_CHAR. Don't handle the case of last block here. (ENCODE_ISO_CHARACTER): Don't translate character here. Produce only position codes for an invalid character. (encode_designation_at_bol): Return new destination pointer. 5th arg DSTP is changed to DST. (encode_coding_iso2022, decode_coding_sjis_big5): Get a character from the source by ONE_MORE_CHAR. Don't handle the case of last block here. (DECODE_SJIS_BIG5_CHARACTER, ENCODE_SJIS_BIG5_CHARACTER): These macros deleted. (detect_coding_sjis, detect_coding_big5, detect_coding_utf_8, detect_coding_utf_16, detect_coding_ccl): Use ONE_MORE_BYTE and TWO_MORE_BYTES to fetch a byte from the source. (encode_eol): Pay attention to coding->src_multibyte. (detect_coding, detect_eol): Preserve members src_multibyte and dst_multibyte. (DECODING_BUFFER_MAG): Return 2 even for coding_type_raw_text. (encoding_buffer_size): Set magnification to 3 for all coding systems that require encoding. (ccl_coding_driver): For decoding, be sure that the result is valid multibyte sequence. (decode_coding): Initialize coding->errors and coding->result. For emacs-mule, call decode_coding_emacs_mule. For no-conversion and raw-text, always call decode_eol. Handle the case of last block here. If not coding->dst_multibyte, convert the resulting sequence to unibyte. (encode_coding): Initialize coding->errors and coding->result. For emacs-mule, call encode_coding_emacs_mule. For no-conversion and raw-text, always call encode_eol. Handle the case of last block here. (shrink_decoding_region, shrink_encoding_region): Detect cases that we can't skip data more rigidly. (code_convert_region): Setup src_multibyte and dst_multibyte members of coding. For decoding, if the buffer is multibyte, convert the source sequence to unibyte in advance. For encoding, if the buffer is multibyte, convert the resulting sequence to multibyte afterward. (run_pre_post_conversion_on_str): New function. (code_convert_string): Deleted and divided into the following two. (decode_coding_string, encode_coding_string): New functions. (code_convert_string1, code_convert_string_norecord): Call one of above. (Fdecode_sjis_char, Fdecode_big5_char): Use MAKE_CHAR instead of MAKE_NON_ASCII_CHAR. (Fset_terminal_coding_system_internal, Fset_safe_terminal_coding_system_internal): Setup src_multibyte and dst_multibyte members. (init_coding_once): Initialize iso_code_class with new enum ISO_control_0 and ISO_control_1.
author Kenichi Handa <handa@m17n.org>
date Fri, 19 May 2000 23:54:56 +0000
parents e96ffe544684
children 89895e7b4ac6
line wrap: on
line source

Format of Version 5 Babyl Files:

Warning:

    This was written Tuesday, 12 April 1983 (by Eugene Ciccarelli),
based on looking at a particular Babyl file and recalling various
issues.  Therefore it is not guaranteed to be complete, but it is a
start, and I will try to point the reader to various Babyl functions
that will serve to clarify certain format questions.

    Also note that this file will not contain control-characters,
but instead have two-character sequences starting with Uparrow.
Unless otherwise stated, an Uparrow <character> is to be read as
Control-<character>, e.g. ^L is a Control-L.

Versions:

    First, note that each Babyl file contains in its BABYL OPTIONS
section the version for the Babyl file format.  In principle, the
format can be changed in any way as long as we increment the format
version number; then programs can support both old and new formats.

    In practice, version 5 is the only format version used, and the
previous versions have been obsolete for so long that Emacs does not
support them.


Overall Babyl File Structure:

    A Babyl file consists of a BABYL OPTIONS section followed by
0 or more message sections.  The BABYL OPTIONS section starts
with the line "BABYL OPTIONS:".  Message sections start with
Control-Underscore Control-L Newline.  Each section ends
with a Control-Underscore.  (That is also the first character
of the starter for the next section, if any.)  Thus, a three
message Babyl file looks like:

BABYL OPTIONS:
...the stuff within the Babyl Options section...
^_^L
...the stuff within the 1st message section...
^_^L
...the stuff within the 2nd message section...
^_^L
...the stuff within the last message section...
^_

    Babyl is tolerant about some whitespace at the end of the
file -- the file may end with the final ^_ or it may have some
whitespace, e.g. a newline, after it.


The BABYL OPTIONS Section:

    Each Babyl option is specified on one line (thus restricting
string values these options can currently have).  Values are
either numbers or strings.  The format is name, colon, and the
value, with whitespace after the colon ignored, e.g.:

Mail: ~/special-inbox

    Unrecognized options are ignored.

    Here are those options and the kind of values currently expected:

    MAIL		Filename, the input mail file for this
			Babyl file.  You may also use several file names
			separated by commas.
    Version		Number.  This should always be 5.
    Labels		String, list of labels, separated by commas.


Message Sections:

    A message section contains one message and information
associated with it.  The first line is the "status line", which
contains a bit (0 or 1 character) saying whether the message has
been reformed yet, and a list of the labels attached to this
message.  Certain labels, called basic labels, are built into
Babyl in a fundamental way, and are separated in the status line
for convenience of operation.  For example, consider the status
line:

1, answered,, zval, bug,

    The 1 means this message has been reformed.  This message is
labeled "answered", "zval", and "bug".  The first, "answered", is
a basic label, and the other two are user labels.  The basic
labels come before the double-comma in the line.  Each label is
preceded by ", " and followed by ",".  (The last basic label is
in fact followed by ",,".)  If this message had no labels at all,
it would look like:

1,,

    Or, if it had two basic labels, "answered" and "deleted", it
would look like:

1, answered, deleted,, zval, bug,

    The & Label Babyl Message knows which are the basic labels.
Currently they are:  deleted, unseen, recent, and answered.

    After the status line comes the original header if any.
Following that is the EOOH line, which contains exactly the
characters "*** EOOH ***" (which stands for "end of original
header").  Note that the original header, if a network format
header, includes the trailing newline.  And finally, following the
EOOH line is the visible message, header and text.  For example,
here is a complete message section, starting with the message
starter, and ending with the terminator:

^_^L
1,, wordab, eccmacs,
Date: 11 May 1982 21:40-EDT
From: Eugene C. Ciccarelli <ECC at MIT-AI>
Subject: notes
To: ECC at MIT-AI

*** EOOH ***
Date: Tuesday, 11 May 1982  21:40-EDT
From: Eugene C. Ciccarelli <ECC>
To:   ECC
Re:   notes

Remember to pickup check at cashier's office, and deposit it
soon.  Pay rent.
^_

;;; Babyl File BNF:

;;; Overall Babyl file structure:


Babyl-File	::= Babyl-Options-Section  (Message-Section)*


;;; Babyl Options section:


Babyl-Options-Section
		::= "BABYL OPTIONS:" newline (Babyl-Option)* Terminator

Babyl-Option	::= Option-Name ":" Horiz-Whitespace BOptValue newline

BOptValue	::= Number | 1-Line-String



;;; Message section:


Message-Section	::= Message-Starter  Status-Line  Orig-Header
		    EOOH-Line  Message  Terminator

Message-Starter	::= "^L" newline

Status-Line	::= Bit-Char  ","  (Basic-Label)* "," (User-Label)* newline

Basic-Label	::= Space  BLabel-Name  ","

User-Label	::= Space  ULabel-Name  ","

EOOH-Line	::= "*** EOOH ***" newline

Message		::= Visible-Header  Message-Text


;;; Utilities:

Terminator	::= "^_"

Horiz-Whitespace
		::= (Space | Tab)*

Bit-Char	::= "0" | "1"