comparison lispref/files.texi @ 80890:6b44d05a5f0b
* elisp.texi (Top): Remove "Saving Properties" from detailed menu.
* files.texi (Format Conversion): Expand intro; add menu.
(Format Conversion Overview, Format Conversion Round-Trip)
(Format Conversion Piecemeal): New nodes/subsections.
* hooks.texi: Xref "Format Conversion", not "Saving Properties".
* text.texi (Text Properties): Remove "Saving Properties" from menu.
(Saving Properties): Delete node/subsection.
author | Thien-Thi Nguyen <ttn@gnuvola.org> |
---|---|
date | Thu, 10 May 2007 08:43:12 +0000 |
parents | 916f8aa2138d |
children | 776cb0a1bb24 |
80889:8938fa90afdb | 80890:6b44d05a5f0b |
---|---|
372 @var{filename}. If the buffer is not visiting a file, it uses the | 372 @var{filename}. If the buffer is not visiting a file, it uses the |
373 buffer name instead. | 373 buffer name instead. |
374 @end deffn | 374 @end deffn |
375 | 375 |
376 Saving a buffer runs several hooks. It also performs format | 376 Saving a buffer runs several hooks. It also performs format |
377 conversion (@pxref{Format Conversion}), and may save text properties in | 377 conversion (@pxref{Format Conversion}). |
378 ``annotations'' (@pxref{Saving Properties}). | |
379 | 378 |
380 @defvar write-file-functions | 379 @defvar write-file-functions |
381 The value of this variable is a list of functions to be called before | 380 The value of this variable is a list of functions to be called before |
382 writing out a buffer to its visited file. If one of them returns | 381 writing out a buffer to its visited file. If one of them returns |
383 non-@code{nil}, the file is considered already written and the rest of | 382 non-@code{nil}, the file is considered already written and the rest of |
494 and the length of the data inserted. An error is signaled if | 493 and the length of the data inserted. An error is signaled if |
495 @var{filename} is not the name of a file that can be read. | 494 @var{filename} is not the name of a file that can be read. |
496 | 495 |
497 The function @code{insert-file-contents} checks the file contents | 496 The function @code{insert-file-contents} checks the file contents |
498 against the defined file formats, and converts the file contents if | 497 against the defined file formats, and converts the file contents if |
499 appropriate. @xref{Format Conversion}. It also calls the functions in | 498 appropriate and also calls the functions in |
500 the list @code{after-insert-file-functions}; see @ref{Saving | 499 the list @code{after-insert-file-functions}. @xref{Format Conversion}. |
501 Properties}. Normally, one of the functions in the | 500 Normally, one of the functions in the |
502 @code{after-insert-file-functions} list determines the coding system | 501 @code{after-insert-file-functions} list determines the coding system |
503 (@pxref{Coding Systems}) used for decoding the file's contents, | 502 (@pxref{Coding Systems}) used for decoding the file's contents, |
504 including end-of-line conversion. | 503 including end-of-line conversion. |
505 | 504 |
506 If @var{visit} is non-@code{nil}, this function additionally marks the | 505 If @var{visit} is non-@code{nil}, this function additionally marks the |
618 The optional argument @var{lockname}, if non-@code{nil}, specifies the | 617 The optional argument @var{lockname}, if non-@code{nil}, specifies the |
619 file name to use for purposes of locking and unlocking, overriding | 618 file name to use for purposes of locking and unlocking, overriding |
620 @var{filename} and @var{visit} for that purpose. | 619 @var{filename} and @var{visit} for that purpose. |
621 | 620 |
622 The function @code{write-region} converts the data which it writes to | 621 The function @code{write-region} converts the data which it writes to |
623 the appropriate file formats specified by @code{buffer-file-format}. | 622 the appropriate file formats specified by @code{buffer-file-format} |
624 @xref{Format Conversion}. It also calls the functions in the list | 623 and also calls the functions in the list |
625 @code{write-region-annotate-functions}; see @ref{Saving Properties}. | 624 @code{write-region-annotate-functions}. |
625 @xref{Format Conversion}. | |
626 | 626 |
627 Normally, @code{write-region} displays the message @samp{Wrote | 627 Normally, @code{write-region} displays the message @samp{Wrote |
628 @var{filename}} in the echo area. If @var{visit} is neither @code{t} | 628 @var{filename}} in the echo area. If @var{visit} is neither @code{t} |
629 nor @code{nil} nor a string, then this message is inhibited. This | 629 nor @code{nil} nor a string, then this message is inhibited. This |
630 feature is useful for programs that use files for internal purposes, | 630 feature is useful for programs that use files for internal purposes, |
2800 @section File Format Conversion | 2800 @section File Format Conversion |
2801 | 2801 |
2802 @cindex file format conversion | 2802 @cindex file format conversion |
2803 @cindex encoding file formats | 2803 @cindex encoding file formats |
2804 @cindex decoding file formats | 2804 @cindex decoding file formats |
2805 The variable @code{format-alist} defines a list of @dfn{file formats}, | 2805 @cindex text properties in files |
2806 which describe textual representations used in files for the data (text, | 2806 @cindex saving text properties |
2807 text-properties, and possibly other information) in an Emacs buffer. | 2807 Emacs performs several steps to convert the data in a buffer (text, |
2808 Emacs performs format conversion if appropriate when reading and writing | 2808 text properties, and possibly other information) to and from a |
2809 files. | 2809 representation suitable for storing into a file. This section describes |
2810 the fundamental functions that perform this @dfn{format conversion}, | |
2811 namely @code{insert-file-contents} for reading a file into a buffer, | |
2812 and @code{write-region} for writing a buffer into a file. | |
2813 | |
2814 @menu | |
2815 * Overview: Format Conversion Overview. @code{insert-file-contents} and @code{write-region}. | |
2816 * Round-Trip: Format Conversion Round-Trip. Using @code{format-alist}. | |
2817 * Piecemeal: Format Conversion Piecemeal. Specifying non-paired conversion. | |
2818 @end menu | |
2819 | |
2820 @node Format Conversion Overview | |
2821 @subsection Overview | |
2822 @noindent | |
2823 The function @code{insert-file-contents}: | |
2824 | |
2825 @itemize | |
2826 @item initially, inserts bytes from the file into the buffer; | |
2827 @item decodes bytes to characters as appropriate; | |
2828 @item processes formats as defined by entries in @code{format-alist}; and | |
2829 @item calls functions in @code{after-insert-file-functions}. | |
2830 @end itemize | |
2831 | |
2832 @noindent | |
2833 The function @code{write-region}: | |
2834 | |
2835 @itemize | |
2836 @item initially, calls functions in @code{write-region-annotate-functions}; | |
2837 @item processes formats as defined by entries in @code{format-alist}; | |
2838 @item encodes characters to bytes as appropriate; and | |
2839 @item modifies the file with the bytes. | |
2840 @end itemize | |
2841 | |
2842 This shows the symmetry of the lowest-level operations; reading and | |
2843 writing handle things in opposite order. The rest of this section | |
2844 describes the two facilities surrounding the three variables named | |
2845 above, as well as some related functions. @xref{Coding Systems}, for | |
2846 details on character encoding and decoding. | |
2847 | |
2848 @node Format Conversion Round-Trip | |
2849 @subsection Round-Trip Specification | |
2850 | |
2851 The more general of the two facilities is controlled by the variable | |
2852 @code{format-alist}, a list of @dfn{file format} specifications, which | |
2853 describe textual representations used in files for the data in an Emacs | |
2854 buffer. The descriptions for reading and writing are paired, which is | |
2855 why we call this ``round-trip'' specification | |
2856 (@pxref{Format Conversion Piecemeal}, for non-paired specification). | |
2810 | 2857 |
2811 @defvar format-alist | 2858 @defvar format-alist |
2812 This list contains one format definition for each defined file format. | 2859 This list contains one format definition for each defined file format. |
2860 Each format definition is a list of this form: | |
2861 | |
2862 @example | |
2863 (@var{name} @var{doc-string} @var{regexp} @var{from-fn} @var{to-fn} @var{modify} @var{mode-fn}) | |
2864 @end example | |
2813 @end defvar | 2865 @end defvar |
2814 | 2866 |
2815 @cindex format definition | 2867 @cindex format definition |
2816 Each format definition is a list of this form: | 2868 @noindent |
2817 | |
2818 @example | |
2819 (@var{name} @var{doc-string} @var{regexp} @var{from-fn} @var{to-fn} @var{modify} @var{mode-fn}) | |
2820 @end example | |
2821 | |
2822 Here is what the elements in a format definition mean: | 2869 Here is what the elements in a format definition mean: |
2823 | 2870 |
2824 @table @var | 2871 @table @var |
2825 @item name | 2872 @item name |
2826 The name of this format. | 2873 The name of this format. |
2954 is @code{t}, the default, auto-saving uses the same format as a | 3001 is @code{t}, the default, auto-saving uses the same format as a |
2955 regular save in the same buffer. This variable is always buffer-local | 3002 regular save in the same buffer. This variable is always buffer-local |
2956 in all buffers. | 3003 in all buffers. |
2957 @end defvar | 3004 @end defvar |
2958 | 3005 |
3006 @node Format Conversion Piecemeal | |
3007 @subsection Piecemeal Specification | |
3008 | |
3009 In contrast to the round-trip specification described in the previous | |
3010 subsection (@pxref{Format Conversion Round-Trip}), you can use the variables | |
3011 @code{after-insert-file-functions} and @code{write-region-annotate-functions} | |
3012 to separately control the respective reading and writing conversions. | |
3013 | |
3014 Conversion starts with one representation and produces another | |
3015 representation. When there is only one conversion to do, there is no | |
3016 conflict about what to start with. However, when there are multiple | |
3017 conversions involved, conflict may arise when two conversions need to | |
3018 start with the same data. | |
3019 | |
3020 This situation is best understood in the context of converting text | |
3021 properties during @code{write-region}. For example, the character at | |
3022 position 42 in a buffer is @samp{X} with a text property @code{foo}. If | |
3023 the conversion for @code{foo} is done by inserting into the buffer, say, | |
3024 @samp{FOO:}, then that changes the character at position 42 from | |
3025 @samp{X} to @samp{F}. The next conversion will start with the wrong | |
3026 data straight away. | |
3027 | |
3028 To avoid conflict, cooperative conversions do not modify the buffer, | |
3029 but instead specify @dfn{annotations}, a list of elements of the form | |
3030 @code{(@var{position} . @var{string})}, sorted in order of increasing | |
3031 @var{position}. | |
3032 | |
3033 If there is more than one conversion, @code{write-region} merges their | |
3034 annotations destructively into one sorted list. Later, when the text | |
3035 from the buffer is actually written to the file, it intermixes the | |
3036 specified annotations at the corresponding positions. All this takes | |
3037 place without modifying the buffer. | |
3038 | |
3039 @c ??? What about ``overriding'' conversions like those allowed | |
3040 @c ??? for `write-region-annotate-functions', below? --ttn | |
3041 | |
3042 In contrast, when reading, the annotations intermixed with the text | |
3043 are handled immediately. @code{insert-file-contents} sets point to the | |
3044 beginning of some text to be converted, then calls the conversion | |
3045 functions with the length of that text. These functions should always | |
3046 return with point at the beginning of the inserted text. This approach | |
3047 makes sense for reading because annotations removed by the first | |
3048 converter can't be mistakenly processed by a later converter. | |
3049 | |
3050 Each conversion function should scan for the annotations it | |
3051 recognizes, remove them, modify the buffer text (to set a text | |
3052 property, for example), and return the updated length of the text, as it | |
3053 stands after those changes. The value returned by one function becomes | |
3054 the argument to the next function. | |
3055 | |
3056 @defvar write-region-annotate-functions | |
3057 A list of functions for @code{write-region} to call. Each function in | |
3058 the list is called with two arguments: the start and end of the region | |
3059 to be written. These functions should not alter the contents of the | |
3060 buffer. Instead, they should return annotations. | |
3061 | |
3062 @c ??? Following adapted from comment in `build_annotations' (fileio.c). | |
3063 @c ??? Perhaps this is intended for internal use only? | |
3064 @c ??? Someone who understands this, please reword it. --ttn | |
3065 As a special case, if a function returns with a different buffer | |
3066 current, Emacs takes it to mean the current buffer contains altered text | |
3067 to be output, and discards all previous annotations because they should | |
3068 have been dealt with by this function. | |
3069 @end defvar | |
3070 | |
3071 @defvar after-insert-file-functions | |
3072 Each function in this list is called by @code{insert-file-contents} | |
3073 with one argument, the number of characters inserted, and should | |
3074 return the new character count, leaving point the same. | |
3075 @c ??? The docstring mentions a handler from `file-name-handler-alist' | |
3076 @c "intercepting" `insert-file-contents'. Hmmm. --ttn | |
3077 @end defvar | |
3078 | |
3079 We invite users to write Lisp programs to store and retrieve text | |
3080 properties in files, using these hooks, and thus to experiment with | |
3081 various data formats and find good ones. Eventually we hope users | |
3082 will produce good, general extensions we can install in Emacs. | |
3083 | |
3084 We suggest not trying to handle arbitrary Lisp objects as text property | |
3085 names or values---because a program that general is probably difficult | |
3086 to write, and slow. Instead, choose a set of possible data types that | |
3087 are reasonably flexible, and not too hard to encode. | |
3088 | |
2959 @ignore | 3089 @ignore |
2960 arch-tag: 141f74ce-6ae3-40dc-a6c4-ef83fc4ec35c | 3090 arch-tag: 141f74ce-6ae3-40dc-a6c4-ef83fc4ec35c |
2961 @end ignore | 3091 @end ignore |
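
The piecemeal mechanism described in the new "Format Conversion Piecemeal" node can be illustrated with a minimal sketch. It is not part of this changeset. The function names `my-foo-annotate` and `my-foo-decode` are hypothetical, and the `"FOO:"` marker and the `foo` property are borrowed from the node's own example. The encoding is a toy: it records only where a run of `foo` text starts, and on reading it restores the property on a single character.

```elisp
;; Hypothetical helpers for illustration only; none of these names
;; exist in Emacs.

(defun my-foo-annotate (start end)
  "Return annotations for text with property `foo' between START and END.
Intended for `write-region-annotate-functions'.  The buffer is not
modified; the result is a list of (POSITION . STRING) in increasing order."
  (let ((pos (or start (point-min)))    ; tolerate a nil START
        (end (or end (point-max)))
        (annotations '()))
    (while (< pos end)
      (when (get-text-property pos 'foo)
        (push (cons pos "FOO:") annotations))
      ;; Jump to the next place where `foo' changes, or to END.
      (setq pos (next-single-property-change pos 'foo nil end)))
    (nreverse annotations)))

(defun my-foo-decode (length)
  "Strip \"FOO:\" markers from the LENGTH characters after point.
Intended for `after-insert-file-functions'.  Sets property `foo' on the
character after each marker and returns the updated length, leaving
point at the beginning of the inserted text."
  (let ((start (point))
        (end (copy-marker (+ (point) length))))
    (save-excursion
      (while (search-forward "FOO:" end t)
        (replace-match "" t t)          ; remove the annotation
        (when (< (point) end)
          (put-text-property (point) (1+ (point)) 'foo t))))
    (prog1 (- end start)                ; END moved as text was deleted
      (set-marker end nil))))

;; Example: the annotation is computed without touching the buffer.
(with-temp-buffer
  (insert "ab" (propertize "X" 'foo t) "cd")
  (my-foo-annotate (point-min) (point-max)))
;; => ((3 . "FOO:"))
```

To try the pair end to end, one could let-bind `write-region-annotate-functions` to `(my-foo-annotate)` around a call to `write-region`, and `after-insert-file-functions` to `(my-foo-decode)` around `insert-file-contents`. Under the behaviour described above, the file written for the example buffer should contain `abFOO:Xcd`, while the buffer itself stays `abXcd`.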