Mercurial > emacs
changeset 73591:b214bd8be620
info/eintr-2: Updated Info file to Third Edition for
`Introduction to Programming in Emacs Lisp'
author | Robert J. Chassell <bob@rattlesnake.com> |
---|---|
date | Tue, 31 Oct 2006 17:00:32 +0000 |
parents | dcc218a536a8 |
children | 01443e9bbbf5 |
files | info/eintr-2 |
diffstat | 1 files changed, 7502 insertions(+), 0 deletions(-) [+] |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/info/eintr-2 Tue Oct 31 17:00:32 2006 +0000 @@ -0,0 +1,7502 @@ +This is ../info/eintr, produced by makeinfo version 4.8 from +emacs-lisp-intro.texi. + +INFO-DIR-SECTION Emacs +START-INFO-DIR-ENTRY +* Emacs Lisp Intro: (eintr). + A simple introduction to Emacs Lisp programming. +END-INFO-DIR-ENTRY + +This is an `Introduction to Programming in Emacs Lisp', for people who +are not programmers. + +Edition 3.00, 2006 Oct 31 + +Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1997, 2001, 2002, +2003, 2004, 2005, 2006 Free Software Foundation, Inc. + +Published by the: + + GNU Press, Website: http://www.gnupress.org + a division of the General: press@gnu.org + Free Software Foundation, Inc. Orders: sales@gnu.org + 51 Franklin Street, Fifth Floor Tel: +1 (617) 542-5942 + Boston, MA 02110-1301 USA Fax: +1 (617) 542-2652 + + +ISBN 1-882114-43-4 + +Permission is granted to copy, distribute and/or modify this document +under the terms of the GNU Free Documentation License, Version 1.2 or +any later version published by the Free Software Foundation; there +being no Invariant Section, with the Front-Cover Texts being "A GNU +Manual", and with the Back-Cover Texts as in (a) below. A copy of the +license is included in the section entitled "GNU Free Documentation +License". + +(a) The FSF's Back-Cover Text is: "You have freedom to copy and modify +this GNU Manual, like GNU software. Copies published by the Free +Software Foundation raise funds for GNU development." + + +File: eintr, Node: defvar and asterisk, Prev: See variable current value, Up: defvar + +8.5.1 `defvar' and an asterisk +------------------------------ + +In the past, Emacs used the `defvar' special form both for internal +variables that you would not expect a user to change and for variables +that you do expect a user to change. Although you can still use +`defvar' for user customizable variables, please use `defcustom' +instead, since that special form provides a path into the Customization +commands. (*Note Specifying Variables using `defcustom': defcustom.) + +When you specified a variable using the `defvar' special form, you +could distinguish a readily settable variable from others by typing an +asterisk, `*', in the first column of its documentation string. For +example: + + (defvar shell-command-default-error-buffer nil + "*Buffer name for `shell-command' ... error output. + ... ") + +You could (and still can) use the `set-variable' command to change the +value of `shell-command-default-error-buffer' temporarily. However, +options set using `set-variable' are set only for the duration of your +editing session. The new values are not saved between sessions. Each +time Emacs starts, it reads the original value, unless you change the +value within your `.emacs' file, either by setting it manually or by +using `customize'. *Note Your `.emacs' File: Emacs Initialization. + +For me, the major use of the `set-variable' command is to suggest +variables that I might want to set in my `.emacs' file. There are now +more than 700 such variables -- far too many to remember readily. +Fortunately, you can press <TAB> after calling the `M-x set-variable' +command to see the list of variables. (*Note Examining and Setting +Variables: (emacs)Examining.) + + +File: eintr, Node: cons & search-fwd Review, Next: search Exercises, Prev: defvar, Up: Cutting & Storing Text + +8.6 Review +========== + +Here is a brief summary of some recently introduced functions. + +`car' +`cdr' + `car' returns the first element of a list; `cdr' returns the + second and subsequent elements of a list. + + For example: + + (car '(1 2 3 4 5 6 7)) + => 1 + (cdr '(1 2 3 4 5 6 7)) + => (2 3 4 5 6 7) + +`cons' + `cons' constructs a list by prepending its first argument to its + second argument. + + For example: + + (cons 1 '(2 3 4)) + => (1 2 3 4) + +`nthcdr' + Return the result of taking CDR `n' times on a list. The `rest of + the rest', as it were. + + For example: + + (nthcdr 3 '(1 2 3 4 5 6 7)) + => (4 5 6 7) + +`setcar' +`setcdr' + `setcar' changes the first element of a list; `setcdr' changes the + second and subsequent elements of a list. + + For example: + + (setq triple '(1 2 3)) + + (setcar triple '37) + + triple + => (37 2 3) + + (setcdr triple '("foo" "bar")) + + triple + => (37 "foo" "bar") + +`progn' + Evaluate each argument in sequence and then return the value of the + last. + + For example: + + (progn 1 2 3 4) + => 4 + +`save-restriction' + Record whatever narrowing is in effect in the current buffer, if + any, and restore that narrowing after evaluating the arguments. + +`search-forward' + Search for a string, and if the string is found, move point. + + Takes four arguments: + + 1. The string to search for. + + 2. Optionally, the limit of the search. + + 3. Optionally, what to do if the search fails, return `nil' or an + error message. + + 4. Optionally, how many times to repeat the search; if negative, + the search goes backwards. + +`kill-region' +`delete-and-extract-region' +`copy-region-as-kill' + `kill-region' cuts the text between point and mark from the buffer + and stores that text in the kill ring, so you can get it back by + yanking. + + `copy-region-as-kill' copies the text between point and mark into + the kill ring, from which you can get it by yanking. The function + does not cut or remove the text from the buffer. + +`delete-and-extract-region' removes the text between point and mark +from the buffer and throws it away. You cannot get it back. (This is +not an interactive command.) + + +File: eintr, Node: search Exercises, Prev: cons & search-fwd Review, Up: Cutting & Storing Text + +8.7 Searching Exercises +======================= + + * Write an interactive function that searches for a string. If the + search finds the string, leave point after it and display a message + that says "Found!". (Do not use `search-forward' for the name of + this function; if you do, you will overwrite the existing version + of `search-forward' that comes with Emacs. Use a name such as + `test-search' instead.) + + * Write a function that prints the third element of the kill ring in + the echo area, if any; if the kill ring does not contain a third + element, print an appropriate message. + + +File: eintr, Node: List Implementation, Next: Yanking, Prev: Cutting & Storing Text, Up: Top + +9 How Lists are Implemented +*************************** + +In Lisp, atoms are recorded in a straightforward fashion; if the +implementation is not straightforward in practice, it is, nonetheless, +straightforward in theory. The atom `rose', for example, is recorded +as the four contiguous letters `r', `o', `s', `e'. A list, on the +other hand, is kept differently. The mechanism is equally simple, but +it takes a moment to get used to the idea. A list is kept using a +series of pairs of pointers. In the series, the first pointer in each +pair points to an atom or to another list, and the second pointer in +each pair points to the next pair, or to the symbol `nil', which marks +the end of the list. + +A pointer itself is quite simply the electronic address of what is +pointed to. Hence, a list is kept as a series of electronic addresses. + +* Menu: + +* Lists diagrammed:: +* Symbols as Chest:: +* List Exercise:: + + +File: eintr, Node: Lists diagrammed, Next: Symbols as Chest, Prev: List Implementation, Up: List Implementation + +Lists diagrammed +================ + +For example, the list `(rose violet buttercup)' has three elements, +`rose', `violet', and `buttercup'. In the computer, the electronic +address of `rose' is recorded in a segment of computer memory along +with the address that gives the electronic address of where the atom +`violet' is located; and that address (the one that tells where +`violet' is located) is kept along with an address that tells where the +address for the atom `buttercup' is located. + +This sounds more complicated than it is and is easier seen in a diagram: + + ___ ___ ___ ___ ___ ___ + |___|___|--> |___|___|--> |___|___|--> nil + | | | + | | | + --> rose --> violet --> buttercup + + + +In the diagram, each box represents a word of computer memory that +holds a Lisp object, usually in the form of a memory address. The +boxes, i.e. the addresses, are in pairs. Each arrow points to what the +address is the address of, either an atom or another pair of addresses. +The first box is the electronic address of `rose' and the arrow points +to `rose'; the second box is the address of the next pair of boxes, the +first part of which is the address of `violet' and the second part of +which is the address of the next pair. The very last box points to the +symbol `nil', which marks the end of the list. + +When a variable is set to a list with a function such as `setq', it +stores the address of the first box in the variable. Thus, evaluation +of the expression + + (setq bouquet '(rose violet buttercup)) + +creates a situation like this: + + bouquet + | + | ___ ___ ___ ___ ___ ___ + --> |___|___|--> |___|___|--> |___|___|--> nil + | | | + | | | + --> rose --> violet --> buttercup + + + +In this example, the symbol `bouquet' holds the address of the first +pair of boxes. + +This same list can be illustrated in a different sort of box notation +like this: + + bouquet + | + | -------------- --------------- ---------------- + | | car | cdr | | car | cdr | | car | cdr | + -->| rose | o------->| violet | o------->| butter- | nil | + | | | | | | | cup | | + -------------- --------------- ---------------- + + + +(Symbols consist of more than pairs of addresses, but the structure of +a symbol is made up of addresses. Indeed, the symbol `bouquet' +consists of a group of address-boxes, one of which is the address of +the printed word `bouquet', a second of which is the address of a +function definition attached to the symbol, if any, a third of which is +the address of the first pair of address-boxes for the list `(rose +violet buttercup)', and so on. Here we are showing that the symbol's +third address-box points to the first pair of address-boxes for the +list.) + +If a symbol is set to the CDR of a list, the list itself is not +changed; the symbol simply has an address further down the list. (In +the jargon, CAR and CDR are `non-destructive'.) Thus, evaluation of +the following expression + + (setq flowers (cdr bouquet)) + +produces this: + + + bouquet flowers + | | + | ___ ___ | ___ ___ ___ ___ + --> | | | --> | | | | | | + |___|___|----> |___|___|--> |___|___|--> nil + | | | + | | | + --> rose --> violet --> buttercup + + + + +The value of `flowers' is `(violet buttercup)', which is to say, the +symbol `flowers' holds the address of the pair of address-boxes, the +first of which holds the address of `violet', and the second of which +holds the address of `buttercup'. + +A pair of address-boxes is called a "cons cell" or "dotted pair". +*Note Cons Cell and List Types: (elisp)Cons Cell Type, and *Note Dotted +Pair Notation: (elisp)Dotted Pair Notation, for more information about +cons cells and dotted pairs. + +The function `cons' adds a new pair of addresses to the front of a +series of addresses like that shown above. For example, evaluating the +expression + + (setq bouquet (cons 'lily bouquet)) + +produces: + + + bouquet flowers + | | + | ___ ___ ___ ___ | ___ ___ ___ ___ + --> | | | | | | --> | | | | | | + |___|___|----> |___|___|----> |___|___|---->|___|___|--> nil + | | | | + | | | | + --> lily --> rose --> violet --> buttercup + + + + +However, this does not change the value of the symbol `flowers', as you +can see by evaluating the following, + + (eq (cdr (cdr bouquet)) flowers) + +which returns `t' for true. + +Until it is reset, `flowers' still has the value `(violet buttercup)'; +that is, it has the address of the cons cell whose first address is of +`violet'. Also, this does not alter any of the pre-existing cons +cells; they are all still there. + +Thus, in Lisp, to get the CDR of a list, you just get the address of +the next cons cell in the series; to get the CAR of a list, you get the +address of the first element of the list; to `cons' a new element on a +list, you add a new cons cell to the front of the list. That is all +there is to it! The underlying structure of Lisp is brilliantly simple! + +And what does the last address in a series of cons cells refer to? It +is the address of the empty list, of `nil'. + +In summary, when a Lisp variable is set to a value, it is provided with +the address of the list to which the variable refers. + + +File: eintr, Node: Symbols as Chest, Next: List Exercise, Prev: Lists diagrammed, Up: List Implementation + +9.1 Symbols as a Chest of Drawers +================================= + +In an earlier section, I suggested that you might imagine a symbol as +being a chest of drawers. The function definition is put in one +drawer, the value in another, and so on. What is put in the drawer +holding the value can be changed without affecting the contents of the +drawer holding the function definition, and vice-verse. + +Actually, what is put in each drawer is the address of the value or +function definition. It is as if you found an old chest in the attic, +and in one of its drawers you found a map giving you directions to +where the buried treasure lies. + +(In addition to its name, symbol definition, and variable value, a +symbol has a `drawer' for a "property list" which can be used to record +other information. Property lists are not discussed here; see *Note +Property Lists: (elisp)Property Lists.) + +Here is a fanciful representation: + + + Chest of Drawers Contents of Drawers + + __ o0O0o __ + / \ + --------------------- + | directions to | [map to] + | symbol name | bouquet + | | + +---------------------+ + | directions to | + | symbol definition | [none] + | | + +---------------------+ + | directions to | [map to] + | variable value | (rose violet buttercup) + | | + +---------------------+ + | directions to | + | property list | [not described here] + | | + +---------------------+ + |/ \| + + + + + +File: eintr, Node: List Exercise, Prev: Symbols as Chest, Up: List Implementation + +9.2 Exercise +============ + +Set `flowers' to `violet' and `buttercup'. Cons two more flowers on to +this list and set this new list to `more-flowers'. Set the CAR of +`flowers' to a fish. What does the `more-flowers' list now contain? + + +File: eintr, Node: Yanking, Next: Loops & Recursion, Prev: List Implementation, Up: Top + +10 Yanking Text Back +******************** + +Whenever you cut text out of a buffer with a `kill' command in GNU +Emacs, you can bring it back with a `yank' command. The text that is +cut out of the buffer is put in the kill ring and the yank commands +insert the appropriate contents of the kill ring back into a buffer +(not necessarily the original buffer). + +A simple `C-y' (`yank') command inserts the first item from the kill +ring into the current buffer. If the `C-y' command is followed +immediately by `M-y', the first element is replaced by the second +element. Successive `M-y' commands replace the second element with the +third, fourth, or fifth element, and so on. When the last element in +the kill ring is reached, it is replaced by the first element and the +cycle is repeated. (Thus the kill ring is called a `ring' rather than +just a `list'. However, the actual data structure that holds the text +is a list. *Note Handling the Kill Ring: Kill Ring, for the details of +how the list is handled as a ring.) + +* Menu: + +* Kill Ring Overview:: +* kill-ring-yank-pointer:: +* yank nthcdr Exercises:: + + +File: eintr, Node: Kill Ring Overview, Next: kill-ring-yank-pointer, Prev: Yanking, Up: Yanking + +10.1 Kill Ring Overview +======================= + +The kill ring is a list of textual strings. This is what it looks like: + + ("some text" "a different piece of text" "yet more text") + +If this were the contents of my kill ring and I pressed `C-y', the +string of characters saying `some text' would be inserted in this +buffer where my cursor is located. + +The `yank' command is also used for duplicating text by copying it. +The copied text is not cut from the buffer, but a copy of it is put on +the kill ring and is inserted by yanking it back. + +Three functions are used for bringing text back from the kill ring: +`yank', which is usually bound to `C-y'; `yank-pop', which is usually +bound to `M-y'; and `rotate-yank-pointer', which is used by the two +other functions. + +These functions refer to the kill ring through a variable called the +`kill-ring-yank-pointer'. Indeed, the insertion code for both the +`yank' and `yank-pop' functions is: + + (insert (car kill-ring-yank-pointer)) + +(Well, no more. In GNU Emacs 22, the function has been replaced by +`insert-for-yank' which calls `insert-for-yank-1' repetitively for each +`yank-handler' segment. In turn, `insert-for-yank-1' strips text +properties from the inserted text according to +`yank-excluded-properties'. Otherwise, it is just like `insert'. We +will stick with plain `insert' since it is easier to understand.) + +To begin to understand how `yank' and `yank-pop' work, it is first +necessary to look at the `kill-ring-yank-pointer' variable and the +`rotate-yank-pointer' function. + + +File: eintr, Node: kill-ring-yank-pointer, Next: yank nthcdr Exercises, Prev: Kill Ring Overview, Up: Yanking + +10.2 The `kill-ring-yank-pointer' Variable +========================================== + +`kill-ring-yank-pointer' is a variable, just as `kill-ring' is a +variable. It points to something by being bound to the value of what +it points to, like any other Lisp variable. + +Thus, if the value of the kill ring is: + + ("some text" "a different piece of text" "yet more text") + +and the `kill-ring-yank-pointer' points to the second clause, the value +of `kill-ring-yank-pointer' is: + + ("a different piece of text" "yet more text") + +As explained in the previous chapter (*note List Implementation::), the +computer does not keep two different copies of the text being pointed to +by both the `kill-ring' and the `kill-ring-yank-pointer'. The words "a +different piece of text" and "yet more text" are not duplicated. +Instead, the two Lisp variables point to the same pieces of text. Here +is a diagram: + + kill-ring kill-ring-yank-pointer + | | + | ___ ___ | ___ ___ ___ ___ + ---> | | | --> | | | | | | + |___|___|----> |___|___|--> |___|___|--> nil + | | | + | | | + | | --> "yet more text" + | | + | --> "a different piece of text + | + --> "some text" + + + + +Both the variable `kill-ring' and the variable `kill-ring-yank-pointer' +are pointers. But the kill ring itself is usually described as if it +were actually what it is composed of. The `kill-ring' is spoken of as +if it were the list rather than that it points to the list. +Conversely, the `kill-ring-yank-pointer' is spoken of as pointing to a +list. + +These two ways of talking about the same thing sound confusing at first +but make sense on reflection. The kill ring is generally thought of as +the complete structure of data that holds the information of what has +recently been cut out of the Emacs buffers. The +`kill-ring-yank-pointer' on the other hand, serves to indicate--that +is, to `point to'--that part of the kill ring of which the first +element (the CAR) will be inserted. + + +File: eintr, Node: yank nthcdr Exercises, Prev: kill-ring-yank-pointer, Up: Yanking + +10.3 Exercises with `yank' and `nthcdr' +======================================= + + * Using `C-h v' (`describe-variable'), look at the value of your + kill ring. Add several items to your kill ring; look at its value + again. Using `M-y' (`yank-pop)', move all the way around the kill + ring. How many items were in your kill ring? Find the value of + `kill-ring-max'. Was your kill ring full, or could you have kept + more blocks of text within it? + + * Using `nthcdr' and `car', construct a series of expressions to + return the first, second, third, and fourth elements of a list. + + +File: eintr, Node: Loops & Recursion, Next: Regexp Search, Prev: Yanking, Up: Top + +11 Loops and Recursion +********************** + +Emacs Lisp has two primary ways to cause an expression, or a series of +expressions, to be evaluated repeatedly: one uses a `while' loop, and +the other uses "recursion". + +Repetition can be very valuable. For example, to move forward four +sentences, you need only write a program that will move forward one +sentence and then repeat the process four times. Since a computer does +not get bored or tired, such repetitive action does not have the +deleterious effects that excessive or the wrong kinds of repetition can +have on humans. + +People mostly write Emacs Lisp functions using `while' loops and their +kin; but you can use recursion, which provides a very powerful way to +think about and then to solve problems(1). + +* Menu: + +* while:: +* dolist dotimes:: +* Recursion:: +* Looping exercise:: + +---------- Footnotes ---------- + +(1) You can write recursive functions to be frugal or wasteful of +mental or computer resources; as it happens, methods that people find +easy--that are frugal of `mental resources'--sometimes use considerable +computer resources. Emacs was designed to run on machines that we now +consider limited and its default settings are conservative. You may +want to increase the values of `max-specpdl-size' and +`max-lisp-eval-depth'. In my `.emacs' file, I set them to 15 and 30 +times their default value. + + +File: eintr, Node: while, Next: dolist dotimes, Prev: Loops & Recursion, Up: Loops & Recursion + +11.1 `while' +============ + +The `while' special form tests whether the value returned by evaluating +its first argument is true or false. This is similar to what the Lisp +interpreter does with an `if'; what the interpreter does next, however, +is different. + +In a `while' expression, if the value returned by evaluating the first +argument is false, the Lisp interpreter skips the rest of the +expression (the "body" of the expression) and does not evaluate it. +However, if the value is true, the Lisp interpreter evaluates the body +of the expression and then again tests whether the first argument to +`while' is true or false. If the value returned by evaluating the +first argument is again true, the Lisp interpreter again evaluates the +body of the expression. + +The template for a `while' expression looks like this: + + (while TRUE-OR-FALSE-TEST + BODY...) + +* Menu: + +* Looping with while:: +* Loop Example:: +* print-elements-of-list:: +* Incrementing Loop:: +* Decrementing Loop:: + + +File: eintr, Node: Looping with while, Next: Loop Example, Prev: while, Up: while + +Looping with `while' +-------------------- + +So long as the true-or-false-test of the `while' expression returns a +true value when it is evaluated, the body is repeatedly evaluated. +This process is called a loop since the Lisp interpreter repeats the +same thing again and again, like an airplane doing a loop. When the +result of evaluating the true-or-false-test is false, the Lisp +interpreter does not evaluate the rest of the `while' expression and +`exits the loop'. + +Clearly, if the value returned by evaluating the first argument to +`while' is always true, the body following will be evaluated again and +again ... and again ... forever. Conversely, if the value returned is +never true, the expressions in the body will never be evaluated. The +craft of writing a `while' loop consists of choosing a mechanism such +that the true-or-false-test returns true just the number of times that +you want the subsequent expressions to be evaluated, and then have the +test return false. + +The value returned by evaluating a `while' is the value of the +true-or-false-test. An interesting consequence of this is that a +`while' loop that evaluates without error will return `nil' or false +regardless of whether it has looped 1 or 100 times or none at all. A +`while' expression that evaluates successfully never returns a true +value! What this means is that `while' is always evaluated for its +side effects, which is to say, the consequences of evaluating the +expressions within the body of the `while' loop. This makes sense. It +is not the mere act of looping that is desired, but the consequences of +what happens when the expressions in the loop are repeatedly evaluated. + + +File: eintr, Node: Loop Example, Next: print-elements-of-list, Prev: Looping with while, Up: while + +11.1.1 A `while' Loop and a List +-------------------------------- + +A common way to control a `while' loop is to test whether a list has +any elements. If it does, the loop is repeated; but if it does not, +the repetition is ended. Since this is an important technique, we will +create a short example to illustrate it. + +A simple way to test whether a list has elements is to evaluate the +list: if it has no elements, it is an empty list and will return the +empty list, `()', which is a synonym for `nil' or false. On the other +hand, a list with elements will return those elements when it is +evaluated. Since Emacs Lisp considers as true any value that is not +`nil', a list that returns elements will test true in a `while' loop. + +For example, you can set the variable `empty-list' to `nil' by +evaluating the following `setq' expression: + + (setq empty-list ()) + +After evaluating the `setq' expression, you can evaluate the variable +`empty-list' in the usual way, by placing the cursor after the symbol +and typing `C-x C-e'; `nil' will appear in your echo area: + + empty-list + +On the other hand, if you set a variable to be a list with elements, the +list will appear when you evaluate the variable, as you can see by +evaluating the following two expressions: + + (setq animals '(gazelle giraffe lion tiger)) + + animals + +Thus, to create a `while' loop that tests whether there are any items +in the list `animals', the first part of the loop will be written like +this: + + (while animals + ... + +When the `while' tests its first argument, the variable `animals' is +evaluated. It returns a list. So long as the list has elements, the +`while' considers the results of the test to be true; but when the list +is empty, it considers the results of the test to be false. + +To prevent the `while' loop from running forever, some mechanism needs +to be provided to empty the list eventually. An oft-used technique is +to have one of the subsequent forms in the `while' expression set the +value of the list to be the CDR of the list. Each time the `cdr' +function is evaluated, the list will be made shorter, until eventually +only the empty list will be left. At this point, the test of the +`while' loop will return false, and the arguments to the `while' will +no longer be evaluated. + +For example, the list of animals bound to the variable `animals' can be +set to be the CDR of the original list with the following expression: + + (setq animals (cdr animals)) + +If you have evaluated the previous expressions and then evaluate this +expression, you will see `(giraffe lion tiger)' appear in the echo +area. If you evaluate the expression again, `(lion tiger)' will appear +in the echo area. If you evaluate it again and yet again, `(tiger)' +appears and then the empty list, shown by `nil'. + +A template for a `while' loop that uses the `cdr' function repeatedly +to cause the true-or-false-test eventually to test false looks like +this: + + (while TEST-WHETHER-LIST-IS-EMPTY + BODY... + SET-LIST-TO-CDR-OF-LIST) + +This test and use of `cdr' can be put together in a function that goes +through a list and prints each element of the list on a line of its own. + + +File: eintr, Node: print-elements-of-list, Next: Incrementing Loop, Prev: Loop Example, Up: while + +11.1.2 An Example: `print-elements-of-list' +------------------------------------------- + +The `print-elements-of-list' function illustrates a `while' loop with a +list. + +The function requires several lines for its output. If you are reading +this in a recent instance of GNU Emacs, you can evaluate the following +expression inside of Info, as usual. + +If you are using an earlier version of Emacs, you need to copy the +necessary expressions to your `*scratch*' buffer and evaluate them +there. This is because the echo area had only one line in the earlier +versions. + +You can copy the expressions by marking the beginning of the region +with `C-<SPC>' (`set-mark-command'), moving the cursor to the end of +the region and then copying the region using `M-w' (`kill-ring-save', +which calls `copy-region-as-kill' and then provides visual feedback). +In the `*scratch*' buffer, you can yank the expressions back by typing +`C-y' (`yank'). + +After you have copied the expressions to the `*scratch*' buffer, +evaluate each expression in turn. Be sure to evaluate the last +expression, `(print-elements-of-list animals)', by typing `C-u C-x +C-e', that is, by giving an argument to `eval-last-sexp'. This will +cause the result of the evaluation to be printed in the `*scratch*' +buffer instead of being printed in the echo area. (Otherwise you will +see something like this in your echo area: +`^Jgazelle^J^Jgiraffe^J^Jlion^J^Jtiger^Jnil', in which each `^J' stands +for a `newline'.) + +In a recent instance of GNU Emacs, you can evaluate these expressions +directly in the Info buffer, and the echo area will grow to show the +results. + + (setq animals '(gazelle giraffe lion tiger)) + + (defun print-elements-of-list (list) + "Print each element of LIST on a line of its own." + (while list + (print (car list)) + (setq list (cdr list)))) + + (print-elements-of-list animals) + +When you evaluate the three expressions in sequence, you will see this: + + gazelle + + giraffe + + lion + + tiger + nil + +Each element of the list is printed on a line of its own (that is what +the function `print' does) and then the value returned by the function +is printed. Since the last expression in the function is the `while' +loop, and since `while' loops always return `nil', a `nil' is printed +after the last element of the list. + + +File: eintr, Node: Incrementing Loop, Next: Decrementing Loop, Prev: print-elements-of-list, Up: while + +11.1.3 A Loop with an Incrementing Counter +------------------------------------------ + +A loop is not useful unless it stops when it ought. Besides +controlling a loop with a list, a common way of stopping a loop is to +write the first argument as a test that returns false when the correct +number of repetitions are complete. This means that the loop must have +a counter--an expression that counts how many times the loop repeats +itself. + +The test can be an expression such as `(< count desired-number)' which +returns `t' for true if the value of `count' is less than the +`desired-number' of repetitions and `nil' for false if the value of +`count' is equal to or is greater than the `desired-number'. The +expression that increments the count can be a simple `setq' such as +`(setq count (1+ count))', where `1+' is a built-in function in Emacs +Lisp that adds 1 to its argument. (The expression `(1+ count)' has the +same result as `(+ count 1)', but is easier for a human to read.) + +The template for a `while' loop controlled by an incrementing counter +looks like this: + + SET-COUNT-TO-INITIAL-VALUE + (while (< count desired-number) ; true-or-false-test + BODY... + (setq count (1+ count))) ; incrementer + +Note that you need to set the initial value of `count'; usually it is +set to 1. + +* Menu: + +* Incrementing Example:: +* Inc Example parts:: +* Inc Example altogether:: + + +File: eintr, Node: Incrementing Example, Next: Inc Example parts, Prev: Incrementing Loop, Up: Incrementing Loop + +Example with incrementing counter +................................. + +Suppose you are playing on the beach and decide to make a triangle of +pebbles, putting one pebble in the first row, two in the second row, +three in the third row and so on, like this: + + + * + * * + * * * + * * * * + + +(About 2500 years ago, Pythagoras and others developed the beginnings of +number theory by considering questions such as this.) + +Suppose you want to know how many pebbles you will need to make a +triangle with 7 rows? + +Clearly, what you need to do is add up the numbers from 1 to 7. There +are two ways to do this; start with the smallest number, one, and add up +the list in sequence, 1, 2, 3, 4 and so on; or start with the largest +number and add the list going down: 7, 6, 5, 4 and so on. Because both +mechanisms illustrate common ways of writing `while' loops, we will +create two examples, one counting up and the other counting down. In +this first example, we will start with 1 and add 2, 3, 4 and so on. + +If you are just adding up a short list of numbers, the easiest way to do +it is to add up all the numbers at once. However, if you do not know +ahead of time how many numbers your list will have, or if you want to be +prepared for a very long list, then you need to design your addition so +that what you do is repeat a simple process many times instead of doing +a more complex process once. + +For example, instead of adding up all the pebbles all at once, what you +can do is add the number of pebbles in the first row, 1, to the number +in the second row, 2, and then add the total of those two rows to the +third row, 3. Then you can add the number in the fourth row, 4, to the +total of the first three rows; and so on. + +The critical characteristic of the process is that each repetitive +action is simple. In this case, at each step we add only two numbers, +the number of pebbles in the row and the total already found. This +process of adding two numbers is repeated again and again until the last +row has been added to the total of all the preceding rows. In a more +complex loop the repetitive action might not be so simple, but it will +be simpler than doing everything all at once. + + +File: eintr, Node: Inc Example parts, Next: Inc Example altogether, Prev: Incrementing Example, Up: Incrementing Loop + +The parts of the function definition +.................................... + +The preceding analysis gives us the bones of our function definition: +first, we will need a variable that we can call `total' that will be +the total number of pebbles. This will be the value returned by the +function. + +Second, we know that the function will require an argument: this +argument will be the total number of rows in the triangle. It can be +called `number-of-rows'. + +Finally, we need a variable to use as a counter. We could call this +variable `counter', but a better name is `row-number'. That is because +what the counter does in this function is count rows, and a program +should be written to be as understandable as possible. + +When the Lisp interpreter first starts evaluating the expressions in the +function, the value of `total' should be set to zero, since we have not +added anything to it. Then the function should add the number of +pebbles in the first row to the total, and then add the number of +pebbles in the second to the total, and then add the number of pebbles +in the third row to the total, and so on, until there are no more rows +left to add. + +Both `total' and `row-number' are used only inside the function, so +they can be declared as local variables with `let' and given initial +values. Clearly, the initial value for `total' should be 0. The +initial value of `row-number' should be 1, since we start with the +first row. This means that the `let' statement will look like this: + + (let ((total 0) + (row-number 1)) + BODY...) + +After the internal variables are declared and bound to their initial +values, we can begin the `while' loop. The expression that serves as +the test should return a value of `t' for true so long as the +`row-number' is less than or equal to the `number-of-rows'. (If the +expression tests true only so long as the row number is less than the +number of rows in the triangle, the last row will never be added to the +total; hence the row number has to be either less than or equal to the +number of rows.) + +Lisp provides the `<=' function that returns true if the value of its +first argument is less than or equal to the value of its second +argument and false otherwise. So the expression that the `while' will +evaluate as its test should look like this: + + (<= row-number number-of-rows) + +The total number of pebbles can be found by repeatedly adding the number +of pebbles in a row to the total already found. Since the number of +pebbles in the row is equal to the row number, the total can be found by +adding the row number to the total. (Clearly, in a more complex +situation, the number of pebbles in the row might be related to the row +number in a more complicated way; if this were the case, the row number +would be replaced by the appropriate expression.) + + (setq total (+ total row-number)) + +What this does is set the new value of `total' to be equal to the sum +of adding the number of pebbles in the row to the previous total. + +After setting the value of `total', the conditions need to be +established for the next repetition of the loop, if there is one. This +is done by incrementing the value of the `row-number' variable, which +serves as a counter. After the `row-number' variable has been +incremented, the true-or-false-test at the beginning of the `while' +loop tests whether its value is still less than or equal to the value +of the `number-of-rows' and if it is, adds the new value of the +`row-number' variable to the `total' of the previous repetition of the +loop. + +The built-in Emacs Lisp function `1+' adds 1 to a number, so the +`row-number' variable can be incremented with this expression: + + (setq row-number (1+ row-number)) + + +File: eintr, Node: Inc Example altogether, Prev: Inc Example parts, Up: Incrementing Loop + +Putting the function definition together +........................................ + +We have created the parts for the function definition; now we need to +put them together. + +First, the contents of the `while' expression: + + (while (<= row-number number-of-rows) ; true-or-false-test + (setq total (+ total row-number)) + (setq row-number (1+ row-number))) ; incrementer + +Along with the `let' expression varlist, this very nearly completes the +body of the function definition. However, it requires one final +element, the need for which is somewhat subtle. + +The final touch is to place the variable `total' on a line by itself +after the `while' expression. Otherwise, the value returned by the +whole function is the value of the last expression that is evaluated in +the body of the `let', and this is the value returned by the `while', +which is always `nil'. + +This may not be evident at first sight. It almost looks as if the +incrementing expression is the last expression of the whole function. +But that expression is part of the body of the `while'; it is the last +element of the list that starts with the symbol `while'. Moreover, the +whole of the `while' loop is a list within the body of the `let'. + +In outline, the function will look like this: + + (defun NAME-OF-FUNCTION (ARGUMENT-LIST) + "DOCUMENTATION..." + (let (VARLIST) + (while (TRUE-OR-FALSE-TEST) + BODY-OF-WHILE... ) + ... )) ; Need final expression here. + +The result of evaluating the `let' is what is going to be returned by +the `defun' since the `let' is not embedded within any containing list, +except for the `defun' as a whole. However, if the `while' is the last +element of the `let' expression, the function will always return `nil'. +This is not what we want! Instead, what we want is the value of the +variable `total'. This is returned by simply placing the symbol as the +last element of the list starting with `let'. It gets evaluated after +the preceding elements of the list are evaluated, which means it gets +evaluated after it has been assigned the correct value for the total. + +It may be easier to see this by printing the list starting with `let' +all on one line. This format makes it evident that the VARLIST and +`while' expressions are the second and third elements of the list +starting with `let', and the `total' is the last element: + + (let (VARLIST) (while (TRUE-OR-FALSE-TEST) BODY-OF-WHILE... ) total) + +Putting everything together, the `triangle' function definition looks +like this: + + (defun triangle (number-of-rows) ; Version with + ; incrementing counter. + "Add up the number of pebbles in a triangle. + The first row has one pebble, the second row two pebbles, + the third row three pebbles, and so on. + The argument is NUMBER-OF-ROWS." + (let ((total 0) + (row-number 1)) + (while (<= row-number number-of-rows) + (setq total (+ total row-number)) + (setq row-number (1+ row-number))) + total)) + +After you have installed `triangle' by evaluating the function, you can +try it out. Here are two examples: + + (triangle 4) + + (triangle 7) + +The sum of the first four numbers is 10 and the sum of the first seven +numbers is 28. + + +File: eintr, Node: Decrementing Loop, Prev: Incrementing Loop, Up: while + +11.1.4 Loop with a Decrementing Counter +--------------------------------------- + +Another common way to write a `while' loop is to write the test so that +it determines whether a counter is greater than zero. So long as the +counter is greater than zero, the loop is repeated. But when the +counter is equal to or less than zero, the loop is stopped. For this +to work, the counter has to start out greater than zero and then be +made smaller and smaller by a form that is evaluated repeatedly. + +The test will be an expression such as `(> counter 0)' which returns +`t' for true if the value of `counter' is greater than zero, and `nil' +for false if the value of `counter' is equal to or less than zero. The +expression that makes the number smaller and smaller can be a simple +`setq' such as `(setq counter (1- counter))', where `1-' is a built-in +function in Emacs Lisp that subtracts 1 from its argument. + +The template for a decrementing `while' loop looks like this: + + (while (> counter 0) ; true-or-false-test + BODY... + (setq counter (1- counter))) ; decrementer + +* Menu: + +* Decrementing Example:: +* Dec Example parts:: +* Dec Example altogether:: + + +File: eintr, Node: Decrementing Example, Next: Dec Example parts, Prev: Decrementing Loop, Up: Decrementing Loop + +Example with decrementing counter +................................. + +To illustrate a loop with a decrementing counter, we will rewrite the +`triangle' function so the counter decreases to zero. + +This is the reverse of the earlier version of the function. In this +case, to find out how many pebbles are needed to make a triangle with 3 +rows, add the number of pebbles in the third row, 3, to the number in +the preceding row, 2, and then add the total of those two rows to the +row that precedes them, which is 1. + +Likewise, to find the number of pebbles in a triangle with 7 rows, add +the number of pebbles in the seventh row, 7, to the number in the +preceding row, which is 6, and then add the total of those two rows to +the row that precedes them, which is 5, and so on. As in the previous +example, each addition only involves adding two numbers, the total of +the rows already added up and the number of pebbles in the row that is +being added to the total. This process of adding two numbers is +repeated again and again until there are no more pebbles to add. + +We know how many pebbles to start with: the number of pebbles in the +last row is equal to the number of rows. If the triangle has seven +rows, the number of pebbles in the last row is 7. Likewise, we know how +many pebbles are in the preceding row: it is one less than the number in +the row. + + +File: eintr, Node: Dec Example parts, Next: Dec Example altogether, Prev: Decrementing Example, Up: Decrementing Loop + +The parts of the function definition +.................................... + +We start with three variables: the total number of rows in the +triangle; the number of pebbles in a row; and the total number of +pebbles, which is what we want to calculate. These variables can be +named `number-of-rows', `number-of-pebbles-in-row', and `total', +respectively. + +Both `total' and `number-of-pebbles-in-row' are used only inside the +function and are declared with `let'. The initial value of `total' +should, of course, be zero. However, the initial value of +`number-of-pebbles-in-row' should be equal to the number of rows in the +triangle, since the addition will start with the longest row. + +This means that the beginning of the `let' expression will look like +this: + + (let ((total 0) + (number-of-pebbles-in-row number-of-rows)) + BODY...) + +The total number of pebbles can be found by repeatedly adding the number +of pebbles in a row to the total already found, that is, by repeatedly +evaluating the following expression: + + (setq total (+ total number-of-pebbles-in-row)) + +After the `number-of-pebbles-in-row' is added to the `total', the +`number-of-pebbles-in-row' should be decremented by one, since the next +time the loop repeats, the preceding row will be added to the total. + +The number of pebbles in a preceding row is one less than the number of +pebbles in a row, so the built-in Emacs Lisp function `1-' can be used +to compute the number of pebbles in the preceding row. This can be +done with the following expression: + + (setq number-of-pebbles-in-row + (1- number-of-pebbles-in-row)) + +Finally, we know that the `while' loop should stop making repeated +additions when there are no pebbles in a row. So the test for the +`while' loop is simply: + + (while (> number-of-pebbles-in-row 0) + + +File: eintr, Node: Dec Example altogether, Prev: Dec Example parts, Up: Decrementing Loop + +Putting the function definition together +........................................ + +We can put these expressions together to create a function definition +that works. However, on examination, we find that one of the local +variables is unneeded! + +The function definition looks like this: + + ;;; First subtractive version. + (defun triangle (number-of-rows) + "Add up the number of pebbles in a triangle." + (let ((total 0) + (number-of-pebbles-in-row number-of-rows)) + (while (> number-of-pebbles-in-row 0) + (setq total (+ total number-of-pebbles-in-row)) + (setq number-of-pebbles-in-row + (1- number-of-pebbles-in-row))) + total)) + +As written, this function works. + +However, we do not need `number-of-pebbles-in-row'. + +When the `triangle' function is evaluated, the symbol `number-of-rows' +will be bound to a number, giving it an initial value. That number can +be changed in the body of the function as if it were a local variable, +without any fear that such a change will effect the value of the +variable outside of the function. This is a very useful characteristic +of Lisp; it means that the variable `number-of-rows' can be used +anywhere in the function where `number-of-pebbles-in-row' is used. + +Here is a second version of the function written a bit more cleanly: + + (defun triangle (number) ; Second version. + "Return sum of numbers 1 through NUMBER inclusive." + (let ((total 0)) + (while (> number 0) + (setq total (+ total number)) + (setq number (1- number))) + total)) + +In brief, a properly written `while' loop will consist of three parts: + + 1. A test that will return false after the loop has repeated itself + the correct number of times. + + 2. An expression the evaluation of which will return the value desired + after being repeatedly evaluated. + + 3. An expression to change the value passed to the true-or-false-test + so that the test returns false after the loop has repeated itself + the right number of times. + + +File: eintr, Node: dolist dotimes, Next: Recursion, Prev: while, Up: Loops & Recursion + +11.2 Save your time: `dolist' and `dotimes' +=========================================== + +In addition to `while', both `dolist' and `dotimes' provide for +looping. Sometimes these are quicker to write than the equivalent +`while' loop. Both are Lisp macros. (*Note Macros: (elisp)Macros. ) + +`dolist' works like a `while' loop that `CDRs down a list': `dolist' +automatically shortens the list each time it loops--takes the CDR of +the list--and binds the CAR of each shorter version of the list to the +first of its arguments. + +`dotimes' loops a specific number of times: you specify the number. + +* Menu: + +* dolist:: +* dotimes:: + + +File: eintr, Node: dolist, Next: dotimes, Prev: dolist dotimes, Up: dolist dotimes + +The `dolist' Macro +.................. + +Suppose, for example, you want to reverse a list, so that "first" +"second" "third" becomes "third" "second" "first". + +In practice, you would use the `reverse' function, like this: + + (setq animals '(gazelle giraffe lion tiger)) + + (reverse animals) + +Here is how you could reverse the list using a `while' loop: + + (setq animals '(gazelle giraffe lion tiger)) + + (defun reverse-list-with-while (list) + "Using while, reverse the order of LIST." + (let (value) ; make sure list starts empty + (while list + (setq value (cons (car list) value)) + (setq list (cdr list))) + value)) + + (reverse-list-with-while animals) + +And here is how you could use the `dolist' macro: + + (setq animals '(gazelle giraffe lion tiger)) + + (defun reverse-list-with-dolist (list) + "Using dolist, reverse the order of LIST." + (let (value) ; make sure list starts empty + (dolist (element list value) + (setq value (cons element value))))) + + (reverse-list-with-dolist animals) + +In Info, you can place your cursor after the closing parenthesis of +each expression and type `C-x C-e'; in each case, you should see + + (tiger lion giraffe gazelle) + +in the echo area. + +For this example, the existing `reverse' function is obviously best. +The `while' loop is just like our first example (*note A `while' Loop +and a List: Loop Example.). The `while' first checks whether the list +has elements; if so, it constructs a new list by adding the first +element of the list to the existing list (which in the first iteration +of the loop is `nil'). Since the second element is prepended in front +of the first element, and the third element is prepended in front of +the second element, the list is reversed. + +In the expression using a `while' loop, the `(setq list (cdr list))' +expression shortens the list, so the `while' loop eventually stops. In +addition, it provides the `cons' expression with a new first element by +creating a new and shorter list at each repetition of the loop. + +The `dolist' expression does very much the same as the `while' +expression, except that the `dolist' macro does some of the work you +have to do when writing a `while' expression. + +Like a `while' loop, a `dolist' loops. What is different is that it +automatically shortens the list each time it loops -- it `CDRs down the +list' on its own -- and it automatically binds the CAR of each shorter +version of the list to the first of its arguments. + +In the example, the CAR of each shorter version of the list is referred +to using the symbol `element', the list itself is called `list', and +the value returned is called `value'. The remainder of the `dolist' +expression is the body. + +The `dolist' expression binds the CAR of each shorter version of the +list to `element' and then evaluates the body of the expression; and +repeats the loop. The result is returned in `value'. + + +File: eintr, Node: dotimes, Prev: dolist, Up: dolist dotimes + +The `dotimes' Macro +................... + +The `dotimes' macro is similar to `dolist', except that it loops a +specific number of times. + +The first argument to `dotimes' is assigned the numbers 0, 1, 2 and so +forth each time around the loop, and the value of the third argument is +returned. You need to provide the value of the second argument, which +is how many times the macro loops. + +For example, the following binds the numbers from 0 up to, but not +including, the number 3 to the first argument, NUMBER, and then +constructs a list of the three numbers. (The first number is 0, the +second number is 1, and the third number is 2; this makes a total of +three numbers in all, starting with zero as the first number.) + + (let (value) ; otherwise a value is a void variable + (dotimes (number 3 value) + (setq value (cons number value)))) + + => (2 1 0) + +`dotimes' returns `value', so the way to use `dotimes' is to operate on +some expression NUMBER number of times and then return the result, +either as a list or an atom. + +Here is an example of a `defun' that uses `dotimes' to add up the +number of pebbles in a triangle. + + (defun triangle-using-dotimes (number-of-rows) + "Using dotimes, add up the number of pebbles in a triangle." + (let ((total 0)) ; otherwise a total is a void variable + (dotimes (number number-of-rows total) + (setq total (+ total (1+ number)))))) + + (triangle-using-dotimes 4) + + +File: eintr, Node: Recursion, Next: Looping exercise, Prev: dolist dotimes, Up: Loops & Recursion + +11.3 Recursion +============== + +A recursive function contains code that tells the Lisp interpreter to +call a program that runs exactly like itself, but with slightly +different arguments. The code runs exactly the same because it has the +same name. However, even though the program has the same name, it is +not the same entity. It is different. In the jargon, it is a +different `instance'. + +Eventually, if the program is written correctly, the `slightly +different arguments' will become sufficiently different from the first +arguments that the final instance will stop. + +* Menu: + +* Building Robots:: +* Recursive Definition Parts:: +* Recursion with list:: +* Recursive triangle function:: +* Recursion with cond:: +* Recursive Patterns:: +* No Deferment:: +* No deferment solution:: + + +File: eintr, Node: Building Robots, Next: Recursive Definition Parts, Prev: Recursion, Up: Recursion + +11.3.1 Building Robots: Extending the Metaphor +---------------------------------------------- + +It is sometimes helpful to think of a running program as a robot that +does a job. In doing its job, a recursive function calls on a second +robot to help it. The second robot is identical to the first in every +way, except that the second robot helps the first and has been passed +different arguments than the first. + +In a recursive function, the second robot may call a third; and the +third may call a fourth, and so on. Each of these is a different +entity; but all are clones. + +Since each robot has slightly different instructions--the arguments +will differ from one robot to the next--the last robot should know when +to stop. + +Let's expand on the metaphor in which a computer program is a robot. + +A function definition provides the blueprints for a robot. When you +install a function definition, that is, when you evaluate a `defun' +special form, you install the necessary equipment to build robots. It +is as if you were in a factory, setting up an assembly line. Robots +with the same name are built according to the same blueprints. So they +have, as it were, the same `model number', but a different `serial +number'. + +We often say that a recursive function `calls itself'. What we mean is +that the instructions in a recursive function cause the Lisp +interpreter to run a different function that has the same name and does +the same job as the first, but with different arguments. + +It is important that the arguments differ from one instance to the +next; otherwise, the process will never stop. + + +File: eintr, Node: Recursive Definition Parts, Next: Recursion with list, Prev: Building Robots, Up: Recursion + +11.3.2 The Parts of a Recursive Definition +------------------------------------------ + +A recursive function typically contains a conditional expression which +has three parts: + + 1. A true-or-false-test that determines whether the function is called + again, here called the "do-again-test". + + 2. The name of the function. When this name is called, a new + instance of the function--a new robot, as it were--is created and + told what to do. + + 3. An expression that returns a different value each time the + function is called, here called the "next-step-expression". + Consequently, the argument (or arguments) passed to the new + instance of the function will be different from that passed to the + previous instance. This causes the conditional expression, the + "do-again-test", to test false after the correct number of + repetitions. + +Recursive functions can be much simpler than any other kind of +function. Indeed, when people first start to use them, they often look +so mysteriously simple as to be incomprehensible. Like riding a +bicycle, reading a recursive function definition takes a certain knack +which is hard at first but then seems simple. + +There are several different common recursive patterns. A very simple +pattern looks like this: + + (defun NAME-OF-RECURSIVE-FUNCTION (ARGUMENT-LIST) + "DOCUMENTATION..." + (if DO-AGAIN-TEST + BODY... + (NAME-OF-RECURSIVE-FUNCTION + NEXT-STEP-EXPRESSION))) + +Each time a recursive function is evaluated, a new instance of it is +created and told what to do. The arguments tell the instance what to +do. + +An argument is bound to the value of the next-step-expression. Each +instance runs with a different value of the next-step-expression. + +The value in the next-step-expression is used in the do-again-test. + +The value returned by the next-step-expression is passed to the new +instance of the function, which evaluates it (or some +transmogrification of it) to determine whether to continue or stop. +The next-step-expression is designed so that the do-again-test returns +false when the function should no longer be repeated. + +The do-again-test is sometimes called the "stop condition", since it +stops the repetitions when it tests false. + + +File: eintr, Node: Recursion with list, Next: Recursive triangle function, Prev: Recursive Definition Parts, Up: Recursion + +11.3.3 Recursion with a List +---------------------------- + +The example of a `while' loop that printed the elements of a list of +numbers can be written recursively. Here is the code, including an +expression to set the value of the variable `animals' to a list. + +If you are using GNU Emacs 20 or before, this example must be copied to +the `*scratch*' buffer and each expression must be evaluated there. +Use `C-u C-x C-e' to evaluate the `(print-elements-recursively +animals)' expression so that the results are printed in the buffer; +otherwise the Lisp interpreter will try to squeeze the results into the +one line of the echo area. + +Also, place your cursor immediately after the last closing parenthesis +of the `print-elements-recursively' function, before the comment. +Otherwise, the Lisp interpreter will try to evaluate the comment. + +If you are using a more recent version, you can evaluate this +expression directly in Info. + + (setq animals '(gazelle giraffe lion tiger)) + + (defun print-elements-recursively (list) + "Print each element of LIST on a line of its own. + Uses recursion." + (if list ; do-again-test + (progn + (print (car list)) ; body + (print-elements-recursively ; recursive call + (cdr list))))) ; next-step-expression + + (print-elements-recursively animals) + +The `print-elements-recursively' function first tests whether there is +any content in the list; if there is, the function prints the first +element of the list, the CAR of the list. Then the function `invokes +itself', but gives itself as its argument, not the whole list, but the +second and subsequent elements of the list, the CDR of the list. + +Put another way, if the list is not empty, the function invokes another +instance of code that is similar to the initial code, but is a +different thread of execution, with different arguments than the first +instance. + +Put in yet another way, if the list is not empty, the first robot +assemblies a second robot and tells it what to do; the second robot is +a different individual from the first, but is the same model. + +When the second evaluation occurs, the `if' expression is evaluated and +if true, prints the first element of the list it receives as its +argument (which is the second element of the original list). Then the +function `calls itself' with the CDR of the list it is invoked with, +which (the second time around) is the CDR of the CDR of the original +list. + +Note that although we say that the function `calls itself', what we +mean is that the Lisp interpreter assembles and instructs a new +instance of the program. The new instance is a clone of the first, but +is a separate individual. + +Each time the function `invokes itself', it invokes itself on a shorter +version of the original list. It creates a new instance that works on +a shorter list. + +Eventually, the function invokes itself on an empty list. It creates a +new instance whose argument is `nil'. The conditional expression tests +the value of `list'. Since the value of `list' is `nil', the `if' +expression tests false so the then-part is not evaluated. The function +as a whole then returns `nil'. + +When you evaluate `(print-elements-recursively animals)' in the +`*scratch*' buffer, you see this result: + + gazelle + + giraffe + + lion + + tiger + nil + + +File: eintr, Node: Recursive triangle function, Next: Recursion with cond, Prev: Recursion with list, Up: Recursion + +11.3.4 Recursion in Place of a Counter +-------------------------------------- + +The `triangle' function described in a previous section can also be +written recursively. It looks like this: + + (defun triangle-recursively (number) + "Return the sum of the numbers 1 through NUMBER inclusive. + Uses recursion." + (if (= number 1) ; do-again-test + 1 ; then-part + (+ number ; else-part + (triangle-recursively ; recursive call + (1- number))))) ; next-step-expression + + (triangle-recursively 7) + +You can install this function by evaluating it and then try it by +evaluating `(triangle-recursively 7)'. (Remember to put your cursor +immediately after the last parenthesis of the function definition, +before the comment.) The function evaluates to 28. + +To understand how this function works, let's consider what happens in +the various cases when the function is passed 1, 2, 3, or 4 as the +value of its argument. + +* Menu: + +* Recursive Example arg of 1 or 2:: +* Recursive Example arg of 3 or 4:: + + +File: eintr, Node: Recursive Example arg of 1 or 2, Next: Recursive Example arg of 3 or 4, Prev: Recursive triangle function, Up: Recursive triangle function + +An argument of 1 or 2 +..................... + +First, what happens if the value of the argument is 1? + +The function has an `if' expression after the documentation string. It +tests whether the value of `number' is equal to 1; if so, Emacs +evaluates the then-part of the `if' expression, which returns the +number 1 as the value of the function. (A triangle with one row has +one pebble in it.) + +Suppose, however, that the value of the argument is 2. In this case, +Emacs evaluates the else-part of the `if' expression. + +The else-part consists of an addition, the recursive call to +`triangle-recursively' and a decrementing action; and it looks like +this: + + (+ number (triangle-recursively (1- number))) + +When Emacs evaluates this expression, the innermost expression is +evaluated first; then the other parts in sequence. Here are the steps +in detail: + +Step 1 Evaluate the innermost expression. + The innermost expression is `(1- number)' so Emacs decrements the + value of `number' from 2 to 1. + +Step 2 Evaluate the `triangle-recursively' function. + The Lisp interpreter creates an individual instance of + `triangle-recursively'. It does not matter that this function is + contained within itself. Emacs passes the result Step 1 as the + argument used by this instance of the `triangle-recursively' + function + + In this case, Emacs evaluates `triangle-recursively' with an + argument of 1. This means that this evaluation of + `triangle-recursively' returns 1. + +Step 3 Evaluate the value of `number'. + The variable `number' is the second element of the list that + starts with `+'; its value is 2. + +Step 4 Evaluate the `+' expression. + The `+' expression receives two arguments, the first from the + evaluation of `number' (Step 3) and the second from the evaluation + of `triangle-recursively' (Step 2). + + The result of the addition is the sum of 2 plus 1, and the number + 3 is returned, which is correct. A triangle with two rows has + three pebbles in it. + + +File: eintr, Node: Recursive Example arg of 3 or 4, Prev: Recursive Example arg of 1 or 2, Up: Recursive triangle function + +An argument of 3 or 4 +..................... + +Suppose that `triangle-recursively' is called with an argument of 3. + +Step 1 Evaluate the do-again-test. + The `if' expression is evaluated first. This is the do-again test + and returns false, so the else-part of the `if' expression is + evaluated. (Note that in this example, the do-again-test causes + the function to call itself when it tests false, not when it tests + true.) + +Step 2 Evaluate the innermost expression of the else-part. + The innermost expression of the else-part is evaluated, which + decrements 3 to 2. This is the next-step-expression. + +Step 3 Evaluate the `triangle-recursively' function. + The number 2 is passed to the `triangle-recursively' function. + + We know what happens when Emacs evaluates `triangle-recursively' + with an argument of 2. After going through the sequence of + actions described earlier, it returns a value of 3. So that is + what will happen here. + +Step 4 Evaluate the addition. + 3 will be passed as an argument to the addition and will be added + to the number with which the function was called, which is 3. + +The value returned by the function as a whole will be 6. + +Now that we know what will happen when `triangle-recursively' is called +with an argument of 3, it is evident what will happen if it is called +with an argument of 4: + + In the recursive call, the evaluation of + + (triangle-recursively (1- 4)) + + will return the value of evaluating + + (triangle-recursively 3) + + which is 6 and this value will be added to 4 by the addition in the + third line. + +The value returned by the function as a whole will be 10. + +Each time `triangle-recursively' is evaluated, it evaluates a version +of itself--a different instance of itself--with a smaller argument, +until the argument is small enough so that it does not evaluate itself. + +Note that this particular design for a recursive function requires that +operations be deferred. + +Before `(triangle-recursively 7)' can calculate its answer, it must +call `(triangle-recursively 6)'; and before `(triangle-recursively 6)' +can calculate its answer, it must call `(triangle-recursively 5)'; and +so on. That is to say, the calculation that `(triangle-recursively 7)' +makes must be deferred until `(triangle-recursively 6)' makes its +calculation; and `(triangle-recursively 6)' must defer until +`(triangle-recursively 5)' completes; and so on. + +If each of these instances of `triangle-recursively' are thought of as +different robots, the first robot must wait for the second to complete +its job, which must wait until the third completes, and so on. + +There is a way around this kind of waiting, which we will discuss in +*Note Recursion without Deferments: No Deferment. + + +File: eintr, Node: Recursion with cond, Next: Recursive Patterns, Prev: Recursive triangle function, Up: Recursion + +11.3.5 Recursion Example Using `cond' +------------------------------------- + +The version of `triangle-recursively' described earlier is written with +the `if' special form. It can also be written using another special +form called `cond'. The name of the special form `cond' is an +abbreviation of the word `conditional'. + +Although the `cond' special form is not used as often in the Emacs Lisp +sources as `if', it is used often enough to justify explaining it. + +The template for a `cond' expression looks like this: + + (cond + BODY...) + +where the BODY is a series of lists. + +Written out more fully, the template looks like this: + + (cond + (FIRST-TRUE-OR-FALSE-TEST FIRST-CONSEQUENT) + (SECOND-TRUE-OR-FALSE-TEST SECOND-CONSEQUENT) + (THIRD-TRUE-OR-FALSE-TEST THIRD-CONSEQUENT) + ...) + +When the Lisp interpreter evaluates the `cond' expression, it evaluates +the first element (the CAR or true-or-false-test) of the first +expression in a series of expressions within the body of the `cond'. + +If the true-or-false-test returns `nil' the rest of that expression, +the consequent, is skipped and the true-or-false-test of the next +expression is evaluated. When an expression is found whose +true-or-false-test returns a value that is not `nil', the consequent of +that expression is evaluated. The consequent can be one or more +expressions. If the consequent consists of more than one expression, +the expressions are evaluated in sequence and the value of the last one +is returned. If the expression does not have a consequent, the value +of the true-or-false-test is returned. + +If none of the true-or-false-tests test true, the `cond' expression +returns `nil'. + +Written using `cond', the `triangle' function looks like this: + + (defun triangle-using-cond (number) + (cond ((<= number 0) 0) + ((= number 1) 1) + ((> number 1) + (+ number (triangle-using-cond (1- number)))))) + +In this example, the `cond' returns 0 if the number is less than or +equal to 0, it returns 1 if the number is 1 and it evaluates `(+ number +(triangle-using-cond (1- number)))' if the number is greater than 1. + + +File: eintr, Node: Recursive Patterns, Next: No Deferment, Prev: Recursion with cond, Up: Recursion + +11.3.6 Recursive Patterns +------------------------- + +Here are three common recursive patterns. Each involves a list. +Recursion does not need to involve lists, but Lisp is designed for lists +and this provides a sense of its primal capabilities. + +* Menu: + +* Every:: +* Accumulate:: +* Keep:: + + +File: eintr, Node: Every, Next: Accumulate, Prev: Recursive Patterns, Up: Recursive Patterns + +Recursive Pattern: _every_ +.......................... + +In the `every' recursive pattern, an action is performed on every +element of a list. + +The basic pattern is: + + * If a list be empty, return `nil'. + + * Else, act on the beginning of the list (the CAR of the list) + - through a recursive call by the function on the rest (the + CDR) of the list, + + - and, optionally, combine the acted-on element, using + `cons', with the results of acting on the rest. + +Here is example: + + (defun square-each (numbers-list) + "Square each of a NUMBERS LIST, recursively." + (if (not numbers-list) ; do-again-test + nil + (cons + (* (car numbers-list) (car numbers-list)) + (square-each (cdr numbers-list))))) ; next-step-expression + + (square-each '(1 2 3)) + => (1 4 9) + +If `numbers-list' is empty, do nothing. But if it has content, +construct a list combining the square of the first number in the list +with the result of the recursive call. + +(The example follows the pattern exactly: `nil' is returned if the +numbers' list is empty. In practice, you would write the conditional +so it carries out the action when the numbers' list is not empty.) + +The `print-elements-recursively' function (*note Recursion with a List: +Recursion with list.) is another example of an `every' pattern, except +in this case, rather than bring the results together using `cons', we +print each element of output. + +The `print-elements-recursively' function looks like this: + + (setq animals '(gazelle giraffe lion tiger)) + + (defun print-elements-recursively (list) + "Print each element of LIST on a line of its own. + Uses recursion." + (if list ; do-again-test + (progn + (print (car list)) ; body + (print-elements-recursively ; recursive call + (cdr list))))) ; next-step-expression + + (print-elements-recursively animals) + +The pattern for `print-elements-recursively' is: + + * If the list be empty, do nothing. + + * But if the list has at least one element, + - act on the beginning of the list (the CAR of the list), + + - and make a recursive call on the rest (the CDR) of the + list. + + +File: eintr, Node: Accumulate, Next: Keep, Prev: Every, Up: Recursive Patterns + +Recursive Pattern: _accumulate_ +............................... + +Another recursive pattern is called the `accumulate' pattern. In the +`accumulate' recursive pattern, an action is performed on every element +of a list and the result of that action is accumulated with the results +of performing the action on the other elements. + +This is very like the `every' pattern using `cons', except that `cons' +is not used, but some other combiner. + +The pattern is: + + * If a list be empty, return zero or some other constant. + + * Else, act on the beginning of the list (the CAR of the list), + - and combine that acted-on element, using `+' or some + other combining function, with + + - a recursive call by the function on the rest (the CDR) of + the list. + +Here is an example: + + (defun add-elements (numbers-list) + "Add the elements of NUMBERS-LIST together." + (if (not numbers-list) + 0 + (+ (car numbers-list) (add-elements (cdr numbers-list))))) + + (add-elements '(1 2 3 4)) + => 10 + +*Note Making a List of Files: Files List, for an example of the +accumulate pattern. + + +File: eintr, Node: Keep, Prev: Accumulate, Up: Recursive Patterns + +Recursive Pattern: _keep_ +......................... + +A third recursive pattern is called the `keep' pattern. In the `keep' +recursive pattern, each element of a list is tested; the element is +acted on and the results are kept only if the element meets a criterion. + +Again, this is very like the `every' pattern, except the element is +skipped unless it meets a criterion. + +The pattern has three parts: + + * If a list be empty, return `nil'. + + * Else, if the beginning of the list (the CAR of the list) passes + a test + - act on that element and combine it, using `cons' with + + - a recursive call by the function on the rest (the CDR) of + the list. + + * Otherwise, if the beginning of the list (the CAR of the list) fails + the test + - skip on that element, + + - and, recursively call the function on the rest (the CDR) + of the list. + +Here is an example that uses `cond': + + (defun keep-three-letter-words (word-list) + "Keep three letter words in WORD-LIST." + (cond + ;; First do-again-test: stop-condition + ((not word-list) nil) + + ;; Second do-again-test: when to act + ((eq 3 (length (symbol-name (car word-list)))) + ;; combine acted-on element with recursive call on shorter list + (cons (car word-list) (keep-three-letter-words (cdr word-list)))) + + ;; Third do-again-test: when to skip element; + ;; recursively call shorter list with next-step expression + (t (keep-three-letter-words (cdr word-list))))) + + (keep-three-letter-words '(one two three four five six)) + => (one two six) + +It goes without saying that you need not use `nil' as the test for when +to stop; and you can, of course, combine these patterns. + + +File: eintr, Node: No Deferment, Next: No deferment solution, Prev: Recursive Patterns, Up: Recursion + +11.3.7 Recursion without Deferments +----------------------------------- + +Let's consider again what happens with the `triangle-recursively' +function. We will find that the intermediate calculations are deferred +until all can be done. + +Here is the function definition: + + (defun triangle-recursively (number) + "Return the sum of the numbers 1 through NUMBER inclusive. + Uses recursion." + (if (= number 1) ; do-again-test + 1 ; then-part + (+ number ; else-part + (triangle-recursively ; recursive call + (1- number))))) ; next-step-expression + +What happens when we call this function with a argument of 7? + +The first instance of the `triangle-recursively' function adds the +number 7 to the value returned by a second instance of +`triangle-recursively', an instance that has been passed an argument of +6. That is to say, the first calculation is: + + (+ 7 (triangle-recursively 6)) + +The first instance of `triangle-recursively'--you may want to think of +it as a little robot--cannot complete its job. It must hand off the +calculation for `(triangle-recursively 6)' to a second instance of the +program, to a second robot. This second individual is completely +different from the first one; it is, in the jargon, a `different +instantiation'. Or, put another way, it is a different robot. It is +the same model as the first; it calculates triangle numbers +recursively; but it has a different serial number. + +And what does `(triangle-recursively 6)' return? It returns the number +6 added to the value returned by evaluating `triangle-recursively' with +an argument of 5. Using the robot metaphor, it asks yet another robot +to help it. + +Now the total is: + + (+ 7 6 (triangle-recursively 5)) + +And what happens next? + + (+ 7 6 5 (triangle-recursively 4)) + +Each time `triangle-recursively' is called, except for the last time, +it creates another instance of the program--another robot--and asks it +to make a calculation. + +Eventually, the full addition is set up and performed: + + (+ 7 6 5 4 3 2 1) + +This design for the function defers the calculation of the first step +until the second can be done, and defers that until the third can be +done, and so on. Each deferment means the computer must remember what +is being waited on. This is not a problem when there are only a few +steps, as in this example. But it can be a problem when there are more +steps. + + +File: eintr, Node: No deferment solution, Prev: No Deferment, Up: Recursion + +11.3.8 No Deferment Solution +---------------------------- + +The solution to the problem of deferred operations is to write in a +manner that does not defer operations(1). This requires writing to a +different pattern, often one that involves writing two function +definitions, an `initialization' function and a `helper' function. + +The `initialization' function sets up the job; the `helper' function +does the work. + +Here are the two function definitions for adding up numbers. They are +so simple, I find them hard to understand. + + (defun triangle-initialization (number) + "Return the sum of the numbers 1 through NUMBER inclusive. + This is the `initialization' component of a two function + duo that uses recursion." + (triangle-recursive-helper 0 0 number)) + + (defun triangle-recursive-helper (sum counter number) + "Return SUM, using COUNTER, through NUMBER inclusive. + This is the `helper' component of a two function duo + that uses recursion." + (if (> counter number) + sum + (triangle-recursive-helper (+ sum counter) ; sum + (1+ counter) ; counter + number))) ; number + +Install both function definitions by evaluating them, then call +`triangle-initialization' with 2 rows: + + (triangle-initialization 2) + => 3 + +The `initialization' function calls the first instance of the `helper' +function with three arguments: zero, zero, and a number which is the +number of rows in the triangle. + +The first two arguments passed to the `helper' function are +initialization values. These values are changed when +`triangle-recursive-helper' invokes new instances.(2) + +Let's see what happens when we have a triangle that has one row. (This +triangle will have one pebble in it!) + +`triangle-initialization' will call its helper with the arguments +`0 0 1'. That function will run the conditional test whether `(> +counter number)': + + (> 0 1) + +and find that the result is false, so it will invoke the else-part of +the `if' clause: + + (triangle-recursive-helper + (+ sum counter) ; sum plus counter => sum + (1+ counter) ; increment counter => counter + number) ; number stays the same + +which will first compute: + + (triangle-recursive-helper (+ 0 0) ; sum + (1+ 0) ; counter + 1) ; number +which is: + + (triangle-recursive-helper 0 1 1) + +Again, `(> counter number)' will be false, so again, the Lisp +interpreter will evaluate `triangle-recursive-helper', creating a new +instance with new arguments. + +This new instance will be; + + (triangle-recursive-helper + (+ sum counter) ; sum plus counter => sum + (1+ counter) ; increment counter => counter + number) ; number stays the same + +which is: + + (triangle-recursive-helper 1 2 1) + +In this case, the `(> counter number)' test will be true! So the +instance will return the value of the sum, which will be 1, as expected. + +Now, let's pass `triangle-initialization' an argument of 2, to find out +how many pebbles there are in a triangle with two rows. + +That function calls `(triangle-recursive-helper 0 0 2)'. + +In stages, the instances called will be: + + sum counter number + (triangle-recursive-helper 0 1 2) + + (triangle-recursive-helper 1 2 2) + + (triangle-recursive-helper 3 3 2) + +When the last instance is called, the `(> counter number)' test will be +true, so the instance will return the value of `sum', which will be 3. + +This kind of pattern helps when you are writing functions that can use +many resources in a computer. + +---------- Footnotes ---------- + +(1) The phrase "tail recursive" is used to describe such a process, one +that uses `constant space'. + +(2) The jargon is mildly confusing: `triangle-recursive-helper' uses a +process that is iterative in a procedure that is recursive. The +process is called iterative because the computer need only record the +three values, `sum', `counter', and `number'; the procedure is +recursive because the function `calls itself'. On the other hand, both +the process and the procedure used by `triangle-recursively' are called +recursive. The word `recursive' has different meanings in the two +contexts. + + +File: eintr, Node: Looping exercise, Prev: Recursion, Up: Loops & Recursion + +11.4 Looping Exercise +===================== + + * Write a function similar to `triangle' in which each row has a + value which is the square of the row number. Use a `while' loop. + + * Write a function similar to `triangle' that multiplies instead of + adds the values. + + * Rewrite these two functions recursively. Rewrite these functions + using `cond'. + + * Write a function for Texinfo mode that creates an index entry at + the beginning of a paragraph for every `@dfn' within the paragraph. + (In a Texinfo file, `@dfn' marks a definition. This book is + written in Texinfo.) + + Many of the functions you will need are described in two of the + previous chapters, *Note Cutting and Storing Text: Cutting & + Storing Text, and *Note Yanking Text Back: Yanking. If you use + `forward-paragraph' to put the index entry at the beginning of the + paragraph, you will have to use `C-h f' (`describe-function') to + find out how to make the command go backwards. + + For more information, see *Note Indicating Definitions: + (texinfo)Indicating. + + +File: eintr, Node: Regexp Search, Next: Counting Words, Prev: Loops & Recursion, Up: Top + +12 Regular Expression Searches +****************************** + +Regular expression searches are used extensively in GNU Emacs. The two +functions, `forward-sentence' and `forward-paragraph', illustrate these +searches well. They use regular expressions to find where to move +point. The phrase `regular expression' is often written as `regexp'. + +Regular expression searches are described in *Note Regular Expression +Search: (emacs)Regexp Search, as well as in *Note Regular Expressions: +(elisp)Regular Expressions. In writing this chapter, I am presuming +that you have at least a mild acquaintance with them. The major point +to remember is that regular expressions permit you to search for +patterns as well as for literal strings of characters. For example, +the code in `forward-sentence' searches for the pattern of possible +characters that could mark the end of a sentence, and moves point to +that spot. + +Before looking at the code for the `forward-sentence' function, it is +worth considering what the pattern that marks the end of a sentence +must be. The pattern is discussed in the next section; following that +is a description of the regular expression search function, +`re-search-forward'. The `forward-sentence' function is described in +the section following. Finally, the `forward-paragraph' function is +described in the last section of this chapter. `forward-paragraph' is +a complex function that introduces several new features. + +* Menu: + +* sentence-end:: +* re-search-forward:: +* forward-sentence:: +* forward-paragraph:: +* etags:: +* Regexp Review:: +* re-search Exercises:: + + +File: eintr, Node: sentence-end, Next: re-search-forward, Prev: Regexp Search, Up: Regexp Search + +12.1 The Regular Expression for `sentence-end' +============================================== + +The symbol `sentence-end' is bound to the pattern that marks the end of +a sentence. What should this regular expression be? + +Clearly, a sentence may be ended by a period, a question mark, or an +exclamation mark. Indeed, only clauses that end with one of those three +characters should be considered the end of a sentence. This means that +the pattern should include the character set: + + [.?!] + +However, we do not want `forward-sentence' merely to jump to a period, +a question mark, or an exclamation mark, because such a character might +be used in the middle of a sentence. A period, for example, is used +after abbreviations. So other information is needed. + +According to convention, you type two spaces after every sentence, but +only one space after a period, a question mark, or an exclamation mark +in the body of a sentence. So a period, a question mark, or an +exclamation mark followed by two spaces is a good indicator of an end +of sentence. However, in a file, the two spaces may instead be a tab +or the end of a line. This means that the regular expression should +include these three items as alternatives. + +This group of alternatives will look like this: + + \\($\\| \\| \\) + ^ ^^ + TAB SPC + +Here, `$' indicates the end of the line, and I have pointed out where +the tab and two spaces are inserted in the expression. Both are +inserted by putting the actual characters into the expression. + +Two backslashes, `\\', are required before the parentheses and vertical +bars: the first backslash quotes the following backslash in Emacs; and +the second indicates that the following character, the parenthesis or +the vertical bar, is special. + +Also, a sentence may be followed by one or more carriage returns, like +this: + + [ + ]* + +Like tabs and spaces, a carriage return is inserted into a regular +expression by inserting it literally. The asterisk indicates that the +<RET> is repeated zero or more times. + +But a sentence end does not consist only of a period, a question mark or +an exclamation mark followed by appropriate space: a closing quotation +mark or a closing brace of some kind may precede the space. Indeed more +than one such mark or brace may precede the space. These require a +expression that looks like this: + + []\"')}]* + +In this expression, the first `]' is the first character in the +expression; the second character is `"', which is preceded by a `\' to +tell Emacs the `"' is _not_ special. The last three characters are +`'', `)', and `}'. + +All this suggests what the regular expression pattern for matching the +end of a sentence should be; and, indeed, if we evaluate `sentence-end' +we find that it returns the following value: + + sentence-end + => "[.?!][]\"')}]*\\($\\| \\| \\)[ + ]*" + +(Well, not in GNU Emacs 22; that is because of an effort to make the +process simpler. When its value is `nil', then use the value defined +by the function `sentence-end', and that returns a value constructed +from the variables `sentence-end-base', `sentence-end-double-space', +`sentence-end-without-period', and `sentence-end-without-space'. The +critical variable is `sentence-end-base'; its global value is similar +to the one described above but it also contains two additional +quotation marks. These have differing degrees of curliness. The +`sentence-end-without-period' variable, when true, tells Emacs that a +sentence may end without a period, such as text in Thai.) + + +File: eintr, Node: re-search-forward, Next: forward-sentence, Prev: sentence-end, Up: Regexp Search + +12.2 The `re-search-forward' Function +===================================== + +The `re-search-forward' function is very like the `search-forward' +function. (*Note The `search-forward' Function: search-forward.) + +`re-search-forward' searches for a regular expression. If the search +is successful, it leaves point immediately after the last character in +the target. If the search is backwards, it leaves point just before +the first character in the target. You may tell `re-search-forward' to +return `t' for true. (Moving point is therefore a `side effect'.) + +Like `search-forward', the `re-search-forward' function takes four +arguments: + + 1. The first argument is the regular expression that the function + searches for. The regular expression will be a string between + quotations marks. + + 2. The optional second argument limits how far the function will + search; it is a bound, which is specified as a position in the + buffer. + + 3. The optional third argument specifies how the function responds to + failure: `nil' as the third argument causes the function to signal + an error (and print a message) when the search fails; any other + value causes it to return `nil' if the search fails and `t' if the + search succeeds. + + 4. The optional fourth argument is the repeat count. A negative + repeat count causes `re-search-forward' to search backwards. + +The template for `re-search-forward' looks like this: + + (re-search-forward "REGULAR-EXPRESSION" + LIMIT-OF-SEARCH + WHAT-TO-DO-IF-SEARCH-FAILS + REPEAT-COUNT) + +The second, third, and fourth arguments are optional. However, if you +want to pass a value to either or both of the last two arguments, you +must also pass a value to all the preceding arguments. Otherwise, the +Lisp interpreter will mistake which argument you are passing the value +to. + +In the `forward-sentence' function, the regular expression will be the +value of the variable `sentence-end'. In simple form, that is: + + "[.?!][]\"')}]*\\($\\| \\| \\)[ + ]*" + +The limit of the search will be the end of the paragraph (since a +sentence cannot go beyond a paragraph). If the search fails, the +function will return `nil'; and the repeat count will be provided by +the argument to the `forward-sentence' function. + + +File: eintr, Node: forward-sentence, Next: forward-paragraph, Prev: re-search-forward, Up: Regexp Search + +12.3 `forward-sentence' +======================= + +The command to move the cursor forward a sentence is a straightforward +illustration of how to use regular expression searches in Emacs Lisp. +Indeed, the function looks longer and more complicated than it is; this +is because the function is designed to go backwards as well as forwards; +and, optionally, over more than one sentence. The function is usually +bound to the key command `M-e'. + +* Menu: + +* Complete forward-sentence:: +* fwd-sentence while loops:: +* fwd-sentence re-search:: + + +File: eintr, Node: Complete forward-sentence, Next: fwd-sentence while loops, Prev: forward-sentence, Up: forward-sentence + +Complete `forward-sentence' function definition +----------------------------------------------- + +Here is the code for `forward-sentence': + + (defun forward-sentence (&optional arg) + "Move forward to next `sentence-end'. With argument, repeat. + With negative argument, move backward repeatedly to `sentence-beginning'. + + The variable `sentence-end' is a regular expression that matches ends of + sentences. Also, every paragraph boundary terminates sentences as well." + (interactive "p") + (or arg (setq arg 1)) + (let ((opoint (point)) + (sentence-end (sentence-end))) + (while (< arg 0) + (let ((pos (point)) + (par-beg (save-excursion (start-of-paragraph-text) (point)))) + (if (and (re-search-backward sentence-end par-beg t) + (or (< (match-end 0) pos) + (re-search-backward sentence-end par-beg t))) + (goto-char (match-end 0)) + (goto-char par-beg))) + (setq arg (1+ arg))) + (while (> arg 0) + (let ((par-end (save-excursion (end-of-paragraph-text) (point)))) + (if (re-search-forward sentence-end par-end t) + (skip-chars-backward " \t\n") + (goto-char par-end))) + (setq arg (1- arg))) + (constrain-to-field nil opoint t))) + +The function looks long at first sight and it is best to look at its +skeleton first, and then its muscle. The way to see the skeleton is to +look at the expressions that start in the left-most columns: + + (defun forward-sentence (&optional arg) + "DOCUMENTATION..." + (interactive "p") + (or arg (setq arg 1)) + (let ((opoint (point)) (sentence-end (sentence-end))) + (while (< arg 0) + (let ((pos (point)) + (par-beg (save-excursion (start-of-paragraph-text) (point)))) + REST-OF-BODY-OF-WHILE-LOOP-WHEN-GOING-BACKWARDS + (while (> arg 0) + (let ((par-end (save-excursion (end-of-paragraph-text) (point)))) + REST-OF-BODY-OF-WHILE-LOOP-WHEN-GOING-FORWARDS + HANDLE-FORMS-AND-EQUIVALENT + +This looks much simpler! The function definition consists of +documentation, an `interactive' expression, an `or' expression, a `let' +expression, and `while' loops. + +Let's look at each of these parts in turn. + +We note that the documentation is thorough and understandable. + +The function has an `interactive "p"' declaration. This means that the +processed prefix argument, if any, is passed to the function as its +argument. (This will be a number.) If the function is not passed an +argument (it is optional) then the argument `arg' will be bound to 1. + +When `forward-sentence' is called non-interactively without an +argument, `arg' is bound to `nil'. The `or' expression handles this. +What it does is either leave the value of `arg' as it is, but only if +`arg' is bound to a value; or it sets the value of `arg' to 1, in the +case when `arg' is bound to `nil'. + +Next is a `let'. That specifies the values of two local variables, +`point' and `sentence-end'. The local value of point, from before the +search, is used in the `constrain-to-field' function which handles +forms and equivalents. The `sentence-end' variable is set by the +`sentence-end' function. + + +File: eintr, Node: fwd-sentence while loops, Next: fwd-sentence re-search, Prev: Complete forward-sentence, Up: forward-sentence + +The `while' loops +----------------- + +Two `while' loops follow. The first `while' has a true-or-false-test +that tests true if the prefix argument for `forward-sentence' is a +negative number. This is for going backwards. The body of this loop +is similar to the body of the second `while' clause, but it is not +exactly the same. We will skip this `while' loop and concentrate on +the second `while' loop. + +The second `while' loop is for moving point forward. Its skeleton +looks like this: + + (while (> arg 0) ; true-or-false-test + (let VARLIST + (if (TRUE-OR-FALSE-TEST) + THEN-PART + ELSE-PART + (setq arg (1- arg)))) ; `while' loop decrementer + +The `while' loop is of the decrementing kind. (*Note A Loop with a +Decrementing Counter: Decrementing Loop.) It has a true-or-false-test +that tests true so long as the counter (in this case, the variable +`arg') is greater than zero; and it has a decrementer that subtracts 1 +from the value of the counter every time the loop repeats. + +If no prefix argument is given to `forward-sentence', which is the most +common way the command is used, this `while' loop will run once, since +the value of `arg' will be 1. + +The body of the `while' loop consists of a `let' expression, which +creates and binds a local variable, and has, as its body, an `if' +expression. + +The body of the `while' loop looks like this: + + (let ((par-end + (save-excursion (end-of-paragraph-text) (point)))) + (if (re-search-forward sentence-end par-end t) + (skip-chars-backward " \t\n") + (goto-char par-end))) + +The `let' expression creates and binds the local variable `par-end'. +As we shall see, this local variable is designed to provide a bound or +limit to the regular expression search. If the search fails to find a +proper sentence ending in the paragraph, it will stop on reaching the +end of the paragraph. + +But first, let us examine how `par-end' is bound to the value of the +end of the paragraph. What happens is that the `let' sets the value of +`par-end' to the value returned when the Lisp interpreter evaluates the +expression + + (save-excursion (end-of-paragraph-text) (point)) + +In this expression, `(end-of-paragraph-text)' moves point to the end of +the paragraph, `(point)' returns the value of point, and then +`save-excursion' restores point to its original position. Thus, the +`let' binds `par-end' to the value returned by the `save-excursion' +expression, which is the position of the end of the paragraph. (The +`(end-of-paragraph-text)' function uses `forward-paragraph', which we +will discuss shortly.) + +Emacs next evaluates the body of the `let', which is an `if' expression +that looks like this: + + (if (re-search-forward sentence-end par-end t) ; if-part + (skip-chars-backward " \t\n") ; then-part + (goto-char par-end))) ; else-part + +The `if' tests whether its first argument is true and if so, evaluates +its then-part; otherwise, the Emacs Lisp interpreter evaluates the +else-part. The true-or-false-test of the `if' expression is the +regular expression search. + +It may seem odd to have what looks like the `real work' of the +`forward-sentence' function buried here, but this is a common way this +kind of operation is carried out in Lisp. + + +File: eintr, Node: fwd-sentence re-search, Prev: fwd-sentence while loops, Up: forward-sentence + +The regular expression search +----------------------------- + +The `re-search-forward' function searches for the end of the sentence, +that is, for the pattern defined by the `sentence-end' regular +expression. If the pattern is found--if the end of the sentence is +found--then the `re-search-forward' function does two things: + + 1. The `re-search-forward' function carries out a side effect, which + is to move point to the end of the occurrence found. + + 2. The `re-search-forward' function returns a value of true. This is + the value received by the `if', and means that the search was + successful. + +The side effect, the movement of point, is completed before the `if' +function is handed the value returned by the successful conclusion of +the search. + +When the `if' function receives the value of true from a successful +call to `re-search-forward', the `if' evaluates the then-part, which is +the expression `(skip-chars-backward " \t\n")'. This expression moves +backwards over any blank spaces, tabs or carriage returns until a +printed character is found and then leaves point after the character. +Since point has already been moved to the end of the pattern that marks +the end of the sentence, this action leaves point right after the +closing printed character of the sentence, which is usually a period. + +On the other hand, if the `re-search-forward' function fails to find a +pattern marking the end of the sentence, the function returns false. +The false then causes the `if' to evaluate its third argument, which is +`(goto-char par-end)': it moves point to the end of the paragraph. + +(And if the text is in a form or equivalent, and point may not move +fully, then the `constrain-to-field' function comes into play.) + +Regular expression searches are exceptionally useful and the pattern +illustrated by `re-search-forward', in which the search is the test of +an `if' expression, is handy. You will see or write code incorporating +this pattern often. + + +File: eintr, Node: forward-paragraph, Next: etags, Prev: forward-sentence, Up: Regexp Search + +12.4 `forward-paragraph': a Goldmine of Functions +================================================= + +The `forward-paragraph' function moves point forward to the end of the +paragraph. It is usually bound to `M-}' and makes use of a number of +functions that are important in themselves, including `let*', +`match-beginning', and `looking-at'. + +The function definition for `forward-paragraph' is considerably longer +than the function definition for `forward-sentence' because it works +with a paragraph, each line of which may begin with a fill prefix. + +A fill prefix consists of a string of characters that are repeated at +the beginning of each line. For example, in Lisp code, it is a +convention to start each line of a paragraph-long comment with `;;; '. +In Text mode, four blank spaces make up another common fill prefix, +creating an indented paragraph. (*Note Fill Prefix: (emacs)Fill +Prefix, for more information about fill prefixes.) + +The existence of a fill prefix means that in addition to being able to +find the end of a paragraph whose lines begin on the left-most column, +the `forward-paragraph' function must be able to find the end of a +paragraph when all or many of the lines in the buffer begin with the +fill prefix. + +Moreover, it is sometimes practical to ignore a fill prefix that +exists, especially when blank lines separate paragraphs. This is an +added complication. + +* Menu: + +* forward-paragraph in brief:: +* fwd-para let:: +* fwd-para while:: + + +File: eintr, Node: forward-paragraph in brief, Next: fwd-para let, Prev: forward-paragraph, Up: forward-paragraph + +Shortened `forward-paragraph' function definition +------------------------------------------------- + +Rather than print all of the `forward-paragraph' function, we will only +print parts of it. Read without preparation, the function can be +daunting! + +In outline, the function looks like this: + + (defun forward-paragraph (&optional arg) + "DOCUMENTATION..." + (interactive "p") + (or arg (setq arg 1)) + (let* + VARLIST + (while (and (< arg 0) (not (bobp))) ; backward-moving-code + ... + (while (and (> arg 0) (not (eobp))) ; forward-moving-code + ... + +The first parts of the function are routine: the function's argument +list consists of one optional argument. Documentation follows. + +The lower case `p' in the `interactive' declaration means that the +processed prefix argument, if any, is passed to the function. This +will be a number, and is the repeat count of how many paragraphs point +will move. The `or' expression in the next line handles the common +case when no argument is passed to the function, which occurs if the +function is called from other code rather than interactively. This +case was described earlier. (*Note The `forward-sentence' function: +forward-sentence.) Now we reach the end of the familiar part of this +function. + + +File: eintr, Node: fwd-para let, Next: fwd-para while, Prev: forward-paragraph in brief, Up: forward-paragraph + +The `let*' expression +--------------------- + +The next line of the `forward-paragraph' function begins a `let*' +expression. This is a different than `let'. The symbol is `let*' not +`let'. + +The `let*' special form is like `let' except that Emacs sets each +variable in sequence, one after another, and variables in the latter +part of the varlist can make use of the values to which Emacs set +variables in the earlier part of the varlist. + +(*Note `save-excursion' in `append-to-buffer': append save-excursion.) + +In the `let*' expression in this function, Emacs binds a total of seven +variables: `opoint', `fill-prefix-regexp', `parstart', `parsep', +`sp-parstart', `start', and `found-start'. + +The variable `parsep' appears twice, first, to remove instances of `^', +and second, to handle fill prefixes. + +The variable `opoint' is just the value of `point'. As you can guess, +it is used in a `constrain-to-field' expression, just as in +`forward-sentence'. + +The variable `fill-prefix-regexp' is set to the value returned by +evaluating the following list: + + (and fill-prefix + (not (equal fill-prefix "")) + (not paragraph-ignore-fill-prefix) + (regexp-quote fill-prefix)) + +This is an expression whose first element is the `and' special form. + +As we learned earlier (*note The `kill-new' function: kill-new +function.), the `and' special form evaluates each of its arguments +until one of the arguments returns a value of `nil', in which case the +`and' expression returns `nil'; however, if none of the arguments +returns a value of `nil', the value resulting from evaluating the last +argument is returned. (Since such a value is not `nil', it is +considered true in Lisp.) In other words, an `and' expression returns +a true value only if all its arguments are true. + +In this case, the variable `fill-prefix-regexp' is bound to a non-`nil' +value only if the following four expressions produce a true (i.e., a +non-`nil') value when they are evaluated; otherwise, +`fill-prefix-regexp' is bound to `nil'. + +`fill-prefix' + When this variable is evaluated, the value of the fill prefix, if + any, is returned. If there is no fill prefix, this variable + returns `nil'. + +`(not (equal fill-prefix "")' + This expression checks whether an existing fill prefix is an empty + string, that is, a string with no characters in it. An empty + string is not a useful fill prefix. + +`(not paragraph-ignore-fill-prefix)' + This expression returns `nil' if the variable + `paragraph-ignore-fill-prefix' has been turned on by being set to a + true value such as `t'. + +`(regexp-quote fill-prefix)' + This is the last argument to the `and' special form. If all the + arguments to the `and' are true, the value resulting from + evaluating this expression will be returned by the `and' expression + and bound to the variable `fill-prefix-regexp', + +The result of evaluating this `and' expression successfully is that +`fill-prefix-regexp' will be bound to the value of `fill-prefix' as +modified by the `regexp-quote' function. What `regexp-quote' does is +read a string and return a regular expression that will exactly match +the string and match nothing else. This means that +`fill-prefix-regexp' will be set to a value that will exactly match the +fill prefix if the fill prefix exists. Otherwise, the variable will be +set to `nil'. + +The next two local variables in the `let*' expression are designed to +remove instances of `^' from `parstart' and `parsep', the local +variables indicate the paragraph start and the paragraph separator. +The next expression sets `parsep' again. That is to handle fill +prefixes. + +This is the setting that requires the definition call `let*' rather +than `let'. The true-or-false-test for the `if' depends on whether the +variable `fill-prefix-regexp' evaluates to `nil' or some other value. + +If `fill-prefix-regexp' does not have a value, Emacs evaluates the +else-part of the `if' expression and binds `parsep' to its local value. +(`parsep' is a regular expression that matches what separates +paragraphs.) + +But if `fill-prefix-regexp' does have a value, Emacs evaluates the +then-part of the `if' expression and binds `parsep' to a regular +expression that includes the `fill-prefix-regexp' as part of the +pattern. + +Specifically, `parsep' is set to the original value of the paragraph +separate regular expression concatenated with an alternative expression +that consists of the `fill-prefix-regexp' followed by optional +whitespace to the end of the line. The whitespace is defined by +`"[ \t]*$"'.) The `\\|' defines this portion of the regexp as an +alternative to `parsep'. + +According to a comment in the code, the next local variable, +`sp-parstart', is used for searching, and then the final two, `start' +and `found-start', are set to `nil'. + +Now we get into the body of the `let*'. The first part of the body of +the `let*' deals with the case when the function is given a negative +argument and is therefore moving backwards. We will skip this section. + + +File: eintr, Node: fwd-para while, Prev: fwd-para let, Up: forward-paragraph + +The forward motion `while' loop +------------------------------- + +The second part of the body of the `let*' deals with forward motion. +It is a `while' loop that repeats itself so long as the value of `arg' +is greater than zero. In the most common use of the function, the +value of the argument is 1, so the body of the `while' loop is +evaluated exactly once, and the cursor moves forward one paragraph. + +This part handles three situations: when point is between paragraphs, +when there is a fill prefix and when there is no fill prefix. + +The `while' loop looks like this: + + ;; going forwards and not at the end of the buffer + (while (and (> arg 0) (not (eobp))) + + ;; between paragraphs + ;; Move forward over separator lines... + (while (and (not (eobp)) + (progn (move-to-left-margin) (not (eobp))) + (looking-at parsep)) + (forward-line 1)) + ;; This decrements the loop + (unless (eobp) (setq arg (1- arg))) + ;; ... and one more line. + (forward-line 1) + + (if fill-prefix-regexp + ;; There is a fill prefix; it overrides parstart; + ;; we go forward line by line + (while (and (not (eobp)) + (progn (move-to-left-margin) (not (eobp))) + (not (looking-at parsep)) + (looking-at fill-prefix-regexp)) + (forward-line 1)) + + ;; There is no fill prefix; + ;; we go forward character by character + (while (and (re-search-forward sp-parstart nil 1) + (progn (setq start (match-beginning 0)) + (goto-char start) + (not (eobp))) + (progn (move-to-left-margin) + (not (looking-at parsep))) + (or (not (looking-at parstart)) + (and use-hard-newlines + (not (get-text-property (1- start) 'hard))))) + (forward-char 1)) + + ;; and if there is no fill prefix and if we are not at the end, + ;; go to whatever was found in the regular expression search + ;; for sp-parstart + (if (< (point) (point-max)) + (goto-char start)))) + +We can see that this is a decrementing counter `while' loop, using the +expression `(setq arg (1- arg))' as the decrementer. That expression +is not far from the `while', but is hidden in another Lisp macro, an +`unless' macro. Unless we are at the end of the buffer -- that is what +the `eobp' function determines; it is an abbreviation of `End Of Buffer +P' -- we decrease the value of `arg' by one. + +(If we are at the end of the buffer, we cannot go forward any more and +the next loop of the `while' expression will test false since the test +is an `and' with `(not (eobp))'. The `not' function means exactly as +you expect; it is another name for `null', a function that returns true +when its argument is false.) + +Interestingly, the loop count is not decremented until we leave the +space between paragraphs, unless we come to the end of buffer or stop +seeing the local value of the paragraph separator. + +That second `while' also has a `(move-to-left-margin)' expression. The +function is self-explanatory. It is inside a `progn' expression and +not the last element of its body, so it is only invoked for its side +effect, which is to move point to the left margin of the current line. + +The `looking-at' function is also self-explanatory; it returns true if +the text after point matches the regular expression given as its +argument. + +The rest of the body of the loop looks difficult at first, but makes +sense as you come to understand it. + +First consider what happens if there is a fill prefix: + + (if fill-prefix-regexp + ;; There is a fill prefix; it overrides parstart; + ;; we go forward line by line + (while (and (not (eobp)) + (progn (move-to-left-margin) (not (eobp))) + (not (looking-at parsep)) + (looking-at fill-prefix-regexp)) + (forward-line 1)) + +This expression moves point forward line by line so long as four +conditions are true: + + 1. Point is not at the end of the buffer. + + 2. We can move to the left margin of the text and are not at the end + of the buffer. + + 3. The text following point does not separate paragraphs. + + 4. The pattern following point is the fill prefix regular expression. + +The last condition may be puzzling, until you remember that point was +moved to the beginning of the line early in the `forward-paragraph' +function. This means that if the text has a fill prefix, the +`looking-at' function will see it. + +Consider what happens when there is no fill prefix. + + (while (and (re-search-forward sp-parstart nil 1) + (progn (setq start (match-beginning 0)) + (goto-char start) + (not (eobp))) + (progn (move-to-left-margin) + (not (looking-at parsep))) + (or (not (looking-at parstart)) + (and use-hard-newlines + (not (get-text-property (1- start) 'hard))))) + (forward-char 1)) + +This `while' loop has us searching forward for `sp-parstart', which is +the combination of possible whitespace with a the local value of the +start of a paragraph or of a paragraph separator. (The latter two are +within an expression starting `\(?:' so that they are not referenced by +the `match-beginning' function.) + +The two expressions, + + (setq start (match-beginning 0)) + (goto-char start) + +mean go to the start of the text matched by the regular expression +search. + +The `(match-beginning 0)' expression is new. It returns a number +specifying the location of the start of the text that was matched by +the last search. + +The `match-beginning' function is used here because of a characteristic +of a forward search: a successful forward search, regardless of whether +it is a plain search or a regular expression search, moves point to the +end of the text that is found. In this case, a successful search moves +point to the end of the pattern for `sp-parstart'. + +However, we want to put point at the end of the current paragraph, not +somewhere else. Indeed, since the search possibly includes the +paragraph separator, point may end up at the beginning of the next one +unless we use an expression that includes `match-beginning'. + +When given an argument of 0, `match-beginning' returns the position +that is the start of the text matched by the most recent search. In +this case, the most recent search looks for `sp-parstart'. The +`(match-beginning 0)' expression returns the beginning position of that +pattern, rather than the end position of that pattern. + +(Incidentally, when passed a positive number as an argument, the +`match-beginning' function returns the location of point at that +parenthesized expression in the last search unless that parenthesized +expression begins with `\(?:'. I don't know why `\(?:' appears here +since the argument is 0.) + +The last expression when there is no fill prefix is + + (if (< (point) (point-max)) + (goto-char start)))) + +This says that if there is no fill prefix and if we are not at the end, +point should move to the beginning of whatever was found by the regular +expression search for `sp-parstart'. + +The full definition for the `forward-paragraph' function not only +includes code for going forwards, but also code for going backwards. + +If you are reading this inside of GNU Emacs and you want to see the +whole function, you can type `C-h f' (`describe-function') and the name +of the function. This gives you the function documentation and the +name of the library containing the function's source. Place point over +the name of the library and press the RET key; you will be taken +directly to the source. (Be sure to install your sources! Without +them, you are like a person who tries to drive a car with his eyes +shut!) + + +File: eintr, Node: etags, Next: Regexp Review, Prev: forward-paragraph, Up: Regexp Search + +12.5 Create Your Own `TAGS' File +================================ + +Besides `C-h f' (`describe-function'), another way to see the source of +a function is to type `M-.' (`find-tag') and the name of the function +when prompted for it. This is a good habit to get into. This will +take you directly to the source. If the `find-tag' function first asks +you for the name of a `TAGS' table, give it the name of a `TAGS' file +such as `/usr/local/src/emacs/src/TAGS'. (The exact path to your +`TAGS' file depends on how your copy of Emacs was installed. I just +told you the location that provides both my C and my Emacs Lisp +sources.) + +You can also create your own `TAGS' file for directories that lack one. + +The `M-.' (`find-tag') command takes you directly to the source for a +function, variable, node, or other source. The function depends on +tags tables to tell it where to go. + +You often need to build and install tags tables yourself. They are not +built automatically. A tags table is called a `TAGS' file; the name is +in upper case letters. + +You can create a `TAGS' file by calling the `etags' program that comes +as a part of the Emacs distribution. Usually, `etags' is compiled and +installed when Emacs is built. (`etags' is not an Emacs Lisp function +or a part of Emacs; it is a C program.) + +To create a `TAGS' file, first switch to the directory in which you +want to create the file. In Emacs you can do this with the `M-x cd' +command, or by visiting a file in the directory, or by listing the +directory with `C-x d' (`dired'). Then run the compile command, with +`etags *.el' as the command to execute + + M-x compile RET etags *.el RET + +to create a `TAGS' file. + +For example, if you have a large number of files in your `~/emacs' +directory, as I do--I have 137 `.el' files in it, of which I load +12--you can create a `TAGS' file for the Emacs Lisp files in that +directory. + +The `etags' program takes all the usual shell `wildcards'. For +example, if you have two directories for which you want a single `TAGS +file', type `etags *.el ../elisp/*.el', where `../elisp/' is the second +directory: + + M-x compile RET etags *.el ../elisp/*.el RET + +Type + + M-x compile RET etags --help RET + +to see a list of the options accepted by `etags' as well as a list of +supported languages. + +The `etags' program handles more than 20 languages, including Emacs +Lisp, Common Lisp, Scheme, C, C++, Ada, Fortran, Java, LaTeX, Pascal, +Perl, Python, Texinfo, makefiles, and most assemblers. The program has +no switches for specifying the language; it recognizes the language in +an input file according to its file name and contents. + +`etags' is very helpful when you are writing code yourself and want to +refer back to functions you have already written. Just run `etags' +again at intervals as you write new functions, so they become part of +the `TAGS' file. + +If you think an appropriate `TAGS' file already exists for what you +want, but do not know where it is, you can use the `locate' program to +attempt to find it. + +Type `M-x locate <RET> TAGS <RET>' and Emacs will list for you the full +path names of all your `TAGS' files. On my system, this command lists +34 `TAGS' files. On the other hand, a `plain vanilla' system I +recently installed did not contain any `TAGS' files. + +If the tags table you want has been created, you can use the `M-x +visit-tags-table' command to specify it. Otherwise, you will need to +create the tag table yourself and then use `M-x visit-tags-table'. + +Building Tags in the Emacs sources +.................................. + +The GNU Emacs sources come with a `Makefile' that contains a +sophisticated `etags' command that creates, collects, and merges tags +tables from all over the Emacs sources and puts the information into +one `TAGS' file in the `src/' directory below the top level of your +Emacs source directory. + +To build this `TAGS' file, go to the top level of your Emacs source +directory and run the compile command `make tags': + + M-x compile RET make tags RET + +(The `make tags' command works well with the GNU Emacs sources, as well +as with some other source packages.) + +For more information, see *Note Tag Tables: (emacs)Tags. + + +File: eintr, Node: Regexp Review, Next: re-search Exercises, Prev: etags, Up: Regexp Search + +12.6 Review +=========== + +Here is a brief summary of some recently introduced functions. + +`while' + Repeatedly evaluate the body of the expression so long as the first + element of the body tests true. Then return `nil'. (The + expression is evaluated only for its side effects.) + + For example: + + (let ((foo 2)) + (while (> foo 0) + (insert (format "foo is %d.\n" foo)) + (setq foo (1- foo)))) + + => foo is 2. + foo is 1. + nil + + (The `insert' function inserts its arguments at point; the + `format' function returns a string formatted from its arguments + the way `message' formats its arguments; `\n' produces a new line.) + +`re-search-forward' + Search for a pattern, and if the pattern is found, move point to + rest just after it. + + Takes four arguments, like `search-forward': + + 1. A regular expression that specifies the pattern to search for. + (Remember to put quotation marks around this argument!) + + 2. Optionally, the limit of the search. + + 3. Optionally, what to do if the search fails, return `nil' or an + error message. + + 4. Optionally, how many times to repeat the search; if negative, + the search goes backwards. + +`let*' + Bind some variables locally to particular values, and then + evaluate the remaining arguments, returning the value of the last + one. While binding the local variables, use the local values of + variables bound earlier, if any. + + For example: + + (let* ((foo 7) + (bar (* 3 foo))) + (message "`bar' is %d." bar)) + => `bar' is 21. + +`match-beginning' + Return the position of the start of the text found by the last + regular expression search. + +`looking-at' + Return `t' for true if the text after point matches the argument, + which should be a regular expression. + +`eobp' + Return `t' for true if point is at the end of the accessible part + of a buffer. The end of the accessible part is the end of the + buffer if the buffer is not narrowed; it is the end of the + narrowed part if the buffer is narrowed. + + +File: eintr, Node: re-search Exercises, Prev: Regexp Review, Up: Regexp Search + +12.7 Exercises with `re-search-forward' +======================================= + + * Write a function to search for a regular expression that matches + two or more blank lines in sequence. + + * Write a function to search for duplicated words, such as `the the'. + *Note Syntax of Regular Expressions: (emacs)Regexps, for + information on how to write a regexp (a regular expression) to + match a string that is composed of two identical halves. You can + devise several regexps; some are better than others. The function + I use is described in an appendix, along with several regexps. + *Note `the-the' Duplicated Words Function: the-the. + + +File: eintr, Node: Counting Words, Next: Words in a defun, Prev: Regexp Search, Up: Top + +13 Counting: Repetition and Regexps +*********************************** + +Repetition and regular expression searches are powerful tools that you +often use when you write code in Emacs Lisp. This chapter illustrates +the use of regular expression searches through the construction of word +count commands using `while' loops and recursion. + +* Menu: + +* Why Count Words:: +* count-words-region:: +* recursive-count-words:: +* Counting Exercise:: + + +File: eintr, Node: Why Count Words, Next: count-words-region, Prev: Counting Words, Up: Counting Words + +Counting words +============== + +The standard Emacs distribution contains a function for counting the +number of lines within a region. However, there is no corresponding +function for counting words. + +Certain types of writing ask you to count words. Thus, if you write an +essay, you may be limited to 800 words; if you write a novel, you may +discipline yourself to write 1000 words a day. It seems odd to me that +Emacs lacks a word count command. Perhaps people use Emacs mostly for +code or types of documentation that do not require word counts; or +perhaps they restrict themselves to the operating system word count +command, `wc'. Alternatively, people may follow the publishers' +convention and compute a word count by dividing the number of +characters in a document by five. In any event, here are commands to +count words. + + +File: eintr, Node: count-words-region, Next: recursive-count-words, Prev: Why Count Words, Up: Counting Words + +13.1 The `count-words-region' Function +====================================== + +A word count command could count words in a line, paragraph, region, or +buffer. What should the command cover? You could design the command +to count the number of words in a complete buffer. However, the Emacs +tradition encourages flexibility--you may want to count words in just a +section, rather than all of a buffer. So it makes more sense to design +the command to count the number of words in a region. Once you have a +`count-words-region' command, you can, if you wish, count words in a +whole buffer by marking it with `C-x h' (`mark-whole-buffer'). + +Clearly, counting words is a repetitive act: starting from the +beginning of the region, you count the first word, then the second +word, then the third word, and so on, until you reach the end of the +region. This means that word counting is ideally suited to recursion +or to a `while' loop. + +* Menu: + +* Design count-words-region:: +* Whitespace Bug:: + + +File: eintr, Node: Design count-words-region, Next: Whitespace Bug, Prev: count-words-region, Up: count-words-region + +Designing `count-words-region' +------------------------------ + +First, we will implement the word count command with a `while' loop, +then with recursion. The command will, of course, be interactive. + +The template for an interactive function definition is, as always: + + (defun NAME-OF-FUNCTION (ARGUMENT-LIST) + "DOCUMENTATION..." + (INTERACTIVE-EXPRESSION...) + BODY...) + +What we need to do is fill in the slots. + +The name of the function should be self-explanatory and similar to the +existing `count-lines-region' name. This makes the name easier to +remember. `count-words-region' is a good choice. + +The function counts words within a region. This means that the +argument list must contain symbols that are bound to the two positions, +the beginning and end of the region. These two positions can be called +`beginning' and `end' respectively. The first line of the +documentation should be a single sentence, since that is all that is +printed as documentation by a command such as `apropos'. The +interactive expression will be of the form `(interactive "r")', since +that will cause Emacs to pass the beginning and end of the region to +the function's argument list. All this is routine. + +The body of the function needs to be written to do three tasks: first, +to set up conditions under which the `while' loop can count words, +second, to run the `while' loop, and third, to send a message to the +user. + +When a user calls `count-words-region', point may be at the beginning +or the end of the region. However, the counting process must start at +the beginning of the region. This means we will want to put point +there if it is not already there. Executing `(goto-char beginning)' +ensures this. Of course, we will want to return point to its expected +position when the function finishes its work. For this reason, the +body must be enclosed in a `save-excursion' expression. + +The central part of the body of the function consists of a `while' loop +in which one expression jumps point forward word by word, and another +expression counts those jumps. The true-or-false-test of the `while' +loop should test true so long as point should jump forward, and false +when point is at the end of the region. + +We could use `(forward-word 1)' as the expression for moving point +forward word by word, but it is easier to see what Emacs identifies as a +`word' if we use a regular expression search. + +A regular expression search that finds the pattern for which it is +searching leaves point after the last character matched. This means +that a succession of successful word searches will move point forward +word by word. + +As a practical matter, we want the regular expression search to jump +over whitespace and punctuation between words as well as over the words +themselves. A regexp that refuses to jump over interword whitespace +would never jump more than one word! This means that the regexp should +include the whitespace and punctuation that follows a word, if any, as +well as the word itself. (A word may end a buffer and not have any +following whitespace or punctuation, so that part of the regexp must be +optional.) + +Thus, what we want for the regexp is a pattern defining one or more +word constituent characters followed, optionally, by one or more +characters that are not word constituents. The regular expression for +this is: + + \w+\W* + +The buffer's syntax table determines which characters are and are not +word constituents. (*Note What Constitutes a Word or Symbol?: Syntax, +for more about syntax. Also, see *Note Syntax: (emacs)Syntax, and +*Note Syntax Tables: (elisp)Syntax Tables.) + +The search expression looks like this: + + (re-search-forward "\\w+\\W*") + +(Note that paired backslashes precede the `w' and `W'. A single +backslash has special meaning to the Emacs Lisp interpreter. It +indicates that the following character is interpreted differently than +usual. For example, the two characters, `\n', stand for `newline', +rather than for a backslash followed by `n'. Two backslashes in a row +stand for an ordinary, `unspecial' backslash, which in this case is +followed by a letter, the combination of which is important to +`re-search-forward'.) + +We need a counter to count how many words there are; this variable must +first be set to 0 and then incremented each time Emacs goes around the +`while' loop. The incrementing expression is simply: + + (setq count (1+ count)) + +Finally, we want to tell the user how many words there are in the +region. The `message' function is intended for presenting this kind of +information to the user. The message has to be phrased so that it +reads properly regardless of how many words there are in the region: we +don't want to say that "there are 1 words in the region". The conflict +between singular and plural is ungrammatical. We can solve this +problem by using a conditional expression that evaluates different +messages depending on the number of words in the region. There are +three possibilities: no words in the region, one word in the region, +and more than one word. This means that the `cond' special form is +appropriate. + +All this leads to the following function definition: + + ;;; First version; has bugs! + (defun count-words-region (beginning end) + "Print number of words in the region. + Words are defined as at least one word-constituent + character followed by at least one character that + is not a word-constituent. The buffer's syntax + table determines which characters these are." + (interactive "r") + (message "Counting words in region ... ") + + ;;; 1. Set up appropriate conditions. + (save-excursion + (goto-char beginning) + (let ((count 0)) + + ;;; 2. Run the while loop. + (while (< (point) end) + (re-search-forward "\\w+\\W*") + (setq count (1+ count))) + + ;;; 3. Send a message to the user. + (cond ((zerop count) + (message + "The region does NOT have any words.")) + ((= 1 count) + (message + "The region has 1 word.")) + (t + (message + "The region has %d words." count)))))) + +As written, the function works, but not in all circumstances. + + +File: eintr, Node: Whitespace Bug, Prev: Design count-words-region, Up: count-words-region + +13.1.1 The Whitespace Bug in `count-words-region' +------------------------------------------------- + +The `count-words-region' command described in the preceding section has +two bugs, or rather, one bug with two manifestations. First, if you +mark a region containing only whitespace in the middle of some text, +the `count-words-region' command tells you that the region contains one +word! Second, if you mark a region containing only whitespace at the +end of the buffer or the accessible portion of a narrowed buffer, the +command displays an error message that looks like this: + + Search failed: "\\w+\\W*" + +If you are reading this in Info in GNU Emacs, you can test for these +bugs yourself. + +First, evaluate the function in the usual manner to install it. Here +is a copy of the definition. Place your cursor after the closing +parenthesis and type `C-x C-e' to install it. + + ;; First version; has bugs! + (defun count-words-region (beginning end) + "Print number of words in the region. + Words are defined as at least one word-constituent character followed + by at least one character that is not a word-constituent. The buffer's + syntax table determines which characters these are." + (interactive "r") + (message "Counting words in region ... ") + + ;;; 1. Set up appropriate conditions. + (save-excursion + (goto-char beginning) + (let ((count 0)) + + ;;; 2. Run the while loop. + (while (< (point) end) + (re-search-forward "\\w+\\W*") + (setq count (1+ count))) + + ;;; 3. Send a message to the user. + (cond ((zerop count) + (message "The region does NOT have any words.")) + ((= 1 count) (message "The region has 1 word.")) + (t (message "The region has %d words." count)))))) + +If you wish, you can also install this keybinding by evaluating it: + + (global-set-key "\C-c=" 'count-words-region) + +To conduct the first test, set mark and point to the beginning and end +of the following line and then type `C-c =' (or `M-x +count-words-region' if you have not bound `C-c ='): + + one two three + +Emacs will tell you, correctly, that the region has three words. + +Repeat the test, but place mark at the beginning of the line and place +point just _before_ the word `one'. Again type the command `C-c =' (or +`M-x count-words-region'). Emacs should tell you that the region has +no words, since it is composed only of the whitespace at the beginning +of the line. But instead Emacs tells you that the region has one word! + +For the third test, copy the sample line to the end of the `*scratch*' +buffer and then type several spaces at the end of the line. Place mark +right after the word `three' and point at the end of line. (The end of +the line will be the end of the buffer.) Type `C-c =' (or `M-x +count-words-region') as you did before. Again, Emacs should tell you +that the region has no words, since it is composed only of the +whitespace at the end of the line. Instead, Emacs displays an error +message saying `Search failed'. + +The two bugs stem from the same problem. + +Consider the first manifestation of the bug, in which the command tells +you that the whitespace at the beginning of the line contains one word. +What happens is this: The `M-x count-words-region' command moves point +to the beginning of the region. The `while' tests whether the value of +point is smaller than the value of `end', which it is. Consequently, +the regular expression search looks for and finds the first word. It +leaves point after the word. `count' is set to one. The `while' loop +repeats; but this time the value of point is larger than the value of +`end', the loop is exited; and the function displays a message saying +the number of words in the region is one. In brief, the regular +expression search looks for and finds the word even though it is outside +the marked region. + +In the second manifestation of the bug, the region is whitespace at the +end of the buffer. Emacs says `Search failed'. What happens is that +the true-or-false-test in the `while' loop tests true, so the search +expression is executed. But since there are no more words in the +buffer, the search fails. + +In both manifestations of the bug, the search extends or attempts to +extend outside of the region. + +The solution is to limit the search to the region--this is a fairly +simple action, but as you may have come to expect, it is not quite as +simple as you might think. + +As we have seen, the `re-search-forward' function takes a search +pattern as its first argument. But in addition to this first, +mandatory argument, it accepts three optional arguments. The optional +second argument bounds the search. The optional third argument, if +`t', causes the function to return `nil' rather than signal an error if +the search fails. The optional fourth argument is a repeat count. (In +Emacs, you can see a function's documentation by typing `C-h f', the +name of the function, and then <RET>.) + +In the `count-words-region' definition, the value of the end of the +region is held by the variable `end' which is passed as an argument to +the function. Thus, we can add `end' as an argument to the regular +expression search expression: + + (re-search-forward "\\w+\\W*" end) + +However, if you make only this change to the `count-words-region' +definition and then test the new version of the definition on a stretch +of whitespace, you will receive an error message saying `Search failed'. + +What happens is this: the search is limited to the region, and fails as +you expect because there are no word-constituent characters in the +region. Since it fails, we receive an error message. But we do not +want to receive an error message in this case; we want to receive the +message that "The region does NOT have any words." + +The solution to this problem is to provide `re-search-forward' with a +third argument of `t', which causes the function to return `nil' rather +than signal an error if the search fails. + +However, if you make this change and try it, you will see the message +"Counting words in region ... " and ... you will keep on seeing that +message ..., until you type `C-g' (`keyboard-quit'). + +Here is what happens: the search is limited to the region, as before, +and it fails because there are no word-constituent characters in the +region, as expected. Consequently, the `re-search-forward' expression +returns `nil'. It does nothing else. In particular, it does not move +point, which it does as a side effect if it finds the search target. +After the `re-search-forward' expression returns `nil', the next +expression in the `while' loop is evaluated. This expression +increments the count. Then the loop repeats. The true-or-false-test +tests true because the value of point is still less than the value of +end, since the `re-search-forward' expression did not move point. ... +and the cycle repeats ... + +The `count-words-region' definition requires yet another modification, +to cause the true-or-false-test of the `while' loop to test false if +the search fails. Put another way, there are two conditions that must +be satisfied in the true-or-false-test before the word count variable +is incremented: point must still be within the region and the search +expression must have found a word to count. + +Since both the first condition and the second condition must be true +together, the two expressions, the region test and the search +expression, can be joined with an `and' special form and embedded in +the `while' loop as the true-or-false-test, like this: + + (and (< (point) end) (re-search-forward "\\w+\\W*" end t)) + +(*Note The `kill-new' function: kill-new function, for information +about `and'.) + +The `re-search-forward' expression returns `t' if the search succeeds +and as a side effect moves point. Consequently, as words are found, +point is moved through the region. When the search expression fails to +find another word, or when point reaches the end of the region, the +true-or-false-test tests false, the `while' loop exits, and the +`count-words-region' function displays one or other of its messages. + +After incorporating these final changes, the `count-words-region' works +without bugs (or at least, without bugs that I have found!). Here is +what it looks like: + + ;;; Final version: `while' + (defun count-words-region (beginning end) + "Print number of words in the region." + (interactive "r") + (message "Counting words in region ... ") + + ;;; 1. Set up appropriate conditions. + (save-excursion + (let ((count 0)) + (goto-char beginning) + + ;;; 2. Run the while loop. + (while (and (< (point) end) + (re-search-forward "\\w+\\W*" end t)) + (setq count (1+ count))) + + ;;; 3. Send a message to the user. + (cond ((zerop count) + (message + "The region does NOT have any words.")) + ((= 1 count) + (message + "The region has 1 word.")) + (t + (message + "The region has %d words." count)))))) + + +File: eintr, Node: recursive-count-words, Next: Counting Exercise, Prev: count-words-region, Up: Counting Words + +13.2 Count Words Recursively +============================ + +You can write the function for counting words recursively as well as +with a `while' loop. Let's see how this is done. + +First, we need to recognize that the `count-words-region' function has +three jobs: it sets up the appropriate conditions for counting to +occur; it counts the words in the region; and it sends a message to the +user telling how many words there are. + +If we write a single recursive function to do everything, we will +receive a message for every recursive call. If the region contains 13 +words, we will receive thirteen messages, one right after the other. +We don't want this! Instead, we must write two functions to do the +job, one of which (the recursive function) will be used inside of the +other. One function will set up the conditions and display the +message; the other will return the word count. + +Let us start with the function that causes the message to be displayed. +We can continue to call this `count-words-region'. + +This is the function that the user will call. It will be interactive. +Indeed, it will be similar to our previous versions of this function, +except that it will call `recursive-count-words' to determine how many +words are in the region. + +We can readily construct a template for this function, based on our +previous versions: + + ;; Recursive version; uses regular expression search + (defun count-words-region (beginning end) + "DOCUMENTATION..." + (INTERACTIVE-EXPRESSION...) + + ;;; 1. Set up appropriate conditions. + (EXPLANATORY MESSAGE) + (SET-UP FUNCTIONS... + + ;;; 2. Count the words. + RECURSIVE CALL + + ;;; 3. Send a message to the user. + MESSAGE PROVIDING WORD COUNT)) + +The definition looks straightforward, except that somehow the count +returned by the recursive call must be passed to the message displaying +the word count. A little thought suggests that this can be done by +making use of a `let' expression: we can bind a variable in the varlist +of a `let' expression to the number of words in the region, as returned +by the recursive call; and then the `cond' expression, using binding, +can display the value to the user. + +Often, one thinks of the binding within a `let' expression as somehow +secondary to the `primary' work of a function. But in this case, what +you might consider the `primary' job of the function, counting words, +is done within the `let' expression. + +Using `let', the function definition looks like this: + + (defun count-words-region (beginning end) + "Print number of words in the region." + (interactive "r") + + ;;; 1. Set up appropriate conditions. + (message "Counting words in region ... ") + (save-excursion + (goto-char beginning) + + ;;; 2. Count the words. + (let ((count (recursive-count-words end))) + + ;;; 3. Send a message to the user. + (cond ((zerop count) + (message + "The region does NOT have any words.")) + ((= 1 count) + (message + "The region has 1 word.")) + (t + (message + "The region has %d words." count)))))) + +Next, we need to write the recursive counting function. + +A recursive function has at least three parts: the `do-again-test', the +`next-step-expression', and the recursive call. + +The do-again-test determines whether the function will or will not be +called again. Since we are counting words in a region and can use a +function that moves point forward for every word, the do-again-test can +check whether point is still within the region. The do-again-test +should find the value of point and determine whether point is before, +at, or after the value of the end of the region. We can use the +`point' function to locate point. Clearly, we must pass the value of +the end of the region to the recursive counting function as an argument. + +In addition, the do-again-test should also test whether the search +finds a word. If it does not, the function should not call itself +again. + +The next-step-expression changes a value so that when the recursive +function is supposed to stop calling itself, it stops. More precisely, +the next-step-expression changes a value so that at the right time, the +do-again-test stops the recursive function from calling itself again. +In this case, the next-step-expression can be the expression that moves +point forward, word by word. + +The third part of a recursive function is the recursive call. + +Somewhere, also, we also need a part that does the `work' of the +function, a part that does the counting. A vital part! + +But already, we have an outline of the recursive counting function: + + (defun recursive-count-words (region-end) + "DOCUMENTATION..." + DO-AGAIN-TEST + NEXT-STEP-EXPRESSION + RECURSIVE CALL) + +Now we need to fill in the slots. Let's start with the simplest cases +first: if point is at or beyond the end of the region, there cannot be +any words in the region, so the function should return zero. Likewise, +if the search fails, there are no words to count, so the function +should return zero. + +On the other hand, if point is within the region and the search +succeeds, the function should call itself again. + +Thus, the do-again-test should look like this: + + (and (< (point) region-end) + (re-search-forward "\\w+\\W*" region-end t)) + +Note that the search expression is part of the do-again-test--the +function returns `t' if its search succeeds and `nil' if it fails. +(*Note The Whitespace Bug in `count-words-region': Whitespace Bug, for +an explanation of how `re-search-forward' works.) + +The do-again-test is the true-or-false test of an `if' clause. +Clearly, if the do-again-test succeeds, the then-part of the `if' +clause should call the function again; but if it fails, the else-part +should return zero since either point is outside the region or the +search failed because there were no words to find. + +But before considering the recursive call, we need to consider the +next-step-expression. What is it? Interestingly, it is the search +part of the do-again-test. + +In addition to returning `t' or `nil' for the do-again-test, +`re-search-forward' moves point forward as a side effect of a +successful search. This is the action that changes the value of point +so that the recursive function stops calling itself when point +completes its movement through the region. Consequently, the +`re-search-forward' expression is the next-step-expression. + +In outline, then, the body of the `recursive-count-words' function +looks like this: + + (if DO-AGAIN-TEST-AND-NEXT-STEP-COMBINED + ;; then + RECURSIVE-CALL-RETURNING-COUNT + ;; else + RETURN-ZERO) + +How to incorporate the mechanism that counts? + +If you are not used to writing recursive functions, a question like +this can be troublesome. But it can and should be approached +systematically. + +We know that the counting mechanism should be associated in some way +with the recursive call. Indeed, since the next-step-expression moves +point forward by one word, and since a recursive call is made for each +word, the counting mechanism must be an expression that adds one to the +value returned by a call to `recursive-count-words'. + +Consider several cases: + + * If there are two words in the region, the function should return a + value resulting from adding one to the value returned when it + counts the first word, plus the number returned when it counts the + remaining words in the region, which in this case is one. + + * If there is one word in the region, the function should return a + value resulting from adding one to the value returned when it + counts that word, plus the number returned when it counts the + remaining words in the region, which in this case is zero. + + * If there are no words in the region, the function should return + zero. + +From the sketch we can see that the else-part of the `if' returns zero +for the case of no words. This means that the then-part of the `if' +must return a value resulting from adding one to the value returned +from a count of the remaining words. + +The expression will look like this, where `1+' is a function that adds +one to its argument. + + (1+ (recursive-count-words region-end)) + +The whole `recursive-count-words' function will then look like this: + + (defun recursive-count-words (region-end) + "DOCUMENTATION..." + + ;;; 1. do-again-test + (if (and (< (point) region-end) + (re-search-forward "\\w+\\W*" region-end t)) + + ;;; 2. then-part: the recursive call + (1+ (recursive-count-words region-end)) + + ;;; 3. else-part + 0)) + +Let's examine how this works: + +If there are no words in the region, the else part of the `if' +expression is evaluated and consequently the function returns zero. + +If there is one word in the region, the value of point is less than the +value of `region-end' and the search succeeds. In this case, the +true-or-false-test of the `if' expression tests true, and the then-part +of the `if' expression is evaluated. The counting expression is +evaluated. This expression returns a value (which will be the value +returned by the whole function) that is the sum of one added to the +value returned by a recursive call. + +Meanwhile, the next-step-expression has caused point to jump over the +first (and in this case only) word in the region. This means that when +`(recursive-count-words region-end)' is evaluated a second time, as a +result of the recursive call, the value of point will be equal to or +greater than the value of region end. So this time, +`recursive-count-words' will return zero. The zero will be added to +one, and the original evaluation of `recursive-count-words' will return +one plus zero, which is one, which is the correct amount. + +Clearly, if there are two words in the region, the first call to +`recursive-count-words' returns one added to the value returned by +calling `recursive-count-words' on a region containing the remaining +word--that is, it adds one to one, producing two, which is the correct +amount. + +Similarly, if there are three words in the region, the first call to +`recursive-count-words' returns one added to the value returned by +calling `recursive-count-words' on a region containing the remaining +two words--and so on and so on. + +With full documentation the two functions look like this: + +The recursive function: + + (defun recursive-count-words (region-end) + "Number of words between point and REGION-END." + + ;;; 1. do-again-test + (if (and (< (point) region-end) + (re-search-forward "\\w+\\W*" region-end t)) + + ;;; 2. then-part: the recursive call + (1+ (recursive-count-words region-end)) + + ;;; 3. else-part + 0)) + +The wrapper: + + ;;; Recursive version + (defun count-words-region (beginning end) + "Print number of words in the region. + + Words are defined as at least one word-constituent + character followed by at least one character that is + not a word-constituent. The buffer's syntax table + determines which characters these are." + (interactive "r") + (message "Counting words in region ... ") + (save-excursion + (goto-char beginning) + (let ((count (recursive-count-words end))) + (cond ((zerop count) + (message + "The region does NOT have any words.")) + ((= 1 count) + (message "The region has 1 word.")) + (t + (message + "The region has %d words." count)))))) + + +File: eintr, Node: Counting Exercise, Prev: recursive-count-words, Up: Counting Words + +13.3 Exercise: Counting Punctuation +=================================== + +Using a `while' loop, write a function to count the number of +punctuation marks in a region--period, comma, semicolon, colon, +exclamation mark, and question mark. Do the same using recursion. + + +File: eintr, Node: Words in a defun, Next: Readying a Graph, Prev: Counting Words, Up: Top + +14 Counting Words in a `defun' +****************************** + +Our next project is to count the number of words in a function +definition. Clearly, this can be done using some variant of +`count-word-region'. *Note Counting Words: Repetition and Regexps: +Counting Words. If we are just going to count the words in one +definition, it is easy enough to mark the definition with the `C-M-h' +(`mark-defun') command, and then call `count-word-region'. + +However, I am more ambitious: I want to count the words and symbols in +every definition in the Emacs sources and then print a graph that shows +how many functions there are of each length: how many contain 40 to 49 +words or symbols, how many contain 50 to 59 words or symbols, and so +on. I have often been curious how long a typical function is, and this +will tell. + +* Menu: + +* Divide and Conquer:: +* Words and Symbols:: +* Syntax:: +* count-words-in-defun:: +* Several defuns:: +* Find a File:: +* lengths-list-file:: +* Several files:: +* Several files recursively:: +* Prepare the data:: + + +File: eintr, Node: Divide and Conquer, Next: Words and Symbols, Prev: Words in a defun, Up: Words in a defun + +Divide and Conquer +================== + +Described in one phrase, the histogram project is daunting; but divided +into numerous small steps, each of which we can take one at a time, the +project becomes less fearsome. Let us consider what the steps must be: + + * First, write a function to count the words in one definition. This + includes the problem of handling symbols as well as words. + + * Second, write a function to list the numbers of words in each + function in a file. This function can use the + `count-words-in-defun' function. + + * Third, write a function to list the numbers of words in each + function in each of several files. This entails automatically + finding the various files, switching to them, and counting the + words in the definitions within them. + + * Fourth, write a function to convert the list of numbers that we + created in step three to a form that will be suitable for printing + as a graph. + + * Fifth, write a function to print the results as a graph. + +This is quite a project! But if we take each step slowly, it will not +be difficult. + + +File: eintr, Node: Words and Symbols, Next: Syntax, Prev: Divide and Conquer, Up: Words in a defun + +14.1 What to Count? +=================== + +When we first start thinking about how to count the words in a function +definition, the first question is (or ought to be) what are we going to +count? When we speak of `words' with respect to a Lisp function +definition, we are actually speaking, in large part, of `symbols'. For +example, the following `multiply-by-seven' function contains the five +symbols `defun', `multiply-by-seven', `number', `*', and `7'. In +addition, in the documentation string, it contains the four words +`Multiply', `NUMBER', `by', and `seven'. The symbol `number' is +repeated, so the definition contains a total of ten words and symbols. + + (defun multiply-by-seven (number) + "Multiply NUMBER by seven." + (* 7 number)) + +However, if we mark the `multiply-by-seven' definition with `C-M-h' +(`mark-defun'), and then call `count-words-region' on it, we will find +that `count-words-region' claims the definition has eleven words, not +ten! Something is wrong! + +The problem is twofold: `count-words-region' does not count the `*' as +a word, and it counts the single symbol, `multiply-by-seven', as +containing three words. The hyphens are treated as if they were +interword spaces rather than intraword connectors: `multiply-by-seven' +is counted as if it were written `multiply by seven'. + +The cause of this confusion is the regular expression search within the +`count-words-region' definition that moves point forward word by word. +In the canonical version of `count-words-region', the regexp is: + + "\\w+\\W*" + +This regular expression is a pattern defining one or more word +constituent characters possibly followed by one or more characters that +are not word constituents. What is meant by `word constituent +characters' brings us to the issue of syntax, which is worth a section +of its own. + + +File: eintr, Node: Syntax, Next: count-words-in-defun, Prev: Words and Symbols, Up: Words in a defun + +14.2 What Constitutes a Word or Symbol? +======================================= + +Emacs treats different characters as belonging to different "syntax +categories". For example, the regular expression, `\\w+', is a pattern +specifying one or more _word constituent_ characters. Word constituent +characters are members of one syntax category. Other syntax categories +include the class of punctuation characters, such as the period and the +comma, and the class of whitespace characters, such as the blank space +and the tab character. (For more information, see *Note Syntax: +(emacs)Syntax, and *Note Syntax Tables: (elisp)Syntax Tables.) + +Syntax tables specify which characters belong to which categories. +Usually, a hyphen is not specified as a `word constituent character'. +Instead, it is specified as being in the `class of characters that are +part of symbol names but not words.' This means that the +`count-words-region' function treats it in the same way it treats an +interword white space, which is why `count-words-region' counts +`multiply-by-seven' as three words. + +There are two ways to cause Emacs to count `multiply-by-seven' as one +symbol: modify the syntax table or modify the regular expression. + +We could redefine a hyphen as a word constituent character by modifying +the syntax table that Emacs keeps for each mode. This action would +serve our purpose, except that a hyphen is merely the most common +character within symbols that is not typically a word constituent +character; there are others, too. + +Alternatively, we can redefine the regular expression used in the +`count-words' definition so as to include symbols. This procedure has +the merit of clarity, but the task is a little tricky. + +The first part is simple enough: the pattern must match "at least one +character that is a word or symbol constituent". Thus: + + "\\(\\w\\|\\s_\\)+" + +The `\\(' is the first part of the grouping construct that includes the +`\\w' and the `\\s_' as alternatives, separated by the `\\|'. The +`\\w' matches any word-constituent character and the `\\s_' matches any +character that is part of a symbol name but not a word-constituent +character. The `+' following the group indicates that the word or +symbol constituent characters must be matched at least once. + +However, the second part of the regexp is more difficult to design. +What we want is to follow the first part with "optionally one or more +characters that are not constituents of a word or symbol". At first, I +thought I could define this with the following: + + "\\(\\W\\|\\S_\\)*" + +The upper case `W' and `S' match characters that are _not_ word or +symbol constituents. Unfortunately, this expression matches any +character that is either not a word constituent or not a symbol +constituent. This matches any character! + +I then noticed that every word or symbol in my test region was followed +by white space (blank space, tab, or newline). So I tried placing a +pattern to match one or more blank spaces after the pattern for one or +more word or symbol constituents. This failed, too. Words and symbols +are often separated by whitespace, but in actual code parentheses may +follow symbols and punctuation may follow words. So finally, I +designed a pattern in which the word or symbol constituents are +followed optionally by characters that are not white space and then +followed optionally by white space. + +Here is the full regular expression: + + "\\(\\w\\|\\s_\\)+[^ \t\n]*[ \t\n]*" + + +File: eintr, Node: count-words-in-defun, Next: Several defuns, Prev: Syntax, Up: Words in a defun + +14.3 The `count-words-in-defun' Function +======================================== + +We have seen that there are several ways to write a `count-word-region' +function. To write a `count-words-in-defun', we need merely adapt one +of these versions. + +The version that uses a `while' loop is easy to understand, so I am +going to adapt that. Because `count-words-in-defun' will be part of a +more complex program, it need not be interactive and it need not +display a message but just return the count. These considerations +simplify the definition a little. + +On the other hand, `count-words-in-defun' will be used within a buffer +that contains function definitions. Consequently, it is reasonable to +ask that the function determine whether it is called when point is +within a function definition, and if it is, to return the count for +that definition. This adds complexity to the definition, but saves us +from needing to pass arguments to the function. + +These considerations lead us to prepare the following template: + + (defun count-words-in-defun () + "DOCUMENTATION..." + (SET UP... + (WHILE LOOP...) + RETURN COUNT) + +As usual, our job is to fill in the slots. + +First, the set up. + +We are presuming that this function will be called within a buffer +containing function definitions. Point will either be within a +function definition or not. For `count-words-in-defun' to work, point +must move to the beginning of the definition, a counter must start at +zero, and the counting loop must stop when point reaches the end of the +definition. + +The `beginning-of-defun' function searches backwards for an opening +delimiter such as a `(' at the beginning of a line, and moves point to +that position, or else to the limit of the search. In practice, this +means that `beginning-of-defun' moves point to the beginning of an +enclosing or preceding function definition, or else to the beginning of +the buffer. We can use `beginning-of-defun' to place point where we +wish to start. + +The `while' loop requires a counter to keep track of the words or +symbols being counted. A `let' expression can be used to create a +local variable for this purpose, and bind it to an initial value of +zero. + +The `end-of-defun' function works like `beginning-of-defun' except that +it moves point to the end of the definition. `end-of-defun' can be +used as part of an expression that determines the position of the end +of the definition. + +The set up for `count-words-in-defun' takes shape rapidly: first we +move point to the beginning of the definition, then we create a local +variable to hold the count, and finally, we record the position of the +end of the definition so the `while' loop will know when to stop +looping. + +The code looks like this: + + (beginning-of-defun) + (let ((count 0) + (end (save-excursion (end-of-defun) (point)))) + +The code is simple. The only slight complication is likely to concern +`end': it is bound to the position of the end of the definition by a +`save-excursion' expression that returns the value of point after +`end-of-defun' temporarily moves it to the end of the definition. + +The second part of the `count-words-in-defun', after the set up, is the +`while' loop. + +The loop must contain an expression that jumps point forward word by +word and symbol by symbol, and another expression that counts the +jumps. The true-or-false-test for the `while' loop should test true so +long as point should jump forward, and false when point is at the end +of the definition. We have already redefined the regular expression +for this (*note Syntax::), so the loop is straightforward: + + (while (and (< (point) end) + (re-search-forward + "\\(\\w\\|\\s_\\)+[^ \t\n]*[ \t\n]*" end t) + (setq count (1+ count))) + +The third part of the function definition returns the count of words +and symbols. This part is the last expression within the body of the +`let' expression, and can be, very simply, the local variable `count', +which when evaluated returns the count. + +Put together, the `count-words-in-defun' definition looks like this: + + (defun count-words-in-defun () + "Return the number of words and symbols in a defun." + (beginning-of-defun) + (let ((count 0) + (end (save-excursion (end-of-defun) (point)))) + (while + (and (< (point) end) + (re-search-forward + "\\(\\w\\|\\s_\\)+[^ \t\n]*[ \t\n]*" + end t)) + (setq count (1+ count))) + count)) + +How to test this? The function is not interactive, but it is easy to +put a wrapper around the function to make it interactive; we can use +almost the same code as for the recursive version of +`count-words-region': + + ;;; Interactive version. + (defun count-words-defun () + "Number of words and symbols in a function definition." + (interactive) + (message + "Counting words and symbols in function definition ... ") + (let ((count (count-words-in-defun))) + (cond + ((zerop count) + (message + "The definition does NOT have any words or symbols.")) + ((= 1 count) + (message + "The definition has 1 word or symbol.")) + (t + (message + "The definition has %d words or symbols." count))))) + +Let's re-use `C-c =' as a convenient keybinding: + + (global-set-key "\C-c=" 'count-words-defun) + +Now we can try out `count-words-defun': install both +`count-words-in-defun' and `count-words-defun', and set the keybinding, +and then place the cursor within the following definition: + + (defun multiply-by-seven (number) + "Multiply NUMBER by seven." + (* 7 number)) + => 10 + +Success! The definition has 10 words and symbols. + +The next problem is to count the numbers of words and symbols in +several definitions within a single file. + + +File: eintr, Node: Several defuns, Next: Find a File, Prev: count-words-in-defun, Up: Words in a defun + +14.4 Count Several `defuns' Within a File +========================================= + +A file such as `simple.el' may have a hundred or more function +definitions within it. Our long term goal is to collect statistics on +many files, but as a first step, our immediate goal is to collect +statistics on one file. + +The information will be a series of numbers, each number being the +length of a function definition. We can store the numbers in a list. + +We know that we will want to incorporate the information regarding one +file with information about many other files; this means that the +function for counting definition lengths within one file need only +return the list of lengths. It need not and should not display any +messages. + +The word count commands contain one expression to jump point forward +word by word and another expression to count the jumps. The function +to return the lengths of definitions can be designed to work the same +way, with one expression to jump point forward definition by definition +and another expression to construct the lengths' list. + +This statement of the problem makes it elementary to write the function +definition. Clearly, we will start the count at the beginning of the +file, so the first command will be `(goto-char (point-min))'. Next, we +start the `while' loop; and the true-or-false test of the loop can be a +regular expression search for the next function definition--so long as +the search succeeds, point is moved forward and then the body of the +loop is evaluated. The body needs an expression that constructs the +lengths' list. `cons', the list construction command, can be used to +create the list. That is almost all there is to it. + +Here is what this fragment of code looks like: + + (goto-char (point-min)) + (while (re-search-forward "^(defun" nil t) + (setq lengths-list + (cons (count-words-in-defun) lengths-list))) + +What we have left out is the mechanism for finding the file that +contains the function definitions. + +In previous examples, we either used this, the Info file, or we +switched back and forth to some other buffer, such as the `*scratch*' +buffer. + +Finding a file is a new process that we have not yet discussed. + + +File: eintr, Node: Find a File, Next: lengths-list-file, Prev: Several defuns, Up: Words in a defun + +14.5 Find a File +================ + +To find a file in Emacs, you use the `C-x C-f' (`find-file') command. +This command is almost, but not quite right for the lengths problem. + +Let's look at the source for `find-file': + + (defun find-file (filename) + "Edit file FILENAME. + Switch to a buffer visiting file FILENAME, + creating one if none already exists." + (interactive "FFind file: ") + (switch-to-buffer (find-file-noselect filename))) + +(The most recent version of the `find-file' function definition permits +you to specify optional wildcards visit multiple files; that makes the +definition more complex and we will not discuss it here, since it is +not relevant. You can see its source using either `M-.' (`find-tag') +or `C-h f' (`describe-function').) + +The definition I am showing possesses short but complete documentation +and an interactive specification that prompts you for a file name when +you use the command interactively. The body of the definition contains +two functions, `find-file-noselect' and `switch-to-buffer'. + +According to its documentation as shown by `C-h f' (the +`describe-function' command), the `find-file-noselect' function reads +the named file into a buffer and returns the buffer. (Its most recent +version includes an optional wildcards argument, too, as well as +another to read a file literally and an other you suppress warning +messages. These optional arguments are irrelevant.) + +However, the `find-file-noselect' function does not select the buffer +in which it puts the file. Emacs does not switch its attention (or +yours if you are using `find-file-noselect') to the named buffer. That +is what `switch-to-buffer' does: it switches the buffer to which Emacs +attention is directed; and it switches the buffer displayed in the +window to the new buffer. We have discussed buffer switching +elsewhere. (*Note Switching Buffers::.) + +In this histogram project, we do not need to display each file on the +screen as the program determines the length of each definition within +it. Instead of employing `switch-to-buffer', we can work with +`set-buffer', which redirects the attention of the computer program to +a different buffer but does not redisplay it on the screen. So instead +of calling on `find-file' to do the job, we must write our own +expression. + +The task is easy: use `find-file-noselect' and `set-buffer'. + + +File: eintr, Node: lengths-list-file, Next: Several files, Prev: Find a File, Up: Words in a defun + +14.6 `lengths-list-file' in Detail +================================== + +The core of the `lengths-list-file' function is a `while' loop +containing a function to move point forward `defun by defun' and a +function to count the number of words and symbols in each defun. This +core must be surrounded by functions that do various other tasks, +including finding the file, and ensuring that point starts out at the +beginning of the file. The function definition looks like this: + + (defun lengths-list-file (filename) + "Return list of definitions' lengths within FILE. + The returned list is a list of numbers. + Each number is the number of words or + symbols in one function definition." + (message "Working on `%s' ... " filename) + (save-excursion + (let ((buffer (find-file-noselect filename)) + (lengths-list)) + (set-buffer buffer) + (setq buffer-read-only t) + (widen) + (goto-char (point-min)) + (while (re-search-forward "^(defun" nil t) + (setq lengths-list + (cons (count-words-in-defun) lengths-list))) + (kill-buffer buffer) + lengths-list))) + +The function is passed one argument, the name of the file on which it +will work. It has four lines of documentation, but no interactive +specification. Since people worry that a computer is broken if they +don't see anything going on, the first line of the body is a message. + +The next line contains a `save-excursion' that returns Emacs' attention +to the current buffer when the function completes. This is useful in +case you embed this function in another function that presumes point is +restored to the original buffer. + +In the varlist of the `let' expression, Emacs finds the file and binds +the local variable `buffer' to the buffer containing the file. At the +same time, Emacs creates `lengths-list' as a local variable. + +Next, Emacs switches its attention to the buffer. + +In the following line, Emacs makes the buffer read-only. Ideally, this +line is not necessary. None of the functions for counting words and +symbols in a function definition should change the buffer. Besides, +the buffer is not going to be saved, even if it were changed. This +line is entirely the consequence of great, perhaps excessive, caution. +The reason for the caution is that this function and those it calls +work on the sources for Emacs and it is very inconvenient if they are +inadvertently modified. It goes without saying that I did not realize +a need for this line until an experiment went awry and started to +modify my Emacs source files ... + +Next comes a call to widen the buffer if it is narrowed. This function +is usually not needed--Emacs creates a fresh buffer if none already +exists; but if a buffer visiting the file already exists Emacs returns +that one. In this case, the buffer may be narrowed and must be +widened. If we wanted to be fully `user-friendly', we would arrange to +save the restriction and the location of point, but we won't. + +The `(goto-char (point-min))' expression moves point to the beginning +of the buffer. + +Then comes a `while' loop in which the `work' of the function is +carried out. In the loop, Emacs determines the length of each +definition and constructs a lengths' list containing the information. + +Emacs kills the buffer after working through it. This is to save space +inside of Emacs. My version of GNU Emacs 19 contained over 300 source +files of interest; GNU Emacs 22 contains over a thousand source files. +Another function will apply `lengths-list-file' to each of the files. + +Finally, the last expression within the `let' expression is the +`lengths-list' variable; its value is returned as the value of the +whole function. + +You can try this function by installing it in the usual fashion. Then +place your cursor after the following expression and type `C-x C-e' +(`eval-last-sexp'). + + (lengths-list-file + "/usr/local/share/emacs/22.0.100/lisp/emacs-lisp/debug.el") + +(You may need to change the pathname of the file; the one here is for +GNU Emacs version 22.0.100. To change the expression, copy it to the +`*scratch*' buffer and edit it. + +(Also, to see the full length of the list, rather than a truncated +version, you may have to evaluate the following: + + (custom-set-variables '(eval-expression-print-length nil)) + +(*Note Specifying Variables using `defcustom': defcustom.) Then +evaluate the `lengths-list-file' expression.) + +The lengths' list for `debug.el' takes less than a second to produce +and looks like this in GNU Emacs 22: + + (83 113 105 144 289 22 30 97 48 89 25 52 52 88 28 29 77 49 43 290 232 587) + +(Using my old machine, the version 19 lengths' list for `debug.el' took +seven seconds to produce and looked like this: + + (75 41 80 62 20 45 44 68 45 12 34 235) + +(The newer version of `debug.el' contains more defuns than the earlier +one; and my new machine is much faster than the old one.) + +Note that the length of the last definition in the file is first in the +list. + + +File: eintr, Node: Several files, Next: Several files recursively, Prev: lengths-list-file, Up: Words in a defun + +14.7 Count Words in `defuns' in Different Files +=============================================== + +In the previous section, we created a function that returns a list of +the lengths of each definition in a file. Now, we want to define a +function to return a master list of the lengths of the definitions in a +list of files. + +Working on each of a list of files is a repetitious act, so we can use +either a `while' loop or recursion. + +* Menu: + +* lengths-list-many-files:: +* append:: + + +File: eintr, Node: lengths-list-many-files, Next: append, Prev: Several files, Up: Several files + +Determine the lengths of `defuns' +--------------------------------- + +The design using a `while' loop is routine. The argument passed the +function is a list of files. As we saw earlier (*note Loop Example::), +you can write a `while' loop so that the body of the loop is evaluated +if such a list contains elements, but to exit the loop if the list is +empty. For this design to work, the body of the loop must contain an +expression that shortens the list each time the body is evaluated, so +that eventually the list is empty. The usual technique is to set the +value of the list to the value of the CDR of the list each time the +body is evaluated. + +The template looks like this: + + (while TEST-WHETHER-LIST-IS-EMPTY + BODY... + SET-LIST-TO-CDR-OF-LIST) + +Also, we remember that a `while' loop returns `nil' (the result of +evaluating the true-or-false-test), not the result of any evaluation +within its body. (The evaluations within the body of the loop are done +for their side effects.) However, the expression that sets the +lengths' list is part of the body--and that is the value that we want +returned by the function as a whole. To do this, we enclose the +`while' loop within a `let' expression, and arrange that the last +element of the `let' expression contains the value of the lengths' +list. (*Note Loop Example with an Incrementing Counter: Incrementing +Example.) + +These considerations lead us directly to the function itself: + + ;;; Use `while' loop. + (defun lengths-list-many-files (list-of-files) + "Return list of lengths of defuns in LIST-OF-FILES." + (let (lengths-list) + + ;;; true-or-false-test + (while list-of-files + (setq lengths-list + (append + lengths-list + + ;;; Generate a lengths' list. + (lengths-list-file + (expand-file-name (car list-of-files))))) + + ;;; Make files' list shorter. + (setq list-of-files (cdr list-of-files))) + + ;;; Return final value of lengths' list. + lengths-list)) + +`expand-file-name' is a built-in function that converts a file name to +the absolute, long, path name form of the directory in which the +function is called. + +Thus, if `expand-file-name' is called on `debug.el' when Emacs is +visiting the `/usr/local/share/emacs/22.0.100/lisp/emacs-lisp/' +directory, + + debug.el + +becomes + + /usr/local/share/emacs/22.0.100/lisp/emacs-lisp/debug.el + +The only other new element of this function definition is the as yet +unstudied function `append', which merits a short section for itself. + + +File: eintr, Node: append, Prev: lengths-list-many-files, Up: Several files + +14.7.1 The `append' Function +---------------------------- + +The `append' function attaches one list to another. Thus, + + (append '(1 2 3 4) '(5 6 7 8)) + +produces the list + + (1 2 3 4 5 6 7 8) + +This is exactly how we want to attach two lengths' lists produced by +`lengths-list-file' to each other. The results contrast with `cons', + + (cons '(1 2 3 4) '(5 6 7 8)) + +which constructs a new list in which the first argument to `cons' +becomes the first element of the new list: + + ((1 2 3 4) 5 6 7 8) + + +File: eintr, Node: Several files recursively, Next: Prepare the data, Prev: Several files, Up: Words in a defun + +14.8 Recursively Count Words in Different Files +=============================================== + +Besides a `while' loop, you can work on each of a list of files with +recursion. A recursive version of `lengths-list-many-files' is short +and simple. + +The recursive function has the usual parts: the `do-again-test', the +`next-step-expression', and the recursive call. The `do-again-test' +determines whether the function should call itself again, which it will +do if the `list-of-files' contains any remaining elements; the +`next-step-expression' resets the `list-of-files' to the CDR of itself, +so eventually the list will be empty; and the recursive call calls +itself on the shorter list. The complete function is shorter than this +description! + + (defun recursive-lengths-list-many-files (list-of-files) + "Return list of lengths of each defun in LIST-OF-FILES." + (if list-of-files ; do-again-test + (append + (lengths-list-file + (expand-file-name (car list-of-files))) + (recursive-lengths-list-many-files + (cdr list-of-files))))) + +In a sentence, the function returns the lengths' list for the first of +the `list-of-files' appended to the result of calling itself on the +rest of the `list-of-files'. + +Here is a test of `recursive-lengths-list-many-files', along with the +results of running `lengths-list-file' on each of the files +individually. + +Install `recursive-lengths-list-many-files' and `lengths-list-file', if +necessary, and then evaluate the following expressions. You may need +to change the files' pathnames; those here work when this Info file and +the Emacs sources are located in their customary places. To change the +expressions, copy them to the `*scratch*' buffer, edit them, and then +evaluate them. + +The results are shown after the `=>'. (These results are for files +from Emacs Version 22.0.100; files from other versions of Emacs may +produce different results.) + + (cd "/usr/local/share/emacs/22.0.100/") + + (lengths-list-file "./lisp/macros.el") + => (283 263 480 90) + + (lengths-list-file "./lisp/mail/mailalias.el") + => (38 32 29 95 178 180 321 218 324) + + (lengths-list-file "./lisp/makesum.el") + => (85 181) + + (recursive-lengths-list-many-files + '("./lisp/macros.el" + "./lisp/mail/mailalias.el" + "./lisp/makesum.el")) + => (283 263 480 90 38 32 29 95 178 180 321 218 324 85 181) + +The `recursive-lengths-list-many-files' function produces the output we +want. + +The next step is to prepare the data in the list for display in a graph. + + +File: eintr, Node: Prepare the data, Prev: Several files recursively, Up: Words in a defun + +14.9 Prepare the Data for Display in a Graph +============================================ + +The `recursive-lengths-list-many-files' function returns a list of +numbers. Each number records the length of a function definition. +What we need to do now is transform this data into a list of numbers +suitable for generating a graph. The new list will tell how many +functions definitions contain less than 10 words and symbols, how many +contain between 10 and 19 words and symbols, how many contain between +20 and 29 words and symbols, and so on. + +In brief, we need to go through the lengths' list produced by the +`recursive-lengths-list-many-files' function and count the number of +defuns within each range of lengths, and produce a list of those +numbers. + +Based on what we have done before, we can readily foresee that it +should not be too hard to write a function that `CDRs' down the +lengths' list, looks at each element, determines which length range it +is in, and increments a counter for that range. + +However, before beginning to write such a function, we should consider +the advantages of sorting the lengths' list first, so the numbers are +ordered from smallest to largest. First, sorting will make it easier +to count the numbers in each range, since two adjacent numbers will +either be in the same length range or in adjacent ranges. Second, by +inspecting a sorted list, we can discover the highest and lowest +number, and thereby determine the largest and smallest length range +that we will need. + +* Menu: + +* Sorting:: +* Files List:: +* Counting function definitions:: + + +File: eintr, Node: Sorting, Next: Files List, Prev: Prepare the data, Up: Prepare the data + +14.9.1 Sorting Lists +-------------------- + +Emacs contains a function to sort lists, called (as you might guess) +`sort'. The `sort' function takes two arguments, the list to be +sorted, and a predicate that determines whether the first of two list +elements is "less" than the second. + +As we saw earlier (*note Using the Wrong Type Object as an Argument: +Wrong Type of Argument.), a predicate is a function that determines +whether some property is true or false. The `sort' function will +reorder a list according to whatever property the predicate uses; this +means that `sort' can be used to sort non-numeric lists by non-numeric +criteria--it can, for example, alphabetize a list. + +The `<' function is used when sorting a numeric list. For example, + + (sort '(4 8 21 17 33 7 21 7) '<) + +produces this: + + (4 7 7 8 17 21 21 33) + +(Note that in this example, both the arguments are quoted so that the +symbols are not evaluated before being passed to `sort' as arguments.) + +Sorting the list returned by the `recursive-lengths-list-many-files' +function is straightforward; it uses the `<' function: + + (sort + (recursive-lengths-list-many-files + '("./lisp/macros.el" + "./lisp/mailalias.el" + "./lisp/makesum.el")) + '<) + +which produces: + + (29 32 38 85 90 95 178 180 181 218 263 283 321 324 480) + +(Note that in this example, the first argument to `sort' is not quoted, +since the expression must be evaluated so as to produce the list that +is passed to `sort'.) + + +File: eintr, Node: Files List, Next: Counting function definitions, Prev: Sorting, Up: Prepare the data + +14.9.2 Making a List of Files +----------------------------- + +The `recursive-lengths-list-many-files' function requires a list of +files as its argument. For our test examples, we constructed such a +list by hand; but the Emacs Lisp source directory is too large for us +to do for that. Instead, we will write a function to do the job for +us. In this function, we will use both a `while' loop and a recursive +call. + +We did not have to write a function like this for older versions of GNU +Emacs, since they placed all the `.el' files in one directory. +Instead, we were able to use the `directory-files' function, which +lists the names of files that match a specified pattern within a single +directory. + +However, recent versions of Emacs place Emacs Lisp files in +sub-directories of the top level `lisp' directory. This re-arrangement +eases navigation. For example, all the mail related files are in a +`lisp' sub-directory called `mail'. But at the same time, this +arrangement forces us to create a file listing function that descends +into the sub-directories. + +We can create this function, called `files-in-below-directory', using +familiar functions such as `car', `nthcdr', and `substring' in +conjunction with an existing function called +`directory-files-and-attributes'. This latter function not only lists +all the filenames in a directory, including the names of +sub-directories, but also their attributes. + +To restate our goal: to create a function that will enable us to feed +filenames to `recursive-lengths-list-many-files' as a list that looks +like this (but with more elements): + + ("./lisp/macros.el" + "./lisp/mail/rmail.el" + "./lisp/makesum.el") + +The `directory-files-and-attributes' function returns a list of lists. +Each of the lists within the main list consists of 13 elements. The +first element is a string that contains the name of the file - which, +in GNU/Linux, may be a `directory file', that is to say, a file with +the special attributes of a directory. The second element of the list +is `t' for a directory, a string for symbolic link (the string is the +name linked to), or `nil'. + +For example, the first `.el' file in the `lisp/' directory is +`abbrev.el'. Its name is +`/usr/local/share/emacs/22.0.100/lisp/abbrev.el' and it is not a +directory or a symbolic link. + +This is how `directory-files-and-attributes' lists that file and its +attributes: + + ("abbrev.el" + nil + 1 + 1000 + 100 + (17733 259) + (17491 28834) + (17596 62124) + 13157 + "-rw-rw-r--" + nil + 2971624 + 773) + +On the other hand, `mail/' is a directory within the `lisp/' directory. +The beginning of its listing looks like this: + + ("mail" + t + ... + ) + +(To learn about the different attributes, look at the documentation of +`file-attributes'. Bear in mind that the `file-attributes' function +does not list the filename, so its first element is +`directory-files-and-attributes''s second element.) + +We will want our new function, `files-in-below-directory', to list the +`.el' files in the directory it is told to check, and in any +directories below that directory. + +This gives us a hint on how to construct `files-in-below-directory': +within a directory, the function should add `.el' filenames to a list; +and if, within a directory, the function comes upon a sub-directory, it +should go into that sub-directory and repeat its actions. + +However, we should note that every directory contains a name that +refers to itself, called `.', ("dot") and a name that refers to its +parent directory, called `..' ("double dot"). (In `/', the root +directory, `..' refers to itself, since `/' has no parent.) Clearly, +we do not want our `files-in-below-directory' function to enter those +directories, since they always lead us, directly or indirectly, to the +current directory. + +Consequently, our `files-in-below-directory' function must do several +tasks: + + * Check to see whether it is looking at a filename that ends in + `.el'; and if so, add its name to a list. + + * Check to see whether it is looking at a filename that is the name + of a directory; and if so, + + - Check to see whether it is looking at `.' or `..'; and if so + skip it. + + - Or else, go into that directory and repeat the process. + +Let's write a function definition to do these tasks. We will use a +`while' loop to move from one filename to another within a directory, +checking what needs to be done; and we will use a recursive call to +repeat the actions on each sub-directory. The recursive pattern is +`accumulate' (*note Recursive Pattern: _accumulate_: Accumulate.), +using `append' as the combiner. + +Here is the function: + + (defun files-in-below-directory (directory) + "List the .el files in DIRECTORY and in its sub-directories." + ;; Although the function will be used non-interactively, + ;; it will be easier to test if we make it interactive. + ;; The directory will have a name such as + ;; "/usr/local/share/emacs/22.0.100/lisp/" + (interactive "DDirectory name: ") + (let (el-files-list + (current-directory-list + (directory-files-and-attributes directory t))) + ;; while we are in the current directory + (while current-directory-list + (cond + ;; check to see whether filename ends in `.el' + ;; and if so, append its name to a list. + ((equal ".el" (substring (car (car current-directory-list)) -3)) + (setq el-files-list + (cons (car (car current-directory-list)) el-files-list))) + ;; check whether filename is that of a directory + ((eq t (car (cdr (car current-directory-list)))) + ;; decide whether to skip or recurse + (if + (equal "." + (substring (car (car current-directory-list)) -1)) + ;; then do nothing since filename is that of + ;; current directory or parent, "." or ".." + () + ;; else descend into the directory and repeat the process + (setq el-files-list + (append + (files-in-below-directory + (car (car current-directory-list))) + el-files-list))))) + ;; move to the next filename in the list; this also + ;; shortens the list so the while loop eventually comes to an end + (setq current-directory-list (cdr current-directory-list))) + ;; return the filenames + el-files-list)) + +The `files-in-below-directory' `directory-files' function takes one +argument, the name of a directory. + +Thus, on my system, + + (length + (files-in-below-directory "/usr/local/share/emacs/22.0.100/lisp/")) + +tells me that my Lisp sources directory contains 1031 `.el' files. + +`files-in-below-directory' returns a list in reverse alphabetical +order. An expression to sort the list in alphabetical order looks like +this: + + (sort + (files-in-below-directory "/usr/local/share/emacs/22.0.100/lisp/") + 'string-lessp) + + +File: eintr, Node: Counting function definitions, Prev: Files List, Up: Prepare the data + +14.9.3 Counting function definitions +------------------------------------ + +Our immediate goal is to generate a list that tells us how many +function definitions contain fewer than 10 words and symbols, how many +contain between 10 and 19 words and symbols, how many contain between +20 and 29 words and symbols, and so on. + +With a sorted list of numbers, this is easy: count how many elements of +the list are smaller than 10, then, after moving past the numbers just +counted, count how many are smaller than 20, then, after moving past +the numbers just counted, count how many are smaller than 30, and so +on. Each of the numbers, 10, 20, 30, 40, and the like, is one larger +than the top of that range. We can call the list of such numbers the +`top-of-ranges' list. + +If we wished, we could generate this list automatically, but it is +simpler to write a list manually. Here it is: + + (defvar top-of-ranges + '(10 20 30 40 50 + 60 70 80 90 100 + 110 120 130 140 150 + 160 170 180 190 200 + 210 220 230 240 250 + 260 270 280 290 300) + "List specifying ranges for `defuns-per-range'.") + +To change the ranges, we edit this list. + +Next, we need to write the function that creates the list of the number +of definitions within each range. Clearly, this function must take the +`sorted-lengths' and the `top-of-ranges' lists as arguments. + +The `defuns-per-range' function must do two things again and again: it +must count the number of definitions within a range specified by the +current top-of-range value; and it must shift to the next higher value +in the `top-of-ranges' list after counting the number of definitions in +the current range. Since each of these actions is repetitive, we can +use `while' loops for the job. One loop counts the number of +definitions in the range defined by the current top-of-range value, and +the other loop selects each of the top-of-range values in turn. + +Several entries of the `sorted-lengths' list are counted for each +range; this means that the loop for the `sorted-lengths' list will be +inside the loop for the `top-of-ranges' list, like a small gear inside +a big gear. + +The inner loop counts the number of definitions within the range. It +is a simple counting loop of the type we have seen before. (*Note A +loop with an incrementing counter: Incrementing Loop.) The +true-or-false test of the loop tests whether the value from the +`sorted-lengths' list is smaller than the current value of the top of +the range. If it is, the function increments the counter and tests the +next value from the `sorted-lengths' list. + +The inner loop looks like this: + + (while LENGTH-ELEMENT-SMALLER-THAN-TOP-OF-RANGE + (setq number-within-range (1+ number-within-range)) + (setq sorted-lengths (cdr sorted-lengths))) + +The outer loop must start with the lowest value of the `top-of-ranges' +list, and then be set to each of the succeeding higher values in turn. +This can be done with a loop like this: + + (while top-of-ranges + BODY-OF-LOOP... + (setq top-of-ranges (cdr top-of-ranges))) + +Put together, the two loops look like this: + + (while top-of-ranges + + ;; Count the number of elements within the current range. + (while LENGTH-ELEMENT-SMALLER-THAN-TOP-OF-RANGE + (setq number-within-range (1+ number-within-range)) + (setq sorted-lengths (cdr sorted-lengths))) + + ;; Move to next range. + (setq top-of-ranges (cdr top-of-ranges))) + +In addition, in each circuit of the outer loop, Emacs should record the +number of definitions within that range (the value of +`number-within-range') in a list. We can use `cons' for this purpose. +(*Note `cons': cons.) + +The `cons' function works fine, except that the list it constructs will +contain the number of definitions for the highest range at its +beginning and the number of definitions for the lowest range at its +end. This is because `cons' attaches new elements of the list to the +beginning of the list, and since the two loops are working their way +through the lengths' list from the lower end first, the +`defuns-per-range-list' will end up largest number first. But we will +want to print our graph with smallest values first and the larger +later. The solution is to reverse the order of the +`defuns-per-range-list'. We can do this using the `nreverse' function, +which reverses the order of a list. + +For example, + + (nreverse '(1 2 3 4)) + +produces: + + (4 3 2 1) + +Note that the `nreverse' function is "destructive"--that is, it changes +the list to which it is applied; this contrasts with the `car' and +`cdr' functions, which are non-destructive. In this case, we do not +want the original `defuns-per-range-list', so it does not matter that +it is destroyed. (The `reverse' function provides a reversed copy of a +list, leaving the original list as is.) + +Put all together, the `defuns-per-range' looks like this: + + (defun defuns-per-range (sorted-lengths top-of-ranges) + "SORTED-LENGTHS defuns in each TOP-OF-RANGES range." + (let ((top-of-range (car top-of-ranges)) + (number-within-range 0) + defuns-per-range-list) + + ;; Outer loop. + (while top-of-ranges + + ;; Inner loop. + (while (and + ;; Need number for numeric test. + (car sorted-lengths) + (< (car sorted-lengths) top-of-range)) + + ;; Count number of definitions within current range. + (setq number-within-range (1+ number-within-range)) + (setq sorted-lengths (cdr sorted-lengths))) + + ;; Exit inner loop but remain within outer loop. + + (setq defuns-per-range-list + (cons number-within-range defuns-per-range-list)) + (setq number-within-range 0) ; Reset count to zero. + + ;; Move to next range. + (setq top-of-ranges (cdr top-of-ranges)) + ;; Specify next top of range value. + (setq top-of-range (car top-of-ranges))) + + ;; Exit outer loop and count the number of defuns larger than + ;; the largest top-of-range value. + (setq defuns-per-range-list + (cons + (length sorted-lengths) + defuns-per-range-list)) + + ;; Return a list of the number of definitions within each range, + ;; smallest to largest. + (nreverse defuns-per-range-list))) + +The function is straightforward except for one subtle feature. The +true-or-false test of the inner loop looks like this: + + (and (car sorted-lengths) + (< (car sorted-lengths) top-of-range)) + +instead of like this: + + (< (car sorted-lengths) top-of-range) + +The purpose of the test is to determine whether the first item in the +`sorted-lengths' list is less than the value of the top of the range. + +The simple version of the test works fine unless the `sorted-lengths' +list has a `nil' value. In that case, the `(car sorted-lengths)' +expression function returns `nil'. The `<' function cannot compare a +number to `nil', which is an empty list, so Emacs signals an error and +stops the function from attempting to continue to execute. + +The `sorted-lengths' list always becomes `nil' when the counter reaches +the end of the list. This means that any attempt to use the +`defuns-per-range' function with the simple version of the test will +fail. + +We solve the problem by using the `(car sorted-lengths)' expression in +conjunction with the `and' expression. The `(car sorted-lengths)' +expression returns a non-`nil' value so long as the list has at least +one number within it, but returns `nil' if the list is empty. The +`and' expression first evaluates the `(car sorted-lengths)' expression, +and if it is `nil', returns false _without_ evaluating the `<' +expression. But if the `(car sorted-lengths)' expression returns a +non-`nil' value, the `and' expression evaluates the `<' expression, and +returns that value as the value of the `and' expression. + +This way, we avoid an error. (*Note The `kill-new' function: kill-new +function, for information about `and'.) + +Here is a short test of the `defuns-per-range' function. First, +evaluate the expression that binds (a shortened) `top-of-ranges' list +to the list of values, then evaluate the expression for binding the +`sorted-lengths' list, and then evaluate the `defuns-per-range' +function. + + ;; (Shorter list than we will use later.) + (setq top-of-ranges + '(110 120 130 140 150 + 160 170 180 190 200)) + + (setq sorted-lengths + '(85 86 110 116 122 129 154 176 179 200 265 300 300)) + + (defuns-per-range sorted-lengths top-of-ranges) + +The list returned looks like this: + + (2 2 2 0 0 1 0 2 0 0 4) + +Indeed, there are two elements of the `sorted-lengths' list smaller +than 110, two elements between 110 and 119, two elements between 120 +and 129, and so on. There are four elements with a value of 200 or +larger. + + +File: eintr, Node: Readying a Graph, Next: Emacs Initialization, Prev: Words in a defun, Up: Top + +15 Readying a Graph +******************* + +Our goal is to construct a graph showing the numbers of function +definitions of various lengths in the Emacs lisp sources. + +As a practical matter, if you were creating a graph, you would probably +use a program such as `gnuplot' to do the job. (`gnuplot' is nicely +integrated into GNU Emacs.) In this case, however, we create one from +scratch, and in the process we will re-acquaint ourselves with some of +what we learned before and learn more. + +In this chapter, we will first write a simple graph printing function. +This first definition will be a "prototype", a rapidly written function +that enables us to reconnoiter this unknown graph-making territory. We +will discover dragons, or find that they are myth. After scouting the +terrain, we will feel more confident and enhance the function to label +the axes automatically. + +* Menu: + +* Columns of a graph:: +* graph-body-print:: +* recursive-graph-body-print:: +* Printed Axes:: +* Line Graph Exercise:: + + +File: eintr, Node: Columns of a graph, Next: graph-body-print, Prev: Readying a Graph, Up: Readying a Graph + +Printing the Columns of a Graph +=============================== + +Since Emacs is designed to be flexible and work with all kinds of +terminals, including character-only terminals, the graph will need to +be made from one of the `typewriter' symbols. An asterisk will do; as +we enhance the graph-printing function, we can make the choice of +symbol a user option. + +We can call this function `graph-body-print'; it will take a +`numbers-list' as its only argument. At this stage, we will not label +the graph, but only print its body. + +The `graph-body-print' function inserts a vertical column of asterisks +for each element in the `numbers-list'. The height of each line is +determined by the value of that element of the `numbers-list'. + +Inserting columns is a repetitive act; that means that this function can +be written either with a `while' loop or recursively. + +Our first challenge is to discover how to print a column of asterisks. +Usually, in Emacs, we print characters onto a screen horizontally, line +by line, by typing. We have two routes we can follow: write our own +column-insertion function or discover whether one exists in Emacs. + +To see whether there is one in Emacs, we can use the `M-x apropos' +command. This command is like the `C-h a' (`command-apropos') command, +except that the latter finds only those functions that are commands. +The `M-x apropos' command lists all symbols that match a regular +expression, including functions that are not interactive. + +What we want to look for is some command that prints or inserts +columns. Very likely, the name of the function will contain either the +word `print' or the word `insert' or the word `column'. Therefore, we +can simply type `M-x apropos RET print\|insert\|column RET' and look at +the result. On my system, this command once too takes quite some time, +and then produced a list of 79 functions and variables. Now it does +not take much time at all and produces a list of 211 functions and +variables. Scanning down the list, the only function that looks as if +it might do the job is `insert-rectangle'. + +Indeed, this is the function we want; its documentation says: + + insert-rectangle: + Insert text of RECTANGLE with upper left corner at point. + RECTANGLE's first line is inserted at point, + its second line is inserted at a point vertically under point, etc. + RECTANGLE should be a list of strings. + After this command, the mark is at the upper left corner + and point is at the lower right corner. + +We can run a quick test, to make sure it does what we expect of it. + +Here is the result of placing the cursor after the `insert-rectangle' +expression and typing `C-u C-x C-e' (`eval-last-sexp'). The function +inserts the strings `"first"', `"second"', and `"third"' at and below +point. Also the function returns `nil'. + + (insert-rectangle '("first" "second" "third"))first + second + thirdnil + +Of course, we won't be inserting the text of the `insert-rectangle' +expression itself into the buffer in which we are making the graph, but +will call the function from our program. We shall, however, have to +make sure that point is in the buffer at the place where the +`insert-rectangle' function will insert its column of strings. + +If you are reading this in Info, you can see how this works by +switching to another buffer, such as the `*scratch*' buffer, placing +point somewhere in the buffer, typing `M-:', typing the +`insert-rectangle' expression into the minibuffer at the prompt, and +then typing <RET>. This causes Emacs to evaluate the expression in the +minibuffer, but to use as the value of point the position of point in +the `*scratch*' buffer. (`M-:' is the keybinding for +`eval-expression'. Also, `nil' does not appear in the `*scratch*' +buffer since the expression is evaluated in the minibuffer.) + +We find when we do this that point ends up at the end of the last +inserted line--that is to say, this function moves point as a +side-effect. If we were to repeat the command, with point at this +position, the next insertion would be below and to the right of the +previous insertion. We don't want this! If we are going to make a bar +graph, the columns need to be beside each other. + +So we discover that each cycle of the column-inserting `while' loop +must reposition point to the place we want it, and that place will be +at the top, not the bottom, of the column. Moreover, we remember that +when we print a graph, we do not expect all the columns to be the same +height. This means that the top of each column may be at a different +height from the previous one. We cannot simply reposition point to the +same line each time, but moved over to the right--or perhaps we can... + +We are planning to make the columns of the bar graph out of asterisks. +The number of asterisks in the column is the number specified by the +current element of the `numbers-list'. We need to construct a list of +asterisks of the right length for each call to `insert-rectangle'. If +this list consists solely of the requisite number of asterisks, then we +will have position point the right number of lines above the base for +the graph to print correctly. This could be difficult. + +Alternatively, if we can figure out some way to pass `insert-rectangle' +a list of the same length each time, then we can place point on the +same line each time, but move it over one column to the right for each +new column. If we do this, however, some of the entries in the list +passed to `insert-rectangle' must be blanks rather than asterisks. For +example, if the maximum height of the graph is 5, but the height of the +column is 3, then `insert-rectangle' requires an argument that looks +like this: + + (" " " " "*" "*" "*") + +This last proposal is not so difficult, so long as we can determine the +column height. There are two ways for us to specify the column height: +we can arbitrarily state what it will be, which would work fine for +graphs of that height; or we can search through the list of numbers and +use the maximum height of the list as the maximum height of the graph. +If the latter operation were difficult, then the former procedure would +be easiest, but there is a function built into Emacs that determines +the maximum of its arguments. We can use that function. The function +is called `max' and it returns the largest of all its arguments, which +must be numbers. Thus, for example, + + (max 3 4 6 5 7 3) + +returns 7. (A corresponding function called `min' returns the smallest +of all its arguments.) + +However, we cannot simply call `max' on the `numbers-list'; the `max' +function expects numbers as its argument, not a list of numbers. Thus, +the following expression, + + (max '(3 4 6 5 7 3)) + +produces the following error message; + + Wrong type of argument: number-or-marker-p, (3 4 6 5 7 3) + +We need a function that passes a list of arguments to a function. This +function is `apply'. This function `applies' its first argument (a +function) to its remaining arguments, the last of which may be a list. + +For example, + + (apply 'max 3 4 7 3 '(4 8 5)) + +returns 8. + +(Incidentally, I don't know how you would learn of this function +without a book such as this. It is possible to discover other +functions, like `search-forward' or `insert-rectangle', by guessing at +a part of their names and then using `apropos'. Even though its base +in metaphor is clear--`apply' its first argument to the rest--I doubt a +novice would come up with that particular word when using `apropos' or +other aid. Of course, I could be wrong; after all, the function was +first named by someone who had to invent it.) + +The second and subsequent arguments to `apply' are optional, so we can +use `apply' to call a function and pass the elements of a list to it, +like this, which also returns 8: + + (apply 'max '(4 8 5)) + +This latter way is how we will use `apply'. The +`recursive-lengths-list-many-files' function returns a numbers' list to +which we can apply `max' (we could also apply `max' to the sorted +numbers' list; it does not matter whether the list is sorted or not.) + +Hence, the operation for finding the maximum height of the graph is +this: + + (setq max-graph-height (apply 'max numbers-list)) + +Now we can return to the question of how to create a list of strings +for a column of the graph. Told the maximum height of the graph and +the number of asterisks that should appear in the column, the function +should return a list of strings for the `insert-rectangle' command to +insert. + +Each column is made up of asterisks or blanks. Since the function is +passed the value of the height of the column and the number of +asterisks in the column, the number of blanks can be found by +subtracting the number of asterisks from the height of the column. +Given the number of blanks and the number of asterisks, two `while' +loops can be used to construct the list: + + ;;; First version. + (defun column-of-graph (max-graph-height actual-height) + "Return list of strings that is one column of a graph." + (let ((insert-list nil) + (number-of-top-blanks + (- max-graph-height actual-height))) + + ;; Fill in asterisks. + (while (> actual-height 0) + (setq insert-list (cons "*" insert-list)) + (setq actual-height (1- actual-height))) + + ;; Fill in blanks. + (while (> number-of-top-blanks 0) + (setq insert-list (cons " " insert-list)) + (setq number-of-top-blanks + (1- number-of-top-blanks))) + + ;; Return whole list. + insert-list)) + +If you install this function and then evaluate the following expression +you will see that it returns the list as desired: + + (column-of-graph 5 3) + +returns + + (" " " " "*" "*" "*") + +As written, `column-of-graph' contains a major flaw: the symbols used +for the blank and for the marked entries in the column are `hard-coded' +as a space and asterisk. This is fine for a prototype, but you, or +another user, may wish to use other symbols. For example, in testing +the graph function, you many want to use a period in place of the +space, to make sure the point is being repositioned properly each time +the `insert-rectangle' function is called; or you might want to +substitute a `+' sign or other symbol for the asterisk. You might even +want to make a graph-column that is more than one display column wide. +The program should be more flexible. The way to do that is to replace +the blank and the asterisk with two variables that we can call +`graph-blank' and `graph-symbol' and define those variables separately. + +Also, the documentation is not well written. These considerations lead +us to the second version of the function: + + (defvar graph-symbol "*" + "String used as symbol in graph, usually an asterisk.") + + (defvar graph-blank " " + "String used as blank in graph, usually a blank space. + graph-blank must be the same number of columns wide + as graph-symbol.") + +(For an explanation of `defvar', see *Note Initializing a Variable with +`defvar': defvar.) + + ;;; Second version. + (defun column-of-graph (max-graph-height actual-height) + "Return MAX-GRAPH-HEIGHT strings; ACTUAL-HEIGHT are graph-symbols. + The graph-symbols are contiguous entries at the end + of the list. + The list will be inserted as one column of a graph. + The strings are either graph-blank or graph-symbol." + + (let ((insert-list nil) + (number-of-top-blanks + (- max-graph-height actual-height))) + + ;; Fill in `graph-symbols'. + (while (> actual-height 0) + (setq insert-list (cons graph-symbol insert-list)) + (setq actual-height (1- actual-height))) + + ;; Fill in `graph-blanks'. + (while (> number-of-top-blanks 0) + (setq insert-list (cons graph-blank insert-list)) + (setq number-of-top-blanks + (1- number-of-top-blanks))) + + ;; Return whole list. + insert-list)) + +If we wished, we could rewrite `column-of-graph' a third time to +provide optionally for a line graph as well as for a bar graph. This +would not be hard to do. One way to think of a line graph is that it +is no more than a bar graph in which the part of each bar that is below +the top is blank. To construct a column for a line graph, the function +first constructs a list of blanks that is one shorter than the value, +then it uses `cons' to attach a graph symbol to the list; then it uses +`cons' again to attach the `top blanks' to the list. + +It is easy to see how to write such a function, but since we don't need +it, we will not do it. But the job could be done, and if it were done, +it would be done with `column-of-graph'. Even more important, it is +worth noting that few changes would have to be made anywhere else. The +enhancement, if we ever wish to make it, is simple. + +Now, finally, we come to our first actual graph printing function. +This prints the body of a graph, not the labels for the vertical and +horizontal axes, so we can call this `graph-body-print'. + + +File: eintr, Node: graph-body-print, Next: recursive-graph-body-print, Prev: Columns of a graph, Up: Readying a Graph + +15.1 The `graph-body-print' Function +==================================== + +After our preparation in the preceding section, the `graph-body-print' +function is straightforward. The function will print column after +column of asterisks and blanks, using the elements of a numbers' list +to specify the number of asterisks in each column. This is a +repetitive act, which means we can use a decrementing `while' loop or +recursive function for the job. In this section, we will write the +definition using a `while' loop. + +The `column-of-graph' function requires the height of the graph as an +argument, so we should determine and record that as a local variable. + +This leads us to the following template for the `while' loop version of +this function: + + (defun graph-body-print (numbers-list) + "DOCUMENTATION..." + (let ((height ... + ...)) + + (while numbers-list + INSERT-COLUMNS-AND-REPOSITION-POINT + (setq numbers-list (cdr numbers-list))))) + +We need to fill in the slots of the template. + +Clearly, we can use the `(apply 'max numbers-list)' expression to +determine the height of the graph. + +The `while' loop will cycle through the `numbers-list' one element at a +time. As it is shortened by the `(setq numbers-list (cdr +numbers-list))' expression, the CAR of each instance of the list is the +value of the argument for `column-of-graph'. + +At each cycle of the `while' loop, the `insert-rectangle' function +inserts the list returned by `column-of-graph'. Since the +`insert-rectangle' function moves point to the lower right of the +inserted rectangle, we need to save the location of point at the time +the rectangle is inserted, move back to that position after the +rectangle is inserted, and then move horizontally to the next place +from which `insert-rectangle' is called. + +If the inserted columns are one character wide, as they will be if +single blanks and asterisks are used, the repositioning command is +simply `(forward-char 1)'; however, the width of a column may be +greater than one. This means that the repositioning command should be +written `(forward-char symbol-width)'. The `symbol-width' itself is +the length of a `graph-blank' and can be found using the expression +`(length graph-blank)'. The best place to bind the `symbol-width' +variable to the value of the width of graph column is in the varlist of +the `let' expression. + +These considerations lead to the following function definition: + + (defun graph-body-print (numbers-list) + "Print a bar graph of the NUMBERS-LIST. + The numbers-list consists of the Y-axis values." + + (let ((height (apply 'max numbers-list)) + (symbol-width (length graph-blank)) + from-position) + + (while numbers-list + (setq from-position (point)) + (insert-rectangle + (column-of-graph height (car numbers-list))) + (goto-char from-position) + (forward-char symbol-width) + ;; Draw graph column by column. + (sit-for 0) + (setq numbers-list (cdr numbers-list))) + ;; Place point for X axis labels. + (forward-line height) + (insert "\n") + )) + +The one unexpected expression in this function is the `(sit-for 0)' +expression in the `while' loop. This expression makes the graph +printing operation more interesting to watch than it would be +otherwise. The expression causes Emacs to `sit' or do nothing for a +zero length of time and then redraw the screen. Placed here, it causes +Emacs to redraw the screen column by column. Without it, Emacs would +not redraw the screen until the function exits. + +We can test `graph-body-print' with a short list of numbers. + + 1. Install `graph-symbol', `graph-blank', `column-of-graph', which + are in *Note Columns of a graph::, and `graph-body-print'. + + 2. Copy the following expression: + + (graph-body-print '(1 2 3 4 6 4 3 5 7 6 5 2 3)) + + 3. Switch to the `*scratch*' buffer and place the cursor where you + want the graph to start. + + 4. Type `M-:' (`eval-expression'). + + 5. Yank the `graph-body-print' expression into the minibuffer with + `C-y' (`yank)'. + + 6. Press <RET> to evaluate the `graph-body-print' expression. + +Emacs will print a graph like this: + + * + * ** + * **** + *** **** + ********* * + ************ + ************* + + +File: eintr, Node: recursive-graph-body-print, Next: Printed Axes, Prev: graph-body-print, Up: Readying a Graph + +15.2 The `recursive-graph-body-print' Function +============================================== + +The `graph-body-print' function may also be written recursively. The +recursive solution is divided into two parts: an outside `wrapper' that +uses a `let' expression to determine the values of several variables +that need only be found once, such as the maximum height of the graph, +and an inside function that is called recursively to print the graph. + +The `wrapper' is uncomplicated: + + (defun recursive-graph-body-print (numbers-list) + "Print a bar graph of the NUMBERS-LIST. + The numbers-list consists of the Y-axis values." + (let ((height (apply 'max numbers-list)) + (symbol-width (length graph-blank)) + from-position) + (recursive-graph-body-print-internal + numbers-list + height + symbol-width))) + +The recursive function is a little more difficult. It has four parts: +the `do-again-test', the printing code, the recursive call, and the +`next-step-expression'. The `do-again-test' is an `if' expression that +determines whether the `numbers-list' contains any remaining elements; +if it does, the function prints one column of the graph using the +printing code and calls itself again. The function calls itself again +according to the value produced by the `next-step-expression' which +causes the call to act on a shorter version of the `numbers-list'. + + (defun recursive-graph-body-print-internal + (numbers-list height symbol-width) + "Print a bar graph. + Used within recursive-graph-body-print function." + + (if numbers-list + (progn + (setq from-position (point)) + (insert-rectangle + (column-of-graph height (car numbers-list))) + (goto-char from-position) + (forward-char symbol-width) + (sit-for 0) ; Draw graph column by column. + (recursive-graph-body-print-internal + (cdr numbers-list) height symbol-width)))) + +After installation, this expression can be tested; here is a sample: + + (recursive-graph-body-print '(3 2 5 6 7 5 3 4 6 4 3 2 1)) + +Here is what `recursive-graph-body-print' produces: + + * + ** * + **** * + **** *** + * ********* + ************ + ************* + +Either of these two functions, `graph-body-print' or +`recursive-graph-body-print', create the body of a graph. + + +File: eintr, Node: Printed Axes, Next: Line Graph Exercise, Prev: recursive-graph-body-print, Up: Readying a Graph + +15.3 Need for Printed Axes +========================== + +A graph needs printed axes, so you can orient yourself. For a do-once +project, it may be reasonable to draw the axes by hand using Emacs' +Picture mode; but a graph drawing function may be used more than once. + +For this reason, I have written enhancements to the basic +`print-graph-body' function that automatically print labels for the +horizontal and vertical axes. Since the label printing functions do +not contain much new material, I have placed their description in an +appendix. *Note A Graph with Labelled Axes: Full Graph. + + +File: eintr, Node: Line Graph Exercise, Prev: Printed Axes, Up: Readying a Graph + +15.4 Exercise +============= + +Write a line graph version of the graph printing functions. + + +File: eintr, Node: Emacs Initialization, Next: Debugging, Prev: Readying a Graph, Up: Top + +16 Your `.emacs' File +********************* + +"You don't have to like Emacs to like it" - this seemingly paradoxical +statement is the secret of GNU Emacs. The plain, `out of the box' +Emacs is a generic tool. Most people who use it, customize it to suit +themselves. + +GNU Emacs is mostly written in Emacs Lisp; this means that by writing +expressions in Emacs Lisp you can change or extend Emacs. + +* Menu: + +* Default Configuration:: +* Site-wide Init:: +* defcustom:: +* Beginning a .emacs File:: +* Text and Auto-fill:: +* Mail Aliases:: +* Indent Tabs Mode:: +* Keybindings:: +* Keymaps:: +* Loading Files:: +* Autoload:: +* Simple Extension:: +* X11 Colors:: +* Miscellaneous:: +* Mode Line:: + + +File: eintr, Node: Default Configuration, Next: Site-wide Init, Prev: Emacs Initialization, Up: Emacs Initialization + +Emacs' Default Configuration +============================ + +There are those who appreciate Emacs' default configuration. After +all, Emacs starts you in C mode when you edit a C file, starts you in +Fortran mode when you edit a Fortran file, and starts you in +Fundamental mode when you edit an unadorned file. This all makes +sense, if you do not know who is going to use Emacs. Who knows what a +person hopes to do with an unadorned file? Fundamental mode is the +right default for such a file, just as C mode is the right default for +editing C code. (Enough programming languages have syntaxes that +enable them to share or nearly share features, so C mode is now +provided by by CC mode, the `C Collection'.) + +But when you do know who is going to use Emacs--you, yourself--then it +makes sense to customize Emacs. + +For example, I seldom want Fundamental mode when I edit an otherwise +undistinguished file; I want Text mode. This is why I customize Emacs: +so it suits me. + +You can customize and extend Emacs by writing or adapting a `~/.emacs' +file. This is your personal initialization file; its contents, written +in Emacs Lisp, tell Emacs what to do.(1) + +A `~/.emacs' file contains Emacs Lisp code. You can write this code +yourself; or you can use Emacs' `customize' feature to write the code +for you. You can combine your own expressions and auto-written +Customize expressions in your `.emacs' file. + +(I myself prefer to write my own expressions, except for those, +particularly fonts, that I find easier to manipulate using the +`customize' command. I combine the two methods.) + +Most of this chapter is about writing expressions yourself. It +describes a simple `.emacs' file; for more information, see *Note The +Init File: (emacs)Init File, and *Note The Init File: (elisp)Init File. + +---------- Footnotes ---------- + +(1) You may also add `.el' to `~/.emacs' and call it a `~/.emacs.el' +file. In the past, you were forbidden to type the extra keystrokes +that the name `~/.emacs.el' requires, but now you may. The new format +is consistent with the Emacs Lisp file naming conventions; the old +format saves typing. + + +File: eintr, Node: Site-wide Init, Next: defcustom, Prev: Default Configuration, Up: Emacs Initialization + +16.1 Site-wide Initialization Files +=================================== + +In addition to your personal initialization file, Emacs automatically +loads various site-wide initialization files, if they exist. These +have the same form as your `.emacs' file, but are loaded by everyone. + +Two site-wide initialization files, `site-load.el' and `site-init.el', +are loaded into Emacs and then `dumped' if a `dumped' version of Emacs +is created, as is most common. (Dumped copies of Emacs load more +quickly. However, once a file is loaded and dumped, a change to it +does not lead to a change in Emacs unless you load it yourself or +re-dump Emacs. *Note Building Emacs: (elisp)Building Emacs, and the +`INSTALL' file.) + +Three other site-wide initialization files are loaded automatically +each time you start Emacs, if they exist. These are `site-start.el', +which is loaded _before_ your `.emacs' file, and `default.el', and the +terminal type file, which are both loaded _after_ your `.emacs' file. + +Settings and definitions in your `.emacs' file will overwrite +conflicting settings and definitions in a `site-start.el' file, if it +exists; but the settings and definitions in a `default.el' or terminal +type file will overwrite those in your `.emacs' file. (You can prevent +interference from a terminal type file by setting `term-file-prefix' to +`nil'. *Note A Simple Extension: Simple Extension.) + +The `INSTALL' file that comes in the distribution contains descriptions +of the `site-init.el' and `site-load.el' files. + +The `loadup.el', `startup.el', and `loaddefs.el' files control loading. +These files are in the `lisp' directory of the Emacs distribution and +are worth perusing. + +The `loaddefs.el' file contains a good many suggestions as to what to +put into your own `.emacs' file, or into a site-wide initialization +file. + + +File: eintr, Node: defcustom, Next: Beginning a .emacs File, Prev: Site-wide Init, Up: Emacs Initialization + +16.2 Specifying Variables using `defcustom' +=========================================== + +You can specify variables using `defcustom' so that you and others can +then use Emacs' `customize' feature to set their values. (You cannot +use `customize' to write function definitions; but you can write +`defuns' in your `.emacs' file. Indeed, you can write any Lisp +expression in your `.emacs' file.) + +The `customize' feature depends on the `defcustom' special form. +Although you can use `defvar' or `setq' for variables that users set, +the `defcustom' special form is designed for the job. + +You can use your knowledge of `defvar' for writing the first three +arguments for `defcustom'. The first argument to `defcustom' is the +name of the variable. The second argument is the variable's initial +value, if any; and this value is set only if the value has not already +been set. The third argument is the documentation. + +The fourth and subsequent arguments to `defcustom' specify types and +options; these are not featured in `defvar'. (These arguments are +optional.) + +Each of these arguments consists of a keyword followed by a value. +Each keyword starts with the colon character `:'. + +For example, the customizable user option variable `text-mode-hook' +looks like this: + + (defcustom text-mode-hook nil + "Normal hook run when entering Text mode and many related modes." + :type 'hook + :options '(turn-on-auto-fill flyspell-mode) + :group 'data) + +The name of the variable is `text-mode-hook'; it has no default value; +and its documentation string tells you what it does. + +The `:type' keyword tells Emacs the kind of data to which +`text-mode-hook' should be set and how to display the value in a +Customization buffer. + +The `:options' keyword specifies a suggested list of values for the +variable. Currently, you can use `:options' only for a hook. The list +is only a suggestion; it is not exclusive; a person who sets the +variable may set it to other values; the list shown following the +`:options' keyword is intended to offer convenient choices to a user. + +Finally, the `:group' keyword tells the Emacs Customization command in +which group the variable is located. This tells where to find it. + +For more information, see *Note Writing Customization Definitions: +(elisp)Customization. + +Consider `text-mode-hook' as an example. + +There are two ways to customize this variable. You can use the +customization command or write the appropriate expressions yourself. + +Using the customization command, you can type: + + M-x customize + +and find that the group for editing files of data is called `data'. +Enter that group. Text Mode Hook is the first member. You can click +on its various options, such as `turn-on-auto-fill', to set the values. +After you click on the button to + + Save for Future Sessions + +Emacs will write an expression into your `.emacs' file. It will look +like this: + + (custom-set-variables + ;; custom-set-variables was added by Custom. + ;; If you edit it by hand, you could mess it up, so be careful. + ;; Your init file should contain only one such instance. + ;; If there is more than one, they won't work right. + '(text-mode-hook (quote (turn-on-auto-fill text-mode-hook-identify)))) + +(The `text-mode-hook-identify' function tells +`toggle-text-mode-auto-fill' which buffers are in Text mode. It comes +on automatically. ) + +The `custom-set-variables' function works somewhat differently than a +`setq'. While I have never learned the differences, I modify the +`custom-set-variables' expressions in my `.emacs' file by hand: I make +the changes in what appears to me to be a reasonable manner and have +not had any problems. Others prefer to use the Customization command +and let Emacs do the work for them. + +Another `custom-set-...' function is `custom-set-faces'. This function +sets the various font faces. Over time, I have set a considerable +number of faces. Some of the time, I re-set them using `customize'; +other times, I simply edit the `custom-set-faces' expression in my +`.emacs' file itself. + +The second way to customize your `text-mode-hook' is to set it yourself +in your `.emacs' file using code that has nothing to do with the +`custom-set-...' functions. + +When you do this, and later use `customize', you will see a message +that says + + CHANGED outside Customize; operating on it here may be unreliable. + +This message is only a warning. If you click on the button to + + Save for Future Sessions + +Emacs will write a `custom-set-...' expression near the end of your +`.emacs' file that will be evaluated after your hand-written +expression. It will, therefore, overrule your hand-written expression. +No harm will be done. When you do this, however, be careful to +remember which expression is active; if you forget, you may confuse +yourself. + +So long as you remember where the values are set, you will have no +trouble. In any event, the values are always set in your +initialization file, which is usually called `.emacs'. + +I myself use `customize' for hardly anything. Mostly, I write +expressions myself. + +Incidentally, `defsubst' defines an inline function. The syntax is +just like that of `defun'. `defconst' defines a symbol as a constant. +The intent is that neither programs nor users should ever change a +value set by `defconst' + + +File: eintr, Node: Beginning a .emacs File, Next: Text and Auto-fill, Prev: defcustom, Up: Emacs Initialization + +16.3 Beginning a `.emacs' File +============================== + +When you start Emacs, it loads your `.emacs' file unless you tell it +not to by specifying `-q' on the command line. (The `emacs -q' command +gives you a plain, out-of-the-box Emacs.) + +A `.emacs' file contains Lisp expressions. Often, these are no more +than expressions to set values; sometimes they are function definitions. + +*Note The Init File `~/.emacs': (emacs)Init File, for a short +description of initialization files. + +This chapter goes over some of the same ground, but is a walk among +extracts from a complete, long-used `.emacs' file--my own. + +The first part of the file consists of comments: reminders to myself. +By now, of course, I remember these things, but when I started, I did +not. + + ;;;; Bob's .emacs file + ; Robert J. Chassell + ; 26 September 1985 + +Look at that date! I started this file a long time ago. I have been +adding to it ever since. + + ; Each section in this file is introduced by a + ; line beginning with four semicolons; and each + ; entry is introduced by a line beginning with + ; three semicolons. + +This describes the usual conventions for comments in Emacs Lisp. +Everything on a line that follows a semicolon is a comment. Two, +three, and four semicolons are used as section and subsection markers. +(*Note Comments: (elisp)Comments, for more about comments.) + + ;;;; The Help Key + ; Control-h is the help key; + ; after typing control-h, type a letter to + ; indicate the subject about which you want help. + ; For an explanation of the help facility, + ; type control-h two times in a row. + +Just remember: type `C-h' two times for help. + + ; To find out about any mode, type control-h m + ; while in that mode. For example, to find out + ; about mail mode, enter mail mode and then type + ; control-h m. + +`Mode help', as I call this, is very helpful. Usually, it tells you +all you need to know. + +Of course, you don't need to include comments like these in your +`.emacs' file. I included them in mine because I kept forgetting about +Mode help or the conventions for comments--but I was able to remember +to look here to remind myself. + + +File: eintr, Node: Text and Auto-fill, Next: Mail Aliases, Prev: Beginning a .emacs File, Up: Emacs Initialization + +16.4 Text and Auto Fill Mode +============================ + +Now we come to the part that `turns on' Text mode and Auto Fill mode. + + ;;; Text mode and Auto Fill mode + ; The next two lines put Emacs into Text mode + ; and Auto Fill mode, and are for writers who + ; want to start writing prose rather than code. + + (setq default-major-mode 'text-mode) + (add-hook 'text-mode-hook 'turn-on-auto-fill) + +Here is the first part of this `.emacs' file that does something +besides remind a forgetful human! + +The first of the two lines in parentheses tells Emacs to turn on Text +mode when you find a file, _unless_ that file should go into some other +mode, such as C mode. + +When Emacs reads a file, it looks at the extension to the file name, if +any. (The extension is the part that comes after a `.'.) If the file +ends with a `.c' or `.h' extension then Emacs turns on C mode. Also, +Emacs looks at first nonblank line of the file; if the line says +`-*- C -*-', Emacs turns on C mode. Emacs possesses a list of +extensions and specifications that it uses automatically. In addition, +Emacs looks near the last page for a per-buffer, "local variables +list", if any. + +*Note How Major Modes are Chosen: (emacs)Choosing Modes. + +*Note Local Variables in Files: (emacs)File Variables. + +Now, back to the `.emacs' file. + +Here is the line again; how does it work? + + (setq default-major-mode 'text-mode) + +This line is a short, but complete Emacs Lisp expression. + +We are already familiar with `setq'. It sets the following variable, +`default-major-mode', to the subsequent value, which is `text-mode'. +The single quote mark before `text-mode' tells Emacs to deal directly +with the `text-mode' variable, not with whatever it might stand for. +*Note Setting the Value of a Variable: set & setq, for a reminder of +how `setq' works. The main point is that there is no difference +between the procedure you use to set a value in your `.emacs' file and +the procedure you use anywhere else in Emacs. + +Here is the next line: + + (add-hook 'text-mode-hook 'turn-on-auto-fill) + +In this line, the `add-hook' command adds `turn-on-auto-fill' to the +variable. + +`turn-on-auto-fill' is the name of a program, that, you guessed it!, +turns on Auto Fill mode. + +Every time Emacs turns on Text mode, Emacs runs the commands `hooked' +onto Text mode. So every time Emacs turns on Text mode, Emacs also +turns on Auto Fill mode. + +In brief, the first line causes Emacs to enter Text mode when you edit a +file, unless the file name extension, a first non-blank line, or local +variables to tell Emacs otherwise. + +Text mode among other actions, sets the syntax table to work +conveniently for writers. In Text mode, Emacs considers an apostrophe +as part of a word like a letter; but Emacs does not consider a period +or a space as part of a word. Thus, `M-f' moves you over `it's'. On +the other hand, in C mode, `M-f' stops just after the `t' of `it's'. + +The second line causes Emacs to turn on Auto Fill mode when it turns on +Text mode. In Auto Fill mode, Emacs automatically breaks a line that +is too wide and brings the excessively wide part of the line down to +the next line. Emacs breaks lines between words, not within them. + +When Auto Fill mode is turned off, lines continue to the right as you +type them. Depending on how you set the value of `truncate-lines', the +words you type either disappear off the right side of the screen, or +else are shown, in a rather ugly and unreadable manner, as a +continuation line on the screen. + +In addition, in this part of my `.emacs' file, I tell the Emacs fill +commands to insert two spaces after a colon: + + (setq colon-double-space t) + + +File: eintr, Node: Mail Aliases, Next: Indent Tabs Mode, Prev: Text and Auto-fill, Up: Emacs Initialization + +16.5 Mail Aliases +================= + +Here is a `setq' that `turns on' mail aliases, along with more +reminders. + + ;;; Mail mode + ; To enter mail mode, type `C-x m' + ; To enter RMAIL (for reading mail), + ; type `M-x rmail' + + (setq mail-aliases t) + +This `setq' command sets the value of the variable `mail-aliases' to +`t'. Since `t' means true, the line says, in effect, "Yes, use mail +aliases." + +Mail aliases are convenient short names for long email addresses or for +lists of email addresses. The file where you keep your `aliases' is +`~/.mailrc'. You write an alias like this: + + alias geo george@foobar.wiz.edu + +When you write a message to George, address it to `geo'; the mailer +will automatically expand `geo' to the full address. + + +File: eintr, Node: Indent Tabs Mode, Next: Keybindings, Prev: Mail Aliases, Up: Emacs Initialization + +16.6 Indent Tabs Mode +===================== + +By default, Emacs inserts tabs in place of multiple spaces when it +formats a region. (For example, you might indent many lines of text +all at once with the `indent-region' command.) Tabs look fine on a +terminal or with ordinary printing, but they produce badly indented +output when you use TeX or Texinfo since TeX ignores tabs. + +The following turns off Indent Tabs mode: + + ;;; Prevent Extraneous Tabs + (setq-default indent-tabs-mode nil) + +Note that this line uses `setq-default' rather than the `setq' command +that we have seen before. The `setq-default' command sets values only +in buffers that do not have their own local values for the variable. + +*Note Tabs vs. Spaces: (emacs)Just Spaces. + +*Note Local Variables in Files: (emacs)File Variables. + + +File: eintr, Node: Keybindings, Next: Keymaps, Prev: Indent Tabs Mode, Up: Emacs Initialization + +16.7 Some Keybindings +===================== + +Now for some personal keybindings: + + ;;; Compare windows + (global-set-key "\C-cw" 'compare-windows) + +`compare-windows' is a nifty command that compares the text in your +current window with text in the next window. It makes the comparison +by starting at point in each window, moving over text in each window as +far as they match. I use this command all the time. + +This also shows how to set a key globally, for all modes. + +The command is `global-set-key'. It is followed by the keybinding. In +a `.emacs' file, the keybinding is written as shown: `\C-c' stands for +`control-c', which means `press the control key and the `c' key at the +same time'. The `w' means `press the `w' key'. The keybinding is +surrounded by double quotation marks. In documentation, you would +write this as `C-c w'. (If you were binding a <META> key, such as +`M-c', rather than a <CTRL> key, you would write `\M-c'. *Note +Rebinding Keys in Your Init File: (emacs)Init Rebinding, for details.) + +The command invoked by the keys is `compare-windows'. Note that +`compare-windows' is preceded by a single quote; otherwise, Emacs would +first try to evaluate the symbol to determine its value. + +These three things, the double quotation marks, the backslash before +the `C', and the single quote mark are necessary parts of keybinding +that I tend to forget. Fortunately, I have come to remember that I +should look at my existing `.emacs' file, and adapt what is there. + +As for the keybinding itself: `C-c w'. This combines the prefix key, +`C-c', with a single character, in this case, `w'. This set of keys, +`C-c' followed by a single character, is strictly reserved for +individuals' own use. (I call these `own' keys, since these are for my +own use.) You should always be able to create such a keybinding for +your own use without stomping on someone else's keybinding. If you +ever write an extension to Emacs, please avoid taking any of these keys +for public use. Create a key like `C-c C-w' instead. Otherwise, we +will run out of `own' keys. + +Here is another keybinding, with a comment: + + ;;; Keybinding for `occur' + ; I use occur a lot, so let's bind it to a key: + (global-set-key "\C-co" 'occur) + +The `occur' command shows all the lines in the current buffer that +contain a match for a regular expression. Matching lines are shown in +a buffer called `*Occur*'. That buffer serves as a menu to jump to +occurrences. + +Here is how to unbind a key, so it does not work: + + ;;; Unbind `C-x f' + (global-unset-key "\C-xf") + +There is a reason for this unbinding: I found I inadvertently typed +`C-x f' when I meant to type `C-x C-f'. Rather than find a file, as I +intended, I accidentally set the width for filled text, almost always +to a width I did not want. Since I hardly ever reset my default width, +I simply unbound the key. + +The following rebinds an existing key: + + ;;; Rebind `C-x C-b' for `buffer-menu' + (global-set-key "\C-x\C-b" 'buffer-menu) + +By default, `C-x C-b' runs the `list-buffers' command. This command +lists your buffers in _another_ window. Since I almost always want to +do something in that window, I prefer the `buffer-menu' command, which +not only lists the buffers, but moves point into that window. + + +File: eintr, Node: Keymaps, Next: Loading Files, Prev: Keybindings, Up: Emacs Initialization + +16.8 Keymaps +============ + +Emacs uses "keymaps" to record which keys call which commands. When +you use `global-set-key' to set the keybinding for a single command in +all parts of Emacs, you are specifying the keybinding in +`current-global-map'. + +Specific modes, such as C mode or Text mode, have their own keymaps; +the mode-specific keymaps override the global map that is shared by all +buffers. + +The `global-set-key' function binds, or rebinds, the global keymap. +For example, the following binds the key `C-x C-b' to the function +`buffer-menu': + + (global-set-key "\C-x\C-b" 'buffer-menu) + +Mode-specific keymaps are bound using the `define-key' function, which +takes a specific keymap as an argument, as well as the key and the +command. For example, my `.emacs' file contains the following +expression to bind the `texinfo-insert-@group' command to `C-c C-c g': + + (define-key texinfo-mode-map "\C-c\C-cg" 'texinfo-insert-@group) + +The `texinfo-insert-@group' function itself is a little extension to +Texinfo mode that inserts `@group' into a Texinfo file. I use this +command all the time and prefer to type the three strokes `C-c C-c g' +rather than the six strokes `@ g r o u p'. (`@group' and its matching +`@end group' are commands that keep all enclosed text together on one +page; many multi-line examples in this book are surrounded by `@group +... @end group'.) + +Here is the `texinfo-insert-@group' function definition: + + (defun texinfo-insert-@group () + "Insert the string @group in a Texinfo buffer." + (interactive) + (beginning-of-line) + (insert "@group\n")) + +(Of course, I could have used Abbrev mode to save typing, rather than +write a function to insert a word; but I prefer key strokes consistent +with other Texinfo mode key bindings.) + +You will see numerous `define-key' expressions in `loaddefs.el' as well +as in the various mode libraries, such as `cc-mode.el' and +`lisp-mode.el'. + +*Note Customizing Key Bindings: (emacs)Key Bindings, and *Note Keymaps: +(elisp)Keymaps, for more information about keymaps. + + +File: eintr, Node: Loading Files, Next: Autoload, Prev: Keymaps, Up: Emacs Initialization + +16.9 Loading Files +================== + +Many people in the GNU Emacs community have written extensions to +Emacs. As time goes by, these extensions are often included in new +releases. For example, the Calendar and Diary packages are now part of +the standard GNU Emacs, as is Calc. + +You can use a `load' command to evaluate a complete file and thereby +install all the functions and variables in the file into Emacs. For +example: + + (load "~/emacs/slowsplit") + +This evaluates, i.e. loads, the `slowsplit.el' file or if it exists, +the faster, byte compiled `slowsplit.elc' file from the `emacs' +sub-directory of your home directory. The file contains the function +`split-window-quietly', which John Robinson wrote in 1989. + +The `split-window-quietly' function splits a window with the minimum of +redisplay. I installed it in 1989 because it worked well with the slow +1200 baud terminals I was then using. Nowadays, I only occasionally +come across such a slow connection, but I continue to use the function +because I like the way it leaves the bottom half of a buffer in the +lower of the new windows and the top half in the upper window. + +To replace the key binding for the default `split-window-vertically', +you must also unset that key and bind the keys to +`split-window-quietly', like this: + + (global-unset-key "\C-x2") + (global-set-key "\C-x2" 'split-window-quietly) + +If you load many extensions, as I do, then instead of specifying the +exact location of the extension file, as shown above, you can specify +that directory as part of Emacs' `load-path'. Then, when Emacs loads a +file, it will search that directory as well as its default list of +directories. (The default list is specified in `paths.h' when Emacs is +built.) + +The following command adds your `~/emacs' directory to the existing +load path: + + ;;; Emacs Load Path + (setq load-path (cons "~/emacs" load-path)) + +Incidentally, `load-library' is an interactive interface to the `load' +function. The complete function looks like this: + + (defun load-library (library) + "Load the library named LIBRARY. + This is an interface to the function `load'." + (interactive + (list (completing-read "Load library: " + 'locate-file-completion + (cons load-path (get-load-suffixes))))) + (load library)) + +The name of the function, `load-library', comes from the use of +`library' as a conventional synonym for `file'. The source for the +`load-library' command is in the `files.el' library. + +Another interactive command that does a slightly different job is +`load-file'. *Note Libraries of Lisp Code for Emacs: (emacs)Lisp +Libraries, for information on the distinction between `load-library' +and this command. + + +File: eintr, Node: Autoload, Next: Simple Extension, Prev: Loading Files, Up: Emacs Initialization + +16.10 Autoloading +================= + +Instead of installing a function by loading the file that contains it, +or by evaluating the function definition, you can make the function +available but not actually install it until it is first called. This +is called "autoloading". + +When you execute an autoloaded function, Emacs automatically evaluates +the file that contains the definition, and then calls the function. + +Emacs starts quicker with autoloaded functions, since their libraries +are not loaded right away; but you need to wait a moment when you first +use such a function, while its containing file is evaluated. + +Rarely used functions are frequently autoloaded. The `loaddefs.el' +library contains hundreds of autoloaded functions, from `bookmark-set' +to `wordstar-mode'. Of course, you may come to use a `rare' function +frequently. When you do, you should load that function's file with a +`load' expression in your `.emacs' file. + +In my `.emacs' file for Emacs version 22, I load 14 libraries that +contain functions that would otherwise be autoloaded. (Actually, it +would have been better to include these files in my `dumped' Emacs, but +I forgot. *Note Building Emacs: (elisp)Building Emacs, and the +`INSTALL' file for more about dumping.) + +You may also want to include autoloaded expressions in your `.emacs' +file. `autoload' is a built-in function that takes up to five +arguments, the final three of which are optional. The first argument +is the name of the function to be autoloaded; the second is the name of +the file to be loaded. The third argument is documentation for the +function, and the fourth tells whether the function can be called +interactively. The fifth argument tells what type of +object--`autoload' can handle a keymap or macro as well as a function +(the default is a function). + +Here is a typical example: + + (autoload 'html-helper-mode + "html-helper-mode" "Edit HTML documents" t) + +(`html-helper-mode' is an alternative to `html-mode', which is a +standard part of the distribution). + +This expression autoloads the `html-helper-mode' function. It takes it +from the `html-helper-mode.el' file (or from the byte compiled file +`html-helper-mode.elc', if it exists.) The file must be located in a +directory specified by `load-path'. The documentation says that this +is a mode to help you edit documents written in the HyperText Markup +Language. You can call this mode interactively by typing `M-x +html-helper-mode'. (You need to duplicate the function's regular +documentation in the autoload expression because the regular function +is not yet loaded, so its documentation is not available.) + +*Note Autoload: (elisp)Autoload, for more information. + + +File: eintr, Node: Simple Extension, Next: X11 Colors, Prev: Autoload, Up: Emacs Initialization + +16.11 A Simple Extension: `line-to-top-of-window' +================================================= + +Here is a simple extension to Emacs that moves the line point is on to +the top of the window. I use this all the time, to make text easier to +read. + +You can put the following code into a separate file and then load it +from your `.emacs' file, or you can include it within your `.emacs' +file. + +Here is the definition: + + ;;; Line to top of window; + ;;; replace three keystroke sequence C-u 0 C-l + (defun line-to-top-of-window () + "Move the line point is on to top of window." + (interactive) + (recenter 0)) + +Now for the keybinding. + +Nowadays, function keys as well as mouse button events and non-ASCII +characters are written within square brackets, without quotation marks. +(In Emacs version 18 and before, you had to write different function +key bindings for each different make of terminal.) + +I bind `line-to-top-of-window' to my <F6> function key like this: + + (global-set-key [f6] 'line-to-top-of-window) + +For more information, see *Note Rebinding Keys in Your Init File: +(emacs)Init Rebinding. + +If you run two versions of GNU Emacs, such as versions 21 and 22, and +use one `.emacs' file, you can select which code to evaluate with the +following conditional: + + (cond + ((string-equal (number-to-string 21) (substring (emacs-version) 10 12)) + ;; evaluate version 21 code + ( ... )) + ((string-equal (number-to-string 22) (substring (emacs-version) 10 12)) + ;; evaluate version 22 code + ( ... ))) + +For example, in contrast to version 20, version 21 blinks its cursor by +default. I hate such blinking, as well as some other features in +version 21, so I placed the following in my `.emacs' file(1): + + (if (string-equal "21" (substring (emacs-version) 10 12)) + (progn + (blink-cursor-mode 0) + ;; Insert newline when you press `C-n' (next-line) + ;; at the end of the buffer + (setq next-line-add-newlines t) + ;; Turn on image viewing + (auto-image-file-mode t) + ;; Turn on menu bar (this bar has text) + ;; (Use numeric argument to turn on) + (menu-bar-mode 1) + ;; Turn off tool bar (this bar has icons) + ;; (Use numeric argument to turn on) + (tool-bar-mode nil) + ;; Turn off tooltip mode for tool bar + ;; (This mode causes icon explanations to pop up) + ;; (Use numeric argument to turn on) + (tooltip-mode nil) + ;; If tooltips turned on, make tips appear promptly + (setq tooltip-delay 0.1) ; default is one second + )) + +(You will note that instead of typing `(number-to-string 21)', I +decided to save typing and wrote `21' as a string, `"21"', rather than +convert it from an integer to a string. In this instance, this +expression is better than the longer, but more general +`(number-to-string 21)'. However, if you do not know ahead of time +what type of information will be returned, then the `number-to-string' +function will be needed.) + +---------- Footnotes ---------- + +(1) When I start instances of Emacs that do not load my `.emacs' file +or any site file, I also turn off blinking: + + emacs -q --no-site-file -eval '(blink-cursor-mode nil)' + +Or nowadays, using an even more sophisticated set of options, + + emacs -Q - D + + +File: eintr, Node: X11 Colors, Next: Miscellaneous, Prev: Simple Extension, Up: Emacs Initialization + +16.12 X11 Colors +================ + +You can specify colors when you use Emacs with the MIT X Windowing +system. + +I dislike the default colors and specify my own. + +Here are the expressions in my `.emacs' file that set values: + + ;; Set cursor color + (set-cursor-color "white") + + ;; Set mouse color + (set-mouse-color "white") + + ;; Set foreground and background + (set-foreground-color "white") + (set-background-color "darkblue") + + ;;; Set highlighting colors for isearch and drag + (set-face-foreground 'highlight "white") + (set-face-background 'highlight "blue") + + (set-face-foreground 'region "cyan") + (set-face-background 'region "blue") + + (set-face-foreground 'secondary-selection "skyblue") + (set-face-background 'secondary-selection "darkblue") + + ;; Set calendar highlighting colors + (setq calendar-load-hook + '(lambda () + (set-face-foreground 'diary-face "skyblue") + (set-face-background 'holiday-face "slate blue") + (set-face-foreground 'holiday-face "white"))) + +The various shades of blue soothe my eye and prevent me from seeing the +screen flicker. + +Alternatively, I could have set my specifications in various X +initialization files. For example, I could set the foreground, +background, cursor, and pointer (i.e., mouse) colors in my +`~/.Xresources' file like this: + + Emacs*foreground: white + Emacs*background: darkblue + Emacs*cursorColor: white + Emacs*pointerColor: white + +In any event, since it is not part of Emacs, I set the root color of my +X window in my `~/.xinitrc' file, like this(1): + + xsetroot -solid Navy -fg white & + +---------- Footnotes ---------- + +(1) I also run more modern window managers, such as Enlightenment, +Gnome, or KDE; in those cases, I often specify an image rather than a +plain color. + + +File: eintr, Node: Miscellaneous, Next: Mode Line, Prev: X11 Colors, Up: Emacs Initialization + +16.13 Miscellaneous Settings for a `.emacs' File +================================================ + +Here are a few miscellaneous settings: + + - Set the shape and color of the mouse cursor: + + ; Cursor shapes are defined in + ; `/usr/include/X11/cursorfont.h'; + ; for example, the `target' cursor is number 128; + ; the `top_left_arrow' cursor is number 132. + + (let ((mpointer (x-get-resource "*mpointer" + "*emacs*mpointer"))) + ;; If you have not set your mouse pointer + ;; then set it, otherwise leave as is: + (if (eq mpointer nil) + (setq mpointer "132")) ; top_left_arrow + (setq x-pointer-shape (string-to-int mpointer)) + (set-mouse-color "white")) + + - Or you can set the values of a variety of features in an alist, + like this: + + (setq-default + default-frame-alist + '((cursor-color . "white") + (mouse-color . "white") + (foreground-color . "white") + (background-color . "DodgerBlue4") + ;; (cursor-type . bar) + (cursor-type . box) + (tool-bar-lines . 0) + (menu-bar-lines . 1) + (width . 80) + (height . 58) + (font . + "-Misc-Fixed-Medium-R-Normal--20-200-75-75-C-100-ISO8859-1") + )) + + - Convert `<CTRL>-h' into <DEL> and <DEL> into `<CTRL>-h'. + (Some older keyboards needed this, although I have not seen the + problem recently.) + + ;; Translate `C-h' to <DEL>. + ; (keyboard-translate ?\C-h ?\C-?) + + ;; Translate <DEL> to `C-h'. + (keyboard-translate ?\C-? ?\C-h) + + - Turn off a blinking cursor! + + (if (fboundp 'blink-cursor-mode) + (blink-cursor-mode -1)) + + or start GNU Emacs with the command `emacs -nbc'. + + - Ignore case when using `grep' + `-n' Prefix each line of output with line number + `-i' Ignore case distinctions + `-e' Protect patterns beginning with a hyphen character, `-' + + (setq grep-command "grep -n -i -e ") + + - Find an existing buffer, even if it has a different name + This avoids problems with symbolic links. + + (setq find-file-existing-other-name t) + + - Set your language environment and default input method + + (set-language-environment "latin-1") + ;; Remember you can enable or disable multilingual text input + ;; with the `toggle-input-method'' (C-\) command + (setq default-input-method "latin-1-prefix") + + If you want to write with Chinese `GB' characters, set this + instead: + + (set-language-environment "Chinese-GB") + (setq default-input-method "chinese-tonepy") + +Fixing Unpleasant Key Bindings +.............................. + +Some systems bind keys unpleasantly. Sometimes, for example, the +<CTRL> key appears in an awkward spot rather than at the far left of +the home row. + +Usually, when people fix these sorts of keybindings, they do not change +their `~/.emacs' file. Instead, they bind the proper keys on their +consoles with the `loadkeys' or `install-keymap' commands in their boot +script and then include `xmodmap' commands in their `.xinitrc' or +`.Xsession' file for X Windows. + +For a boot script: + + loadkeys /usr/share/keymaps/i386/qwerty/emacs2.kmap.gz + +or + + install-keymap emacs2 + +For a `.xinitrc' or `.Xsession' file when the <Caps Lock> key is at the +far left of the home row: + + # Bind the key labeled `Caps Lock' to `Control' + # (Such a broken user interface suggests that keyboard manufacturers + # think that computers are typewriters from 1885.) + + xmodmap -e "clear Lock" + xmodmap -e "add Control = Caps_Lock" + +In a `.xinitrc' or `.Xsession' file, to convert an <ALT> key to a +<META> key: + + # Some ill designed keyboards have a key labeled ALT and no Meta + xmodmap -e "keysym Alt_L = Meta_L Alt_L" + + +File: eintr, Node: Mode Line, Prev: Miscellaneous, Up: Emacs Initialization + +16.14 A Modified Mode Line +========================== + +Finally, a feature I really like: a modified mode line. + +When I work over a network, I forget which machine I am using. Also, I +tend to I lose track of where I am, and which line point is on. + +So I reset my mode line to look like this: + + -:-- foo.texi rattlesnake:/home/bob/ Line 1 (Texinfo Fill) Top + +I am visiting a file called `foo.texi', on my machine `rattlesnake' in +my `/home/bob' buffer. I am on line 1, in Texinfo mode, and am at the +top of the buffer. + +My `.emacs' file has a section that looks like this: + + ;; Set a Mode Line that tells me which machine, which directory, + ;; and which line I am on, plus the other customary information. + (setq default-mode-line-format + (quote + (#("-" 0 1 + (help-echo + "mouse-1: select window, mouse-2: delete others ...")) + mode-line-mule-info + mode-line-modified + mode-line-frame-identification + " " + mode-line-buffer-identification + " " + (:eval (substring + (system-name) 0 (string-match "\\..+" (system-name)))) + ":" + default-directory + #(" " 0 1 + (help-echo + "mouse-1: select window, mouse-2: delete others ...")) + (line-number-mode " Line %l ") + global-mode-string + #(" %[(" 0 6 + (help-echo + "mouse-1: select window, mouse-2: delete others ...")) + (:eval (mode-line-mode-name)) + mode-line-process + minor-mode-alist + #("%n" 0 2 (help-echo "mouse-2: widen" local-map (keymap ...))) + ")%] " + (-3 . "%P") + ;; "-%-" + ))) + +Here, I redefine the default mode line. Most of the parts are from the +original; but I make a few changes. I set the _default_ mode line +format so as to permit various modes, such as Info, to override it. + +Many elements in the list are self-explanatory: `mode-line-modified' is +a variable that tells whether the buffer has been modified, `mode-name' +tells the name of the mode, and so on. However, the format looks +complicated because of two features we have not discussed. + +The first string in the mode line is a dash or hyphen, `-'. In the old +days, it would have been specified simply as `"-"'. But nowadays, +Emacs can add properties to a string, such as highlighting or, as in +this case, a help feature. If you place your mouse cursor over the +hyphen, some help information appears (By default, you must wait +seven-tenths of a second before the information appears. You can +change that timing by changing the value of `tooltip-delay'.) + +The new string format has a special syntax: + + #("-" 0 1 (help-echo "mouse-1: select window, ...")) + +The `#(' begins a list. The first element of the list is the string +itself, just one `-'. The second and third elements specify the range +over which the fourth element applies. A range starts _after_ a +character, so a zero means the range starts just before the first +character; a 1 means that the range ends just after the first +character. The third element is the property for the range. It +consists of a property list, a property name, in this case, +`help-echo', followed by a value, in this case, a string. The second, +third, and fourth elements of this new string format can be repeated. + +*Note Text Properties: (elisp)Text Properties, and see *Note Mode Line +Format: (elisp)Mode Line Format, for more information. + +`mode-line-buffer-identification' displays the current buffer name. It +is a list beginning `(#("%12b" 0 4 ...'. The `#(' begins the list. + +The `"%12b"' displays the current buffer name, using the `buffer-name' +function with which we are familiar; the `12' specifies the maximum +number of characters that will be displayed. When a name has fewer +characters, whitespace is added to fill out to this number. (Buffer +names can and often should be longer than 12 characters; this length +works well in a typical 80 column wide window.) + +`:eval' was a new feature in GNU Emacs version 21. It says to evaluate +the following form and use the result as a string to display. In this +case, the expression displays the first component of the full system +name. The end of the first component is a `.' (`period'), so I use the +`string-match' function to tell me the length of the first component. +The substring from the zeroth character to that length is the name of +the machine. + +This is the expression: + + (:eval (substring + (system-name) 0 (string-match "\\..+" (system-name)))) + +`%[' and `%]' cause a pair of square brackets to appear for each +recursive editing level. `%n' says `Narrow' when narrowing is in +effect. `%P' tells you the percentage of the buffer that is above the +bottom of the window, or `Top', `Bottom', or `All'. (A lower case `p' +tell you the percentage above the _top_ of the window.) `%-' inserts +enough dashes to fill out the line. + +Remember, "You don't have to like Emacs to like it" -- your own Emacs +can have different colors, different commands, and different keys than +a default Emacs. + +On the other hand, if you want to bring up a plain `out of the box' +Emacs, with no customization, type: + + emacs -q + +This will start an Emacs that does _not_ load your `~/.emacs' +initialization file. A plain, default Emacs. Nothing more. + + +File: eintr, Node: Debugging, Next: Conclusion, Prev: Emacs Initialization, Up: Top + +17 Debugging +************ + +GNU Emacs has two debuggers, `debug' and `edebug'. The first is built +into the internals of Emacs and is always with you; the second requires +that you instrument a function before you can use it. + +Both debuggers are described extensively in *Note Debugging Lisp +Programs: (elisp)Debugging. In this chapter, I will walk through a +short example of each. + +* Menu: + +* debug:: +* debug-on-entry:: +* debug-on-quit:: +* edebug:: +* Debugging Exercises:: + + +File: eintr, Node: debug, Next: debug-on-entry, Prev: Debugging, Up: Debugging + +17.1 `debug' +============ + +Suppose you have written a function definition that is intended to +return the sum of the numbers 1 through a given number. (This is the +`triangle' function discussed earlier. *Note Example with Decrementing +Counter: Decrementing Example, for a discussion.) + +However, your function definition has a bug. You have mistyped `1=' +for `1-'. Here is the broken definition: + + (defun triangle-bugged (number) + "Return sum of numbers 1 through NUMBER inclusive." + (let ((total 0)) + (while (> number 0) + (setq total (+ total number)) + (setq number (1= number))) ; Error here. + total)) + +If you are reading this in Info, you can evaluate this definition in +the normal fashion. You will see `triangle-bugged' appear in the echo +area. + +Now evaluate the `triangle-bugged' function with an argument of 4: + + (triangle-bugged 4) + +In GNU Emacs version 21, you will create and enter a `*Backtrace*' +buffer that says: + + + ---------- Buffer: *Backtrace* ---------- + Debugger entered--Lisp error: (void-function 1=) + (1= number) + (setq number (1= number)) + (while (> number 0) (setq total (+ total number)) + (setq number (1= number))) + (let ((total 0)) (while (> number 0) (setq total ...) + (setq number ...)) total) + triangle-bugged(4) + eval((triangle-bugged 4)) + eval-last-sexp-1(nil) + eval-last-sexp(nil) + call-interactively(eval-last-sexp) + ---------- Buffer: *Backtrace* ---------- + +(I have reformatted this example slightly; the debugger does not fold +long lines. As usual, you can quit the debugger by typing `q' in the +`*Backtrace*' buffer.) + +In practice, for a bug as simple as this, the `Lisp error' line will +tell you what you need to know to correct the definition. The function +`1=' is `void'. + +However, suppose you are not quite certain what is going on? You can +read the complete backtrace. + +In this case, you need to run GNU Emacs 22, which automatically starts +the debugger that puts you in the `*Backtrace*' buffer; or else, you +need to start the debugger manually as described below. + +Read the `*Backtrace*' buffer from the bottom up; it tells you what +Emacs did that led to the error. Emacs made an interactive call to +`C-x C-e' (`eval-last-sexp'), which led to the evaluation of the +`triangle-bugged' expression. Each line above tells you what the Lisp +interpreter evaluated next. + +The third line from the top of the buffer is + + (setq number (1= number)) + +Emacs tried to evaluate this expression; in order to do so, it tried to +evaluate the inner expression shown on the second line from the top: + + (1= number) + +This is where the error occurred; as the top line says: + + Debugger entered--Lisp error: (void-function 1=) + +You can correct the mistake, re-evaluate the function definition, and +then run your test again. + + +File: eintr, Node: debug-on-entry, Next: debug-on-quit, Prev: debug, Up: Debugging + +17.2 `debug-on-entry' +===================== + +GNU Emacs 22 starts the debugger automatically when your function has +an error. + +Incidentally, you can start the debugger manually for all versions of +Emacs; the advantage is that the debugger runs even if you do not have +a bug in your code. Sometimes your code will be free of bugs! + +You can enter the debugger when you call the function by calling +`debug-on-entry'. + +Type: + + M-x debug-on-entry RET triangle-bugged RET + +Now, evaluate the following: + + (triangle-bugged 5) + +All versions of Emacs will create a `*Backtrace*' buffer and tell you +that it is beginning to evaluate the `triangle-bugged' function: + + ---------- Buffer: *Backtrace* ---------- + Debugger entered--entering a function: + * triangle-bugged(5) + eval((triangle-bugged 5)) + eval-last-sexp-1(nil) + eval-last-sexp(nil) + call-interactively(eval-last-sexp) + ---------- Buffer: *Backtrace* ---------- + +In the `*Backtrace*' buffer, type `d'. Emacs will evaluate the first +expression in `triangle-bugged'; the buffer will look like this: + + ---------- Buffer: *Backtrace* ---------- + Debugger entered--beginning evaluation of function call form: + * (let ((total 0)) (while (> number 0) (setq total ...) + (setq number ...)) total) + * triangle-bugged(5) + eval((triangle-bugged 5)) + eval-last-sexp-1(nil) + eval-last-sexp(nil) + call-interactively(eval-last-sexp) + ---------- Buffer: *Backtrace* ---------- + +Now, type `d' again, eight times, slowly. Each time you type `d', +Emacs will evaluate another expression in the function definition. + +Eventually, the buffer will look like this: + + ---------- Buffer: *Backtrace* ---------- + Debugger entered--beginning evaluation of function call form: + * (setq number (1= number)) + * (while (> number 0) (setq total (+ total number)) + (setq number (1= number))) + * (let ((total 0)) (while (> number 0) (setq total ...) + (setq number ...)) total) + * triangle-bugged(5) + eval((triangle-bugged 5)) + eval-last-sexp-1(nil) + eval-last-sexp(nil) + call-interactively(eval-last-sexp) + ---------- Buffer: *Backtrace* ---------- + +Finally, after you type `d' two more times, Emacs will reach the error, +and the top two lines of the `*Backtrace*' buffer will look like this: + + ---------- Buffer: *Backtrace* ---------- + Debugger entered--Lisp error: (void-function 1=) + * (1= number) + ... + ---------- Buffer: *Backtrace* ---------- + +By typing `d', you were able to step through the function. + +You can quit a `*Backtrace*' buffer by typing `q' in it; this quits the +trace, but does not cancel `debug-on-entry'. + +To cancel the effect of `debug-on-entry', call `cancel-debug-on-entry' +and the name of the function, like this: + + M-x cancel-debug-on-entry RET triangle-bugged RET + +(If you are reading this in Info, cancel `debug-on-entry' now.) + + +File: eintr, Node: debug-on-quit, Next: edebug, Prev: debug-on-entry, Up: Debugging + +17.3 `debug-on-quit' and `(debug)' +================================== + +In addition to setting `debug-on-error' or calling `debug-on-entry', +there are two other ways to start `debug'. + +You can start `debug' whenever you type `C-g' (`keyboard-quit') by +setting the variable `debug-on-quit' to `t'. This is useful for +debugging infinite loops. + +Or, you can insert a line that says `(debug)' into your code where you +want the debugger to start, like this: + + (defun triangle-bugged (number) + "Return sum of numbers 1 through NUMBER inclusive." + (let ((total 0)) + (while (> number 0) + (setq total (+ total number)) + (debug) ; Start debugger. + (setq number (1= number))) ; Error here. + total)) + +The `debug' function is described in detail in *Note The Lisp Debugger: +(elisp)Debugger. + + +File: eintr, Node: edebug, Next: Debugging Exercises, Prev: debug-on-quit, Up: Debugging + +17.4 The `edebug' Source Level Debugger +======================================= + +Edebug is a source level debugger. Edebug normally displays the source +of the code you are debugging, with an arrow at the left that shows +which line you are currently executing. + +You can walk through the execution of a function, line by line, or run +quickly until reaching a "breakpoint" where execution stops. + +Edebug is described in *Note Edebug: (elisp)edebug. + +Here is a bugged function definition for `triangle-recursively'. *Note +Recursion in place of a counter: Recursive triangle function, for a +review of it. + + (defun triangle-recursively-bugged (number) + "Return sum of numbers 1 through NUMBER inclusive. + Uses recursion." + (if (= number 1) + 1 + (+ number + (triangle-recursively-bugged + (1= number))))) ; Error here. + +Normally, you would install this definition by positioning your cursor +after the function's closing parenthesis and typing `C-x C-e' +(`eval-last-sexp') or else by positioning your cursor within the +definition and typing `C-M-x' (`eval-defun'). (By default, the +`eval-defun' command works only in Emacs Lisp mode or in Lisp +Interactive mode.) + +However, to prepare this function definition for Edebug, you must first +"instrument" the code using a different command. You can do this by +positioning your cursor within the definition and typing + + M-x edebug-defun RET + +This will cause Emacs to load Edebug automatically if it is not already +loaded, and properly instrument the function. + +After instrumenting the function, place your cursor after the following +expression and type `C-x C-e' (`eval-last-sexp'): + + (triangle-recursively-bugged 3) + +You will be jumped back to the source for `triangle-recursively-bugged' +and the cursor positioned at the beginning of the `if' line of the +function. Also, you will see an arrowhead at the left hand side of +that line. The arrowhead marks the line where the function is +executing. (In the following examples, we show the arrowhead with +`=>'; in a windowing system, you may see the arrowhead as a solid +triangle in the window `fringe'.) + + =>-!-(if (= number 1) + +In the example, the location of point is displayed as `-!-' (in a +printed book, it is displayed with a five pointed star). + +If you now press <SPC>, point will move to the next expression to be +executed; the line will look like this: + + =>(if -!-(= number 1) + +As you continue to press <SPC>, point will move from expression to +expression. At the same time, whenever an expression returns a value, +that value will be displayed in the echo area. For example, after you +move point past `number', you will see the following: + + Result: 3 (#o3, #x3, ?\C-c) + +This means the value of `number' is 3, which is octal three, +hexadecimal three, and ASCII `control-c' (the third letter of the +alphabet, in case you need to know this information). + +You can continue moving through the code until you reach the line with +the error. Before evaluation, that line looks like this: + + => -!-(1= number))))) ; Error here. + +When you press <SPC> once again, you will produce an error message that +says: + + Symbol's function definition is void: 1= + +This is the bug. + +Press `q' to quit Edebug. + +To remove instrumentation from a function definition, simply +re-evaluate it with a command that does not instrument it. For +example, you could place your cursor after the definition's closing +parenthesis and type `C-x C-e'. + +Edebug does a great deal more than walk with you through a function. +You can set it so it races through on its own, stopping only at an +error or at specified stopping points; you can cause it to display the +changing values of various expressions; you can find out how many times +a function is called, and more. + +Edebug is described in *Note Edebug: (elisp)edebug. + + +File: eintr, Node: Debugging Exercises, Prev: edebug, Up: Debugging + +17.5 Debugging Exercises +======================== + + * Install the `count-words-region' function and then cause it to + enter the built-in debugger when you call it. Run the command on a + region containing two words. You will need to press `d' a + remarkable number of times. On your system, is a `hook' called + after the command finishes? (For information on hooks, see *Note + Command Loop Overview: (elisp)Command Overview.) + + * Copy `count-words-region' into the `*scratch*' buffer, instrument + the function for Edebug, and walk through its execution. The + function does not need to have a bug, although you can introduce + one if you wish. If the function lacks a bug, the walk-through + completes without problems. + + * While running Edebug, type `?' to see a list of all the Edebug + commands. (The `global-edebug-prefix' is usually `C-x X', i.e. + `<CTRL>-x' followed by an upper case `X'; use this prefix for + commands made outside of the Edebug debugging buffer.) + + * In the Edebug debugging buffer, use the `p' + (`edebug-bounce-point') command to see where in the region the + `count-words-region' is working. + + * Move point to some spot further down the function and then type the + `h' (`edebug-goto-here') command to jump to that location. + + * Use the `t' (`edebug-trace-mode') command to cause Edebug to walk + through the function on its own; use an upper case `T' for + `edebug-Trace-fast-mode'. + + * Set a breakpoint, then run Edebug in Trace mode until it reaches + the stopping point. + + +File: eintr, Node: Conclusion, Next: the-the, Prev: Debugging, Up: Top + +18 Conclusion +************* + +We have now reached the end of this Introduction. You have now learned +enough about programming in Emacs Lisp to set values, to write simple +`.emacs' files for yourself and your friends, and write simple +customizations and extensions to Emacs. + +This is a place to stop. Or, if you wish, you can now go onward, and +teach yourself. + +You have learned some of the basic nuts and bolts of programming. But +only some. There are a great many more brackets and hinges that are +easy to use that we have not touched. + +A path you can follow right now lies among the sources to GNU Emacs and +in *Note The GNU Emacs Lisp Reference Manual: (elisp)Top. + +The Emacs Lisp sources are an adventure. When you read the sources and +come across a function or expression that is unfamiliar, you need to +figure out or find out what it does. + +Go to the Reference Manual. It is a thorough, complete, and fairly +easy-to-read description of Emacs Lisp. It is written not only for +experts, but for people who know what you know. (The `Reference +Manual' comes with the standard GNU Emacs distribution. Like this +introduction, it comes as a Texinfo source file, so you can read it +on-line and as a typeset, printed book.) + +Go to the other on-line help that is part of GNU Emacs: the on-line +documentation for all functions and variables, and `find-tags', the +program that takes you to sources. + +Here is an example of how I explore the sources. Because of its name, +`simple.el' is the file I looked at first, a long time ago. As it +happens some of the functions in `simple.el' are complicated, or at +least look complicated at first sight. The `open-line' function, for +example, looks complicated. + +You may want to walk through this function slowly, as we did with the +`forward-sentence' function. (*Note The `forward-sentence' function: +forward-sentence.) Or you may want to skip that function and look at +another, such as `split-line'. You don't need to read all the +functions. According to `count-words-in-defun', the `split-line' +function contains 102 words and symbols. + +Even though it is short, `split-line' contains expressions we have not +studied: `skip-chars-forward', `indent-to', `current-column' and +`insert-and-inherit'. + +Consider the `skip-chars-forward' function. (It is part of the +function definition for `back-to-indentation', which is shown in *Note +Review: Review.) + +In GNU Emacs, you can find out more about `skip-chars-forward' by +typing `C-h f' (`describe-function') and the name of the function. +This gives you the function documentation. + +You may be able to guess what is done by a well named function such as +`indent-to'; or you can look it up, too. Incidentally, the +`describe-function' function itself is in `help.el'; it is one of those +long, but decipherable functions. You can look up `describe-function' +using the `C-h f' command! + +In this instance, since the code is Lisp, the `*Help*' buffer contains +the name of the library containing the function's source. You can put +point over the name of the library and press the RET key, which in this +situation is bound to `help-follow', and be taken directly to the +source, in the same way as `M-.' (`find-tag'). + +The definition for `describe-function' illustrates how to customize the +`interactive' expression without using the standard character codes; +and it shows how to create a temporary buffer. + +(The `indent-to' function is written in C rather than Emacs Lisp; it is +a `built-in' function. `help-follow' takes you to its source as does +`find-tag', when properly set up.) + +You can look at a function's source using `find-tag', which is bound to +`M-.' Finally, you can find out what the Reference Manual has to say +by visiting the manual in Info, and typing `i' (`Info-index') and the +name of the function, or by looking up the function in the index to a +printed copy of the manual. + +Similarly, you can find out what is meant by `insert-and-inherit'. + +Other interesting source files include `paragraphs.el', `loaddefs.el', +and `loadup.el'. The `paragraphs.el' file includes short, easily +understood functions as well as longer ones. The `loaddefs.el' file +contains the many standard autoloads and many keymaps. I have never +looked at it all; only at parts. `loadup.el' is the file that loads +the standard parts of Emacs; it tells you a great deal about how Emacs +is built. (*Note Building Emacs: (elisp)Building Emacs, for more about +building.) + +As I said, you have learned some nuts and bolts; however, and very +importantly, we have hardly touched major aspects of programming; I +have said nothing about how to sort information, except to use the +predefined `sort' function; I have said nothing about how to store +information, except to use variables and lists; I have said nothing +about how to write programs that write programs. These are topics for +another, and different kind of book, a different kind of learning. + +What you have done is learn enough for much practical work with GNU +Emacs. What you have done is get started. This is the end of a +beginning. + + +File: eintr, Node: the-the, Next: Kill Ring, Prev: Conclusion, Up: Top + +Appendix A The `the-the' Function +********************************* + +Sometimes when you you write text, you duplicate words--as with "you +you" near the beginning of this sentence. I find that most frequently, +I duplicate "the"; hence, I call the function for detecting duplicated +words, `the-the'. + +As a first step, you could use the following regular expression to +search for duplicates: + + \\(\\w+[ \t\n]+\\)\\1 + +This regexp matches one or more word-constituent characters followed by +one or more spaces, tabs, or newlines. However, it does not detect +duplicated words on different lines, since the ending of the first +word, the end of the line, is different from the ending of the second +word, a space. (For more information about regular expressions, see +*Note Regular Expression Searches: Regexp Search, as well as *Note +Syntax of Regular Expressions: (emacs)Regexps, and *Note Regular +Expressions: (elisp)Regular Expressions.) + +You might try searching just for duplicated word-constituent characters +but that does not work since the pattern detects doubles such as the +two occurrences of `th' in `with the'. + +Another possible regexp searches for word-constituent characters +followed by non-word-constituent characters, reduplicated. Here, +`\\w+' matches one or more word-constituent characters and `\\W*' +matches zero or more non-word-constituent characters. + + \\(\\(\\w+\\)\\W*\\)\\1 + +Again, not useful. + +Here is the pattern that I use. It is not perfect, but good enough. +`\\b' matches the empty string, provided it is at the beginning or end +of a word; `[^@ \n\t]+' matches one or more occurrences of any +characters that are _not_ an @-sign, space, newline, or tab. + + \\b\\([^@ \n\t]+\\)[ \n\t]+\\1\\b + +One can write more complicated expressions, but I found that this +expression is good enough, so I use it. + +Here is the `the-the' function, as I include it in my `.emacs' file, +along with a handy global key binding: + + (defun the-the () + "Search forward for for a duplicated word." + (interactive) + (message "Searching for for duplicated words ...") + (push-mark) + ;; This regexp is not perfect + ;; but is fairly good over all: + (if (re-search-forward + "\\b\\([^@ \n\t]+\\)[ \n\t]+\\1\\b" nil 'move) + (message "Found duplicated word.") + (message "End of buffer"))) + + ;; Bind `the-the' to C-c \ + (global-set-key "\C-c\\" 'the-the) + + +Here is test text: + + one two two three four five + five six seven + +You can substitute the other regular expressions shown above in the +function definition and try each of them on this list. + + +File: eintr, Node: Kill Ring, Next: Full Graph, Prev: the-the, Up: Top + +Appendix B Handling the Kill Ring +********************************* + +The kill ring is a list that is transformed into a ring by the workings +of the `current-kill' function. The `yank' and `yank-pop' commands use +the `current-kill' function. + +This appendix describes the `current-kill' function as well as both the +`yank' and the `yank-pop' commands, but first, consider the workings of +the kill ring. + +The kill ring has a default maximum length of sixty items; this number +is too large for an explanation. Instead, set it to four. Please +evaluate the following: + + (setq old-kill-ring-max kill-ring-max) + (setq kill-ring-max 4) + +Then, please copy each line of the following indented example into the +kill ring. You may kill each line with `C-k' or mark it and copy it +with `M-w'. + +(In a read-only buffer, such as the `*info*' buffer, the kill command, +`C-k' (`kill-line'), will not remove the text, merely copy it to the +kill ring. However, your machine may beep at you. (`kill-line' calls +`kill-region'.) Alternatively, for silence, you may copy the region of +each line with the `M-w' (`kill-ring-save') command. You must mark +each line for this command to succeed, but it does not matter at which +end you put point or mark.) + +Please invoke the calls in order, so that five elements attempt to fill +the kill ring: + + first some text + second piece of text + third line + fourth line of text + fifth bit of text + +Then find the value of `kill-ring' by evaluating + + kill-ring + +It is: + + ("fifth bit of text" "fourth line of text" + "third line" "second piece of text") + +The first element, `first some text', was dropped. + +To return to the old value for the length of the kill ring, evaluate: + + (setq kill-ring-max old-kill-ring-max) + +* Menu: + +* current-kill:: +* yank:: +* yank-pop:: +* ring file:: + + +File: eintr, Node: current-kill, Next: yank, Prev: Kill Ring, Up: Kill Ring + +B.1 The `current-kill' Function +=============================== + +The `current-kill' function changes the element in the kill ring to +which `kill-ring-yank-pointer' points. (Also, the `kill-new' function +sets `kill-ring-yank-pointer' to point to the latest element of the the +kill ring.) + +The `current-kill' function is used by `yank' and by `yank-pop'. Here +is the code for `current-kill': + + (defun current-kill (n &optional do-not-move) + "Rotate the yanking point by N places, and then return that kill. + If N is zero, `interprogram-paste-function' is set, and calling it + returns a string, then that string is added to the front of the + kill ring and returned as the latest kill. + If optional arg DO-NOT-MOVE is non-nil, then don't actually move the + yanking point; just return the Nth kill forward." + (let ((interprogram-paste (and (= n 0) + interprogram-paste-function + (funcall interprogram-paste-function)))) + (if interprogram-paste + (progn + ;; Disable the interprogram cut function when we add the new + ;; text to the kill ring, so Emacs doesn't try to own the + ;; selection, with identical text. + (let ((interprogram-cut-function nil)) + (kill-new interprogram-paste)) + interprogram-paste) + (or kill-ring (error "Kill ring is empty")) + (let ((ARGth-kill-element + (nthcdr (mod (- n (length kill-ring-yank-pointer)) + (length kill-ring)) + kill-ring))) + (or do-not-move + (setq kill-ring-yank-pointer ARGth-kill-element)) + (car ARGth-kill-element))))) + +In addition, the `kill-new' function sets `kill-ring-yank-pointer' to +the latest element of the the kill ring. And indirectly so does +`kill-append', since it calls `kill-new'. In addition, `kill-region' +and `kill-line' call the `kill-new' function. + +Here is the line in `kill-new', which is explained in *Note The +`kill-new' function: kill-new function. + + (setq kill-ring-yank-pointer kill-ring) + +* Menu: + +* Understanding current-kill:: +