--- a/lispref/processes.texi	Fri Jun 17 13:47:44 2005 +0000
+++ b/lispref/processes.texi	Fri Jun 17 13:51:19 2005 +0000
@@ -52,6 +52,7 @@
 * Datagrams::                UDP network connections.
 * Low-Level Network::        Lower-level but more general function
                                to create connections and servers.
+* Byte Packing::             Using bindat to pack and unpack binary data.
 @end menu

 @node Subprocess Creation
@@ -2015,6 +2016,407 @@
 @code{make-network-process} and @code{set-network-process-option}.
 @end table

+@node Byte Packing
+@section Packing and Unpacking Byte Arrays
+
+  This section describes how to pack and unpack arrays of bytes,
+usually for binary network protocols.  These functions convert byte
+arrays to alists, and vice versa.  The byte array can be represented as a
+unibyte string or as a vector of integers, while the alist associates
+symbols either with fixed-size objects or with recursive sub-alists.
+
+@cindex serializing
+@cindex deserializing
+@cindex packing
+@cindex unpacking
+  Conversion from byte arrays to nested alists is also known as
+@dfn{deserializing} or @dfn{unpacking}, while going in the opposite
+direction is also known as @dfn{serializing} or @dfn{packing}.
+
+@menu
+* Bindat Spec::         Describing data layout.
+* Bindat Functions::    Doing the unpacking and packing.
+* Bindat Examples::     Samples of what bindat.el can do for you!
+@end menu
+
+@node Bindat Spec
+@subsection Describing Data Layout
+
+  To control unpacking and packing, you write a @dfn{data layout
+specification}, a special nested list describing named and typed
+@dfn{fields}.  This specification controls the length of each field to be
+processed, and how to pack or unpack it.
+
+@cindex endianness
+@cindex big endian
+@cindex little endian
+@cindex network byte ordering
+  A field's @dfn{type} describes the size (in bytes) of the object
+that the field represents and, in the case of multibyte fields, how
+the bytes are ordered within the field.  The two possible orderings
+are ``big endian'' (also known as ``network byte ordering'') and
+``little endian''.  For instance, the number @code{#x23cd} (decimal
+9165) in big endian would be the two bytes @code{#x23} @code{#xcd};
+and in little endian, @code{#xcd} @code{#x23}.  Here are the possible
+type values:
+
+@table @code
+@item u8
+@itemx byte
+Unsigned byte, with length 1.
+
+@item u16
+@itemx word
+@itemx short
+Unsigned integer in network byte order, with length 2.
+
+@item u24
+Unsigned integer in network byte order, with length 3.
+
+@item u32
+@itemx dword
+@itemx long
+Unsigned integer in network byte order, with length 4.
+Note: These values may be limited by Emacs' integer implementation limits.
+
+@item u16r
+@itemx u24r
+@itemx u32r
+Unsigned integer in little endian order, with length 2, 3 and 4, respectively.
+
+@item str @var{len}
+String of length @var{len}.
+
+@item strz @var{len}
+Zero-terminated string of length @var{len}.
+
+@item vec @var{len}
+Vector of @var{len} bytes.
+
+@item ip
+Four-byte vector representing an Internet address.  For example:
+@code{[127 0 0 1]} for localhost.
+
+@item bits @var{len}
+List of set bits in @var{len} bytes.  The bytes are taken in big
+endian order and the bits are numbered starting with @code{8 *
+@var{len} @minus{} 1} and ending with zero.  For example: @code{bits
+2} unpacks @code{#x28} @code{#x1c} to @code{(2 3 4 11 13)} and
+@code{#x1c} @code{#x28} to @code{(3 5 10 11 12)}.
+
+@item (eval @var{form})
+@var{form} is a Lisp expression evaluated at the moment the field is
+unpacked or packed.  The result of the evaluation should be one of the
+above-listed type specifications.
+@end table
+
+A field specification generally has the form @code{([@var{name}]
+@var{handler})}.  The square brackets indicate that @var{name} is
+optional.  (Don't use names that are symbols meaningful as type
+specifications (above) or handler specifications (below), since that
+would be ambiguous.)  @var{name} can be a symbol or the expression
+@code{(eval @var{form})}, in which case @var{form} should evaluate to
+a symbol.
+
+@var{handler} describes how to unpack or pack the field and can be one
+of the following:
+
+@table @code
+@item @var{type}
+Unpack/pack this field according to the type specification @var{type}.
+
+@item eval @var{form}
+Evaluate @var{form}, a Lisp expression, for side-effect only.  If the
+field name is specified, the value is bound to that field name.
+@var{form} can access and update these dynamically bound variables:
+
+@table @code
+@item raw-data
+The data as a byte array.
+
+@item pos
+Current position of the unpacking or packing operation.
+
+@item struct
+The alist of field data unpacked so far, or the whole alist being packed.
+
+@item last
+Value of the last field processed.
+@end table
+
+@item fill @var{len}
+Skip @var{len} bytes.  In packing, this leaves them unchanged,
+which normally means they remain zero.  In unpacking, this means
+they are ignored.
+
+@item align @var{len}
+Skip to the next multiple of @var{len} bytes.
+
+@item struct @var{spec-name}
+Process @var{spec-name} as a sub-specification.  This describes a
+structure nested within another structure.
+
+@item union @var{form} (@var{tag} @var{spec})@dots{}
+@c ??? I don't see how one would actually  use this.
+@c ??? what kind of expression would be useful for @var{form}?
+Evaluate @var{form}, a Lisp expression, find the first @var{tag}
+that matches it, and process its associated data layout specification
+@var{spec}.  Matching can occur in one of three ways:
+
+@itemize
+@item
+If a @var{tag} has the form @code{(eval @var{expr})}, evaluate
+@var{expr} with the variable @code{tag} dynamically bound to the value
+of @var{form}.  A non-@code{nil} result indicates a match.
+
+@item
+@var{tag} matches if it is @code{equal} to the value of @var{form}.
+
+@item
+@var{tag} matches unconditionally if it is @code{t}.
+@end itemize
+
+@item repeat @var{count} @var{field-spec}@dots{}
+Process the @var{field-spec}s in order, @var{count} times in all.  @var{count}
+may be an integer, or a list of one element naming a previous field.
+For correct operation, each @var{field-spec} must include a name.
+@c ??? What does it MEAN?
+@end table
+
+@node Bindat Functions
+@subsection Functions to Unpack and Pack Bytes
+
+  In the following documentation, @var{spec} refers to a data layout
+specification, @code{raw-data} to a byte array, and @var{struct} to an
+alist representing unpacked field data.
+
+@defun bindat-unpack spec raw-data &optional pos
+This function unpacks data from the byte array @code{raw-data}
+according to @var{spec}.  Normally this starts unpacking at the
+beginning of the byte array, but if @var{pos} is non-@code{nil}, it
+specifies a zero-based starting position to use instead.
+
+The value is an alist or nested alist in which each element describes
+one unpacked field.
+@end defun
+
+@defun bindat-get-field struct &rest name
+This function selects a field's data from the nested alist
+@var{struct}.  Usually @var{struct} was returned by
+@code{bindat-unpack}.  If @var{name} corresponds to just one argument,
+that means to extract a top-level field value.  Multiple @var{name}
+arguments specify repeated lookup of sub-structures.  An integer name
+acts as an array index.
+
+For example, if @var{name} is @code{(a b 2 c)}, that means to find
+field @code{c} in the third element of subfield @code{b} of field
+@code{a}.  (This corresponds to @code{struct.a.b[2].c} in C.)
+@end defun
+
+@defun bindat-length spec struct
+@c ??? I don't understand this at all -- rms
+This function returns the length in bytes of @var{struct}, according
+to @var{spec}.
+@end defun
+
+@defun bindat-pack spec struct &optional raw-data pos
+This function returns a byte array packed according to @var{spec} from
+the data in the alist @var{struct}.  Normally it creates and fills a
+new byte array starting at the beginning.  However, if @var{raw-data}
+is non-@code{nil}, it specifies a pre-allocated string or vector to
+pack into.  If @var{pos} is non-@code{nil}, it specifies the starting
+offset for packing into @code{raw-data}.
+
+@c ??? Isn't this a bug?  Shouldn't it always be unibyte?
+Note: The result is a multibyte string; use @code{string-make-unibyte}
+on it to make it unibyte if necessary.
+@end defun
+
+@defun bindat-ip-to-string ip
+Convert the Internet address vector @var{ip} to a string in the usual
+dotted notation.
+
+@example
+(bindat-ip-to-string [127 0 0 1])
+     @result{} "127.0.0.1"
+@end example
+@end defun
+
+@node Bindat Examples
+@subsection Examples of Byte Unpacking and Packing
+
+  Here is a complete example of byte unpacking and packing:
+
+@lisp
+(defvar fcookie-index-spec
+  '((:version  u32)
+    (:count    u32)
+    (:longest  u32)
+    (:shortest u32)
+    (:flags    u32)
+    (:delim    u8)
+    (:ignored  fill 3)
+    (:offset   repeat (:count)
+               (:foo u32)))
+  "Description of a fortune cookie index file's contents.")
+
+(defun fcookie (cookies &optional index)
+  "Display a random fortune cookie from file COOKIES.
+Optional second arg INDEX specifies the associated index
+filename, which is by default constructed by appending
+\".dat\" to COOKIES.  Display cookie text in possibly
+new buffer \"*Fortune Cookie: BASENAME*\" where BASENAME
+is COOKIES without the directory part."
+  (interactive "fCookies file: ")
+  (let* ((info (with-temp-buffer
+                 (insert-file-contents-literally
+                  (or index (concat cookies ".dat")))
+                 (bindat-unpack fcookie-index-spec
+                                (buffer-string))))
+         (sel (random (bindat-get-field info :count)))
+         (beg (cdar (bindat-get-field info :offset sel)))
+         (end (or (cdar (bindat-get-field info :offset (1+ sel)))
+                  (nth 7 (file-attributes cookies)))))
+    (switch-to-buffer (get-buffer-create
+                       (format "*Fortune Cookie: %s*"
+                               (file-name-nondirectory cookies))))
+    (erase-buffer)
+    (insert-file-contents-literally cookies nil beg (- end 3))))
+
+(defun fcookie-create-index (cookies &optional index delim)
+  "Scan file COOKIES, and write out its index file.
+Optional second arg INDEX specifies the index filename,
+which is by default constructed by appending \".dat\" to
+COOKIES.  Optional third arg DELIM specifies the unibyte
+character which, when found on a line of its own in
+COOKIES, indicates the border between entries."
+  (interactive "fCookies file: ")
+  (setq delim (or delim ?%))
+  (let ((delim-line (format "\n%c\n" delim))
+        (count 0)
+        (max 0)
+        min p q len offsets)
+    (unless (= 3 (string-bytes delim-line))
+      (error "Delimiter cannot be represented in one byte"))
+    (with-temp-buffer
+      (insert-file-contents-literally cookies)
+      (while (and (setq p (point))
+                  (search-forward delim-line (point-max) t)
+                  (setq len (- (point) 3 p)))
+        (setq count (1+ count)
+              max (max max len)
+              min (min (or min max) len)
+              offsets (cons (1- p) offsets))))
+    (with-temp-buffer
+      (set-buffer-multibyte nil)
+      (insert (string-make-unibyte
+               (bindat-pack
+                fcookie-index-spec
+                `((:version . 2)
+                  (:count . ,count)
+                  (:longest . ,max)
+                  (:shortest . ,min)
+                  (:flags . 0)
+                  (:delim . ,delim)
+                  (:offset . ,(mapcar (lambda (o)
+                                        (list (cons :foo o)))
+                                      (nreverse offsets)))))))
+      (let ((coding-system-for-write 'raw-text-unix))
+        (write-file (or index (concat cookies ".dat")))))))
+@end lisp
+
+The following is an example of defining and unpacking a complex structure.
+Consider the following C structures:
+
+@example
+struct header @{
+    unsigned long    dest_ip;
+    unsigned long    src_ip;
+    unsigned short   dest_port;
+    unsigned short   src_port;
+@};
+
+struct data @{
+    unsigned char    type;
+    unsigned char    opcode;
+    unsigned short   length;  /* In little endian order */
+    unsigned char    id[8];   /* nul-terminated string  */
+    unsigned char    data[/* (length + 3) & ~3 */];
+@};
+
+struct packet @{
+    struct header    header;
+    unsigned char    items;
+    unsigned char    filler[3];
+    struct data      item[/* items */];
+
+@};
+@end example
+
+The corresponding data layout specification:
+
+@lisp
+(setq header-spec
+      '((dest-ip   ip)
+        (src-ip    ip)
+        (dest-port u16)
+        (src-port  u16)))
+
+(setq data-spec
+      '((type      u8)
+        (opcode    u8)
+        (length    u16r) ;; little endian order
+        (id        strz 8)
+        (data      vec (length))
+        (align     4)))
+
+(setq packet-spec
+      '((header    struct header-spec)
+        (items     u8)
+        (fill      3)
+        (item      repeat (items)
+                   (struct data-spec))))
+@end lisp
+
+A binary data representation:
+
+@lisp
+(setq binary-data
+      [ 192 168 1 100 192 168 1 101 01 28 21 32 2 0 0 0
+        2 3 5 0 ?A ?B ?C ?D ?E ?F 0 0 1 2 3 4 5 0 0 0
+        1 4 7 0 ?B ?C ?D ?E ?F ?G 0 0 6 7 8 9 10 11 12 0 ])
+@end lisp
+
+The corresponding decoded structure:
+
+@lisp
+(setq decoded-structure (bindat-unpack packet-spec binary-data))
+     @result{}
+((header
+  (dest-ip   . [192 168 1 100])
+  (src-ip    . [192 168 1 101])
+  (dest-port . 284)
+  (src-port  . 5408))
+ (items . 2)
+ (item ((data . [1 2 3 4 5])
+        (id . "ABCDEF")
+        (length . 5)
+        (opcode . 3)
+        (type . 2))
+       ((data . [6 7 8 9 10 11 12])
+        (id . "BCDEFG")
+        (length . 7)
+        (opcode . 4)
+        (type . 1))))
+@end lisp
+
+Fetching data from this structure:
+
+@lisp
+(bindat-get-field decoded-structure 'item 1 'id)
+     @result{} "BCDEFG"
+@end lisp
+
 @ignore
    arch-tag: ba9da253-e65f-4e7f-b727-08fba0a1df7a
 @end ignore