view lisp/gnus/mm-url.el @ 61424:ad05d91d3598

Revision: miles@gnu.org--gnu-2005/emacs--cvs-trunk--0--patch-243 Merge from gnus--rel--5.10 Patches applied: * gnus--rel--5.10 (patch 59) - Update from CVS 2005-04-06 Katsumi Yamaoka <yamaoka@jpl.org> * lisp/calendar/time-date.el (time-to-seconds): Don't use the #xhhhh syntax which Emacs 20 doesn't support. (seconds-to-time, days-to-time, time-subtract, time-add): Ditto. 2005-04-06 Katsumi Yamaoka <yamaoka@jpl.org> * lisp/gnus/mm-util.el (mm-coding-system-p): Don't return binary for the nil argument in XEmacs. * lisp/gnus/nnrss.el (nnrss-compatible-encoding-alist): New variable. (nnrss-request-group): Decode group name first. (nnrss-request-article): Make a text/plain article if mml-to-mime failed. (nnrss-get-encoding): Return a compatible encoding according to nnrss-compatible-encoding-alist. (nnrss-opml-export): Use dolist. (nnrss-find-el): Use consp instead of listp. (nnrss-order-hrefs): Use dolist. 2005-04-06 Arne Jørgensen <arne@arnested.dk> * lisp/gnus/nnrss.el (nnrss-verbose): Remove. (nnrss-request-group): Use `nnheader-message' instead. 2005-04-06 Mark Plaksin <happy@usg.edu> (tiny change) * lisp/gnus/nnrss.el (nnrss-verbose): New variable. (nnrss-request-group): Make it say nnrss is requesting a group. 2005-04-06 Katsumi Yamaoka <yamaoka@jpl.org> * lisp/gnus/gnus-agent.el (gnus-agent-group-path): Decode group name. (gnus-agent-group-pathname): Ditto. * lisp/gnus/gnus-cache.el (gnus-cache-file-name): Decode group name. * lisp/gnus/gnus-group.el (gnus-group-line-format-alist): Use decoded group name for only %g and %c. (gnus-group-insert-group-line): Bind gnus-tmp-decoded-group instead of gnus-tmp-group to decoded group name. (gnus-group-make-group): Decode group name. (gnus-group-delete-group): Ditto. (gnus-group-make-rss-group): Exclude `/'s from group names; register the group data after opening the nnrss group; unify non-ASCII group names; encode group name. (gnus-group-catchup-current): Decode group name. (gnus-group-expire-articles-1): Ditto. 
(gnus-group-set-current-level): Ditto. (gnus-group-kill-group): Ditto. * lisp/gnus/gnus-spec.el (gnus-update-format-specifications): Flush the group format spec cache if it doesn't support decoded group names. * lisp/gnus/mm-url.el (mm-url-predefined-programs): Add --silent arg to curl. * lisp/gnus/nnrss.el: Require rfc2047 and mml. (nnrss-file-coding-system): New variable. (nnrss-format-string): Redefine it as an inline function. (nnrss-decode-group-name): New function. (nnrss-string-as-multibyte): Remove. (nnrss-retrieve-headers): Decode group name; don't use nnrss-format-string. (nnrss-request-group): Decode group name. (nnrss-request-article): Decode group name; allow a Message-ID as well as an article number; don't use nnrss-format-string; encode a Message-ID string which may contain non-ASCII characters; use mml-to-mime to compose a MIME article; use search-forward instead of re-search-forward. (nnrss-request-expire-articles): Decode group name. (nnrss-request-delete-group): Delete entries in nnrss-group-alist as well; decode group name. (nnrss-get-encoding): Fix regexp. (nnrss-fetch): Clarify error message. (nnrss-read-server-data): Use insert-file-contents instead of load; bind file-name-coding-system; use multibyte buffer. (nnrss-save-server-data): Insert newline; bind coding-system-for-write to the value of nnrss-file-coding-system; bind file-name-coding-system; add coding cookie. (nnrss-read-group-data): Use insert-file-contents instead of load; bind file-name-coding-system; use multibyte buffer. (nnrss-save-group-data): Bind coding-system-for-write to the value of nnrss-file-coding-system; bind file-name-coding-system. (nnrss-decode-entities-string): Rename from n-d-e-unibyte-string; make it work with non-ASCII text. (nnrss-opml-export): Use mm-set-buffer-file-coding-system instead of set-buffer-file-coding-system. 
(nnrss-find-el): Check carefully whether there's a list of string which old xml.el may return rather than a string; make it work with old xml.el as well. 2005-04-06 Tsuyoshi AKIHO <akiho@kawachi.zaq.ne.jp> * lisp/gnus/gnus-sum.el (gnus-summary-walk-group-buffer): Decode group name. * lisp/gnus/nnrss.el (nnrss-get-encoding): New function. (nnrss-fetch): Use unibyte buffer initially; bind coding-system-for-read while performing mm-url-insert; remove ^Ms; decode contents according to the encoding attribute. (nnrss-save-group-data): Add coding cookie. (nnrss-mime-encode-string): New function. (nnrss-check-group): Use it to encode subject and author. 2005-04-06 Maciek Pasternacki <maciekp@japhy.fnord.org> (tiny change) * lisp/gnus/nnrss.el (nnrss-fetch): Signal an error if w3-parse-buffer also failed. 2005-04-06 Jesper Harder <harder@ifa.au.dk> * lisp/gnus/mm-util.el (mm-subst-char-in-string): Support inplace. * lisp/gnus/nnrss.el: Pedantic docstring and whitespace fixes (courtesy of checkdoc.el). (nnrss-request-article): Cleanup. (nnrss-request-delete-group): Use nnrss-make-filename. (nnrss-read-server-data): Use nnrss-make-filename; use load. (nnrss-save-server-data): Use nnrss-make-filename; use gnus-prin1. (nnrss-read-group-data): Fix off-by-one error. From Joakim Verona <joakim@verona.se>; hash on description if link is missing; use nnrss-make-filename; use load. (nnrss-save-group-data): Use nnrss-make-filename; use gnus-prin1. (nnrss-make-filename): New function. (nnrss-close): New function. (nnrss-check-group): Hash on description if link is missing. (nnrss-get-namespace-prefix): Use string= to compare strings! Reported by David D. Smith <davidsmith@acm.org>. (nnrss-opml-export): Turn on sgml-mode. 2005-04-06 Mark A. Hershberger <mah@everybody.org> * lisp/gnus/nnrss.el (nnrss-opml-import, nnrss-opml-export): New functions. 2005-04-06 Katsumi Yamaoka <yamaoka@jpl.org> * man/gnus.texi (RSS): Addition.
author Miles Bader <miles@gnu.org>
date Sun, 10 Apr 2005 04:20:14 +0000
parents 7503b2a24a3c
children 18a818a2ee7c

;;; mm-url.el --- a wrapper of url functions/commands for Gnus
;; Copyright (C) 2001, 2002, 2003 Free Software Foundation, Inc.

;; Author: Shenghuo Zhu <zsh@cs.rochester.edu>

;; This file is part of GNU Emacs.

;; GNU Emacs is free software; you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published
;; by the Free Software Foundation; either version 2, or (at your
;; option) any later version.

;; GNU Emacs is distributed in the hope that it will be useful, but
;; WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
;; General Public License for more details.

;; You should have received a copy of the GNU General Public License
;; along with GNU Emacs; see the file COPYING.  If not, write to the
;; Free Software Foundation, Inc., 59 Temple Place - Suite 330,
;; Boston, MA 02111-1307, USA.

;;; Commentary:

;; Some code is stolen from the w3 and url packages.  Some was moved
;; from nnweb.

;; TODO: Support POST, cookie.

;;; Code:

(eval-when-compile (require 'cl))

(require 'mm-util)
(require 'gnus)

(eval-and-compile
  (autoload 'executable-find "executable"))

(eval-when-compile
  (if (featurep 'xemacs)
      (require 'timer-funcs)
    (require 'timer)))

(defgroup mm-url nil
  "A wrapper around the url package and external URL commands for Gnus."
  :group 'gnus)

(defcustom mm-url-use-external (not
				(condition-case nil
				    (require 'url)
				  (error nil)))
  "*If non-nil, use external grab program `mm-url-program'."
  :version "22.1"
  :type 'boolean
  :group 'mm-url)

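(defvar mm-url-predefined-programs
  '((wget "wget" "--user-agent=mm-url" "-q" "-O" "-")
    (w3m  "w3m" "-dump_source")
    (lynx "lynx" "-source")
    (curl "curl" "--silent"))
  "Alist of predefined external programs for fetching URLs.
Each element has the form (SYMBOL PROGRAM ARG...); the URL to fetch is
appended after the last ARG.")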

(defcustom mm-url-program
  (cond
   ((executable-find "wget") 'wget)
   ((executable-find "w3m") 'w3m)
   ((executable-find "lynx") 'lynx)
   ((executable-find "curl") 'curl)
   (t "GET"))
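  "The url grab program.
If a symbol, it is looked up in `mm-url-predefined-programs'; likely
values are `wget', `w3m', `lynx' and `curl'.  If a string, it names the
program to run with `mm-url-arguments'."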
  :version "22.1"
  :type '(choice
	  (symbol :tag "wget" wget)
	  (symbol :tag "w3m" w3m)
	  (symbol :tag "lynx" lynx)
	  (symbol :tag "curl" curl)
	  (string :tag "other"))
  :group 'mm-url)

(defcustom mm-url-arguments nil
  "The arguments for `mm-url-program'.
These are used only when `mm-url-program' is a string."
  :version "22.1"
  :type '(repeat string)
  :group 'mm-url)
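
;; A minimal configuration sketch (an assumption about typical use, not
;; part of the original file): force the external fetcher and pick curl
;; from `mm-url-predefined-programs'.
;;
;;   (setq mm-url-use-external t
;;         mm-url-program 'curl)
;;
;; A program outside that list is named by a string.  "my-fetcher" and
;; its "--stdout" flag are hypothetical placeholders, not a real tool:
;;
;;   (setq mm-url-program "my-fetcher"
;;         mm-url-arguments '("--stdout"))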


;;; Internal variables

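(defvar mm-url-package-name
  (gnus-replace-in-string
   (gnus-replace-in-string gnus-version " v.*$" "")
   " " "-")
  "Value bound to `url-package-name' when fetching with the url package.")

(defvar mm-url-package-version gnus-version-number
  "Value bound to `url-package-version' when fetching with the url package.")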

;; Stolen from w3.
(defvar mm-url-html-entities
  '(
    ;;(excl        .  33)
    (quot        .  34)
    ;;(num         .  35)
    ;;(dollar      .  36)
    ;;(percent     .  37)
    (amp         .  38)
    (rsquo       .  39)			; should be U+2019 (decimal 8217)
    ;;(apos        .  39)
    ;;(lpar        .  40)
    ;;(rpar        .  41)
    ;;(ast         .  42)
    ;;(plus        .  43)
    ;;(comma       .  44)
    ;;(period      .  46)
    ;;(colon       .  58)
    ;;(semi        .  59)
    (lt          .  60)
    ;;(equals      .  61)
    (gt          .  62)
    ;;(quest       .  63)
    ;;(commat      .  64)
    ;;(lsqb        .  91)
    ;;(rsqb        .  93)
    (uarr        .  94)			; should be U+2191 (decimal 8593)
    ;;(lowbar      .  95)
    (lsquo       .  96)			; should be U+2018 (decimal 8216)
    (lcub        . 123)
    ;;(verbar      . 124)
    (rcub        . 125)
    (tilde       . 126)
    (nbsp        . 160)
    (iexcl       . 161)
    (cent        . 162)
    (pound       . 163)
    (curren      . 164)
    (yen         . 165)
    (brvbar      . 166)
    (sect        . 167)
    (uml         . 168)
    (copy        . 169)
    (ordf        . 170)
    (laquo       . 171)
    (not         . 172)
    (shy         . 173)
    (reg         . 174)
    (macr        . 175)
    (deg         . 176)
    (plusmn      . 177)
    (sup2        . 178)
    (sup3        . 179)
    (acute       . 180)
    (micro       . 181)
    (para        . 182)
    (middot      . 183)
    (cedil       . 184)
    (sup1        . 185)
    (ordm        . 186)
    (raquo       . 187)
    (frac14      . 188)
    (frac12      . 189)
    (frac34      . 190)
    (iquest      . 191)
    (Agrave      . 192)
    (Aacute      . 193)
    (Acirc       . 194)
    (Atilde      . 195)
    (Auml        . 196)
    (Aring       . 197)
    (AElig       . 198)
    (Ccedil      . 199)
    (Egrave      . 200)
    (Eacute      . 201)
    (Ecirc       . 202)
    (Euml        . 203)
    (Igrave      . 204)
    (Iacute      . 205)
    (Icirc       . 206)
    (Iuml        . 207)
    (ETH         . 208)
    (Ntilde      . 209)
    (Ograve      . 210)
    (Oacute      . 211)
    (Ocirc       . 212)
    (Otilde      . 213)
    (Ouml        . 214)
    (times       . 215)
    (Oslash      . 216)
    (Ugrave      . 217)
    (Uacute      . 218)
    (Ucirc       . 219)
    (Uuml        . 220)
    (Yacute      . 221)
    (THORN       . 222)
    (szlig       . 223)
    (agrave      . 224)
    (aacute      . 225)
    (acirc       . 226)
    (atilde      . 227)
    (auml        . 228)
    (aring       . 229)
    (aelig       . 230)
    (ccedil      . 231)
    (egrave      . 232)
    (eacute      . 233)
    (ecirc       . 234)
    (euml        . 235)
    (igrave      . 236)
    (iacute      . 237)
    (icirc       . 238)
    (iuml        . 239)
    (eth         . 240)
    (ntilde      . 241)
    (ograve      . 242)
    (oacute      . 243)
    (ocirc       . 244)
    (otilde      . 245)
    (ouml        . 246)
    (divide      . 247)
    (oslash      . 248)
    (ugrave      . 249)
    (uacute      . 250)
    (ucirc       . 251)
    (uuml        . 252)
    (yacute      . 253)
    (thorn       . 254)
    (yuml        . 255)

    ;; Special handling of these
    (frac56      . "5/6")
    (frac16      . "1/6")
    (frac45      . "4/5")
    (frac35      . "3/5")
    (frac25      . "2/5")
    (frac15      . "1/5")
    (frac23      . "2/3")
    (frac13      . "1/3")
    (frac78      . "7/8")
    (frac58      . "5/8")
    (frac38      . "3/8")
    (frac18      . "1/8")

    ;; The following entities are not mentioned in the HTML 2.0
    ;; standard, nor in any other HTML proposed standard of which I
    ;; am aware.  I am not even sure they are ISO entity names.
    ;; Hence, some arrangement should be made to give a bad HTML
    ;; message when they are seen.
    (ndash       .  45)
    (mdash       .  45)
    (emsp        .  32)
    (ensp        .  32)
    (sim         . 126)
    (le          . "<=")
    (agr         . "alpha")
    (rdquo       . "''")
    (ldquo       . "``")
    (trade       . "(TM)")
    ;; To be done
    ;; (shy      . ????) ; soft hyphen
    )
  "*An assoc list of entity names and how to actually display them.
Each value is either a character code or a replacement string.")

(defconst mm-url-unreserved-chars
  '(
    ?a ?b ?c ?d ?e ?f ?g ?h ?i ?j ?k ?l ?m ?n ?o ?p ?q ?r ?s ?t ?u ?v ?w ?x ?y ?z
    ?A ?B ?C ?D ?E ?F ?G ?H ?I ?J ?K ?L ?M ?N ?O ?P ?Q ?R ?S ?T ?U ?V ?W ?X ?Y ?Z
    ?0 ?1 ?2 ?3 ?4 ?5 ?6 ?7 ?8 ?9
    ?- ?_ ?. ?! ?~ ?* ?' ?\( ?\))
  "A list of characters that are _NOT_ reserved in the URL spec.
This is taken from RFC 2396.")

(defun mm-url-load-url ()
  "Load `url-insert-file-contents'."
  (unless (condition-case ()
	      (require 'url-handlers)
	    (error nil))
    ;; w3-4.0pre0.46 or earlier version.
    (require 'w3-vars)
    (require 'url)))

;;;###autoload
(defun mm-url-insert-file-contents (url)
  "Insert file contents of URL.
If `mm-url-use-external' is non-nil, use `mm-url-program'."
  (if mm-url-use-external
      (progn
	(if (string-match "^file:/+" url)
	    (insert-file-contents (substring url (1- (match-end 0))))
	  (mm-url-insert-file-contents-external url))
	(goto-char (point-min))
	(if (fboundp 'url-generic-parse-url)
	    (setq url-current-object
		  (url-generic-parse-url url)))
	(list url (buffer-size)))
    (mm-url-load-url)
    (let ((name buffer-file-name)
	  (url-request-extra-headers (list (cons "Connection" "Close")))
	  (url-package-name (or mm-url-package-name
				url-package-name))
	  (url-package-version (or mm-url-package-version
				   url-package-version))
	  result)
      (setq result (url-insert-file-contents url))
      (save-excursion
	(goto-char (point-min))
	(while (re-search-forward "\r 1000\r ?" nil t)
	  (replace-match "")))
      (setq buffer-file-name name)
      (if (and (fboundp 'url-generic-parse-url)
	       (listp result))
	  (setq url-current-object (url-generic-parse-url
				    (car result))))
      result)))

;;;###autoload
(defun mm-url-insert-file-contents-external (url)
  "Insert file contents of URL using `mm-url-program'."
  (let (program args)
    (if (symbolp mm-url-program)
	(let ((item (cdr (assq mm-url-program mm-url-predefined-programs))))
	  (setq program (car item)
		args (append (cdr item) (list url))))
      (setq program mm-url-program
	    args (append mm-url-arguments (list url))))
    (unless (eq 0 (apply 'call-process program nil t nil args))
      (error "Couldn't fetch %s" url))))

(defvar mm-url-timeout 30
  "The number of seconds before timing out a URL fetch.")

(defvar mm-url-retries 10
  "The number of retries after timing out when fetching a URL.")

(defun mm-url-insert (url &optional follow-refresh)
  "Insert the contents of URL in the current buffer.
If FOLLOW-REFRESH is non-nil, follow the refresh URL given in a META tag."
  (let ((times mm-url-retries)
	(done nil)
	(first t)
	result)
    (while (and (not (zerop (decf times)))
		(not done))
      (with-timeout (mm-url-timeout)
	(unless first
	  (message "Trying again (%s)..." (- mm-url-retries times)))
	(setq first nil)
	(if follow-refresh
	    (save-restriction
	      (narrow-to-region (point) (point))
	      ;; Keep the result even when no refresh META is found.
	      (setq result (mm-url-insert-file-contents url))
	      (goto-char (point-min))
	      (when (re-search-forward
		     "<meta[ \t\r\n]*http-equiv=\"Refresh\"[^>]*URL=\\([^\"]+\\)\"" nil t)
		(let ((url (match-string 1)))
		  (delete-region (point-min) (point-max))
		  (setq result (mm-url-insert url t)))))
	  (setq result (mm-url-insert-file-contents url)))
	(setq done t)))
    result))

(defun mm-url-decode-entities ()
  "Decode all HTML entities in the current buffer.
Unknown entity names are replaced with a number sign."
  (goto-char (point-min))
  (while (re-search-forward "&\\(#[0-9]+\\|[a-z]+\\);" nil t)
    (let ((elem (if (eq (aref (match-string 1) 0) ?\#)
			(let ((c
			       (string-to-number (substring
						  (match-string 1) 1))))
			  (if (mm-char-or-char-int-p c) c 32))
		      (or (cdr (assq (intern (match-string 1))
				     mm-url-html-entities))
			  ?#))))
      (unless (stringp elem)
	(setq elem (char-to-string elem)))
      (replace-match elem t t))))

(defun mm-url-decode-entities-nbsp ()
  "Decode all HTML entities, turning &nbsp; into a plain space."
  (let ((mm-url-html-entities (cons '(nbsp . 32) mm-url-html-entities)))
    (mm-url-decode-entities)))

(defun mm-url-decode-entities-string (string)
  "Return STRING with all HTML entities decoded."
  (with-temp-buffer
    (insert string)
    (mm-url-decode-entities)
    (buffer-string)))
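
;; Usage sketch (not part of the original file): decoding entities in a
;; string.  The input strings are made up for illustration.  Named and
;; decimal entities are decoded, and an unknown name becomes "#".
;;
;;   (mm-url-decode-entities-string "a &lt; b &amp;&amp; c &gt; d")
;;   ;; => "a < b && c > d"
;;
;;   (mm-url-decode-entities-string "&#65;&bogus;")
;;   ;; => "A#"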

(defun mm-url-form-encode-xwfu (chunk)
  "Escape characters in CHUNK for application/x-www-form-urlencoded.
Blasphemous crap because someone didn't think %20 was good enough for encoding
spaces.  Die Die Die."
  ;; This will get rid of the 'attributes' specified by the file type,
  ;; which are useless for an application/x-www-form-urlencoded form.
  (if (consp chunk)
      (setq chunk (cdr chunk)))

  (mapconcat
   (lambda (char)
     (cond
      ((= char ?  ) "+")
      ((memq char mm-url-unreserved-chars) (char-to-string char))
      (t (upcase (format "%%%02x" char)))))
   ;; Fixme: Should this actually be accepting multibyte?  Is there a
   ;; better way in XEmacs?
   (if (featurep 'mule)
       (encode-coding-string chunk
			     (if (fboundp 'find-coding-systems-string)
				 (car (find-coding-systems-string chunk))
			       buffer-file-coding-system))
     chunk)
   ""))

(defun mm-url-encode-www-form-urlencoded (pairs)
  "Return PAIRS encoded as application/x-www-form-urlencoded data.
PAIRS is an alist of (NAME . VALUE) strings."
  (mapconcat
   (lambda (data)
     (concat (mm-url-form-encode-xwfu (car data)) "="
	     (mm-url-form-encode-xwfu (cdr data))))
   pairs "&"))
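
;; Usage sketch (not part of the original file): encoding an alist of
;; form fields.  The field names and values are made up.  Spaces become
;; "+", and characters outside `mm-url-unreserved-chars' are
;; percent-escaped.
;;
;;   (mm-url-encode-www-form-urlencoded
;;    '(("q" . "emacs lisp") ("op" . "a&b=c")))
;;   ;; => "q=emacs+lisp&op=a%26b%3Dc"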

(defun mm-url-fetch-form (url pairs)
  "Fetch a form from URL with PAIRS as the data using the POST method."
  (mm-url-load-url)
  (let ((url-request-data (mm-url-encode-www-form-urlencoded pairs))
	(url-request-method "POST")
	(url-request-extra-headers
	 '(("Content-type" . "application/x-www-form-urlencoded"))))
    (url-insert-file-contents url)
    (setq buffer-file-name nil))
  t)

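(defun mm-url-fetch-simple (url content)
  "Post CONTENT to URL and insert the reply in the current buffer.
CONTENT is sent as is, so it should already be form-encoded."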
  (mm-url-load-url)
  (let ((url-request-data content)
	(url-request-method "POST")
	(url-request-extra-headers
	 '(("Content-type" . "application/x-www-form-urlencoded"))))
    (url-insert-file-contents url)
    (setq buffer-file-name nil))
  t)

(defun mm-url-remove-markup ()
  "Remove all HTML markup, leaving just plain text."
  (goto-char (point-min))
  (while (search-forward "<!--" nil t)
    (delete-region (match-beginning 0)
		   (or (search-forward "-->" nil t)
		       (point-max))))
  (goto-char (point-min))
  (while (re-search-forward "<[^>]+>" nil t)
    (replace-match "" t t)))
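
;; Usage sketch (not part of the original file): stripping markup from
;; a made-up HTML fragment in a temporary buffer.  Comments are deleted
;; first, then every remaining tag.
;;
;;   (with-temp-buffer
;;     (insert "<p>Hello <!-- hidden --><b>world</b></p>")
;;     (mm-url-remove-markup)
;;     (buffer-string))
;;   ;; => "Hello world"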

(provide 'mm-url)

;;; arch-tag: 0594f9b3-417c-48b0-adc2-5082e1e7917f
;;; mm-url.el ends here