56927
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
1 ;;; spam-stat.el --- detecting spam based on statistics
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
2
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
3 ;; Copyright (C) 2002, 2003, 2004 Free Software Foundation, Inc.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
4
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
5 ;; Author: Alex Schroeder <alex@gnu.org>
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
6 ;; Keywords: network
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
7 ;; URL: http://www.emacswiki.org/cgi-bin/wiki.pl?SpamStat
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
8
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
9 ;; This file is part of GNU Emacs.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
10
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
11 ;; This is free software; you can redistribute it and/or modify it
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
12 ;; under the terms of the GNU General Public License as published by
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
13 ;; the Free Software Foundation; either version 2, or (at your option)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
14 ;; any later version.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
15
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
16 ;; This is distributed in the hope that it will be useful, but WITHOUT
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
17 ;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
18 ;; or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
19 ;; License for more details.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
20
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
21 ;; You should have received a copy of the GNU General Public License
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
22 ;; along with GNU Emacs; see the file COPYING. If not, write to the
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
23 ;; Free Software Foundation, Inc., 59 Temple Place - Suite 330,
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
24 ;; Boston, MA 02111-1307, USA.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
25
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
26 ;;; Commentary:
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
27
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
28 ;; This implements spam analysis according to Paul Graham in "A Plan
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
29 ;; for Spam". The basis for all this is a statistical distribution of
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
30 ;; words for your spam and non-spam mails. We need this information
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
31 ;; in a hash-table so that the analysis can use the information when
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
32 ;; looking at your mails. Therefore, before you begin, you need tons
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
33 ;; of mails (Graham uses 4000 non-spam and 4000 spam mails for his
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
34 ;; experiments).
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
35 ;;
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
36 ;; The main interface to using spam-stat, are the following functions:
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
37 ;;
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
38 ;; `spam-stat-buffer-is-spam' -- called in a buffer, that buffer is
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
39 ;; considered to be a new spam mail; use this for new mail that has
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
40 ;; not been processed before
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
41 ;;
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
42 ;; `spam-stat-buffer-is-non-spam' -- called in a buffer, that buffer
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
43 ;; is considered to be a new non-spam mail; use this for new mail that
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
44 ;; has not been processed before
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
45 ;;
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
46 ;; `spam-stat-buffer-change-to-spam' -- called in a buffer, that
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
47 ;; buffer is no longer considered to be normal mail but spam; use this
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
48 ;; to change the status of a mail that has already been processed as
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
49 ;; non-spam
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
50 ;;
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
51 ;; `spam-stat-buffer-change-to-non-spam' -- called in a buffer, that
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
52 ;; buffer is no longer considered to be spam but normal mail; use this
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
53 ;; to change the status of a mail that has already been processed as
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
54 ;; spam
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
55 ;;
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
56 ;; `spam-stat-save' -- save the hash table to the file; the filename
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
57 ;; used is stored in the variable `spam-stat-file'
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
58 ;;
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
59 ;; `spam-stat-load' -- load the hash table from a file; the filename
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
60 ;; used is stored in the variable `spam-stat-file'
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
61 ;;
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
62 ;; `spam-stat-score-word' -- return the spam score for a word
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
63 ;;
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
64 ;; `spam-stat-score-buffer' -- return the spam score for a buffer
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
65 ;;
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
66 ;; `spam-stat-split-fancy' -- for fancy mail splitting; add
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
67 ;; the rule (: spam-stat-split-fancy) to `nnmail-split-fancy'
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
68 ;;
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
69 ;; This requires the following in your ~/.gnus file:
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
70 ;;
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
71 ;; (require 'spam-stat)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
72 ;; (spam-stat-load)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
73
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
74 ;;; Testing:
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
75
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
76 ;; Typical test will involve calls to the following functions:
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
77 ;;
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
78 ;; Reset: (spam-stat-reset)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
79 ;; Learn spam: (spam-stat-process-spam-directory "~/Mail/mail/spam")
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
80 ;; Learn non-spam: (spam-stat-process-non-spam-directory "~/Mail/mail/misc")
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
81 ;; Save table: (spam-stat-save)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
82 ;; File size: (nth 7 (file-attributes spam-stat-file))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
83 ;; Number of words: (hash-table-count spam-stat)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
84 ;; Test spam: (spam-stat-test-directory "~/Mail/mail/spam")
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
85 ;; Test non-spam: (spam-stat-test-directory "~/Mail/mail/misc")
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
86 ;; Reduce table size: (spam-stat-reduce-size)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
87 ;; Save table: (spam-stat-save)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
88 ;; File size: (nth 7 (file-attributes spam-stat-file))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
89 ;; Number of words: (hash-table-count spam-stat)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
90 ;; Test spam: (spam-stat-test-directory "~/Mail/mail/spam")
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
91 ;; Test non-spam: (spam-stat-test-directory "~/Mail/mail/misc")
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
92
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
93 ;;; Dictionary Creation:
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
94
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
95 ;; Typically, you will filter away mailing lists etc. using specific
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
96 ;; rules in `nnmail-split-fancy'. Somewhere among these rules, you
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
97 ;; will filter spam. Here is how you would create your dictionary:
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
98
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
99 ;; Reset: (spam-stat-reset)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
100 ;; Learn spam: (spam-stat-process-spam-directory "~/Mail/mail/spam")
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
101 ;; Learn non-spam: (spam-stat-process-non-spam-directory "~/Mail/mail/misc")
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
102 ;; Repeat for any other non-spam group you need...
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
103 ;; Reduce table size: (spam-stat-reduce-size)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
104 ;; Save table: (spam-stat-save)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
105
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
106 ;;; Todo:
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
107
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
108 ;; Speed it up. Integrate with Gnus such that it uses spam and expiry
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
109 ;; marks to call the appropriate functions when leaving the summary
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
110 ;; buffer and saves the hash table when leaving Gnus. More testing:
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
111 ;; More mails, disabling SpamAssassin, double checking algorithm, find
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
112 ;; improved algorithm.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
113
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
114 ;;; Thanks:
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
115
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
116 ;; Ted Zlatanov <tzz@lifelogs.com>
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
117 ;; Jesper Harder <harder@myrealbox.com>
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
118 ;; Dan Schmidt <dfan@dfan.org>
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
119 ;; Lasse Rasinen <lrasinen@iki.fi>
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
120 ;; Milan Zamazal <pdm@zamazal.org>
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
121
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
122
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
123
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
124 ;;; Code:
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
125
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
126 (defgroup spam-stat nil
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
127 "Statistical spam detection for Emacs.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
128 Use the functions to build a dictionary of words and their statistical
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
129 distribution in spam and non-spam mails. Then use a function to determine
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
130 whether a buffer contains spam or not."
|
59996
|
131 :version "22.1"
|
56927
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
132 :group 'gnus)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
133
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
134 (defcustom spam-stat-file "~/.spam-stat.el"
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
135 "File used to save and load the dictionary.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
136 See `spam-stat-to-hash-table' for the format of the file."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
137 :type 'file
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
138 :group 'spam-stat)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
139
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
140 (defcustom spam-stat-install-hooks t
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
141 "Whether spam-stat should install its hooks in Gnus.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
142 This is set to nil if you use spam-stat through spam.el."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
143 :type 'boolean
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
144 :group 'spam-stat)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
145
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
146 (defcustom spam-stat-unknown-word-score 0.2
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
147 "The score to use for unknown words.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
148 Also used for words that don't appear often enough."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
149 :type 'number
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
150 :group 'spam-stat)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
151
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
152 (defcustom spam-stat-max-word-length 15
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
153 "Only words shorter than this will be considered."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
154 :type 'integer
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
155 :group 'spam-stat)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
156
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
157 (defcustom spam-stat-max-buffer-length 10240
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
158 "Only the beginning of buffers will be analyzed.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
159 This variable says how many characters this will be."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
160 :type 'integer
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
161 :group 'spam-stat)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
162
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
163 (defcustom spam-stat-split-fancy-spam-group "mail.spam"
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
164 "Name of the group where spam should be stored, if
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
165 `spam-stat-split-fancy' is used in fancy splitting rules. Has no
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
166 effect when spam-stat is invoked through spam.el."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
167 :type 'string
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
168 :group 'spam-stat)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
169
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
170 (defcustom spam-stat-split-fancy-spam-threshhold 0.9
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
171 "Spam score threshhold in spam-stat-split-fancy."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
172 :type 'number
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
173 :group 'spam-stat)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
174
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
175 (defvar spam-stat-syntax-table
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
176 (let ((table (copy-syntax-table text-mode-syntax-table)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
177 (modify-syntax-entry ?- "w" table)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
178 (modify-syntax-entry ?_ "w" table)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
179 (modify-syntax-entry ?. "w" table)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
180 (modify-syntax-entry ?! "w" table)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
181 (modify-syntax-entry ?? "w" table)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
182 (modify-syntax-entry ?+ "w" table)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
183 table)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
184 "Syntax table used when processing mails for statistical analysis.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
185 The important part is which characters are word constituents.")
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
186
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
187 (defvar spam-stat-dirty nil
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
188 "Whether the spam-stat database needs saving.")
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
189
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
190 (defvar spam-stat-buffer nil
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
191 "Buffer to use for scoring while splitting.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
192 This is set by hooking into Gnus.")
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
193
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
194 (defvar spam-stat-buffer-name " *spam stat buffer*"
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
195 "Name of the `spam-stat-buffer'.")
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
196
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
197 ;; Functions missing in Emacs 20
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
198
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
199 (when (memq nil (mapcar 'fboundp
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
200 '(gethash hash-table-count make-hash-table
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
201 mapc puthash)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
202 (require 'cl)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
203 (unless (fboundp 'puthash)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
204 ;; alias puthash is missing from Emacs 20 cl-extra.el
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
205 (defalias 'puthash 'cl-puthash)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
206
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
207 (eval-when-compile
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
208 (unless (fboundp 'with-syntax-table)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
209 ;; Imported from Emacs 21.2
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
210 (defmacro with-syntax-table (table &rest body) "\
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
211 Evaluate BODY with syntax table of current buffer set to a copy of TABLE.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
212 The syntax table of the current buffer is saved, BODY is evaluated, and the
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
213 saved table is restored, even in case of an abnormal exit.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
214 Value is what BODY returns."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
215 (let ((old-table (make-symbol "table"))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
216 (old-buffer (make-symbol "buffer")))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
217 `(let ((,old-table (syntax-table))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
218 (,old-buffer (current-buffer)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
219 (unwind-protect
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
220 (progn
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
221 (set-syntax-table (copy-syntax-table ,table))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
222 ,@body)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
223 (save-current-buffer
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
224 (set-buffer ,old-buffer)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
225 (set-syntax-table ,old-table))))))))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
226
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
227 ;; Hooking into Gnus
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
228
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
229 (defun spam-stat-store-current-buffer ()
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
230 "Store a copy of the current buffer in `spam-stat-buffer'."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
231 (save-excursion
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
232 (let ((str (buffer-string)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
233 (set-buffer (get-buffer-create spam-stat-buffer-name))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
234 (erase-buffer)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
235 (insert str)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
236 (setq spam-stat-buffer (current-buffer)))))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
237
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
238 (defun spam-stat-store-gnus-article-buffer ()
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
239 "Store a copy of the current article in `spam-stat-buffer'.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
240 This uses `gnus-article-buffer'."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
241 (save-excursion
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
242 (set-buffer gnus-original-article-buffer)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
243 (spam-stat-store-current-buffer)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
244
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
245 ;; Data -- not using defstruct in order to save space and time
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
246
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
247 (defvar spam-stat (make-hash-table :test 'equal)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
248 "Hash table used to store the statistics.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
249 Use `spam-stat-load' to load the file.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
250 Every word is used as a key in this table. The value is a vector.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
251 Use `spam-stat-ngood', `spam-stat-nbad', `spam-stat-good',
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
252 `spam-stat-bad', and `spam-stat-score' to access this vector.")
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
253
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
254 (defvar spam-stat-ngood 0
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
255 "The number of good mails in the dictionary.")
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
256
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
257 (defvar spam-stat-nbad 0
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
258 "The number of bad mails in the dictionary.")
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
259
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
260 (defsubst spam-stat-good (entry)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
261 "Return the number of times this word belongs to good mails."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
262 (aref entry 0))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
263
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
264 (defsubst spam-stat-bad (entry)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
265 "Return the number of times this word belongs to bad mails."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
266 (aref entry 1))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
267
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
268 (defsubst spam-stat-score (entry)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
269 "Set the score of this word."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
270 (if entry
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
271 (aref entry 2)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
272 spam-stat-unknown-word-score))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
273
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
274 (defsubst spam-stat-set-good (entry value)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
275 "Set the number of times this word belongs to good mails."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
276 (aset entry 0 value))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
277
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
278 (defsubst spam-stat-set-bad (entry value)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
279 "Set the number of times this word belongs to bad mails."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
280 (aset entry 1 value))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
281
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
282 (defsubst spam-stat-set-score (entry value)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
283 "Set the score of this word."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
284 (aset entry 2 value))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
285
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
286 (defsubst spam-stat-make-entry (good bad)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
287 "Return a vector with the given properties."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
288 (let ((entry (vector good bad nil)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
289 (spam-stat-set-score entry (spam-stat-compute-score entry))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
290 entry))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
291
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
292 ;; Computing
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
293
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
294 (defun spam-stat-compute-score (entry)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
295 "Compute the score of this word. 1.0 means spam."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
296 ;; promote all numbers to floats for the divisions
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
297 (let* ((g (* 2.0 (spam-stat-good entry)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
298 (b (float (spam-stat-bad entry))))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
299 (cond ((< (+ g b) 5)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
300 .2)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
301 ((= 0 spam-stat-ngood)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
302 .99)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
303 ((= 0 spam-stat-nbad)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
304 .01)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
305 (t
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
306 (max .01
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
307 (min .99 (/ (/ b spam-stat-nbad)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
308 (+ (/ g spam-stat-ngood)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
309 (/ b spam-stat-nbad)))))))))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
310
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
311 ;; Parsing
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
312
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
313 (defmacro with-spam-stat-max-buffer-size (&rest body)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
314 "Narrows the buffer down to the first 4k characters, then evaluates BODY."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
315 `(save-restriction
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
316 (when (> (- (point-max)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
317 (point-min))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
318 spam-stat-max-buffer-length)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
319 (narrow-to-region (point-min)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
320 (+ (point-min) spam-stat-max-buffer-length)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
321 ,@body))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
322
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
323 (defun spam-stat-buffer-words ()
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
324 "Return a hash table of words and number of occurences in the buffer."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
325 (with-spam-stat-max-buffer-size
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
326 (with-syntax-table spam-stat-syntax-table
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
327 (goto-char (point-min))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
328 (let ((result (make-hash-table :test 'equal))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
329 word count)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
330 (while (re-search-forward "\\w+" nil t)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
331 (setq word (match-string-no-properties 0)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
332 count (1+ (gethash word result 0)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
333 (when (< (length word) spam-stat-max-word-length)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
334 (puthash word count result)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
335 result))))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
336
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
337 (defun spam-stat-buffer-is-spam ()
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
338 "Consider current buffer to be a new spam mail."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
339 (setq spam-stat-nbad (1+ spam-stat-nbad))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
340 (maphash
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
341 (lambda (word count)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
342 (let ((entry (gethash word spam-stat)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
343 (if entry
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
344 (spam-stat-set-bad entry (+ count (spam-stat-bad entry)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
345 (setq entry (spam-stat-make-entry 0 count)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
346 (spam-stat-set-score entry (spam-stat-compute-score entry))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
347 (puthash word entry spam-stat)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
348 (spam-stat-buffer-words))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
349 (setq spam-stat-dirty t))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
350
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
351 (defun spam-stat-buffer-is-non-spam ()
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
352 "Consider current buffer to be a new non-spam mail."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
353 (setq spam-stat-ngood (1+ spam-stat-ngood))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
354 (maphash
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
355 (lambda (word count)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
356 (let ((entry (gethash word spam-stat)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
357 (if entry
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
358 (spam-stat-set-good entry (+ count (spam-stat-good entry)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
359 (setq entry (spam-stat-make-entry count 0)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
360 (spam-stat-set-score entry (spam-stat-compute-score entry))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
361 (puthash word entry spam-stat)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
362 (spam-stat-buffer-words))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
363 (setq spam-stat-dirty t))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
364
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
365 (defun spam-stat-buffer-change-to-spam ()
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
366 "Consider current buffer no longer normal mail but spam."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
367 (setq spam-stat-nbad (1+ spam-stat-nbad)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
368 spam-stat-ngood (1- spam-stat-ngood))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
369 (maphash
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
370 (lambda (word count)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
371 (let ((entry (gethash word spam-stat)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
372 (if (not entry)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
373 (error "This buffer has unknown words in it.")
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
374 (spam-stat-set-good entry (- (spam-stat-good entry) count))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
375 (spam-stat-set-bad entry (+ (spam-stat-bad entry) count))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
376 (spam-stat-set-score entry (spam-stat-compute-score entry))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
377 (puthash word entry spam-stat))))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
378 (spam-stat-buffer-words))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
379 (setq spam-stat-dirty t))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
380
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
381 (defun spam-stat-buffer-change-to-non-spam ()
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
382 "Consider current buffer no longer spam but normal mail."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
383 (setq spam-stat-nbad (1- spam-stat-nbad)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
384 spam-stat-ngood (1+ spam-stat-ngood))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
385 (maphash
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
386 (lambda (word count)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
387 (let ((entry (gethash word spam-stat)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
388 (if (not entry)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
389 (error "This buffer has unknown words in it.")
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
390 (spam-stat-set-good entry (+ (spam-stat-good entry) count))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
391 (spam-stat-set-bad entry (- (spam-stat-bad entry) count))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
392 (spam-stat-set-score entry (spam-stat-compute-score entry))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
393 (puthash word entry spam-stat))))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
394 (spam-stat-buffer-words))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
395 (setq spam-stat-dirty t))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
396
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
397 ;; Saving and Loading
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
398
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
399 (defun spam-stat-save (&optional force)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
400 "Save the `spam-stat' hash table as lisp file.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
401 With a prefix argument save unconditionally."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
402 (interactive "P")
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
403 (when (or force spam-stat-dirty)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
404 (with-temp-buffer
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
405 (let ((standard-output (current-buffer))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
406 (font-lock-maximum-size 0))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
407 (insert "(setq spam-stat-ngood "
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
408 (number-to-string spam-stat-ngood)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
409 " spam-stat-nbad "
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
410 (number-to-string spam-stat-nbad)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
411 " spam-stat (spam-stat-to-hash-table '(")
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
412 (maphash (lambda (word entry)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
413 (prin1 (list word
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
414 (spam-stat-good entry)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
415 (spam-stat-bad entry))))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
416 spam-stat)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
417 (insert ")))")
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
418 (write-file spam-stat-file)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
419 (setq spam-stat-dirty nil)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
420
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
421 (defun spam-stat-load ()
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
422 "Read the `spam-stat' hash table from disk."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
423 ;; TODO: maybe we should warn the user if spam-stat-dirty is t?
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
424 (load-file spam-stat-file)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
425 (setq spam-stat-dirty nil))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
426
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
427 (defun spam-stat-to-hash-table (entries)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
428 "Turn list ENTRIES into a hash table and store as `spam-stat'.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
429 Every element in ENTRIES has the form \(WORD GOOD BAD) where WORD is
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
430 the word string, NGOOD is the number of good mails it has appeared in,
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
431 NBAD is the number of bad mails it has appeared in, GOOD is the number
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
432 of times it appeared in good mails, and BAD is the number of times it
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
433 has appeared in bad mails."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
434 (let ((table (make-hash-table :test 'equal)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
435 (mapc (lambda (l)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
436 (puthash (car l)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
437 (spam-stat-make-entry (nth 1 l) (nth 2 l))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
438 table))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
439 entries)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
440 table))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
441
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
442 (defun spam-stat-reset ()
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
443 "Reset `spam-stat' to an empty hash-table.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
444 This deletes all the statistics."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
445 (interactive)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
446 (setq spam-stat (make-hash-table :test 'equal)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
447 spam-stat-ngood 0
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
448 spam-stat-nbad 0)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
449 (setq spam-stat-dirty t))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
450
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
451 ;; Scoring buffers
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
452
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
453 (defvar spam-stat-score-data nil
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
454 "Raw data used in the last run of `spam-stat-score-buffer'.")
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
455
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
456 (defsubst spam-stat-score-word (word)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
457 "Return score for WORD.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
458 The default score for unknown words is stored in
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
459 `spam-stat-unknown-word-score'."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
460 (spam-stat-score (gethash word spam-stat)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
461
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
462 (defun spam-stat-buffer-words-with-scores ()
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
463 "Process current buffer, return the 15 most conspicuous words.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
464 These are the words whose spam-stat differs the most from 0.5.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
465 The list returned contains elements of the form \(WORD SCORE DIFF),
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
466 where DIFF is the difference between SCORE and 0.5."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
467 (with-spam-stat-max-buffer-size
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
468 (with-syntax-table spam-stat-syntax-table
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
469 (let (result word score)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
470 (maphash (lambda (word ignore)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
471 (setq score (spam-stat-score-word word)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
472 result (cons (list word score (abs (- score 0.5)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
473 result)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
474 (spam-stat-buffer-words))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
475 (setq result (sort result (lambda (a b) (< (nth 2 b) (nth 2 a)))))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
476 (setcdr (nthcdr 14 result) nil)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
477 result))))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
478
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
479 (defun spam-stat-score-buffer ()
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
480 "Return a score describing the spam-probability for this buffer."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
481 (setq spam-stat-score-data (spam-stat-buffer-words-with-scores))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
482 (let* ((probs (mapcar (lambda (e) (cadr e)) spam-stat-score-data))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
483 (prod (apply #'* probs)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
484 (/ prod (+ prod (apply #'* (mapcar #'(lambda (x) (- 1 x))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
485 probs))))))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
486
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
487 (defun spam-stat-split-fancy ()
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
488 "Return the name of the spam group if the current mail is spam.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
489 Use this function on `nnmail-split-fancy'. If you are interested in
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
490 the raw data used for the last run of `spam-stat-score-buffer',
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
491 check the variable `spam-stat-score-data'."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
492 (condition-case var
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
493 (progn
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
494 (set-buffer spam-stat-buffer)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
495 (goto-char (point-min))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
496 (when (> (spam-stat-score-buffer) spam-stat-split-fancy-spam-threshhold)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
497 (when (boundp 'nnmail-split-trace)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
498 (mapc (lambda (entry)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
499 (push entry nnmail-split-trace))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
500 spam-stat-score-data))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
501 spam-stat-split-fancy-spam-group))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
502 (error (message "Error in spam-stat-split-fancy: %S" var)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
503 nil)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
504
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
505 ;; Testing
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
506
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
507 (defun spam-stat-process-directory (dir func)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
508 "Process all the regular files in directory DIR using function FUNC."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
509 (let* ((files (directory-files dir t "^[^.]"))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
510 (max (/ (length files) 100.0))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
511 (count 0))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
512 (with-temp-buffer
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
513 (dolist (f files)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
514 (when (and (file-readable-p f)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
515 (file-regular-p f)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
516 (> (nth 7 (file-attributes f)) 0))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
517 (setq count (1+ count))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
518 (message "Reading %s: %.2f%%" dir (/ count max))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
519 (insert-file-contents f)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
520 (funcall func)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
521 (erase-buffer))))))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
522
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
523 (defun spam-stat-process-spam-directory (dir)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
524 "Process all the regular files in directory DIR as spam."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
525 (interactive "D")
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
526 (spam-stat-process-directory dir 'spam-stat-buffer-is-spam))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
527
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
528 (defun spam-stat-process-non-spam-directory (dir)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
529 "Process all the regular files in directory DIR as non-spam."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
530 (interactive "D")
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
531 (spam-stat-process-directory dir 'spam-stat-buffer-is-non-spam))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
532
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
533 (defun spam-stat-count ()
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
534 "Return size of `spam-stat'."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
535 (interactive)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
536 (hash-table-count spam-stat))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
537
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
538 (defun spam-stat-test-directory (dir)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
539 "Test all the regular files in directory DIR for spam.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
540 If the result is 1.0, then all files are considered spam.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
541 If the result is 0.0, non of the files is considered spam.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
542 You can use this to determine error rates."
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
543 (interactive "D")
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
544 (let* ((files (directory-files dir t "^[^.]"))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
545 (total (length files))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
546 (score 0.0); float
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
547 (max (/ total 100.0)); float
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
548 (count 0))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
549 (with-temp-buffer
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
550 (dolist (f files)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
551 (when (and (file-readable-p f)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
552 (file-regular-p f)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
553 (> (nth 7 (file-attributes f)) 0))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
554 (setq count (1+ count))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
555 (message "Reading %.2f%%, score %.2f%%"
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
556 (/ count max) (/ score count))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
557 (insert-file-contents f)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
558 (when (> (spam-stat-score-buffer) 0.9)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
559 (setq score (1+ score)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
560 (erase-buffer))))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
561 (message "Final score: %d / %d = %f" score total (/ score total))))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
562
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
563 ;; Shrinking the dictionary
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
564
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
565 (defun spam-stat-reduce-size (&optional count)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
566 "Reduce the size of `spam-stat'.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
567 This removes all words that occur less than COUNT from the dictionary.
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
568 COUNT defaults to 5"
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
569 (interactive)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
570 (setq count (or count 5))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
571 (maphash (lambda (key entry)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
572 (when (< (+ (spam-stat-good entry)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
573 (spam-stat-bad entry))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
574 count)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
575 (remhash key spam-stat)))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
576 spam-stat)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
577 (setq spam-stat-dirty t))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
578
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
579 (defun spam-stat-install-hooks-function ()
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
580 "Install the spam-stat function hooks"
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
581 (interactive)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
582 (add-hook 'nnmail-prepare-incoming-message-hook
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
583 'spam-stat-store-current-buffer)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
584 (add-hook 'gnus-select-article-hook
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
585 'spam-stat-store-gnus-article-buffer))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
586
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
587 (when spam-stat-install-hooks
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
588 (spam-stat-install-hooks-function))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
589
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
590 (defun spam-stat-unload-hook ()
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
591 "Uninstall the spam-stat function hooks"
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
592 (interactive)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
593 (remove-hook 'nnmail-prepare-incoming-message-hook
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
594 'spam-stat-store-current-buffer)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
595 (remove-hook 'gnus-select-article-hook
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
596 'spam-stat-store-gnus-article-buffer))
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
597
|
57550
|
598 (add-hook 'spam-stat-unload-hook 'spam-stat-unload-hook)
|
|
599
|
56927
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
600 (provide 'spam-stat)
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
601
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
602 ;;; arch-tag: ff1d2200-8ddb-42fb-bb7b-1b5e20448554
|
55fd4f77387a
Revision: miles@gnu.org--gnu-2004/emacs--cvs-trunk--0--patch-523
Miles Bader <miles@gnu.org>
parents:
diff
changeset
|
603 ;;; spam-stat.el ends here
|