SpamAssassin Bayes Training

thomas10
Normal user
Posts: 132
Joined: 2013-10-30 03:13

Post by thomas10 » 2018-03-19 03:33

Hope someone can help me out with Bayes training.
1) "Message successfully un/learned."
Has it been learned? I am confused by 'un/learned'. As shown below:
[Attached image: spamd learn.jpg]
2) Some spam mails cannot be learned because they are greater than the max message size (512000 bytes). The email is larger than 512 KB, but it is spam. What should I do?
As shown below:
[Attached image: sa learn- skipped message.jpg]

Post by thomas10 » 2018-03-19 04:30

One additional question:
Which file types does trainbayes.bat support? I just tried mbx, eml and msg (files copied out of Outlook), and they all worked.

jimimaseye
Moderator
Posts: 10060
Joined: 2011-09-08 17:48

Post by jimimaseye » 2018-03-19 10:07

1) Message successfully un/learned.
Is it learned? Because I am confused on 'un/learned'. As shown below:
It has been LEARNED - just as you have requested it to be:

You ask for "S" - they will be un/LEARNed

You ask for "H" - it will be UN/LEARNed.

Think of the word "un/learned" as being a replacement for "whatever you asked it to be".

2) Some spam mails are unable to learn due to greater than max message size (512000 bytes). The size of the email is more than 512kb but it is spam. What should I do?
It exceeds the SPAMC maximum message size. Ignore it or increase your max message size:
https://spamassassin.apache.org/full/3. ... spamc.html
-s max_size, --max-size=max_size
Set the maximum message size which will be sent to spamd -- any bigger than this threshold and the message will be returned unprocessed (default: 500 KB). If spamc gets handed a message bigger than this, it won't be passed to spamd. The maximum message size is 256 MB.

The size is specified in bytes, as a positive integer greater than 0. For example, -s 500000
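
To see in advance which training messages would hit this limit, a minimal Python sketch follows. It assumes the 512000-byte figure reported earlier in the thread (500 x 1024) and assumes "bigger than this threshold" means strictly greater; the function names and the folder argument are hypothetical and are not part of SpamAssassin.

```python
from __future__ import annotations

import os

# 500 KB expressed in bytes; matches the 512000-byte limit reported in this thread.
DEFAULT_MAX_SIZE = 500 * 1024


def is_too_big(size_in_bytes: int, max_size: int = DEFAULT_MAX_SIZE) -> bool:
    """Return True when a message is bigger than the threshold."""
    return size_in_bytes > max_size


def find_oversized(folder: str, max_size: int = DEFAULT_MAX_SIZE) -> list[str]:
    """List the files in `folder` whose size exceeds `max_size` bytes."""
    oversized = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path) and is_too_big(os.path.getsize(path), max_size):
            oversized.append(path)
    return oversized
```

Calling `find_oversized` on a folder of saved spam with the default limit, and again with a larger `max_size`, shows which messages would only be accepted after raising the limit.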
5.7 on test.
SpamassassinForWindows 3.4.0 spamd service
AV: Clamwin + Clamd service + sanesecurity defs : https://www.hmailserver.com/forum/viewtopic.php?f=21&t=26829

Post by thomas10 » 2018-03-20 09:42

2) Some spam mails are unable to learn due to greater than max message size (512000 bytes). The size of the email is more than 512kb but it is spam. What should I do?
It exceeds the SPAMC maximum message size. Ignore it or increase your max message size:
https://spamassassin.apache.org/full/3. ... spamc.html
-s max_size, --max-size=max_size
Set the maximum message size which will be sent to spamd -- any bigger than this threshold and the message will be returned unprocessed (default: 500 KB). If spamc gets handed a message bigger than this, it won't be passed to spamd. The maximum message size is 256 MB.

The size is specified in bytes, as a positive integer greater than 0. For example, -s 500000
I understand about setting the max size, but how do I do it?
I tried cmd -> cd to the spamc location -> then typed spamc -s 700000
But after I press Enter, nothing happens.

I also tried sa-learn --max-size 700000, but it is still not working.

Post by jimimaseye » 2018-03-20 10:07

As you will have read from the documentation link I posted, the use of spamc is:

spamc [options] < message

This is what is supplied in the bat file you are running.

It also states:
CONFIGURATION FILE

The above command-line switches can also be loaded from a configuration file.

The format of the file is similar to the SpamAssassin rules files; blank lines and lines beginning with # are ignored. Any space-separated words are considered additions to the command line, and are prepended. Newlines are treated as equivalent to spaces. Existing command line switches will override any settings in the configuration file.

If the -F switch is specified, that file will be used. Otherwise, spamc will attempt to load spamc.conf in SYSCONFDIR (default: /etc/mail/spamassassin). If that file doesn't exist, and the -F switch is not specified, no configuration file will be read.

Example:

# spamc global configuration file

# connect to "server.example.com", port 783
-d server.example.com
-p 783

# max message size for scanning = 350k
-s 350000
So create a spamc.conf with your parameters in it.
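
To make the file format described above concrete, here is a rough Python sketch of how such a file turns into command-line words. It is an illustration of the quoted description only, not spamc's actual parser, and the function names are made up for this example.

```python
def parse_spamc_conf(text):
    """Turn spamc.conf-style text into a list of command-line words."""
    words = []
    for line in text.splitlines():
        stripped = line.strip()
        # Blank lines and lines beginning with # are ignored
        # (leading whitespace is trimmed first, which is an assumption).
        if not stripped or stripped.startswith("#"):
            continue
        # Any space-separated words become additions to the command line.
        words.extend(stripped.split())
    return words


def combine_with_command_line(conf_text, cli_words):
    """Prepend the config words, so explicit switches come later in the list."""
    return parse_spamc_conf(conf_text) + list(cli_words)


EXAMPLE_CONF = """\
# spamc global configuration file

# connect to "server.example.com", port 783
-d server.example.com
-p 783

# max message size for scanning = 350k
-s 350000
"""

print(parse_spamc_conf(EXAMPLE_CONF))
print(combine_with_command_line(EXAMPLE_CONF, ["-s", "700000"]))
```

The second call only models the ordering (config words first, command-line switches after); the rule that command-line switches override the file is taken from the documentation quoted above.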

(Tip: I don't know the answer to your questions. I've never used it or seen it. I simply did a bit of reading of the link I already supplied. Something, I'm sure, you could do yourself to come to the same conclusion and speed up achieving what you want.)

Post by thomas10 » 2018-03-20 10:17

OK, noted. I will try it next week and see the outcome. :D

One last question: does Bayes learning support .msg files (Outlook message files)?
I tried it yesterday and it seemed to work.

If so, I have no need to worry about converting.

Post by jimimaseye » 2018-03-20 10:29

I'm guessing yes.

Post by thomas10 » 2018-03-26 05:43

Update:

I tried the code below in spamc.conf or spamc.cf in /etc/spamassassin, but the max size is still the same. :cry:

# spamc global configuration file
# max message size for scanning = 700k
-s 700000

Post by thomas10 » 2018-03-26 06:17

Update:

I finally found the answer in this link.
viewtopic.php?f=22&t=26750&p=163734&hilit=max#p163968

Need to edit the trainbayes.bat file to set the max message size. :lol:
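
The contents of trainbayes.bat are not shown in this thread, so the following Python sketch only illustrates the general idea: the size switch has to be on the same spamc command line that receives the message on standard input. It assumes the bat file calls spamc with its learn switch (`-L spam` or `-L ham`) and that `spamc` is on the PATH; the function names are hypothetical.

```python
import subprocess


def build_spamc_command(learn_type, max_size):
    """Build a spamc command line that learns a message with a raised size limit."""
    if learn_type not in ("spam", "ham", "forget"):
        raise ValueError("learn_type must be 'spam', 'ham' or 'forget'")
    if max_size <= 0:
        raise ValueError("max_size must be a positive number of bytes")
    # -L selects the learn type, -s sets the maximum message size in bytes.
    return ["spamc", "-L", learn_type, "-s", str(max_size)]


def learn_message(path, learn_type="spam", max_size=700000):
    """Feed one message file to spamc on standard input and return its reply."""
    with open(path, "rb") as message:
        result = subprocess.run(
            build_spamc_command(learn_type, max_size),
            stdin=message,
            capture_output=True,
            check=False,
        )
    return result.stdout.decode(errors="replace").strip()
```

This also suggests why typing `spamc -s 700000` on its own appeared to do nothing: with no message redirected in, spamc is presumably just waiting for input.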

Post by jimimaseye » 2018-03-26 08:56

thomas10 wrote:
2018-03-26 06:17
Update:

Finally found the answer from this link.
viewtopic.php?f=22&t=26750&p=163734&hilit=max#p163968

Need to edit the trainbayes.bat file to set the max message size. :lol:

Yes, that is what I was suggesting when I said:

jimimaseye wrote:
2018-03-20 10:07
As you will have read from the documentation link I posted, the use of spamc is:

spamc [options] < message

This is what is supplied in the bat file you are running.

Isn't expecting spam messages to be greater than 512 KB a little excessive, and therefore possibly wasting the effort of recording it as Ham and filling the database? (i.e., if it's greater than 512 KB, it's probably ham.) See here for stats: https://securelist.com/spam-and-phishin ... email-size

Post by thomas10 » 2018-03-26 09:59

Jimi, you are right indeed. I misread what you said about the bat file I am running. Silly me :oops:
Thanks for your suggestion, Jimi.
Below is a sample of the spam I found here; its size is 702 KB.
[Attached image: sample.jpg]

Post by jimimaseye » 2018-03-26 10:13

Fair enough.

I do note, however, that it is already identified as spam (by a long way - I don't expect anything over 2.5 and have 3 as my threshold). Out of interest, what are the SpamAssassin headers/report? (Don't the headers already say BAYES probability > 80%?)

Post by thomas10 » 2018-03-26 10:38

jimimaseye wrote:
2018-03-26 10:13
Fair enough.

I do note, however, that it is already identified as spam (by a long way - I don't expect anything over 2.5 and have 3 as my threshold). Out of interest, what are the SpamAssassin headers/report? (Don't the headers already say BAYES probability > 80%?)
You are right indeed, the probability is >80%.
Below is the header of the spam mail.

Return-Path: postmaster@enerzia.com
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on GCF
X-Spam-Flag: YES
X-Spam-Level: *****
X-Spam-Status: Yes, score=5.3 required=5.0 tests=BAYES_00,DEAR_NOBODY, HTML_MESSAGE,KHOP_DNSBL_BUMP,MIME_HTML_ONLY,RAZOR2_CF_RANGE_51_100,
 RAZOR2_CHECK,RCVD_IN_HOSTKARMA_BL,URIBL_BLOCKED autolearn=no
 autolearn_force=no version=3.4.1
X-Spam-Report: *  0.0 URIBL_BLOCKED ADMINISTRATOR NOTICE: The query to URIBL was blocked. *
       See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block * 
     for more information. *      [URIs: enerzia.com] *  1.7
 RCVD_IN_HOSTKARMA_BL RBL: HostKarma: relay in black list *      [86.106.131.194
 listed in hostkarma.junkemailfilter.com] *  0.0 DEAR_NOBODY BODY: Message
 contains Dear but with no name *  0.0 HTML_MESSAGE BODY: HTML included in
 message *  0.7 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
 * -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% *      [score: 0.0001]
 *  1.9 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50% *   
 [cf: 100] *  0.9 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/) * 
 2.0 KHOP_DNSBL_BUMP Hits a trusted non-overlapping DNSBL
Received: from slot0.chinatraders.trade (slot0.chinatraders.trade [86.106.131.194]) by
 global-gp.com with ESMTP ; Tue, 20 Mar 2018 18:39:53 +0800
From: Joseph Raj <postmaster@enerzia.com>
To: nicole.lim@pkg.global-gp.com
Subject: [SPAM] [5.3] bank_slip_$340002
Date: 20 Mar 2018 03:40:00 -0700
Message-ID: <20180320034000.CE3591466D1B35AB@enerzia.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_NextPart_000_0012_690EBCC9.9FA2C39F"
X-Spam-Prev-Subject: bank_slip_$340002
X-hMailServer-Spam: YES
X-hMailServer-Reason-1: The host name specified in HELO does not match IP address. - (Score: 2)
X-hMailServer-Reason-2: Tagged as Spam by SpamAssassin - (Score: 5)
X-hMailServer-Reason-Score: 7
X-hMailServer-LoopCount: 1
I have set up HMS by ticking Use SA score, with 5 and a delete threshold of 8. With this setting I find around 1-2 FP mails per week, which I can then have Bayes learn.
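
For anyone wanting to check such headers without reading them by eye, here is a small Python sketch that pulls the score, the required score and the rule names out of the X-Spam-Status value posted above. The parsing approach and the function name are only an illustration, not a SpamAssassin API. Note that the rule it finds is BAYES_00, which the report describes as "Bayes spam probability is 0 to 1%".

```python
import re

# The X-Spam-Status value from the header above, including its folded lines.
X_SPAM_STATUS = (
    "Yes, score=5.3 required=5.0 tests=BAYES_00,DEAR_NOBODY, HTML_MESSAGE,"
    "KHOP_DNSBL_BUMP,MIME_HTML_ONLY,RAZOR2_CF_RANGE_51_100,\n"
    " RAZOR2_CHECK,RCVD_IN_HOSTKARMA_BL,URIBL_BLOCKED autolearn=no\n"
    " autolearn_force=no version=3.4.1"
)


def parse_spam_status(value):
    """Extract the verdict, scores and rule names from an X-Spam-Status value."""
    score = float(re.search(r"score=(-?\d+(?:\.\d+)?)", value).group(1))
    required = float(re.search(r"required=(-?\d+(?:\.\d+)?)", value).group(1))
    # The rule list runs from "tests=" up to the "autolearn=" field and may be folded.
    tests_match = re.search(r"tests=(.*?)\s+autolearn=", value, re.DOTALL)
    tests = [name.strip() for name in tests_match.group(1).split(",") if name.strip()]
    return {
        "is_spam": value.lstrip().lower().startswith("yes"),
        "score": score,
        "required": required,
        "tests": tests,
    }


status = parse_spam_status(X_SPAM_STATUS)
bayes_rules = [name for name in status["tests"] if name.startswith("BAYES_")]
print(status["score"], status["required"], bayes_rules)
```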
