SpamAssassin Bayes Training
Hope someone can help me out on Bayes training.
1) Message successfully un/learned.
Is it learned? I am confused by 'un/learned'. As shown below:
2) Some spam mails cannot be learned because they are greater than the max message size (512000 bytes). The email is larger than 512 KB, but it is spam. What should I do?
As shown below:
Re: SpamAssassin Bayes Training
One more question:
Which file types does trainbayes.bat support? I just tried mbx, eml and msg (files copied out of Outlook), and it works.
- jimimaseye
- Moderator
- Posts: 10060
- Joined: 2011-09-08 17:48
Re: SpamAssassin Bayes Training
thomas10 wrote:
1) Message successfully un/learned.
Is it learned? Because I am confused on 'un/learned'. As shown below:
It has been LEARNED - just as you have requested it to be:
You ask for "S" - they will be un/LEARNed
You ask for "H" - it will be UN/LEARNed.
Think of the word "un/learned" as being a replacement for "whatever you asked it to be".
thomas10 wrote:
2) Some spam mails are unable to learn due to greater than max message size (512000 bytes). The size of the email is more than 512kb but it is spam. What should I do?
It exceeds the SPAMC maximum message size. Ignore it or increase your max message size:
https://spamassassin.apache.org/full/3. ... spamc.html
-s max_size, --max-size=max_size
Set the maximum message size which will be sent to spamd -- any bigger than this threshold and the message will be returned unprocessed (default: 500 KB). If spamc gets handed a message bigger than this, it won't be passed to spamd. The maximum message size is 256 MB.
The size is specified in bytes, as a positive integer greater than 0. For example, -s 500000
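Before raising the limit, it can help to see which saved messages are actually over it. The following is only a rough sketch: the 512000-byte default is taken from the error message quoted above, while the folder path and the "*.eml" pattern are assumptions for illustration and not anything trainbayes.bat defines.

```python
from pathlib import Path


def oversized_messages(folder, max_size=512000, pattern="*.eml"):
    """Return (file name, size in bytes) for files larger than max_size.

    max_size defaults to the 512000 bytes reported in the error message;
    folder and pattern are hypothetical and should be adapted to wherever
    the training messages are stored.
    """
    return sorted(
        (p.name, p.stat().st_size)
        for p in Path(folder).glob(pattern)
        if p.is_file() and p.stat().st_size > max_size
    )


# Example call (the path is an assumption):
# for name, size in oversized_messages(r"C:\spam-samples"):
#     print(name, size)
```

Anything this lists would be skipped at the current limit, which gives an idea of how far the -s value needs to be raised, if at all.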
5.7 on test.
SpamassassinForWindows 3.4.0 spamd service
AV: Clamwin + Clamd service + sanesecurity defs : https://www.hmailserver.com/forum/viewtopic.php?f=21&t=26829
Re: SpamAssassin Bayes Training
jimimaseye wrote:
It exceeds the SPAMC maximum message size. Ignore it or increase your max message size:
https://spamassassin.apache.org/full/3. ... spamc.html
-s max_size, --max-size=max_size
Set the maximum message size which will be sent to spamd -- any bigger than this threshold and the message will be returned unprocessed (default: 500 KB). If spamc gets handed a message bigger than this, it won't be passed to spamd. The maximum message size is 256 MB.
The size is specified in bytes, as a positive integer greater than 0. For example, -s 500000
Understood on setting the max, how do I do it?
I opened cmd, changed to the spamc directory, and typed spamc -s 700000
But after I pressed Enter, nothing appeared.
I also tried sa-learn --max-size 700000, but it still did not work.
- jimimaseye
- Moderator
- Posts: 10060
- Joined: 2011-09-08 17:48
Re: SpamAssassin Bayes Training
As you will have read from the documentation link I posted, the use of spamc is:
spamc [options] < message
This is what is supplied in the bat file you are running.
It also states:
CONFIGURATION FILE
The above command-line switches can also be loaded from a configuration file.
The format of the file is similar to the SpamAssassin rules files; blank lines and lines beginning with # are ignored. Any space-separated words are considered additions to the command line, and are prepended. Newlines are treated as equivalent to spaces. Existing command line switches will override any settings in the configuration file.
If the -F switch is specified, that file will be used. Otherwise, spamc will attempt to load spamc.conf in SYSCONFDIR (default: /etc/mail/spamassassin). If that file doesn't exist, and the -F switch is not specified, no configuration file will be read.
Example:
# spamc global configuration file
# connect to "server.example.com", port 783
-d server.example.com
-p 783
# max message size for scanning = 350k
-s 350000
So create a spamc.conf with your parameters in it.
(Tip: I don't know the answer to your questions. I've never used it or seen it. I just simply did a bit of reading of the link I already supplied. Something, I'm sure, you could do yourself and come to the same conclusion and speed up achieving what you want).
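To make the quoted file format concrete, here is a small Python sketch of how such a file reduces to command-line words. It only illustrates the rules in the documentation excerpt (skip blank lines and # comments, treat everything else as space-separated switches); it is not spamc's own parser, and the function name is made up for this example.

```python
def conf_to_args(text):
    """Turn spamc.conf-style text into a list of command-line words.

    Follows the documented rules: blank lines and lines beginning with #
    are ignored, and newlines count as spaces. Stripping leading
    whitespace before checking for # is an assumption of this sketch.
    """
    args = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        args.extend(stripped.split())
    return args


EXAMPLE = """\
# spamc global configuration file
# max message size for scanning = 700k
-s 700000
"""

print(conf_to_args(EXAMPLE))  # ['-s', '700000']
```

Per the excerpt, these words are prepended to the command line, and switches given on the command line itself override them. The file is only read if it is named spamc.conf in SYSCONFDIR or is passed with -F.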
Re: SpamAssassin Bayes Training
OK, noted. I will try it next week and see the outcome.
One last question: does Bayes learning support .msg files (Outlook message files)?
I tried it yesterday and it seemed to work.
If so, then I don't need to worry about converting them.
- jimimaseye
- Moderator
- Posts: 10060
- Joined: 2011-09-08 17:48
Re: SpamAssassin Bayes Training
I'm guessing yes.
Re: SpamAssassin Bayes Training
Update:
I tried the code below in spamc.conf and in spamc.cf in /etc/spamassassin, but the max size is still the same.
Code: Select all
# spamc global configuration file
# max message size for scanning = 700k
-s 700000
Re: SpamAssassin Bayes Training
Update:
Finally found the answer from this link.
viewtopic.php?f=22&t=26750&p=163734&hilit=max#p163968
Need to edit the trainbayes.bat file to set the max message size.
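For reference, the same idea expressed outside the bat file might look like the sketch below. The contents of trainbayes.bat are not shown in this thread, so this is an assumption about the general shape of the call, not a copy of it: spamc reads the message from standard input (the `spamc [options] < message` usage quoted earlier), -L selects the learn type and -s sets the max size. The helper names and the 700000 value are illustrative only.

```python
import subprocess


def build_spamc_learn_cmd(learn_type, max_size, spamc="spamc"):
    """Build a spamc command line for learning with a custom max size."""
    if learn_type not in ("spam", "ham", "forget"):
        raise ValueError("learn_type must be 'spam', 'ham' or 'forget'")
    if max_size <= 0:
        raise ValueError("max_size must be a positive number of bytes")
    return [spamc, "-L", learn_type, "-s", str(max_size)]


def learn_file(path, learn_type="spam", max_size=700000):
    """Feed one message file to spamc on stdin, as the documented usage does."""
    cmd = build_spamc_learn_cmd(learn_type, max_size)
    with open(path, "rb") as message:
        return subprocess.run(cmd, stdin=message, capture_output=True)


# Example call (the file name is hypothetical):
# result = learn_file("sample.eml")
# print(result.stdout.decode(errors="replace"))
```

This also shows why typing spamc -s 700000 on its own appears to do nothing: without a message redirected to it, spamc is simply waiting for input.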
- jimimaseye
- Moderator
- Posts: 10060
- Joined: 2011-09-08 17:48
Re: SpamAssassin Bayes Training
thomas10 wrote: ↑2018-03-26 06:17
Update:
Finally found the answer from this link.
viewtopic.php?f=22&t=26750&p=163734&hilit=max#p163968
Need to edit the trainbayes.bat file to set the max message size.
Yes, that is what I was suggesting when I said:
jimimaseye wrote: ↑2018-03-20 10:07
As you will have read from the documentation link I posted, the use of spamc is:
spamc [options] < message
This is what is supplied in the bat file you are running.
Isn't expecting spam messages to be greater than 512kb a little excessive, and therefore possibly wasting the effort of recording it as Ham and filling the database? (i.e., if it's greater than 512kb it's probably ham). See here for stats: https://securelist.com/spam-and-phishin ... email-size
Re: SpamAssassin Bayes Training
Jimi, you are right indeed. I misread what you said about the bat file I am running. Silly me.
Thanks for your suggestion, Jimi.
Below is a sample of the spam I found here; its size is 702 KB.
- jimimaseye
- Moderator
- Posts: 10060
- Joined: 2011-09-08 17:48
Re: SpamAssassin Bayes Training
Fair enough.
I do note, however, that it is already identified as spam (by a long way - I don't expect anything over 2.5 and have 3 as my threshold). Out of interest, what are the SpamAssassin headers/report? (Don't the headers already say BAYES probability > 80%?)
Re: SpamAssassin Bayes Training
jimimaseye wrote: ↑2018-03-26 10:13
Fair enough.
I do note, however, that it is already identified as spam (by a long way - I don't expect anything over 2.5 and have 3 as my threshold). Out of interest, what are the SpamAssassin headers/report? (Don't the headers already say BAYES probability > 80%?)
You are right indeed, the probability is >80%.
Below is the header of the spam mail.
Code: Select all
Return-Path: postmaster@enerzia.com
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on GCF
X-Spam-Flag: YES
X-Spam-Level: *****
X-Spam-Status: Yes, score=5.3 required=5.0 tests=BAYES_00,DEAR_NOBODY, HTML_MESSAGE,KHOP_DNSBL_BUMP,MIME_HTML_ONLY,RAZOR2_CF_RANGE_51_100,
RAZOR2_CHECK,RCVD_IN_HOSTKARMA_BL,URIBL_BLOCKED autolearn=no
autolearn_force=no version=3.4.1
X-Spam-Report: * 0.0 URIBL_BLOCKED ADMINISTRATOR NOTICE: The query to URIBL was blocked. *
See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block *
for more information. * [URIs: enerzia.com] * 1.7
RCVD_IN_HOSTKARMA_BL RBL: HostKarma: relay in black list * [86.106.131.194
listed in hostkarma.junkemailfilter.com] * 0.0 DEAR_NOBODY BODY: Message
contains Dear but with no name * 0.0 HTML_MESSAGE BODY: HTML included in
message * 0.7 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
* -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% * [score: 0.0001]
* 1.9 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50% *
[cf: 100] * 0.9 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/) *
2.0 KHOP_DNSBL_BUMP Hits a trusted non-overlapping DNSBL
Received: from slot0.chinatraders.trade (slot0.chinatraders.trade [86.106.131.194]) by
global-gp.com with ESMTP ; Tue, 20 Mar 2018 18:39:53 +0800
From: Joseph Raj <postmaster@enerzia.com>
To: nicole.lim@pkg.global-gp.com
Subject: [SPAM] [5.3] bank_slip_$340002
Date: 20 Mar 2018 03:40:00 -0700
Message-ID: <20180320034000.CE3591466D1B35AB@enerzia.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_NextPart_000_0012_690EBCC9.9FA2C39F"
X-Spam-Prev-Subject: bank_slip_$340002
X-hMailServer-Spam: YES
X-hMailServer-Reason-1: The host name specified in HELO does not match IP address. - (Score: 2)
X-hMailServer-Reason-2: Tagged as Spam by SpamAssassin - (Score: 5)
X-hMailServer-Reason-Score: 7
X-hMailServer-LoopCount: 1
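For anyone wanting to check the Bayes result in a header like this without reading the whole report, a short Python sketch follows. The X-Spam-Status value is copied from the header above with the folded lines joined; the function name and the regular expressions are assumptions of this example, not part of SpamAssassin.

```python
import re

# X-Spam-Status value from the header above, folded lines joined.
STATUS = (
    "Yes, score=5.3 required=5.0 tests=BAYES_00,DEAR_NOBODY, HTML_MESSAGE,"
    "KHOP_DNSBL_BUMP,MIME_HTML_ONLY,RAZOR2_CF_RANGE_51_100, "
    "RAZOR2_CHECK,RCVD_IN_HOSTKARMA_BL,URIBL_BLOCKED autolearn=no "
    "autolearn_force=no version=3.4.1"
)


def parse_spam_status(value):
    """Extract score, threshold and rule names from an X-Spam-Status value."""
    score = float(re.search(r"score=(-?[\d.]+)", value).group(1))
    required = float(re.search(r"required=(-?[\d.]+)", value).group(1))
    # The rule list runs from "tests=" up to the "autolearn=" field.
    tests_text = re.search(r"tests=(.*?)\s+autolearn=", value, re.S).group(1)
    tests = [t.strip() for t in tests_text.split(",") if t.strip()]
    bayes = [t for t in tests if t.startswith("BAYES_")]
    return {"score": score, "required": required, "tests": tests, "bayes": bayes}


result = parse_spam_status(STATUS)
print(result["score"], result["required"], result["bayes"])
```

For this header the Bayes rule that fired is BAYES_00, which the X-Spam-Report lines describe as "Bayes spam probability is 0 to 1%" with a contribution of -1.9. The 5.3 total comes from the other rules (Razor2, the DNSBL hits and MIME_HTML_ONLY).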