SpamAssassin Bootcamp (sa-learn) train BAYES

This section contains scripts that hMailServer has contributed with. hMailServer 5 is needed to use these.
jimimaseye
Moderator
Posts: 8861
Joined: 2011-09-08 17:48

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by jimimaseye » 2018-08-27 11:34

Whilst you're tinkering, is this of any interest to you: viewtopic.php?f=20&t=28052 ?
5.7 on test.
SpamassassinForWindows 3.4.0 spamd service
AV: Clamwin + Clamd service + sanesecurity defs : https://www.hmailserver.com/forum/viewtopic.php?f=21&t=26829

palinka
Senior user
Posts: 2333
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2018-08-27 12:46

jimimaseye wrote:
2018-08-27 11:34
Whilst you're tinkering, is this of any interest to you: viewtopic.php?f=20&t=28052 ?
That's very cool, but I already set up an account that receives all spam, including spam above the delete threshold. That way I can sort it for learning purposes, because I know that no users are sorting. At least they're not sorting in any consistent manner. :roll:

rub.ak
New user
Posts: 2
Joined: 2018-09-22 19:44

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by rub.ak » 2018-09-22 19:49

I used sa-learn.0.6.1.rar and ran into a problem copying mail (spam.cmd, ham.cmd):

Code:

C:\SpamAssassin\temp>COPY "C:\Program Files (x86)\hMailServer\Data\home.aln\spam\4B\{4BEF5CA4-5DBD-4668-B8B2-3C8104BD65BB}.eml" C:\SpamAssassin\temp\ham\5585.eml /Y
The system cannot find the path specified.
        0 file(s) copied.
How can I solve this copying problem?

jimimaseye
Moderator
Posts: 8861
Joined: 2011-09-08 17:48

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by jimimaseye » 2018-09-22 20:46

rub.ak wrote:
2018-09-22 19:49
I used sa-learn.0.6.1.rar and ran into a problem copying mail (spam.cmd, ham.cmd):

Code:

C:\SpamAssassin\temp>COPY "C:\Program Files (x86)\hMailServer\Data\home.aln\spam\4B\{4BEF5CA4-5DBD-4668-B8B2-3C8104BD65BB}.eml" C:\SpamAssassin\temp\ham\5585.eml /Y
The system cannot find the path specified.
        0 file(s) copied.
How can I solve this copying problem?

What version of hMS do you have? Can you run this and post the results: viewtopic.php?f=20&t=30914

rub.ak
New user
Posts: 2
Joined: 2018-09-22 19:44

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by rub.ak » 2018-09-22 22:32

jimimaseye wrote:
2018-09-22 20:46
Thanks for the response.
I solved the problem myself; it was enough to create the folders \temp\spam and \temp\ham.
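
The missing folders can also be created by a small helper before the copy step runs. A minimal sketch, assuming the C:\SpamAssassin\temp location shown in the COPY output above and that C:\SpamAssassin itself already exists; the EnsureFolder helper is hypothetical and not part of sa-learn.vbs:

```vbscript
Option Explicit
' Assumed location; use the temp directory configured in your own sa-learn.vbs.
Const TempDir = "C:\SpamAssassin\temp"

Dim oFSO : Set oFSO = CreateObject("Scripting.FileSystemObject")

Sub EnsureFolder(strPath)
   ' CreateFolder raises an error if the folder already exists, so test first.
   If Not oFSO.FolderExists(strPath) Then oFSO.CreateFolder strPath
End Sub

EnsureFolder TempDir                            ' the parent must exist before its subfolders
EnsureFolder oFSO.BuildPath(TempDir, "ham")
EnsureFolder oFSO.BuildPath(TempDir, "spam")
```

Run from an account that has write permission on the parent folder, this makes the COPY lines in ham.cmd and spam.cmd find their destination.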

jimimaseye
Moderator
Posts: 8861
Joined: 2011-09-08 17:48

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by jimimaseye » 2018-09-22 22:54

Ok

ashtec014
Normal user
Posts: 41
Joined: 2019-09-05 11:56

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by ashtec014 » 2019-11-28 11:55

SorenR wrote:
2014-08-09 13:31
The success of SpamAssassin relies on a well-trained Bayes database. There are many ways to train your Bayes database; this is my shot at doing it.
NB! wrote: How to obtain, install and configure SpamAssassin is NOT described here! Search elsewhere on this forum for this information.
The idea came from my play/toy MailServer (Postfix, Dovecot & MailScanner) on my Synology DS209+II NAS. There it is basically a no-brainer to set up, as everything works off of MailDirs.

So how to do it ... on a hMailServer 5.4.2-B1964 system ...

1 -- Build a script using the COM API to find and extract relevant emails. What could be more natural than to assume INBOX is good and SPAM is bad? Also, if HAM ends up in SPAM (or vice versa), you move it to the respective folder (INBOX or SPAM) and at the next scheduled run the email will be classified differently. I execute this script using Windows Task Scheduler at 04:00 in the morning when everyone is (supposed to be) asleep.

2 -- A global rule in hMailServer to move emails tagged as SPAM into a SPAM folder. Setting this as a global rule ensures that ALL users of hMailServer are covered. If the SPAM folder does not exist, it will be created by hMailServer automatically.

Rule name: sa-learn

Code:

Criteria -> Custom header field -> X-hMailServer-Spam = YES
Action -> Move to IMAP folder -> IMAP folder = SPAM
Some admins may want to monitor what is tagged as SPAM, like me, so I forward a copy to spam@my-domain.tld with this revised global rule.

Rule name: sa-learn (BigBrother version) :: Use AND

Code:

Criteria -> Custom header field -> X-hMailServer-Spam = YES
Criteria -> Custom header field -> X-hMailServer-LoopCount < 1
Action -> Move to IMAP folder -> IMAP folder = SPAM
Action -> Forward email -> To = spam@my-domain.tld
"X-hMailServer-LoopCount" is used to prevent loops. By checking for value < 1 we make sure it is only run once. Thus SPAM will stay in the spam@my-domain.tld INBOX.

3 -- Now that we have both good and bad emails defined, we need to pull them off the server. For that I have chosen VBScript to interact with the hMailServer COM API.
Functional Description: wrote: The VBScript (sa-learn.vbs) works with ONE domain at present, as I only have one domain. Using the COM API it locates and processes all account addresses for that domain, except those addresses listed in the exception list.

The script does two passes, one for INBOX and one for SPAM, and generates two .cmd files (HAMCopy.cmd & SPAMCopy.cmd) to be run by the script.

During the two passes, the number of messages in the respective folder is checked and only the last (max.) 20 messages are processed. This procedure is based on the assumption that the COM API returns data sorted by table ID; from examining the database, it appears that hMailServer simply adds new message IDs to the table rather than reusing deleted message IDs.

HAMCopy.cmd and SPAMCopy.cmd simply copy the selected .eml files to a HAM or SPAM directory.

A third .cmd file (sa-learn.cmd) is also executed by the script; it contains the commands to execute sa-learn --spam, sa-learn --ham, sa-learn --sync and sa-learn --backup, as is customary on Unix-type systems.

On a dual-core 3 GHz system with 4 GB RAM, SATA and Windows Server 2003 R2, it took almost 20 minutes to process ~4,200 HAM and ~5,600 SPAM emails.
sa-learn.vbs

Code:

Option Explicit
   '
   ' Version 0.1.0 09-08-2014, Soren Rathje - Initial version.
   '

   Dim hmAdmin, hmPassword, hmDomain, hmSPAMFolder, hmSPAMDir, hmHAMFolder, hmHAMDir, hmExcludeAddress
   Dim i, j, s, objApp, objDomain, objAccount, objIMAPFolder, objMessage
   Dim fsoSPAM, fsoHAM, fsoSALearn, objFSO, objSPAM, objHAM, objShell

   '
   ' Configuration parameters - BEGIN
   '
   hmAdmin = "Administrator"          ' hMailServer Administrator user
   hmPassword = "********"            ' hMailServer Administrator password
   hmDomain = "my-domain.tld"         ' Domain name
   hmSPAMFolder = "SPAM"              ' SPAM IMAP folder
   hmSPAMDir = "C:\hMailServer\SPAM"  ' You need to create this directory!
   hmHAMFolder = "INBOX"              ' HAM IMAP Folder
   hmHAMDir = "C:\hMailServer\HAM"    ' You need to create this directory!
   hmExcludeAddress = "spam@my-domain.tld, surveillance@my-domain.tld"

   fsoSPAM = "C:\hMailServer\Events\SPAMCopy.cmd"
   fsoHAM = "C:\hMailServer\Events\HAMCopy.cmd"
   fsoSALearn = "C:\hMailServer\Events\sa-learn.cmd"
   '
   ' Configuration parameters - END
   '

   Set objShell = WScript.CreateObject("WScript.Shell")
   Set objFSO = CreateObject("Scripting.FileSystemObject")
   Set objApp = CreateObject("hMailServer.Application")
   Call objApp.Authenticate(hmAdmin, hmPassword)
   Set objDomain = objApp.Domains.ItemByName(hmDomain)

   '
   ' Find SPAM messages
   '
   Set objSPAM = objFSO.CreateTextFile(fsoSPAM,True)
   For i = 0 to objDomain.Accounts.Count -1
      Set objAccount = objDomain.Accounts.Item(i)

      ' DO NOT process excluded and non-active accounts.
      ' InStr returns a position, so test for 0 explicitly (Not on a number is bitwise).
      If InStr(hmExcludeAddress, objAccount.Address) = 0 And objAccount.Active Then

         Set objIMAPFolder = objAccount.IMAPFolders.ItemByName(hmSPAMFolder)

         ' If no messages - skip
         If objIMAPFolder.Messages.Count > 0 Then

            s = 0
            If objIMAPFolder.Messages.Count - 20 > 0 Then s = objIMAPFolder.Messages.Count - 20

            For j = s to objIMAPFolder.Messages.Count -1
               Set objMessage = objIMAPFolder.Messages.Item(j)
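               ' Quote both paths so that spaces in the data directory do not break COPY.
               objSPAM.Write "COPY " & Chr(34) & objMessage.FileName & Chr(34) & " " & Chr(34) & hmSPAMDir & Chr(34) & " /Y" & vbCrLf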
            Next
         End If
      End If
   Next
   objSPAM.Close

   '
   ' Find HAM messages
   '
   Set objHAM = objFSO.CreateTextFile(fsoHAM,True)
   For i = 0 to objDomain.Accounts.Count -1
      Set objAccount = objDomain.Accounts.Item(i)

      ' DO NOT process excluded and non-active accounts.
      ' InStr returns a position, so test for 0 explicitly (Not on a number is bitwise).
      If InStr(hmExcludeAddress, objAccount.Address) = 0 And objAccount.Active Then

         Set objIMAPFolder = objAccount.IMAPFolders.ItemByName(hmHAMFolder)

         ' If no messages - skip
         If objIMAPFolder.Messages.Count > 0 Then

            s = 0
            If objIMAPFolder.Messages.Count - 20 > 0 Then s = objIMAPFolder.Messages.Count - 20

            For j = s to objIMAPFolder.Messages.Count -1
               Set objMessage = objIMAPFolder.Messages.Item(j)
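               ' Quote both paths so that spaces in the data directory do not break COPY.
               objHAM.Write "COPY " & Chr(34) & objMessage.FileName & Chr(34) & " " & Chr(34) & hmHAMDir & Chr(34) & " /Y" & vbCrLf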
            Next
         End If
      End If
   Next
   objHAM.Close

   '
   ' Execute file copy and sa-learn.exe - sequentially - no StdOut.
   '
   objShell.Run fsoSPAM, 0, true
   objShell.Run fsoHAM, 0, true
   objShell.Run fsoSALearn, 0, true
sa-learn.cmd

Code:

C:\SpamAssassin\sa-learn.exe --siteconfigpath="C:\SpamAssassin\etc\spamassassin" --dbpath "C:\Documents and Settings\Default User\.spamassassin\bayes" --spam "C:\hMailServer\SPAM\*.eml"
C:\SpamAssassin\sa-learn.exe --siteconfigpath="C:\SpamAssassin\etc\spamassassin" --dbpath "C:\Documents and Settings\Default User\.spamassassin\bayes" --ham "C:\hMailServer\HAM\*.eml"
C:\SpamAssassin\sa-learn.exe --siteconfigpath="C:\SpamAssassin\etc\spamassassin" --dbpath "C:\Documents and Settings\Default User\.spamassassin\bayes" --sync
C:\SpamAssassin\sa-learn.exe --siteconfigpath="C:\SpamAssassin\etc\spamassassin" --dbpath "C:\Documents and Settings\Default User\.spamassassin\bayes" --backup > "C:\Documents and Settings\Default User\.spamassassin\bayes_backup"
REM DEL C:\hMailServer\SPAM\*.eml /Q
REM DEL C:\hMailServer\HAM\*.eml /Q
Disclaimer: wrote: I take no responsibility for what you may or may not do with the above script/shell script. It works for me - it may not work for you. I DO NOT GUARANTEE THIS CODE TO BE BUG-FREE! USE AT YOUR OWN RISK! WHATEVER YOU DO - IT'S NOT MY FAULT!

AND remember; Real men do NOT backup - but they CRY a lot!

Please feel free to adopt and modify.
Hello,

I want to add this to my existing hMailServer installation, but I am a bit confused about how to use and implement the script. Are there any step-by-step instructions (similar to what Jimimaseye did in this post https://www.hmailserver.com/forum/viewt ... 91#p174991 ) on where to put these scripts? Which folder, etc.?

Thank you.

SorenR
Senior user
Posts: 4058
Joined: 2006-08-21 15:38
Location: Denmark

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2019-11-28 12:53

ashtec014 wrote:
2019-11-28 11:55
SorenR wrote:
2014-08-09 13:31
The success of SpamAssassin relies on a well-trained Bayes database. There are many ways to train your Bayes database; this is my shot at doing it.
NB! wrote: How to obtain, install and configure SpamAssassin is NOT described here! Search elsewhere on this forum for this information.
The idea came from my play/toy MailServer (Postfix, Dovecot & MailScanner) on my Synology DS209+II NAS. There it is basically a no-brainer to set up, as everything works off of MailDirs.

So how to do it ... on a hMailServer 5.4.2-B1964 system ...

1 -- Build a script using the COM API to find and extract relevant emails. What could be more natural than to assume INBOX is good and SPAM is bad? Also, if HAM ends up in SPAM (or vice versa), you move it to the respective folder (INBOX or SPAM) and at the next scheduled run the email will be classified differently. I execute this script using Windows Task Scheduler at 04:00 in the morning when everyone is (supposed to be) asleep.

2 -- A global rule in hMailServer to move emails tagged as SPAM into a SPAM folder. Setting this as a global rule ensures that ALL users of hMailServer are covered. If the SPAM folder does not exist, it will be created by hMailServer automatically.

bla bla bla
bla bla bla
Disclaimer: wrote: I take no responsibility for what you may or may not do with the above script/shell script. It works for me - it may not work for you. I DO NOT GUARANTEE THIS CODE TO BE BUG-FREE! USE AT YOUR OWN RISK! WHATEVER YOU DO - IT'S NOT MY FAULT!

AND remember; Real men do NOT backup - but they CRY a lot!

Please feel free to adopt and modify.
Hello,

I want to add this to my existing hMailServer installation, but I am a bit confused about how to use and implement the script. Are there any step-by-step instructions (similar to what Jimimaseye did in this post https://www.hmailserver.com/forum/viewt ... 91#p174991 ) on where to put these scripts? Which folder, etc.?

Thank you.
Did you read the whole thread? The code in the very first post will probably not even work anymore after 5 years of Microsoft Updates. I do not have admin privs. on this board so code changes are posted as they are made.

Once you have read everything you will realize that it does not really matter where you put the scripts, they have hardcoded paths in them. I have sa-learn.vbs and sa-learn.cmd in my c:\spamassassin directory.
SørenR.

Algorithm (noun.)
Word used by programmers when they do not want to explain what they did.

ashtec014
Normal user
Posts: 41
Joined: 2019-09-05 11:56

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by ashtec014 » 2019-12-11 18:24

SorenR wrote:
2018-07-28 13:43
Ok... Two steps forward and one step back...

I suspect there continues to be an issue with curly brackets.

Version 0.6.1 is back to creating HAM.CMD and SPAM.CMD that will copy mails to c:\spamassassin\temp\ham and c:\spamassassin\temp\spam. SA-LEARN.CMD does the actual learning.

DO NOT forget to delete the HAM and SPAM files in .\temp or you WILL get the permission issue. You need to recreate the HAM and SPAM folders in .\temp.
Hi, I've managed to run this version without errors, but I am not sure I configured it correctly.
My logs show no data:

Code:

Wed 12/11/2019 18:29:51.94 - START 
HAM: 
SPAM: 
Wed 12/11/2019 18:29:51.95 - STOP 
Wed 12/11/2019 18:43:48.84 - START 
HAM: 
SPAM: 
Wed 12/11/2019 18:46:03.29 - STOP 
Wed 12/11/2019 18:46:34.55 - START 
HAM: 
SPAM: 
Wed 12/11/2019 18:47:41.52 - STOP 
Wed 12/11/2019 18:52:41.20 - START 
HAM: 
SPAM: 
Wed 12/11/2019 18:52:41.20 - STOP 
Wed 12/11/2019 18:55:55.38 - START 
HAM: 
SPAM: 
Wed 12/11/2019 18:55:55.38 - STOP 
Also, when I tried to run sa-learn.vbs through Windows Task Scheduler, I got this pop-up:
[Image: screenshot of the pop-up]

I'm worried that if I schedule it for something like 04:00 it won't run unattended, because I need to click the 'OK' button to proceed. Is there a way to run sa-learn.vbs automatically?

My 'temp' folder has data, but the 'bayes_db' folder has no data.

Also, do I need to enable any of these in my local.cf, or should I just leave it as it is?

Code:

#   Use Bayesian classifier (default: 1)
#
# use_bayes 1


#   Bayesian classifier auto-learning (default: 1)
#
# bayes_auto_learn 1

#   Set headers which may provide inappropriate cues to the Bayesian
#   classifier
#
# bayes_ignore_header X-Bogosity
# bayes_ignore_header X-Spam-Flag
# bayes_ignore_header X-Spam-Status

palinka
Senior user
Posts: 2333
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2019-12-11 19:18

Re the popup:

Code:

Const Verbose         = 0
You have it set to 1, I believe.

palinka
Senior user
Posts: 2333
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2019-12-11 19:25

ashtec014 wrote:
2019-12-11 18:24
Also, do I need to enable any of these in my local.cf, or should I just leave it as it is?

Code:

#   Use Bayesian classifier (default: 1)
#
# use_bayes 1


#   Bayesian classifier auto-learning (default: 1)
#
# bayes_auto_learn 1

#   Set headers which may provide inappropriate cues to the Bayesian
#   classifier
#
# bayes_ignore_header X-Bogosity
# bayes_ignore_header X-Spam-Flag
# bayes_ignore_header X-Spam-Status
Try this instead. Make sure to change bayes_path to the correct folder. The Windows user account that runs SA needs access permission to the folder, so don't put it somewhere SA cannot read and write. A restart is required.

Code:

use_bayes 1
bayes_path X:\sa-learn\.spamassassin\bayes
bayes_auto_learn 0
# bayes_ignore_header X-Bogosity
bayes_ignore_header X-Spam-Flag
bayes_ignore_header X-Spam-Status
You DO NOT want to use autolearn with this script. I'm not even sure if it will work at all. But autolearn is the EXACT OPPOSITE of bayes training.
Last edited by palinka on 2019-12-11 19:34, edited 1 time in total.

palinka
Senior user
Posts: 2333
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2019-12-11 19:32

Also....

https://spamassassin.apache.org/full/3. ... _Conf.html
bayes_min_ham_num (Default: 200)
bayes_min_spam_num (Default: 200)
To be accurate, the Bayes system does not activate until a certain number of ham (non-spam) and spam have been learned. The default is 200 of each ham and spam, but you can tune these up or down with these two settings.
You can change the minimum if you need to, but I think it will work better with the default minimum of 200. I trust these guys to know what's best. :mrgreen:

ashtec014
Normal user
Posts: 41
Joined: 2019-09-05 11:56

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by ashtec014 » 2019-12-12 09:16

palinka wrote:
2019-12-11 19:18
Re the popup:

Code:

Const Verbose         = 0
You have it set to 1, I believe.
Hi Palinka, I changed this to

Code:

Const Verbose         = 1
but I am still getting the pop-up after running the scheduled task.

ashtec014
Normal user
Posts: 41
Joined: 2019-09-05 11:56

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by ashtec014 » 2019-12-12 09:18

palinka wrote:
2019-12-11 19:32
Also....

https://spamassassin.apache.org/full/3. ... _Conf.html
bayes_min_ham_num (Default: 200)
bayes_min_spam_num (Default: 200)
To be accurate, the Bayes system does not activate until a certain number of ham (non-spam) and spam have been learned. The default is 200 of each ham and spam, but you can tune these up or down with these two settings.
You can change the minimum if you need to, but I think it will work better with the default minimum of 200. I trust these guys to know what's best. :mrgreen:
Regarding this one:
bayes_min_ham_num (Default: 200)
bayes_min_spam_num (Default: 200)
Do I need to add this as well to local.cf or where do I find this?

palinka
Senior user
Posts: 2333
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2019-12-12 13:47

ashtec014 wrote:
2019-12-12 09:16
palinka wrote:
2019-12-11 19:18
Re the popup:

Code:

Const Verbose         = 0
You have it set to 1, I believe.
Hi Palinka, I changed this to

Code:

Const Verbose         = 1
but I am still getting the pop-up after running the scheduled task.
Should be 0 for no popup.

palinka
Senior user
Posts: 2333
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2019-12-12 13:50

ashtec014 wrote:
2019-12-12 09:18
palinka wrote:
2019-12-11 19:32
Also....

https://spamassassin.apache.org/full/3. ... _Conf.html
bayes_min_ham_num (Default: 200)
bayes_min_spam_num (Default: 200)
To be accurate, the Bayes system does not activate until a certain number of ham (non-spam) and spam have been learned. The default is 200 of each ham and spam, but you can tune these up or down with these two settings.
You can change the minimum if you need to, but I think it will work better with the default minimum of 200. I trust these guys to know what's best. :mrgreen:
Regarding this one:
bayes_min_ham_num (Default: 200)
bayes_min_spam_num (Default: 200)
Do I need to add this as well to local.cf or where do I find this?
You don't need to add it. But you will need 200 spams and 200 hams for bayes to work. If you don't have that many and can't wait, you can add those lines to local.cf and lower the numbers. But I recommend waiting for 200.
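
For reference, lowering the thresholds in local.cf would look roughly like this. The value 100 is only an illustration, not a recommendation from this thread or from the SpamAssassin documentation:

```conf
# local.cf: let Bayes activate after fewer learned messages (the default for both is 200)
bayes_min_ham_num  100
bayes_min_spam_num 100
```

As with the other local.cf changes discussed here, SpamAssassin has to be restarted before the new values take effect.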

ashtec014
Normal user
Posts: 41
Joined: 2019-09-05 11:56

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by ashtec014 » 2019-12-12 14:43

palinka wrote:
2019-12-12 13:47
ashtec014 wrote:
2019-12-12 09:16
palinka wrote:
2019-12-11 19:18
Re the popup:

Code:

Const Verbose         = 0
You have it set to 1, I believe.
Hi Palinka, I changed this to

Code:

Const Verbose         = 1
but I am still getting the pop-up after running the scheduled task.
Should be 0 for no popup.
The pop-up still appears after reverting it to 0.

ashtec014
Normal user
Posts: 41
Joined: 2019-09-05 11:56

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by ashtec014 » 2019-12-12 14:45

palinka wrote:
2019-12-12 13:50
ashtec014 wrote:
2019-12-12 09:18
palinka wrote:
2019-12-11 19:32
Also....

https://spamassassin.apache.org/full/3. ... _Conf.html



You can change the minimum if you need to, but I think it will work better with the default minimum of 200. I trust these guys to know what's best. :mrgreen:
Regarding this one:
bayes_min_ham_num (Default: 200)
bayes_min_spam_num (Default: 200)
Do I need to add this as well to local.cf or where do I find this?
You don't need to add it. But you will need 200 spams and 200 hams for bayes to work. If you don't have that many and can't wait, you can add those lines to local.cf and lower the numbers. But I recommend waiting for 200.
Thank you, this is noted. So far I have fewer than 200 spams; I'm sure I already have more than 200 hams, but I'm going to wait as recommended.
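
To see how many messages have been learned so far, sa-learn can print its counters with --dump magic. A minimal sketch, assuming sa-learn.exe lives in C:\SpamAssassin (as in the original sa-learn.cmd) and that BayesPath matches the bayes_path in local.cf; both constants are placeholders to adjust:

```vbscript
Option Explicit
' Both paths are assumptions; adjust them to your installation.
Const SALearnExe = "C:\SpamAssassin\sa-learn.exe"
Const BayesPath  = "X:\sa-learn\.spamassassin\bayes"   ' same value as bayes_path in local.cf

Dim oShell, oExec, strLine
Set oShell = CreateObject("WScript.Shell")

' Run "sa-learn --dump magic" and read its standard output line by line.
Set oExec = oShell.Exec(Chr(34) & SALearnExe & Chr(34) & _
   " --dbpath " & Chr(34) & BayesPath & Chr(34) & " --dump magic")

Do Until oExec.StdOut.AtEndOfStream
   strLine = oExec.StdOut.ReadLine
   ' The nspam and nham lines hold the number of learned spam and ham messages.
   If InStr(strLine, "nspam") > 0 Or InStr(strLine, "nham") > 0 Then
      WScript.Echo strLine
   End If
Loop
```

Run it with cscript.exe so the two lines are printed to the console; both counters need to reach the bayes_min_spam_num and bayes_min_ham_num thresholds before Bayes is used.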

tunis
Senior user
Posts: 260
Joined: 2015-01-05 20:22
Location: Sweden

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by tunis » 2019-12-12 16:06

ashtec014 wrote:
2019-12-12 14:43
palinka wrote:
2019-12-12 13:47
ashtec014 wrote:
2019-12-12 09:16


Hi Palinka, I changed this to

Code:

Const Verbose         = 1
but I am still getting the pop-up after running the scheduled task.
Should be 0 for no popup.
The pop-up still appears after reverting it to 0.
Are you calling it with cscript?

If you use wscript you get popups.
HMS 5.6.8 B2494.25 on Windows Server 2019 Core VM.
HMS 5.6.8 B2534.28 on Windows Server 2016 Core VM.
HMS 5.6.7 B2425.16 on Windows Server 2012 R2 Core VM.

ashtec014
Normal user
Posts: 41
Joined: 2019-09-05 11:56

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by ashtec014 » 2019-12-12 16:17

tunis wrote:
2019-12-12 16:06
ashtec014 wrote:
2019-12-12 14:43
palinka wrote:
2019-12-12 13:47


Should be 0 for no popup.
The pop-up still appears after reverting it to 0.
Are you calling it with cscript?

If you use wscript you get popups.
Unfortunately not; I just call it normally through Windows Task Scheduler. I have no idea how to use cscript. Can you please show me how? Thank you.

palinka
Senior user
Posts: 2333
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2019-12-12 16:46

Here's what I have. I only changed the password and domain names.

Code:

Option Explicit
'
' Version 0.6.1 28/07-2018, Soren Rathje - Curly brackets still acting up, reverting to copying mails.
' Version 0.6.0 26/07-2018, Soren Rathje - Experimental support for both sa-learn AND spamc -L.
' Version 0.5.a 26/07-2018, Soren Rathje - Experimental rewrite of filelist to fix curly brace problem in sa-learn.
' Version 0.5.0 25/07-2018, Soren Rathje - Multiple domains, skipping non-existing folders plus reworked code.
' Version 0.4.3 30/05-2018, Soren Rathje - Introduced two special folders; non-delivered SPAM and False Positives.
' Version 0.4.2 28/05-2016, Soren Rathje - Compatibility issues (curly brackets bug in sa-learn).
' Version 0.4.1 25/11-2014, Soren Rathje - Compatibility issues.
' Version 0.4.0 27/10-2014, Soren Rathje - Changed error logging.
' Version 0.3.0 11/10-2014, Soren Rathje - Bugfixing & Log error to Eventlog if IMAPFolder is missing
' Version 0.2.0 30/08-2014, Soren Rathje - Selection changed to DAYS.
' Version 0.1.0 09-08-2014, Soren Rathje - Initial version.
'
' Configuration parameters
'
'     Administrative and automation accounts can be excluded from processing
'     by defining them in "ExcludeList"
'
Const Administrator   = "Administrator"
Const Secret          = "supersecretpassword"

Const ExcludeList     = "user@mydomain.tld, user@anotherdomain.tld"
Const DomainList      = "mydomain.tld, anotherdomain.tld, thirddomain.tld"
Const SPAMFolders     = "SPAM, Junk, Junk E-mail"
Const HAMFolders      = "INBOX, HAM"

Const Batch           = 1                              ' 0 = spamc, 1 = sa-learn
Const SALearn         = "X:\sa-learn\sa-learn.cmd" ' Intermediate commandfile
Const TempDir         = "X:\sa-learn\temp\"        ' Need permission for create, read & write
Const LogDir          = "C:\SpamAssassin\logs\"        ' Need permission for create, read & write
Const BayesDir        = "X:\sa-learn\.spamassassin\"    ' Need permission for create, read & write
Const RetainDays      = 7
Const Verbose         = 0

Sub BuildList(a, mExcludes, b, mDays, mTemp, mType)
   Dim i, j, k, l, strFileName
   Dim oFile, oDomain, oAccount, oMessage, oMessages
   Dim mDomain, mDomains, mFolder, mFolders
   
   If Batch Then Set oFile = oFSO.CreateTextFile(mTemp & mType & ".cmd",True)
   mDomains = Split(a, ",")
   mFolders = Split(b, ",")
   If Verbose Then WScript.Echo "Type: " & mType
   For Each mDomain In mDomains
      mDomain = Trim(mDomain)
      Set oDomain = oApp.Domains.ItemByName(mDomain)
      If Verbose Then WScript.Echo "     Domain: " & oDomain.Name
      For i = 0 To oDomain.Accounts.Count - 1
         Set oAccount = oDomain.Accounts.Item(i)
         If InStr(mExcludes, oAccount.Address) = 0 And oAccount.Active Then
            If Verbose Then WScript.Echo "          Account: " & oAccount.Address
            For Each mFolder In mFolders
               mFolder = Trim(mFolder)
               On Error Resume Next
               If oAccount.IMAPFolders.ItemByName(mFolder) Is Nothing Then
                  On Error Goto 0
               Else
                  On Error Goto 0
                  If Verbose Then WScript.Echo "               Folder: " & mFolder
                  Set oMessages  = oAccount.IMAPFolders.ItemByName(mFolder).Messages
                  If Not IsNull(oMessages) Then
                     For j = 0 To oMessages.Count - 1
                        Set oMessage = oMessages.Item(j)
                        If oMessage.InternalDate > CDate(Now - mDays) Then
                           If Batch Then
                              oFile.Write "COPY " & Chr(34) & oMessage.FileName & Chr(34) & " " & mTemp & mType & "\" & CLng(oMessage.ID) & ".eml /Y" & vbCrLf
                           Else
                              oCMD.Run "cmd.exe /C spamc.exe -d " & SAHost & " -p " & SAPort & " -L " & mType & " < " & Chr(34) & oMessage.FileName & Chr(34), Verbose, True
                           End If
                        End If
                     Next
                  End If
               End If
            Next
         End If
      Next
   Next
   If Batch Then oFile.Close
End Sub

Dim oFSO : Set oFSO = CreateObject("Scripting.FileSystemObject")
Dim oApp : Set oApp = CreateObject("hMailServer.Application")
Dim oCMD : Set oCMD = CreateObject("WScript.Shell")
Call oApp.Authenticate(Administrator, Secret)
Dim SAHost : SAHost = oApp.Settings.AntiSpam.SpamAssassinHost
Dim SAPort : SAPort = oApp.Settings.AntiSpam.SpamAssassinPort

'
' Find HAM messages
'
If Verbose Then WScript.Echo "Processing HAM mails"
Call BuildList(DomainList, ExcludeList, HAMFolders, RetainDays, TempDir, "ham")

'
' Find SPAM messages
'
If Verbose Then WScript.Echo "Processing SPAM mails"
Call BuildList(DomainList, ExcludeList, SPAMFolders, RetainDays, TempDir, "spam")

'
' Execute file copy and sa-learn.exe - sequentially - no StdOut.
'
If Batch Then
   ' On Error Resume Next
   
   If Verbose Then WScript.Echo "Copying SPAM mails"
   oCMD.Run "cmd.exe /C " & TempDir & "spam.cmd", Verbose, True
   If Err.Number Then
      EventLog.Write ("Exception   : sa-learn.vbs -> oCMD.Run SPAMCopy, 0, true")
      EventLog.Write ("Error       : " & Err.Number)
      EventLog.Write ("Error (hex) : 0x" & Hex(Err.Number))
      EventLog.Write ("Source      : " & Err.Source)
      EventLog.Write ("Description : " & Err.Description)
      Err.Clear
   End If
   
   If Verbose Then WScript.Echo "Copying HAM mails"
   oCMD.Run "cmd.exe /C " & TempDir & "ham.cmd", Verbose, True
   If Err.Number Then
      EventLog.Write ("Exception   : sa-learn.vbs -> oCMD.Run HAMCopy, 0, true")
      EventLog.Write ("Error       : " & Err.Number)
      EventLog.Write ("Error (hex) : 0x" & Hex(Err.Number))
      EventLog.Write ("Source      : " & Err.Source)
      EventLog.Write ("Description : " & Err.Description)
      Err.Clear
   End If
   
   If Verbose Then WScript.Echo "Starting the learning process ... "
   oCMD.Run "cmd.exe /C " & SALearn, Verbose, True
   If Err.Number Then
      EventLog.Write ("Exception   : sa-learn.vbs -> oCMD.Run SALearn, 0, true")
      EventLog.Write ("Error       : " & Err.Number)
      EventLog.Write ("Error (hex) : 0x" & Hex(Err.Number))
      EventLog.Write ("Source      : " & Err.Source)
      EventLog.Write ("Description : " & Err.Description)
      Err.Clear
   End If
   
   ' On Error Goto 0
   
End If

As you can see, by setting "Const Verbose = 0" it will bypass lines like:

Code:

   If Verbose Then WScript.Echo "Starting the learning process ... "
If that's not happening for you, then you can delete all the verbose if statements. But you probably have some silly mistake like your scheduled task is using a different file than the one you're editing, or you didn't save it, or something dumb like that.
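
Another way to keep progress messages without any risk of pop-ups is to write them to a file instead of calling WScript.Echo. A minimal sketch; the Say helper and the log file name are hypothetical and not part of sa-learn.vbs, and LogDir is assumed to exist already:

```vbscript
Option Explicit
Const LogDir  = "C:\SpamAssassin\logs\"   ' value taken from the script above; must already exist
Const Verbose = 1

Dim oFSO : Set oFSO = CreateObject("Scripting.FileSystemObject")

Sub Say(strMessage)
   Dim oLog
   If Verbose Then
      ' 8 = ForAppending, True = create the file if it does not exist
      Set oLog = oFSO.OpenTextFile(LogDir & "sa-learn-verbose.log", 8, True)
      oLog.WriteLine Now & " " & strMessage
      oLog.Close
   End If
End Sub

Say "Starting the learning process ... "
```

Replacing each "If Verbose Then WScript.Echo ..." line with a call to such a helper would keep the messages available for troubleshooting whichever script host starts the task.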

tunis
Senior user
Posts: 260
Joined: 2015-01-05 20:22
Location: Sweden

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by tunis » 2019-12-12 17:02

ashtec014 wrote:
2019-12-12 16:17
tunis wrote:
2019-12-12 16:06
ashtec014 wrote:
2019-12-12 14:43


The pop-up still appears after reverting it to 0.
Are you calling it with cscript?

If you use wscript you get popups.
Unfortunately not; I just call it normally through Windows Task Scheduler. I have no idea how to use cscript. Can you please show me how? Thank you.
You call it like this in Windows Task Scheduler: "cscript.exe" C:\path\sa-learn.vbs

Code:

schtasks /create /ru "SYSTEM" /tn "SA learn" /tr "\"cscript.exe\" C:\path\sa-learn.vbs" /sc DAILY /mo 1
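
The script can also guard against being started with the wrong host. A minimal sketch, meant for the top of a .vbs file; it is not part of sa-learn.vbs, and it relaunches itself hidden under cscript.exe when it detects any other host:

```vbscript
Option Explicit

' WScript.FullName is the full path of the host executable (wscript.exe or cscript.exe).
If InStr(1, WScript.FullName, "cscript.exe", vbTextCompare) = 0 Then
   ' Relaunch this script under cscript: window style 0 = hidden, False = do not wait.
   CreateObject("WScript.Shell").Run "cscript.exe //NoLogo " & _
      Chr(34) & WScript.ScriptFullName & Chr(34), 0, False
   WScript.Quit
End If

' From here on the script runs under cscript, so WScript.Echo writes to the console.
WScript.Echo "Running under " & WScript.FullName
```

With this guard in place, a scheduled task that points straight at the .vbs file would still end up running under cscript, although calling cscript.exe explicitly in the task remains the simpler fix.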

ashtec014
Normal user
Posts: 41
Joined: 2019-09-05 11:56

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by ashtec014 » 2019-12-14 09:03

tunis wrote:
2019-12-12 17:02
You call it like this in Windows Task Scheduler: "cscript.exe" C:\path\sa-learn.vbs

Code: Select all

schtasks /create /ru "SYSTEM" /tn "SA learn" /tr "\"cscript.exe\" C:\path\sa-learn.vbs" /sc DAILY /mo 1
This one works for me. I ran it from the command prompt, then ran it with Windows Task Scheduler and got the logs. Thanks so much guys! I really appreciate your help.

palinka
Senior user
Posts: 2333
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2020-11-20 15:52

SorenR wrote:
2018-07-26 19:21
OK, this is slowly getting out of hand ... :mrgreen:

Support for "spamc.exe -L" and "sa-learn.exe". For spamc to work you must run spamd with "--allow-tell".

When Batch = 0 (spamc) mode is selected there will be no sync of the database and no backup. Spamd is doing the sync on-the-fly and backup is on you.

sa-learn.vbs Version 0.6.0

Code: Select all

oCMD.Run "cmd.exe /C spamc.exe -d " & SAHost & " -p " & SAPort & " -L " & mType & " < " & Chr(34) & oMessage.FileName & Chr(34), Verbose, True
Did you ever get this working?

Also, how do I run spamd with --allow-tell? Is that an argument in the spamd windows service?

SorenR
Senior user
Posts: 4058
Joined: 2006-08-21 15:38
Location: Denmark

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2020-11-20 16:32

palinka wrote:
2020-11-20 15:52
SorenR wrote:
2018-07-26 19:21
OK, this is slowly getting out of hand ... :mrgreen:

Support for "spamc.exe -L" and "sa-learn.exe". For spamc to work you must run spamd with "--allow-tell".

When Batch = 0 (spamc) mode is selected there will be no sync of the database and no backup. Spamd is doing the sync on-the-fly and backup is on you.

sa-learn.vbs Version 0.6.0

Code: Select all

oCMD.Run "cmd.exe /C spamc.exe -d " & SAHost & " -p " & SAPort & " -L " & mType & " < " & Chr(34) & oMessage.FileName & Chr(34), Verbose, True
Did you ever get this working?

Also, how do I run spamd with --allow-tell? Is that an argument in the spamd windows service?
Yes and Yes ...
SørenR.

Algorithm (noun.)
Word used by programmers when they do not want to explain what they did.

palinka
Senior user
Posts: 2333
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2020-11-20 16:35

SorenR wrote:
2020-11-20 16:32
palinka wrote:
2020-11-20 15:52
SorenR wrote:
2018-07-26 19:21
OK, this is slowly getting out of hand ... :mrgreen:

Support for "spamc.exe -L" and "sa-learn.exe". For spamc to work you must run spamd with "--allow-tell".

When Batch = 0 (spamc) mode is selected there will be no sync of the database and no backup. Spamd is doing the sync on-the-fly and backup is on you.

sa-learn.vbs Version 0.6.0

Code: Select all

oCMD.Run "cmd.exe /C spamc.exe -d " & SAHost & " -p " & SAPort & " -L " & mType & " < " & Chr(34) & oMessage.FileName & Chr(34), Verbose, True
Did you ever get this working?

Also, how do I run spamd with --allow-tell? Is that an argument in the spamd windows service?
Yes and Yes ...
OK cool. I'm cooking something.

palinka
Senior user
Posts: 2333
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2020-11-20 17:38

SorenR wrote:
2020-11-20 16:32
Yes and Yes ...
My test script seems to run fine even though I did not add "--allow-tell".

The first operational run had errors because some files were too large for spamc to scan:

Code: Select all

Finished feeding 190 messages to Bayes in 1 minute 47 seconds
Learned tokens from 141 of 185 HAM messages fed to Bayes
Learned tokens from 5 of 5 SPAM messages fed to Bayes
----------------------------
Successfully synced Bayes database
Then I changed the script to ignore too-large files (> 512k) and got this:

Code: Select all

Finished feeding 192 messages to Bayes in 1 minute 40 seconds
Learned tokens from 1 of 187 HAM messages fed to Bayes
Learned tokens from 1 of 5 SPAM messages fed to Bayes
----------------------------
Successfully synced Bayes database
Do you think I'm missing something?
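
For reference, the size check boils down to something like the sketch below. It is only an illustration: the function name Select-FeedableFile and the throwaway temp files are made up, and the 512000 byte cut-off simply mirrors the number used in the test script. Nothing here calls spamc.

```powershell
# Hypothetical helper: keeps only the files small enough to hand to spamc.
# 512000 bytes mirrors the cut-off used in the test script; adjust to taste.
Function Select-FeedableFile ([string[]]$Path, [int]$MaxBytes = 512000) {
	ForEach ($File in $Path) {
		If ((Test-Path -LiteralPath $File) -and ((Get-Item -LiteralPath $File).Length -lt $MaxBytes)) {
			$File   # emit the path; oversized or missing files are skipped
		}
	}
}

# Demo with two throwaway files in the temp folder
$Small = Join-Path ([IO.Path]::GetTempPath()) "feed-small-$PID.eml"
$Large = Join-Path ([IO.Path]::GetTempPath()) "feed-large-$PID.eml"
[IO.File]::WriteAllBytes($Small, (New-Object byte[] 1024))
[IO.File]::WriteAllBytes($Large, (New-Object byte[] 600000))
$Feedable = @(Select-FeedableFile -Path $Small, $Large)
Write-Host "$($Feedable.Count) of 2 files are small enough to feed"
Remove-Item -LiteralPath $Small, $Large
```

Filtering up front like this keeps the skipped files out of the "fed" totals, so the report only counts messages that were actually eligible.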

I'm trying to incorporate this into my powershell backup routine.

Test script:

Code: Select all

<#

.SYNOPSIS
	Prune Messages & Feed Bayes

.DESCRIPTION
	Delete messages in specified folders older than N days
	Feeds messages to Bayes

.FUNCTIONALITY
	Looks for folder name match at any folder level and if found, deletes all messages older than N days within that folder and all subfolders within
	Deletes empty subfolders within matching folders if DeleteEmptySubFolders set to True in config
	Feeds ham and spam to Bayes

.PARAMETER 

	
.NOTES
	Folder name matching occurs at any level folder
	Empty folders are assumed to be trash if they're located in this script
	Only empty folders found in levels BELOW matching level will be deleted
	
.EXAMPLE


#>

<###   USER VARIABLES   ###>
$hMSAdminPass          = "supersecretpassword"     # hMailServer Admin password
$DoDelete              = $False           # FOR TESTING - set to false to run and report results without deleting messages and folders
$DoSpamC               = $False           # FOR TESTING - set to false to run and report results without feeding SpamC with spam/ham

$PruneSubFolders       = $True            # True will prune all folders in levels below name matching folders
$DeleteEmptySubFolders = $True            # True will delete empty subfolders below the matching level unless a subfolder within contains messages
$DaysBeforeDelete      = 30               # Number of days to keep messages in pruned folders
$PruneFolders          = "2nd level|Trash|Deleted|Junk|Spam|APCUPSD|BrotherMFC|Administrative|Horde|SAUserList|Chase|Unsubscribes"  # Names of IMAP folders you want to cleanup - uses regex

$FeedBayes             = $True            # Feed spamC with spam/ham to populate bayes database
$HamFolders            = "INBOX|Ham"      # Ham folders to feed messages to spamC for bayes database - uses regex
$SpamFolders           = "Spam|Junk"      # Spam folders to feed messages to spamC for bayes database - uses regex
$BayesDays             = 7                # Number of days worth of spam/ham to feed to bayes
$SADir                 = "C:\Program Files\JAM Software\SpamAssassin for Windows"  # SpamAssassin Install Directory


<###   START SCRIPT   ###>

<#  Functions copied from hMailServer Backup required for testing  #>
Function Debug ($DebugOutput) {Write-Host $DebugOutput}
Function Email ($DebugOutput) {}
Function ElapsedTime ($EndTime) {
	$TimeSpan = New-Timespan $EndTime
	If (([int]($TimeSpan).Hours) -eq 0) {$Hours = ""} ElseIf (([int]($TimeSpan).Hours) -eq 1) {$Hours = "1 hour "} Else {$Hours = "$([int]($TimeSpan).Hours) hours "}
	If (([int]($TimeSpan).Minutes) -eq 0) {$Minutes = ""} ElseIf (([int]($TimeSpan).Minutes) -eq 1) {$Minutes = "1 minute "} Else {$Minutes = "$([int]($TimeSpan).Minutes) minutes "}
	If (([int]($TimeSpan).Seconds) -eq 1) {$Seconds = "1 second"} Else {$Seconds = "$([int]($TimeSpan).Seconds) seconds"}
	If (($TimeSpan).TotalSeconds -lt 1) {
		$Return = "less than 1 second"
	} Else {
		$Return = "$Hours$Minutes$Seconds"
	}
	Return $Return
}

<#  Set pruning variables  #>
Set-Variable -Name TotalDeletedMessages -Value 0 -Option AllScope
Set-Variable -Name TotalDeletedFolders -Value 0 -Option AllScope
Set-Variable -Name DeleteMessageErrors -Value 0 -Option AllScope
Set-Variable -Name DeleteFolderErrors -Value 0 -Option AllScope

<#  Set Bayes variables  #>
Set-Variable -Name TotalHamFedMessages -Value 0 -Option AllScope
Set-Variable -Name TotalSpamFedMessages -Value 0 -Option AllScope
Set-Variable -Name HamFedMessageErrors -Value 0 -Option AllScope
Set-Variable -Name SpamFedMessageErrors -Value 0 -Option AllScope
Set-Variable -Name LearnedHamMessages -Value 0 -Option AllScope
Set-Variable -Name LearnedSpamMessages -Value 0 -Option AllScope

Function GetSubFolders ($Folder) {
	$IterateFolder = 0
	$ArrayDeletedFolders = @()
	If ($Folder.SubFolders.Count -gt 0) {
		Do {
			$SubFolder = $Folder.SubFolders.Item($IterateFolder)
			$SubFolderName = $SubFolder.Name
			$SubFolderID = $SubFolder.ID
			If ($SubFolder.Subfolders.Count -gt 0) {GetSubFolders $SubFolder} 
			If ($SubFolder.Messages.Count -gt 0) {
				If ($PruneSubFolders) {GetMessages $SubFolder}
			} Else {
				If ($DeleteEmptySubFolders) {$ArrayDeletedFolders += $SubFolderID}
			} 
			$IterateFolder++
		} Until ($IterateFolder -eq $Folder.SubFolders.Count)
	}
	If ($DeleteEmptySubFolders) {
		$ArrayDeletedFolders | ForEach {
			$CheckFolder = $Folder.SubFolders.ItemByDBID($_)
			$FolderName = $CheckFolder.Name
			If (SubFoldersEmpty $CheckFolder) {
				Try {
					If ($DoDelete) {$Folder.SubFolders.DeleteByDBID($_)}
					$TotalDeletedFolders++
					Debug "Deleted empty subfolder $FolderName in $AccountAddress"
				}
				Catch {
					$DeleteFolderErrors++
					Debug "[ERROR] Deleting empty subfolder $FolderName in $AccountAddress"
					Debug "[ERROR] : $Error"
				}
				$Error.Clear()
			}
		}
	}
	$ArrayDeletedFolders.Clear()
}

Function SubFoldersEmpty ($Folder) {
	# Returns $True only when no subfolder at any depth below $Folder contains messages
	$IterateFolder = 0
	While ($IterateFolder -lt $Folder.SubFolders.Count) {
		$SubFolder = $Folder.SubFolders.Item($IterateFolder)
		If ($SubFolder.Messages.Count -gt 0) {Return $False}
		# Use the result of the recursive check instead of letting it leak into the output
		If (-not (SubFoldersEmpty $SubFolder)) {Return $False}
		$IterateFolder++
	}
	Return $True
}

Function GetMatchFolders ($Folder) {
	$IterateFolder = 0
	If ($Folder.SubFolders.Count -gt 0) {
		Do {
			$SubFolder = $Folder.SubFolders.Item($IterateFolder)
			$SubFolderName = $SubFolder.Name
			If ($SubFolderName -match [regex]$PruneFolders) {
				GetSubFolders $SubFolder
				GetMessages $SubFolder
			} Else {
				GetMatchFolders $SubFolder
			}
			$IterateFolder++
		} Until ($IterateFolder -eq $Folder.SubFolders.Count)
	}
}

Function GetMessages ($Folder) {
	$IterateMessage = 0
	$ArrayMessagesToDelete = @()
	$DeletedMessages = 0
	If ($Folder.Messages.Count -gt 0) {
		Do {
			$Message = $Folder.Messages.Item($IterateMessage)
			If ($Message.InternalDate -lt ((Get-Date).AddDays(-$DaysBeforeDelete))) {
				$ArrayMessagesToDelete += $Message.ID
			}
			$IterateMessage++
		} Until ($IterateMessage -eq $Folder.Messages.Count)
	}
	$ArrayMessagesToDelete | ForEach {
		$AFolderName = $Folder.Name
		Try {
			If ($DoDelete) {$Folder.Messages.DeleteByDBID($_)}
			$DeletedMessages++
			$TotalDeletedMessages++
		}
		Catch {
			$DeleteMessageErrors++
			Debug "[ERROR] Deleting messages from folder $AFolderName in $AccountAddress"
			Debug "[ERROR] $Error"
		}
		$Error.Clear()
	}
	If ($DeletedMessages -gt 0) {
		Debug "Deleted $DeletedMessages messages from $AFolderName in $AccountAddress"
	}
	$ArrayMessagesToDelete.Clear()
}

Function PruneMessages {
	
	$Error.Clear()
	$BeginDeletingOldMessages = Get-Date
	Debug "----------------------------"
	Debug "Begin deleting messages older than $DaysBeforeDelete days"
	If (-not($DoDelete)) {
		Debug "Delete disabled - Test Run ONLY"
	}

	<#  Authenticate hMailServer COM  #>
	$hMS = New-Object -COMObject hMailServer.Application
	$hMS.Authenticate("Administrator", $hMSAdminPass) | Out-Null
	
	$IterateDomains = 0
	Do {
		$hMSDomain = $hMS.Domains.Item($IterateDomains)
		If ($hMSDomain.Active) {
			$IterateAccounts = 0
			Do {
				$hMSAccount = $hMSDomain.Accounts.Item($IterateAccounts)
				If ($hMSAccount.Active) {
					$AccountAddress = $hMSAccount.Address
					$IterateIMAPFolders = 0
					If ($hMSAccount.IMAPFolders.Count -gt 0) {
						Do {
							$hMSIMAPFolder = $hMSAccount.IMAPFolders.Item($IterateIMAPFolders)
							If ($hMSIMAPFolder.Name -match [regex]$PruneFolders) {
								If ($hMSIMAPFolder.SubFolders.Count -gt 0) {
									GetSubFolders $hMSIMAPFolder
								} # IF SUBFOLDER COUNT > 0
								GetMessages $hMSIMAPFolder
							} # IF FOLDERNAME MATCH REGEX
							Else {
								GetMatchFolders $hMSIMAPFolder
							} # IF NOT FOLDERNAME MATCH REGEX
						$IterateIMAPFolders++
						} Until ($IterateIMAPFolders -eq $hMSAccount.IMAPFolders.Count)
					} # IF IMAPFOLDER COUNT > 0
				} #IF ACCOUNT ACTIVE
				$IterateAccounts++
			} Until ($IterateAccounts -eq $hMSDomain.Accounts.Count)
		} # IF DOMAIN ACTIVE
		$IterateDomains++
	} Until ($IterateDomains -eq $hMS.Domains.Count)

	If ($DeleteMessageErrors -gt 0) {
		Debug "Finished Message Pruning : $DeleteMessageErrors Errors present"
		Email "[ERROR] Message Pruning : $DeleteMessageErrors Errors present : Check debug log"
	} Else {
		If ($TotalDeletedMessages -gt 0) {
			Debug "Finished pruning $TotalDeletedMessages messages in $(ElapsedTime $BeginDeletingOldMessages)"
			Email "[OK] Finished pruning $TotalDeletedMessages messages in $(ElapsedTime $BeginDeletingOldMessages)"
		} Else {
			Debug "No messages older than $DaysBeforeDelete days to prune"
			Email "[OK] No messages older than $DaysBeforeDelete days to prune"
		}
	}
	If ($DeleteFolderErrors -gt 0) {
		Debug "Deleting Empty Folders : $DeleteFolderErrors Errors present"
		Email "[ERROR] Deleting Empty Folders : $DeleteFolderErrors Errors present : Check debug log"
	} Else {
		If ($TotalDeletedFolders -gt 0) {
			Debug "Deleted $TotalDeletedFolders empty subfolders"
			Email "[OK] Deleted $TotalDeletedFolders empty subfolders"
		} Else {
			Debug "No empty subfolders deleted"
			Email "[OK] No empty subfolders deleted"
		}
	}
}

PruneMessages

Function GetBayesSubFolders ($Folder) {
	$IterateFolder = 0
	$ArrayBayesMessages = @()
	If ($Folder.SubFolders.Count -gt 0) {
		Do {
			$SubFolder = $Folder.SubFolders.Item($IterateFolder)
			$SubFolderName = $SubFolder.Name
			$SubFolderID = $SubFolder.ID
			If ($SubFolder.Subfolders.Count -gt 0) {GetBayesSubFolders $SubFolder} 
			If ($SubFolder.Messages.Count -gt 0) {
				If ($PruneSubFolders) {GetBayesMessages $SubFolder}
			} 
			$IterateFolder++
		} Until ($IterateFolder -eq $Folder.SubFolders.Count)
	}
	$ArrayBayesMessages.Clear()
}

Function GetBayesMatchFolders ($Folder) {
	$IterateFolder = 0
	If ($Folder.SubFolders.Count -gt 0) {
		Do {
			$SubFolder = $Folder.SubFolders.Item($IterateFolder)
			$SubFolderName = $SubFolder.Name
			If (($SubFolderName -match [regex]$HamFolders) -or ($SubFolderName -match [regex]$SpamFolders)) {
				GetBayesSubFolders $SubFolder
				GetBayesMessages $SubFolder
			} Else {
				GetBayesMatchFolders $SubFolder
			}
			$IterateFolder++
		} Until ($IterateFolder -eq $Folder.SubFolders.Count)
	}
}

Function GetBayesMessages ($Folder) {
	$IterateMessage = 0
	$ArrayHamToFeed = @()
	$ArraySpamToFeed = @()
	$HamFedMessages = 0
	$SpamFedMessages = 0
	If ($Folder.Messages.Count -gt 0) {
		If ($Folder.Name -match [regex]$HamFolders) {
			Do {
				$Message = $Folder.Messages.Item($IterateMessage)
				If ($Message.InternalDate -gt ((Get-Date).AddDays(-$BayesDays))) {
					$ArrayHamToFeed += $Message.FileName
				}
				$IterateMessage++
			} Until ($IterateMessage -eq $Folder.Messages.Count)
		}
		If ($Folder.Name -match [regex]$SpamFolders) {
			$IterateMessage = 0   # restart the index in case the folder name also matched $HamFolders
			Do {
				$Message = $Folder.Messages.Item($IterateMessage)
				If ($Message.InternalDate -gt ((Get-Date).AddDays(-$BayesDays))) {
					$ArraySpamToFeed += $Message.FileName
				}
				$IterateMessage++
			} Until ($IterateMessage -eq $Folder.Messages.Count)
		}
	}
	$ArrayHamToFeed | ForEach {
		$FileName = $_
		Try {
			If ((Get-Item $FileName).Length -lt 512000) {
				If ($DoSpamC) {
					$SpamC = & cmd /c "`"$SADir\spamc.exe`" -d `"$SAHost`" -p `"$SAPort`" -L ham < `"$FileName`""
					$SpamCResult = Out-String -InputObject $SpamC
					If ($SpamCResult -match "Message successfully un/learned") {$LearnedHamMessages++}
					If (($SpamCResult -notmatch "Message successfully un/learned") -and ($SpamCResult -notmatch "Message was already un/learned")) {
						Throw $SpamCResult
					}
				}
				$HamFedMessages++
				$TotalHamFedMessages++
			}
		}
		Catch {
			$HamFedMessageErrors++
			$Err = $Error[0]
			Debug "[ERROR] Feeding HAM message $FileName in $AccountAddress"
			Debug "[ERROR] $Err"
		}
	}
	$ArraySpamToFeed | ForEach {
		$FileName = $_
		Try {
			If ((Get-Item $FileName).Length -lt 512000) {
				If ($DoSpamC) {
					$SpamC = & cmd /c "`"$SADir\spamc.exe`" -d `"$SAHost`" -p `"$SAPort`" -L spam < `"$FileName`""
					$SpamCResult = Out-String -InputObject $SpamC
					If ($SpamCResult -match "Message successfully un/learned") {$LearnedSpamMessages++}
					If (($SpamCResult -notmatch "Message successfully un/learned") -and ($SpamCResult -notmatch "Message was already un/learned")) {
						Throw $SpamCResult
					}
				}
				$SpamFedMessages++
				$TotalSpamFedMessages++
			}
		}
		Catch {
			$SpamFedMessageErrors++
			$Err = $Error[0]
			Debug "[ERROR] Feeding SPAM message $FileName in $AccountAddress"
			Debug "[ERROR] $Err"
		}
	}
	If ($HamFedMessages -gt 0) {
		Debug "Fed $HamFedMessages HAM messages to SpamC from $AccountAddress"
	}
	If ($SpamFedMessages -gt 0) {
		Debug "Fed $SpamFedMessages SPAM messages to SpamC from $AccountAddress"
	}
	$ArraySpamToFeed.Clear()
}

Function FeedBayes {
	
	$Error.Clear()
	
	$BeginFeedingBayes = Get-Date
	Debug "----------------------------"
	Debug "Begin feeding messages from the last $BayesDays days to Bayes"
	If (-not($DoSpamC)) {
		Debug "SpamC disabled - Test Run ONLY"
	}

	<#  Authenticate hMailServer COM  #>
	$hMS = New-Object -COMObject hMailServer.Application
	$hMS.Authenticate("Administrator", $hMSAdminPass) | Out-Null
	
	$SAHost = $hMS.Settings.AntiSpam.SpamAssassinHost
	$SAPort = $hMS.Settings.AntiSpam.SpamAssassinPort
	
	$IterateDomains = 0
	Do {
		$hMSDomain = $hMS.Domains.Item($IterateDomains)
		If ($hMSDomain.Active) {
			$IterateAccounts = 0
			Do {
				$hMSAccount = $hMSDomain.Accounts.Item($IterateAccounts)
				If ($hMSAccount.Active) {
					$AccountAddress = $hMSAccount.Address
					$IterateIMAPFolders = 0
					If ($hMSAccount.IMAPFolders.Count -gt 0) {
						Do {
							$hMSIMAPFolder = $hMSAccount.IMAPFolders.Item($IterateIMAPFolders)
							If (($hMSIMAPFolder.Name -match [regex]$HamFolders) -or ($hMSIMAPFolder.Name -match [regex]$SpamFolders)) {
								If ($hMSIMAPFolder.SubFolders.Count -gt 0) {
									GetBayesSubFolders $hMSIMAPFolder
								} # IF SUBFOLDER COUNT > 0
								GetBayesMessages $hMSIMAPFolder
							} # IF FOLDERNAME MATCH REGEX
							Else {
								GetBayesMatchFolders $hMSIMAPFolder
							} # IF NOT FOLDERNAME MATCH REGEX
						$IterateIMAPFolders++
						} Until ($IterateIMAPFolders -eq $hMSAccount.IMAPFolders.Count)
					} # IF IMAPFOLDER COUNT > 0
				} #IF ACCOUNT ACTIVE
				$IterateAccounts++
			} Until ($IterateAccounts -eq $hMSDomain.Accounts.Count)
		} # IF DOMAIN ACTIVE
		$IterateDomains++
	} Until ($IterateDomains -eq $hMS.Domains.Count)

	Debug "----------------------------"
	Debug "Finished feeding $($TotalHamFedMessages + $TotalSpamFedMessages) messages to Bayes in $(ElapsedTime $BeginFeedingBayes)"
	
	If ($HamFedMessageErrors -gt 0) {
		Debug "Errors feeding HAM to SpamC : $HamFedMessageErrors Errors present"
		Email "[ERROR] HAM SpamC : $HamFedMessageErrors Errors present : Check debug log"
	} Else {
		If ($TotalHamFedMessages -gt 0) {
			Debug "Learned tokens from $LearnedHamMessages of $TotalHamFedMessages HAM messages fed to Bayes"
			Email "[OK] Learned tokens from $LearnedHamMessages of $TotalHamFedMessages HAM messages fed to Bayes"
		} Else {
			Debug "No HAM messages newer than $BayesDays days to feed to Bayes"
			Email "[OK] No HAM messages newer than $BayesDays days to feed to Bayes"
		}
	}
	If ($SpamFedMessageErrors -gt 0) {
		Debug "Errors feeding SPAM to SpamC : $SpamFedMessageErrors Errors present"
		Email "[ERROR] SPAM SpamC : $SpamFedMessageErrors Errors present : Check debug log"
	} Else {
		If ($TotalSpamFedMessages -gt 0) {
			Debug "Learned tokens from $LearnedSpamMessages of $TotalSpamFedMessages SPAM messages fed to Bayes"
			Email "[OK] Learned tokens from $LearnedSpamMessages of $TotalSpamFedMessages SPAM messages fed to Bayes"
		} Else {
			Debug "No SPAM messages newer than $BayesDays days to feed to Bayes"
			Email "[OK] No SPAM messages newer than $BayesDays days to feed to Bayes"
		}
	}
}

FeedBayes

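Try {
	& cmd /c "`"$SADir\sa-learn.exe`" --sync"
	# Native commands do not throw on failure, so check the exit code explicitly
	If ($LASTEXITCODE -ne 0) {Throw "sa-learn --sync exited with code $LASTEXITCODE"}
	Debug "----------------------------"
	Debug "Successfully synced Bayes database"
}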
Catch {
	$Err = $Error[0]
	Debug "----------------------------"
	Debug "[ERROR] syncing Bayes : $Err"
}



palinka
Senior user
Posts: 2333
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2020-11-20 17:45

Ehhh.... Never mind. I just ran a Bayes dump (sa-learn --dump).

Code: Select all

0.000          0          0          0  non-token data: last journal sync atime
At least I know what to do now. :mrgreen:

SorenR
Senior user
Posts: 4058
Joined: 2006-08-21 15:38
Location: Denmark

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2020-11-20 17:47

palinka wrote:
2020-11-20 17:38

The first operational run had errors because some files were too large for spamc to scan:

Code: Select all

Finished feeding 190 messages to Bayes in 1 minute 47 seconds
Learned tokens from 141 of 185 HAM messages fed to Bayes
Learned tokens from 5 of 5 SPAM messages fed to Bayes
----------------------------
Successfully synced Bayes database
Then I changed the script to ignore too-large files (> 512k) and got this:

Code: Select all

Finished feeding 192 messages to Bayes in 1 minute 40 seconds
Learned tokens from 1 of 187 HAM messages fed to Bayes
Learned tokens from 1 of 5 SPAM messages fed to Bayes
----------------------------
Successfully synced Bayes database
Do you think I'm missing something?
Nope, I can see you received 2 new messages: 1 of them was used to learn some HAM tokens, and one SPAM message was re-visited :mrgreen:
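
For anyone puzzled by the counts, here is a rough sketch of how the two reply strings the test script matches on could be turned into a tally. The function name Get-LearnTally and the sample replies are made up for illustration; only the two reply strings come from the script, and nothing here talks to spamd.

```powershell
# Hypothetical helper: sorts spamc -L replies into learned / already known / failed.
# The two reply strings are the ones the test script matches on; the rest is illustrative.
Function Get-LearnTally ([string[]]$Replies) {
	$Fed = 0; $Learned = 0; $AlreadyKnown = 0; $Failed = 0
	ForEach ($Reply in $Replies) {
		$Fed++
		If ($Reply -match 'Message successfully un/learned') {
			$Learned++        # new tokens went into Bayes
		} ElseIf ($Reply -match 'Message was already un/learned') {
			$AlreadyKnown++   # seen on an earlier run, nothing new to learn
		} Else {
			$Failed++         # anything else is treated as an error
		}
	}
	Return [pscustomobject]@{Fed = $Fed; Learned = $Learned; AlreadyKnown = $AlreadyKnown; Failed = $Failed}
}

# Made-up replies standing in for a second run over mostly the same mailbox
$SampleReplies = @(
	'Message successfully un/learned',
	'Message was already un/learned',
	'Message was already un/learned'
)
$Result = Get-LearnTally $SampleReplies
Write-Host "Learned tokens from $($Result.Learned) of $($Result.Fed) messages"
```

On a repeat run most messages land in the "already known" bucket, which is why the "learned" number drops so sharply the second time around.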

palinka
Senior user
Posts: 2333
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2020-11-20 17:51

SorenR wrote:
2020-11-20 17:47
Nope, I can see you received 2 new messages: 1 of them was used to learn some HAM tokens, and one SPAM message was re-visited :mrgreen:
But spamd will not use these tokens unless the journal is synced - correct?

I think the tokens are being placed into the db, but not being used to score spam.

I need that --allow-tell... :roll:

SorenR
Senior user
Posts: 4058
Joined: 2006-08-21 15:38
Location: Denmark

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2020-11-20 18:09

Code: Select all

#   If this option is set, whenever SpamAssassin does Bayes learning, it will
#   put the information into the journal instead of directly into the database.
#   This lowers contention for locking the database to execute an update, but
#   will also cause more access to the journal and cause a delay before the
#   updates are actually committed to the Bayes database.
#
# bayes_learn_to_journal (default: 0)

Code: Select all

-l, --allow-tell
Allow learning and forgetting (to a local Bayes database), reporting and revoking (to a remote database) by spamd. The client issues a TELL command to tell what type of message is being processed and whether local (learn/forget) or remote (report/revoke) databases should be updated.
No Journal involved.
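
If you want to check which way your own setup leans, a minimal sketch along these lines can read config text and report whether bayes_learn_to_journal is switched on. The function name Test-LearnToJournal and the sample config are made up; treating a missing line as 0 follows the default quoted above, and "the last uncommented setting wins" is an assumption of this sketch.

```powershell
# Hypothetical helper: reports whether bayes_learn_to_journal is switched on in config text.
# Per the documentation quoted above the default is 0, so a missing line counts as off.
Function Test-LearnToJournal ([string]$ConfigText) {
	$Enabled = $False
	ForEach ($Line in ($ConfigText -split "`r?`n")) {
		# Commented lines start with # and never match; assumption: the last setting wins
		If ($Line -match '^\s*bayes_learn_to_journal\s+(\d+)') {
			$Enabled = ([int]$Matches[1] -ne 0)
		}
	}
	Return $Enabled
}

# Made-up config text, not taken from anyone's local.cf
$SampleConfig = @"
# bayes_learn_to_journal 1
use_bayes 1
bayes_learn_to_journal 0
"@
If (Test-LearnToJournal $SampleConfig) {
	Write-Host "Learning goes to the journal first; updates reach the database after a sync"
} Else {
	Write-Host "Learning is written straight to the Bayes database"
}
```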

palinka
Senior user
Posts: 2333
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2020-11-20 18:25

SorenR wrote:
2020-11-20 18:09

Code: Select all

#   If this option is set, whenever SpamAssassin does Bayes learning, it will
#   put the information into the journal instead of directly into the database.
#   This lowers contention for locking the database to execute an update, but
#   will also cause more access to the journal and cause a delay before the
#   updates are actually committed to the Bayes database.
#
# bayes_learn_to_journal (default: 0)

Code: Select all

-l, --allow-tell
Allow learning and forgetting (to a local Bayes database), reporting and revoking (to a remote database) by spamd. The client issues a TELL command to tell what type of message is being processed and whether local (learn/forget) or remote (report/revoke) databases should be updated.
No Journal involved.
Now I'm really confused. Do I need to re-set my spamd service with argument --allow-tell or not?

It looks like it's working, but I'm not sure, and I'm not even sure how to test it.

SorenR
Senior user
Posts: 4058
Joined: 2006-08-21 15:38
Location: Denmark

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2020-11-20 20:54

Add "--allow-tell" to your service to allow SPAMC to report SPAM/HAM.

palinka
Senior user
Posts: 2333
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2020-11-20 22:22

SorenR wrote:
2020-11-20 20:54
Add "--allow-tell" to your service to allow SPAMC to report SPAM/HAM.
✓ Done

One last question - is syncing required with spamc or only sa-learn?

In your sa-learn.cmd you had a command for sa-learn --sync.

palinka
Senior user
Posts: 2333
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2020-11-20 23:50

SorenR wrote:
2020-11-20 18:09
No Journal involved.
I guess that answers my ^ question. :D

palinka
Senior user
Posts: 2333
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2020-11-21 00:26

OK, here's my version. Sorry about the commingled script for deleting old messages - I'm too lazy to untangle it, so I'm just pasting it in its entirety. This will be in my backup & offsite upload routine shortly.

Code: Select all

<#

.SYNOPSIS
	Prune Messages & Feed Bayes

.DESCRIPTION
	Delete messages in specified folders older than N days
	Feeds messages to Bayes

.FUNCTIONALITY
	Looks for folder name match at any folder level and if found, deletes all messages older than N days within that folder and all subfolders within
	Deletes empty subfolders within matching folders if DeleteEmptySubFolders set to True in config
	Feeds ham and spam to Bayes

.PARAMETER 

	
.NOTES
	Folder name matching occurs at any level folder
	Empty folders are assumed to be trash if they're located in this script
	Only empty folders found in levels BELOW matching level will be deleted
	
.EXAMPLE


#>

<###   USER VARIABLES   ###>
$hMSAdminPass          = 'supersecretpassword'     # hMailServer Admin password
$DoDelete              = $False           # FOR TESTING - set to false to run and report results without deleting messages and folders
$DoSpamC               = $True            # FOR TESTING - set to false to run and report results without feeding SpamC with spam/ham

$PruneSubFolders       = $True            # True will prune all folders in levels below name matching folders
$DeleteEmptySubFolders = $True            # True will delete empty subfolders below the matching level unless a subfolder within contains messages
$DaysBeforeDelete      = 30               # Number of days to keep messages in pruned folders
$PruneFolders          = '2nd level|Trash|Deleted|Junk|Spam|APCUPSD|BrotherMFC|Administrative|Horde|SAUserList|Chase|Unsubscribes'  # Names of IMAP folders you want to cleanup - uses regex

$FeedBayes             = $True            # Feed spamC with spam/ham to populate bayes database
$HamFolders            = 'Inbox|Ham'      # Ham folders to feed messages to spamC for bayes database - uses regex
$SpamFolders           = 'Spam|Junk'      # Spam folders to feed messages to spamC for bayes database - uses regex
$BayesDays             = 7                # Number of days worth of spam/ham to feed to bayes
$SADir                 = 'C:\Program Files\JAM Software\SpamAssassin for Windows'  # SpamAssassin Install Directory


<###   START SCRIPT   ###>

<#  Functions copied from hMailServer Backup required for testing  #>
Function Debug ($DebugOutput) {Write-Host $DebugOutput}
Function Email ($DebugOutput) {}
Function ElapsedTime ($EndTime) {
	$TimeSpan = New-Timespan $EndTime
	If (([int]($TimeSpan).Hours) -eq 0) {$Hours = ""} ElseIf (([int]($TimeSpan).Hours) -eq 1) {$Hours = "1 hour "} Else {$Hours = "$([int]($TimeSpan).Hours) hours "}
	If (([int]($TimeSpan).Minutes) -eq 0) {$Minutes = ""} ElseIf (([int]($TimeSpan).Minutes) -eq 1) {$Minutes = "1 minute "} Else {$Minutes = "$([int]($TimeSpan).Minutes) minutes "}
	If (([int]($TimeSpan).Seconds) -eq 1) {$Seconds = "1 second"} Else {$Seconds = "$([int]($TimeSpan).Seconds) seconds"}
	If (($TimeSpan).TotalSeconds -lt 1) {
		$Return = "less than 1 second"
	} Else {
		$Return = "$Hours$Minutes$Seconds"
	}
	Return $Return
}
Function Plural ($Integer) {
	If ($Integer -eq 1) {$S = ""} Else {$S = "s"}
	Return $S
}

<#  Set pruning variables  #>
Set-Variable -Name TotalDeletedMessages -Value 0 -Option AllScope
Set-Variable -Name TotalDeletedFolders -Value 0 -Option AllScope
Set-Variable -Name DeleteMessageErrors -Value 0 -Option AllScope
Set-Variable -Name DeleteFolderErrors -Value 0 -Option AllScope

<#  Set Bayes variables  #>
Set-Variable -Name TotalHamFedMessages -Value 0 -Option AllScope
Set-Variable -Name TotalSpamFedMessages -Value 0 -Option AllScope
Set-Variable -Name HamFedMessageErrors -Value 0 -Option AllScope
Set-Variable -Name SpamFedMessageErrors -Value 0 -Option AllScope
Set-Variable -Name LearnedHamMessages -Value 0 -Option AllScope
Set-Variable -Name LearnedSpamMessages -Value 0 -Option AllScope

Function GetSubFolders ($Folder) {
	$IterateFolder = 0
	$ArrayDeletedFolders = @()
	If ($Folder.SubFolders.Count -gt 0) {
		Do {
			$SubFolder = $Folder.SubFolders.Item($IterateFolder)
			$SubFolderName = $SubFolder.Name
			$SubFolderID = $SubFolder.ID
			If ($SubFolder.Subfolders.Count -gt 0) {GetSubFolders $SubFolder} 
			If ($SubFolder.Messages.Count -gt 0) {
				If ($PruneSubFolders) {GetMessages $SubFolder}
			} Else {
				If ($DeleteEmptySubFolders) {$ArrayDeletedFolders += $SubFolderID}
			} 
			$IterateFolder++
		} Until ($IterateFolder -eq $Folder.SubFolders.Count)
	}
	If ($DeleteEmptySubFolders) {
		$ArrayDeletedFolders | ForEach {
			$CheckFolder = $Folder.SubFolders.ItemByDBID($_)
			$FolderName = $CheckFolder.Name
			If (SubFoldersEmpty $CheckFolder) {
				Try {
					If ($DoDelete) {$Folder.SubFolders.DeleteByDBID($_)}
					$TotalDeletedFolders++
					Debug "Deleted empty subfolder $FolderName in $AccountAddress"
				}
				Catch {
					$DeleteFolderErrors++
					Debug "[ERROR] Deleting empty subfolder $FolderName in $AccountAddress"
					Debug "[ERROR] : $Error"
				}
				$Error.Clear()
			}
		}
	}
	$ArrayDeletedFolders.Clear()
}

Function SubFoldersEmpty ($Folder) {
	# Returns $True only when no subfolder at any depth below $Folder contains messages
	$IterateFolder = 0
	While ($IterateFolder -lt $Folder.SubFolders.Count) {
		$SubFolder = $Folder.SubFolders.Item($IterateFolder)
		If ($SubFolder.Messages.Count -gt 0) {Return $False}
		# Use the result of the recursive check instead of letting it leak into the output
		If (-not (SubFoldersEmpty $SubFolder)) {Return $False}
		$IterateFolder++
	}
	Return $True
}

Function GetMatchFolders ($Folder) {
	$IterateFolder = 0
	If ($Folder.SubFolders.Count -gt 0) {
		Do {
			$SubFolder = $Folder.SubFolders.Item($IterateFolder)
			$SubFolderName = $SubFolder.Name
			If ($SubFolderName -match $PruneFolders) {
				GetSubFolders $SubFolder
				GetMessages $SubFolder
			} Else {
				GetMatchFolders $SubFolder
			}
			$IterateFolder++
		} Until ($IterateFolder -eq $Folder.SubFolders.Count)
	}
}

Function GetMessages ($Folder) {
	$IterateMessage = 0
	$ArrayMessagesToDelete = @()
	$DeletedMessages = 0
	If ($Folder.Messages.Count -gt 0) {
		Do {
			$Message = $Folder.Messages.Item($IterateMessage)
			If ($Message.InternalDate -lt ((Get-Date).AddDays(-$DaysBeforeDelete))) {
				$ArrayMessagesToDelete += $Message.ID
			}
			$IterateMessage++
		} Until ($IterateMessage -eq $Folder.Messages.Count)
	}
	$ArrayMessagesToDelete | ForEach {
		$AFolderName = $Folder.Name
		Try {
			If ($DoDelete) {$Folder.Messages.DeleteByDBID($_)}
			$DeletedMessages++
			$TotalDeletedMessages++
		}
		Catch {
			$DeleteMessageErrors++
			Debug "[ERROR] Deleting messages from folder $AFolderName in $AccountAddress"
			Debug "[ERROR] $Error"
		}
		$Error.Clear()
	}
	If ($DeletedMessages -gt 0) {
		Debug "Deleted $DeletedMessages message$(Plural $DeletedMessages) from $AFolderName in $AccountAddress"
	}
	$ArrayMessagesToDelete.Clear()
}

Function PruneMessages {
	
	$Error.Clear()
	$BeginDeletingOldMessages = Get-Date
	Debug "----------------------------"
	Debug "Begin deleting messages older than $DaysBeforeDelete days"
	If (-not($DoDelete)) {
		Debug "Delete disabled - Test Run ONLY"
	}

	<#  Authenticate hMailServer COM  #>
	$hMS = New-Object -COMObject hMailServer.Application
	$hMS.Authenticate("Administrator", $hMSAdminPass) | Out-Null
	
	$IterateDomains = 0
	Do {
		$hMSDomain = $hMS.Domains.Item($IterateDomains)
		If ($hMSDomain.Active) {
			$IterateAccounts = 0
			Do {
				$hMSAccount = $hMSDomain.Accounts.Item($IterateAccounts)
				If ($hMSAccount.Active) {
					$AccountAddress = $hMSAccount.Address
					$IterateIMAPFolders = 0
					If ($hMSAccount.IMAPFolders.Count -gt 0) {
						Do {
							$hMSIMAPFolder = $hMSAccount.IMAPFolders.Item($IterateIMAPFolders)
							If ($hMSIMAPFolder.Name -match $PruneFolders) {
								If ($hMSIMAPFolder.SubFolders.Count -gt 0) {
									GetSubFolders $hMSIMAPFolder
								} # IF SUBFOLDER COUNT > 0
								GetMessages $hMSIMAPFolder
							} # IF FOLDERNAME MATCH REGEX
							Else {
								GetMatchFolders $hMSIMAPFolder
							} # IF NOT FOLDERNAME MATCH REGEX
						$IterateIMAPFolders++
						} Until ($IterateIMAPFolders -eq $hMSAccount.IMAPFolders.Count)
					} # IF IMAPFOLDER COUNT > 0
				} #IF ACCOUNT ACTIVE
				$IterateAccounts++
			} Until ($IterateAccounts -eq $hMSDomain.Accounts.Count)
		} # IF DOMAIN ACTIVE
		$IterateDomains++
	} Until ($IterateDomains -eq $hMS.Domains.Count)

	If ($DeleteMessageErrors -gt 0) {
		Debug "Finished Message Pruning : $DeleteMessageErrors Errors present"
		Email "[ERROR] Message Pruning : $DeleteMessageErrors Errors present : Check debug log"
	} Else {
		If ($TotalDeletedMessages -gt 0) {
			Debug "Finished pruning $TotalDeletedMessages messages in $(ElapsedTime $BeginDeletingOldMessages)"
			Email "[OK] Finished pruning $TotalDeletedMessages messages in $(ElapsedTime $BeginDeletingOldMessages)"
		} Else {
			Debug "No messages older than $DaysBeforeDelete days to prune"
			Email "[OK] No messages older than $DaysBeforeDelete days to prune"
		}
	}
	If ($DeleteFolderErrors -gt 0) {
		Debug "Deleting Empty Folders : $DeleteFolderErrors Errors present"
		Email "[ERROR] Deleting Empty Folders : $DeleteFolderErrors Errors present : Check debug log"
	} Else {
		If ($TotalDeletedFolders -gt 0) {
			Debug "Deleted $TotalDeletedFolders empty subfolders"
			Email "[OK] Deleted $TotalDeletedFolders empty subfolders"
		} Else {
			Debug "No empty subfolders deleted"
			Email "[OK] No empty subfolders deleted"
		}
	}
}

<#  Feed Bayes  #>

Function GetBayesSubFolders ($Folder) {
	$IterateFolder = 0
	$ArrayBayesMessages = @()
	If ($Folder.SubFolders.Count -gt 0) {
		Do {
			$SubFolder = $Folder.SubFolders.Item($IterateFolder)
			$SubFolderName = $SubFolder.Name
			$SubFolderID = $SubFolder.ID
			If ($SubFolder.Subfolders.Count -gt 0) {GetBayesSubFolders $SubFolder} 
			If ($SubFolder.Messages.Count -gt 0) {
				If ($PruneSubFolders) {GetBayesMessages $SubFolder}
			} 
			$IterateFolder++
		} Until ($IterateFolder -eq $Folder.SubFolders.Count)
	}
	$ArrayBayesMessages.Clear()
}

Function GetBayesMatchFolders ($Folder) {
	$IterateFolder = 0
	If ($Folder.SubFolders.Count -gt 0) {
		Do {
			$SubFolder = $Folder.SubFolders.Item($IterateFolder)
			$SubFolderName = $SubFolder.Name
			If (($SubFolderName -match $HamFolders) -or ($SubFolderName -match $SpamFolders)) {
				GetBayesSubFolders $SubFolder
				GetBayesMessages $SubFolder
			} Else {
				GetBayesMatchFolders $SubFolder
			}
			$IterateFolder++
		} Until ($IterateFolder -eq $Folder.SubFolders.Count)
	}
}

Function GetBayesMessages ($Folder) {
	$IterateMessage = 0
	$ArrayHamToFeed = @()
	$ArraySpamToFeed = @()
	$HamFedMessages = 0
	$SpamFedMessages = 0
	$FolderName = $Folder.Name
	If ($Folder.Messages.Count -gt 0) {
		If ($Folder.Name -match $HamFolders) {
			Do {
				$Message = $Folder.Messages.Item($IterateMessage)
				If ($Message.InternalDate -gt ((Get-Date).AddDays(-$BayesDays))) {
					$ArrayHamToFeed += $Message.FileName
				}
				$IterateMessage++
			} Until ($IterateMessage -eq $Folder.Messages.Count)
		}
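		# Restart the index so the spam loop does not continue from where the ham loop stopped
		$IterateMessage = 0
		If ($Folder.Name -match $SpamFolders) {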
			Do {
				$Message = $Folder.Messages.Item($IterateMessage)
				If ($Message.InternalDate -gt ((Get-Date).AddDays(-$BayesDays))) {
					$ArraySpamToFeed += $Message.FileName
				}
				$IterateMessage++
			} Until ($IterateMessage -eq $Folder.Messages.Count)
		}
	}
	$ArrayHamToFeed | ForEach {
		$FileName = $_
		Try {
			If ((Get-Item $FileName).Length -lt 512000) {
				If ($DoSpamC) {
					$SpamC = & cmd /c "`"$SADir\spamc.exe`" -d `"$SAHost`" -p `"$SAPort`" -L ham < `"$FileName`""
					$SpamCResult = Out-String -InputObject $SpamC
					If ($SpamCResult -match "Message successfully un/learned") {$LearnedHamMessages++}
					If (($SpamCResult -notmatch "Message successfully un/learned") -and ($SpamCResult -notmatch "Message was already un/learned")) {
						Throw $SpamCResult
					}
				}
				$HamFedMessages++
				$TotalHamFedMessages++
			}
		}
		Catch {
			$HamFedMessageErrors++
			$Err = $Error[0]
			Debug "[ERROR] Feeding HAM message $FileName in $AccountAddress"
			Debug "[ERROR] $Err"
		}
	}
	$ArraySpamToFeed | ForEach {
		$FileName = $_
		Try {
			If ((Get-Item $FileName).Length -lt 512000) {
				If ($DoSpamC) {
					$SpamC = & cmd /c "`"$SADir\spamc.exe`" -d `"$SAHost`" -p `"$SAPort`" -L spam < `"$FileName`""
					$SpamCResult = Out-String -InputObject $SpamC
					If ($SpamCResult -match "Message successfully un/learned") {$LearnedSpamMessages++}
					If (($SpamCResult -notmatch "Message successfully un/learned") -and ($SpamCResult -notmatch "Message was already un/learned")) {
						Throw $SpamCResult
					}
				}
				$SpamFedMessages++
				$TotalSpamFedMessages++
			}
		}
		Catch {
			$SpamFedMessageErrors++
			$Err = $Error[0]
			Debug "[ERROR] Feeding SPAM message $FileName in $AccountAddress"
			Debug "[ERROR] $Err"
		}
	}
	If ($HamFedMessages -gt 0) {
		Debug "Fed $HamFedMessages HAM message$(Plural $HamFedMessages) from $FolderName in $AccountAddress"
	}
	If ($SpamFedMessages -gt 0) {
		Debug "Fed $SpamFedMessages SPAM message$(Plural $SpamFedMessages) from $FolderName in $AccountAddress"
	}
	$ArraySpamToFeed.Clear()
}

Function FeedBayes {
	
	$Error.Clear()
	
	$BeginFeedingBayes = Get-Date
	Debug "----------------------------"
	Debug "Begin feeding SpamC"
	If (-not($DoSpamC)) {
		Debug "SpamC disabled - Test Run ONLY"
	}

	<#  Authenticate hMailServer COM  #>
	$hMS = New-Object -COMObject hMailServer.Application
	$hMS.Authenticate("Administrator", $hMSAdminPass) | Out-Null
	
	$SAHost = $hMS.Settings.AntiSpam.SpamAssassinHost
	$SAPort = $hMS.Settings.AntiSpam.SpamAssassinPort
	
	$IterateDomains = 0
	Do {
		$hMSDomain = $hMS.Domains.Item($IterateDomains)
		If ($hMSDomain.Active) {
			$IterateAccounts = 0
			Do {
				$hMSAccount = $hMSDomain.Accounts.Item($IterateAccounts)
				If ($hMSAccount.Active) {
					$AccountAddress = $hMSAccount.Address
					$IterateIMAPFolders = 0
					If ($hMSAccount.IMAPFolders.Count -gt 0) {
						Do {
							$hMSIMAPFolder = $hMSAccount.IMAPFolders.Item($IterateIMAPFolders)
							If (($hMSIMAPFolder.Name -match $HamFolders) -or ($hMSIMAPFolder.Name -match $SpamFolders)) {
								If ($hMSIMAPFolder.SubFolders.Count -gt 0) {
									GetBayesSubFolders $hMSIMAPFolder
								} # IF SUBFOLDER COUNT > 0
								GetBayesMessages $hMSIMAPFolder
							} # IF FOLDERNAME MATCH REGEX
							Else {
								GetBayesMatchFolders $hMSIMAPFolder
							} # IF NOT FOLDERNAME MATCH REGEX
						$IterateIMAPFolders++
						} Until ($IterateIMAPFolders -eq $hMSAccount.IMAPFolders.Count)
					} # IF IMAPFOLDER COUNT > 0
				} #IF ACCOUNT ACTIVE
				$IterateAccounts++
			} Until ($IterateAccounts -eq $hMSDomain.Accounts.Count)
		} # IF DOMAIN ACTIVE
		$IterateDomains++
	} Until ($IterateDomains -eq $hMS.Domains.Count)

	Debug "----------------------------"
	Debug "Finished feeding $($TotalHamFedMessages + $TotalSpamFedMessages) messages to Bayes in $(ElapsedTime $BeginFeedingBayes)"
	
	If ($HamFedMessageErrors -gt 0) {
		Debug "Errors feeding HAM to SpamC : $HamFedMessageErrors Error$(Plural $HamFedMessageErrors) present"
		Email "[ERROR] HAM SpamC : $HamFedMessageErrors Errors present : Check debug log"
	} Else {
		If ($TotalHamFedMessages -gt 0) {
			Debug "Bayes learned from $LearnedHamMessages of $TotalHamFedMessages HAM message$(Plural $TotalHamFedMessages) found"
			Email "[OK] Bayes learned from $LearnedHamMessages of $TotalHamFedMessages HAM message$(Plural $TotalHamFedMessages) found"
		} Else {
			Debug "No HAM messages newer than $BayesDays days to feed to Bayes"
			Email "[OK] No HAM messages newer than $BayesDays days to feed to Bayes"
		}
	}
	If ($SpamFedMessageErrors -gt 0) {
		Debug "Errors feeding SPAM to SpamC : $SpamFedMessageErrors Error$(Plural $SpamFedMessageErrors) present"
		Email "[ERROR] SPAM SpamC : $SpamFedMessageErrors Errors present : Check debug log"
	} Else {
		If ($TotalSpamFedMessages -gt 0) {
			Debug "Bayes learned from $LearnedSpamMessages of $TotalSpamFedMessages SPAM message$(Plural $TotalSpamFedMessages) found"
			Email "[OK] Bayes learned from $LearnedSpamMessages of $TotalSpamFedMessages SPAM message$(Plural $TotalSpamFedMessages) found"
		} Else {
			Debug "No SPAM messages newer than $BayesDays days to feed to Bayes"
			Email "[OK] No SPAM messages newer than $BayesDays days to feed to Bayes"
		}
	}
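	Try {
		& cmd /c "`"$SADir\sa-learn.exe`" --backup > `"X:\sa-learn\.spamassassin\bayes_backup`""
		# Try/Catch does not see native command failures, so check the exit code explicitly
		If ($LastExitCode -ne 0) {Throw "sa-learn exited with code $LastExitCode"}
		Debug "----------------------------"
		Debug "Successfully backed up Bayes database"
	}
	Catch {
		$Err = $Error[0]
		Debug "----------------------------"
		Debug "[ERROR] backing up Bayes : $Err"
	}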
}

PruneMessages
FeedBayes
Last edited by palinka on 2020-11-21 00:50, edited 1 time in total.

SorenR
Senior user
Posts: 4058
Joined: 2006-08-21 15:38
Location: Denmark

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2020-11-21 00:47

palinka wrote:
2020-11-20 22:22
SorenR wrote:
2020-11-20 20:54
Add "--allow-tell" to your service to allow SPAMC to report SPAM/HAM.
✓ Done

One last question - is syncing required with spamc or only sa-learn?

In your sa-learn.cmd you had a command for sa-learn --sync.
sa-learn is used for batch learning because it can reference an entire directory, so using the journal makes sense: the database is not locked for the duration of processing the directory. "--sync" is then required to update the database from the journal and activate the learned tokens.

spamc however is normally used as a one-off learning process and is done by directly addressing the database.

If SpamAssassin is running on a different host, it is more efficient to "net use" the drive with the email folders and run sa-learn, which puts the major stress on the network and not on the Bayes database.
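
The batch approach described above could be sketched roughly like this, assuming the ham and spam messages already sit in two directories. The function name Get-SaLearnBatchCommands and the example paths are made up for illustration; --ham, --spam, --no-sync and --sync are regular sa-learn options. The sketch only builds the command lines so they can be reviewed before anything touches the database.

```powershell
# Sketch only: builds sa-learn command lines for a batch run followed by one sync.
# Function name and directory paths are illustrative assumptions.
Function Get-SaLearnBatchCommands {
	Param (
		[string]$SADir,      # SpamAssassin install directory
		[string]$HamDir,     # directory holding ham .eml files
		[string]$SpamDir     # directory holding spam .eml files
	)
	$SaLearn = "$SADir\sa-learn.exe"
	# --no-sync skips the slow journal synchronisation after each run
	"`"$SaLearn`" --no-sync --ham `"$HamDir`""
	"`"$SaLearn`" --no-sync --spam `"$SpamDir`""
	# --sync merges the journal into the database once both runs are done
	"`"$SaLearn`" --sync"
}

$Commands = Get-SaLearnBatchCommands -SADir 'C:\SpamAssassin' -HamDir 'C:\SpamAssassin\temp\ham' -SpamDir 'C:\SpamAssassin\temp\spam'
# Review the output, then run each line from a command prompt
$Commands | ForEach-Object { Write-Host $_ }
```

Keeping the sync as a single last step is the point: the directory runs stay cheap and the database is updated once at the end.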
SørenR.

Algorithm (noun.)
Word used by programmers when they do not want to explain what they did.

palinka
Senior user
Posts: 2333
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2020-11-21 00:54

SorenR wrote:
2020-11-21 00:47
sa-learn is used for batch learning because it can reference an entire directory, so using the journal makes sense: the database is not locked for the duration of processing the directory. "--sync" is then required to update the database from the journal and activate the learned tokens.

spamc however is normally used as a one-off learning process and is done by directly addressing the database.

If SpamAssassin is running on a different host, it is more efficient to "net use" the drive with the email folders and run sa-learn, which puts the major stress on the network and not on the Bayes database.
Got it. 👍

Autolearn is starting to make sense now as well. I'm not using it, but a lot of knowledge gaps have been filled in.

SorenR
Senior user
Posts: 4058
Joined: 2006-08-21 15:38
Location: Denmark

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2020-11-21 01:17

I did my monthly cleanup a couple of days ago...

Regular clients get to keep 30 days of SPAM, in case they forgot to move a message to/from the SPAM folder, and 30 days of deleted messages. Non-expunged messages are expunged in all mail folders.

My SPAM account gets to keep 6 months' worth of foul language .... :wink:

Received by the SPAM account (the SPAM account gets a copy of ALL SPAM):
Tagged by SpamAssassin: 879
Not tagged by SpamAssassin: 93
Tagged by SpamAssassin AND hMailserver: 1991

Not having analysed the 1991 emails, I believe I could rely on SpamAssassin for 95% of my SPAM fighting. That is the result of a well-trained Bayes database. :mrgreen:

EDIT: Actually 1974 of the 1991 are tagged by SpamAssassin ...
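
As a rough check of the 95% figure, the counts above can be added up. This is only a sketch under one assumed reading of the numbers: the three counts are taken as disjoint groups, and the 1974 from the EDIT is taken as the SpamAssassin share of the 1991. The variable names are made up for illustration.

```powershell
# Counts quoted from the post; how they combine is an assumption
$TaggedBySAOnly = 879     # tagged by SpamAssassin
$NotTaggedBySA  = 93      # not tagged by SpamAssassin
$TaggedByBoth   = 1991    # tagged by SpamAssassin AND hMailServer
$SATaggedOfBoth = 1974    # of the 1991, tagged by SpamAssassin (per the EDIT)

$Total    = $TaggedBySAOnly + $NotTaggedBySA + $TaggedByBoth
$SATagged = $TaggedBySAOnly + $SATaggedOfBoth
$SAShare  = [math]::Round(100 * $SATagged / $Total, 1)
Write-Host "SpamAssassin tagged $SATagged of $Total messages ($SAShare%)"
```

Under that reading the share comes out at about 96%, in line with the estimate.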

palinka
Senior user
Posts: 2333
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2020-11-21 03:01

SorenR wrote:
2020-11-21 01:17
Not having analysed the 1991 emails, I believe I could rely on SpamAssassin for 95% of my SPAM fighting. That is the result of a well-trained Bayes database. :mrgreen:
Not with all those delicious spam filters you came up with. My spam fighting is highly tilted toward reject first, ask questions later. :mrgreen:

SorenR
Senior user
Posts: 4058
Joined: 2006-08-21 15:38
Location: Denmark

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2020-11-21 13:39

palinka wrote:
2020-11-21 03:01
SorenR wrote:
2020-11-21 01:17
Not having analysed the 1991 emails, I believe I could rely on SpamAssassin for 95% of my SPAM fighting. That is the result of a well-trained Bayes database. :mrgreen:
Not with all those delicious spam filters you came up with. My spam fighting is highly tilted toward reject first, ask questions later. :mrgreen:
Well... I reject non-RFC compliant HELO/EHLO greetings, and the following list of TLDs:

.top, .xyz, .icu, .best, .ga, .club, .press, .today, .guru, .casa, .tk, .ml, .work, .buzz, .co, .monster, .cyou

Besides any addresses flagged as Snowshoe SPAM or Lashback SPAM.
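
For illustration only, a TLD check along these lines might look like the sketch below in PowerShell. This is not the actual filter used on the server, and the function name Test-RejectedTLD is an assumption; only the TLD list comes from the post.

```powershell
# Sketch only: checks whether a sender domain ends in one of the rejected TLDs.
# The function name is an illustrative assumption; the list is copied from the post.
$RejectedTLDs = '.top','.xyz','.icu','.best','.ga','.club','.press','.today','.guru','.casa','.tk','.ml','.work','.buzz','.co','.monster','.cyou'

Function Test-RejectedTLD ($Domain) {
	$Domain = $Domain.ToLower().TrimEnd('.')
	ForEach ($TLD in $RejectedTLDs) {
		# Compare against the dotted TLD so that ".co" does not match "example.com"
		If ($Domain.EndsWith($TLD)) {Return $True}
	}
	Return $False
}

Test-RejectedTLD 'mail.example.xyz'
Test-RejectedTLD 'example.com'
```

Matching on the trailing ".tld" rather than a plain substring avoids false hits on longer TLDs that merely start with the same letters.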

Everything else is fed to the SpamAssassin beast and the ones that survive are delivered to my clients, the rest to my SPAM account. I have a procedure for false-positives in my SPAM account however my clients handle their own false-positives by moving messages in or out of their SPAM folders.
Sometimes I do have to intervene manually and whitelist messages if flagged by RBL or SURBL.

It's a work-in-progress and you never finish :roll:

I have blacklists that I use for small amounts of non-found SPAM. It takes about 1-2 weeks before SpamAssassin has properly learned the non-found SPAM, and then I can deactivate the entry in my blacklist.

My black- and whitelists have date fields and hit counts that enable me to clean up unused entries after a while. I simply deactivate any unused entries but leave them in the list so I can reactivate them later if the problem comes back.
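
A cleanup like that could be sketched as follows. The list format is not shown in the post, so the property names (Pattern, LastHit, HitCount, Active), the function name Disable-StaleEntries and the 30-day idle limit are all assumptions made for the example.

```powershell
# Sketch only: deactivate list entries that have not been hit recently.
# Property names, function name and the idle limit are illustrative assumptions.
Function Disable-StaleEntries ($Entries, $MaxIdleDays) {
	$Cutoff = (Get-Date).AddDays(-$MaxIdleDays)
	ForEach ($Entry in $Entries) {
		If ($Entry.Active -and ($Entry.LastHit -lt $Cutoff)) {
			# Keep the entry in the list so it can be reactivated later
			$Entry.Active = $False
		}
	}
}

$Blacklist = @(
	[PSCustomObject]@{Pattern = 'spammer.example'; LastHit = (Get-Date).AddDays(-90); HitCount = 12; Active = $True}
	[PSCustomObject]@{Pattern = 'recent.example';  LastHit = (Get-Date).AddDays(-2);  HitCount = 3;  Active = $True}
)
Disable-StaleEntries $Blacklist 30
$Blacklist | ForEach-Object { Write-Host "$($_.Pattern) active: $($_.Active)" }
```

Flipping a flag instead of deleting the row keeps the hit history, which is what makes later reactivation cheap.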

palinka
Senior user
Posts: 2333
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2020-11-21 14:57

SorenR wrote:
2020-11-21 13:39
palinka wrote:
2020-11-21 03:01
SorenR wrote:
2020-11-21 01:17
Not having analysed the 1991 emails, I believe I could rely on SpamAssassin for 95% of my SPAM fighting. That is the result of a well-trained Bayes database. :mrgreen:
Not with all those delicious spam filters you came up with. My spam fighting is highly tilted toward reject first, ask questions later. :mrgreen:
Well... I reject non-RFC compliant HELO/EHLO greetings, and the following list of TLDs:

.top, .xyz, .icu, .best, .ga, .club, .press, .today, .guru, .casa, .tk, .ml, .work, .buzz, .co, .monster, .cyou

Besides any addresses flagged as Snowshoe SPAM or Lashback SPAM.

Everything else is fed to the SpamAssassin beast and the ones that survive are delivered to my clients, the rest to my SPAM account. I have a procedure for false-positives in my SPAM account however my clients handle their own false-positives by moving messages in or out of their SPAM folders.
Sometimes I do have to intervene manually and whitelist messages if flagged by RBL or SURBL.

It's a work-in-progress and you never finish :roll:

I have blacklists that I use for small amounts of non-found SPAM. It takes about 1-2 weeks before SpamAssassin has properly learned the non-found SPAM, and then I can deactivate the entry in my blacklist.

My black- and whitelists have date fields and hit counts that enable me to clean up unused entries after a while. I simply deactivate any unused entries but leave them in the list so I can reactivate them later if the problem comes back.
Yep. I got all that too.

Thanks to copy/pasting your scripts. :mrgreen:

Actually, you taught me so much, I'm doing my own thing now. We can collaborate finally instead of the one-way street it used to be. :D

You should check out my backup routine. It's evolving into a one-stop shop for hMailServer daily maintenance. Bayes training now included!

palinka
Senior user
Posts: 2333
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2020-11-21 20:13

Here you go - untangled version. A Bayes-feeding-only script with a couple of minor fix-ups. Stick a fork in it. It's done.

Code: Select all

<#

.SYNOPSIS
	Feed Bayes

.DESCRIPTION
	Feeds messages to Bayes for spam/ham learning

.FUNCTIONALITY
	Looks for folder name match at any folder level and if found, feeds messages to spamc for learning

.PARAMETER 

	
.NOTES
	Add "--allow-tell" argument to your SPAMD service to allow SPAMC to report SPAM/HAM
	
.EXAMPLE


#>

<###   USER VARIABLES   ###>
$hMSAdminPass          = 'supersecretpassword'     # hMailServer Admin password
$DoSpamC               = $True            # FOR TESTING - set to false to run and report results without feeding SpamC with spam/ham
$BayesSubFolders       = $True            # True will feed messages from regex name matching subfolders
$HamFolders            = 'Inbox|Ham'      # Ham folders to feed messages to spamC for bayes database - uses regex
$SpamFolders           = 'Spam|Junk'      # Spam folders to feed messages to spamC for bayes database - uses regex
$BayesDays             = 7                # Number of days worth of spam/ham to feed to bayes
$SADir                 = 'C:\Program Files\JAM Software\SpamAssassin for Windows'  # SpamAssassin Install Directory
$BayesBackupLocation   = "X:\sa-learn\.spamassassin\bayes_backup"  # Bayes backup FILE (sa-learn --backup output is redirected here)

<###   START SCRIPT   ###>

<#  Functions copied from hMailServer Backup required for testing  #>
Function Debug ($DebugOutput) {Write-Host $DebugOutput}
Function Email ($DebugOutput) {}
Function ElapsedTime ($EndTime) {
	$TimeSpan = New-Timespan $EndTime
	If (([int]($TimeSpan).Hours) -eq 0) {$Hours = ""} ElseIf (([int]($TimeSpan).Hours) -eq 1) {$Hours = "1 hour "} Else {$Hours = "$([int]($TimeSpan).Hours) hours "}
	If (([int]($TimeSpan).Minutes) -eq 0) {$Minutes = ""} ElseIf (([int]($TimeSpan).Minutes) -eq 1) {$Minutes = "1 minute "} Else {$Minutes = "$([int]($TimeSpan).Minutes) minutes "}
	If (([int]($TimeSpan).Seconds) -eq 1) {$Seconds = "1 second"} Else {$Seconds = "$([int]($TimeSpan).Seconds) seconds"}
	If (($TimeSpan).TotalSeconds -lt 1) {
		$Return = "less than 1 second"
	} Else {
		$Return = "$Hours$Minutes$Seconds"
	}
	Return $Return
}
Function Plural ($Integer) {
	If ($Integer -eq 1) {$S = ""} Else {$S = "s"}
	Return $S
}

<#  Set Bayes variables  #>
Set-Variable -Name TotalHamFedMessages -Value 0 -Option AllScope
Set-Variable -Name TotalSpamFedMessages -Value 0 -Option AllScope
Set-Variable -Name HamFedMessageErrors -Value 0 -Option AllScope
Set-Variable -Name SpamFedMessageErrors -Value 0 -Option AllScope
Set-Variable -Name LearnedHamMessages -Value 0 -Option AllScope
Set-Variable -Name LearnedSpamMessages -Value 0 -Option AllScope

Function GetBayesSubFolders ($Folder) {
	$IterateFolder = 0
	$ArrayBayesMessages = @()
	If ($Folder.SubFolders.Count -gt 0) {
		Do {
			$SubFolder = $Folder.SubFolders.Item($IterateFolder)
			$SubFolderName = $SubFolder.Name
			$SubFolderID = $SubFolder.ID
			If ($SubFolder.Subfolders.Count -gt 0) {GetBayesSubFolders $SubFolder} 
			If ($SubFolder.Messages.Count -gt 0) {
				If ($BayesSubFolders) {GetBayesMessages $SubFolder}
			} 
			$IterateFolder++
		} Until ($IterateFolder -eq $Folder.SubFolders.Count)
	}
	$ArrayBayesMessages.Clear()
}

Function GetBayesMatchFolders ($Folder) {
	$IterateFolder = 0
	If ($Folder.SubFolders.Count -gt 0) {
		Do {
			$SubFolder = $Folder.SubFolders.Item($IterateFolder)
			$SubFolderName = $SubFolder.Name
			If (($SubFolderName -match $HamFolders) -or ($SubFolderName -match $SpamFolders)) {
				GetBayesSubFolders $SubFolder
				GetBayesMessages $SubFolder
			} Else {
				GetBayesMatchFolders $SubFolder
			}
			$IterateFolder++
		} Until ($IterateFolder -eq $Folder.SubFolders.Count)
	}
}

Function GetBayesMessages ($Folder) {
	$IterateMessage = 0
	$ArrayHamToFeed = @()
	$ArraySpamToFeed = @()
	$HamFedMessages = 0
	$SpamFedMessages = 0
	$LearnedHamMessagesFolder = 0
	$LearnedSpamMessagesFolder = 0
	$FolderName = $Folder.Name
	If ($Folder.Messages.Count -gt 0) {
		If ($Folder.Name -match $HamFolders) {
			Do {
				$Message = $Folder.Messages.Item($IterateMessage)
				If ($Message.InternalDate -gt ((Get-Date).AddDays(-$BayesDays))) {
					$ArrayHamToFeed += $Message.FileName
				}
				$IterateMessage++
			} Until ($IterateMessage -eq $Folder.Messages.Count)
		}
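		# Restart the index so the spam loop does not continue from where the ham loop stopped
		$IterateMessage = 0
		If ($Folder.Name -match $SpamFolders) {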
			Do {
				$Message = $Folder.Messages.Item($IterateMessage)
				If ($Message.InternalDate -gt ((Get-Date).AddDays(-$BayesDays))) {
					$ArraySpamToFeed += $Message.FileName
				}
				$IterateMessage++
			} Until ($IterateMessage -eq $Folder.Messages.Count)
		}
	}
	$ArrayHamToFeed | ForEach {
		$FileName = $_
		Try {
			If ((Get-Item $FileName).Length -lt 512000) {
				If ($DoSpamC) {
					$SpamC = & cmd /c "`"$SADir\spamc.exe`" -d `"$SAHost`" -p `"$SAPort`" -L ham < `"$FileName`""
					$SpamCResult = Out-String -InputObject $SpamC
					If ($SpamCResult -match "Message successfully un/learned") {
						$LearnedHamMessages++
						$LearnedHamMessagesFolder++
					}
					If (($SpamCResult -notmatch "Message successfully un/learned") -and ($SpamCResult -notmatch "Message was already un/learned")) {
						Throw $SpamCResult
					}
				}
				$HamFedMessages++
				$TotalHamFedMessages++
			}
		}
		Catch {
			$HamFedMessageErrors++
			$Err = $Error[0]
			Debug "[ERROR] Feeding HAM message $FileName in $AccountAddress"
			Debug "[ERROR] $Err"
		}
	}
	$ArraySpamToFeed | ForEach {
		$FileName = $_
		Try {
			If ((Get-Item $FileName).Length -lt 512000) {
				If ($DoSpamC) {
					$SpamC = & cmd /c "`"$SADir\spamc.exe`" -d `"$SAHost`" -p `"$SAPort`" -L spam < `"$FileName`""
					$SpamCResult = Out-String -InputObject $SpamC
					If ($SpamCResult -match "Message successfully un/learned") {
						$LearnedSpamMessages++
						$LearnedSpamMessagesFolder++
					}
					If (($SpamCResult -notmatch "Message successfully un/learned") -and ($SpamCResult -notmatch "Message was already un/learned")) {
						Throw $SpamCResult
					}
				}
				$SpamFedMessages++
				$TotalSpamFedMessages++
			}
		}
		Catch {
			$SpamFedMessageErrors++
			$Err = $Error[0]
			Debug "[ERROR] Feeding SPAM message $FileName in $AccountAddress"
			Debug "[ERROR] $Err"
		}
	}
	If ($HamFedMessages -gt 0) {
		Debug "Learned tokens from $LearnedHamMessagesFolder of $HamFedMessages HAM message$(Plural $HamFedMessages) fed from $FolderName in $AccountAddress"
	}
	If ($SpamFedMessages -gt 0) {
		Debug "Learned tokens from $LearnedSpamMessagesFolder of $SpamFedMessages SPAM message$(Plural $SpamFedMessages) fed from $FolderName in $AccountAddress"
	}
	$ArraySpamToFeed.Clear()
}

Function FeedBayes {
	
	$Error.Clear()
	
	$BeginFeedingBayes = Get-Date
	Debug "----------------------------"
	Debug "Begin learning Bayes tokens from messages newer than $BayesDays days"
	If (-not($DoSpamC)) {
		Debug "SpamC disabled - Test Run ONLY"
	}

	<#  Authenticate hMailServer COM  #>
	$hMS = New-Object -COMObject hMailServer.Application
	$hMS.Authenticate("Administrator", $hMSAdminPass) | Out-Null
	
	$SAHost = $hMS.Settings.AntiSpam.SpamAssassinHost
	$SAPort = $hMS.Settings.AntiSpam.SpamAssassinPort
	
	$IterateDomains = 0
	Do {
		$hMSDomain = $hMS.Domains.Item($IterateDomains)
		If ($hMSDomain.Active) {
			$IterateAccounts = 0
			Do {
				$hMSAccount = $hMSDomain.Accounts.Item($IterateAccounts)
				If ($hMSAccount.Active) {
					$AccountAddress = $hMSAccount.Address
					$IterateIMAPFolders = 0
					If ($hMSAccount.IMAPFolders.Count -gt 0) {
						Do {
							$hMSIMAPFolder = $hMSAccount.IMAPFolders.Item($IterateIMAPFolders)
							If (($hMSIMAPFolder.Name -match $HamFolders) -or ($hMSIMAPFolder.Name -match $SpamFolders)) {
								If ($hMSIMAPFolder.SubFolders.Count -gt 0) {
									GetBayesSubFolders $hMSIMAPFolder
								} # IF SUBFOLDER COUNT > 0
								GetBayesMessages $hMSIMAPFolder
							} # IF FOLDERNAME MATCH REGEX
							Else {
								GetBayesMatchFolders $hMSIMAPFolder
							} # IF NOT FOLDERNAME MATCH REGEX
						$IterateIMAPFolders++
						} Until ($IterateIMAPFolders -eq $hMSAccount.IMAPFolders.Count)
					} # IF IMAPFOLDER COUNT > 0
				} #IF ACCOUNT ACTIVE
				$IterateAccounts++
			} Until ($IterateAccounts -eq $hMSDomain.Accounts.Count)
		} # IF DOMAIN ACTIVE
		$IterateDomains++
	} Until ($IterateDomains -eq $hMS.Domains.Count)

	Debug "----------------------------"
	Debug "Finished feeding $($TotalHamFedMessages + $TotalSpamFedMessages) messages to Bayes in $(ElapsedTime $BeginFeedingBayes)"
	Debug "----------------------------"
	
	If ($HamFedMessageErrors -gt 0) {
		Debug "Errors feeding HAM to SpamC : $HamFedMessageErrors Error$(Plural $HamFedMessageErrors) present"
		Email "[ERROR] HAM SpamC : $HamFedMessageErrors Errors present : Check debug log"
	} Else {
		If ($TotalHamFedMessages -gt 0) {
			Debug "Bayes learned from $LearnedHamMessages of $TotalHamFedMessages HAM message$(Plural $TotalHamFedMessages) found"
			Email "[OK] Bayes HAM learn from $LearnedHamMessages of $TotalHamFedMessages message$(Plural $TotalHamFedMessages)"
		} Else {
			Debug "No HAM messages newer than $BayesDays days to feed to Bayes"
			Email "[OK] No HAM messages newer than $BayesDays days to feed to Bayes"
		}
	}
	If ($SpamFedMessageErrors -gt 0) {
		Debug "Errors feeding SPAM to SpamC : $SpamFedMessageErrors Error$(Plural $SpamFedMessageErrors) present"
		Email "[ERROR] SPAM SpamC : $SpamFedMessageErrors Errors present : Check debug log"
	} Else {
		If ($TotalSpamFedMessages -gt 0) {
			Debug "Bayes learned from $LearnedSpamMessages of $TotalSpamFedMessages SPAM message$(Plural $TotalSpamFedMessages) found"
			Email "[OK] Bayes SPAM learn from $LearnedSpamMessages of $TotalSpamFedMessages message$(Plural $TotalSpamFedMessages)"
		} Else {
			Debug "No SPAM messages newer than $BayesDays days to feed to Bayes"
			Email "[OK] No SPAM messages newer than $BayesDays days to feed to Bayes"
		}
	}
	Debug "----------------------------"
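	Try {
		& cmd /c "`"$SADir\sa-learn.exe`" --backup > `"$BayesBackupLocation`""
		# -ErrorAction does not apply to native commands, so check the exit code explicitly
		If ($LastExitCode -ne 0) {Throw "sa-learn exited with code $LastExitCode"}
		Debug "Successfully backed up Bayes database"
	}
	Catch {
		$Err = $Error[0]
		Debug "[ERROR] backing up Bayes : $Err"
	}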
}

FeedBayes

palinka
Senior user
Posts: 2333
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2020-11-23 01:23

I made some changes RvdH suggested in the "delete older than N days" thread. The two scripts use the same COM routine for finding messages. I also figured out a way to successfully trap errors from sa-learn --backup.

Code: Select all

<#

.SYNOPSIS
	Feed Bayes Database

.DESCRIPTION
	Feeds messages to SPAMC for Bayes spam/ham learning

.FUNCTIONALITY
	Looks for folder name match at any folder level and if found, feeds messages to spamc for learning

.PARAMETER 

	
.NOTES
	Add "--allow-tell" argument to your SPAMD service to allow SPAMC to report SPAM/HAM
	
.EXAMPLE


#>

<###   USER VARIABLES   ###>
$hMSAdminPass          = 'secretpassword' # hMailServer Admin password
$DoSpamC               = $True            # FOR TESTING - set to false to run and report results without feeding SpamC with spam/ham
$BayesSubFolders       = $True            # True will feed messages from regex name matching subfolders
$HamFolders            = 'Inbox|Ham'      # Ham folders to feed messages to spamC for bayes database - uses regex
$SpamFolders           = 'Spam|Junk'      # Spam folders to feed messages to spamC for bayes database - uses regex
$BayesDays             = 7                # Number of days worth of spam/ham to feed to bayes
$SADir                 = 'C:\Program Files\JAM Software\SpamAssassin for Windows'  # SpamAssassin Install Directory
$BayesBackupLocation   = "X:\sa-learn\.spamassassin\bayes_backup"  # Bayes backup FILE

<###   START SCRIPT   ###>

<#  Functions copied from hMailServer Backup required for testing  #>
Function Debug ($DebugOutput) {Write-Host $DebugOutput}
Function Email ($DebugOutput) {}
Function ElapsedTime ($EndTime) {
	$TimeSpan = New-Timespan $EndTime
	If (([int]($TimeSpan).Hours) -eq 0) {$Hours = ""} ElseIf (([int]($TimeSpan).Hours) -eq 1) {$Hours = "1 hour "} Else {$Hours = "$([int]($TimeSpan).Hours) hours "}
	If (([int]($TimeSpan).Minutes) -eq 0) {$Minutes = ""} ElseIf (([int]($TimeSpan).Minutes) -eq 1) {$Minutes = "1 minute "} Else {$Minutes = "$([int]($TimeSpan).Minutes) minutes "}
	If (([int]($TimeSpan).Seconds) -eq 1) {$Seconds = "1 second"} Else {$Seconds = "$([int]($TimeSpan).Seconds) seconds"}
	If (($TimeSpan).TotalSeconds -lt 1) {
		$Return = "less than 1 second"
	} Else {
		$Return = "$Hours$Minutes$Seconds"
	}
	Return $Return
}
Function Plural ($Integer) {
	If ($Integer -eq 1) {$S = ""} Else {$S = "s"}
	Return $S
}

<#  Set Bayes variables  #>
Set-Variable -Name TotalHamFedMessages -Value 0 -Option AllScope
Set-Variable -Name TotalSpamFedMessages -Value 0 -Option AllScope
Set-Variable -Name HamFedMessageErrors -Value 0 -Option AllScope
Set-Variable -Name SpamFedMessageErrors -Value 0 -Option AllScope
Set-Variable -Name LearnedHamMessages -Value 0 -Option AllScope
Set-Variable -Name LearnedSpamMessages -Value 0 -Option AllScope

Function GetBayesSubFolders ($Folder) {
	$IterateFolder = 0
	$ArrayBayesMessages = @()
	If ($Folder.SubFolders.Count -gt 0) {
		Do {
			$SubFolder = $Folder.SubFolders.Item($IterateFolder)
			$SubFolderName = $SubFolder.Name
			$SubFolderID = $SubFolder.ID
			If ($SubFolder.Subfolders.Count -gt 0) {GetBayesSubFolders $SubFolder} 
			If ($SubFolder.Messages.Count -gt 0) {
				If ($BayesSubFolders) {GetBayesMessages $SubFolder}
			} 
			$IterateFolder++
		} Until ($IterateFolder -eq $Folder.SubFolders.Count)
	}
	$ArrayBayesMessages.Clear()
}

Function GetBayesMatchFolders ($Folder) {
	$IterateFolder = 0
	If ($Folder.SubFolders.Count -gt 0) {
		Do {
			$SubFolder = $Folder.SubFolders.Item($IterateFolder)
			$SubFolderName = $SubFolder.Name
			If (($SubFolderName -match $HamFolders) -or ($SubFolderName -match $SpamFolders)) {
				GetBayesSubFolders $SubFolder
				GetBayesMessages $SubFolder
			} Else {
				GetBayesMatchFolders $SubFolder
			}
			$IterateFolder++
		} Until ($IterateFolder -eq $Folder.SubFolders.Count)
	}
}

Function GetBayesMessages ($Folder) {
	$IterateMessage = 0
	$ArrayHamToFeed = @()
	$ArraySpamToFeed = @()
	$HamFedMessages = 0
	$SpamFedMessages = 0
	$LearnedHamMessagesFolder = 0
	$LearnedSpamMessagesFolder = 0
	$FolderName = $Folder.Name
	If ($Folder.Messages.Count -gt 0) {
		If ($Folder.Name -match $HamFolders) {
			Do {
				$Message = $Folder.Messages.Item($IterateMessage)
				If ($Message.InternalDate -gt ((Get-Date).AddDays(-$BayesDays))) {
					$ArrayHamToFeed += $Message.FileName
				}
				$IterateMessage++
			} Until ($IterateMessage -eq $Folder.Messages.Count)
		}
		If ($Folder.Name -match $SpamFolders) {
			$IterateMessage = 0  # Reset in case the ham loop above already advanced the counter
			Do {
				$Message = $Folder.Messages.Item($IterateMessage)
				If ($Message.InternalDate -gt ((Get-Date).AddDays(-$BayesDays))) {
					$ArraySpamToFeed += $Message.FileName
				}
				$IterateMessage++
			} Until ($IterateMessage -eq $Folder.Messages.Count)
		}
	}
	$ArrayHamToFeed | ForEach {
		$FileName = $_
		Try {
			If ((Get-Item $FileName).Length -lt 512000) {
				If ($DoSpamC) {
					$SpamC = & cmd /c "`"$SADir\spamc.exe`" -d `"$SAHost`" -p `"$SAPort`" -L ham < `"$FileName`""
					$SpamCResult = Out-String -InputObject $SpamC
					If ($SpamCResult -match "Message successfully un/learned") {
						$LearnedHamMessages++
						$LearnedHamMessagesFolder++
					}
					If (($SpamCResult -notmatch "Message successfully un/learned") -and ($SpamCResult -notmatch "Message was already un/learned")) {
						Throw $SpamCResult
					}
				}
				$HamFedMessages++
				$TotalHamFedMessages++
			}
		}
		Catch {
			$HamFedMessageErrors++
			$Err = $Error[0]
			Debug "[ERROR] Feeding HAM message $FileName in $AccountAddress"
			Debug "[ERROR] $Err"
		}
	}
	$ArraySpamToFeed | ForEach {
		$FileName = $_
		Try {
			If ((Get-Item $FileName).Length -lt 512000) {
				If ($DoSpamC) {
					$SpamC = & cmd /c "`"$SADir\spamc.exe`" -d `"$SAHost`" -p `"$SAPort`" -L spam < `"$FileName`""
					$SpamCResult = Out-String -InputObject $SpamC
					If ($SpamCResult -match "Message successfully un/learned") {
						$LearnedSpamMessages++
						$LearnedSpamMessagesFolder++
					}
					If (($SpamCResult -notmatch "Message successfully un/learned") -and ($SpamCResult -notmatch "Message was already un/learned")) {
						Throw $SpamCResult
					}
				}
				$SpamFedMessages++
				$TotalSpamFedMessages++
			}
		}
		Catch {
			$SpamFedMessageErrors++
			$Err = $Error[0]
			Debug "[ERROR] Feeding SPAM message $FileName in $AccountAddress"
			Debug "[ERROR] $Err"
		}
	}
	If ($HamFedMessages -gt 0) {
		Debug "Learned tokens from $LearnedHamMessagesFolder of $HamFedMessages HAM message$(Plural $HamFedMessages) fed from $FolderName in $AccountAddress"
	}
	If ($SpamFedMessages -gt 0) {
		Debug "Learned tokens from $LearnedSpamMessagesFolder of $SpamFedMessages SPAM message$(Plural $SpamFedMessages) fed from $FolderName in $AccountAddress"
	}
	$ArraySpamToFeed.Clear()
}

Function FeedBayes {
	
	$Error.Clear()
	
	$BeginFeedingBayes = Get-Date
	Debug "----------------------------"
	Debug "Begin learning Bayes tokens from messages newer than $BayesDays days"
	If (-not($DoSpamC)) {
		Debug "SpamC disabled - Test Run ONLY"
	}

	<#  Authenticate hMailServer COM  #>
	$hMS = New-Object -COMObject hMailServer.Application
	$hMS.Authenticate("Administrator", $hMSAdminPass) | Out-Null
	
	$SAHost = $hMS.Settings.AntiSpam.SpamAssassinHost
	$SAPort = $hMS.Settings.AntiSpam.SpamAssassinPort
	
	$IterateDomains = 0
	If ($hMS.Domains.Count -gt 0) {
		Do {
			$hMSDomain = $hMS.Domains.Item($IterateDomains)
			If ($hMSDomain.Active) {
				$IterateAccounts = 0
				If ($hMSDomain.Accounts.Count -gt 0) {
					Do {
						$hMSAccount = $hMSDomain.Accounts.Item($IterateAccounts)
						If ($hMSAccount.Active) {
							$AccountAddress = $hMSAccount.Address
							$IterateIMAPFolders = 0
							If ($hMSAccount.IMAPFolders.Count -gt 0) {
								Do {
									$hMSIMAPFolder = $hMSAccount.IMAPFolders.Item($IterateIMAPFolders)
									If (($hMSIMAPFolder.Name -match $HamFolders) -or ($hMSIMAPFolder.Name -match $SpamFolders)) {
										If ($hMSIMAPFolder.SubFolders.Count -gt 0) {
											GetBayesSubFolders $hMSIMAPFolder
										} # IF SUBFOLDER COUNT > 0
										GetBayesMessages $hMSIMAPFolder
									} # IF FOLDERNAME MATCH REGEX
									Else {
										GetBayesMatchFolders $hMSIMAPFolder
									} # IF NOT FOLDERNAME MATCH REGEX
								$IterateIMAPFolders++
								} Until ($IterateIMAPFolders -eq $hMSAccount.IMAPFolders.Count)
							} # IF IMAPFOLDER COUNT > 0
						} #IF ACCOUNT ACTIVE
						$IterateAccounts++
					} Until ($IterateAccounts -eq $hMSDomain.Accounts.Count)
				} # IF ACCOUNT COUNT > 0
			} # IF DOMAIN ACTIVE
			$IterateDomains++
		} Until ($IterateDomains -eq $hMS.Domains.Count)
	} # IF DOMAIN COUNT > 0

	Debug "----------------------------"
	Debug "Finished feeding $($TotalHamFedMessages + $TotalSpamFedMessages) messages to Bayes in $(ElapsedTime $BeginFeedingBayes)"
	Debug "----------------------------"
	
	If ($HamFedMessageErrors -gt 0) {
		Debug "Errors feeding HAM to SpamC : $HamFedMessageErrors Error$(Plural $HamFedMessageErrors) present"
		Email "[ERROR] HAM SpamC : $HamFedMessageErrors Errors present : Check debug log"
	}
	If ($TotalHamFedMessages -gt 0) {
		Debug "Bayes learned from $LearnedHamMessages of $TotalHamFedMessages HAM message$(Plural $TotalHamFedMessages) found"
		Email "[OK] Bayes HAM learn from $LearnedHamMessages of $TotalHamFedMessages message$(Plural $TotalHamFedMessages)"
	} Else {
		Debug "No HAM messages newer than $BayesDays days to feed to Bayes"
		Email "[OK] No HAM messages newer than $BayesDays days to feed to Bayes"
	}

	If ($SpamFedMessageErrors -gt 0) {
		Debug "Errors feeding SPAM to SpamC : $SpamFedMessageErrors Error$(Plural $SpamFedMessageErrors) present"
		Email "[ERROR] SPAM SpamC : $SpamFedMessageErrors Errors present : Check debug log"
	}
	If ($TotalSpamFedMessages -gt 0) {
		Debug "Bayes learned from $LearnedSpamMessages of $TotalSpamFedMessages SPAM message$(Plural $TotalSpamFedMessages) found"
		Email "[OK] Bayes SPAM learn from $LearnedSpamMessages of $TotalSpamFedMessages message$(Plural $TotalSpamFedMessages)"
	} Else {
		Debug "No SPAM messages newer than $BayesDays days to feed to Bayes"
		Email "[OK] No SPAM messages newer than $BayesDays days to feed to Bayes"
	}

	Debug "----------------------------"
	Try {
		& cmd /c "`"$SADir\sa-learn.exe`" --backup > `"$BayesBackupLocation`""
		If ((Get-Item -Path $BayesBackupLocation).LastWriteTime -lt ((Get-Date).AddSeconds(-30))) {
			Throw "Unknown Error backing up Bayes database"
		}
		Debug "Successfully backed up Bayes database"
	}
	Catch {
		$Err = $Error[0]
		Debug "[ERROR] backing up Bayes : $Err"
	}
}

FeedBayes

palinka
Senior user
Posts: 2333
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2020-11-24 14:08

I guess nothing's ever done. PULL THE FORK! PULL THE FORK!!! :mrgreen:

I realized I missed something - the ability to skip domains or user accounts if wanted.

Code: Select all

<#

.SYNOPSIS
	Feed Bayes Database

.DESCRIPTION
	Feeds messages to SPAMC for Bayes spam/ham learning

.FUNCTIONALITY
	Looks for folder name match at any folder level and if found, feeds messages to spamc for learning

.PARAMETER 

	
.NOTES
	Add "--allow-tell" argument to your SPAMD service to allow SPAMC to report SPAM/HAM
	
.EXAMPLE


#>

<###   USER VARIABLES   ###>
$hMSAdminPass          = "secretpassword" # hMailServer Admin password
$DoSpamC               = $False           # FOR TESTING - set to false to run and report results without feeding SpamC with spam/ham
$BayesSubFolders       = $True            # True will feed messages from regex name matching subfolders
$HamFolders            = "Inbox|Ham"      # Ham folders to feed messages to spamC for bayes database - uses regex
$SpamFolders           = "Spam|Junk"      # Spam folders to feed messages to spamC for bayes database - uses regex
$SkipAccountBayes      = "user@domain1.tld|spam@domain2.tld|dmarc@domain3.tld" # User accounts to skip - uses regex - If not used, set to '^$' (matches no address); an empty "" will match EVERYTHING!
$SkipDomainBayes       = "domain.tld"     # Domains to skip - uses regex - If not used, set to '^$' (matches no domain); an empty "" will match EVERYTHING!
$BayesDays             = 7                # Number of days worth of spam/ham to feed to bayes
$SADir                 = "C:\Program Files\JAM Software\SpamAssassin for Windows"  # SpamAssassin Install Directory
$BayesBackupLocation   = "X:\sa-learn\.spamassassin\bayes_backup"  # Bayes backup file

<###   START SCRIPT   ###>

<#  Functions copied from hMailServer Backup required for testing  #>
Function Debug ($DebugOutput) {Write-Host $DebugOutput}
Function Email ($DebugOutput) {}
Function ElapsedTime ($EndTime) {
	$TimeSpan = New-Timespan $EndTime
	If (([int]($TimeSpan).Hours) -eq 0) {$Hours = ""} ElseIf (([int]($TimeSpan).Hours) -eq 1) {$Hours = "1 hour "} Else {$Hours = "$([int]($TimeSpan).Hours) hours "}
	If (([int]($TimeSpan).Minutes) -eq 0) {$Minutes = ""} ElseIf (([int]($TimeSpan).Minutes) -eq 1) {$Minutes = "1 minute "} Else {$Minutes = "$([int]($TimeSpan).Minutes) minutes "}
	If (([int]($TimeSpan).Seconds) -eq 1) {$Seconds = "1 second"} Else {$Seconds = "$([int]($TimeSpan).Seconds) seconds"}
	If (($TimeSpan).TotalSeconds -lt 1) {
		$Return = "less than 1 second"
	} Else {
		$Return = "$Hours$Minutes$Seconds"
	}
	Return $Return
}
Function Plural ($Integer) {
	If ($Integer -eq 1) {$S = ""} Else {$S = "s"}
	Return $S
}

<#  Set Bayes variables  #>
Set-Variable -Name TotalHamFedMessages -Value 0 -Option AllScope
Set-Variable -Name TotalSpamFedMessages -Value 0 -Option AllScope
Set-Variable -Name HamFedMessageErrors -Value 0 -Option AllScope
Set-Variable -Name SpamFedMessageErrors -Value 0 -Option AllScope
Set-Variable -Name LearnedHamMessages -Value 0 -Option AllScope
Set-Variable -Name LearnedSpamMessages -Value 0 -Option AllScope

Function GetBayesSubFolders ($Folder) {
	$IterateFolder = 0
	$ArrayBayesMessages = @()
	If ($Folder.SubFolders.Count -gt 0) {
		Do {
			$SubFolder = $Folder.SubFolders.Item($IterateFolder)
			$SubFolderName = $SubFolder.Name
			$SubFolderID = $SubFolder.ID
			If ($SubFolder.Subfolders.Count -gt 0) {GetBayesSubFolders $SubFolder} 
			If ($SubFolder.Messages.Count -gt 0) {
				If ($BayesSubFolders) {GetBayesMessages $SubFolder}
			} 
			$IterateFolder++
		} Until ($IterateFolder -eq $Folder.SubFolders.Count)
	}
	$ArrayBayesMessages.Clear()
}

Function GetBayesMatchFolders ($Folder) {
	$IterateFolder = 0
	If ($Folder.SubFolders.Count -gt 0) {
		Do {
			$SubFolder = $Folder.SubFolders.Item($IterateFolder)
			$SubFolderName = $SubFolder.Name
			If (($SubFolderName -match $HamFolders) -or ($SubFolderName -match $SpamFolders)) {
				GetBayesSubFolders $SubFolder
				GetBayesMessages $SubFolder
			} Else {
				GetBayesMatchFolders $SubFolder
			}
			$IterateFolder++
		} Until ($IterateFolder -eq $Folder.SubFolders.Count)
	}
}

Function GetBayesMessages ($Folder) {
	$IterateMessage = 0
	$ArrayHamToFeed = @()
	$ArraySpamToFeed = @()
	$HamFedMessages = 0
	$SpamFedMessages = 0
	$LearnedHamMessagesFolder = 0
	$LearnedSpamMessagesFolder = 0
	If ($Folder.Messages.Count -gt 0) {
		If ($Folder.Name -match $HamFolders) {
			Do {
				$Message = $Folder.Messages.Item($IterateMessage)
				If ($Message.InternalDate -gt ((Get-Date).AddDays(-$BayesDays))) {
					$ArrayHamToFeed += $Message.FileName
				}
				$IterateMessage++
			} Until ($IterateMessage -eq $Folder.Messages.Count)
		}
		If ($Folder.Name -match $SpamFolders) {
			$IterateMessage = 0  # Reset in case the ham loop above already advanced the counter
			Do {
				$Message = $Folder.Messages.Item($IterateMessage)
				If ($Message.InternalDate -gt ((Get-Date).AddDays(-$BayesDays))) {
					$ArraySpamToFeed += $Message.FileName
				}
				$IterateMessage++
			} Until ($IterateMessage -eq $Folder.Messages.Count)
		}
	}
	$ArrayHamToFeed | ForEach {
		$FileName = $_
		Try {
			If ((Get-Item $FileName).Length -lt 512000) {
				If ($DoSpamC) {
					$SpamC = & cmd /c "`"$SADir\spamc.exe`" -d `"$SAHost`" -p `"$SAPort`" -L ham < `"$FileName`""
					$SpamCResult = Out-String -InputObject $SpamC
					If ($SpamCResult -match "Message successfully un/learned") {
						$LearnedHamMessages++
						$LearnedHamMessagesFolder++
					}
					If (($SpamCResult -notmatch "Message successfully un/learned") -and ($SpamCResult -notmatch "Message was already un/learned")) {
						Throw $SpamCResult
					}
				}
				$HamFedMessages++
				$TotalHamFedMessages++
			}
		}
		Catch {
			$HamFedMessageErrors++
			$Err = $Error[0]
			Debug "[ERROR] Feeding HAM message $FileName in $($hMSAccount.Address)"
			Debug "[ERROR] $Err"
		}
	}
	$ArraySpamToFeed | ForEach {
		$FileName = $_
		Try {
			If ((Get-Item $FileName).Length -lt 512000) {
				If ($DoSpamC) {
					$SpamC = & cmd /c "`"$SADir\spamc.exe`" -d `"$SAHost`" -p `"$SAPort`" -L spam < `"$FileName`""
					$SpamCResult = Out-String -InputObject $SpamC
					If ($SpamCResult -match "Message successfully un/learned") {
						$LearnedSpamMessages++
						$LearnedSpamMessagesFolder++
					}
					If (($SpamCResult -notmatch "Message successfully un/learned") -and ($SpamCResult -notmatch "Message was already un/learned")) {
						Throw $SpamCResult
					}
				}
				$SpamFedMessages++
				$TotalSpamFedMessages++
			}
		}
		Catch {
			$SpamFedMessageErrors++
			$Err = $Error[0]
			Debug "[ERROR] Feeding SPAM message $FileName in $($hMSAccount.Address)"
			Debug "[ERROR] $Err"
		}
	}
	If ($HamFedMessages -gt 0) {
		Debug "Learned tokens from $LearnedHamMessagesFolder of $HamFedMessages HAM message$(Plural $HamFedMessages) fed from $($Folder.Name) in $($hMSAccount.Address)"
	}
	If ($SpamFedMessages -gt 0) {
		Debug "Learned tokens from $LearnedSpamMessagesFolder of $SpamFedMessages SPAM message$(Plural $SpamFedMessages) fed from $($Folder.Name) in $($hMSAccount.Address)"
	}
	$ArraySpamToFeed.Clear()
}

Function FeedBayes {
	
	$Error.Clear()
	
	$BeginFeedingBayes = Get-Date
	Debug "----------------------------"
	Debug "Begin learning Bayes tokens from messages newer than $BayesDays days"
	If (-not($DoSpamC)) {
		Debug "SpamC disabled - Test Run ONLY"
	}

	<#  Authenticate hMailServer COM  #>
	$hMS = New-Object -COMObject hMailServer.Application
	$hMS.Authenticate("Administrator", $hMSAdminPass) | Out-Null
	
	$SAHost = $hMS.Settings.AntiSpam.SpamAssassinHost
	$SAPort = $hMS.Settings.AntiSpam.SpamAssassinPort
	
	$IterateDomains = 0
	If ($hMS.Domains.Count -gt 0) {
		Do {
			$hMSDomain = $hMS.Domains.Item($IterateDomains)
			If (($hMSDomain.Active) -and ($hMSDomain.Name -notmatch $SkipDomainBayes)) {
				$IterateAccounts = 0
				If ($hMSDomain.Accounts.Count -gt 0) {
					Do {
						$hMSAccount = $hMSDomain.Accounts.Item($IterateAccounts)
						If (($hMSAccount.Active) -and ($hMSAccount.Address -notmatch $SkipAccountBayes)) {
							$IterateIMAPFolders = 0
							If ($hMSAccount.IMAPFolders.Count -gt 0) {
								Do {
									$hMSIMAPFolder = $hMSAccount.IMAPFolders.Item($IterateIMAPFolders)
									If (($hMSIMAPFolder.Name -match $HamFolders) -or ($hMSIMAPFolder.Name -match $SpamFolders)) {
										If ($hMSIMAPFolder.SubFolders.Count -gt 0) {
											GetBayesSubFolders $hMSIMAPFolder
										} # IF SUBFOLDER COUNT > 0
										GetBayesMessages $hMSIMAPFolder
									} # IF FOLDERNAME MATCH REGEX
									Else {
										GetBayesMatchFolders $hMSIMAPFolder
									} # IF NOT FOLDERNAME MATCH REGEX
								$IterateIMAPFolders++
								} Until ($IterateIMAPFolders -eq $hMSAccount.IMAPFolders.Count)
							} # IF IMAPFOLDER COUNT > 0
						} #IF ACCOUNT ACTIVE
						$IterateAccounts++
					} Until ($IterateAccounts -eq $hMSDomain.Accounts.Count)
				} # IF ACCOUNT COUNT > 0
			} # IF DOMAIN ACTIVE
			$IterateDomains++
		} Until ($IterateDomains -eq $hMS.Domains.Count)
	} # IF DOMAIN COUNT > 0

	Debug "----------------------------"
	Debug "Finished feeding $($TotalHamFedMessages + $TotalSpamFedMessages) messages to Bayes in $(ElapsedTime $BeginFeedingBayes)"
	Debug "----------------------------"
	
	If ($HamFedMessageErrors -gt 0) {
		Debug "Errors feeding HAM to SpamC : $HamFedMessageErrors Error$(Plural $HamFedMessageErrors) present"
		Email "[ERROR] HAM SpamC : $HamFedMessageErrors Errors present : Check debug log"
	}
	If ($TotalHamFedMessages -gt 0) {
		Debug "Bayes learned from $LearnedHamMessages of $TotalHamFedMessages HAM message$(Plural $TotalHamFedMessages) found"
		Email "[OK] Bayes HAM learn from $LearnedHamMessages of $TotalHamFedMessages message$(Plural $TotalHamFedMessages)"
	} Else {
		Debug "No HAM messages newer than $BayesDays days to feed to Bayes"
		Email "[OK] No HAM messages newer than $BayesDays days to feed to Bayes"
	}

	If ($SpamFedMessageErrors -gt 0) {
		Debug "Errors feeding SPAM to SpamC : $SpamFedMessageErrors Error$(Plural $SpamFedMessageErrors) present"
		Email "[ERROR] SPAM SpamC : $SpamFedMessageErrors Errors present : Check debug log"
	}
	If ($TotalSpamFedMessages -gt 0) {
		Debug "Bayes learned from $LearnedSpamMessages of $TotalSpamFedMessages SPAM message$(Plural $TotalSpamFedMessages) found"
		Email "[OK] Bayes SPAM learn from $LearnedSpamMessages of $TotalSpamFedMessages message$(Plural $TotalSpamFedMessages)"
	} Else {
		Debug "No SPAM messages newer than $BayesDays days to feed to Bayes"
		Email "[OK] No SPAM messages newer than $BayesDays days to feed to Bayes"
	}

	Debug "----------------------------"
	Try {
		& cmd /c "`"$SADir\sa-learn.exe`" --backup > `"$BayesBackupLocation`""
		If ((Get-Item -Path $BayesBackupLocation).LastWriteTime -lt ((Get-Date).AddSeconds(-30))) {
			Throw "Unknown Error backing up Bayes database"
		}
		Debug "Successfully backed up Bayes database"
	}
	Catch {
		$Err = $Error[0]
		Debug "[ERROR] backing up Bayes : $Err"
	}
}

FeedBayes
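
A hedged aside on the two skip variables: matching against an empty pattern succeeds for every address, so -notmatch "" is false for all of them and everything gets skipped. The sketch below wraps the test in a helper that treats an empty pattern as "skip nothing". The name Test-SkipPattern is made up for illustration and is not part of the script above.

```powershell
# Hypothetical helper (not in the original script): returns $True when $Value should be skipped.
# An empty or null pattern is treated as "skip nothing" instead of "match everything".
Function Test-SkipPattern ($Value, $Pattern) {
	If ([string]::IsNullOrEmpty($Pattern)) { Return $False }
	Return ($Value -match $Pattern)
}

# Example values only
$SkipAccountBayes = "user@domain1.tld|spam@domain2.tld"

Write-Host (Test-SkipPattern "spam@domain2.tld" $SkipAccountBayes)      # listed account
Write-Host (Test-SkipPattern "someone@domain4.tld" $SkipAccountBayes)   # unlisted account
Write-Host (Test-SkipPattern "someone@domain4.tld" "")                  # empty pattern
```

With a helper like this, the account test could read -not (Test-SkipPattern $hMSAccount.Address $SkipAccountBayes). Note that an unescaped dot in the pattern matches any character; [regex]::Escape can be used on literal addresses if that matters.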

SorenR
Senior user
Posts: 4058
Joined: 2006-08-21 15:38
Location: Denmark

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2020-11-24 14:42

Well, you know what they say :mrgreen:

Image
SørenR.

Algorithm (noun.)
Word used by programmers when they do not want to explain what they did.

johang
Senior user
Posts: 398
Joined: 2008-09-01 09:20

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by johang » 2020-11-24 19:23

SorenR wrote:
2020-11-24 14:42
Well, you know what they say :mrgreen:

Image
It ain't over until Madonna sings?
___________________________________________________________end of the line

palinka
Senior user
Posts: 2333
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2021-01-03 17:58

I'm having trouble wrapping my head around bayes_journal. Since I started using spamC, I've been using it exclusively for bayes learning. On a whim I synced bayes_journal and there were tokens to sync!

Code: Select all

bayes: synced databases from journal in 0 seconds: 893 unique entries (900 total entries)
Not only that but it keeps growing. Yesterday it read 150 entries. I synced again and got no output, so I assume it only reports when there are entries to sync, and those previous 150 were synced into the database.
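
To keep an eye on how many entries each sync picks up, the counts can be pulled out of that output line. This is only a sketch: it assumes the sync line always has the wording shown above, and Get-JournalSyncCounts is a hypothetical helper name, not something sa-learn or the script provides.

```powershell
# Hypothetical helper: extract the counts from a "synced databases from journal" line.
# Returns $Null when the text does not contain such a line (for example, empty output).
Function Get-JournalSyncCounts ($SyncOutput) {
	$Pattern = 'synced databases from journal in (\d+) seconds?: (\d+) unique entries \((\d+) total entries\)'
	If ([string]$SyncOutput -match $Pattern) {
		Return [PSCustomObject]@{
			Seconds       = [int]$Matches[1]
			UniqueEntries = [int]$Matches[2]
			TotalEntries  = [int]$Matches[3]
		}
	}
	Return $Null
}

# Sample line copied from the output above
$Sample = 'bayes: synced databases from journal in 0 seconds: 893 unique entries (900 total entries)'
$Counts = Get-JournalSyncCounts $Sample
If ($Counts) {
	Write-Host "Synced $($Counts.UniqueEntries) unique of $($Counts.TotalEntries) total journal entries"
} Else {
	Write-Host "No sync line found"
}
```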

So now I'm more confused than ever. I added journal syncing to my script just because better safe than sorry, although at this point, I don't even know if it does anything at all.

Code: Select all

<#

.SYNOPSIS
	Feed Bayes Database

.DESCRIPTION
	Feeds messages to SPAMC for Bayes spam/ham learning

.FUNCTIONALITY
	Looks for folder name match at any folder level and if found, feeds messages to spamc for learning

.PARAMETER 

	
.NOTES
	Add "--allow-tell" argument to your SPAMD service to allow SPAMC to report SPAM/HAM
	
.EXAMPLE


#>

<###   USER VARIABLES   ###>
$hMSAdminPass          = "secretpassword" # hMailServer Admin password
$DoSpamC               = $True           # FOR TESTING - set to false to run and report results without feeding SpamC with spam/ham
$BayesSubFolders       = $True            # True will feed messages from regex name matching subfolders
$HamFolders            = "Inbox|Ham"      # Ham folders to feed messages to spamC for bayes database - uses regex
$SpamFolders           = "Spam|Junk"      # Spam folders to feed messages to spamC for bayes database - uses regex
$SkipAccountBayes      = '^$'             # User accounts to skip - uses regex - '^$' skips none; an empty "" will match EVERYTHING!
$SkipDomainBayes       = '^$'             # Domains to skip - uses regex - '^$' skips none; an empty "" will match EVERYTHING!
$BayesDays             = 7                # Number of days worth of spam/ham to feed to bayes
$SADir                 = "C:\Program Files\JAM Software\SpamAssassin for Windows"  # SpamAssassin Install Directory
$BayesBackupLocation   = "X:\sa-learn\bayes\bayes_backup"  # Bayes backup file

<###   START SCRIPT   ###>

<#  Functions copied from hMailServer Backup required for testing  #>
Function Debug ($DebugOutput) {Write-Host $DebugOutput}
Function Email ($DebugOutput) {}
Function ElapsedTime ($EndTime) {
	$TimeSpan = New-Timespan $EndTime
	If (([int]($TimeSpan).Hours) -eq 0) {$Hours = ""} ElseIf (([int]($TimeSpan).Hours) -eq 1) {$Hours = "1 hour "} Else {$Hours = "$([int]($TimeSpan).Hours) hours "}
	If (([int]($TimeSpan).Minutes) -eq 0) {$Minutes = ""} ElseIf (([int]($TimeSpan).Minutes) -eq 1) {$Minutes = "1 minute "} Else {$Minutes = "$([int]($TimeSpan).Minutes) minutes "}
	If (([int]($TimeSpan).Seconds) -eq 1) {$Seconds = "1 second"} Else {$Seconds = "$([int]($TimeSpan).Seconds) seconds"}
	If (($TimeSpan).TotalSeconds -lt 1) {
		$Return = "less than 1 second"
	} Else {
		$Return = "$Hours$Minutes$Seconds"
	}
	Return $Return
}
Function Plural ($Integer) {
	If ($Integer -eq 1) {$S = ""} Else {$S = "s"}
	Return $S
}

<#  Set Bayes variables  #>
Set-Variable -Name TotalHamFedMessages -Value 0 -Option AllScope
Set-Variable -Name TotalSpamFedMessages -Value 0 -Option AllScope
Set-Variable -Name HamFedMessageErrors -Value 0 -Option AllScope
Set-Variable -Name SpamFedMessageErrors -Value 0 -Option AllScope
Set-Variable -Name LearnedHamMessages -Value 0 -Option AllScope
Set-Variable -Name LearnedSpamMessages -Value 0 -Option AllScope

Function GetBayesSubFolders ($Folder) {
	$IterateFolder = 0
	$ArrayBayesMessages = @()
	If ($Folder.SubFolders.Count -gt 0) {
		Do {
			$SubFolder = $Folder.SubFolders.Item($IterateFolder)
			$SubFolderName = $SubFolder.Name
			$SubFolderID = $SubFolder.ID
			If ($SubFolder.Subfolders.Count -gt 0) {GetBayesSubFolders $SubFolder} 
			If ($SubFolder.Messages.Count -gt 0) {
				If ($BayesSubFolders) {GetBayesMessages $SubFolder}
			} 
			$IterateFolder++
		} Until ($IterateFolder -eq $Folder.SubFolders.Count)
	}
	$ArrayBayesMessages.Clear()
}

Function GetBayesMatchFolders ($Folder) {
	$IterateFolder = 0
	If ($Folder.SubFolders.Count -gt 0) {
		Do {
			$SubFolder = $Folder.SubFolders.Item($IterateFolder)
			$SubFolderName = $SubFolder.Name
			If (($SubFolderName -match $HamFolders) -or ($SubFolderName -match $SpamFolders)) {
				GetBayesSubFolders $SubFolder
				GetBayesMessages $SubFolder
			} Else {
				GetBayesMatchFolders $SubFolder
			}
			$IterateFolder++
		} Until ($IterateFolder -eq $Folder.SubFolders.Count)
	}
}

Function GetBayesMessages ($Folder) {
	$IterateMessage = 0
	$ArrayHamToFeed = @()
	$ArraySpamToFeed = @()
	$HamFedMessages = 0
	$SpamFedMessages = 0
	$LearnedHamMessagesFolder = 0
	$LearnedSpamMessagesFolder = 0
	If ($Folder.Messages.Count -gt 0) {
		If ($Folder.Name -match $HamFolders) {
			Do {
				$Message = $Folder.Messages.Item($IterateMessage)
				If ($Message.InternalDate -gt ((Get-Date).AddDays(-$BayesDays))) {
					$ArrayHamToFeed += $Message.FileName
				}
				$IterateMessage++
			} Until ($IterateMessage -eq $Folder.Messages.Count)
		}
		If ($Folder.Name -match $SpamFolders) {
			$IterateMessage = 0  # Reset in case the ham loop above already advanced the counter
			Do {
				$Message = $Folder.Messages.Item($IterateMessage)
				If ($Message.InternalDate -gt ((Get-Date).AddDays(-$BayesDays))) {
					$ArraySpamToFeed += $Message.FileName
				}
				$IterateMessage++
			} Until ($IterateMessage -eq $Folder.Messages.Count)
		}
	}
	$ArrayHamToFeed | ForEach {
		$FileName = $_
		Try {
			If ((Get-Item $FileName).Length -lt 512000) {
				If ($DoSpamC) {
					$SpamC = & cmd /c "`"$SADir\spamc.exe`" -d `"$SAHost`" -p `"$SAPort`" -L ham < `"$FileName`""
					$SpamCResult = Out-String -InputObject $SpamC
					If ($SpamCResult -match "Message successfully un/learned") {
						$LearnedHamMessages++
						$LearnedHamMessagesFolder++
					}
					If (($SpamCResult -notmatch "Message successfully un/learned") -and ($SpamCResult -notmatch "Message was already un/learned")) {
						Throw $SpamCResult
					}
				}
				$HamFedMessages++
				$TotalHamFedMessages++
			}
		}
		Catch {
			$HamFedMessageErrors++
			$Err = $Error[0]
			Debug "[ERROR] Feeding HAM message $FileName in $($hMSAccount.Address)"
			Debug "[ERROR] $Err"
		}
	}
	$ArraySpamToFeed | ForEach {
		$FileName = $_
		Try {
			If ((Get-Item $FileName).Length -lt 512000) {
				If ($DoSpamC) {
					$SpamC = & cmd /c "`"$SADir\spamc.exe`" -d `"$SAHost`" -p `"$SAPort`" -L spam < `"$FileName`""
					$SpamCResult = Out-String -InputObject $SpamC
					If ($SpamCResult -match "Message successfully un/learned") {
						$LearnedSpamMessages++
						$LearnedSpamMessagesFolder++
					}
					If (($SpamCResult -notmatch "Message successfully un/learned") -and ($SpamCResult -notmatch "Message was already un/learned")) {
						Throw $SpamCResult
					}
				}
				$SpamFedMessages++
				$TotalSpamFedMessages++
			}
		}
		Catch {
			$SpamFedMessageErrors++
			$Err = $Error[0]
			Debug "[ERROR] Feeding SPAM message $FileName in $($hMSAccount.Address)"
			Debug "[ERROR] $Err"
		}
	}
	If ($HamFedMessages -gt 0) {
		Debug "Learned tokens from $LearnedHamMessagesFolder of $HamFedMessages HAM message$(Plural $HamFedMessages) fed from $($Folder.Name) in $($hMSAccount.Address)"
	}
	If ($SpamFedMessages -gt 0) {
		Debug "Learned tokens from $LearnedSpamMessagesFolder of $SpamFedMessages SPAM message$(Plural $SpamFedMessages) fed from $($Folder.Name) in $($hMSAccount.Address)"
	}
	$ArraySpamToFeed.Clear()
}

Function FeedBayes {
	
	$Error.Clear()
	
	$BeginFeedingBayes = Get-Date
	Debug "----------------------------"
	Debug "Begin learning Bayes tokens from messages newer than $BayesDays days"
	If (-not($DoSpamC)) {
		Debug "SpamC disabled - Test Run ONLY"
	}

	<#  Authenticate hMailServer COM  #>
	$hMS = New-Object -COMObject hMailServer.Application
	$hMS.Authenticate("Administrator", $hMSAdminPass) | Out-Null
	
	$SAHost = $hMS.Settings.AntiSpam.SpamAssassinHost
	$SAPort = $hMS.Settings.AntiSpam.SpamAssassinPort
	
	If ($hMS.Domains.Count -gt 0) {
		$IterateDomains = 0
		Do {
			$hMSDomain = $hMS.Domains.Item($IterateDomains)
			If (($hMSDomain.Active) -and ($hMSDomain.Name -notmatch $SkipDomainBayes) -and ($hMSDomain.Accounts.Count -gt 0)) {
				$IterateAccounts = 0
				Do {
					$hMSAccount = $hMSDomain.Accounts.Item($IterateAccounts)
					If (($hMSAccount.Active) -and ($hMSAccount.Address -notmatch $SkipAccountBayes) -and ($hMSAccount.IMAPFolders.Count -gt 0)) {
						$IterateIMAPFolders = 0
						Do {
							$hMSIMAPFolder = $hMSAccount.IMAPFolders.Item($IterateIMAPFolders)
							If (($hMSIMAPFolder.Name -match $HamFolders) -or ($hMSIMAPFolder.Name -match $SpamFolders)) {
								If ($hMSIMAPFolder.SubFolders.Count -gt 0) {
									GetBayesSubFolders $hMSIMAPFolder
								}
								GetBayesMessages $hMSIMAPFolder
							}
							Else {
								GetBayesMatchFolders $hMSIMAPFolder
							}
						$IterateIMAPFolders++
						} Until ($IterateIMAPFolders -eq $hMSAccount.IMAPFolders.Count)
					}
					$IterateAccounts++
				} Until ($IterateAccounts -eq $hMSDomain.Accounts.Count)
			}
			$IterateDomains++
		} Until ($IterateDomains -eq $hMS.Domains.Count)
	}

	Debug "----------------------------"
	Debug "Finished feeding $($TotalHamFedMessages + $TotalSpamFedMessages) messages to Bayes in $(ElapsedTime $BeginFeedingBayes)"
	Debug "----------------------------"
	
	If ($HamFedMessageErrors -gt 0) {
		Debug "Errors feeding HAM to SpamC : $HamFedMessageErrors Error$(Plural $HamFedMessageErrors) present"
		Email "[ERROR] HAM SpamC : $HamFedMessageErrors Errors present : Check debug log"
	}
	If ($TotalHamFedMessages -gt 0) {
		Debug "Bayes learned from $LearnedHamMessages of $TotalHamFedMessages HAM message$(Plural $TotalHamFedMessages) found"
		Email "[OK] Bayes HAM learn from $LearnedHamMessages of $TotalHamFedMessages message$(Plural $TotalHamFedMessages)"
	} Else {
		Debug "No HAM messages newer than $BayesDays days to feed to Bayes"
		Email "[OK] No HAM messages newer than $BayesDays days to feed to Bayes"
	}

	If ($SpamFedMessageErrors -gt 0) {
		Debug "Errors feeding SPAM to SpamC : $SpamFedMessageErrors Error$(Plural $SpamFedMessageErrors) present"
		Email "[ERROR] SPAM SpamC : $SpamFedMessageErrors Errors present : Check debug log"
	}
	If ($TotalSpamFedMessages -gt 0) {
		Debug "Bayes learned from $LearnedSpamMessages of $TotalSpamFedMessages SPAM message$(Plural $TotalSpamFedMessages) found"
		Email "[OK] Bayes SPAM learn from $LearnedSpamMessages of $TotalSpamFedMessages message$(Plural $TotalSpamFedMessages)"
	} Else {
		Debug "No SPAM messages newer than $BayesDays days to feed to Bayes"
		Email "[OK] No SPAM messages newer than $BayesDays days to feed to Bayes"
	}

	Debug "----------------------------"
	Try {
		$BayesSync = & cmd /c "`"$SADir\sa-learn.exe`" --sync"
		$BayesSyncResult = Out-String -InputObject $BayesSync
		If ([string]::IsNullOrEmpty($BayesSyncResult)) {
			Throw "Nothing to sync"
		}
		Debug $BayesSyncResult
	}
	Catch {
		Debug "[ERROR] Bayes Journal Sync: $($Error[0])"
	}

	Debug "----------------------------"
	Try {
		If (-not(Test-Path $BayesBackupLocation)) {
			Throw "Bayes backup file does not exist - Check Path"
		} Else { 
			& cmd /c "`"$SADir\sa-learn.exe`" --backup > `"$BayesBackupLocation`""
			If ((Get-Item -Path $BayesBackupLocation).LastWriteTime -lt ((Get-Date).AddSeconds(-30))) {
				Throw "Unknown Error backing up Bayes database"
			}
			Debug "Successfully backed up Bayes database"
		}
	}
	Catch {
		Debug "[ERROR] backing up Bayes : $($Error[0])"
		Email "[ERROR] backing up Bayes db"
	}
}

FeedBayes
So the question is - is journal syncing required with spamC or not?

SorenR
Senior user
Posts: 4058
Joined: 2006-08-21 15:38
Location: Denmark

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by SorenR » 2021-01-03 18:53

http://spamassassin.1065346.n5.nabble.c ... 55419.html
David C. McCall wrote:
> DOH! I didn't include --sync in periodic sa-learn runs....
>
>
> slaps his forehead and returns into cave.
>
>
> :-(
>

You shouldn't need --sync, unless you want to force the journal to be
synced and deleted when you run sa-learn. In general it will decide if
it needs to be synced on its own. It's also redundant if you have
--force-expire, as SA always syncs the journal prior to doing expiry.


In general, the journal disappearing is normal. It's just a "holding
tank" for atime updates (and tokens if you have learn to journal
enabled), It periodically gets dumped into the main database and deleted
during the expiry checks.

So, don't be concerned about it disappearing, that just means it's been
synced and hasn't been recreated by mail scanning.

palinka
Senior user
Posts: 2333
Joined: 2017-09-12 17:57

Re: SpamAssassin Bootcamp (sa-learn) train BAYES

Post by palinka » 2021-01-03 22:17

SorenR wrote:
2021-01-03 18:53
http://spamassassin.1065346.n5.nabble.c ... 55419.html
David C. McCall wrote:
> DOH! I didn't include --sync in periodic sa-learn runs....
>
>
> slaps his forehead and returns into cave.
>
>
> :-(
>

You shouldn't need --sync, unless you want to force the journal to be
synced and deleted when you run sa-learn. In general it will decide if
it needs to be synced on its own. It's also redundant if you have
--force-expire, as SA always syncs the journal prior to doing expiry.


In general, the journal disappearing is normal. It's just a "holding
tank" for atime updates (and tokens if you have learn to journal
enabled), It periodically gets dumped into the main database and deleted
during the expiry checks.

So, don't be concerned about it disappearing, that just means it's been
synced and hasn't been recreated by mail scanning.
👍

OK, so if it's optional, I suppose that also means that syncing can't hurt anything - at a minimum - or could be beneficial if done more often than whenever the automatic syncing occurs. "Don't be concerned" sounds a lot different than "DON'T PUSH THAT BIG RED BUTTON!!!!!"

I will leave it in my script for the time being, seeing as how I took the time to write it. :mrgreen:
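
Since an empty result from the sync seems to mean only that the journal had nothing in it, it may be friendlier to log that as information rather than throwing an error. A minimal sketch under that assumption (it comes from what I observed, not from the documentation); Get-SyncLogLine is a hypothetical helper name.

```powershell
# Hypothetical helper: turn the captured sync output into one log line.
# Assumption: empty output means there was nothing to sync, not that the sync failed.
Function Get-SyncLogLine ($SyncOutput) {
	$Text = ([string]$SyncOutput).Trim()
	If ([string]::IsNullOrEmpty($Text)) {
		Return "[INFO] Bayes journal sync: nothing to sync"
	}
	Return "[OK] Bayes journal sync: $Text"
}

# Demonstration with sample strings instead of a live sa-learn call
Write-Host (Get-SyncLogLine "")
Write-Host (Get-SyncLogLine "bayes: synced databases from journal in 0 seconds: 893 unique entries (900 total entries)")
```

In the script, the captured $BayesSync value would be passed to the helper and the result handed to Debug, in place of the Throw "Nothing to sync" branch.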
