Need help creating rule

Posted: 2022-05-15 23:59
by palinka
As part of my pURI-BL project, I want to create a custom ruleset. I can write basic rules, but my issue here is that I'm looking to test the body against hundreds or thousands of URLs, so it's *probably* too long to create a regex string from all the URLs.

Ideally, a custom plugin would be better, so the ruleset could be created using one URL per line, but I'm totally lost with Perl.

Any hints or ideas?

Re: Need help creating rule

Posted: 2022-05-16 09:39
by katip
i would suggest testing the body prior to delivery.
to avoid performance issues, just the first 100 lines might be sufficient, i think.
oMessage.Body (or HTMLBody) is a string, and i suppose it can be split on vbCrLf with the VBS Split function. you regex-check the first 100 lines; if any contains a URL (see below for a good pattern), extract it and push it into an array, then look up each item in your DB table.

(("[^<>@\\]+")|([^<> @\"]+))@(\[([0-9]{1,3}\.){3}[0-9]{1,3}\]|(?=.{1,255}$)((?!-|\.)[a-zA-Z0-9-]{0,62}[a-zA-Z0-9])(|\.(?!-|\.)[a-zA-Z0-9-]{0,62}[a-zA-Z0-9]){1,126})
according to the result (bad URL found or not), you add a header such as X-HMS-BadURL = True.
then you can play with rules as you like, based on the existence of this header.
just off the top of my head :)
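
For illustration only, the flow described above could be sketched like this. It is written in Python rather than the VBScript an hMailServer event handler would use, the URL pattern is a deliberately simple stand-in (not the one linked in this thread), and the names `load_blocklist`, `extract_urls`, `has_bad_url` and the example.* addresses are all made up for the sketch. A set stands in for the DB table, and a plain dict stands in for the message headers.

```python
import re

# Simplified URL pattern, an assumption for this sketch only.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

def normalize(url):
    # Assumption: compare case-insensitively and ignore trailing punctuation.
    return url.strip().lower().rstrip("/.,;)")

def load_blocklist(lines):
    """Build a lookup set from an iterable holding one URL per line."""
    return {normalize(line) for line in lines if line.strip()}

def extract_urls(body, max_lines=100):
    """Collect URLs from the first max_lines lines of the body."""
    urls = []
    for line in body.splitlines()[:max_lines]:
        urls.extend(URL_RE.findall(line))
    return urls

def has_bad_url(body, blocklist, max_lines=100):
    """True if any extracted URL is present in the blocklist."""
    return any(normalize(u) in blocklist for u in extract_urls(body, max_lines))

# Hypothetical usage: flag the message with a header that rules can test later.
blocklist = load_blocklist(["http://bad.example/path", "https://evil.example/"])
headers = {}
if has_bad_url("hi\r\nclick http://bad.example/path now", blocklist):
    headers["X-HMS-BadURL"] = "True"
```

Because each extracted URL is looked up in a set, the cost per message depends on the number of URLs in the body, not on the size of the list, so thousands of entries should not need one giant regex.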

Re: Need help creating rule

Posted: 2022-05-16 10:29
by katip
sorry for the nonsense pattern. here is a good one:
https://regexr.com/39nr7

Re: Need help creating rule

Posted: 2022-05-16 12:43
by palinka
katip wrote:
2022-05-16 09:39
i would suggest testing the body prior to delivery.
to avoid performance issues, just the first 100 lines might be sufficient, i think.
oMessage.Body (or HTMLBody) is a string, and i suppose it can be split on vbCrLf with the VBS Split function. you regex-check the first 100 lines; if any contains a URL (see below for a good pattern), extract it and push it into an array, then look up each item in your DB table.

(("[^<>@\\]+")|([^<> @\"]+))@(\[([0-9]{1,3}\.){3}[0-9]{1,3}\]|(?=.{1,255}$)((?!-|\.)[a-zA-Z0-9-]{0,62}[a-zA-Z0-9])(|\.(?!-|\.)[a-zA-Z0-9-]{0,62}[a-zA-Z0-9]){1,126})
according to the result (bad URL found or not), you add a header such as X-HMS-BadURL = True.
then you can play with rules as you like, based on the existence of this header.
just off the top of my head :)
Good idea. I already split the body at "</head>" when it exists so I don't pick up things like w3.org and style urls.
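
A rough sketch of that kind of split, in Python for illustration rather than the script language actually in use. `strip_head` is a made-up name, and matching the closing tag case-insensitively is an assumption of the sketch.

```python
import re

# Matches the closing head tag in any letter case, e.g. </head> or </HEAD>.
HEAD_END = re.compile(r"</head\s*>", re.IGNORECASE)

def strip_head(html):
    """Return everything after the first closing head tag, or the input unchanged if there is none."""
    match = HEAD_END.search(html)
    return html[match.end():] if match else html
```

Only the returned part would then be scanned for URLs, so doctype and stylesheet references in the head are never seen.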