Expand Shortened URLs

Use this forum if you have problems with a hMailServer script, such as hMailServer WebAdmin or code in an event handler.
Post Reply
palinka
Senior user
Senior user
Posts: 4610
Joined: 2017-09-12 17:57

Expand Shortened URLs

Post by palinka » 2024-04-04 22:21

I've been thinking about this for a while and finally set out to do it. Expand shortened URLs in messages and test the domain against SURBL and Spamhaus DBL. I have a working framework. I still need to compile a list of common URL shorteners to use to capture these short links within messages. But I'll get to that later. Below is my working test script.

Code: Select all

Option Explicit

Private Const SpamhausDQSKey = "supersecretkey"

Function Lookup(strRegEx, strMatch) : Lookup = False
	With CreateObject("VBScript.RegExp")
		.Pattern = strRegEx
		.Global = False
		.MultiLine = True
		.IgnoreCase = True
		If .Test(strMatch) Then Lookup = True
	End With
End Function

Function oLookup(strRegEx, strMatch, bGlobal)
	If strRegEx = "" Then strRegEx = StrReverse(strMatch)
	With CreateObject("VBScript.RegExp")
		.Pattern = strRegEx
		.Global = bGlobal
		.MultiLine = True
		.IgnoreCase = True
		Set oLookup = .Execute(strMatch)
	End With
End Function

'Requires RvdH DNSResolver
Function IsInSurbl(strDomain) : IsInSurbl = False 
	Dim strLookup
	With CreateObject("DNSLibrary.DNSResolver")
		strLookup = .DNSLookup(strDomain & ".multi.surbl.org")
	End With
	Dim strRegEx : strRegEx = "(127\.0\.1\.[0-9]{1,3})"
	IsInSurbl = Lookup(strRegEx, strLookup)
End Function

'Requires RvdH DNSResolver
'Requires Spamhaus DQS Key
Function IsInSpamHausDBL(strDomain) : IsInSpamHausDBL = False
	Dim strLookup
	With CreateObject("DNSLibrary.DNSResolver") 
		strLookup = .DNSLookup(strDomain & "." & SpamhausDQSKey & ".dbl.dq.spamhaus.net")
	End With
	Dim strRegEx : strRegEx = "(127\.0\.1\.(2|4|5|6))"
	IsInSpamHausDBL = Lookup(strRegEx, strLookup)
End Function

Function ExpandURL(ShortURL)
	Dim Request : Set Request = CreateObject("MSXML2.ServerXMLHTTP")
	Request.Open "HEAD", ShortURL, False 
	Request.Send
	ExpandURL = Request.GetOption(-1)
End Function

Dim ShortURL, LongURL, Domain, Match, Matches

ShortURL = "https://tinyurl.com/y9b46am6"
LongURL = ExpandURL(ShortURL)

Set Matches = oLookup("(?:\bhttps?:\/\/)([a-zA-Z0-9-.]+)(?:\/[^\s]+\b)", LongURL, False)
If Matches.Count > 0 Then 
	For Each Match In Matches
		Domain = Match.SubMatches.Item(0)
	Next
End If

WScript.Echo Domain & " is the domain of the long URL"

If IsInSurbl(Domain) Then WScript.Echo Domain & " LISTED IN SURBL!!!"
If IsInSpamHausDBL(Domain) Then WScript.Echo Domain & " LISTED IN SPAMHAUS DBL!!!"


palinka
Senior user
Senior user
Posts: 4610
Joined: 2017-09-12 17:57

Re: Expand Shortened URLs

Post by palinka » 2024-04-04 22:25

Well, the list of URL shorteners was easy to find.

https://raw.githubusercontent.com/Peter ... aster/list

I'll make a function to find short links within message body using that list later.

User avatar
katip
Senior user
Senior user
Posts: 1176
Joined: 2006-12-22 07:58
Location: Istanbul

Re: Expand Shortened URLs

Post by katip » 2024-04-05 04:20

Good idea.
IIRC SA has a plugin for this.
Katip
--
HMS 5.7, MariaDB 10.4.10, SA 4.0.0, ClamAV 0.103.8

User avatar
RvdH
Senior user
Senior user
Posts: 3342
Joined: 2008-06-27 14:42
Location: The Netherlands

Re: Expand Shortened URLs

Post by RvdH » 2024-04-05 08:58

katip wrote:
2024-04-05 04:20
Good idea.
IIRC SA has a plugin for this.
Exactly, I use it with MySQL/MariaDB as a database backend to cache decoded entries (url_shortener entries are included in default rules and also via KAM_urlshorteners.cf)
https://spamassassin.apache.org/full/4. ... tURLs.html
CIDR to RegEx: d-fault.nl/cidrtoregex
DNS Lookup: d-fault.nl/dnstools
DKIM Generator: d-fault.nl/dkimgenerator
DNSBL Lookup: d-fault.nl/dnsbllookup
GEOIP Lookup: d-fault.nl/geoiplookup

palinka
Senior user
Senior user
Posts: 4610
Joined: 2017-09-12 17:57

Re: Expand Shortened URLs

Post by palinka » 2024-04-05 15:12

Powershell to download list and format to vbs regex. Run once a day, once a year... I doubt the list changes often, but the last update to the github file was last week.

Code: Select all

<#  Set script variables  #>
# https://github.com/PeterDaveHello/url-shorteners
$URL = "https://raw.githubusercontent.com/PeterDaveHello/url-shorteners/master/list"
$ShortLinkFile = "$PSScriptRoot\ShortLinkProviders.txt"
$ShortLinkRegexFile = "C:\hMailServer\Events\short_link_provider_regex.vbs"

<#  Download Shortlink Provider List  #>
Try {
	Start-BitsTransfer -Source $URL -Destination $ShortLinkFile -ErrorAction Stop
}
Catch {
	$Err = $Error[0]
	Write-Output "$((Get-Date).ToString('yyyy-MM-dd HH:mm:ss')) : Error downloading Public Suffix List : `n$Err" | Out-File "$PSScriptRoot\PubSufError.log" -Append
	Exit
}

<#  Read data file and output list formatted for RegEx (surround each with ^ and $)  #>
Get-Content $ShortLinkFile | Where {((-not([string]::IsNullOrEmpty($_))) -and ($_ -notmatch "^#"))} | ForEach {
	Write-Output "^$_$"
} | Out-File $ShortLinkRegexFile

<#  Convert list to RegEx pattern  #>
(Get-Content -Path $ShortLinkRegexFile) -Replace '$','|' | Set-Content -NoNewline -Path $ShortLinkRegexFile
(Get-Content -Path $ShortLinkRegexFile) -Replace '\.','\.' | Set-Content -NoNewline -Path $ShortLinkRegexFile
(Get-Content -Path $ShortLinkRegexFile) -Replace '^','Dim ShortLinkProviderRegEx : ShortLinkProviderRegEx = "' | Set-Content -NoNewline -Path $ShortLinkRegexFile
(Get-Content -Path $ShortLinkRegexFile) -Replace '\|$','' | Set-Content -NoNewline -Path $ShortLinkRegexFile
(Get-Content -Path $ShortLinkRegexFile) -Replace '$','"' | Set-Content -NoNewline -Path $ShortLinkRegexFile

Working vbs test script.

Code: Select all

Option Explicit

Private Const SpamhausDQSKey = "supersecretkey"
Private Const ShortLinkRegExPath = "C:\hMailServer\Events\short_link_provider_regex.vbs"

Function Include(sInstFile)
	Dim f, s, oFSO
	Set oFSO = CreateObject("Scripting.FileSystemObject")
	On Error Resume Next
	If oFSO.FileExists(sInstFile) Then
		Set f = oFSO.OpenTextFile(sInstFile)
		s = f.ReadAll
		f.Close
		ExecuteGlobal s
	End If
	On Error Goto 0
	Set f = Nothing
	Set oFSO = Nothing
End Function

Function Lookup(strRegEx, strMatch) : Lookup = False
	With CreateObject("VBScript.RegExp")
		.Pattern = strRegEx
		.Global = False
		.MultiLine = True
		.IgnoreCase = True
		If .Test(strMatch) Then Lookup = True
	End With
End Function

Function oLookup(strRegEx, strMatch, bGlobal)
	If strRegEx = "" Then strRegEx = StrReverse(strMatch)
	With CreateObject("VBScript.RegExp")
		.Pattern = strRegEx
		.Global = bGlobal
		.MultiLine = True
		.IgnoreCase = True
		Set oLookup = .Execute(strMatch)
	End With
End Function

'Requires RvdH DNSResolver
Function IsInSurbl(strDomain) : IsInSurbl = False 
	Dim strLookup
	With CreateObject("DNSLibrary.DNSResolver")
		strLookup = .DNSLookup(strDomain & ".multi.surbl.org")
	End With
	Dim strRegEx : strRegEx = "(127\.0\.1\.[0-9]{1,3})"
	IsInSurbl = Lookup(strRegEx, strLookup)
End Function

'Requires RvdH DNSResolver
'Requires Spamhaus DQS Key
Function IsInSpamHausDBL(strDomain) : IsInSpamHausDBL = False
	Dim strLookup
	With CreateObject("DNSLibrary.DNSResolver") 
		strLookup = .DNSLookup(strDomain & "." & SpamhausDQSKey & ".dbl.dq.spamhaus.net")
	End With
	Dim strRegEx : strRegEx = "(127\.0\.1\.(2|4|5|6))"
	IsInSpamHausDBL = Lookup(strRegEx, strLookup)
End Function

Function ExpandURL(ShortURL)
	Dim Request : Set Request = CreateObject("MSXML2.ServerXMLHTTP")
	Request.Open "HEAD", ShortURL, False 
	Request.Send
	ExpandURL = Request.GetOption(-1)
End Function

Include(ShortLinkRegExPath)

Dim ShortURL, LongURL, Domain, Match, Matches, SLMatch, SLMatches, UMatch, UMatches, MessageBody, strRegEx, URLToInspect, ShortLinkDomainToInspect

MessageBody = 	"Howdy, y'all! Right here's a message with some URLs." & _
				"Here's a bony fide short link https://tinyurl.com/y4aa47p4 right in the middle of the message." & _
				"And here's a long link https://hmailserver.com/forum/viewtopic.php?f=9&t=41891 to use as a control." & _
				"And here's a short link embedded in html <a href=""https://shorturl.at/AHRX5"">Come 'n' git some spam!</a>" & _
				"And here's a long link embedded in html <a href=""https://hmailserver.com/forum/viewtopic.php?f=9&t=41891"">Expand Shortened URLs - hMailServer Forum</a>" & _
				"And lastly, just an extra line for fun."


strRegEx = "(\b((https?(:\/\/|%3A%2F%2F))((([a-zA-Z0-9-]+)\.)+[a-zA-Z0-9-]+)(:\d+)?((?:\/[\+~%\/\.\w\-_]*)?\??(?:[\-\+=&;%@\.\w_]*)#?(?:[\.\!\/\\\w]*))?)\b)"
Set Matches = oLookup(strRegEx, MessageBody, True)
WScript.Echo Matches.Count & " URL matches found in message body"
If Matches.Count > 0 Then 
	For Each Match in Matches
		URLToInspect = Match.SubMatches.Item(0)
		WScript.Echo " "
		WScript.Echo "URL found in message body: " & URLToInspect

		Set UMatches = oLookup("(?:\bhttps?:\/\/)([a-zA-Z0-9-.]+)(?:\/[^\s]+\b)", URLToInspect, False)
		If UMatches.Count > 0 Then 
			For Each UMatch in UMatches
				ShortLinkDomainToInspect = UMatch.SubMatches.Item(0)
			Next
		End If

		If Lookup(ShortLinkProviderRegEx, ShortLinkDomainToInspect) Then
			WScript.Echo "Shortlink found in message body: " & URLToInspect

			LongURL = ExpandURL(URLToInspect)
			WScript.Echo "Expanded URL: " & LongURL

			Set SLMatches = oLookup("(?:\bhttps?:\/\/)([a-zA-Z0-9-.]+)(?:\/[^\s]+\b)", LongURL, False)
			If SLMatches.Count > 0 Then 
				WScript.Echo "Domain for expanded URL found"
				For Each SLMatch In SLMatches
					Domain = SLMatch.SubMatches.Item(0)
					WScript.Echo "Domain for expanded URL: " & Domain
					REM - 
					REM - Here is where you operate
					REM - 
					If IsInSurbl(Domain) Then WScript.Echo Domain & " LISTED IN SURBL!!!" Else WScript.Echo "not listed in SURBL"
					If IsInSpamHausDBL(Domain) Then WScript.Echo Domain & " LISTED IN SPAMHAUS DBL!!!" Else WScript.Echo "not listed in Spamhaus DBL"
					REM - 
					REM - Here is where you operate
					REM - 
				Next
			End If

		End If

	Next
End If
Tonight I'll integrate it into eventhandlers.vbs.

palinka
Senior user
Senior user
Posts: 4610
Joined: 2017-09-12 17:57

Re: Expand Shortened URLs

Post by palinka » 2024-04-06 17:46

Looks like this is a nothing burger. I made a script that looks for shortlinks in my mysql message log. There are about 5,300 messages in the log including all spam received and deleted by spam rules.

1. Looks for every URL.

2. If URL domain matches the shortlink provider list, it expands the shortlink through a recursive expansion function in order to actually get the final destination - i noticed that there are often many hops and I think that's intentional.

3. If the final destination URL contains a referrer, the referrer URL is used instead of the destination URL.

4. The domain is parsed from the destination URL or referred URL and gets tested against SURBL and Spamhaus DBL.

Result: 0 hits from 5,300 messages containing a lot of spam.

I would guess (purely a guess based on observation, but not actually counting anything) that about half of the shortlinks were malicious spam links.

Maybe I'll run it again to check both destination URLs and referred URLs. I had a preference for referred URLs thinking that why go though all the trouble of making them if the referrer was the malicious link? But I'm not expecting anything different.

palinka
Senior user
Senior user
Posts: 4610
Joined: 2017-09-12 17:57

Re: Expand Shortened URLs

Post by palinka » 2024-04-06 20:08

Whoops - had a stupid issue with my script. 35 links were listed: 32 in SURBL, 3 in Spamhaus DBL.

5 expanded shortlinks had redirects in the URL that were listed in SURBL (none in Spamhaus DBL).

palinka
Senior user
Senior user
Posts: 4610
Joined: 2017-09-12 17:57

Re: Expand Shortened URLs

Post by palinka » 2024-04-07 09:11

I ran it again also inspecting the shortlink domain. Now there are 3 items being inspected: the domain of the shortlink, the domain of the expanded shortlink and the domain of redirects found in expanded shortlinks.

34 shortlink domain hits on SURBL (all for clck.ru)
0 shortlink domain hits on Spamhaus DBL
56 expanded shortlink domain hits on SURBL (some overlap with the shorlink domains - most of the clck.ru shortlinks expanded to clck.ru long links)
3 expanded shortlink domain hits on Spamhaus DBL
5 expanded shortlink redirect domain hits on SURBL
0 expanded shortlink redirect domain hits on Spamhaus DBL

That's 98 hits out of 5300 messages.

I'm not sure why there's a big jump from yesterday. Maybe SURBL added some domains overnight.

I also noticed something strange. Some bitly shortlinks were "long", like this:

Code: Select all

https://bitly.ws/xxxxx#cl/4910_md/2001/3662/436/31/1268829
- I don't know what that's all about. I tried goolaging it but since goolag is broken now thanks to DIE and TDS, nothing relevant came up.

But anyway, this looks like a worthwhile endeavor, so I think I'll continue building it for eventhandlers.vbs.
Last edited by palinka on 2024-04-07 09:33, edited 1 time in total.

palinka
Senior user
Senior user
Posts: 4610
Joined: 2017-09-12 17:57

Re: Expand Shortened URLs

Post by palinka » 2024-04-07 09:33

Improved redirect capture regex added 3 more SURBL hits.

https://regex101.com/r/dF4olP/2

Code: Select all

/(?:\bhttps?:\/\/.+\?.+=https?(:\/\/|\%3A\%2F\%2F))([a-zA-Z0-9-.]+)(?:(\/|\%2F)[^\s\"]+\b)?/gm

palinka
Senior user
Senior user
Posts: 4610
Joined: 2017-09-12 17:57

Re: Expand Shortened URLs

Post by palinka » 2024-04-07 19:08

Working test script to evaluate all URL domains, shortlink domains, expanded shortlink domains and redirects within expanded shortlink domains. Includes recursive shortlink URL expansion function that stops when the final redirect is found.

WARNING - 2 links below are actual spam. They could be malicious. Use caution.

Code: Select all

Option Explicit

Private Const SpamhausDQSKey = "supersecretkey"
Private Const ShortLinkRegExPath = "C:\hMailServer\Events\short_link_provider_regex.vbs"

Function Include(sInstFile)
	Dim f, s, oFSO
	Set oFSO = CreateObject("Scripting.FileSystemObject")
	On Error Resume Next
	If oFSO.FileExists(sInstFile) Then
		Set f = oFSO.OpenTextFile(sInstFile)
		s = f.ReadAll
		f.Close
		ExecuteGlobal s
	End If
	On Error Goto 0
	Set f = Nothing
	Set oFSO = Nothing
End Function

Function Lookup(strRegEx, strMatch) : Lookup = False
	With CreateObject("VBScript.RegExp")
		.Pattern = strRegEx
		.Global = False
		.MultiLine = True
		.IgnoreCase = True
		If .Test(strMatch) Then Lookup = True
	End With
End Function

Function oLookup(strRegEx, strMatch, bGlobal)
	If strRegEx = "" Then strRegEx = StrReverse(strMatch)
	With CreateObject("VBScript.RegExp")
		.Pattern = strRegEx
		.Global = bGlobal
		.MultiLine = True
		.IgnoreCase = True
		Set oLookup = .Execute(strMatch)
	End With
End Function

'Requires RvdH DNSResolver
Function IsInSurbl(strDomain) : IsInSurbl = False 
	Dim strLookup
	With CreateObject("DNSLibrary.DNSResolver")
		strLookup = .DNSLookup(strDomain & ".multi.surbl.org")
	End With
	Dim strRegEx : strRegEx = "(127\.0\.1\.[0-9]{1,3})"
	IsInSurbl = Lookup(strRegEx, strLookup)
End Function

'Requires RvdH DNSResolver
'Requires Spamhaus DQS Key
Function IsInSpamHausDBL(strDomain) : IsInSpamHausDBL = False
	Dim strLookup
	With CreateObject("DNSLibrary.DNSResolver") 
		strLookup = .DNSLookup(strDomain & "." & SpamhausDQSKey & ".dbl.dq.spamhaus.net")
	End With
	Dim strRegEx : strRegEx = "(127\.0\.1\.(2|4|5|6))"
	IsInSpamHausDBL = Lookup(strRegEx, strLookup)
End Function

Function ExpandURL(ShortURL)
	Dim Request : Set Request = CreateObject("MSXML2.ServerXMLHTTP")
	Request.Open "HEAD", ShortURL, False 
	Request.Send
	ExpandURL = Request.GetOption(-1)
End Function

Function GetToRootURL(URL)
	Dim Link, Match, Matches, LinkDomain
	Include(ShortLinkRegExPath)
	Link = ExpandURL(URL)
	WScript.Echo "URL: " & URL
	WScript.Echo "Link: " & Link
	If Link = URL Then
		GetToRootURL = URL
	Else 
		Set Matches = oLookup("(?:https?:\/\/)([a-zA-Z0-9-.]+)(?:\/[^\s]+)?", Link, False)
		If Matches.Count > 0 Then 
			For Each Match in Matches
				LinkDomain = Match.SubMatches.Item(0)
			Next
		End If
		If Lookup(ShortLinkProviderRegEx, LinkDomain) Then
			GetToRootURL = GetToRootURL(Link)
		Else
			GetToRootURL = Link
		End If
	End If
End Function

Include(ShortLinkRegExPath)

Dim ShortURL, LongURL, Domain, URL, URLsFound, SLMatch, SLMatches, UMatch, UMatches, MessageBody, URLFinderRegEx, ExtractDomainFromURLRegEx, RedirectURLFinderRegEx, MsgURL, MsgURLDomain

URLFinderRegEx = "(\b((https?(:\/\/|%3A%2F%2F))((([a-zA-Z0-9-]+)\.)+[a-zA-Z0-9-]+)(:\d+)?((?:\/[\+~%\/\.\w\-_]*)?\??(?:[\-\+=&;%@\.\w_]*)#?(?:[\.\!\/\\\w]*))?)\b)"
ExtractDomainFromURLRegEx = "(?:https?:\/\/)([a-zA-Z0-9-.]+)(?:\/[^\s]+)?"
RedirectURLFinderRegEx = "(?:\bhttps?:\/\/.+\?.+=https?(:\/\/|\%3A\%2F\%2F))([a-zA-Z0-9-.]+)(?:(\/|\%2F)[^\s\""]+\b)?"

MessageBody = 	"Howdy, y'all! Right here's a message with some URLs." & _
				"Here's a bony fide short link https://t.co/Ej4jsWGIuY right in the middle of the message." & _
				"Here's another: https://clck.ru/39CGSZ - by the way these are real spam - exercise caution!!" & _
				"And here's a long link https://hmailserver.com/forum/viewtopic.php?f=9&t=41891 to use as a control." & _
				"And here's a short link embedded in html <a href=""https://shorturl.at/AHRX5"">Come 'n' git some spam!</a>" & _
				"And here's a long link embedded in html <a href=""https://hmailserver.com/forum/viewtopic.php?f=9&t=41891"">Expand Shortened URLs - hMailServer Forum</a>" & _
				"And lastly, just an extra line for fun."

REM - Find URLs in message body
Set URLsFound = oLookup(URLFinderRegEx, MessageBody, True)
WScript.Echo URLsFound.Count & " URL matches found in message body"
If URLsFound.Count > 0 Then 
	For Each URL in URLsFound
		MsgURL = URL.SubMatches.Item(0)
		WScript.Echo " "
		WScript.Echo "URL found in message body: " & MsgURL


		REM - If URLS found then get domain from each URL
		Set UMatches = oLookup(ExtractDomainFromURLRegEx, MsgURL, False)
		If UMatches.Count > 0 Then 
			For Each UMatch in UMatches
				MsgURLDomain = UMatch.SubMatches.Item(0)
			Next
		End If
		
		REM - Might as well test it as long as we're here
		If IsInSurbl(MsgURLDomain) Then WScript.Echo MsgURLDomain & " LISTED IN SURBL!!!" Else WScript.Echo "not listed in SURBL"
		If IsInSpamHausDBL(MsgURLDomain) Then WScript.Echo MsgURLDomain & " LISTED IN SPAMHAUS DBL!!!" Else WScript.Echo "not listed in Spamhaus DBL"

		REM - Test URL domain against shortlink provider list
		If Lookup(ShortLinkProviderRegEx, MsgURLDomain) Then
			WScript.Echo "Shortlink found in message body: " & MsgURL

			REM - If URL is a shortlink then expand it
			LongURL = GetToRootURL(MsgURL)
			WScript.Echo "Expanded URL: " & LongURL

			REM - Get domain from expanded shortlink and test it against SURBL and Spamhaus DBL
			Set SLMatches = oLookup(ExtractDomainFromURLRegEx, LongURL, False)
			If SLMatches.Count > 0 Then 
				WScript.Echo "Domain for expanded URL found"
				For Each SLMatch In SLMatches
					Domain = SLMatch.SubMatches.Item(0)
					WScript.Echo "Domain for expanded URL: " & Domain
					REM - 
					REM - Here is where you operate
					REM - 
					If IsInSurbl(Domain) Then WScript.Echo Domain & " LISTED IN SURBL!!!" Else WScript.Echo "not listed in SURBL"
					If IsInSpamHausDBL(Domain) Then WScript.Echo Domain & " LISTED IN SPAMHAUS DBL!!!" Else WScript.Echo "not listed in Spamhaus DBL"
					REM - 
					REM - Here is where you operate
					REM - 
				Next
			End If
			
			REM - Now lets look for redirects in the expanded shortlink and test them as well
			Set SLMatches = oLookup(RedirectURLFinderRegEx, LongURL, False)
			If SLMatches.Count > 0 Then 
				WScript.Echo "Redirect found in expanded URL"
				For Each SLMatch In SLMatches
					Domain = SLMatch.SubMatches.Item(1)
					WScript.Echo "Domain for expanded URL: " & Domain
					REM - 
					REM - Here is where you operate
					REM - 
					If IsInSurbl(Domain) Then WScript.Echo Domain & " LISTED IN SURBL!!!" Else WScript.Echo "not listed in SURBL"
					If IsInSpamHausDBL(Domain) Then WScript.Echo Domain & " LISTED IN SPAMHAUS DBL!!!" Else WScript.Echo "not listed in Spamhaus DBL"
					REM - 
					REM - Here is where you operate
					REM - 
				Next
			End If

		End If

	Next
End If
			

palinka
Senior user
Senior user
Posts: 4610
Joined: 2017-09-12 17:57

Re: Expand Shortened URLs

Post by palinka » 2024-04-15 17:31

Here's eventhandlers.vbs script for this. I've been running it a few days. No errors. Found a few shortlinks but so far no hits on SURBL or Spamhaus DBL.

Use the powershell script above to create "short_link_provider_regex.vbs".

Code: Select all

Option Explicit

Private Const SpamhausDQSKey = "supersecretkey"
Private Const ShortLinkRegExPath = "C:\hMailServer\Events\short_link_provider_regex.vbs"

Function Include(sInstFile)
	Dim f, s, oFSO
	Set oFSO = CreateObject("Scripting.FileSystemObject")
	On Error Resume Next
	If oFSO.FileExists(sInstFile) Then
		Set f = oFSO.OpenTextFile(sInstFile)
		s = f.ReadAll
		f.Close
		ExecuteGlobal s
	End If
	On Error Goto 0
	Set f = Nothing
	Set oFSO = Nothing
End Function

Function Lookup(strRegEx, strMatch) : Lookup = False
	With CreateObject("VBScript.RegExp")
		.Pattern = strRegEx
		.Global = False
		.MultiLine = True
		.IgnoreCase = True
		If .Test(strMatch) Then Lookup = True
	End With
End Function

Function oLookup(strRegEx, strMatch, bGlobal)
	If strRegEx = "" Then strRegEx = StrReverse(strMatch)
	With CreateObject("VBScript.RegExp")
		.Pattern = strRegEx
		.Global = bGlobal
		.MultiLine = True
		.IgnoreCase = True
		Set oLookup = .Execute(strMatch)
	End With
End Function

'Requires RvdH DNSResolver
Function IsInSurbl(strDomain) : IsInSurbl = False 
	Dim strLookup
	With CreateObject("DNSLibrary.DNSResolver")
		strLookup = .DNSLookup(strDomain & ".multi.surbl.org")
	End With
	Dim strRegEx : strRegEx = "(127\.0\.1\.[0-9]{1,3})"
	IsInSurbl = Lookup(strRegEx, strLookup)
End Function

'Requires RvdH DNSResolver
'Requires Spamhaus DQS Key
Function IsInSpamHausDBL(strDomain) : IsInSpamHausDBL = False
	Dim strLookup
	With CreateObject("DNSLibrary.DNSResolver") 
		strLookup = .DNSLookup(strDomain & "." & SpamhausDQSKey & ".dbl.dq.spamhaus.net")
	End With
	Dim strRegEx : strRegEx = "(127\.0\.1\.(2|4|5|6))"
	IsInSpamHausDBL = Lookup(strRegEx, strLookup)
End Function

Function ExpandURL(ShortURL)
	Dim Request : Set Request = CreateObject("MSXML2.ServerXMLHTTP.6.0")
	Request.Open "HEAD", ShortURL, False 
	On Error Resume Next
	Request.Send
	If Err.Number <> 0 Then
		EventLog.Write("--------------------------------")
		EventLog.Write("Error       : ExpandURL Function could not expand shortlink")
		EventLog.Write("Domain      : " & ShortURL)
		EventLog.Write("Description : " & Err.Description)
		ExpandURL = ShortURL
	Else
		ExpandURL = Request.GetOption(-1)
	End If
	On Error GoTo 0
End Function

Function GetToRootURL(URL)
	Dim Link, Match, Matches, LinkDomain
	Include(ShortLinkRegExPath)
	Link = ExpandURL(URL)
	If Link = URL Then
		GetToRootURL = URL
	Else 
		Set Matches = oLookup("(?:https?:\/\/)([a-zA-Z0-9-.]+)(?:\/[^\s]+)?", Link, False)
		If Matches.Count > 0 Then 
			For Each Match in Matches
				LinkDomain = Match.SubMatches.Item(0)
			Next
		End If
		If Lookup(ShortLinkProviderRegEx, LinkDomain) Then
			GetToRootURL = GetToRootURL(Link)
		Else
			GetToRootURL = Link
		End If
	End If
End Function

Sub OnAcceptMessage(oClient, oMessage)

	Dim strMsgSLBody, LongURL, Domain, URL, URLsFound, Match, Matches, UMatch, UMatches, RedirectMatch, RedirectMatches, URLFinderRegEx, ExtractDomainFromURLRegEx, RedirectURLFinderRegEx, MsgURL, MsgURLDomain
	
	REM - Convert message body to text string
	If oMessage.HTMLBody = Empty Then
		strMsgSLBody = oMessage.Body
	Else
		strMsgSLBody = oMessage.HTMLBody
	End If

	REM - Test Expanded Shortlinks
	Include(ShortLinkRegExPath)

	URLFinderRegEx = "(\b((https?(:\/\/|%3A%2F%2F))((([a-zA-Z0-9-]+)\.)+[a-zA-Z0-9-]+)(:\d+)?((?:\/[\+~%\/\.\w\-_]*)?\??(?:[\-\+=&;%@\.\w_]*)#?(?:[\.\!\/\\\w]*))?)\b)"
	ExtractDomainFromURLRegEx = "(?:https?:\/\/)([a-zA-Z0-9-.]+)(?:\/[^\s]+)?"
	RedirectURLFinderRegEx = "(?:\bhttps?:\/\/.+\?.+=https?(:\/\/|\%3A\%2F\%2F))([a-zA-Z0-9-.]+)(?:(\/|\%2F)[^\s\""]+\b)?"

	REM - Find URLs in message body
	Set URLsFound = oLookup(URLFinderRegEx, strMsgSLBody, True)
	If URLsFound.Count > 0 Then 
		For Each URL in URLsFound
			MsgURL = URL.SubMatches.Item(0)

			REM - If URLS found then get domain from each URL
			Set UMatches = oLookup(ExtractDomainFromURLRegEx, MsgURL, False)
			If UMatches.Count > 0 Then 
				For Each UMatch in UMatches
					MsgURLDomain = UMatch.SubMatches.Item(0)
				Next
			End If
			
			REM - Might as well test it as long as we're here
			If IsInSurbl(MsgURLDomain) Then 
				EventLog.Write("SURBL hit on message URL: " & MsgURLDomain)
				'
				' Do Stuff
				'
			End If
			If IsInSpamHausDBL(MsgURLDomain) Then 
				EventLog.Write("Spamhaus DBL hit on message URL: " & MsgURLDomain)
				'
				' Do Stuff
				'
			End If
			
			REM - And look for redirects and test them as well
			Set RedirectMatches = oLookup(RedirectURLFinderRegEx, LongURL, False)
			If RedirectMatches.Count > 0 Then 
				For Each RedirectMatch In RedirectMatches
					Domain = RedirectMatch.SubMatches.Item(1)
					If IsInSurbl(Domain) Then 
						EventLog.Write("SURBL hit on URL redirect: " & Domain)
						'
						' Do Stuff
						'
					End If
					If IsInSpamHausDBL(Domain) Then 
						EventLog.Write("Spamhaus DBL hit on URL redirect: " & Domain)
						'
						' Do Stuff
						'
					End If
				Next
			End If

			REM - Test URL domain against shortlink provider list
			If Lookup(ShortLinkProviderRegEx, MsgURLDomain) Then

				REM - If URL is a shortlink then expand it
				LongURL = GetToRootURL(MsgURL)
				EventLog.Write("Shortlink Found: " & MsgURL & " ---> " & LongURL)

				REM - Get domain from expanded shortlink and test it against SURBL and Spamhaus DBL
				Set Matches = oLookup(ExtractDomainFromURLRegEx, LongURL, False)
				If Matches.Count > 0 Then 
					For Each Match In Matches
						Domain = Match.SubMatches.Item(0)
						If IsInSurbl(Domain) Then 
							EventLog.Write("SURBL hit on expanded shortlink: " & Domain)
							'
							' Do Stuff
							'
						End If
						If IsInSpamHausDBL(Domain) Then 
							EventLog.Write("Spamhaus DBL hit on expanded shortlink: " & Domain)
							'
							' Do Stuff
							'
						End If
					Next
				End If
				
				REM - Look for redirects in the expanded shortlink and test them as well
				Set Matches = oLookup(RedirectURLFinderRegEx, LongURL, False)
				If Matches.Count > 0 Then 
					For Each Match In Matches
						Domain = Match.SubMatches.Item(1)
						If IsInSurbl(Domain) Then 
							EventLog.Write("SURBL hit on expanded shortlink redirect: " & Domain)
							'
							' Do Stuff
							'
						End If
						If IsInSpamHausDBL(Domain) Then 
							EventLog.Write("Spamhaus DBL hit on expanded shortlink redirect: " & Domain)
							'
							' Do Stuff
							'
						End If
					Next
				End If
			End If
		Next
	End If
				
End Sub

palinka
Senior user
Senior user
Posts: 4610
Joined: 2017-09-12 17:57

Re: Expand Shortened URLs

Post by palinka » 2024-05-06 12:54

I got my first bona fide hit - a SURBL hit on an expanded shortlink url.

Spamassassin (bayes) already scored it high enough to hit the delete threshold since I've received plenty of these messages ("ukraine girls/dating") before. I guess SURBL just added the expanded URL.

Still evaluating if this is useful or not. It seems as though SURBL and especially very conservative Spamhaus are very late in updating their records. I'm sure they're avoiding false positives, but still, bayes does a better/faster job of picking these messages out.

palinka
Senior user
Senior user
Posts: 4610
Joined: 2017-09-12 17:57

Re: Expand Shortened URLs

Post by palinka » 2024-05-17 18:41

Based on something RvdH mentioned in another thread about SURBL, I decided to make the list of URLs unique so they don't get tested twice. Also, If there's a hit, it will exit without further testing since one hit is enough. There's no need to test every single URL if one already tripped SURBL.

Also, using the array greatly shortens the code. There are 3 new functions to help with that.

The code searches the message body for URLs. If found it will do the following:
1) add the FQDN to the domain array
2) check if the FQDN is a shortlink domain, expand it, then extract the FQDN and add it to the domain array
3) check if the URL contains a redirect and if it does, it will extract the redirect URL FQDN and add it to the domain array
4) check if the FQDN is a shortlink domain, expand it, look for redirects in the expanded URL and if found, extracts the redirect URL FQDN and adds it to the domain array

If any of those items already exist in the array, it will be ignored so that the array contains unique FQDNs only.

Then it takes that array of FQDNs and tests them against SURBL.org and Spamhaus DBL. If there's a hit on either one, you handle the message however you want, then it will exit the for each loop. I use Soren's blacklist function and report the IP address to AbuseIPDB.

Code: Select all

Option Explicit

Private Const SpamhausDQSKey = "supersecretkey"
Private Const ShortLinkRegExPath = "C:\hMailServer\Events\short_link_provider_regex.vbs"

Function Include(sInstFile)
	Dim f, s, oFSO
	Set oFSO = CreateObject("Scripting.FileSystemObject")
	On Error Resume Next
	If oFSO.FileExists(sInstFile) Then
		Set f = oFSO.OpenTextFile(sInstFile)
		s = f.ReadAll
		f.Close
		ExecuteGlobal s
	End If
	On Error Goto 0
	Set f = Nothing
	Set oFSO = Nothing
End Function

Function Lookup(strRegEx, strMatch) : Lookup = False
	With CreateObject("VBScript.RegExp")
		.Pattern = strRegEx
		.Global = False
		.MultiLine = True
		.IgnoreCase = True
		If .Test(strMatch) Then Lookup = True
	End With
End Function

Function oLookup(strRegEx, strMatch, bGlobal)
	If strRegEx = "" Then strRegEx = StrReverse(strMatch)
	With CreateObject("VBScript.RegExp")
		.Pattern = strRegEx
		.Global = bGlobal
		.MultiLine = True
		.IgnoreCase = True
		Set oLookup = .Execute(strMatch)
	End With
End Function

Function IsInSURBL(Domain)
	Dim objShell : Set objShell = CreateObject("Wscript.Shell")
	Dim objExec : Set objExec = objShell.Exec("Powershell -Command (Resolve-DnsName -Name " & Domain & ".multi.surbl.org -DnsOnly -NoHostsFile).IPAddress")
	IsInSURBL = Lookup("(127\.0\.0\.[0-9]+)", objExec.StdOut.ReadAll)
End Function

'Requires Spamhaus DQS Key
Function IsInSpamhausDBL(Domain)
	Dim objShell : Set objShell = CreateObject("Wscript.Shell")
	Dim objExec : Set objExec = objShell.Exec("Powershell -Command (Resolve-DnsName -Name " & Domain & "." & SpamhausDQSKey & ".dbl.dq.spamhaus.net -DnsOnly -NoHostsFile).IPAddress")
	IsInSpamhausDBL = Lookup("(127\.0\.1\.[2456])", objExec.StdOut.ReadAll)
End Function

Function ExpandURL(ShortURL)
	Dim Request : Set Request = CreateObject("MSXML2.ServerXMLHTTP.6.0")
	Request.Open "HEAD", ShortURL, False 
	On Error Resume Next
	Request.Send
	If Err.Number <> 0 Then
		EventLog.Write("--------------------------------")
		EventLog.Write("Error       : ExpandURL Function could not expand shortlink")
		EventLog.Write("Domain      : " & ShortURL)
		EventLog.Write("Description : " & Err.Description)
		ExpandURL = ShortURL
	Else
		ExpandURL = Request.GetOption(-1)
	End If
	On Error GoTo 0
End Function

Function GetToRootURL(URL)
	Dim Link, Match, Matches, LinkDomain
	Include(ShortLinkRegExPath)
	Link = ExpandURL(URL)
	If Link = URL Then
		GetToRootURL = URL
	Else 
		Set Matches = oLookup("(?:https?:\/\/)([a-zA-Z0-9-.]+)(?:\/[^\s]+)?", Link, False)
		If Matches.Count > 0 Then 
			For Each Match in Matches
				LinkDomain = Match.SubMatches.Item(0)
			Next
		End If
		If Lookup(ShortLinkProviderRegEx, LinkDomain) Then
			GetToRootURL = GetToRootURL(Link)
		Else
			GetToRootURL = Link
		End If
	End If
End Function

Function GetFQDN(URL)
	Dim Match, Matches
	Set Matches = oLookup("(?:https?:\/\/)([a-zA-Z0-9-.]+)(?:\/[^\s]+)?", URL, False)
	If Matches.Count > 0 Then 
		For Each Match in Matches
			GetFQDN = Match.SubMatches.Item(0)
		Next
	End If
End Function

Function GetURLRedirectDomain(URL)
	Dim Match, Matches
	Set Matches = oLookup("(?:\bhttps?:\/\/.+\?.+=https?(:\/\/|\%3A\%2F\%2F))([a-zA-Z0-9-.]+)(?:(\/|\%2F)[^\s\""]+\b)?", URL, False)
	If Matches.Count > 0 Then 
		For Each Match in Matches
			GetURLRedirectDomain = Match.SubMatches.Item(1)
		Next
	End If
End Function

Function IsInArray(Arr(), Item) : IsInArray = False
	Dim A
	For Each A In Arr
		If A = Item Then
			IsInArray = True
			Exit Function
		End If
	Next
End Function

Sub OnAcceptMessage(oClient, oMessage)

	Dim MessageBody, URL, URLsFound, Match, Matches, URLFinderRegEx, MsgURL, MsgURLDomain, ExpandedShortLinkDomain, MsgURLRedirectDomain, ExpandedShortlinkRedirectDomain, FQDNArr(), FQDN, DomainIterator, strMsgBody

	URLFinderRegEx = "(\b((https?(:\/\/|%3A%2F%2F))((([a-zA-Z0-9-]+)\.)+[a-zA-Z0-9-]+)(:\d+)?((?:\/[\+~%\/\.\w\-_]*)?\??(?:[\-\+=&;%@\.\w_]*)#?(?:[\.\!\/\\\w]*))?)\b)"

	REM - Test URLs, Expanded Shortlinks and Redirects against SURBL
	REM - Convert message body to text string
	If oMessage.HTMLBody = Empty Then
		MessageBody = oMessage.Body
	Else
		MessageBody = oMessage.HTMLBody
	End If
	
	REM - Find URLs in message body
	DomainIterator = 0
	Set URLsFound = oLookup(URLFinderRegEx, MessageBody, True)
	If URLsFound.Count > 0 Then 
		For Each URL in URLsFound
			MsgURL = URL.SubMatches.Item(0)
			MsgURLDomain = GetFQDN(MsgURL)
			ExpandedShortLinkDomain = GetFQDN(GetToRootURL(MsgURL))
			MsgURLRedirectDomain = GetURLRedirectDomain(MsgURL)
			ExpandedShortlinkRedirectDomain = GetURLRedirectDomain(GetToRootURL(MsgURL))
		
			REM - Add URL FQDN's uniquely to array
			If MsgURLDomain <> "" Then
				If IsInArray(FQDNArr, MsgURLDomain) = False Then 
					REDIM PRESERVE FQDNArr(DomainIterator)
					FQDNArr(DomainIterator) = MsgURLDomain
					DomainIterator = DomainIterator + 1
				End If
			End If
			If ExpandedShortLinkDomain <> "" Then
				If IsInArray(FQDNArr, ExpandedShortLinkDomain) = False Then 
					REDIM PRESERVE FQDNArr(DomainIterator)
					FQDNArr(DomainIterator) = ExpandedShortLinkDomain
					DomainIterator = DomainIterator + 1
				End If
			End If
			If MsgURLRedirectDomain <> "" Then
				If IsInArray(FQDNArr, MsgURLRedirectDomain) = False Then 
					REDIM PRESERVE FQDNArr(DomainIterator)
					FQDNArr(DomainIterator) = MsgURLRedirectDomain
					DomainIterator = DomainIterator + 1
				End If
			End If
			If ExpandedShortlinkRedirectDomain <> "" Then
				If IsInArray(FQDNArr, ExpandedShortlinkRedirectDomain) = False Then 
					REDIM PRESERVE FQDNArr(DomainIterator)
					FQDNArr(DomainIterator) = ExpandedShortlinkRedirectDomain
					DomainIterator = DomainIterator + 1
				End If
			End If
		Next
			
		REM - Test Unique FQDN array against SURBL and Spamhaus DBL
		For Each FQDN In FQDNArr
			If IsInSurbl(FQDN) Then 
				EventLog.Write("SURBL hit on message URL: " & FQDN)
				'
				' Do stuff
				'
				Exit For 
			End If
		Next
		For Each FQDN In FQDNArr
			If IsInSpamHausDBL(FQDN) Then 
				EventLog.Write("Spamhaus DBL hit on message URL: " & FQDN)
				'
				' Do stuff
				'
				Exit For 
			End If
		Next
			
	End If

End Sub

palinka
Senior user
Senior user
Posts: 4610
Joined: 2017-09-12 17:57

Re: Expand Shortened URLs

Post by palinka » 2024-05-19 10:13

I was getting errors on some messages due to the URLs being URLEncoded. So I made a URLDecode function.

Code: Select all

Function URLDecode(URL)
	Dim objShell : Set objShell = CreateObject("Wscript.Shell")
	Dim objExec : Set objExec = objShell.Exec("Powershell -Command [uri]::UnescapeDataString(" & Chr(39) & URL & Chr(39) & ")")
	URLDecode = Trim(Replace(objExec.StdOut.ReadAll, vbCrLf, ""))
End Function

...........

	REM - Find Unique URLs in message body
	DomainIterator = 0
	Set URLsFound = oLookup(URLFinderRegEx, MessageBody, True)
	If URLsFound.Count > 0 Then 
		For Each URL in URLsFound
			MsgURL = URL.SubMatches.Item(0)
			MsgURLDomain = GetFQDN(URLDecode(MsgURL))
			ExpandedShortLinkDomain = GetFQDN(GetToRootURL(URLDecode(MsgURL)))
			MsgURLRedirectDomain = GetURLRedirectDomain(URLDecode(MsgURL))
			ExpandedShortlinkRedirectDomain = GetURLRedirectDomain(GetToRootURL(URLDecode(MsgURL)))

...........

But wow... this takes a long time because every URL gets passed through the shortlink expander twice (2nd one to look for redirects - that will be fixed). Do not use this script on a busy server.

palinka
Senior user
Senior user
Posts: 4610
Joined: 2017-09-12 17:57

Re: Expand Shortened URLs

Post by palinka » 2024-05-20 07:55

Only bona fide shortlinks should be expanded. This greatly cuts the processing time.

Code: Select all

Option Explicit

Private Const SpamhausDQSKey = "supersecretkey"
Private Const ShortLinkRegExPath = "C:\hMailServer\Events\short_link_provider_regex.vbs"

Function Include(sInstFile)
	Dim f, s, oFSO
	Set oFSO = CreateObject("Scripting.FileSystemObject")
	On Error Resume Next
	If oFSO.FileExists(sInstFile) Then
		Set f = oFSO.OpenTextFile(sInstFile)
		s = f.ReadAll
		f.Close
		ExecuteGlobal s
	End If
	On Error Goto 0
	Set f = Nothing
	Set oFSO = Nothing
End Function

Function Lookup(strRegEx, strMatch) : Lookup = False
	With CreateObject("VBScript.RegExp")
		.Pattern = strRegEx
		.Global = False
		.MultiLine = True
		.IgnoreCase = True
		If .Test(strMatch) Then Lookup = True
	End With
End Function

Function oLookup(strRegEx, strMatch, bGlobal)
	If strRegEx = "" Then strRegEx = StrReverse(strMatch)
	With CreateObject("VBScript.RegExp")
		.Pattern = strRegEx
		.Global = bGlobal
		.MultiLine = True
		.IgnoreCase = True
		Set oLookup = .Execute(strMatch)
	End With
End Function

Function URLDecode(URL)
	Dim objShell : Set objShell = CreateObject("Wscript.Shell")
	Dim objExec : Set objExec = objShell.Exec("Powershell -Command [uri]::UnescapeDataString(" & Chr(39) & URL & Chr(39) & ")")
	URLDecode = Trim(Replace(objExec.StdOut.ReadAll, vbCrLf, ""))
End Function

Function URLEncode(URL)
	Dim objShell : Set objShell = CreateObject("Wscript.Shell")
	Dim objExec : Set objExec = objShell.Exec("Powershell -Command [uri]::EscapeDataString(" & Chr(39) & URL & Chr(39) & ")")
	URLEncode = Trim(Replace(objExec.StdOut.ReadAll, vbCrLf, ""))
End Function

Function IsInSURBL(Domain)
	Dim objShell : Set objShell = CreateObject("Wscript.Shell")
	Dim objExec : Set objExec = objShell.Exec("Powershell -Command (Resolve-DnsName -Name " & Domain & ".multi.surbl.org -DnsOnly -NoHostsFile).IPAddress")
	IsInSURBL = Lookup("(127\.0\.0\.[0-9]+)", objExec.StdOut.ReadAll)
End Function

Function IsInSpamhausDBL(Domain)
	Dim objShell : Set objShell = CreateObject("Wscript.Shell")
	Dim objExec : Set objExec = objShell.Exec("Powershell -Command (Resolve-DnsName -Name " & Domain & "." & SpamhausDQSKey & ".dbl.dq.spamhaus.net -DnsOnly -NoHostsFile).IPAddress")
	IsInSpamhausDBL = Lookup("(127\.0\.1\.[2456])", objExec.StdOut.ReadAll)
End Function

Function ExpandURL(URL)
	Dim Request : Set Request = CreateObject("MSXML2.ServerXMLHTTP.6.0")
	Request.Open "HEAD", URL, False 
	On Error Resume Next
	Request.Send
	If Err.Number <> 0 Then
		EventLog.Write ( "--------------------------------" )
		EventLog.Write ( "Error       : ExpandURL Function could not expand URL" )
		EventLog.Write ( "URL         : " & URL )
		EventLog.Write ( "Description : " & Err.Description  )
		EventLog.Write ( "--------------------------------" )
		ExpandURL = URL
	Else
		ExpandURL = Request.GetOption(-1)
	End If
End Function

Function GetToRootURL(URL)
	Dim Link, Match, Matches, LinkDomain
	Include(ShortLinkRegExPath)
	Link = ExpandURL(URL)
	If Link = URL Then
		GetToRootURL = URL
	Else 
		Set Matches = oLookup("(?:https?:\/\/)([a-zA-Z0-9-.]+)(?:\/[^\s]+)?", Link, False)
		If Matches.Count > 0 Then 
			For Each Match in Matches
				LinkDomain = Match.SubMatches.Item(0)
			Next
		End If
		If Lookup(ShortLinkProviderRegEx, LinkDomain) Then
			GetToRootURL = GetToRootURL(Link)
		Else
			GetToRootURL = Link
		End If
	End If
End Function

Function GetFQDN(URL)
	Dim Match, Matches
	Set Matches = oLookup("(?:https?:\/\/)([a-zA-Z0-9-.]+)(?:\/[^\s]+)?", URL, False)
	If Matches.Count > 0 Then 
		For Each Match in Matches
			GetFQDN = Match.SubMatches.Item(0)
		Next
	End If
End Function

Function IsShortlink(FQDN) : IsShortlink = False
	Include(ShortLinkRegExPath)
	If Lookup(ShortLinkProviderRegEx, FQDN) Then IsShortlink = True
End Function

Function GetURLRedirectDomain(URL)
	Dim Match, Matches
	Set Matches = oLookup("(?:\bhttps?:\/\/.+\?.+=https?(:\/\/|\%3A\%2F\%2F))([a-zA-Z0-9-.]+)(?:(\/|\%2F)[^\s\""]+\b)?", URL, False)
	If Matches.Count > 0 Then 
		For Each Match in Matches
			GetURLRedirectDomain = Match.SubMatches.Item(1)
		Next
	End If
End Function

Function IsInArray(Arr(), Item) : IsInArray = False
	Dim A
	For Each A In Arr
		If A = Item Then
			IsInArray = True
			Exit Function
		End If
	Next
End Function

Sub OnAcceptMessage(oClient, oMessage)

	Dim MessageBody, URL, URLsFound, Match, Matches, URLFinderRegEx, ExtractDomainFromURLRegEx, RedirectURLFinderRegEx, MsgURL, MsgURLDomain, ExpandedShortLink, ExpandedShortLinkDomain, MsgURLRedirectDomain, ExpandedShortlinkRedirectDomain, FQDNArr(), FQDN, DomainIterator

	URLFinderRegEx = "(\b((https?(:\/\/|%3A%2F%2F))((([a-zA-Z0-9-]+)\.)+[a-zA-Z0-9-]+)(:\d+)?((?:\/[\+~%\/\.\w\-_]*)?\??(?:[\-\+=&;%@\.\w_]*)#?(?:[\.\!\/\\\w]*))?)\b)"

	REM - Convert message body to text string
	If oMessage.HTMLBody = Empty Then
		MessageBody = oMessage.Body
	Else
		MessageBody = oMessage.HTMLBody
	End If

	REM - Find Unique URLs in message body
	DomainIterator = 0
	Set URLsFound = oLookup(URLFinderRegEx, MessageBody, True)
	If URLsFound.Count > 0 Then 
		For Each URL in URLsFound
			MsgURL = URL.SubMatches.Item(0)
			MsgURLDomain = GetFQDN(URLDecode(MsgURL))
			MsgURLRedirectDomain = GetURLRedirectDomain(URLDecode(MsgURL))
			If IsShortlink(MsgURLDomain) Then 
				ExpandedShortLink = GetToRootURL(URLDecode(MsgURL))
				ExpandedShortLinkDomain = GetFQDN(ExpandedShortLink)
				ExpandedShortlinkRedirectDomain = GetURLRedirectDomain(ExpandedShortLink)
			End If
		
			If MsgURLDomain <> "" Then
				If IsInArray(FQDNArr, MsgURLDomain) = False Then 
					REDIM PRESERVE FQDNArr(DomainIterator)
					FQDNArr(DomainIterator) = MsgURLDomain
					DomainIterator = DomainIterator + 1
				End If
			End If
			If ExpandedShortLinkDomain <> "" Then
				If IsInArray(FQDNArr, ExpandedShortLinkDomain) = False Then 
					REDIM PRESERVE FQDNArr(DomainIterator)
					FQDNArr(DomainIterator) = ExpandedShortLinkDomain
					DomainIterator = DomainIterator + 1
				End If
			End If
			If MsgURLRedirectDomain <> "" Then
				If IsInArray(FQDNArr, MsgURLRedirectDomain) = False Then 
					REDIM PRESERVE FQDNArr(DomainIterator)
					FQDNArr(DomainIterator) = MsgURLRedirectDomain
					DomainIterator = DomainIterator + 1
				End If
			End If
			If ExpandedShortlinkRedirectDomain <> "" Then
				If IsInArray(FQDNArr, ExpandedShortlinkRedirectDomain) = False Then 
					REDIM PRESERVE FQDNArr(DomainIterator)
					FQDNArr(DomainIterator) = ExpandedShortlinkRedirectDomain
					DomainIterator = DomainIterator + 1
				End If
			End If
		Next
		
		REM - Test Unique FQDN against SURBL and Spamhaus DBL
		For Each FQDN In FQDNArr
			If IsInSurbl(FQDN) Then 
				EventLog.Write("SURBL hit on message URL: " & FQDN)
				' 
				' Do Stuff
				' 
				Exit For 
			End If
		Next
		For Each FQDN In FQDNArr
			If IsInSpamHausDBL(FQDN) Then 
				EventLog.Write("Spamhaus DBL hit on message URL: " & FQDN)
				' 
				' Do Stuff
				' 
				Exit For 
			End If
		Next
			
	End If

End Sub

palinka
Senior user
Senior user
Posts: 4610
Joined: 2017-09-12 17:57

Re: Expand Shortened URLs

Post by palinka » 2024-06-06 07:53

I noticed that my URL regex missed a quoted printable URL. Specifically:

Code: Select all

https://e=2Enewyorktimes=2Ecom/pub...
So I found a quoted printable decode function for vbs here: ' https://www.motobit.com/tips/detpg_quot ... le-decode/

Here's the function without the notes (just to save space):

Code: Select all

' https://www.motobit.com/tips/detpg_quoted-printable-decode/
Function QuotedPrintableDecode(SourceData, CharSet)
	Dim Message: Set Message = CreateObject("CDO.Message")
	Message.BodyPart.ContentTransferEncoding = "quoted-printable"
	Dim Stream 'As ADODB.Stream
	Set Stream = Message.BodyPart.GetEncodedContentStream
	If VarType(SourceData) = vbString Then
		Stream.charset = "windows-1250"
		Stream.WriteText SourceData
	Else
		Stream.Type = 1
		Stream.Write SourceData
	End If
	Stream.Flush
	Set Stream = Message.BodyPart.GetDecodedContentStream
	Stream.CharSet = CharSet
	QuotedPrintableDecode = Stream.ReadText
End Function
And here is where you run the function - before attempting to search for URLs:

Code: Select all

	REM - Convert message body to text string
	If oMessage.HTMLBody = Empty Then
		MessageBody = QuotedPrintableDecode(oMessage.Body, "UTF-8")
	Else
		MessageBody = QuotedPrintableDecode(oMessage.HTMLBody, "UTF-8")
	End If

palinka
Senior user
Senior user
Posts: 4610
Joined: 2017-09-12 17:57

Re: Expand Shortened URLs

Post by palinka » 2024-06-06 09:32

oof... not enough coffee this morning. This function ^^ is not needed.

But pretty cool function. I'll probably find a use for it somewhere else.

Post Reply