
How Spam Filtering Works

Introduction

Email spam, also known as unsolicited bulk email, is the practice of sending unwanted email messages, frequently with commercial content, in large quantities to an indiscriminate set of recipients. Spam started to become a problem when the Internet was opened up to the general public in the mid-1990s. It grew exponentially over the following years, and today makes up some 80 to 85 percent of all email in the world. Increasingly, spam is sent via networks of virus- or worm-infected personal computers in homes and offices around the globe. Many modern worms install a backdoor that allows the spammer to access the computer and use it for malicious purposes. This complicates attempts to control the spread of spam, as in many cases the spam does not obviously originate from the spammer. Spammers also obtain recipient addresses through email address harvesting, an industry dedicated to collecting email addresses and selling compiled databases. Consequently, taking measures against spamming in general is an exceedingly complex effort.

Our platform uses a tool called Open Source Spam filter (OSS) as its mail filtering installation. OSS is built from commonly used open source software, and because it is easy to customize for our unique needs, it is fully integrated with Machpanel. This type of installation is widely used by both enterprises and hosting companies such as ourselves.

 

Spam filtering high-level summary

The basic principle of mail filtering is to validate an email against several tests and checks, each of which gives a pre-set score; these scores are summed into a total score. We assign positive numbers for failing tests and negative numbers for passing tests. The lower the total score, the lower the chance that the email is spam. After calculating the total score, based on the OSS scoring rules, the email is either rejected (10 and above), handed over to Exchange as spammy (3-9), or handed over to Exchange as clear of spam (below 3). All actions are thoroughly logged, and all rejections (for any reason) result in a Non-Delivery Report (NDR), which is sent back to the sender. The total score can be viewed in the email header.
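The scoring logic described above can be illustrated with a minimal Python sketch. It is not the OSS implementation: the check names, the individual scores, and the function names are hypothetical, and treating a fractional total between 9 and 10 as "spammy" is an assumption, since the text only gives the ranges 10 and above, 3-9, and below 3.

```python
from dataclasses import dataclass

# Thresholds taken from the text: 10 and above is rejected,
# 3-9 is handed over as spammy, below 3 is handed over as clean.
REJECT_THRESHOLD = 10
SPAM_THRESHOLD = 3


@dataclass(frozen=True)
class CheckResult:
    name: str     # hypothetical check name, for illustration only
    score: float  # positive = test failed, negative = test passed


def total_score(results):
    """Sum the pre-set scores of all checks into one total."""
    return sum(r.score for r in results)


def classify(score):
    """Map a total score to the action described in the text.

    Assumption: totals between 9 and 10 count as spammy.
    """
    if score >= REJECT_THRESHOLD:
        return "reject"   # rejected, NDR sent back to the sender
    if score >= SPAM_THRESHOLD:
        return "spam"     # handed over to Exchange as spammy
    return "clean"        # handed over to Exchange as clear of spam


# Example with made-up checks and scores.
results = [
    CheckResult("hypothetical_blocklist_check", 4.0),   # failed
    CheckResult("hypothetical_sender_check", -1.0),     # passed
]
action = classify(total_score(results))  # total is 3.0, so "spam"
```

A passing check lowers the total and a failing check raises it, so one failed test does not necessarily cause a rejection; only the sum decides.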

The OSS scoring system correlates with the one used on Exchange, which lets us provide mail filtering control on a per-user basis. These settings are called Spam Confidence Level (SCL) settings. The score given by OSS is evaluated again on Exchange against the user’s SCL settings, so the email may be rejected, delivered to the Junk email folder, or delivered to the recipient’s Inbox.
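The per-user evaluation can be sketched in the same way. This is only an illustration under stated assumptions: the class name, the two threshold fields, and the "greater than or equal" comparisons are hypothetical and do not reflect the actual Exchange configuration or how the OSS score maps onto an SCL value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserSclSettings:
    # Hypothetical per-user thresholds, for illustration only.
    junk_threshold: float    # at or above this: Junk email folder
    reject_threshold: float  # at or above this: rejected


def deliver(score, settings):
    """Decide the outcome for one recipient from the OSS score.

    Assumption: a score at or above a threshold triggers that action.
    """
    if score >= settings.reject_threshold:
        return "reject"
    if score >= settings.junk_threshold:
        return "junk"
    return "inbox"


# Two users with different (made-up) settings receive the same email.
strict_user = UserSclSettings(junk_threshold=3, reject_threshold=6)
lenient_user = UserSclSettings(junk_threshold=5, reject_threshold=9)

outcome_strict = deliver(5, strict_user)    # "junk"
outcome_lenient = deliver(5, lenient_user)  # "junk"
outcome_low = deliver(4, lenient_user)      # "inbox"
```

The point of the sketch is that the same score can lead to different outcomes for different recipients, because the final decision depends on each user's own settings.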

In essence, spam filtering is based on the scores given by OSS, which are evaluated first on OSS and then on Exchange against the user’s SCL settings. Figure 1 below shows how the filtering process works.

Figure 1: The filtering process (image-1610644971073.png)