Every day, 28 billion spam emails are sent. That is a large share of all email traffic, and hardly any email account is spared from unwanted advertising. So far, the only way to protect a mailbox is a well-functioning spam filter. SpamAssassin is one such filter, but its carefully hand-picked set of detection methods and its high flexibility make it a very special piece of software.
A brief overview
Although a whole team now works on the software, the foundation for SpamAssassin was laid by Justin Mason, who had previously maintained patches for a similar program called filter.plx. In 2001 the code was rewritten and released as a new project named "SpamAssassin". Since 2004 it has belonged to the Apache Software Foundation and continues to be developed as open source. It is published under the Apache License 2.0, which makes it free software: private and commercial use, for example as a module of other paid software, is permitted at no cost.
A big advantage of the software is its flexibility in how it can be deployed: SpamAssassin can be used as a standalone application, as a subroutine (module) of another program, as a client program of a mailer daemon, or called from an MDA (Mail Delivery Agent) such as Procmail. Important: even if several instances of the software sit in the delivery chain, a flag set on the message ensures that an email is not analyzed twice, which avoids redundant processing and reduces resource usage.
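The "analyze only once" idea can be illustrated with a minimal sketch. It assumes the flag is a message header; the header name `X-Example-Spam-Checked` and the function names are hypothetical and chosen only for illustration, not taken from SpamAssassin itself.

```python
from email import message_from_string
from email.message import Message

# Hypothetical header name used as the "already analyzed" flag.
CHECKED_HEADER = "X-Example-Spam-Checked"


def needs_scan(msg: Message) -> bool:
    """Return True if no earlier instance has flagged this message."""
    return CHECKED_HEADER not in msg


def scan_once(msg: Message, scanner_id: str) -> bool:
    """Analyze the message unless it is already flagged.

    Returns True if this call did the analysis, False if it was skipped.
    """
    if not needs_scan(msg):
        return False
    # A real filter would run its tests here; the sketch only sets the flag.
    msg[CHECKED_HEADER] = scanner_id
    return True


raw = "From: a@example.org\nSubject: hi\n\nbody\n"
message = message_from_string(raw)
first_pass = scan_once(message, "instance-1")   # does the work
second_pass = scan_once(message, "instance-2")  # skipped, flag already set
```

The second instance in the chain only has to look at one header to decide that no further work is needed, which is where the resource saving comes from.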
This is how the sorting process works
SpamAssassin works on a point system. Each incoming email is analyzed against a set of criteria and assigned points that indicate how likely it is to be spam. A user-adjustable threshold defines the score at which an email is treated as spam and sorted out. To arrive at the most accurate score possible, the program has a whole range of methods at its disposal, such as:
- DNS-based white- and blacklisting
- Checksum filters from open-source and commercial providers such as DCC or Vipul's Razor
- Expression filters based on keywords or key phrases
- The Hashcash system (proof of work)
- DKIM and SPF
- URI blocklists such as those published by uribl.com
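The point system described above can be sketched in a few lines. This is only an illustration: the rule names, patterns and scores are invented, and real SpamAssassin rules are far more numerous and carefully weighted. The threshold of 5.0 is an adjustable parameter here, chosen as an assumption for the example.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rule:
    name: str                       # label reported when the rule fires
    score: float                    # points added (negative = speaks against spam)
    matches: Callable[[str], bool]  # test applied to the raw message text


# Hypothetical rules; names, patterns and scores are made up for this sketch.
RULES: List[Rule] = [
    Rule("KEYPHRASE_FREE_MONEY", 2.5,
         lambda t: re.search(r"free money", t, re.I) is not None),
    Rule("KEYPHRASE_CLICK_NOW", 2.0,
         lambda t: re.search(r"click now", t, re.I) is not None),
    Rule("MANY_EXCLAMATIONS", 1.0, lambda t: "!!!" in t),
    Rule("SENDER_WHITELISTED", -3.0,
         lambda t: "From: alice@example.org" in t),
]


def score_message(text: str, rules: List[Rule] = RULES) -> Tuple[float, List[str]]:
    """Sum the scores of all rules that fire and list their names."""
    hits = [rule for rule in rules if rule.matches(text)]
    return sum(rule.score for rule in hits), [rule.name for rule in hits]


def is_spam(text: str, threshold: float = 5.0) -> bool:
    """Treat the message as spam once its total reaches the threshold."""
    total, _ = score_message(text)
    return total >= threshold


spam_text = "From: promo@example.com\nSubject: FREE MONEY!!!\n\nClick now to claim."
ham_text = "From: alice@example.org\nSubject: Lunch?\n\nFree money talk at noon."
```

In this sketch `spam_text` triggers three rules and ends up above the threshold, while `ham_text` triggers one keyword rule but is pulled back below zero by the whitelist entry. Lowering the threshold makes the filter stricter; raising it makes it more lenient.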
Bayesian filter
In addition to the more or less static filter systems, SpamAssassin includes a Bayesian filter that can "learn" characteristic features, i.e. recurring, distinctive elements, from spam emails it has already seen, and thus adapt to the user's behaviour. The system is based on the Bayesian concept of probability, but has often been criticized because an untrained filter tends to produce false positives.
To prevent this, it is recommended to first check the emails classified as spam more closely. The Bayesian filter comes with a command-line tool, sa-learn, which can be used to train it on individual emails or entire mailboxes. Once this has been done a few times, the number of false positives drops significantly.
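How a filter of this kind learns from examples can be shown with a small naive-Bayes-style sketch. It is not SpamAssassin's actual implementation, which combines token probabilities differently; the class name, the tokenizer and the add-one smoothing are assumptions made for the illustration.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy token-based classifier; labels are 'spam' and 'ham'."""

    def __init__(self) -> None:
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text: str) -> set:
        # Lowercased words; each token counts once per message.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def learn(self, text: str, label: str) -> None:
        """Record which tokens appeared in a message of the given label."""
        self.tokens[label].update(self.tokenize(text))
        self.messages[label] += 1

    def spam_probability(self, text: str) -> float:
        """Combine per-token evidence into one probability (log-odds form)."""
        n_spam, n_ham = self.messages["spam"], self.messages["ham"]
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for token in self.tokenize(text):
            # Add-one smoothing keeps unseen tokens from giving zero probability.
            p_spam = (self.tokens["spam"][token] + 1) / (n_spam + 2)
            p_ham = (self.tokens["ham"][token] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


untrained = TinyBayes()

trained = TinyBayes()
trained.learn("win free money now", "spam")
trained.learn("free prize click now", "spam")
trained.learn("meeting notes attached", "ham")
trained.learn("lunch at noon tomorrow", "ham")
```

Before any training the sketch has no evidence either way and returns 0.5 for every message, which is why judgments made at that stage are unreliable. On a real installation the corresponding training step is done with sa-learn, using its `--spam` and `--ham` options on messages that were sorted by hand.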
Current development
In the course of the latest updates to versions 3.3 and 3.4, SpamAssassin gained a major new feature that is meant to have a significant effect on resource usage. Although performance optimizations had already been made earlier, the program now behaves like a deterministic finite automaton (DFA). In essence, hardware can cope better with DFAs because every input leads to exactly one predetermined next state. A suitably programmed system, like today's server hardware, can therefore reserve and release resources in a timely manner.
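What "deterministic" means here can be made concrete with a small sketch of a DFA that searches a text for a fixed keyword. It is only an illustration of the concept and says nothing about how SpamAssassin builds its automata internally; the function names and the example keyword are assumptions.

```python
from typing import Dict, Iterable, List


def build_dfa(pattern: str, alphabet: Iterable[str]) -> List[Dict[str, int]]:
    """Build a transition table for finding `pattern` inside a text.

    State k means "the last k characters read equal the first k characters
    of the pattern"; state len(pattern) is the accepting state.
    """
    table: List[Dict[str, int]] = []
    for state in range(len(pattern) + 1):
        row: Dict[str, int] = {}
        for ch in alphabet:
            seen = pattern[:state] + ch
            # Next state: longest pattern prefix that is a suffix of `seen`.
            k = min(len(pattern), len(seen))
            while k > 0 and not seen.endswith(pattern[:k]):
                k -= 1
            row[ch] = k
        table.append(row)
    return table


def contains(table: List[Dict[str, int]], text: str) -> bool:
    """Run the automaton over `text`: one table lookup per character."""
    accept = len(table) - 1
    state = 0
    for ch in text:
        # Characters outside the pattern's alphabet always reset to state 0.
        state = table[state].get(ch, 0)
        if state == accept:
            return True
    return False


keyword_dfa = build_dfa("viagra", set("viagra"))
overlap_dfa = build_dfa("aab", set("aab"))
```

Each character of the input triggers exactly one lookup and there is never any backtracking, so the cost of scanning a message grows linearly with its length and is known in advance. That predictability is the property the paragraph above refers to.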
Support in other programs
As already mentioned, the flexibility of the program is a unique selling point, but SpamAssassin also has a mature API that allows data exchange with many other programs. Among open-source projects, these include the widely used email clients Mozilla Thunderbird and KMail, as well as Novell Evolution and less common ones such as Citadel and Claws Mail. In some commercial products, SpamAssassin is integrated directly into the program code. This is the case with the IceWarp server, McAfee SpamKiller, Mac OS X Server and Sophos PureMessage.