Presented at BSDCon on October 19, 2000
(Slides at http://www.brettglass.com/spam/index.html)
Abstract: A properly configured BSD mail server can protect users (including those running other operating systems on client machines) from spam and Trojan horses while rejecting virtually no legitimate content. This tutorial describes how to configure BSD systems to use DNS blacklists, procmail, mail "sanitizing" scripts, daemons that watch logs for evidence of spamming and "mail bombing," and similar utilities. Prevention of unauthorized relaying and detection and blocking of outbound spam are also discussed. Countermeasures against address harvesting and privacy invasion techniques such as "Rumpelstiltskin" attacks, fingerd scans, tracking via identd, e-mail cookies, and malicious image tags in HTML mail are covered in detail. Links to source materials and relevant software tools are provided.
[Note: This paper is formatted for viewing via a Web browser. References to additional information and helpful source material are presented within this paper in the form of HTML links, rather than as footnotes, for easy online access. If you receive this paper in printed form and wish to follow up the references, access the master copy of this document online at http://www.brettglass.com/spam/paper.html and follow the links, which the author will endeavor to keep current.]
sendmail was originally distributed under the Berkeley (or "BSD") license, but is no longer. It now comes in two versions: a free version with source (http://www.sendmail.org/), released under a license called the Sendmail License, and a commercial version with Web-based configuration utilities (http://www.sendmail.com/).
The status of Unix-like operating systems as the most popular platforms for mail servers follows directly from the choice of sendmail as the MTA. As sendmail author Eric Allman notes in the Sendmail FAQ:
Generally speaking, I adhere to the old axiom that you should choose what software you want to run first, then choose the platform (hardware and OS) that best runs this software. By this token, if sendmail is the software, then a recent version of BSD Unix would probably be best, since sendmail was developed at UC Berkeley on BSD Unix.
The BSDs are thus the recommended platforms for sendmail MTAs. Solaris (originally BSD-derived, though it now incorporates Unix System V) is also common, especially when e-mail services for a large organization are concentrated in a single multiprocessor server. Linux, which does not share BSD's pedigree but works similarly, is often used as well. Sendmail, Inc. offers a commercial port of sendmail for Microsoft Windows NT/2000 called Sendmail for NT Mail Routing. However, due to the greater demands of Microsoft's operating system platforms, a Windows NT/2000 mail server requires substantially more memory and computing power than one which runs a Unix-like OS. (Sendmail, Inc. recommends a minimum of 256 MB of RAM and a 300+ MHz CPU for Windows 2000 servers that run its port of sendmail.)
Because sendmail and a BSD-derived operating system are the most robust, economical, and popular choice for industrial strength mail servers, this presentation will assume the use of this software configuration. As mentioned above, many of the techniques described in this paper may be useful with other configurations as well.
You may have to hunt a bit to find the directory where the original .mc file for your operating system distribution is hidden. (Looking at the top of the default sendmail.cf is often helpful, since it will tell you the location of the .mc file on the machine where sendmail.cf was created.) You may even find that the default .mc file is not present at all! For example, releases of FreeBSD prior to 4.1.1 did not install the default .mc file (freebsd.mc), nor the components required to rebuild sendmail.cf from it, unless one installed the full source distribution.
If you can find the original .mc file on your system, a Makefile that will generate a .cf file from it is usually present in the same directory. Make a copy of the default file (e.g. cp default.mc localhost.mc) and edit the copy. Typically, you can use the command make localhost.cf to build a new .cf file. If all goes well, you can often install the new .cf file with the command make localhost.install. Finally, issue the command kill -HUP `head -1 /var/run/sendmail.pid` to restart the main sendmail process.
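The restart step above relies on the fact that the first line of /var/run/sendmail.pid holds the process ID of the main sendmail daemon. The Python sketch below performs the same step as the kill -HUP command. The function names read_daemon_pid and restart_sendmail are hypothetical, chosen for illustration, and the script would have to run with sufficient privileges to signal the daemon.

```python
import os
import signal


def read_daemon_pid(pid_file_text: str) -> int:
    """Return the PID found on the first line of a sendmail.pid-style file."""
    first_line = pid_file_text.splitlines()[0]
    return int(first_line.strip())


def restart_sendmail(pid_path: str = "/var/run/sendmail.pid") -> None:
    """Send SIGHUP to the process whose PID is on the first line of pid_path."""
    with open(pid_path) as handle:
        pid = read_daemon_pid(handle.read())
    # Equivalent in effect to: kill -HUP `head -1 /var/run/sendmail.pid`
    os.kill(pid, signal.SIGHUP)
```

Only the first line is parsed because the file may contain further lines (hence the head -1 in the shell command).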
More detailed instructions describing how to build and install sendmail.cf are available at the sendmail.org Web site or in the sendmail book by Bryan Costales and Eric Allman.
The divert(-1) macro command you'll find near the top of many .mc files turns off all output from the macro processor until the following divert(0) command. This pair of commands is sometimes used to comment out a large block of text such as a copyright notice. However, it is really not wise to do this. Even though output is disabled, there could be untoward side effects if any of the text in between is misinterpreted as a macro command. The author therefore uses the "dnl" macro command (see below) for all comments in a .mc file.
The dnl you'll see at the end of nearly every line of a .mc file stands for "delete to newline," and is roughly equivalent to a double forward slash (//) in a C++ program. Used after a command, it prevents the m4 macro processor from being affected by comments at the end of the line or from copying extraneous trailing whitespace into sendmail.cf. Placing "dnl " (including the space) at the beginning of a line turns the entire line into a comment. A common mistake among system administrators is to place "#" at the beginnings of lines in a .mc file, believing that m4 will treat them as comments. But because sendmail's macro files turn off m4's own comment handling, m4 sees these lines as ordinary input! At best, m4 will just graft the lines verbatim into sendmail.cf, where they will be treated as comments. At worst, parts of these lines will be interpreted as macro commands, with undesirable results. So, it is important to set off lines that are intended to go no farther than the .mc file with "dnl ".
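A quick way to catch the "#" mistake described above is to scan a .mc file for lines that begin with "#". The Python sketch below does only that; the function name find_hash_comment_lines is hypothetical, and the check is deliberately naive.

```python
from typing import List


def find_hash_comment_lines(mc_text: str) -> List[int]:
    """Return 1-based numbers of lines whose first non-blank character is '#'."""
    flagged = []
    for number, line in enumerate(mc_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            flagged.append(number)
    return flagged


sample = (
    "dnl This line is a genuine m4 comment\n"
    "# This line is NOT a comment as far as m4 is concerned\n"
    "FEATURE(`access_db')dnl\n"
)
print(find_hash_comment_lines(sample))
```

Lines that are meant to reach sendmail.cf as comments (for example, inside a LOCAL_CONFIG or LOCAL_RULESETS section) will also be flagged, so the output should be read as a list of lines to review rather than a list of errors.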
By default, a host running sendmail 8.9 or later will not relay mail
from a second host to a third host; exceptions must be specified explicitly when
relaying is desired. For example, to allow relaying to or from a group of hosts
that share the same domain name suffix, one can add that suffix to the file
/etc/mail/relay-domains. The FEATURE commands in Table 1, which can be placed in
a .mc file, change the criteria which sendmail uses to determine
whether a message should be relayed.
Feature | Effect |
FEATURE(`relay_entire_domain')dnl | Enables relaying to or from any host in the same domain as the server. Usually safe so long as sendmail is configured properly and network and DNS are secure. Can create an open relay if server is misconfigured. |
FEATURE(`relay_based_on_MX')dnl | Enables relaying to (but not from) any host for which this server is a mail exchanger. Convenient, but somewhat risky in that anyone can make you an MX for a host in a domain he or she owns. Sometimes fails when source routing (e.g. "%", "@host:", or "!") is used. Use of /etc/mail/relay-domains or access_db is preferable. |
FEATURE(`access_db')dnl | Enables fine-grained control of relaying (as well as blocking by user and domain) via an access control database. (By default, the database is /etc/mail/access.db; it's built from the text file /etc/mail/access by a Makefile in /etc/mail.) This is the most powerful and flexible option. |
FEATURE(`relay_hosts_only')dnl | Normally, domains listed in /etc/mail/relay-domains (and also in RELAY entries in an access control database) are treated as domain name suffixes; that is, bar.com also matches foo.bar.com, etc. Enabling this feature "tightens" the interpretation of these entries so that domain names are always treated as individual host names only. Use of this feature may make maintenance of the server more labor-intensive, since new hosts must be listed individually before they will be allowed relaying privileges. This feature is a somewhat awkward attempt to compensate for the fact that neither access control mechanism uses wildcards (e.g. *.domain.com) to distinguish between entries that indicate a single host and those that indicate a domain suffix. |
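The difference between suffix matching and the stricter relay_hosts_only interpretation can be illustrated with a short Python sketch. This is a simplified model of the behavior described in Table 1, not sendmail's own logic; the function name may_relay and its parameters are hypothetical.

```python
from typing import Iterable


def may_relay(host: str, relay_domains: Iterable[str], hosts_only: bool = False) -> bool:
    """Model how an entry such as bar.com is matched against a host name.

    With hosts_only=False an entry matches itself and any host beneath it;
    with hosts_only=True an entry matches only that exact host name.
    """
    host = host.lower().rstrip(".")
    for entry in relay_domains:
        entry = entry.lower().rstrip(".")
        if host == entry:
            return True
        # Suffix match on a label boundary, so foobar.com never matches bar.com.
        if not hosts_only and host.endswith("." + entry):
            return True
    return False


print(may_relay("foo.bar.com", ["bar.com"]))                   # suffix interpretation
print(may_relay("foo.bar.com", ["bar.com"], hosts_only=True))  # host-only interpretation
```

Under the host-only interpretation, foo.bar.com would have to be listed explicitly, which is the extra maintenance burden the table mentions.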
For more information on commands that affect relaying and what can go wrong with them, see the sendmail.org Web site.
Note that relaying is not the same as forwarding, which is done at the request of the recipient or system administrator rather than at the sender's request. Forwarding addresses may be specified in a user's .forward file, or in the global /etc/mail/aliases or /etc/mail/virtusertable file, regardless of the system's relaying policy.
The sample /etc/mail/access file shown in Listing 1 demonstrates many access
control database features, including the To:, From:, and Connect: tags. Note
that more specific rules win out over more general ones. For example, a line
accepting mail from innocent.bystander.spamhaven.net will take priority over one
rejecting mail from spamhaven.net. Also, three separate checks, triggered by
different rules in sendmail.cf, are normally done on each message.
"Connect:" tags are checked after the HELO command, "From:" tags after the MAIL
FROM: command, and "To:" tags after each RCPT TO: command. Untagged records are
checked at all three stages. The delay_checks feature macro causes the other
checks to be delayed until after recipients are checked. (The "Connect:" check
is skipped altogether if the connecting host has authenticated itself to the
server by this time.) If a message is rejected at any stage, it won't proceed to
the next.
# Access control database. This database overrides the policies set by
# FEATURE(`relay_entire_domain'), FEATURE('relay_based_on_MX'), and
# blacklists. Records match connecting host, envelope MAIL FROM:,
# and/or envelope RCPT TO: (if FEATURE(`blacklist_recipients') is on).
#
# Domain names in this file refer to hosts if FEATURE('relay_hosts_only')
# is activated and entire domains or subdomains otherwise.
#
# Block most traffic to/from spamhaven.net but exempt innocent customer.
# The "OK" overrides any DNS blacklists that might be in use.
spamhaven.net                        REJECT
innocent.bystander.spamhaven.net     OK
#
# Customized rejection messages for specific situations
cyberpromo.com                       550 Nice try, Spamford
#
# Block messages from this user name in any domain. With
# FEATURE(`blacklist_recipients') enabled, block mail to it as well.
FREE.STEALTH.MAILER@                 550 Stealth mailer detected by radar
#
# Relay mail from internal workstations using reserved IPs. (Do not
# relay to them, however.) "Connect:" tag, not "From:" tag, is used
# to control relaying of mail coming from a host or domain. To: tag
# controls relaying to a host or domain.
Connect:192.168.0                    RELAY
#
# Silently discard mail from an annoying user. "From:" tag
# filters on "envelope From:" (RFC 821 MAIL FROM:). This tag
# should be used only to reject or discard mail, since the
# RFC 821 MAIL FROM: can be spoofed.
From:kvetch@aol.com                  DISCARD
#
# Discard mail to local user joe who is no longer with the company. If
# there are other local recipients, they still get the mail! "To:" tag
# requires FEATURE(`blacklist_recipients') to work.
To:joe@ourdomain.com                 DISCARD
The syntax and semantics of records in the access control database, and the subtly different criteria used for each of the three checks, create many pitfalls for the unwary. For example, RELAY -- counterintuitively -- is a superset of OK and is more permissive. RELAY allows mail to be relayed or received while OK only allows it to be received.
The behavior of DISCARD is likewise subtle. DISCARD causes the server to feign acceptance of a message and then not deliver it to some or all of the intended recipients. When a DISCARD record matches the sender (MAIL FROM:), the message will not reach any recipient. However, when it does not match the sender but does match a recipient (this will only occur if FEATURE(`blacklist_recipients') is present in the .mc file), it prevents only the specified recipient from getting the message. Others will receive it unless they too are "blacklisted."
One cannot use wildcards in domain names, so every domain name in the file must be interpreted either as a suffix or as a complete host name. (The interpretation is controlled by the presence or absence of the relay_hosts_only feature.) Adding or removing the relay_hosts_only feature without a careful review of the database can cause unexpected and very undesirable results. Finally, despite the use of hashing, pattern matching is not as efficient as it might be. Multiple lookups must be done, so searches of a large database can be time-consuming.
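The rule that more specific records win, and the reason several lookups are needed per check, can be seen in the following Python sketch. It is a simplified model that assumes the suffix interpretation (relay_hosts_only absent) and handles only untagged host name keys; tags, IP address prefixes, and user@ keys are ignored. The function names are hypothetical, and the exact sequence of keys sendmail tries is an assumption.

```python
from typing import Dict, List, Optional


def candidate_keys(host: str) -> List[str]:
    """List lookup keys from most specific (full host name) to least specific."""
    labels = host.lower().rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


def lookup(host: str, access: Dict[str, str]) -> Optional[str]:
    """Return the action of the first (most specific) matching record, if any."""
    for key in candidate_keys(host):
        if key in access:
            return access[key]
    return None


access = {
    "spamhaven.net": "REJECT",
    "innocent.bystander.spamhaven.net": "OK",
}
print(lookup("innocent.bystander.spamhaven.net", access))
print(lookup("mail.spamhaven.net", access))
```

Each hashed lookup is fast, but a host name with many labels requires one probe per label, and every stage of checking repeats the process.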
For more information about the use of access control databases, see the readme file in the /cf directory of the most recent sendmail distribution.
Feature or Option | Effect |
FEATURE(smrsh)dnl | Limits programs into which sendmail pipes mail (for example, as a result of an entry in a .forward file or /etc/mail/aliases) to those in a specific directory, usually /usr/libexec/sm.bin. The administrator usually creates symbolic links in this directory to programs such as vacation and majordomo. smrsh, the "sendmail restricted shell," also rejects commands with certain metacharacters and strips the directory path from the front of the program name. This feature is not useful if procmail is installed and users can supply their own .procmailrc files, since other applications can then be invoked through procmail. |
define(`confPRIVACY_FLAGS',`goaway')dnl | Use of "goaway" privacy setting turns off SMTP commands, such as EXPN or VRFY, that may reveal local user names to spammers. |
define(`confMAX_HEADERS_LENGTH',16384)dnl | Limits the combined size of all RFC 822 headers. Some spamming programs try to prevent MTAs from reporting accurate transaction information by adding very long headers that "push" legitimate information out of the buffer used to hold headers. (While this may not cause a destructive overflow, it does hide information.) Limiting header sizes also makes it more difficult to exploit recently discovered buffer overflows in UW IMAP, Outlook and Outlook Express. |
define(`confCONNECTION_RATE_THROTTLE',3)dnl | Limits the number of new connections per second. This caps the overhead incurred due to forking new sendmail processes. May be useful against DoS attacks or barrages of spam. (As mentioned below, a per-IP address limit would be useful but is not available as an option at this writing.) |
define(`confSMTP_LOGIN_MSG', `$j server ready at $b')dnl | Changes the sendmail welcome message. MTA name and version number can be removed or forged, making it more difficult for would-be intruders to probe for vulnerabilities. (This is security by obscurity, but is nonetheless very effective against scripted probes.) "ESMTP" will be inserted in the first whitespace in the welcome message to signal ESMTP capability. |
define(`confMAX_RCPTS_PER_MESSAGE',25)dnl | Limits the number of recipients per message. Some spamming software does not know how to respond to the SMTP 452 error code (which asks to defer the remaining recipients to another session) and gives up when the limit is reached. |
#vers	2
smtp	I'm sorry, Dave, I'm afraid I can't do that.
Note that the first area of whitespace on each line above must be a tab character, not spaces. The message may be altered to suit the administrator's taste.
This capability allows sendmail to reject messages that claim, for
example, to be from all-numeric America Online screen names. (AOL does not allow
such names.) sendmail is not the optimal tool for this sort of
filtering, since one must be a highly competent sendmail.cf hacker to draft the
rules properly. Nonetheless, this facility can be used to reject messages
bearing certain obvious signs of abuse. Listing 2 contains several commonly used
snippets that do useful things with RFC 822 headers.
LOCAL_CONFIG
#
# Regular expression to reject:
# * numeric-only localparts from aol.com and msn.com
# * localparts starting with a digit from juno.com
#
Kcheckaddress regex -a@MATCH ^([0-9]+<@(aol|msn)\.com|[0-9][^<]*<@juno\.com)\.?>
#
# Names that won't be allowed in a To: line (local-part and domains)
#
C{RejectToLocalparts} friend you
C{RejectToDomains} public.com
LOCAL_RULESETS
HTo: $>CheckTo
SCheckTo
R$={RejectToLocalparts}@$*	$#error $: "553 Header error"
R$*@$={RejectToDomains}	$#error $: "553 Header error"
HMessage-Id: $>CheckMessageId
# make sure message ID has two parts separated by an @
SCheckMessageId
R< $+ @ $+ >	$@ OK
R$*	$#error $: "553 Header error"
LOCAL_RULESETS
SLocal_check_mail
# check address against various regex checks
R$*	$: $>Parse0 $>3 $1
R$+	$: $(checkaddress $1 $)
R@MATCH	$#error $: "553 Header error"
LOCAL_RULESETS
HSubject: $>Check_Subject
# crude check for Melissa virus
D{MPat}Important Message From
D{MMsg}This message may contain the Melissa virus.
SCheck_Subject
R${MPat} $*	$#error $: 553 ${MMsg}
RRe: ${MPat} $*	$#error $: 553 ${MMsg}
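The regular expression in the checkaddress map can be tried out before it is committed to a .mc file. The Python sketch below applies the same pattern to addresses written in the focused form (user<@host.>) which the rules above pass to the map. Python's re module is not the regex library sendmail uses, so this is only an approximation; the pattern happens to use constructs common to both. The function name is_rejected and the sample addresses are hypothetical.

```python
import re

# Same pattern as the Kcheckaddress line in Listing 2.
CHECK_ADDRESS = re.compile(r"^([0-9]+<@(aol|msn)\.com|[0-9][^<]*<@juno\.com)\.?>")


def is_rejected(focused_address: str) -> bool:
    """Return True if the address would trigger the @MATCH result."""
    return CHECK_ADDRESS.search(focused_address) is not None


for address in ("12345<@aol.com.>", "bob123<@aol.com.>", "1bob<@juno.com.>"):
    print(address, is_rejected(address))
```

Testing candidate patterns this way is far less painful than discovering a mistake after legitimate mail has bounced.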
sendmail communicates with the filter process via sockets, allowing the filter to run on a different host if desired. The filter process is persistent and can therefore watch for patterns in message traffic.
sendmail can be instructed to pass messages through filters via commands such as
INPUT_MAIL_FILTER(`archive', `S=local:/var/run/archivesock, F=R')dnl
INPUT_MAIL_FILTER(`spamcheck', `S=inet:2525@localhost, F=T')dnl
in the .mc file. For documentation of the INPUT_MAIL_FILTER macro, see /libmilter/README in the most recent sendmail distribution.
Examples in the "libmilter" directory of the sendmail distribution show how to create sendmail filters in C. However, the easiest and fastest way to get started is to use the Sendmail::Milter Perl module. This module can be found on SourceForge.
At this writing, the "Milter" interface is labeled "for future release" and may change before it is finalized and officially supported.
POP before SMTP creates a minor security risk in that someone who is "sniffing" network traffic may be able to use the temporarily authorized IP address to send spam. (This could occur, for example, on a network which a hotel provides to guests, or at an Internet café where users plug their laptops into a common hub.) However, the chance of exploitation is slight, and the amount of damage that can be done is limited -- especially if the mail server is monitored for outgoing spam and mail bombing. (See Detecting Outgoing Spam and Mail Bombing below.)
If all POP clients and servers supported XTND XMIT, it would be possible to reserve SMTP for communication between mail servers and prohibit SMTP traffic from dial-up Internet connections and/or from hosts without fixed IP addresses. This would make it much easier to control spam.
Berkeley's popper and Qualcomm's qpopper both support XTND XMIT. Unfortunately, not all mail user agents support it, and in some the feature is present but well hidden. In Eudora, for example, use of XTND XMIT cannot be enabled via the graphical user interface. One must edit the EUDORA.INI file manually, adding the line
UsePOPSend=1
to the [Settings] section. (If one is using more than one "persona," the line shown above must be added to the section describing each "persona" for which XTND XMIT will be used.)
Perhaps the most important advantage of XTND XMIT is that it works even if a user's ISP has blocked outgoing SMTP connections (a restriction which will cause POP before SMTP to fail). Also, if one has a limited number of roving clients, it may require much less effort to reconfigure them for XTND XMIT (or write a simple script to do so) than to reconfigure the server for POP before SMTP.
XTND XMIT does have one minor disadvantage. Because the POP server does not invoke an MTA until it has received an outgoing message in its entirety, it does not validate recipient addresses as they are submitted. Some POP servers will notify the sender of invalid addresses after the message has finished uploading, while others silently drop invalid addresses and send the message to the rest.
A mail server checks an IP address against a DNS blacklist by attempting to resolve a specially constructed dummy host name. By convention, the name consists of the IP address in dotted decimal format, reversed and prepended to a domain name suffix. For example, when querying ORBS to see whether the host at 1.2.3.4 is in the database, a mail server would attempt to resolve 4.3.2.1.relays.orbs.org. If the name resolves, the address is blacklisted. In some cases, the address to which it resolves is a sentinel value that provides additional information about why the host was listed. ORBS, for example, returns 127.0.0.2 if the host was blacklisted due to a relaying test. It returns 127.0.0.3 if the host was entered into the database manually, 127.0.0.4 if the host was untestable, and 127.0.0.5 if the address is within a block of IP addresses controlled by a known spammer.
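The query convention described above is simple enough to sketch in a few lines of Python. The sentinel values in the table are the ORBS codes quoted in the text; the function names build_query_name and check_blacklist are hypothetical, and a production MTA would distinguish a DNS failure from a negative answer rather than treating both as "not listed."

```python
import socket
from typing import Optional

# Sentinel addresses returned by ORBS, as described in the text.
ORBS_MEANINGS = {
    "127.0.0.2": "failed a relaying test",
    "127.0.0.3": "entered into the database manually",
    "127.0.0.4": "untestable",
    "127.0.0.5": "within an address block controlled by a known spammer",
}


def build_query_name(ip_address: str, suffix: str) -> str:
    """Reverse the dotted-decimal octets and prepend them to the list's suffix."""
    octets = ip_address.split(".")
    if len(octets) != 4:
        raise ValueError("expected an IPv4 address in dotted decimal format")
    return ".".join(reversed(octets)) + "." + suffix


def check_blacklist(ip_address: str, suffix: str) -> Optional[str]:
    """Return the sentinel address if the host is listed, or None if it is not."""
    name = build_query_name(ip_address, suffix)
    try:
        return socket.gethostbyname(name)
    except socket.gaierror:
        # The name did not resolve, so the address is not on the list.
        return None


print(build_query_name("1.2.3.4", "relays.orbs.org"))
```

A caller would pass the result of check_blacklist to ORBS_MEANINGS.get() to obtain a human-readable reason for the listing.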
At this writing, anyone on the Internet can configure his or her server to
query most of these databases at no charge. (MAPS' RBL+, a premium service,
requires an access fee; see below.) Each blacklist is claimed by its maintainer
to contain only hosts which meet specific criteria, as shown in Table 3.
Database | Contents |
MAPS Real Time Black Hole List (RBL) | Hosts and networks which MAPS believes to be "friendly, or at least neutral, to spammers who use these networks either to originate or relay spam." Many mistakenly believe that the "real time" in this list's name refers to the speed with which hosts are added to the list. However, the name is intended to indicate that a mail server can block a message from the offending site in real time by querying the list. Getting a host or network added to the MAPS RBL is usually quite difficult. Entries are added only after substantial evidence has been presented that spam has been sent and complaints ignored. Queries of this database use the domain suffix rbl.maps.vix.com. |
MAPS Dial-Up List (DUL) | IP addresses used by ISPs' dial-up modems, as well as some pools of addresses which are dynamically assigned to DSL and cable modems. Since it is normally desirable to send mail via a server with a fixed IP address, mail sent by direct SMTP from a non-dedicated address directly to its destination is often spam. Use of the DUL to filter messages is highly effective in that it blocks much of the spam sent from "throwaway" or trial accounts with dial-up ISPs. Queries use the domain suffix dialups.mail-abuse.org. |
MAPS Relay Spam Stopper (RSS) | Mail servers whose bandwidth and computing power have been used to duplicate and relay spam to its final destination. In the overwhelming majority of cases, this has been done without the owner's knowledge or consent. These "open relays" are usually mail servers which run old or improperly configured mail transfer agents. Unlike ORBS (see below), the MAPS RSS only lists hosts which have been reported to have sent spam and have been confirmed to be open relays (that is, servers which will relay mail from an untrusted, outside source). It does not list "multihop relays" -- relays which consist of two or more servers working together. Queries use the domain suffix relays.mail-abuse.org. |
MAPS RBL+ Master Service (RBL+) | IP addresses contained in any of the MAPS databases (RBL, DUL, or RSS). The RBL+ database combines results from the three others so that a mail server can "vet" a message with a single query. A flat fee, used to support the maintenance of all of the databases, is charged for the ability to query this blacklist or mirror it via DNS zone transfers. |
Open Relay Behaviour-modification System (ORBS) | Mail servers which test positive for relaying, are manually entered into the ORBS database, block access when ORBS probes them, or are within the address blocks of known spammers. All servers which participate in a multihop relay are listed as open relays. The full ORBS database is queried via the domain suffix relays.orbs.org. Queries that use the suffix inputs.orbs.org instead will return a positive result only if a server has tested positive as an open relay, but not if it was blacklisted for other reasons. Some administrators see this so-called "ORBS Lite" as a good compromise between the cautious policy of the MAPS RSS and the very aggressive policy of the full ORBS database. |
In general, it is best to consult blacklists as early as possible when processing mail -- that is, at the MTA (or an MTA wrapper such as smtpd) rather than in a spam filter that is applied afterward. The next section explains how to configure sendmail to use one or more DNS blacklists.
FEATURE(dnsbl,`rbl.maps.vix.com',`Rejected - see http://www.mail-abuse.org/rbl/')dnl
FEATURE(dnsbl,`dialups.mail-abuse.org',`Dialup - see http://www.mail-abuse.org/dul/')dnl
FEATURE(dnsbl,`relays.mail-abuse.org',`Open relay - see http://www.mail-abuse.org/rss/')dnl
FEATURE(dnsbl,`inputs.orbs.org',`Open relay - see http://www.orbs.org/')dnl
For instructions on how to configure other mail transfer agents, including older versions of sendmail, for DNS blacklists, see http://maps.vix.com/rss/how.html.
Normally, sendmail rejects connections from blacklisted servers so quickly that it doesn't even wait for any RCPT TO: commands. It is thus impossible to tell from the sendmail log files which unlucky user was the target of the spam. MAPS therefore suggests that administrators add the line
FEATURE(`delay_checks')dnl
to the .mc file. This changes the order in which the rules in sendmail.cf are applied so that the names of the intended recipients are logged before the sender's IP address and domain are checked. This slightly increases overhead but coaxes useful information out of the spammer's server; it reveals which users of a system are on commercially distributed spamming lists or have had their addresses "harvested."
The most popular filters consist of "recipes" written for procmail, a local mail delivery agent (abbreviated LDA or MDA) which sorts and filters mail based on content. (For an excellent overview of what procmail does and how, see Procmail Mini-Tutorial: Automated Mail Handling by Jim Dennis.) procmail is available as a ported application for all of the BSDs which have port collections, and compiles from source on all of them as well. procmail is licensed under the GNU GPL. (Unfortunately, a non-GPLed equivalent is not currently available.)
Most procmail spam filters do not use procmail alone, but combine it with other programs or subroutine packages. These include perl, formail (included with procmail), mktemp, mimencode (part of metamail), or MIME::Base64 (a CPAN module for perl).
The Spam Bouncer, developed on FreeBSD by Catherine Hampton, is among the most popular Procmail spam fighting kits. It uses Procmail and formail to detect, block, and optionally complain to the host ISP about spam. SpamDunk, by Walter Dnes, is similar, as is Greg Sutter's junkfilter. John Hardin's Procmail Filters Kit is noteworthy in that it sorts its many spam detecting rules into easily recognizable categories. (For example, if an administrator or user does not want to block messages that appear to be unwanted religious diatribes, he or she can disable the "proselytize-trap" filter.) Other Procmail-based spam filters include Bob's Spam Filter, Steve Tucker's Spamkill, and Farhad Anklesaria's Spamtrap.
Administrators who use sophisticated procmail filters to catch spam and/or malware should be warned that they may incur significant overhead. An instance of procmail (as well as instances of other programs, such as the perl "comterpreter") will be spawned for each incoming message. At least one copy of the entire message -- perhaps more! -- will likely be created in memory. It is thus wise to provide plenty of RAM and swap space, limit the number of simultaneous incoming SMTP connections, limit the maximum system load at which sendmail will fork new processes, and/or set a maximum message size in the mail transfer agent. Otherwise, an extremely large message or a barrage of spam could soak up much or all of the server's memory as it passes through the filter. (The author has watched elaborate procmail filters exhaust virtual memory on heavily burdened servers.) To limit the sizes of incoming messages in sendmail, add a line such as
define(`confMAX_MESSAGE_SIZE',2097152)dnl
to the .mc file from which sendmail.cf is built. (The example above limits messages to 2 megabytes.) The number of concurrent sendmail processes can be limited by a line such as
define(`confMAX_DAEMON_CHILDREN',12)dnl
(This line limits the number of processes sendmail can fork to accept incoming messages or process its message queues to 12.) sendmail refuses to accept connections once it has reached its quota of child processes.
To prevent sendmail from forking processes to accept or deliver mail when the system load average is very high, add commands such as
define(`confREFUSE_LA',8)dnl
and
define(`confQUEUE_LA',6)dnl
to the .mc file. All of the commands mentioned here are documented in the readme file in the /cf directory of the sendmail distribution.
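The combined effect of the two load average thresholds can be pictured with the rough Python model below. It is a deliberate simplification: it assumes that mail is queued rather than delivered immediately once the load reaches confQUEUE_LA, and that new connections are refused once it reaches confREFUSE_LA. The real daemon's decision is more involved, and the exact comparisons used here are assumptions. The function name load_policy is hypothetical.

```python
def load_policy(load_average: float, queue_la: float = 6, refuse_la: float = 8) -> str:
    """Classify the server's behavior at a given load average (simplified model)."""
    if load_average >= refuse_la:
        return "refuse"   # stop accepting new SMTP connections
    if load_average >= queue_la:
        return "queue"    # accept mail but defer delivery to a later queue run
    return "deliver"      # accept and deliver immediately


for load in (2.0, 6.5, 9.0):
    print(load, load_policy(load))
```

Setting the queueing threshold below the refusal threshold, as in the example values, lets the server shed delivery work first and turn away connections only as a last resort.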
FEATURE(`local_procmail')dnl
to the .mc file from which sendmail.cf is built. (All other feature macros beginning with "local_" should be removed to avoid conflicts.) Global filters can then be added to the global procmailrc file (often at /usr/local/etc/procmailrc on BSD systems), and individuals can choose to invoke additional filters via ~/.procmailrc files in their home directories.
On most of the systems which the author administers, malware is filtered globally and unconditionally for safety's sake. sendmail (or, optionally, smtpd) is configured as a first line of defense against spam; it blocks unauthorized relaying, validates "From" addresses, and consults one or more DNS blacklists to identify suspect hosts. However, at all but a few sites, the use of content-based spam filter kits is optional; they are activated for individual users via ~/.procmailrc files.
Because smtpd is far more "lightweight" than sendmail, it reduces the overhead of rejecting mail that fails to satisfy the rules. Also, use of smtpd and smtpfwdd to "wrap" sendmail may have security advantages. These daemons are much smaller and simpler than sendmail, and their code has been thoroughly audited. They also run in a chroot "jail" that confines them to /var/spool/smtpd. Like sendmail, smtpd can "fib" about its identity in the SMTP welcome message. A "skript kiddie" probing for an MTA with a known security hole may be misled by this ruse and not mount an attack on an otherwise vulnerable server.
Perhaps the best reason to use smtpd to wrap sendmail, however, is that the semantics of the smtpd access control rules are much more straightforward than the idiosyncratic ones of sendmail access_db patterns.
smtpd and smtpfwdd come as part of the standard OpenBSD installation but are not enabled by default. They must be installed as ported software or compiled from source on most other operating system distributions.
Needless to say, it wasn't long before spammers realized that this was a great way to harvest addresses. In 1996, the author resumed use of a shell account on The WELL, a public access Unix host and conferencing system. He didn't send mail, post to newsgroups, or browse the Web from the account, yet within a day his mailbox -- which had received mail only occasionally before -- was suddenly awash in spam. His user name had been harvested via the finger daemon.
Harvesting via fingerd was such a serious problem on this particular system that administrators provided users with a way to delete their names from the output of the finger program. But by the time the author learned of this feature it was too late; the address was already on several widely circulated spam CD-Rs. The vacation program now directs those who send mail to that account to a Web e-mail form.
Fortunately, it is relatively easy to secure fingerd, the Berkeley finger daemon, and its derivatives against harvesting. The author often uses the following entry in /etc/inetd.conf:
finger stream tcp nowait nobody /usr/libexec/fingerd fingerd -s -l -p /usr/local/bin/nonetfinger

The -s option prevents the daemon from listing all of the users on the system if no user name is specified. -l enables logging, and -p directs network finger requests to a program called nonetfinger. The source for this almost trivial program appears in Listing 3.
/* nonetfinger.c: printed in place of finger output for network requests */
#include <stdio.h>

int main(void)
{
    puts("Sorry; for security reasons, and to prevent our users");
    puts("from being targeted for unsolicited \"junk\" e-mail, this");
    puts("site does not honor network finger requests. We apologize");
    puts("for any inconvenience.");
    return 0;
}
Because the -s, -l, and -p options apply to fingerd (the finger daemon) but not the finger program, typing finger or w still works properly from the shell. One can also shut down fingerd altogether, or rewrite nonetfinger.c to provide bogus or useless output to a spammer, but in the author's experience a warning has proven to be the best policy.
Most versions of identd included with BSD-based operating systems can be configured to supply only the user's uid number (a unique identifier that isn't useful as an e-mail address) by placing the following line in /etc/inetd.conf:
ident stream tcp wait root /usr/local/sbin/identd identd -w -o -t120 -F%U

This protects against spamming, but makes it possible for the operator of a server (or a cross-site advertising service) to tell when the same user returns.
Alternatively, many versions of identd can send an encrypted 32-character string which can be decoded to reveal a uid number, IP addresses, port addresses, and a timestamp. The encrypted token allows the administrator of a remote system which has been subjected to abuse to forward the string to the local administrator so that the perpetrator can be identified. The encryption is weak (single DES) and is subject to known plaintext attacks, but it is sufficient to discourage tracking of users. Note that this mechanism offers protection against address harvesting even if the encryption is cracked, because only a uid number (not a user name) is present in the encrypted data. To use this facility, replace -F%U in the line above with -C, and place a passphrase in the first line of the file /etc/identd.key.
OpenBSD's version of identd dispenses with the weak encryption and offers a better solution. It can send a completely opaque token and record the token and the user's identity in the system log. Abusers can be identified so long as logs are retained.
There is also a GPLed identd replacement program called ident2 which can generate random replies or give each user control over what it sends. Since user-defined responses allow a user to hang the blame for mischief on someone else, and random responses prevent the administrator from identifying those who engage in network abuse, neither is a good option.
Perhaps the simplest way Webmasters obscure addresses and mailto: links is via the use of HTML entities. Instead of
mailto:clueless@newbie.com

the Webmaster might code the link as

mailto&#58;clueless&#64;newbie.com

Surprisingly, the majority of address harvesting programs do not recognize addresses obscured in this manner. Even if more are created which do understand this ruse, many spammers will use outdated or pirated software. So, it may be worthwhile (and certainly cannot hurt) to process one's pages to add this small bit of obscurity. The task of creating a perl script which automatically obscures colons, "@" signs, and some or all of the other characters of e-mail addresses in an HTML source file is left as a simple exercise for the reader. Unfortunately, it's easy to forget to run pages through such a filter, and the average user who creates his or her own personal or business Web pages may not understand the dangers of placing addresses there. Therefore, it may be desirable to create a filter which automatically performs this transformation on all of the outgoing traffic from a Web server.
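The transformation described above can be sketched in a few lines. The following is a minimal illustration in Python rather than perl; the function names and the deliberately simple address pattern are assumptions made for this example, and the pattern does not cover every valid address form.

```python
import re

# Deliberately simple pattern for illustration only; it is an assumption of
# this sketch and does not match every valid e-mail address.
ADDRESS_RE = re.compile(
    r"(mailto:)?([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})"
)


def to_entities(text):
    """Encode every character as a decimal HTML character reference."""
    return "".join("&#%d;" % ord(ch) for ch in text)


def obscure_addresses(html):
    """Replace each address (and any mailto: prefix) with its encoded form."""
    return ADDRESS_RE.sub(lambda m: to_entities(m.group(0)), html)


if __name__ == "__main__":
    page = '<a href="mailto:clueless@newbie.com">Write to me</a>'
    print(obscure_addresses(page))
```

A browser decodes the character references before following the link, so the page behaves as before, while a harvester that scans the raw source for "@" or "mailto:" finds neither.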
Another good way to avoid being placed on many spammers' lists is to obtain an address in a .edu or (especially) .gov domain. Most (though not all) spammers purge their lists of such addresses to avoid running afoul of government entities.
Some address harvesting robots betray themselves via the "HTTP_USER_AGENT" field, allowing you to recognize them and refuse access. Charles Brabec's list of HTTP_USER_AGENT fields returned by known "spambots" explains how to block access in this way. Apache, running on BSD, makes it easy to do this via the mod_rewrite module.
Greg Sabino Mullane's excellent Spambot Beware site describes other methods by which spambots can be recognized, and explains how to "poison the well" by feeding spambots bogus addresses. All of the scripts on his site run unchanged, using perl and Apache, on all of the BSDs.
When all is said and done, however, the most certain way to prevent Web pages from providing addresses is not to put them there -- at least not in a form that a spambot can understand. Some users, for example, have taken to rendering their addresses as bitmaps. This prevents automatic scanning but may make addresses inaccessible to users of text-based browsers, who are frequently blind or visually impaired. (One way to avoid this pitfall is to place text which can be pronounced by a screen reader -- but doesn't look like an e-mail address -- in the image's ALT attribute, e.g. "clue less at new bee dot com".) It is also possible to create a mailto: link on the fly via a client-side script (which a spambot cannot execute), or supply the link via a POST method (which a spambot generally cannot use). Mullane covers these and other techniques in the avoidance section of his site.
Finally, some enlightened ISPs now provide Web e-mail input forms for every user. These allow a stranger to make an initial contact; if the message is legitimate, the recipient can respond via e-mail, revealing his or her address.
Unfortunately, e-mail clients with newsgroup reading capabilities, such as Netscape, often supply one's address whether one likes it or not. (In other clients, such as Opera, one can assume multiple identities with different addresses, but this feature requires some technical skill to configure.) Therefore, the best spam avoidance technique for the average user may be to use Web-based services such as Deja.com instead of a standard news server.
Aug 11 211601 myhost sendmail[5119] VAA05119 <mark@myhost.com>... User unknown Aug 11 211601 myhost sendmail[5120] VAA05120 <brian3@myhost.com>... User unknown Aug 11 211602 myhost sendmail[5119] VAA05119 <mark1@myhost.com>... User unknown Aug 11 211602 myhost sendmail[5120] VAA05120 <brian4@myhost.com>... User unknown Aug 11 211606 myhost sendmail[5120] VAA05120 <brian5@myhost.com>... User unknown Aug 11 211607 myhost sendmail[5119] VAA05119 <mark2@myhost.com>... User unknown Aug 11 211607 myhost sendmail[5126] VAA05126 <smith@myhost.com>... User unknown Aug 11 211608 myhost sendmail[5126] VAA05126 <smith1@myhost.com>... User unknown Aug 11 211610 myhost sendmail[5126] VAA05126 <smith2@myhost.com>... User unknown Aug 11 211610 myhost sendmail[5135] VAA05135 <wilson3@myhost.com>... User unknown Aug 11 211611 myhost sendmail[5137] VAA05137 <me@myhost.com>... User unknown Aug 11 211613 myhost sendmail[5119] VAA05119 <mark3@myhost.com>... User unknown |
Other attacks have been observed in which larger numbers of addresses were tested per connection. Often, the RCPT TO: commands sent in these attacks were "pipelined;" that is, they were sent before the server had responded to the previous command. This consumed buffer space and rendered the server's attempts to pace requests by delaying responses less effective.
Because it acts as if it is actually about to send a message and is naming the recipients, this type of address harvesting "bot" is able to extract addresses even from mail servers whose administrators have wisely disabled the VRFY (Verify Address) command. Worse still, its modus operandi can bring mail servers to a screeching halt. The overhead of checking many addresses can choke sendmail, which runs addresses through many rules to verify them. As mentioned above, at least one of these "bots" never bothered to send a message or close the connection to the server properly; it simply left the server hanging until it timed out. Because mail servers traditionally have very generous timeouts (to accommodate slow modem connections and network congestion), and because most of them limit the number of incoming connections they'll accept at one time, mail servers attacked by this program simply stopped working.
Recent versions of sendmail automatically slow their responses after a certain number of bad commands have been sent. However, an analysis of the code of sendmail 8.11.0 shows that RCPT TO: commands specifying invalid recipient addresses are not counted as "bad" -- nor are they counted toward the quota of recipients per message! Also, slowing responses does not help if the attacker is patient. In fact, it may prolong the attack, increasing the amount of time for which the server is at its maximum number of connections. Thus, the current safeguards in sendmail do not defend well against this particular attack. (Most other MTAs are also susceptible.)
One way to prevent an attacker from consuming the MTA's quota of incoming connections, or slowing the server with many quick connections, is to use inetd (or a similar program, such as juniperd) to limit the number of connections the server will accept from a single IP address in one minute. (sendmail can limit the number of connections it accepts per minute, but not by IP address.) However, since sendmail is designed to run best as a stand-alone daemon, a better solution would be to modify sendmail itself so that it takes IP addresses into account. sendmail and other MTAs should limit the number of connections per minute and the number of simultaneous connections from any one address. It may also be desirable to limit the number of invalid recipients per message, and/or to count bad recipients toward the maximum number of bad commands and the maximum number of recipients.
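The two per-address limits proposed above (connections per minute and simultaneous connections) can be illustrated with a small bookkeeping class. This is only a sketch of the accounting involved, not code from sendmail or any other MTA; the class name, method names, and default limits are hypothetical.

```python
import time
from collections import defaultdict, deque


class PerAddressLimiter:
    """Track connections per client IP address and enforce two limits."""

    def __init__(self, max_per_minute=10, max_simultaneous=3,
                 clock=time.monotonic):
        # The default limits are arbitrary values chosen for illustration.
        self.max_per_minute = max_per_minute
        self.max_simultaneous = max_simultaneous
        self.clock = clock
        self.recent = defaultdict(deque)  # ip -> start times in the last 60 s
        self.active = defaultdict(int)    # ip -> connections currently open

    def try_accept(self, ip):
        """Return True and record the connection if ip is within both limits."""
        now = self.clock()
        times = self.recent[ip]
        # Forget connections that started a minute or more ago.
        while times and now - times[0] >= 60:
            times.popleft()
        if len(times) >= self.max_per_minute:
            return False
        if self.active[ip] >= self.max_simultaneous:
            return False
        times.append(now)
        self.active[ip] += 1
        return True

    def close(self, ip):
        """Call when a connection from ip ends."""
        if self.active[ip] > 0:
            self.active[ip] -= 1
```

A listener would call try_accept() with the peer address of each new connection, refuse the connection when it returns False, and call close() when an accepted connection ends. Because the limits are kept per address, one abusive host cannot use up the quota that serves everyone else.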
Finally, because most robots of the type described above send several (sometimes many!) RCPT TO: commands without waiting for the SMTP response after each one, it may be desirable for the server to insist that the conversation be synchronous. RFC 821 allows the server to do this:
The communication between the sender and receiver is intended to be an alternating dialogue, controlled by the sender. As such, the sender issues a command and the receiver responds with a reply. The sender must wait for this response before sending further commands.

Thus, a server that has not specifically authorized SMTP pipelining (RFC 1854) is "within its rights" to cut off any connection where another command is received before a response to the previous one has been sent. An option which turns off pipelining, and cuts off clients which attempt to pipeline commands anyway, should be considered for future releases of sendmail and other MTAs. (Postfix is the only MTA that, to the author's knowledge, already offers this feature.)
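One way a server might detect unauthorized pipelining is to inspect what the client has sent since the server's last reply: a synchronous client can have sent at most one complete command. The sketch below assumes the server has collected those bytes in a buffer and that it is in the command phase of the session (not receiving message content after DATA); the function name is hypothetical.

```python
def is_pipelining(received: bytes) -> bool:
    """Return True if data arrived beyond the first CRLF-terminated command.

    `received` holds everything read from the client since the server
    sent its last reply.
    """
    end = received.find(b"\r\n")
    if end == -1:
        # The first command is not yet complete, so there is nothing to judge.
        return False
    # Anything after the first complete command was sent without waiting.
    return len(received) > end + 2


if __name__ == "__main__":
    burst = b"RCPT TO:<mark@myhost.com>\r\nRCPT TO:<mark1@myhost.com>\r\n"
    print(is_pipelining(burst))
```

A server applying this check would drop the connection when it returns True for a client to which pipelining has not been offered.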
If the MTA does not provide internal protection against name guessing attacks, it is possible to add it, either via patches or via a daemon which monitors the system's mail logs. This daemon would note when a particular host was generating many error messages, and could block it via a firewall rule, via a "blackhole" route in the system's routing table, or via an access control mechanism such as the smtpd rule file or the sendmail access_db feature. It is also possible to cause sendmail to limit the number or rate of connections per IP address via a "Milter" filter. However, this would incur much more overhead than inserting the feature directly into the program.
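The log-watching approach can be sketched as follows. This example assumes "User unknown" entries of the kind excerpted earlier and counts them per sendmail process ID, on the assumption that one process handles one incoming connection. Mapping a flagged process back to the remote host's address (from other log entries) and installing the block are left out; the function name and threshold are hypothetical.

```python
import re
from collections import Counter

# Matches entries such as:
#   Aug 11 21:16:01 myhost sendmail[5119]: VAA05119: <mark@myhost.com>... User unknown
# The colons after the process ID and queue ID are treated as optional.
UNKNOWN_RE = re.compile(
    r"sendmail\[(?P<pid>\d+)\]:?\s+(?P<qid>\w+):?\s+"
    r"<(?P<rcpt>[^>]+)>\.\.\.\s+User unknown"
)


def suspicious_sessions(lines, threshold=3):
    """Return {pid: count} for processes with at least `threshold` bad recipients."""
    counts = Counter()
    for line in lines:
        match = UNKNOWN_RE.search(line)
        if match:
            counts[match.group("pid")] += 1
    return {pid: n for pid, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    import sys
    # Example: pipe recent log entries into this script on standard input.
    for pid, n in suspicious_sessions(sys.stdin).items():
        print("sendmail[%s]: %d unknown recipients" % (pid, n))
```

A real daemon would run such a check repeatedly over the newest log entries and would need a policy for lifting blocks, since a legitimate sender with a stale list can also generate a burst of errors.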
One controversial technique used by an increasing number of ISPs is to block connections destined for hosts outside the local network on TCP port 25 (SMTP), especially when they originate from dial-up ports or from cable or DSL modems without fixed IP addresses. This is roughly equivalent to the function of the MAPS DUL blacklist, but since it operates at the source it protects those who do not subscribe to the blacklist as well. If this strategy is implemented, provisions should be made to make exceptions for users who have a legitimate need for outgoing SMTP and can be verified not to be engaged in spamming.
A more subtle technique which avoids complaints caused by outright blocking of outgoing SMTP is to redirect outgoing mail through a transparent proxy. This can be done in a router or by the IP Filter firewall software, which is available for the three cooperatively developed BSDs as well as several other BSD-derived OSes. A redirect rule for ipnat (part of the IP Filter package) which looks like
rdr ed0 0.0.0.0/0 port smtp -> 127.0.0.1 port smtp

(where ed0 is the internal interface on the gateway router) redirects all outbound SMTP connections to the mail server on the router regardless of the mail's final destination. If the server is correctly configured, it will relay and log all outgoing mail without becoming an "open" relay. For the sake of efficiency, trusted hosts can be allowed to bypass the proxy.
Once this mechanism is in place, either a sendmail "Milter" filter or a log monitor can be used to detect and stop abuse. A log monitor may visit the sendmail log file periodically and look at the last few entries; it may also be designed to accept piped output from syslogd, the Unix system log daemon. If there is evidence of abuse, the monitor can alert the system administrator via e-mail or pager and/or lock out the offending party. The author has found log monitors to be highly effective in his own work.
Kai's SpamShield (which was developed for BSD/OS and runs on all of the BSDs) is a good example. A simple log monitor written in Perl, it periodically examines the most recent entries in the sendmail log file and notes the number of recipients to which a host or local user is sending mail. If it sees an excessive amount of traffic in a short period of time, it can notify the administrator and/or quarantine an offending host by creating a "blackhole" route on the mail server. In some network and server configurations, it may be more efficient (or even necessary) to modify the script to block connections via other means, including the mail server's access control mechanism and/or firewall rules. swatch, a more general log monitoring program, may also be adapted for this purpose.
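The recipient-counting idea behind such a monitor can be sketched briefly. This is not SpamShield's code. It assumes sendmail "from=" log entries that carry nrcpts= and relay= fields, and the function names and the limit are hypothetical.

```python
import re
from collections import Counter

NRCPTS_RE = re.compile(r"\bnrcpts=(\d+)")
RELAY_RE = re.compile(r"\brelay=([^,]+)")


def recipients_per_relay(lines):
    """Sum the recipient counts of accepted messages for each sending relay."""
    totals = Counter()
    for line in lines:
        if "from=" not in line:
            continue  # skip delivery ("to=") entries and unrelated lines
        nrcpts = NRCPTS_RE.search(line)
        relay = RELAY_RE.search(line)
        if nrcpts and relay:
            totals[relay.group(1).strip()] += int(nrcpts.group(1))
    return totals


def heavy_senders(lines, limit=50):
    """Return the relays whose total recipient count exceeds `limit`."""
    totals = recipients_per_relay(lines)
    return sorted(relay for relay, total in totals.items() if total > limit)
```

Passing only the most recent log entries to heavy_senders() approximates a check over a short time window; what to do with the result (notify the administrator, add a route or a firewall rule) is a separate policy decision.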
These programs often exploit "social engineering" techniques to persuade users to activate them. The ILOVEYOU worm, for example, makes use of a hidden extension exploit to make it appear that an executable attachment is an innocuous text file. ExploreZip -- perhaps the most cunning example of a "social engineering" Trojan worm to date -- is even more subtle. It operates as an e-mail autoresponder, replying immediately to incoming mail with a message that reads:
I received your email and I shall send you a reply ASAP. Till then, take a look at the attached zipped docs.

Attached to the message is an executable file which appears at first glance to be a self-extracting archive file. It is actually a copy of the worm.
Because the Subject: header of the automatic response matches that of a previously sent incoming message, and the From: address is familiar, the correspondent believes that the automatically generated message is part of an ongoing conversation and trustingly runs the attachment. Unfortunately, this particular worm carries a nasty payload: it destroys files not only on the victim's hard disk but on any shared drives or directories to which he or she has access.
Malware which taps users' e-mail address books for the addresses of likely victims, or otherwise attempts to exploit existing relationships between correspondents, is sometimes called a "Friends and Family virus," after MCI's famous promotional program for its long distance service.
A hostile script embedded in e-mail may "take control" of the recipient's machine by opening an advertising or pornographic Web page in the user's browser. It can then prevent him or her from closing the window or shifting the focus. A malicious script can freeze the browser or the entire machine. (Many of these exploits are cross-platform due to the cross-platform nature of HTML, JavaScript, and Java.) A message with intentional formatting errors may crash some vulnerable e-mail clients or even some MTAs. For example, it was recently reported that some versions of Microsoft Exchange will halt, refusing to revive until queue files are manually deleted, if they encounter a null MIME boundary string.
The most common way of detecting such exploits is via procmail filter kits designed for this purpose. John Hardin's Procmail Sanitizer, perhaps the best of these, disables active content, "mangles" file extensions, optionally disables image tags, and can quarantine messages which are likely to contain malicious code. It can also scan Microsoft Office documents for macros and score them according to their potential virulence. Bjarni R. Einarsson's Anomy is similar and credits Hardin's work as inspiration.
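To illustrate what "mangling" a file extension means, here is a minimal sketch. It does not reproduce the behavior or naming scheme of any of the kits mentioned above; the extension list, the "DEFANGED" tag, and the function name are assumptions chosen for this example.

```python
import os

# Illustrative, not exhaustive, set of extensions treated as executable.
RISKY_EXTENSIONS = {".exe", ".com", ".bat", ".pif", ".scr", ".vbs", ".js", ".shs"}


def mangle_filename(name, tag="DEFANGED"):
    """Rewrite a risky final extension so a double-click no longer runs the file."""
    base, ext = os.path.splitext(name)
    if ext.lower() in RISKY_EXTENSIONS:
        # Keep the original extension visible, but no longer as the extension.
        return "%s.%s-%s" % (base, tag, ext[1:])
    return name


if __name__ == "__main__":
    print(mangle_filename("LOVE-LETTER-FOR-YOU.TXT.vbs"))
    print(mangle_filename("notes.txt"))
```

Only the final extension is examined, because that is the one the client's operating system acts on; this is why a name that ends in ".TXT.vbs" is treated as a script and not as text. The recipient can still rename and open a mangled file deliberately, but can no longer launch it by accident.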
procmail malware filter kits are installed in the same way as procmail filters for spam. (See the sections titled Open Source Spam Filter Kits and Using Procmail as the Local Mail Delivery Agent (LDA or MDA) above.) Since procmail is a mail delivery agent, procmail filters normally process only mail which is bound for local users on the system where they run. However, with special changes to rules in sendmail.cf, they can work with sendmail to filter all the mail that passes through a server.
Ideally, filtering on the mail server should be heuristic (that is, it should be able to recognize new as well as existing malware) and should focus on catching malware for which e-mail is an important vector. The best approach, in the author's experience, is a combination of rule-based checking on the server and a regularly updated commercial virus checker on each client. While the repertoires of the server and client software can and should overlap, both are necessary for good security.
The elimination of spam is more art than science. So long as people allow themselves to be contacted by strangers (something which is not always undesirable and is sometimes quite delightful), opportunities will exist for spammers to send them unsolicited junk mail. However, the hijacking of computer systems and networks to send mass quantities of spam should be eliminated, as should spam which is an unwarranted intrusion or which is fraudulent. The spam fighting tools and techniques mentioned here rank among the best that have been developed to date.
Heuristic e-mail filters have proven to be remarkably effective at checking the spread of new malware. The filters on the author's servers have caught every copy of Melissa, ExploreZip, ILOVEYOU (all variants), Happy99, PrettyPark, and similar malicious programs sent to or through them, protecting users from massive damage and saving untold hours of cleanup work. Equally impressive has been the total absence of false positives in two years of operation. Nonetheless, because malware also spreads via means other than e-mail, the use of filters on the server does not and cannot eliminate the need for malware-eliminating tools on client machines. Since there is sometimes a delay of a week or more between the start of an outbreak and the time when new pattern sets for the "brand name" virus checkers are ready, the two can work in concert, with the server "holding the fort" until patterns are ready for the client machines. Ultimately, only a check at the client can deal with all malicious software regardless of the vector via which it arrived.