Spam Blocking: Problem and Solutions

Summary:

Because of deficient e-mail handling practices, a growing number of mail providers and anti-spam organizations have designated the DNC as a spammer. As a result, fewer and fewer of our mailings reach their intended recipients. In order to resolve this problem, we must bring our email usage into compliance with modern best practices for responsible mailers.

What is Spam?

According to Wikipedia [http://en.wikipedia.org/wiki/Spam_(electronic)], spamming is the abuse of electronic messaging systems to send unsolicited bulk messages. Any such use is automatically considered abuse. Unsolicited Bulk Email (UBE) can be formally defined as: a substantially identical e-mail message sent to two or more recipients who did not explicitly ask to receive such mailings.

The difference between spam and legitimate email can be effectively demonstrated with some examples:

I type a message and send it to my friend. NOT SPAM.
I type a message and send it to someone I don't know. NOT SPAM: I only sent one message.
I type a message and send it to three friends. NOT SPAM. My friends expect to receive email from me.
I set up a mailing list about fish and allow folks on the Internet to subscribe. I type a message and send it to the 10,000 folks who subscribed. NOT SPAM. The subscribers specifically asked to receive such mailings and can remove themselves from the list any time they want to.
I send an email to the 12-year-old girl next door telling her where to buy adult toys and pornography. NOT SPAM. I'll probably be arrested but the content has nothing to do with whether or not the message is spam.
I read an interesting news article, so I send my friend an email with the URL of the article and tell them they should go read it. NOT SPAM. I only sent it to one person and the message is from me, not from the news site.
I type a party invitation and send it to everyone in the company I work for. GRAY AREA. Only the folks who know me reasonably expect to receive such messages but I do work for the same company. In general this practice should be avoided.
I copy and paste an interesting news article and email it to two dozen friends and family. GRAY AREA. These folks expect personal communication from me. Many will consider such blast forwarding to be rude.
I program a web site to send party invitations to my friends. GRAY AREA. I am not permitted to authorize a third party to e-mail someone other than myself; however, the message is reasonably personal in nature and I am only mailing people who know me.
I program a web site to send party invitations to my 300 closest friends including a number of celebrities who I only wish I knew. SPAM. The web site sent identical messages to folks who don't want or expect emails from me. They and I have spammed.
I type a message and send it to 50 celebrities because I want attention. SPAM.
I read an interesting news article on CNN and instruct their web site to send a copy to a dozen friends and family. SPAM. I may not authorize a third party to e-mail someone other than myself.
I read an interesting news article on CNN and instruct their web site to send a copy to one friend I think might be interested. SPAM. I may not authorize a third party to e-mail someone other than myself and presumably I'm not the only one who told CNN to send that article to someone other than myself.
I set up a mailing list about fish and allow folks on the Internet to subscribe. Some jerk subscribes 50 people he knows. I type a message and send it to the 10,000 folks who allegedly subscribed. SPAM. As a mailing list operator, it is my responsibility to take reasonable precautions to prevent third-parties from subscribing folks against their will. I failed that responsibility.
I send an email about fish to 10,000 addresses I picked up from web sites and other mailing lists. SPAM. The classic example.

The rule of thumb is this: If I send (or am a party to sending) an email to two or more folks who neither know me nor asked me to send such a message then I'm spamming. If I send any other kind of mail then I'm not.

Best Practices

In the decade that email spam has been around, a series of best practices have been developed that help legitimate bulk mailers steer clear of the warfare between the spammers and anti-spammers. By updating our systems to follow these best practices, we can avoid running afoul of the many and varied spam filters.

The antispam groups have spent the last 24 months debating "the growing problem of political spam." Worse, fewer than 25% of the antispam systems out there reveal their existence. Most silently discard offending messages or shuffle them off to never-reviewed spam folders. For example, the Mailwise system the DNC subscribes to silently stores offending messages in a folder that is invisible to the recipient unless he establishes a login at mailwise.com. Without careful attention to this problem, we can expect trouble.

We're in the cross-hairs this year. Best to tread lightly.

Best Practice: Opt-In

Only recipients who have explicitly authorized us to send them bulk mailings should receive bulk email from us. Under no circumstances may we send email to an address provided by a third party on the grounds that the recipient can opt out of further mailings. Such "opt out" lists are a well-known and thoroughly reviled spammer technique which will get our email blocked.

In some cases we will want to invite members of another mailing list (e.g. the DCCC's list or the DSCC's list) to join our list as well. The legitimate operator of those lists may send such a mailing, however we may not. Likewise, we can send a mailing to our opt-in list asking our subscribers to join the DSCC and DCCC mailing lists but we may not provide those lists to the DSCC or DCCC.

Best Practice: Subscription

Responsible mailing lists use a two-step confirmation model to validate the subscription of a particular email address. First, the request is received by the list. This can be an email message sent to subscribe@mylistserve.com or an email address entered into a web page. Next, a brief message is sent to that email address asking the recipient to confirm the subscription by either responding with a particular code or clicking on a link. Once the code or link is received by the list, the email address is subscribed and will receive all mailings delivered to the list.

The subscription code or link is randomly generated and stored along with the email address on the mailing list server. When the server gets that code back, it uses it to look up the address. It should not be possible to forge subscription codes.
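
The code generation and lookup can be sketched roughly as follows. This is a minimal illustration, not the list server's actual code: the in-memory `pending` dictionary and both function names are assumptions standing in for whatever storage the server really uses.

```python
import secrets

# Hypothetical in-memory store; a real list server would use a database.
pending = {}  # confirmation code -> email address


def request_subscription(email):
    """Create an unguessable code and remember which address it belongs to."""
    code = secrets.token_urlsafe(16)  # 128 bits from the OS random source
    pending[code] = email
    return code


def confirm_subscription(code):
    """Return the address for a valid code, or None if the code is unknown."""
    # pop() makes each code single-use.
    return pending.pop(code, None)
```

Because the code is random rather than derived from the address, someone who knows the address alone cannot guess it.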

The subscription request messages should be rate-limited so that it is not possible for some jerk to cause our computers to mailbomb someone he doesn't like. In particular, we should originate only one subscription confirmation request to any given email address in a five-minute period and we should originate no more than 20 such in a calendar month. Our software should enforce this limit.
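
A rough sketch of such a limiter is shown here. The in-memory `sent` history and the function name are assumptions made for illustration; the two limits are the ones given above.

```python
from datetime import datetime, timedelta

MIN_GAP = timedelta(minutes=5)  # at most one request per five minutes
MONTHLY_CAP = 20                # at most 20 requests per calendar month

# Hypothetical in-memory history: address -> list of send times.
sent = {}


def may_send_confirmation(email, now):
    """Return True and record the send if both limits allow it."""
    history = sent.setdefault(email, [])
    if history and now - history[-1] < MIN_GAP:
        return False
    this_month = [t for t in history
                  if (t.year, t.month) == (now.year, now.month)]
    if len(this_month) >= MONTHLY_CAP:
        return False
    history.append(now)
    return True
```

The caller sends the confirmation request only when the function returns True; refused requests are simply dropped.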

The subscription confirmation message should include the headers of the original mail message or the timestamp and originating IP address of the web request so that the recipient can pursue the offender if a forged request is made. Doing so also shows anti-spam maintainers a good faith effort to be a responsible mailer. That is, they're less likely to block us on receiving a complaint whose attached message shows the real source.

Some folks will use a single-step opt-in. Such behavior is discouraged, but if such an opt-in is used then the mailing list should do two more things:

Once someone has unsubscribed, do not allow resubscription without a two-step confirmation.
Initiate a second-step confirmation if after a while there is no indication the recipient reads the messages. This allows spamtraps to eventually be cleansed from the list.

Best Practice: Unsubscription

Every message delivered via the mailing list must include instructions on how to unsubscribe.

It should be possible to unsubscribe by sending a message with a particular code in the subject to an unsubscribe mailbox. This is less critical than it once was since virtually everybody with email now also has web access.

The code provided with the unsubscribe message or URL should provide the mailing list server with all of the information it needs to authenticate the recipient to whom the message was sent. That is, the person unsubscribing should not have to type their address or even know the address under which they were subscribed.

It should not be possible to unsubscribe someone who does not have access to a valid unsubscription code from one of the emails. False unsubscriptions are every bit as abusive as false subscriptions.
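
One possible way to meet both requirements is to put the address inside the code and sign it with a server-side secret. The sketch below assumes that approach (an HMAC signature); a random code stored next to the address would work equally well. `SECRET` and the function names are hypothetical.

```python
import base64
import hashlib
import hmac

SECRET = b"replace-with-a-real-secret"  # hypothetical server-side key


def _sign(payload):
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()[:20]


def make_unsubscribe_code(email):
    """Encode the address and sign it so the code cannot be forged."""
    payload = base64.urlsafe_b64encode(email.encode()).decode()
    return payload + "." + _sign(payload)


def read_unsubscribe_code(code):
    """Return the subscribed address, or None if the code is not genuine."""
    payload, _, sig = code.rpartition(".")
    if not hmac.compare_digest(sig.encode(), _sign(payload).encode()):
        return None
    return base64.urlsafe_b64decode(payload.encode()).decode()
```

The recipient never types an address: the server reads it out of the code, and a code that was not issued by the server fails the signature check.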

Best Practice: Bounces

Mailing lists must effectively manage messages which were undeliverable by removing subscribers whose email address no longer works. If not managed, mail delivery for legitimate recipients will be delayed and mail admins upset about the excessive failed deliveries may block the server. This is made difficult by two problems:

There was no standard format for bounce messages. There is one now but only about 75% of the mail systems use it.
Just because a message was undeliverable does not mean the subscriber is no longer there or wants to be unsubscribed. Messages can be undeliverable due to short term failure, full mailboxes and administrative errors.

The first part of the problem can be handled by using a variable envelope return path (VERP) encoded envelope sender. The destination address to which the bounce message goes then uniquely identifies the recipient to whom the original letter was sent.
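
As an illustration, here is a minimal sketch of the common VERP layout in which the recipient's "@" is replaced by "=" inside the bounce address. The `bounces` prefix and the `lists.example.org` domain are placeholders, not our real configuration.

```python
def verp_encode(recipient, list_local="bounces", list_domain="lists.example.org"):
    """Build an envelope sender that embeds the recipient's address."""
    local, _, domain = recipient.rpartition("@")
    return f"{list_local}-{local}={domain}@{list_domain}"


def verp_decode(bounce_to, list_local="bounces"):
    """Recover the original recipient from the address a bounce was sent to."""
    local_part = bounce_to.rpartition("@")[0]
    prefix = list_local + "-"
    if not local_part.startswith(prefix):
        return None
    local, sep, domain = local_part[len(prefix):].rpartition("=")
    if not sep:
        return None
    return f"{local}@{domain}"


sender = verp_encode("alice@example.com")
# sender == "bounces-alice=example.com@lists.example.org"
```

The bounce catcher only has to look at the address the bounce was delivered to, so the format of the bounce body no longer matters.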

The second part is more difficult. There are a number of solutions, but the easiest to implement is: on receiving a bounce message, suspend the subscriber. Then, send one confirmation request per day for a week to the recipient until the recipient confirms that he still wants to receive the email. If no confirmation is received, unsubscribe the recipient after one week. If the confirmation is received, ignore any further bounces for one month.
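
The suspend/confirm/unsubscribe cycle described above can be sketched as a small state machine. The class and method names are illustrative assumptions, and "one month" is approximated as 30 days.

```python
from datetime import datetime, timedelta

CONFIRM_WINDOW = timedelta(days=7)
BOUNCE_GRACE = timedelta(days=30)  # "one month", approximated as 30 days


class Subscriber:
    def __init__(self, email):
        self.email = email
        self.status = "active"  # active, suspended or unsubscribed
        self.suspended_at = None
        self.ignore_bounces_until = None

    def on_bounce(self, now):
        """Suspend on a bounce unless a recent confirmation shields the address."""
        if self.status != "active":
            return
        if self.ignore_bounces_until and now < self.ignore_bounces_until:
            return
        self.status = "suspended"
        self.suspended_at = now

    def on_confirm(self, now):
        """The recipient answered a confirmation request: reinstate them."""
        if self.status == "suspended":
            self.status = "active"
            self.suspended_at = None
            self.ignore_bounces_until = now + BOUNCE_GRACE

    def daily_tick(self, now):
        """Run once a day. Returns True if a confirmation request should be sent."""
        if self.status != "suspended":
            return False
        if now - self.suspended_at >= CONFIRM_WINDOW:
            self.status = "unsubscribed"
            return False
        return True
```

A daily job would call `daily_tick` for every suspended subscriber and send a confirmation request whenever it returns True.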

Best Practice: Feedback Loops

Providers like Hotmail and AOL offer feedback loops which send us copies of the mailing list messages that their subscribers flagged as "spam." Messages so flagged should be treated as unsubscribes.

Best Practice: Images & HTML

Many email clients refuse to show images located on an external web server. Make sure the message looks reasonable without those images or else embed the images in the mailing itself.

Best Practice: Subscriber Pedigree

The headers, timestamp and originating IP address captured with the subscription request form the start of a pedigree which shows when and how an individual subscribed to the mailing list. This information should be retained by the mailing list server. Later on, the server should record information about what mailings were opened, what links were clicked and whether there were any bounces. Ideally the subscriber should be able to request a copy of the data at any time so that if there is any question about whether the subscription was opt-in it can be readily addressed.

Best Practice: Sender Policy Framework

SPF provides a mechanism by which bulk mailers can specify which servers their email comes from. This makes it more difficult to forge a return address since a forged address will fail the SPF check. Anti-spam systems tend to give these messages a slight advantage since SPF eliminates dialup spam.

Simultaneous Connections

Small email systems can rarely accommodate more than a few simultaneous delivery connections. Parallel connections should generally be limited to 2 except for hosts like Hotmail and AOL which have a demonstrated ability to handle more.

The DNC's problems and potential solutions

Blocked by spam filters

Hotmail has blocked us twice, as have a number of other spam filters. There are two main solutions to this problem.

Solve the problems described in the rest of the document. Then politely ask the filtering folks not to block us since we comply with email best practices.
Whack-a-mole. We can threaten/cajole/browbeat the various ISPs into allowing us to mail their subscribers. Most of the time this won't work. Where it does, it will consume hours for each ISP, of which there could easily be more than 10,000. Where we do convince the ISP to unblock us, the overruled mail administrator will invariably complain on news.admin.net-abuse.email, inducing others to add us to their manual block-lists.

Perhaps I should have said there is ONE solution to this problem.

The DNC generates three types of automated mailings:

Confirmations generated from the web site
Bulk mailings to one of the mailing lists
Mail your friends mailings, e.g. after signing a petition

Each has problems that should be addressed.

Confirmations generated from the web site

Problem: The confirmation emails all originate from apache@webX.in.dnc.org. These addresses are not valid externally and would be blocked by spam filters were they not deliberately mangled by mail1 to read apache@democrats.org.

Solution: All scripts generating email from the site should be corrected to use a valid VERP-encoded envelope sender.

Problem: Bounces from the confirmation emails are not processed.

Solution: An appropriate bounce catcher on an email server should catch any bounces and let the original scripts know so that they can properly deal with the failure.

Problem: Some mailers corrupt the return address in the bounce message by converting the address to lower case, replacing O's with Zeros, etc.

Solution: Checksum the data encoded into the VERP and verify the checksum before taking any action. Use only lower case letters and digits and avoid numbers and letters that look alike (0o1li).
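
A minimal sketch of such an encoding follows. It assumes the VERP data is a numeric subscriber id and uses CRC-32 as the checksum; neither detail comes from the existing system, and all names are illustrative.

```python
import zlib

# Lowercase letters and digits, minus the look-alikes 0, o, 1, l and i.
ALPHABET = "23456789abcdefghjkmnpqrstuvwxyz"
BASE = len(ALPHABET)
CHECK_LEN = 4


def _to_text(number, min_len=1):
    """Write a non-negative integer using only characters from ALPHABET."""
    digits = []
    while number > 0:
        number, rem = divmod(number, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits)).rjust(min_len, ALPHABET[0])


def _checksum(body):
    return _to_text(zlib.crc32(body.encode()) % BASE ** CHECK_LEN, CHECK_LEN)


def encode_id(subscriber_id):
    """Encode a numeric subscriber id followed by its checksum."""
    body = _to_text(subscriber_id)
    return body + _checksum(body)


def decode_id(token):
    """Return the subscriber id, or None if the token was corrupted."""
    token = token.lower()  # survives mailers that change letter case
    body, check = token[:-CHECK_LEN], token[-CHECK_LEN:]
    if not body or any(ch not in ALPHABET for ch in token):
        return None
    if _checksum(body) != check:
        return None
    number = 0
    for ch in body:
        number = number * BASE + ALPHABET.index(ch)
    return number
```

A bounce whose token fails to decode is set aside for manual review rather than acted on.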

Problem: Mailing list subscriptions are single-step opt-in.

Solution: Make them two-step opt-in. If this is not possible, make sure that some other mechanism is in place to confirm that there is a live person receiving the message and fall back on a two-step opt-in if no evidence surfaces that the recipient ever received the message.

Problem: Some domains absorb email to any address in a single email box. This allows an abuser to subscribe many different addresses at the domain effectively mail-bombing their target with DNC subscriptions. This wastes our resources and gets us spam-listed.

Solution: Permit entire-domain unsubscriptions/do not mail listings by verifying control of postmaster@, webmaster@ and root@.

Problem: Mail generated from the web site does not adequately identify its origin and offers no way for the recipient to request that information if the message is forged.

Solution: Provide recipients with the IP address and port from which the web request that generated the email came, along with the timestamp at which it connected. This will allow the victims of abuse of the DNC's systems to take action against the perpetrators.

Problem: Outbound messages from the web servers share the same server (mail1) as the incoming bounce messages. Normal slowness in the bounce processing negatively impacts the outbound messages.

Solution: Install a separate mail server to handle traffic unrelated to bounce processing.

Problem: Unconfirmed email addresses are more likely to cause antispam systems to block our IP address than confirmed addresses.

Solution: Split the mailing operation so that mail associated with destination addresses which have not been confirmed goes through different mail servers than mail for addresses which have been confirmed.

Problem: Confirmation emails share a mail server with emails associated with potentially spammy recipients.

Solution: Split the mailing operation so that mail associated with subscription confirmations comes from a different IP address than the other types of messages.

Problem: The bulk mailing logs provide only mediocre information about messages sent, delivered, opened, etc.

Solution: Create a detailed file-based log sorted by subscriber for each mailing. The log should contain one line for each action, such as: mailed to, bounced from, opened, clicked a link, etc.
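
One possible shape for such a log is a tab-separated line per action. The field order and the function names below are assumptions for illustration only.

```python
import io
from datetime import datetime, timezone


def log_event(log, subscriber, action, detail="", when=None):
    """Append one tab-separated line: subscriber, timestamp, action, detail."""
    when = when or datetime.now(timezone.utc)
    log.write(f"{subscriber}\t{when.isoformat()}\t{action}\t{detail}\n")


def sorted_by_subscriber(log_text):
    """Return the log lines ordered by subscriber, then by timestamp."""
    return sorted(log_text.splitlines(),
                  key=lambda line: line.split("\t")[:2])


# Usage with an in-memory buffer standing in for the per-mailing log file.
buf = io.StringIO()
log_event(buf, "reader@example.com", "mailed to")
log_event(buf, "reader@example.com", "clicked a link", "http://www.example.org/")
```

Keeping one action per line means the file can be sorted, searched and counted with ordinary text tools.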

Bulk mailings

Problem: The bulk mailer (composer) has experienced unexpected Mason crashes.

Solution 1: Identify the source of the crash and fix it. This is a shot in the dark until/unless the problem can be replicated in a controlled environment.

Solution 2: Replace Mason as the message generator. Build the message once and then use simple string substitution to instantiate the letters to individual subscribers.
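
The build-once, substitute-many idea can be sketched as follows. The placeholder names and the example URL are hypothetical; the point is only that the per-subscriber step is plain string substitution.

```python
from string import Template

# Rendered once per mailing; the $names are hypothetical placeholders
# left in the text for per-subscriber values.
rendered_once = Template(
    "Dear $first_name,\n\n"
    "Thanks for subscribing.\n\n"
    "To unsubscribe: https://lists.example.org/u/$unsub_code\n"
)


def instantiate(template, subscriber):
    """Fill in one subscriber's values with plain string substitution."""
    return template.substitute(subscriber)


letter = instantiate(rendered_once, {"first_name": "Pat", "unsub_code": "abc"})
```

`substitute` raises an error if a placeholder has no value, which catches incomplete subscriber records before anything is mailed.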

Problem: Bounces from the bulk mailings are caught but not processed.

Solution: Build a reaper script which reads the bounced messages and manages them by suspending the recipient address, attempting a confirmation for 7 days and then unsubscribing the recipient.

Problem: Some mailers corrupt the return address in the bounce message by converting the address to lower case, replacing O's with Zeros, etc.

Solution: Checksum the data encoded into the VERP and verify the checksum before taking any action. Use only lower case letters and digits and avoid numbers and letters that look alike (0o1li). This is implemented in a testing version of the bulk mailer on betacomposer, but it has not been implemented in the reporting system.

Problem: AOL spam reports are not processed.

Solution: Process AOL spam reports as unsubscribes. Update AOL with our new IP addresses.

Problem: Hotmail offers spam reports but we aren't subscribed.

Solution: Subscribe to Hotmail spam reports. Process them as unsubscribes.

Problem: Unconfirmed email addresses are more likely to cause antispam systems to block our IP address than confirmed addresses.

Solution: Split the mailing operation so that mail associated with destination addresses which have not been confirmed goes through different mail servers than mail for addresses which have been confirmed.

Problem: Unsubscription requests require the recipient to know the address under which he is subscribed, but this information is not necessarily available.

Solution: Update the unsubscribe request to use the VERP encoded data in the URL to determine the unsubscription address.

Problem: Anyone can unsubscribe anyone else, an easy and effective form of harassment against well-known Democrats.

Solution: Anyone requesting an unsubscription by entering their email address should be forced to click a link in a confirmation message in order to unsubscribe.

Problem: Mailing lists are cut in advance of the actual mailings. As a result, mail can be sent to folks who have already unsubscribed.

Solution: Composer should maintain a do-not-mail list based in part on information fed from the list cutter and the unsubscribe web page. Addresses found on the do not mail list should not receive emails during the delivery phase.
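
The delivery-time check amounts to a set lookup, roughly as sketched here. The function name is hypothetical, and comparing addresses case-insensitively is an assumption.

```python
def deliverable(recipients, do_not_mail):
    """Drop every recipient that appears on the do-not-mail list."""
    # Assumption: addresses are compared ignoring case and stray whitespace.
    blocked = {address.strip().lower() for address in do_not_mail}
    return [a for a in recipients if a.strip().lower() not in blocked]
```

The check runs at delivery time, so an unsubscribe that arrives after the list was cut still takes effect.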

Problem: Mason is inefficient and does not scale. This makes the mailings slow.

Solution: Generate a message through Mason once at the start of a mail run. From there on, instantiate each message with string substitution. Use a simple C program to handle this loop. Sort the recipients by server and try to deliver to each server directly first. On failure, fall back to delivering the messages to the Ironport for further delivery attempts. Make the C program self-contained so that the data can be rsynced to other delivery machines for parallel operations.
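
The "sort the recipients by server" step might look like the sketch below. It is written in Python for brevity rather than the C proposed above, and it groups by the domain part of the address as a stand-in for the receiving server; resolving each domain to its actual mail server is left out.

```python
from collections import defaultdict


def group_by_domain(recipients):
    """Group addresses by domain so each server can be handled in one batch."""
    groups = defaultdict(list)
    for address in recipients:
        domain = address.rpartition("@")[2].lower()
        groups[domain].append(address)
    # Return the groups in sorted domain order.
    return dict(sorted(groups.items()))
```

Each group can then be handed to one delivery attempt, with the whole group falling back to the Ironport if the direct attempt fails.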

Problem: We don't know when folks are blocking our mailings.

Solution: We should monitor for situations where all mails from a large domain bounce or where we fail to see opens from any mails associated with a large domain. We should also put a form on the two-step opt-in page that allows someone to report that they did not receive the subscription confirmation so that we can investigate the problem.
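
The monitoring rule could be sketched as follows, assuming the mailing log can be reduced to (address, action) pairs. The action names and the `min_sent` threshold for what counts as a "large domain" are illustrative assumptions.

```python
from collections import Counter


def suspicious_domains(events, min_sent=100):
    """Flag large domains where everything bounced or nothing was opened.

    events is an iterable of (address, action) pairs, where action is
    "mailed", "bounced" or "opened".
    """
    counts = {"mailed": Counter(), "bounced": Counter(), "opened": Counter()}
    for address, action in events:
        if action in counts:
            counts[action][address.rpartition("@")[2].lower()] += 1

    flagged = {}
    for domain, sent in counts["mailed"].items():
        if sent < min_sent:
            continue  # too small to draw conclusions from
        if counts["bounced"][domain] >= sent:
            flagged[domain] = "all bounced"
        elif counts["opened"][domain] == 0:
            flagged[domain] = "no opens"
    return flagged
```

A flagged domain is only a hint that we may be blocked; someone still has to investigate.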

Problem: Josh has contributed some composer code from the Kerry campaign which may be of use.

Solution: The code should be evaluated and useful components integrated into betacomposer.

Mail Your Friends

Problem: The email originates with an envelope sender of apache@webX.in.dnc.org. These addresses are not valid externally and would be blocked by spam filters were they not deliberately mangled by mail1 to read apache@democrats.org.

Solution: All scripts generating email from the site should be corrected to use a valid VERP-encoded envelope sender.

Problem: Bounces from the emails are not processed.

Solution: An appropriate bounce catcher on an email server should catch any bounces and let the original scripts know so that they can properly deal with the failure.

Problem: Some mailers corrupt the return address in the bounce message by converting the address to lower case, replacing O's with Zeros, etc.

Solution: Checksum the data encoded into the VERP and verify the checksum before taking any action. Use only lower case letters and digits and avoid numbers and letters that look alike (0o1li).

Problem: As presently implemented, messages from these tools are unsolicited bulk email (spam).

Solution: Validate the return address before allowing anyone to send invites. Make it very clear IN BOLD TYPE that invites should only be sent to friends and acquaintances, not to mailing lists or folks the sender does not personally know. Do not allow the sender to send the unaltered stock text.

Problem: From addresses for Mail Your Friends and Letter to the Editor are not validated and can be trivially faked.

Solution: Only allow mailing list subscribers who present a valid code attached to their email address to send such messages. Do not allow them to alter the from address. Go ahead and let folks sign petitions with any address they enter but don't offer to forward it to their friends unless it's a validated email address.

Problem: Although the web site allows Mail Your Friends and Letter to the Editor letters to be altered from the stock text, few folks do. As a result, the identical messages get picked up by various content filters eventually resulting in our entire mail server being blocked.

Solution: Do not prefill a subject line. Require the user to think of one. Also require the user to add a few words at the front before mailing.