Spam Wars in the Blogoverse

Spam, hacks, phishes, scams – come with a webmaster as he battles alien forces of Evil that threaten to sap your online experience and impurify all of your precious bodily fluids!

When he said something like that in the film Dr. Strangelove, General Jack D. Ripper was ranting at the (to him) very visible international Communist conspiracy. However, today everyone who wants to be visible on the Internet must battle invisible forces. Those forces cannot be nuked into submission because we don’t know where to drop the bomb. They could be armies of Russian nerds or Chinese geeks. But they could also be that snotty kid who lives in your neighborhood.

Today’s blog relates my hand-to-hand combat with the Satanic Empire as I strive to keep this and my other websites open and free from pollution.

Spam as Poop

The digital age brought us instant communication and access to data. It also has forced us to view every e-mail suspiciously and to guard our identities.

I will modestly state that I am not a deep-dyed expert in the web world. Being a geezer, I would be excused if I knew practically nothing about the Internet. But I am also a writer – I want to inform and entertain, and I think I sometimes have something worth saying. That’s why I started artchester.net, which in turn threw me into the battlezone of website defense.

As I have mentioned before, owning a website is akin to owning a pet. If the pet is going to be healthy and a good companion, it needs to be fed and exercised. You also need to clean up its poop.

I am currently a poopmaster for three websites: this site artchester.net, our Maui condo website maui114.net, and the Honokeana Cove vacation rental site honokeana.net. None of these is supported by an Information Technology department with dozens of brilliant programmers. The IT department for the first two consists of Art Chester, and on honokeana.net I share that honor with fellow condo owner Phil Wolken.

Different websites attract different problems, as I will explain using these websites as examples.

Spam Offers to Domain Holders

When you register an internet domain, you immediately start receiving new kinds of e-mail spam. This happens because ICANN (the Internet Corporation for Assigned Names and Numbers) requires you to provide your e-mail and postal address, which are then published in the public WHOIS directory. That is like putting your name on the Send-Me-Spam-Forever e-mail list.

Most of the e-mail spam you and I receive has nothing to do with being a domain owner. The ads for diet pills and sexual enhancers are harmless chaff so long as we don’t click any links. More of a risk are spoofs trying to fool us into downloading a virus, revealing bank log-in information, or claiming a share of an imaginary Nigerian bank account.

Domain ownership, however, attracts several additional types of spam e-mail:
– Official-looking invoices for domain hosting or services such as SEO (search engine optimization) that I never signed up for;
– Illiterately composed offers to improve my website, coming from people with no Internet storefront or verifiable address;
– Requests to add links to someone’s website to my blog – something that I’m willing to do when the link is relevant, but the requests I get are usually far afield.

Once in a while there’s a message that looks as if a human being actually thought about it and composed it, but most of the time it’s an obvious form letter, not even addressed to me by name.

It’s possible to pay an extra fee to your domain hosting company to have “private registration,” in which your personal information is not published. However, that will not keep you from receiving this kind of spam mail addressed to webmaster@artchester.net or info@artchester.net. If receiving mail from strangers seems too risky, perhaps you’re not the person to have a website. Or an e-mail address?

Hackers Trying to Seize Your Website

OK, so now you own a domain. Once you start using it, by posting a page on it, you have a website. And when you have a website, hackers want to sign on to your account. They may try to sign in at your domain hosting company, or sign in as if they are the webmaster for your website.

Hackers have many reasons to impersonate you in this way:
– If you were lucky enough to get a highly-desired domain name (such as mauiguest.com), people want to take ownership of your domain so they can sell it at one of the online auctions;
– They want to change the sign-on credentials so you can’t get into your own website, then charge you ransom (“security assistance fee”) to get it back again;
– They want to add pages and posts, junk of their choosing;
– They want to have fun at your expense;
– They want to place political or offensive ads on your home page.

Here are three lines of defense against website hackers:
– Change the Administrator name from the default value (often “admin”) to something else; that something else should not be your name or e-mail address or anything easy to guess, or hackers will use that and then start trying to guess your password. I changed my sign-in name at the beginning yet 254 people still tried to sign in as “admin” or “ArtChester” until I put a blanket block on all unregistered user names.
– Change the password to a “strong” password; it should not include any word found in the dictionary, and should be different from any other password you use.
– If you are using WordPress, use comprehensive security plug-ins; among other things, they can block repeated attempts to guess your password and can hide your sign-in page by renaming it.
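The sign-in defenses above boil down to two checks: refuse unknown user names outright, and stop accepting guesses after too many failures. Here is a minimal Python sketch of that logic. It is not how any particular WordPress security plug-in works; the limits are hypothetical, and the password check itself is left out (the caller passes in whether the password matched).

```python
import time
from collections import defaultdict, deque

MAX_FAILURES = 5          # hypothetical limit on failed attempts...
WINDOW_SECONDS = 15 * 60  # ...within this many seconds

class LoginGuard:
    """Toy sign-in guard: blocks unregistered user names and locks out
    a user name after too many recent failed passwords."""

    def __init__(self, registered_users, clock=time.monotonic):
        self.registered = set(registered_users)
        self.clock = clock
        self.failures = defaultdict(deque)  # user name -> failure times

    def _recent_failures(self, user):
        q = self.failures[user]
        cutoff = self.clock() - WINDOW_SECONDS
        while q and q[0] < cutoff:
            q.popleft()  # forget failures older than the window
        return len(q)

    def attempt(self, user, password_ok):
        """Return 'blocked', 'locked', 'failed', or 'ok'."""
        if user not in self.registered:
            return "blocked"  # blanket block on unregistered names
        if self._recent_failures(user) >= MAX_FAILURES:
            return "locked"   # even a correct password waits out the window
        if not password_ok:
            self.failures[user].append(self.clock())
            return "failed"
        return "ok"
```

A bot hammering away at “admin” never gets as far as a password check, and one guessing at a real user name gets only five tries per quarter hour.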

Spam Contact Messages

Most websites provide a way to contact the site owner such as a comment or contact form. If you publish a simple comment form, it’s like honey to ants: it will attract spam messages that will flood your In Box.

Fortunately, spam contact messages can be almost totally blocked by adding a CAPTCHA field to the contact form. There are plug-in apps to perform this task.
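A CAPTCHA works because it asks for something a person can supply easily but a simple form-filling bot cannot. The Python below is a deliberately simple arithmetic challenge, with invented function names; the CAPTCHA plug-ins use stronger image or behavioral tests. Because there are only seventeen possible answers here, a bot could simply guess, so treat this as an illustration of the principle, not something to deploy.

```python
import hashlib
import hmac
import random

SECRET = b"replace-with-a-site-secret"  # hypothetical server-side key

def _sign(answer):
    return hmac.new(SECRET, answer.encode(), hashlib.sha256).hexdigest()

def make_challenge(rng=random):
    """Return (question, token). The token, sent along with the form,
    lets the server check the answer later without storing it."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"What is {a} plus {b}?", _sign(str(a + b))

def check_answer(user_answer, token):
    """True if the visitor's answer matches the signed token."""
    return hmac.compare_digest(_sign(user_answer.strip()), token)
```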

To Comment or Not to Comment? That is the Question…

There’s one class of spam that does not affect every website: spam comments. If your posts are set up to allow comments, they will also attract spam comments.

Some websites don’t need a commenting function. The website maui114.net describes Honokeana Cove and our condo, and also includes feature stories about Hawaiian culture and life. The posts don’t change much from month to month so they do not have comments enabled. If people want to comment, they can use the contact form to reach us. This is a relatively small website: although it contains many photos, there are only 31 total pages.

Honokeana.net is a transitional type of website from the standpoint of comments. At present, it contains 97 web pages. About half of the pages have significant content; the other half serve as frames for videos that show each individual condo.

Phil and I set up honokeana.net so that most of its content would not change too often, to reduce the demands on our rental staff. However, a website that never changes is b‑o‑r‑i‑n‑g. So we added three continually changing features:
– Phil installed a web cam at Honokeana Cove that shows live images of people having fun in paradise. Making the cam work and broadcast for an affordable price is a whole lot more trouble than you might think. I give him full credit for an excellent accomplishment!
– A menu item that links to Honokeana Cove’s Facebook page.
– TripAdvisor Certificates of Excellence on the Honokeana Cove website that link to current TripAdvisor reviews of the Cove.

Phil and I would like to go one step better: Su writes a monthly newsletter for our condo owners, and at least half of its content would be interesting to our vacation guests too. We have discussed adding a page to the website with all the manager’s newsletters; since the letters contain up-to-date news, readers will want to add comments. And if we permit comments, we will have to be prepared to defend against spam comments.

Compared with these two examples, artchester.net is a comment-rich website. It currently contains 1,846 web pages with 226 posts and 400 comments. Thus it is a juicy target for spam commenters.

Why Do Spam Comments Exist?

It may not be obvious why someone would want to post spam as a comment on someone’s website except to annoy. But mischief is not the usual motivation: it’s money.

When you enter a query in Google or another search engine, most searches return a huge number of responses: for example, a Google search on “striped cat” yielded 38,300,000 results. If you’re selling tabbies, there’s a lot of competition out there! You can probably sell a lot more cats if your website appears on the first page of Google search results, preferably high on the page.

Search engines sort their results in hopes of making them as useful as possible to the searcher. If one of the websites on the list is linked to by many other sites, the search algorithm may give it a higher ranking. So if you are selling a product or service, you would like to have many people post links to your website. In fact, even if you don’t sell anything, you might have ads on your website that earn you income when they are clicked; or you may be pumping up your website’s visibility so that you can sell it for more money.
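The idea that inbound links raise a page’s ranking goes back to the original PageRank algorithm. Google’s current ranking method is far more complex and not public, so this Python sketch only illustrates the classic idea on a made-up four-site web: each page passes its score along to the pages it links to, and pages with more inbound links end up with higher scores.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Classic PageRank by power iteration. `links` maps each page to
    the pages it links to; every linked page must also be a key."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new[target] += share
            else:
                # a page with no outgoing links spreads its score evenly
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical sites: three of them link to the cat seller.
web = {
    "cats-for-sale": ["blog-a"],
    "blog-a": ["cats-for-sale"],
    "blog-b": ["cats-for-sale"],
    "blog-c": ["cats-for-sale", "blog-a"],
}
scores = pagerank(web)
print(max(scores, key=scores.get))  # prints cats-for-sale
```

This is exactly why link spam pays: every extra inbound link, even from a stranger’s comment section, nudges the target’s score upward.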

Where there is a need, there is someone willing to satisfy that need, for a fee. You can pay people for search engine optimization in order to raise your website in the search rankings. Those people may use either legitimate or deceptive means to accomplish that for you. One of the “black hat” techniques is to insert a web address within comments on a huge number of different websites. Those spam comments are generated not by humans, even minimum wage humans, but by computer programs – “bots.”

Valid human-written comments may or may not contain internet links; however, every spam comment that I have seen contains at least one link.
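That observation suggests a cheap first filter. WordPress’s Discussion settings, for example, can hold a comment in the moderation queue when it contains a chosen number of links or more. Here is a rough Python equivalent; the regular expression and threshold are illustrative choices, not WordPress’s own code.

```python
import re

# Rough pattern for http(s) URLs or bare "www." addresses in comment text.
LINK_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)

def count_links(comment_text):
    return len(LINK_PATTERN.findall(comment_text))

def needs_moderation(comment_text, max_links=0):
    """Hold the comment if it has more than max_links links.
    With max_links=0, any comment containing a link is held."""
    return count_links(comment_text) > max_links
```

Holding every comment with a link would catch all the spam described here, at the cost of also holding legitimate comments that happen to include one.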

Spam Commenters

Fortunately, there are ways to fend off the river of spam trying to flow into your webpages. In addition to attracting 400 legitimate comments, my 226 blog posts have so far received 632 spam comments. All but one were intercepted as spam by my WordPress plug-ins, which include Akismet and Conditional CAPTCHA. I have all comments held for moderation unless the author has a previously approved comment; the one spam comment that slipped through was held for my review so I was able to delete it before it posted.
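The moderation policy just described amounts to a short decision rule: spam filters first, then publish only for authors with an approved comment on record, and hold everything else. In this Python sketch the spam verdict stands in for whatever a filter such as Akismet reports, and the function and argument names are hypothetical.

```python
def moderate(author_email, is_spam, approved_authors):
    """Decide what happens to a new comment:
    'spam'    - flagged by the spam filter; kept in the spam queue for review
    'publish' - author already has an approved comment
    'hold'    - everyone else waits for the site owner."""
    if is_spam:
        return "spam"
    if author_email.lower() in approved_authors:
        return "publish"
    return "hold"
```

The one spam comment that got past the filters came from an unknown author, so it landed in the “hold” branch and never appeared on the site.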

I was receiving several times this many spam comments until I modified my website so that comments are disabled one month after I publish an article. Readers who want to comment after that date have to send me a note through the contact form, an extra step that bots are not programmed to handle.
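WordPress offers a Discussion setting that closes comments automatically on articles older than a set number of days. The rule itself is a simple date comparison; this Python sketch assumes a 30-day window as a stand-in for “one month.”

```python
from datetime import datetime, timedelta

CLOSE_AFTER = timedelta(days=30)  # assumed stand-in for "one month"

def comments_open(published, now):
    """True while the article is younger than CLOSE_AFTER."""
    return now - published < CLOSE_AFTER
```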

I thought you might be interested to know the source and content of these comments so I analyzed as many of them as I could capture: 396 examples. Here is what I found:

Popular Subjects. Some of my blogs attracted much more spam than others. Eleven blogs had ten or more spam comments. The top winners of this dubious popularity contest and their numbers of spam comments are:
– Crazy for Nuts (29)
– Fish Invaders 1 (29)
– Snorkel Skills (27)
– Metabolic Rate (26)
– Cave Painting Women (26)

Why these blogs and not others? I’ve looked at the tags and keywords and in most cases I can’t see why these particular blogs got so many comments. Perhaps they were randomly selected by the spambots. However, in the case of the Metabolic Rate blog the comments appear to be targeted. This spam, received just in the last two weeks, comes largely from usernames containing words (garcinia, bodybuilding, testosterone, weight) that are related to some of the tags (diet, exercise, metabolism, weight) that I assigned to that article. So the bots are exercising at least some intelligence.
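Per-post counts like the ones above take only a few lines once the spam comments are exported. This Python sketch assumes a hypothetical list of (post title, comment text) pairs and reports the posts with at least ten spam comments, most-spammed first.

```python
from collections import Counter

def spam_by_post(spam_records, min_count=10):
    """Count spam comments per post; return (title, count) pairs for
    posts with at least min_count, most-spammed first."""
    counts = Counter(title for title, _comment in spam_records)
    return [(title, n) for title, n in counts.most_common() if n >= min_count]
```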

Countries. So far as I know, it’s not possible to determine where the spammers live and operate. They are probably using other people’s computers to send their spam. What we do know is the IP address of the spamming computer and we can easily find out what country that address is registered to.
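Looking up the country for an IP address means finding which registered address block contains it. Real lookups use a geolocation database or the regional registries’ records; this Python sketch uses the standard `ipaddress` module with a tiny, made-up table. The blocks are ranges reserved for documentation, so the country labels attached to them are purely hypothetical.

```python
import ipaddress

# Hypothetical registry table. These blocks are reserved for documentation
# (RFC 5737), so the country codes attached to them are invented.
BLOCKS = [
    (ipaddress.ip_network("192.0.2.0/24"), "US"),
    (ipaddress.ip_network("198.51.100.0/24"), "CN"),
    (ipaddress.ip_network("203.0.113.0/24"), "FR"),
]

def country_of(ip_string):
    """Return the country of the block containing the address,
    or None if no block in the table matches."""
    addr = ipaddress.ip_address(ip_string)
    for network, country in BLOCKS:
        if addr in network:
            return country
    return None
```

Note that this reports where an address is registered, not where the person controlling the computer sits.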

Over the last few years I thought I noticed a shift from foreign IPs to domestic ones. To see whether my impression was borne out by facts, I divided the spam into two approximately equal batches, before and after year-end 2014. Here is what I discovered:

[Chart: spam comments by country of origin, before and after year-end 2014]

This is my guess: so much spam was originating from China (and surprisingly, France) that website owners installed plug-ins to filter out comments originating from overseas, especially China. This motivated the spammers to get busy and hack a lot of computers in the U.S. so that their comments would not get blocked by the country filters. My spam still comes from hacked computers in as wide a variety of countries as ever, but looking at country of origin is now useless for blocking spam comments.

Repeated IP Addresses. Some of the security plug-ins have the ability to automatically block IP addresses that repeatedly submit spam comments to a website. However, IP addresses are rarely repeated, as the following data will show.

I sorted spam comments by IP address and found the following:
– Total spam comments surveyed: 396
– Number from unique (never repeated) IP addresses: 329 (83%)
– Number from repeated IP addresses, only in the month of December 2013: 51 (13%)
– All other spam comments from repeated IP addresses: 16 (4%)

Obviously, blocking repeated IP addresses is not an efficient way to filter out spam. The big bulge of repeated addresses in December 2013 all came from China, except for two from Ukraine. It’s as if a spammer using Chinese computers ran an experiment to see whether websites would automatically block spam comments from repeated addresses. And presumably the spammer gave up on the idea, deciding that unique addresses would have a better chance of getting through the spam filters.
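Sorting comments by IP address, as in the survey above, amounts to counting how often each address appears. Here is a Python version using `collections.Counter`, assuming the input is simply one IP string per spam comment.

```python
from collections import Counter

def ip_repeat_summary(ips):
    """Split spam comments into those from addresses seen exactly once
    and those from addresses that appear more than once."""
    counts = Counter(ips)
    unique = sum(1 for n in counts.values() if n == 1)
    return {"total": len(ips), "unique": unique, "repeated": len(ips) - unique}
```

Fed the 396 surveyed comments, it would report 329 unique and 67 repeated (51 + 16).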

Something else may also be going on: if you check the public IP address of your home internet connection from time to time, you may find that it changes, for example after your modem restarts. The reason is that your internet service provider (ISP) has a range of assigned addresses and hands them out to subscribers as needed. This dynamic addressing allows an ISP to serve more customers than it has addresses, because only a fraction of its retail customers will be using an internet connection at any given time. I suspect that many hacked computers are being used to send spam over and over, but because their ISPs use dynamic addressing, their IP addresses change frequently. Thus it only appears that spam rarely originates from the same computer.
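The effect of dynamic addressing on spam logs can be shown with a toy model. This Python sketch assumes a hypothetical ISP pool that hands each new connection the next address in rotation; real ISPs use address leases with their own policies, so the details differ, but the outcome is similar: one machine shows up under several addresses.

```python
import ipaddress
from itertools import cycle

class AddressPool:
    """Toy ISP pool: each new connection gets the next address in rotation
    (releasing addresses is not modeled)."""

    def __init__(self, cidr):
        self._next = cycle(ipaddress.ip_network(cidr).hosts())

    def connect(self):
        return str(next(self._next))

pool = AddressPool("203.0.113.0/29")  # documentation range, 6 usable hosts
seen = []
for _ in range(3):
    seen.append(pool.connect())  # the hypothetical hacked PC reconnects
    pool.connect()               # another subscriber connects in between
print(seen)  # prints ['203.0.113.1', '203.0.113.3', '203.0.113.5']
```

In a spam log, those three comments would look as if they came from three different computers.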

Other Repeated Features. Is anything else repeated in the spam? Not very often. Most user names are used only a few times, then never again. Similarly, the web addresses that the spammers are trying to insert rarely repeat. And spam arrives at all hours of the day and night.

However, the spirit of the spam comments repeats again and again, using different words. Consider four typical comments, which were submitted on my Metabolic Rate blog:

Thank you for the auspicious writeup. It in fact was a amusement account it. Look advanced to more added agreeable from you! However, how could we communicate?

Hmm it looks like your site ate my first comment (it was extremely long) so I guess I’ll just sum it up what I submitted and say, I’m thoroughly enjoying your blog. I as well am an aspiring blog blogger but I’m still new to the whole thing. Do you have any tips for rookie blog writers? I’d genuinely appreciate it.

Wow! After all I got a web site from where I be able to in fact obtain valuable facts concerning my study and knowledge.

Great beat ! I wish to apprentice even as you amend your website, how could I subscribe for a weblog website? The account helped me a appropriate deal. I have been tiny bit familiar of this your broadcast offered vibrant clear concept

None of the authors appears to have even glanced at the subject matter of the blog, and their tone is obnoxiously fawning. In addition, virtually every comment contains errors in grammar or word usage.

At least one spam comment arrives almost daily. With proper filters in place, the spam is collected in a holding file until I can get to it. I review it every day or two to make sure that a legitimate comment was not put there by mistake.

Non-Spam Website Maintenance

Not all of webmastering involves battling spammers. Some of it involves dusting and straightening up. Every time WordPress tweaks its software, each of the two dozen plug-ins that I use tweaks its software, as do the “themes” I use. The result is that almost every day one of these needs to have an update installed.

Posting a blog generates a sequence of other types of work. First I research the subject, collecting both news articles and original research publications. If I believe that I have something to contribute, I write an article and upload it to my website as a draft. I find and add suitable photos and Internet links. Then I preview the blog, testing every link and fine-tuning the text. Finally, I publish the article. At that point I am almost done: I verify that the article looks OK on my website; I check that it exported to social media (Facebook, LinkedIn, Twitter and Google Plus); and I add it to my Topic Index page.

Once the article has been published it enters the archives. However, that doesn’t mean I can ignore it. ArtChester.net has gradually increasing traffic. There are more and more articles, and each existing article attracts more and more visitors. To keep the website from going stale, the archives have to be maintained.

In a few cases I go back to add information or rewrite an article. However, there’s another kind of maintenance that is much more common. I make a point of including lots of Internet links in my blogs, and a surprising number of these links go out of date: websites are abandoned, websites are reorganized, and in some cases articles I reference simply disappear. You might say that while I have been away battling the purveyors of spam, the infrastructure back home has been breaking down – high winds have knocked down cable lines and some of my connections to the world have gone dead.

Fortunately, I don’t have to manually check every one of the thousands of links in my past blogs. I use an online tool (brokenlinkcheck.com) to scan all three of the websites that I handle. The service tests every link to see whether it goes to a live webpage and generates a list of those that show a problem. Maui114.net and honokeana.net don’t contain many links, but ArtChester.net has so many links that typically 20 or 30 of them go bad every couple of months.
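The core of what a link-checking service does is simple: pull every link out of a page and ask each one for a response. This Python sketch uses only the standard library and is not how brokenlinkcheck.com works internally. It assumes you already have the page’s HTML, and it treats any HTTP error, network failure, or status of 400 or higher as broken.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag that points to an http(s) URL."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.startswith(("http://", "https://")):
                self.links.append(href)

def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.links

def is_broken(url, timeout=10):
    """Send a HEAD request. Some servers reject HEAD, so a real checker
    might retry with GET before declaring a link dead."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status >= 400
    except (OSError, ValueError):  # HTTPError/URLError/timeouts, bad URLs
        return True

def broken_links(html):
    return [url for url in extract_links(html) if is_broken(url)]
```

Running `broken_links` over each page of a site produces the kind of to-do list described next.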

Once I have a list of nonresponsive links I look up each article that contains one of those links and update the link. Sometimes I can find the same reference, moved to another address; sometimes I can find an equivalent source to link to; sometimes I have to re-write the text to not require that link; and a few times I have had to upload my PDF copy of the missing article and let my blog link to that file. It takes most of a day to fix all the links, but it gives me a good feeling that I’m providing a more useful source for readers.

I hope you’ve found this stroll through spam land interesting. Perhaps you need a website of your own? If not, you’re always welcome to guest-post here!

Have you ever read an article online and seen spam comments below the article that contained junk? Was the accumulated junk ever so bad that you gave up reading in disgust?

Image Credits:
– Notre Dame Cathedral grotesque / gargoyle, photo by John Cornellier at English Wikipedia
– “Laptop” by metalmarious on openclipart.org

Comments

Spam Wars in the Blogoverse — 5 Comments

  1. My email rules on my iMac closely parallel what Charles has set up. Works well.

    I greatly appreciate all the work Art puts into maintaining this blog and website. It’s a breath of fresh air (Hawaiian?) in a messy digital world. I think we all, of a certain age, want to maintain some creativity, and he’s chosen a very substantial means for doing so. Because of the challenges he lists, I’ve chosen a much simpler model for digital creativity: I periodically (e.g., monthly) post short videos on YouTube (type in “rha90272”) regarding adventures with my O-gauge model train layout. I get occasional supportive comments, but don’t have to deal with the spam that Art mentions (perhaps YouTube is doing some effective filtering). So I post content, but don’t enter into textual dialogs that can wander into the spam universe. Chacun à son goût (to each his own)…

    • Bob, thanks much for your blog! I went to your YouTube channel and loved your “ode” to model railroading (https://www.youtube.com/watch?v=3r8XODbm_CU). I see you have embraced a world whose possibilities are endless – in fact, anything that you choose to make them!

      I’m glad to hear that YouTube doesn’t attract the steady diet of spam that blog sites do. I placed an appreciative comment on your video but also used my comment to run an experiment: I included a link to artchester.net to see whether YouTube would suppress it. YouTube in fact posted the comment, with the link (I then edited the comment to remove the link so I would not be spamming). That disproved one hypothesis, that YouTube blocks links and for that reason does not attract spam. That leaves me with another hypothesis, that perhaps YouTube only accepts comments from registered users, and that would discourage spammers from posting. I couldn’t see an easy way of testing this hypothesis.

      So thank you also for helping me see that YouTube is running a protected world, where in fact people can maintain a presence without a lot of spam overhead. A provocative thought…

  2. Art, regarding your comment:

    “I fear that if I sorted my mail into folders I would not have the discipline to open them and review them.”

    My mail rule leaves the message “unread” when it moves a message (matching one of my mail rules) to a subordinate folder. That means that a red “badge” shows up on the mail folder which has received that new message via a mail rule (indicating there is at least one unread message waiting in that folder), so I don’t have to exert any discipline to check all the folders. The badge gets my attention, and I only have to open those folders which have gotten new messages moved into them by a mail rule and are flagged in that way. It’s a pretty painless way to sequester some messages away from the Inbox, and is what I would have done anyway (manually, without using a mail rule) because I usually want to save those messages semi-permanently. Using a mail rule to automatically move messages like that saves doing it manually, and also guarantees that I won’t overlook some important message that might otherwise be buried in spam messages in my Inbox.

    Of course, I haven’t even touched on the issue of having phones and tablets which share access with one or more computers to mail accounts. I’ve simplified my life there too so spam doesn’t overwhelm my poor phone, by limiting my phone so it only sees the one mail account associated with my “most private” email address. It’s my other mail accounts that are flooded with spam, by comparison, so if my phone can’t see them … no spam!! That’s a good thing when it comes to smartphones.

  3. Thanks, Charles! Your suggestions on e-mail address management are very helpful!

    I’m also impressed with the thoroughness of your filtering and classifying system. I fear that if I sorted my mail into folders I would not have the discipline to open them and review them. Instead, I use the “guilt” system to push myself – I route everything to a single “In Box” folder, where I feel obliged to deal with it as soon as possible.

    However, I’m unable to get all my mail into that one folder. I turned off the spam filters on my Caltech address but have been unable to turn them off at Yahoo, which handles my SBCglobal address. The best I have been able to do is to force Yahoo to download the supposed spam to my computer so at least I can handle them off-line. So I have to deal with both “In Box” and “Bulk Mail” (SBCglobal’s term).

    Yahoo’s spam filter is not only impossible (for me) to turn off, it’s not very smart. Important messages are regularly classified as spam by Yahoo, such as notices from Delta Airlines that it’s time to check in for a flight. And phishing messages pretending to be from PayPal or GoDaddy sail right through the Yahoo spam filter.

    You’re right, this digital age brings with it many new challenges!

  4. Having spent my career in the IT field, I have a greater appreciation for the lengths Art is going to behind the scenes to keep all of his sites “useful” to his community. Granted, the problems he is dealing with in today’s world are an order of magnitude larger than what I had to deal with years ago just because the extent and sophistication (not to mention international access) of the Internet has grown since then.

    One thing Art doesn’t mention is worth stressing. If you are going to have a “public” persona that you wish to maintain, either in a website or a Facebook page, you must deal with the spam flood that will result from your address appearing in a public place, because there are “bots” which crawl the Internet “harvesting” such nuggets of gold. But it’s worse than that, because almost every business you communicate with periodically sells or shares its client list with other companies, who in turn share with others. Once you have given one business your email address, you have to assume everyone knows it (much like the old saw that a secret shared is no longer a secret).

    To deal with that, there are two useful strategies. One is to use multiple email addresses, and use those to separate friends & relatives from other public use. The other is to understand the distinctions between a whitelist, blacklist, and greylist with respect to mail-handling rules within your mail client application.

    Using multiple email addresses isn’t that hard — some Internet Service Providers (ISPs) that bring Internet service to your house will give you multiple addresses, and allow you to change them periodically. I prefer not to take advantage of that option because if you ever change ISPs those addresses will disappear as soon as you move, which will cause you to suddenly have to contact multiple people with the new address. There are several “free” services such as Yahoo and Gmail which you can use to create email addresses anytime you need them, and some such as Earthlink that will charge you a modest monthly fee to maintain addresses like that.

    Jealously guard your own private email address that you use with your closest friends and never permit them to post your address in some public forum. Use a slightly less stable address for your ongoing business affairs. And keep one or more addresses for unimportant businesses that you don’t care about and don’t trust not to share with others. You can periodically “clean house” by simply deleting one of those temporary addresses, much like giving a dog a medicated bath to kill all the fleas at once.

    The other approach, using mail-handling rules, is important if you are like Art or myself and get a lot of email. Using relatively simple rules (which are built into most modern mail systems, such as the one on the Macintosh) you can automate the handling of certain messages based on message-specific information such as the sender address or keywords in the title. These rules are most useful when it comes to either deleting the message outright so you never see it, or shuffling off the messages to a folder in your mail system.

    Some mail systems such as the one on the Mac have more sophisticated rules which make things a step easier. For example, I can set up in my Contacts app all the businesses I deal with regularly, creating a category to hold all their cards (such as Businesses). Then I can tie a single mail rule to say that if an incoming message matches any of those businesses, to shunt that message off to a special Businesses folder in my mail system. Within the mail system, I can set up a Smart Folder (perhaps titled Bank) which is a logical partition of the Businesses folder so it only shows messages from one or more banks I might be patronizing.

    In my case, my mail rules are set up to keep my Inbox empty … my rules act on EVERY message and after taking action, that message no longer exists in my Inbox. Those rules are set up to classify every message into one of the following categories.

    A “whitelist” is a known-good (blessed) message, such as from a relative or a business with whom you know you have a relationship. In my case, I have as many mail rules as I have destination folders for such messages, diverting the messages off to the appropriate folder so I can look at them when I’m ready (so there is a Family folder, a Friends folder, and several folders titled with a descriptive name, such as my investment company).

    A “Blacklist” is the opposite … if the message matches any of the specific attributes I have predefined (such as the sender’s email address, or a word like Viagra in the title) then I delete the message immediately and never see it.

    A “Greylist” is anything that doesn’t fit one of the two categories above, so is a default trapdoor which will clear out the Inbox and move all such messages somewhere else for later review. In my case, I have a folder named Junk to which all unclassified messages are moved, and their “unread mail” flag is turned off so these messages don’t draw my attention. Several times a day I open the Junk folder, scan through the waiting messages, and clean it out. I start by selecting ALL the messages in the Junk folder and then unhighlight specific messages from that list that I actually wish to read, then hit the Delete All key (which whisks all the true-junk away and leaves behind only the unselected messages). I then handle the remaining messages one at a time, either reading-and-deleting or reading-and-filing into a mail folder.
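The whitelist/blacklist/greylist scheme described here can be sketched as a single routing function. This Python version is a hypothetical illustration, not Apple Mail’s rule engine; the sender addresses, blocked words, and folder names are invented examples.

```python
WHITELIST = {                      # known-good sender -> destination folder
    "sister@example.com": "Family",
    "alerts@examplebank.com": "Businesses",
}
BLACKLIST_SENDERS = {"promo@spam.example"}
BLACKLIST_WORDS = {"viagra"}

def route(sender, subject):
    """Return the folder for a message, or None to delete it outright.
    Every message leaves the Inbox: whitelist first, then blacklist,
    and anything unclassified falls through to the Junk (greylist) folder."""
    sender = sender.lower()
    if sender in WHITELIST:
        return WHITELIST[sender]
    words = set(subject.lower().split())
    if sender in BLACKLIST_SENDERS or words & BLACKLIST_WORDS:
        return None
    return "Junk"
```

Checking the whitelist first means a trusted sender is never deleted by an overly broad blacklist word.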

    As in Art’s “pet” analogy, steps like these are necessary care-and-feeding (and poop-patrol) required in today’s world even for those without a website to manage.