When he said something like that, General Jack D. Ripper was ranting at the (to him) very visible international Communist conspiracy. However, today everyone who wants to be visible on the Internet must battle invisible forces. Those forces cannot be nuked into submission because we don’t know where to drop the bomb. They could be armies of Russian nerds or Chinese geeks. But they could also be that snotty kid who lives in your neighborhood.
Today’s blog relates my hand-to-hand combat with the Satanic Empire as I strive to keep this and my other websites open and free from pollution.
Spam as Poop
The digital age brought us instant communication and access to data. It also has forced us to view every e-mail suspiciously and to guard our identities.
I will modestly state that I am not a deep-dyed expert in the web world. Being a geezer, I would be excused if I knew practically nothing about the Internet. But I am also a writer – I want to inform and entertain, and I think I sometimes have something worth saying. That’s why I started artchester.net, which in turn threw me into the battlezone of website defense.
As I have mentioned before, owning a website is akin to owning a pet. If the pet is going to be healthy and a good companion, it needs to be fed and exercised. You also need to clean up its poop.
I am currently a poopmaster for three websites: this site artchester.net, our Maui condo website maui114.net, and the Honokeana Cove vacation rental site honokeana.net. None of these is supported by an Information Technology department with dozens of brilliant programmers. The IT department for the first two consists of Art Chester, and on honokeana.net I share that honor with fellow condo owner Phil Wolken.
Different websites attract different problems, as I will explain using these websites as examples.
Spam Offers to Domain Holders
When you register an internet domain, you immediately start receiving new kinds of e-mail spam. This happens because you are required to provide your e-mail and address to ICANN, the Internet Corporation for Assigned Names and Numbers, which is like putting your name on the Send-Me-Spam-Forever e-mail list.
Most of the e-mail spam you and I receive has nothing to do with being a domain owner. The ads for diet pills and sexual enhancers are harmless chaff so long as we don’t click any links. More of a risk are spoofs trying to fool us into downloading a virus, revealing bank log-in information, or claiming a share of an imaginary Nigerian bank account.
Domain ownership, however, attracts several additional types of spam e-mail:
– Official-looking invoices for domain hosting or services such as SEO (search engine optimization) that I never signed up for;
– Illiterately composed offers to improve my website, coming from people with no Internet storefront or verifiable address;
– Requests to add links from my blog to someone’s website – something that I’m willing to do when the link is relevant, but the requests I get are usually far afield.
Once in a while there’s a message that looks as if a human being actually thought about it and composed it, but most of the time it’s an obvious form letter, not even addressed to me by name.
It’s possible to pay an extra fee to your domain hosting company to have “private registration,” in which your personal information is not published. However, that will not keep you from receiving this kind of spam mail addressed to firstname.lastname@example.org or email@example.com. If receiving mail from strangers seems too risky, perhaps you’re not the person to have a website. Or an e-mail address?
Hackers Trying to Seize Your Website
OK, so now you own a domain. Once you start using it, by posting a page on it, you have a website. And when you have a website, hackers want to sign on to your account. They may try to sign in at your domain hosting company, or sign in as if they are the webmaster for your website.
Hackers have many reasons to impersonate you in this way:
– If you were lucky enough to get a highly-desired domain name (such as mauiguest.com), people want to take ownership of your domain so they can sell it at one of the online auctions;
– They want to change the sign-on credentials so you can’t get into your own website, then charge you ransom (“security assistance fee”) to get it back again;
– They want to add pages and posts, junk of their choosing;
– They want to have fun at your expense;
– They want to place political or offensive ads on your home page.
Here are three lines of defense against website hackers:
– Change the Administrator name from the default value (often “admin”) to something else; that something else should not be your name or e-mail address or anything easy to guess, or hackers will use that and then start trying to guess your password. I changed my sign-in name at the beginning yet 254 people still tried to sign in as “admin” or “ArtChester” until I put a blanket block on all unregistered user names.
– Change the password to a “strong” password; it should not include any word found in the dictionary, and should be different from any other password you use.
– If you are using WordPress, use comprehensive security plug-ins; among other things, they can block repeated attempts to guess your password and can hide your sign-in page by renaming it.
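For readers who like to see an idea in code, here is a minimal Python sketch of two of the defenses above: a blanket block on unregistered user names, and a lockout after repeated failed password guesses. It is only an illustration, not how any particular WordPress plug-in works. The class name, the thresholds and the sample user name are all my own assumptions.

```python
import time
from collections import defaultdict, deque


class LoginThrottle:
    """Illustrative sign-in guard (hypothetical, not a real plug-in's API)."""

    def __init__(self, max_failures=5, window_seconds=600, allowed_users=()):
        self.max_failures = max_failures      # assumed threshold
        self.window_seconds = window_seconds  # assumed look-back window
        self.allowed_users = set(allowed_users)
        self.failures = defaultdict(deque)    # IP address -> times of recent failures

    def is_blocked(self, ip, username, now=None):
        now = time.time() if now is None else now
        # Blanket block: guesses such as "admin" never reach the password check.
        if username not in self.allowed_users:
            return True
        recent = self.failures[ip]
        # Forget failures that are older than the look-back window.
        while recent and now - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) >= self.max_failures

    def record_failure(self, ip, now=None):
        now = time.time() if now is None else now
        self.failures[ip].append(now)
```

In use, the sign-in page would call is_blocked() before checking a password and record_failure() after each wrong guess. As the IP address data later in this post suggests, counting failures per IP address is only a partial defense, because attackers rarely reuse an address.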
Spam Contact Messages
Most websites provide a way to contact the site owner such as a comment or contact form. If you publish a simple comment form, it’s like honey to ants: it will attract spam messages that will flood your In Box.
Fortunately, spam contact messages can be almost totally blocked by adding a CAPTCHA field to the contact form. There are plug-in apps to perform this task.
To Comment or Not to Comment? That is the Question…
There’s one class of spam that does not affect every website: spam comments. If your posts are set up to allow comments, they will also attract spam comments.
Some websites don’t need a commenting function. The website maui114.net describes Honokeana Cove and our condo, and also includes feature stories about Hawaiian culture and life. The posts don’t change much from month to month so they do not have comments enabled. If people want to comment, they can use the contact form to reach us. This is a relatively small website: although it contains many photos, there are only 31 total pages.
Honokeana.net is a transitional type of website from the standpoint of comments. At present, it contains 97 web pages. About half of the pages have significant content; the other half serve as frames for videos that show each individual condo.
Phil and I set up honokeana.net so that most of its content would not change too often, to reduce the demands on our rental staff. However, a website that never changes is b‑o‑r‑i‑n‑g. So we added three continually changing features:
– Phil installed a web cam at Honokeana Cove that shows live images of people having fun in paradise. Making the cam work and broadcast for an affordable price is a whole lot more trouble than you might think. I give him full credit for an excellent accomplishment!
– A menu item that links to Honokeana Cove’s Facebook page.
– TripAdvisor Certificates of Excellence on the Honokeana Cove website that link to current TripAdvisor reviews of the Cove.
Phil and I would like to go one step better: Su writes a monthly newsletter for our condo owners, and at least half of its content would be interesting to our vacation guests too. We have discussed adding a page to the website with all the manager’s newsletters; since the letters contain up-to-date news, readers will want to add comments. And if we permit comments, we will have to be prepared to defend against spam comments.
Compared with these two examples, artchester.net is a comment-rich website. It currently contains 1,846 web pages with 226 posts and 400 comments. Thus it is a juicy target for spam commenters.
Why Do Spam Comments Exist?
It may not be obvious why someone would want to post spam as a comment on someone’s website except to annoy. But mischief is not the usual motivation: it’s money.
When you enter a query in Google and other search engines, most searches return a huge number of responses: for example, a Google search on “striped cat” yielded 38,300,000 results. If you’re selling tabbies, there’s a lot of competition out there! You can probably sell a lot more cats if your website appears on the first page of Google search results, preferably high on the page.
Search engines sort their results in hopes of making them as useful as possible to the searcher. If one of the websites on the list is linked to by many other sites, the search algorithm may give it a higher ranking. So if you are selling a product or service, you would like to have many people post links to your website. In fact, even if you don’t sell anything, you might have ads on your website that earn you income when they are clicked; or you may be pumping up your website’s visibility so that you can sell it for more money.
Where there is a need, there is someone willing to satisfy that need, for a fee. You can pay people for search engine optimization in order to raise your website in the search rankings. Those people may use either legitimate or deceptive means to accomplish that for you. One of the “black hat” techniques is to insert a web address within comments on a huge number of different websites. Those spam comments are generated not by humans, even minimum wage humans, but by computer programs – “bots.”
Valid human-written comments may or may not contain internet links; however, every spam comment that I have seen contains at least one link.
Fortunately, there are ways to fend off the river of spam trying to flow into your webpages. In addition to attracting 400 legitimate comments, my 226 blog posts have so far received 632 spam comments. All but one were intercepted as spam by my WordPress plug-ins, which include Akismet and Conditional CAPTCHA. I have all comments held for moderation unless the author has a previously approved comment; the one spam comment that slipped through was held for my review so I was able to delete it before it posted.
I was receiving several times this many spam comments until I modified my website so that comments are disabled one month after I publish an article. Readers who want to comment after that date have to send me a note through the contact form, an extra step that bots are not programmed to handle.
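The comment rules described above can be sketched in a few lines of Python. This is a rough illustration under my own assumptions, not the logic of Akismet, Conditional CAPTCHA or WordPress itself. The function name is hypothetical, "one month" is approximated as 30 days, and the extra rule that holds any comment containing a link is an assumption based on my observation that every spam comment I have seen contains one.

```python
import re
from datetime import date, timedelta

# Assumption: "one month" is approximated as 30 days.
COMMENT_WINDOW = timedelta(days=30)
# Crude link detector; real spam filters look at far more than this.
LINK_PATTERN = re.compile(r"https?://|www\.", re.IGNORECASE)


def triage_comment(text, author, published, submitted, approved_authors):
    """Return 'closed', 'hold' or 'publish' for an incoming comment."""
    # Comments are disabled once the article is more than a month old.
    if submitted - published > COMMENT_WINDOW:
        return "closed"
    # First-time commenters are always held for moderation.
    if author not in approved_authors:
        return "hold"
    # Extra assumption: a link earns a second look even from a known author.
    if LINK_PATTERN.search(text):
        return "hold"
    return "publish"


# Example with made-up names and dates.
print(triage_comment("Nice post", "ann", date(2015, 1, 1), date(2015, 1, 5), {"ann"}))
```

The order of the checks matters: closing old posts first means the bot traffic aimed at the archives never reaches the moderation queue at all.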
I thought you might be interested to know the source and content of these comments so I analyzed as many of them as I could capture: 396 examples. Here is what I found:
Popular Subjects. Some of my blogs attracted much more spam than others. Eleven blogs had ten or more spam comments. The top winners of this dubious popularity contest and their numbers of spam comments are:
– Crazy for Nuts (29)
– Fish Invaders 1 (29)
– Snorkel Skills (27)
– Metabolic Rate (26)
– Cave Painting Women (26)
Why these blogs and not others? I’ve looked at the tags and keywords and in most cases I can’t see why these particular blogs got so many comments. Perhaps they were randomly selected by the spambots. However, in the case of the Metabolic Rate blog the comments appear to be targeted. This spam, received just in the last two weeks, comes largely from usernames containing words (garcinia, bodybuilding, testosterone, weight) that are related to some of the tags (diet, exercise, metabolism, weight) that I assigned to that article. So the bots are exercising at least some intelligence.
Countries. So far as I know, it’s not possible to determine where the spammers live and operate. They are probably using other people’s computers to send their spam. What we do know is the IP address of the spamming computer and we can easily find out what country that address is registered to.
Over the last few years I thought I noticed a shift from foreign IPs to domestic ones. To see whether my impression was borne out by facts, I divided the spam into two approximately equal batches, before and after year-end 2014. Here is what I discovered:
This is my guess: so much spam was originating from China (and surprisingly, France) that website owners installed plug-ins to filter out comments originating from overseas, especially China. This motivated the spammers to get busy and hack a lot of computers in the U.S. so that their comments would not get blocked by the country filters. My spam includes as large a variety of hacked countries as ever, but looking at country of origin is now useless for blocking spam comments.
Repeated IP Addresses. Some of the security plug-ins have the ability to automatically block IP addresses that repeatedly submit spam comments to a website. However, IP addresses are rarely repeated, as the following data will show.
I sorted spam comments by IP address and found the following:
– Total spam comments surveyed: 396
– Number from unique (never repeated) IP addresses: 329 (83%)
– Number from repeated IP addresses, only in the month of December 2013: 51 (13%)
– All other spam comments from repeated IP addresses: 16 (4%)
Obviously, blocking repeated IP addresses is not an efficient way to filter out spam. The big bulge of repeated addresses in December 2013 all came from China, except for two from Ukraine. It’s as if a spammer using Chinese computers ran an experiment to see whether websites would automatically block spam comments from repeated addresses. And presumably the spammer gave up on the idea, deciding that unique addresses would have a better chance of getting through the spam filters.
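A tally like the one above takes only a few lines of Python. The sketch below is not the tool I used; the function name is hypothetical and the sample addresses are made up, drawn from the ranges reserved for documentation. It counts how many comments came from an address that never repeats, then checks the percentage arithmetic from my own numbers.

```python
from collections import Counter


def summarize_ips(ips):
    """Count comments from one-time addresses versus repeated addresses."""
    counts = Counter(ips)
    total = len(ips)
    # A comment is "unique" if its IP address appears exactly once in the list.
    unique = sum(1 for ip in ips if counts[ip] == 1)
    return {
        "total": total,
        "unique": unique,
        "repeated": total - unique,
        "unique_pct": round(100 * unique / total) if total else 0,
    }


# Made-up sample: two one-time addresses and one address used twice.
sample = ["198.51.100.1", "198.51.100.2", "198.51.100.2", "203.0.113.9"]
print(summarize_ips(sample))

# The percentages in my survey: 329, 51 and 16 out of 396 comments.
print([round(100 * n / 396) for n in (329, 51, 16)])
```

The last line reproduces the 83%, 13% and 4% shares listed above.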
Something else is also going on: if you check the IP address of your computer connection at home, it’s likely to be different each time you wake up your computer. The reason is that your internet service provider (ISP) has a range of assigned addresses, and hands them out to its subscribers as needed. This dynamic addressing allows the ISP to serve many more customers than it has addresses, because only a fraction of its retail customers will be using an internet connection at any given time. I suspect that many hacked computers are being used to send spam over and over, but because their ISPs use dynamic addressing, their IP addresses change frequently. Thus it only appears that spam rarely originates from the same computer.
Other Repeated Features. Is anything else repeated in the spam? Not very often. Most user names are used only a few times, then never again. Similarly, the web addresses that the spammers are trying to insert rarely repeat. And spam arrives at all hours of the day and night.
However, the spirit of the spam comments repeats again and again, using different words. Consider four typical comments, which were submitted on my Metabolic Rate blog:
Thank you for the auspicious writeup. It in fact was a amusement account it. Look advanced to more added agreeable from you! However, how could we communicate?
Hmm it looks like your site ate my first comment (it was extremely long) so I guess I’ll just sum it up what I submitted and say, I’m thoroughly enjoying your blog. I as well am an aspiring blog blogger but I’m still new to the whole thing. Do you have any tips for rookie blog writers? I’d genuinely appreciate it.
Wow! After all I got a web site from where I be able to in fact obtain valuable facts concerning my study and knowledge.
Great beat ! I wish to apprentice even as you amend your website, how could I subscribe for a weblog website? The account helped me a appropriate deal. I have been tiny bit familiar of this your broadcast offered vibrant clear concept
None of the authors appears to have even glanced at the subject matter of the blog, and their tone is obnoxiously fawning. In addition, virtually every comment contains errors in grammar or word usage.
One or more spam comments arrive almost daily. With proper filters in place, the spam is collected in a holding file until I can get to it. I review it every day or two, to make sure that a legitimate comment was not put there by mistake.
Non-Spam Website Maintenance
Not all of webmastering involves battling spammers. Some of it involves dusting and straightening up. Every time WordPress tweaks its software, each of the two dozen plug-ins that I use tweaks its software, as do the “themes” I use. The result is that almost every day one of these needs to have an update installed.
Posting a blog generates a sequence of other types of work. First I research the subject, collecting both news articles and original research publications. If I believe that I have something to contribute, I write an article and upload it to my website as a draft. I find and add suitable photos and Internet links. Then I preview the blog, testing every link and fine-tuning the text. Finally, I publish the article. At that point I am almost done: I verify that the article looks OK on my website; I check that it exported to social media (Facebook, LinkedIn, Twitter and Google Plus); and I add it to my Topic Index page.
Once the article has been published it enters the archives. However, that doesn’t mean I can ignore it. ArtChester.net has gradually increasing traffic. There are more and more articles, and each existing article attracts more and more visitors. To keep the website from going stale, the archives have to be maintained.
In a few cases I go back to add information or rewrite an article. However, there’s another kind of maintenance that is much more common. I make a point of including lots of Internet links in my blogs, and a surprising number of these links go out of date: websites are abandoned, websites are reorganized, and in some cases articles I reference simply disappear. You might say that while I have been away battling the purveyors of spam, the infrastructure back home has been breaking down – high winds have knocked down cable lines and some of my connections to the world have gone dead.
Fortunately, I don’t have to manually check every one of the thousands of links in my past blogs. I use an online tool (brokenlinkcheck.com) to scan all three of the websites that I handle. The service tests every link to see whether it goes to a live webpage and generates a list of those that show a problem. Maui114.net and honokeana.net don’t contain many links, but ArtChester.net has so many links that typically 20 or 30 of them go bad every couple of months.
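The idea behind such a link checker is simple enough to sketch in Python using only the standard library. This is my own illustration of the general technique, not how brokenlinkcheck.com works. The function names, the User-Agent string and the 10-second timeout are assumptions, and the sketch checks a single page rather than crawling a whole site.

```python
from html.parser import HTMLParser
import urllib.error
import urllib.request


class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from the <a href="..."> tags on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.startswith(("http://", "https://")):
                    self.links.append(value)


def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def is_link_alive(url, opener=urllib.request.urlopen, timeout=10):
    """Send a HEAD request; treat any error or a 4xx/5xx status as a dead link."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-check-sketch"}
    )
    try:
        with opener(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        return False


def find_broken_links(html, opener=urllib.request.urlopen):
    return [url for url in extract_links(html) if not is_link_alive(url, opener)]
```

Some servers refuse HEAD requests even when the page is fine, so a more careful checker would retry with an ordinary GET before declaring a link dead. That is one reason the list of problem links still needs a human to review it.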
Once I have a list of nonresponsive links I look up each article that contains one of those links and update the link. Sometimes I can find the same reference, moved to another address; sometimes I can find an equivalent source to link to; sometimes I have to rewrite the text so that it no longer requires that link; and a few times I have had to upload my PDF copy of the missing article and let my blog link to that file. It takes most of a day to fix all the links, but it gives me a good feeling that I’m providing a more useful source for readers.
I hope you’ve found this stroll through spam land interesting. Perhaps you need a website of your own? If not, you’re always welcome to guest-post here!