Jesse Lawson


Dec 18, 2013 - sysadmin WordPress

WP Engine Hotfix: Preventing Spam and Bad Bot Traffic, Part I

WP Engine counts traffic from “bad” bots (like harvesters and spam bots) the same way it tracks human visitors. Some people have gone to great lengths to explain how this dissatisfied them to the point of leaving WP Engine, but you can take charge of your website’s defense and keep these bots from ever reaching your pages. In this article, I discuss how to find out who is really visiting your blog (raw metrics), how to filter out the “bad” bot traffic, and how to (hopefully) reduce the number of visits WP Engine counts against your plan.

Problem: Bad bot traffic counts against your WP Engine visitors/month limit.

Solution: Log every request for a period of time and deny all traffic from known and probable bots.
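As a rough illustration of the "log, then deny probable bots" step, here is a minimal sketch that reads access-log lines in the common Apache/Nginx "combined" format and returns the IPs that look automated. The bot substrings, the small allowlist for well-behaved search crawlers, and the `max_requests` threshold are all hypothetical placeholders you would tune for your own logging window; user agents are also trivially spoofed, so treat the output as a list of candidates to review, not a verdict.

```python
import re
from collections import Counter

# Apache/Nginx "combined" log format:
#   IP ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Hypothetical substrings that suggest an automated client.
BOT_UA_HINTS = ("bot", "crawler", "spider", "harvest", "scrapy", "python-requests")
# Hypothetical allowlist so well-behaved search crawlers are not flagged.
ALLOWED_BOT_HINTS = ("googlebot", "bingbot")


def suspicious_ips(lines, max_requests=500):
    """Return IPs that look like bots, by user agent or by raw request volume."""
    per_ip = Counter()
    flagged = set()
    for line in lines:
        m = COMBINED.match(line)
        if m is None:
            continue  # skip lines that are not in combined format
        ip, ua = m.group("ip"), m.group("ua").lower()
        per_ip[ip] += 1  # every served request counts, human or not
        if ua in ("", "-"):
            flagged.add(ip)  # a missing user agent is a strong bot signal
        elif any(h in ua for h in BOT_UA_HINTS) and not any(
            h in ua for h in ALLOWED_BOT_HINTS
        ):
            flagged.add(ip)
    # Also flag anything far above a plausible human rate for the window.
    flagged.update(ip for ip, n in per_ip.items() if n > max_requests)
    return flagged
```

To use it, pass an open log file directly, e.g. `suspicious_ips(open("access.log"))`, since iterating a file yields one line at a time; the log path is, again, just an example.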

I’ll admit that this is a really, really raw way of attacking bad bot traffic, but there’s a reason I am doing this: I have a fairly popular site that receives a decent amount of steady traffic, and I am constantly inundated with people telling me that they’re upset with WP Engine for counting bad bot traffic. To nip this whole notion of “WP Engine is evil because they count bad bot traffic” in the bud, let’s take a quick step back and discuss the Website Visitors Myth.

The Website Visitors Myth: Google Analytics and JetPack Site Stats show me my accurate traffic, and WP Engine is inflating my traffic numbers (probably for more money!).

Most visitor statistics programs do their best to filter out non-human traffic, because human traffic is the only traffic we really care about. This means that the entire time we’ve been watching our visitor counts grow, and in all the conversations we’ve had with our friends about how many people are visiting our sites, we have misunderstood the very idea of what a “visit” actually is.

Think about this from the server’s perspective: some requesting entity says, “give me jesselawson.org, please.” The server, being a server, says, “you got it!” and delivers the site. A lot could be happening between those two steps, and many frustrated former WP Engine customers are justified (to a point) in saying that some things should be happening there.

What sort of things? Bot filtering. Spam filtering. IP checks. Honeypots. Blackholes. If you’re running something like Akismet, you know that running a (semi) popular WP site will attract thousands of junk comments per month. In the eyes of a server, each of these comments is one or more visits, so they’re all going to be added to your visitor totals. Additionally, all the harvesters, trolls, bad bots, and whatever else kids are making these days are going to be served up by WP Engine nice and happily, all while inflating your visitor count to four or five times what you see in Google Analytics.

But let’s think about this for a second. You’re on a managed WordPress hosting platform, one that promises a fast, high-availability, rock-solid architecture to deliver your WordPress content in the quickest way possible. Where in that sentence did you see “and will guard you from bots and spam traffic”? 

Stay with me, stay with me. I know that’s frustrating to hear. As someone who runs a managed WordPress hosting service myself, I sort of cringe when I hear customers of a self-described “managed” platform complaining about spam and bot traffic. In my mind, my hosting business is only as successful as each and every one of my client sites. That means I should be going out of my way to ensure that the bots visiting my clients’ sites are genuine, well-intentioned archiving tools.

At the end of the day, though, we can’t expect WP Engine to be protecting our site from unwanted traffic because they haven’t really promulgated a definition for what unwanted traffic is on their network. The flip side to this is that everyone can generally agree that no one wants Chinese spam bots flooding our comment sections.

So where do we meet in the middle?

First, we have to understand that the common definition of a website visitor is wrong. 

A visit to a website is any instance where server resources are committed to serving your website’s content to a requesting entity. Plain and simple. There is no discrimination between human or bot, spammer or not. When we look at “unique pageviews” on Google Analytics and our “visitors” on WP Engine, they’re not reporting the same thing. 

Second, we have to take measures to block unwanted traffic ourselves. 

I don’t think there’s any excuse for a website owner to not know how to utilize the tools that are the backbone of their online business. That being said, I also think that a managed hosting company should make it extremely easy to manage these security options, even if it’s just via email.

For example, at DashingWP we scrape our visit logs for IPs and process them through a proprietary reputation analysis system that we call BadBlackHoney. As the name implies, it scrubs IPs against known bad bot and spam IPs, checking reputation and reported abuse and activity across over 100 different blacklists, and utilizes a special type of honeypot to catch the real bad ones. Armed with a list of bad IPs that changes daily, we regularly scrub every client .htaccess file in order to ensure that new abusers are properly handled.
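BadBlackHoney itself is proprietary, so the following is not its code. It is a minimal sketch of the two generic techniques the paragraph describes, assuming IPv4 addresses and a DNS-based blacklist (DNSBL): a DNSBL is queried by reversing the IP's octets and appending the list's zone, and a resolvable answer usually means the IP is listed. The zone name `dnsbl.example.org` is a hypothetical placeholder, and treating any answer as a hit is a simplification, since some lists return special codes for errors or rate limits. The `.htaccess` output uses the Apache 2.2 `Order`/`Allow`/`Deny` directives, which Apache 2.4 only accepts when `mod_access_compat` is loaded.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNSBL lookup name, e.g. 203.0.113.5 -> 5.113.0.203.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for non-IPv4 input
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def is_listed(ip, zone):
    """Simplified check: treat any A record for the query name as a listing."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:  # socket.gaierror (NXDOMAIN, no network) subclasses OSError
        return False


def htaccess_deny_block(ips):
    """Render Apache 2.2-style rules that deny every unique IP in `ips`."""
    addrs = sorted({ipaddress.ip_address(ip) for ip in ips},
                   key=lambda a: (a.version, a))  # IPv4 first, then IPv6
    lines = ["# BEGIN generated bad-IP block", "Order Allow,Deny", "Allow from all"]
    lines += [f"Deny from {a}" for a in addrs]
    lines.append("# END generated bad-IP block")
    return "\n".join(lines) + "\n"
```

A daily job along these lines might filter candidate IPs with `[ip for ip in candidates if is_listed(ip, "dnsbl.example.org")]` and splice the result of `htaccess_deny_block` into each site's `.htaccess`, replacing the previous generated block between the BEGIN/END markers.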

Does WP Engine have to do this? I don’t think so, because it would be infeasible at their scale. Remember that WP Engine is a managed WordPress hosting service; if you’re looking for a system that manages WordPress *and actively enforces bad bot/spam traffic mitigation*, then you can either learn to do this yourself or move your site to a host that already does.

What WP Engine can do, however, is my next point.

Third, WP Engine could give their customers simple tools to help get rid of unwanted bad bot traffic. 

You’ll have to tackle this from two fronts, WP Engine:

  • On the one hand, bad bot and other unwanted traffic pushes users over their account limits, which in turn results in higher net profit per account and higher-quality revenue (strictly from an accounting perspective);
  • On the other hand, customers who don’t understand the difference between pageviews and true visits, or between Google Analytics and real traffic logging, continue to become frustrated and leave your service. Worst of all, the people who leave are writing bad reviews that are poisoning your image in the blogosphere. Combined with the saturation of WP Engine affiliate links in articles that are obviously written to drive affiliate sales, prospective customers are left churning through poor sales drivel (affiliate articles) and frustrated customers in the comment sections of those articles.

Here’s what I suggest you do, WP Engine:

  1. Get one of your engineers to maintain a list of bad bots and known spammers.
  2. Give users the option to submit a ticket to “enable IP reputation discrimination,” and tell your customers that this will actively block IP addresses with poor reputation (i.e., known for spam emails, bad bot traffic, scraping, harvesting, etc).
  3. Ensure customers understand that by turning this system on, they may experience a slight loss in performance.
  4. Block the listed IPs at the server level so that your resources are never consumed (or at least as little of them as possible). If Apache serves dynamic pages and you have Nginx in front, you can keep a simple list of IPs to deny somewhere in your filesystem (for example, /var/bad-ips), and the server block of each customer who activates this option can include that list.
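The shared deny list from step 4 could be generated with something like the minimal sketch below. It assumes the list is fed by whatever reputation source step 1 produces (not shown) and writes one nginx `deny` directive per unique address or CIDR range to the /var/bad-ips path mentioned above; the function names are hypothetical.

```python
import ipaddress
from pathlib import Path


def nginx_deny_rules(entries):
    """Render one nginx `deny` directive per unique IP address or CIDR range."""
    # strict=False lets "198.51.100.7/24" normalize to 198.51.100.0/24.
    nets = {ipaddress.ip_network(e, strict=False) for e in entries}
    lines = []
    for net in sorted(nets, key=lambda n: (n.version, n)):  # IPv4 before IPv6
        # Single hosts are written without a /32 (or /128) suffix.
        target = net.network_address if net.prefixlen == net.max_prefixlen else net
        lines.append(f"deny {target};\n")
    return "".join(lines)


def write_deny_file(entries, path="/var/bad-ips"):
    """Write the rules to the shared file that opted-in server blocks include."""
    Path(path).write_text(nginx_deny_rules(entries))
```

Each opted-in customer's server block would then contain `include /var/bad-ips;`, and nginx has to be reloaded (for example with `nginx -s reload`) before a regenerated list takes effect. Because the `deny` check happens in Nginx, blocked requests never reach Apache or PHP at all.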

Alternatively, you could commit an engineer to writing a few articles about how users can take charge of their traffic, but I think this might start to pull WP Engine away from the brand image you maintain in your social media community (regardless of whether what people think of you matches what you actually do).

At the end of the day, prospective hosting clients need to understand that WP Engine is perfect for you if you want a lot of control over your WordPress site hosting. They give you backup + restore, a one-click staging area, and a pretty inclusive support system.* All this control means that they’re not going to discriminate against traffic for you, because that’s something you should be doing.

We talked a lot about hosting, visitor analytics, and expectation management here. For those of you who are determined to take matters into your own hands, let’s now talk about how we can actively mitigate bad bot and spam traffic ourselves in the second part of this WP Engine Hotfix:

WP Engine Hotfix: Preventing Spam and Bad Bot Traffic, Part II

*: I will say that the majority of articles on WP Engine are garbage. I feel like searching for support articles on your site is a toss-up between finding an okay article and finding one that was written by a developer after writing 10,000 lines of code and just trying to go home.