Google shows how to block bots and increase site performance

Google’s Martin Splitt answered a question about malicious bots affecting website performance and offered suggestions that every SEO and website owner should know and implement.

Malicious bots are an SEO problem

SEOs who perform website audits commonly overlook security and bot traffic, because it is not widely understood among digital marketers that security incidents affect website performance and can explain why a site is inadequately crawled. Improving Core Web Vitals will do nothing for site performance if a poor security posture is what is dragging performance down.

Virtually every website is probed by bots, and excessive crawling can push a server into returning 500-level error response codes, signaling that it cannot serve web pages and hindering Google’s ability to crawl them.
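One way to see whether this is happening is to look at raw server logs. The sketch below is a minimal illustration, not a drop-in tool: it assumes a combined-format access log (the nginx/Apache default) at a hypothetical path, and counts requests and 5xx responses per client IP to surface aggressive crawlers.

```python
# A minimal sketch, assuming a combined-format access log at a
# hypothetical path; adjust LOG_PATH for your server. In this format,
# the ninth field of each line is the HTTP status code.
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"  # assumption: your path may differ

requests_per_ip = Counter()
errors_per_ip = Counter()

with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        parts = line.split()
        if len(parts) < 9:
            continue  # skip malformed lines
        ip, status = parts[0], parts[8]
        requests_per_ip[ip] += 1
        if status.startswith("5"):
            errors_per_ip[ip] += 1  # 5xx: the server failed to respond

print("Top requesters:")
for ip, count in requests_per_ip.most_common(10):
    print(f"{ip}\t{count} requests\t{errors_per_ip[ip]} 5xx responses")
```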

How to defend against bot attacks

The person asking the question wanted Google’s advice on how to fight back against the waves of scraper bots affecting their server performance.

This is the question:

“Our website is experiencing significant disruption due to targeted scraping by automated software, leading to performance issues, increased server load and potential data security issues. Despite IP blocking and other preventative measures, the problem persists. What can we do?”

Google’s Martin Splitt suggested identifying the service that is the source of the attacks and notifying it that its services are being misused. He also recommended the firewall features of a CDN (content delivery network).

Martin replied:

“This sounds like something of a distributed denial-of-service issue if the crawl is so aggressive that it causes performance degradation.

“You can try to identify the owner of the network where the traffic is coming from, contact their host, and send a notice of abuse. You can usually use WHOIS information for that.

Alternatively, CDNs often have features to detect bot traffic and block it, and by definition they take the traffic away from your server and distribute it nicely, so that’s a win. Most CDNs recognize legitimate search engine bots and won’t block them, but if that’s a big concern for you, consider asking them before you start using them.”
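For the WHOIS step Martin describes, the lookup can be scripted. Here is a minimal sketch assuming the standard whois command-line tool is installed (common on Linux and macOS); the IP address is a placeholder from the documentation range, not a real offender.

```python
# A minimal sketch assuming the standard `whois` CLI is installed.
# It looks up the network owner of a suspicious IP and prints any
# lines mentioning an abuse contact. The IP is a placeholder from
# the 203.0.113.0/24 documentation range.
import subprocess

suspicious_ip = "203.0.113.42"  # placeholder: substitute a real offender

result = subprocess.run(
    ["whois", suspicious_ip],
    capture_output=True,
    text=True,
    timeout=30,
)

# Abuse contacts appear under different labels depending on the
# regional registry, e.g. "abuse-mailbox:" (RIPE) or "OrgAbuseEmail:" (ARIN).
for line in result.stdout.splitlines():
    if "abuse" in line.lower():
        print(line.strip())
```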

Does Google’s advice work?

Identifying the cloud provider or data center hosting the malicious bots is good advice, but there are many scenarios in which it doesn’t work.

Three reasons why contacting resource providers doesn’t work

1. Many bots are hidden

Bots often use VPNs and the open-source Tor network, which hide the true origin of the traffic, defeating any attempt to identify the cloud service or web host providing the bots’ infrastructure. Attackers also hide behind networks of compromised home and business computers, called botnets, to launch their attacks, which makes the real operators practically impossible to identify.

2. Bots change IP addresses

Some bots respond to IP blocking by instantly switching to another network and resuming their attacks. An attack may originate from a German server and, once blocked, switch to a network provider in Asia.

3. Inefficient use of time

It is useless to contact network providers about abusive users when the source of the traffic is obscured or spread across hundreds of sources. Many website owners and SEOs may be surprised to discover how relentless the attacks on their sites are. Even taking action against a small group of offenders is an inefficient use of time, because millions of other bots will replace the ones a cloud provider blocks.

And what about botnets made up of thousands of compromised computers around the world? Think you have time to notify all these ISPs?

For these three reasons, notifying infrastructure providers is not a viable approach to stopping bots from degrading site performance. Realistically, it is a futile use of time.

Use a WAF to block bots

Using a Web Application Firewall (WAF) is a good idea, and it is the feature Martin Splitt pointed to when he mentioned using a CDN (content delivery network). A CDN such as Cloudflare serves browsers and crawlers the requested web page from the server located closest to them, speeding up site performance and reducing the load on the website owner’s server.

Many CDNs also include a WAF that automatically blocks malicious bots. Martin’s suggestion to use a CDN is definitely a good option, especially since it comes with the added benefit of improving site performance.
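On the related concern of not blocking legitimate crawlers, Google documents a two-step DNS check for verifying Googlebot: a reverse-DNS lookup of the requesting IP, followed by a forward lookup to confirm the match. Here is a minimal sketch (the sample IP is a placeholder):

```python
# A minimal sketch of the reverse-DNS verification Google documents
# for Googlebot: resolve the IP to a hostname, check the hostname is
# under googlebot.com or google.com, then resolve the hostname back
# and confirm the original IP is among its addresses.
import socket

def is_verified_googlebot(ip: str) -> bool:
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # reverse DNS (PTR)
        if not hostname.endswith((".googlebot.com", ".google.com")):
            return False
        _, _, forward_ips = socket.gethostbyname_ex(hostname)  # forward lookup
        return ip in forward_ips  # must round-trip back to the same IP
    except socket.error:
        return False

# Placeholder IP from the documentation range; a real Googlebot IP
# would pass both checks.
print(is_verified_googlebot("203.0.113.42"))  # -> False
```

The forward confirmation matters because reverse-DNS records are set by whoever controls the IP block, so a PTR record alone can be spoofed to look like Googlebot.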

One option Martin didn’t mention is a WordPress WAF plugin like Wordfence. Wordfence’s WAF automatically shuts down bots based on their behavior. For example, if a bot requests an excessive number of pages, the firewall will automatically create a temporary IP block; if the bot rotates to a different IP address, the firewall will recognize the crawling behavior and block it again.
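To illustrate the behavior-based approach, here is a minimal sketch of a sliding-window rate limiter that temporarily blocks any client exceeding a request threshold. The numbers are illustrative assumptions, not values used by Wordfence or any other product.

```python
# A minimal sketch of behavior-based blocking: a sliding-window rate
# limiter that temporarily blocks any client that exceeds a request
# threshold. Thresholds and block duration are illustrative only.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60        # judge each client on its last minute of traffic
MAX_REQUESTS = 120         # assumption: ~2 requests/second sustained
BLOCK_SECONDS = 15 * 60    # length of the temporary IP block

request_times = defaultdict(deque)  # ip -> timestamps of recent requests
blocked_until = {}                  # ip -> time the block expires

def allow_request(ip: str) -> bool:
    now = time.monotonic()
    if blocked_until.get(ip, 0.0) > now:
        return False  # still inside the temporary block
    window = request_times[ip]
    window.append(now)
    while window and window[0] < now - WINDOW_SECONDS:
        window.popleft()  # forget requests older than the window
    if len(window) > MAX_REQUESTS:
        blocked_until[ip] = now + BLOCK_SECONDS  # behavior trips the block
        return False
    return True
```

Because the decision is based on behavior rather than a fixed IP list, a bot that rotates to a new address simply trips the same threshold again.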

Another solution to consider is a SaaS platform like Sucuri that offers a WAF and a CDN to speed up performance. Both Wordfence and Sucuri are trusted WordPress security providers, and they come with limited but effective free versions.

Listen to the question and answer at the 6:36 mark of the Google SEO Office Hours podcast.

Featured image by Shutterstock/Krakenimages.com