Google’s Gary Illyes continues to warn about problems with URL parameters

Google’s Gary Illyes recently highlighted a recurring SEO issue on LinkedIn, echoing concerns he previously expressed on a Google podcast.

The issue? URL parameters can cause difficulties for search engines when crawling websites.

This problem is especially challenging for large websites and online stores. Adding different parameters to a URL can result in several unique URLs all leading to the same content.

This can inhibit search engines, reducing their effectiveness in crawling and indexing websites correctly.

The URL Parameter Conundrum

In both the podcast and the LinkedIn post, Illyes explains that URLs can hold infinite parameters, each creating a distinct URL, even though they all point to the same content.

He writes:

“An interesting quirk of URLs is that you can add an infinite (I call BS) number of URL parameters to the URL path, essentially creating new resources. The new URLs don’t even have to map to different content on the server, each new URL can only display the same content as the parameterless URL, but they are all different URLs. A good example of this is the cache-busting URL parameter on JavaScript references: it does not change the content, but it will force caches to refresh.”

He gave an example of how a simple URL like “/path/file” can be expanded to “/path/file?param1=a” and “/path/file?param1=a&param2=b”, all potentially delivering identical content.

“Every [one is a] different URL, all the same content,” noted Illyes.

Accidental URL expansion and its consequences

Search engines can sometimes find and try to crawl non-existent pages on your site, which Illyes calls “fake URLs”.

These can appear due to things like poorly coded relative links. What starts as a normal-sized site with about 1,000 pages can balloon to a million phantom URLs.

This explosion of fake URLs can cause serious problems. Search engine crawlers can hit your servers hard trying to crawl all these non-existent pages.

This can overwhelm your server resources and potentially crash your site. Additionally, it wastes the search engine’s crawl budget on useless pages instead of your content.

Ultimately, your pages may not be crawled and indexed properly, which can hurt your search rankings.

Illyes states:

“Sometimes you can accidentally create these new fake URLs, exploding your URL space from 1000 cozy URLs to a blazing 1 million, exciting crawlers that in turn hammer your servers unexpectedly, melt pipes and whistles left and right. Bad relative links are a fairly common cause. But robotstxt is your friend in this case.”
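
Illyes doesn’t spell out the exact directives, but a few wildcard rules in robots.txt are one way to keep crawlers out of parameterized or phantom URL spaces. The parameter name below is a placeholder; the right rules depend on your own URL structure:

```
User-agent: *
# Block every URL that contains a query string (too aggressive if some parameters matter)
Disallow: /*?

# Or block only specific parameters, wherever they appear in the query string
Disallow: /*?sessionid=
Disallow: /*&sessionid=
```

Keep in mind that robots.txt only stops crawling; it doesn’t consolidate signals from duplicate URLs the way a canonical tag does.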

E-commerce sites most affected

The LinkedIn post didn’t specifically call out online stores, but the podcast discussion clarified that this issue is a big deal for e-commerce platforms.

These sites typically use URL parameters to handle product tracking, filtering, and sorting.

As a result, you may see several different URLs pointing to the same product page, with each URL variant representing color choices, size options, or where the customer came from.
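
As a rough sketch of how quickly this multiplies (the parameter names and values here are invented for illustration), a single product page with just a few optional parameters already yields dozens of distinct URLs:

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical filter and tracking parameters for one product page
params = {
    "color": ["red", "blue", "green"],
    "size": ["s", "m", "l", "xl"],
    "utm_source": ["newsletter", "facebook", "google"],
}

base = "https://shop.example.com/product/widget"
variants = [
    f"{base}?{urlencode(dict(zip(params, combo)))}"
    for combo in product(*params.values())
]

print(len(variants))  # 3 * 4 * 3 = 36 URLs, all serving the same product page
print(variants[0])    # https://shop.example.com/product/widget?color=red&size=s&utm_source=newsletter
```

Each added parameter multiplies the count again, which is how a modest catalog can turn into millions of crawlable URLs.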

Fixing the problem

Illyes recommends consistently using robots.txt to solve this problem.

On the podcast, Illyes highlighted possible fixes, such as:

  • Creating systems to find duplicate URLs (see the sketch after this list)
  • Better ways for website owners to tell search engines about their URL structure
  • Using robots.txt in smarter ways to guide search engine bots
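
Google hasn’t published an implementation, but a minimal duplicate-detection pass could normalize URLs by stripping parameters that don’t change the content and sorting the rest. The ignored-parameter list below is an assumption and would need to match your own site:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters assumed not to affect page content (adjust for your own site)
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "ref"}

def normalize(url: str) -> str:
    """Collapse duplicate URLs to one key by dropping ignored parameters
    and sorting whatever remains."""
    parts = urlsplit(url)
    kept = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

urls = [
    "https://shop.example.com/product/widget?utm_source=newsletter",
    "https://shop.example.com/product/widget?ref=home&sessionid=abc123",
    "https://shop.example.com/product/widget",
]

groups: dict[str, list[str]] = {}
for u in urls:
    groups.setdefault(normalize(u), []).append(u)

for canonical, duplicates in groups.items():
    print(canonical, "<-", duplicates)
```

All three example URLs collapse to the same normalized key, flagging them as duplicates of a single page.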

The deprecated URL parameters tool

In the podcast discussion, Illyes touched on Google’s previous attempts to address this issue, including the now-deprecated URL parameters tool in Search Console.

This tool allowed websites to specify which parameters were important and which could be ignored.

When asked on LinkedIn about potentially bringing back this tool, Illyes was skeptical about its practical effectiveness.

He said: “In theory yes. In practice no,” explaining that the tool suffered from the same problems as robots.txt, namely that “people couldn’t for the life of them figure out how to control their own parameters.”

Implications for SEO and web development

This ongoing discussion from Google has several implications for SEO and web development:

  1. Crawl budget: For large sites, managing URL parameters can help conserve crawl budget and ensure important pages are crawled and indexed.
  2. Site architecture: Developers may need to rethink how they structure URLs, especially for large e-commerce sites with numerous product variations.
  3. Faceted navigation: E-commerce sites using faceted navigation should be aware of how this affects URL structure and crawlability.
  4. Canonical tags: Canonical tags help Google understand which URL version should be considered primary, as shown in the snippet below.
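
For example, every parameterized variant of a product page can point back to the clean URL from its <head> (the URL here is a placeholder):

```html
<!-- Served on /product/widget?color=red&utm_source=newsletter and every other variant -->
<link rel="canonical" href="https://shop.example.com/product/widget" />
```

Google treats the canonical as a strong hint rather than a directive, so it works best alongside tidy internal linking and sensible parameter handling.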

Why this matters

Google is discussing issues with URL parameters across multiple channels, indicating a genuine concern for search quality.

For SEO professionals and developers, staying informed about these technical details is important for maintaining search visibility.

While Google works on solutions, proactive URL management and effective crawler guidance are recommended.