Audisto Robots Directives Checker

How to detect issues with robots directives on your website

The robots hints group collects hints about usage of the robots directive on a website.

Issues with robots directives can lead to problems with the crawling and the indexing of a website.

With this hints section you can identify the most common mistakes with the robots directives on a website.

Example: Audisto Robots Directives Check with the robots hint reports for the current crawl

Here is the list of all specific hints related to robots directives that can be identified with the Audisto Crawler.

Hints

Robots: Directives Missing

Description

No robots meta tag or X-Robots-Tag header directive was found.

Importance

If no robots meta tag or X-Robots-Tag header directive is used, this equals the "index, follow" robots directive. In this case, crawlers are not restricted by the document from crawling and indexing it.

Under these circumstances, URLs might get crawled and indexed, even though they are not supposed to be in the index. This might result in privacy issues as well as in a waste of indexing budget.
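The default behaviour described above can be illustrated with a minimal Python sketch. The function name `effective_directives` is hypothetical and not part of any Audisto tooling. It only handles the four tokens discussed on this page and treats anything absent as the permissive default.

```python
def effective_directives(content=None):
    """Resolve a robots value to an (indexing, following) pair.

    `content` is the raw value of a robots meta tag or X-Robots-Tag
    header, or None when the document specifies nothing.
    Hypothetical helper for illustration only.
    """
    tokens = set()
    if content:
        tokens = {t.strip().lower() for t in content.split(",") if t.strip()}
    # Anything not explicitly restricted falls back to the permissive default.
    indexing = "noindex" if "noindex" in tokens else "index"
    following = "nofollow" if "nofollow" in tokens else "follow"
    return indexing, following
```

Calling `effective_directives()` without a value yields the same pair as calling it with "index, follow", which is why a document without directives is open to crawling and indexing.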

Operating Instruction

We suggest specifying robots directives for every document on your website.

Robots: Specified more than once

Description

Robots directives for a single URL were specified more than once. Use this report to identify all instances of multiple robots definitions.

Examples

Robots meta tag in HTML <head>

<meta name="robots" content="index">
<meta name="robots" content="follow">

Robots directives in X-Robots-Tag and meta tag

HTTP/1.1 200 OK
Date: Fri, 16 Oct 2015 10:01:33 GMT
X-Robots-Tag: index, follow
...

<meta name="robots" content="index, follow">

Importance

More than one instance of robots directives can lead to conflicting definitions or to a directive being left out. Depending on the situation, this may result in a range of issues with privacy, indexing in general, and crawl budget.

Operating Instruction

Use only one way to specify the robots directive.
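To check a single response for multiple specifications, one could collect every robots value from the HTTP headers and the HTML. The following is a rough sketch using Python's standard `html.parser`; the names `RobotsMetaCollector` and `collect_robots_sources` are made up for this example, and the header list format (name/value tuples) is an assumption.

```python
from html.parser import HTMLParser


class RobotsMetaCollector(HTMLParser):
    """Collects the content of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values may be None when an attribute has no value.
        if (attr.get("name") or "").lower() == "robots":
            self.contents.append(attr.get("content") or "")


def collect_robots_sources(headers, html):
    """Return all robots values found in headers and HTML.

    headers: list of (name, value) tuples (assumed format).
    html: the document source as a string.
    """
    sources = [v for (k, v) in headers if k.lower() == "x-robots-tag"]
    parser = RobotsMetaCollector()
    parser.feed(html)
    sources.extend(parser.contents)
    return sources
```

If the returned list has more than one entry, the robots directives for that URL were specified more than once.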

Robots: follow

Description

The URL is set to "follow" by a robots directive, either via the robots meta tag or the X-Robots-Tag header. If no robots directives are specified, the URL is regarded as if "index, follow" was specified.

Discover all occurrences of "follow" robots directives or meta tags with this hints report.

Examples

Robots meta tag in HTML <head>

<meta name="robots" content="index, follow">

Robots directives in HTTP-header X-Robots-Tag

X-Robots-Tag: index, follow

Importance

If a URL is set to "follow" and there are no robots.txt restrictions interfering, search engines will usually follow all links in the document and crawl the target URLs.

Operating Instruction

We suggest using the follow directive for robots by default.

Robots: index

Description

The URL is set to "index" by all robots directives, either via the robots meta tag or the X-Robots-Tag header.

Find all URLs that are set to "index" with this report.

Examples

HTML tag

<meta name="robots" content="index, follow">

X-Robots-Tag in HTTP header

HTTP/1.1 200 OK
Date: Fri, 16 Oct 2015 10:01:33 GMT
X-Robots-Tag: index
...

Importance

The "index" directive for robots tells search engines the document is supposed to be indexed. Having URLs indexed that are not supposed to be indexed can lead to:

  • Issues with privacy
  • Issues with indexing budget
  • Issues with thin content or duplicate content

Operating Instruction

We suggest checking on a regular basis whether parts of the site are set to "index" that are not supposed to end up in the search results. There are several ways to deal with this situation, depending on requirements:

  • Set URLs to "noindex" if they should not appear in search results
  • Make links to these URLs not crawlable
  • Use the URL removal tools offered by search engines like Google
  • Block crawling in robots.txt
  • Block access to the URLs

Robots: nofollow

Description

The URL is set to "nofollow" by a robots directive, either via the robots meta tag or the X-Robots-Tag header.

Find all instances of "nofollow" usage that were discovered while crawling your site.

Examples

Robots meta tag in HTML <head>

<meta name="robots" content="index, nofollow">

X-Robots-Tag in HTTP-header

X-Robots-Tag: nofollow

Importance

Using the "nofollow" directive for a URL tells crawlers not to follow any links in the document. This also prevents PageRank flow to the target URLs. By using nofollow for internal links, you weaken your site.

Using the "nofollow" robots directive affects

  • PageRank flow
  • Website structure
  • Crawling

Operating Instruction

URLs that use the nofollow robots directive should be evaluated on a regular basis to prevent possible issues. Nofollow should not be used for internal linking.

Instead, consider setting the document to "follow".

Robots: nofollow differs across specifications

Description

There is more than one source for robots directives, either a robots meta tag or an X-Robots-Tag header, and at least one specifies "nofollow" while another does not.

Examples

Robots meta tag in HTML <head>

<meta name="robots" content="index, nofollow">
<meta name="robots" content="index, follow">

Note: a more subtle way to produce this error is to create conflicting definitions by omitting parts of the directive, as in:

<meta name="robots" content="index, nofollow">
<meta name="robots" content="index">

Robots directives in X-Robots-Tag and meta tag differ

HTTP/1.1 200 OK
Date: Fri, 16 Oct 2015 10:01:33 GMT
X-Robots-Tag: index, nofollow
...

<meta name="robots" content="index, follow">

Importance

The "nofollow" robots directive tells crawlers not to follow the links in a document. This can be used on purpose to prevent search engines from crawling the linked URLs.

Conflicting definitions are inconclusive. Search engines will usually use the most restrictive directive they find. The Audisto Crawler adopts this behaviour.
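A conflict of this kind, including the subtle case where one source simply omits the directive, could be detected along these lines. This is only a sketch in Python; `tokens` and `differs` are hypothetical helper names, and the input is assumed to be the list of raw robots values found for one URL.

```python
def tokens(value):
    """Split a robots value such as "index, nofollow" into lowercase tokens."""
    return {t.strip().lower() for t in value.split(",") if t.strip()}


def differs(sources, directive):
    """True if at least one source sets `directive` while another does not."""
    flags = [directive in tokens(value) for value in sources]
    return any(flags) and not all(flags)
```

With the examples above, `differs(["index, nofollow", "index"], "nofollow")` reports a conflict, because the second value leaves "nofollow" out.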

Operating Instruction

Use only one way to specify the robots nofollow directive.

Robots: noindex

Description

The URL is set to "noindex" by a robots directive, either via the robots meta tag or the X-Robots-Tag header, but is still part of the internal link graph.

Examples

Robots meta tag in HTML <head>

<meta name="robots" content="noindex, follow">

X-Robots-Tag in HTTP-header

X-Robots-Tag: noindex, follow

Importance

Using the "noindex" directive for a URL tells crawlers not to index the document. This also prevents the URL from showing up in the search results. If URLs with a robots "noindex" directive are part of the internal link graph, they still get crawled and consume crawl budget. This may lead to issues with the crawl rate. These URLs also bind internal link juice and, on a large scale, can harm your site's rankings.
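To see which "noindex" URLs are still fed by internal links, one could cross-check a list of noindex URLs against the internal link graph. The sketch below assumes the graph is available as a plain Python dict; this data layout and the function name `linked_noindex_urls` are illustrative assumptions, not an Audisto export format.

```python
def linked_noindex_urls(link_graph, noindex_urls):
    """Return noindex URLs that receive internal links, with their sources.

    link_graph: dict mapping a source URL to the URLs it links to.
    noindex_urls: set of URLs resolved to "noindex".
    Both inputs are hypothetical; a real crawl export will look different.
    """
    inlinks = {}
    for source, targets in link_graph.items():
        for target in targets:
            if target in noindex_urls:
                inlinks.setdefault(target, []).append(source)
    return inlinks
```

Each key in the result is a noindex URL that still consumes crawl budget, and the listed sources are the documents whose links would need to change.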

Legit use of noindex

A legit reason to keep URLs on "noindex" would be legal restrictions, like an imprint or privacy policy.

Problematic use of noindex

Using noindex on URLs that are supposed to give structure to the site (e.g. categories, important tags, HTML sitemaps) should be avoided.

If URLs are supposed to give structure to a site in terms of SEO, then they should also add value for the user.

Operating Instruction

We strongly suggest reviewing your site's "noindex" URLs on a regular basis to prevent issues with crawl budget and internal PageRank flow. If there is no need to keep a URL that is set to noindex, it should be dropped with a 410 status code.

Try to improve structure URLs on an individual basis. If these URLs add value for the user, they are worth indexing as well.

If noindex is used on URLs generated by sorting and filtering options, make sure to use a PRG pattern (Post/Redirect/Get) instead of linking to these URLs directly.
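In a Post/Redirect/Get flow, the sorting or filtering choice is submitted as a form via POST and the server answers with a redirect to the result URL, so the page itself contains no plain link to that URL. The following Python sketch shows only the redirect step in a framework-neutral way; the function name `handle_filter_post` and the path "/products" are placeholders chosen for this example.

```python
from urllib.parse import urlencode


def handle_filter_post(form):
    """Answer a POSTed filter form with a redirect to the filtered listing.

    `form` is a dict of submitted fields; "/products" is a hypothetical path.
    Returns (status, headers) as a framework-neutral stand-in for a response.
    """
    # Sort the fields so the same filter always yields the same URL.
    query = urlencode(sorted(form.items()))
    # 303 See Other tells the browser to fetch the target with GET.
    return 303, {"Location": "/products?" + query}
```

The filtered URL still exists and can keep its noindex directive, but it is only reached after a form submission rather than through a crawlable link.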

Robots: noindex differs across specifications

Description

There is more than one source for robots directives, either a robots meta tag or an X-Robots-Tag header. At least one specifies "noindex" while another does not.

Examples

Differing robots directives across specifications could look like this:

Robots meta tag in HTML <head>

<meta name="robots" content="index, nofollow">
<meta name="robots" content="noindex, nofollow">

Note: a more subtle way to produce this error is to create conflicting definitions by omitting parts of the directive, as in:

<meta name="robots" content="noindex, follow">
<meta name="robots" content="follow">

Robots directives in X-Robots-Tag and meta tag differ

HTTP/1.1 200 OK
Date: Fri, 16 Oct 2015 10:01:33 GMT
X-Robots-Tag: index, follow
...

<meta name="robots" content="noindex, follow">

Importance

The "noindex" robots directive tells crawlers not to index the current document.

Conflicting definitions are inconclusive. Search engines will usually use the most restrictive directive they find. The Audisto Crawler adopts this behaviour.
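The "most restrictive directive wins" rule can be sketched as a merge across all sources. This is an illustrative Python approximation under the assumption that only "noindex" and "nofollow" count as restrictive; it is not the actual logic of any search engine or of the Audisto Crawler, and `most_restrictive` is a made-up name.

```python
def most_restrictive(sources):
    """Merge robots values from several sources into one (indexing, following) pair.

    sources: list of raw values from robots meta tags and X-Robots-Tag headers.
    A restriction found in any source overrides permissive values elsewhere.
    """
    merged = set()
    for value in sources:
        merged.update(t.strip().lower() for t in value.split(",") if t.strip())
    indexing = "noindex" if "noindex" in merged else "index"
    following = "nofollow" if "nofollow" in merged else "follow"
    return indexing, following
```

For the header and meta tag example above, `most_restrictive(["index, follow", "noindex, follow"])` resolves to "noindex" and "follow".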

Operating Instruction

Use only one way to specify the robots noindex directive.