Strict and relaxed robots.txt handling

Whenever you start a new crawl, you can use a number of advanced settings to configure how we crawl your site. In this article we want to explain the difference between strict and relaxed robots.txt handling.

By default, our service interprets your robots.txt in strict mode. In strict mode we parse your robots.txt strictly according to the robots.txt draft specification. If your robots.txt is not written accurately, this will most likely lead to unexpected crawling results. In relaxed mode, on the other hand, we parse your robots.txt the same way Googlebot does. The relaxed reading is usually what the webmaster intended when writing the robots.txt, because relaxed mode works around a number of common problems seen in robots.txt files.

So what is the difference?

From a technical point of view, relaxed mode is a fault-tolerant way of parsing. Here is an example:

Instead of using blank lines to split sets of records, relaxed mode ignores blank lines and starts a new record only when the next user-agent line (or the next set of user-agent lines) occurs.

A robots.txt with a blank line between its two lines, such as

User-agent: *

Disallow: /

will be interpreted as

User-agent: *
Disallow:

in strict mode and as

User-agent: *
Disallow: /

in relaxed mode.
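
To make the difference concrete, here is a minimal, hypothetical sketch of the two grouping strategies. This is not our production parser, and the function names parse_strict and parse_relaxed are purely illustrative. Each function collects records of user-agent lines and their Disallow rules; the only difference is how blank lines are treated:

def parse_strict(text):
    # Blank lines end the current record: User-agent lines and their
    # rule lines must appear in one uninterrupted block.
    groups, agents, rules = [], [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:                      # a blank line closes the record
            if agents:
                groups.append((agents, rules))
            agents, rules = [], []
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            agents.append(value)
        elif field == "disallow" and agents:
            rules.append(value)           # rules without a record are dropped
    if agents:
        groups.append((agents, rules))
    return groups

def parse_relaxed(text):
    # Blank lines are ignored; only the next run of User-agent lines
    # starts a new record.
    groups, agents, rules = [], [], []
    in_agent_run = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                      # blank lines do not end a record
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if not in_agent_run and agents:
                groups.append((agents, rules))   # a new record begins here
                agents, rules = [], []
            agents.append(value)
            in_agent_run = True
        elif field == "disallow" and agents:
            rules.append(value)
            in_agent_run = False
    if agents:
        groups.append((agents, rules))
    return groups

robots = "User-agent: *\n\nDisallow: /\n"
print(parse_strict(robots))    # [(['*'], [])]    -> no rules, everything allowed
print(parse_relaxed(robots))   # [(['*'], ['/'])] -> everything disallowed
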

In this extreme example the two interpretations lead to exactly opposite results, and the relaxed interpretation is probably what the user intended. To point out issues like this, we always compare both parsing results and inform you about any differences. As a webmaster, you want to make sure that both parsings are identical.
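
If you want to see for yourself how a parser that reads robots.txt fairly literally treats such a file, one quick way is Python's standard-library urllib.robotparser. This is just an illustration, not the parser our crawler uses:

import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.parse("User-agent: *\n\nDisallow: /\n".splitlines())

# The blank line detaches the Disallow rule from its User-agent record,
# so the rule is dropped and the root URL is reported as fetchable.
print(rp.can_fetch("*", "/"))
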

Have a look at our robots.txt guide to learn how to write a good robots.txt.
