Strict and relaxed robots.txt parsing

In this article we explain the differences between strict and relaxed robots.txt parsing.

Within our UI we show the downloaded robots.txt files and the parsing results. Our software has two different sets of rules for parsing robots.txt files: strict mode and relaxed mode. Strict mode means that we parse your robots.txt strictly according to the robots.txt draft specification. Relaxed mode means that we parse your robots.txt the same way Googlebot does.

If your robots.txt is not accurate, strict parsing could lead to unexpected crawling results. Relaxed mode works around a number of problems commonly seen in robots.txt files; the relaxed parsing is most likely what the webmaster intended when writing the robots.txt.

In the past we allowed you to choose between strict and relaxed robots.txt parsing and used strict parsing by default. We have removed this setting and now always apply relaxed parsing. We still show the differences so you can write a robots.txt file that is parsed correctly by a large number of crawlers.

So what is the difference?

From a technical point of view, relaxed mode is fault-tolerant parsing. Here is an example:

Instead of using blank lines to split sets of records, relaxed mode ignores blank lines and starts a new record when the next user-agent line or set of user-agent lines occurs.

A robots.txt with

User-agent: *

Disallow: /

will be interpreted as

User-agent: *
Disallow:

in strict mode and as

User-agent: *
Disallow: /

in relaxed mode.

In this extreme example the two interpretations are exact opposites: strict mode allows crawling of everything, while relaxed mode disallows everything, and the relaxed interpretation is probably what the author intended. To point out issues like this we always compare both parsing results and inform you about any differences. As a webmaster you want to make sure that both parsings are identical.
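To make the difference concrete, here is a minimal sketch in Python of the two record-splitting strategies. This is not our production parser, and it only handles the grouping rule discussed above; the function name parse_robots and the record format are made up for illustration.

def parse_robots(text, relaxed):
    """Group robots.txt lines into (user-agents, rules) records."""
    records = []
    agents, rules = [], []

    def flush():
        nonlocal agents, rules
        if agents:
            records.append((agents, rules))
        agents, rules = [], []

    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()    # strip comments and whitespace
        if not line:
            if not relaxed:
                flush()                         # strict: a blank line ends the record
            continue                            # relaxed: blank lines are ignored
        field, _, value = line.partition(':')
        field, value = field.strip().lower(), value.strip()
        if field == 'user-agent':
            if rules:
                flush()                         # a new set of user-agent lines starts a new record
            agents.append(value)
        elif field in ('allow', 'disallow'):
            if agents:
                rules.append((field, value))    # rules without a user-agent in scope are dropped
    flush()
    return records

example = "User-agent: *\n\nDisallow: /\n"
print(parse_robots(example, relaxed=False))   # [(['*'], [])] -- everything is allowed
print(parse_robots(example, relaxed=True))    # [(['*'], [('disallow', '/')])] -- everything is blocked

Running both modes over the same input and diffing the resulting records is essentially the comparison we perform to warn you about differences.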

Have a look at our robots.txt guide to learn how to write a good robots.txt.

