Bot Reference


Audisto operates a crawling bot whose purpose is to fetch all accessible URLs of a website. Audisto provides a service that analyzes websites: it examines the link graph of a site and collects valuable information about possible problems.


Audisto currently operates two different crawlers:

  • The Audisto Essential Crawler: It can crawl every site
  • The Audisto Full Crawler: It may crawl only sites with verified ownership


The crawler obeys the Robots Exclusion Protocol. See below for how to block the crawler through your robots.txt. The crawler can handle rel="nofollow", the nofollow meta tag, and the nofollow X-Robots-Tag HTTP header as well. For verified hosts the crawler can also simulate the crawling behaviour of other bots or crawl with other robots directives. The crawler can also work with customized robots.txt files.

The crawler's implementation of parsing and handling a robots.txt file is based on the Internet-Draft Robots Exclusion Protocol from July 01, 2019. It additionally checks the robots.txt against a strict parsing mode based on the original 1994 A Standard for Robot Exclusion document and the 1997 Internet Draft specification A Method for Web Robots Control and reports the differences.

Robots.txt handling is anything but simple, and many robots have problems parsing and interpreting robots.txt files. If you want to learn more about strict and relaxed robots.txt handling, or if you have any problems with your robots.txt, you should read our guide about writing a good robots.txt. You can also use a robots.txt checker to validate your robots.txt, read the documents mentioned above, or contact us.
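As a quick sanity check, you can parse a robots.txt with Python's standard library. Note that this is a relaxed parser: it will not reveal the strict-mode differences described above, and the rules and URLs below are purely illustrative:

```python
from urllib.robotparser import RobotFileParser

# Parse an in-memory robots.txt that blocks the Audisto Essential Crawler.
# (Normally you would use rp.set_url(...) and rp.read() against a live site.)
rp = RobotFileParser()
rp.parse("""
User-agent: audisto-essential
Disallow: /
""".splitlines())

# The blocked bot may not fetch anything; other bots are unaffected.
print(rp.can_fetch("audisto-essential", "https://example.com/page"))  # False
print(rp.can_fetch("someotherbot", "https://example.com/page"))       # True
```

Keep in mind that different crawlers parse robots.txt differently; a check like this only tells you how one relaxed parser reads your file.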

If you want to block Audisto you could add this to your robots.txt file:

  # The Audisto Essential Crawler
  User-agent: audisto-essential
  Disallow: /

  # The Audisto Full Crawler
  User-agent: audisto
  Disallow: /

You can be more specific by addressing the crawler for portable platforms. The user agents to address them in the robots.txt are:

  audisto-phone
  audisto-tablet

Without specific directives for the portable platforms, the crawlers fall back to directives for audisto.
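For example, a robots.txt could address the phone crawler explicitly while the tablet crawler falls back to the generic group (the paths here are hypothetical):

```
# Only the phone crawler is addressed explicitly; audisto-tablet has no
# group of its own and therefore falls back to the rules for audisto.
User-agent: audisto-phone
Disallow: /desktop-only/

User-agent: audisto
Disallow: /private/
```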

User Agents

With each request, the Audisto crawler sends a user agent similar to the following:

  [type] crawler[platform] [version] (refer to in robots.txt as [robots.txt UA], see

Where the bracketed expressions are placeholders:

  • [type]: The crawler type ("full" or "essential")
  • [platform]: Optional. A target platform (either "phone" or "tablet"), prefixed by "/"
  • [version]: The version, consisting of three or four digits, separated by dots (like 7.45.531, or 7.45.531.1)
  • [robots.txt UA]: The robots user agent ("audisto" or "audisto-essential")

Please note that the version number (e.g. 7.45.531) changes with each update.
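Under this scheme, a user agent can be matched with a short regular expression. Here is a sketch in Python; the pattern covers only the leading "[type] crawler[/platform] [version]" part, not the trailing robots.txt note:

```python
import re

# "[type] crawler[/platform] [version]", e.g. "full crawler/phone 7.45.531".
# The version has three or four dot-separated numeric components.
UA_PATTERN = re.compile(
    r"^(?P<type>full|essential) crawler"
    r"(?:/(?P<platform>phone|tablet))?"
    r" (?P<version>\d+\.\d+\.\d+(?:\.\d+)?)"
)

def parse_audisto_ua(ua: str):
    """Return (type, platform, version), or None if the UA doesn't match."""
    m = UA_PATTERN.match(ua)
    if not m:
        return None
    return m.group("type"), m.group("platform"), m.group("version")

print(parse_audisto_ua("full crawler/phone 7.45.531"))
# -> ('full', 'phone', '7.45.531')
```

Because the version changes with each update, match the pattern rather than comparing against a fixed string.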

Following are examples of different user agents.

For the Audisto Full Crawler:

  full crawler 7.45.531 (refer to in robots.txt as audisto, see

For the Audisto Essential Crawler:

  essential crawler 7.45.531 (refer to in robots.txt as audisto-essential, see

This is used for verifying sites and during crawls.

The user agent changes, however, if a target platform other than "Web" is chosen when configuring a crawl (this feature may not be available to all users).

If the platform is "Phone", the user agent becomes:

  full crawler/phone 7.45.531 (refer to in robots.txt as audisto, see

And if the target platform is "Tablet":

  full crawler/tablet 7.45.531 (refer to in robots.txt as audisto, see

This does not hold for the Essential Crawler.

Detecting and Resolving Audisto Crawler

Audisto operates a number of servers with different IP addresses that may change from crawl to crawl. If you want to verify that the bot is authentic, you should first look at the user agent. We suggest you match against

You should then use DNS to verify that the reverse DNS lookup for the IP points to a host in the Audisto domain, and then do a DNS->IP lookup to verify that the reverse DNS lookup wasn't spoofed. You should cache the results of the verification for some time to reduce the number of lookups during a crawl. Here is an example of such a check:

  > host
  domain name pointer
  > host
  has address

Audisto and Authentication

If you want to allow Audisto to crawl a password-protected system, we recommend you remove the password protection for the bot. If you are using Apache 2.2 with mod_authz_host, you can use the Allow directive for this purpose:

  Allow from

Using this directive means Apache validates our bot with the same DNS-based method already described above.

If you are using Apache 2.4 with mod_authz_host, the directive becomes:

  Require host

Each of our crawlers is assigned to a specific subdomain:

  • The Audisto Essential Crawler: *, e.g.
  • The Audisto Full Crawler: *, e.g.

Additionally, we will make requests from *, for example to look up a verification file you uploaded or to check for verification meta tags on your main page. We may also download a robots.txt.

To use both authentication and reverse DNS lookup in Apache 2.2, you may use code similar to this:

  AuthName "Access Test Site"
  AuthType Basic
  AuthUserFile "{path to password file}"
  Require valid-user
  Order Deny,Allow
  Deny from all
  # For Domain Verification
  Allow from
  # Audisto Full Crawler
  Allow from
  # Audisto Essential Crawler
  Allow from
  Satisfy any

This will allow access for our bot but ask anybody else for a user name and password.

The same code for Apache 2.4 looks like this:

  AuthName "Access Test Site"
  AuthType Basic
  AuthUserFile "{path to password file}"
  Require valid-user
  # In Order: Domain Verification, Audisto Full Crawler,
  # Audisto Essential Crawler
  Require host

IP-Addresses Used By Audisto Crawlers

If you are setting up rules against IPs, these are our crawlers' addresses:

Audisto Full Crawler

Audisto Essential Crawler

The IP address list is also available as JSON:

Overload Protection Through Throttling

When the crawler works its way through a website, it may create quite some load, and the site's performance may start to decrease, which may prevent the site from working correctly.

We try our best to prevent this. This is why we introduced throttling.

Whenever the crawler notices that a site is malfunctioning, it will automatically start to reduce the number of requests it makes. Notably, it will:

  • Increase the delay between requests by 100 milliseconds
  • Periodically decrease the number of parallel requests by two

Indicators for a potential malfunction are:

  • Connection aborts and problems
  • Timeouts
  • HTTP status code 403 - "Access Denied"
  • HTTP status code 429 - "Too Many Requests"
  • All server error related HTTP status codes - that is 500 to 599

The crawler will increase throttling after 10 indicators are observed within one hundred requests, that is, when 10% or more of requests show errors.

In turn, the crawler will decrease throttling once 4 or fewer errors occur within one hundred requests, that is, once the error rate drops below 5%.

When decreasing throttling the crawler will:

  • Decrease the delay between requests by 50 milliseconds
  • Periodically increase the number of parallel requests by one
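The rules above can be sketched as a small simulation. This is illustrative only, not Audisto's actual implementation: it keeps a sliding window of the last 100 request outcomes and, for simplicity, adjusts on every request once the window is full, whereas the real crawler adjusts periodically:

```python
from collections import deque

class Throttle:
    """Illustrative sketch of the throttling rules described above."""

    ERROR_CODES = {403, 429}  # plus all 5xx codes, see the indicator list

    def __init__(self, delay_ms=0, parallel=10):
        self.delay_ms = delay_ms            # delay between requests
        self.parallel = parallel            # number of parallel requests
        self.window = deque(maxlen=100)     # last 100 request outcomes

    @staticmethod
    def is_indicator(status):
        # status None models a timeout or connection problem
        return (status is None or status in Throttle.ERROR_CODES
                or 500 <= status <= 599)

    def record(self, status):
        self.window.append(self.is_indicator(status))
        if len(self.window) < self.window.maxlen:
            return                          # not enough observations yet
        errors = sum(self.window)
        if errors >= 10:                    # 10% or more: throttle harder
            self.delay_ms += 100
            self.parallel = max(1, self.parallel - 2)
        elif errors <= 4:                   # below 5%: relax throttling
            self.delay_ms = max(0, self.delay_ms - 50)
            self.parallel += 1
```

For example, after 90 successful responses followed by 10 responses with status 503, the delay grows by 100 ms and the number of parallel requests drops by two.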