Bot Reference

About

Audisto operates a crawling bot. The purpose of the bot is to fetch all accessible pages of a website. Audisto provides a service that analyzes websites: it examines the link graph of a website and collects valuable information about possible problems.

Flavours

Audisto currently operates three different crawlers:

Robots.txt

The crawler obeys the Robots Exclusion Standard. See below for how to block the crawler through your robots.txt. The crawler can handle rel="nofollow" and the nofollow meta tag as well. For verified hosts the crawler can also simulate the crawling behaviour of other bots or apply other robots directives. The crawler can also work with customized robots.txt files.

The crawler's implementation of parsing a robots.txt file is based on the original 1994 A Standard for Robot Exclusion document, the 1997 Internet Draft specification A Method for Web Robots Control, and on one of Google's documents.

The crawler uses a relaxed parsing method, quite like Google does. It additionally checks the robots.txt against a strict parsing mode, and reports the differences. Read our article about strict and relaxed robots.txt handling to learn more.

Robots.txt handling is anything but simple, and many robots have problems parsing and understanding robots.txt files. If you have problems with your robots.txt, you should read our guide about writing a good robots.txt. You can also use a robots.txt checker to validate your robots.txt, read the documents mentioned above, or contact us.

If you want to block Audisto you could add this to your robots.txt file:

# The Audisto Essential Crawler
User-agent: audisto-essential
Disallow: /

# The Audisto Full Crawler
User-agent: audisto
Disallow: /

You can be more specific by addressing the crawler for portable platforms. The user agents to address them in the robots.txt are:

audisto-phone
audisto-tablet

Without specific directives for the portable platforms, the crawlers fall back to directives for audisto.
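For example, to give the phone crawler its own directives while the tablet crawler falls back to the general ones, a robots.txt could look like this (the paths shown are placeholders, not real recommendations):

```
# Applies only to the phone crawler
User-agent: audisto-phone
Disallow: /desktop-only/

# Fallback for all other Audisto crawlers, including audisto-tablet
User-agent: audisto
Disallow: /private/
```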

User Agents

With each request, the Audisto crawler sends a user agent similar to the following.

For the Audisto Full Crawler:

audisto.com full crawler 3.25.423 (refer to in robots.txt as audisto, see https://audisto.com/bot)

For the Audisto Essential Crawler:

audisto.com essential crawler 3.25.423 (refer to in robots.txt as audisto-essential, see https://audisto.com/bot)

This is used for verifying sites and during crawls.

The user agent changes, however, if a target platform other than "Web" is chosen when configuring a crawl (this feature may not be available to all users).

If the platform is "Phone", the user agent becomes:

audisto.com full crawler/phone 3.25.423 (refer to in robots.txt as audisto, see https://audisto.com/bot)

And if the target platform is "Tablet":

audisto.com full crawler/tablet 3.25.423 (refer to in robots.txt as audisto, see https://audisto.com/bot)

This does not hold for the Essential Crawler.

Please note that the version number (e.g. 3.25.423) changes with each update.

Detecting and Resolving Audisto Crawler

Audisto operates a number of servers with different IP addresses that could change from crawl to crawl. If you want to verify that the bot is authentic you should first look at the user agent. We suggest you match against audisto.com.
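Because the version number and platform suffix vary between crawls, a user-agent check should match only the stable parts of the string. A minimal sketch in Python; the pattern below is an assumption derived from the example user agents in this document, not an official specification:

```python
import re

# Matches the example user agents shown above: crawler name, an optional
# platform suffix, a three-part version number, and the robots.txt hint.
AUDISTO_UA = re.compile(
    r"audisto\.com (full|essential) crawler(/(phone|tablet))? "
    r"\d+\.\d+\.\d+ \(refer to in robots\.txt as audisto[\w-]*, "
    r"see https://audisto\.com/bot\)"
)

def looks_like_audisto(user_agent: str) -> bool:
    # A matching user agent is only a first hint; anyone can fake it,
    # so combine this with the DNS verification described below.
    return AUDISTO_UA.fullmatch(user_agent) is not None
```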

You should then use DNS to verify that the reverse DNS lookup for the IP points to a host in the Audisto domain, and then do a DNS->IP lookup to verify that the reverse DNS lookup wasn't spoofed. You should cache the results of the verification for some time to reduce the number of lookups during a crawl. Here is an example of such a check:

> host xxx.xxx.xxx.xxx
xxx.xxx.xxx.xxx.in-addr.arpa domain name pointer a1.cl1.audisto.com.
> host a1.cl1.audisto.com
a1.cl1.audisto.com has address xxx.xxx.xxx.xxx
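The two lookups above can be sketched in Python: `socket.gethostbyaddr` performs the reverse lookup, `socket.gethostbyname_ex` the forward lookup, and `functools.lru_cache` provides the suggested caching. The domain check is factored into a small helper; this is an illustrative sketch, not official verification code:

```python
import functools
import socket

def hostname_in_domain(hostname: str, domain: str = "audisto.com") -> bool:
    # Pure string check: does the PTR hostname belong to the expected domain?
    host = hostname.rstrip(".")
    return host == domain or host.endswith("." + domain)

@functools.lru_cache(maxsize=1024)
def verify_audisto_ip(ip: str) -> bool:
    # Step 1: reverse lookup (IP -> hostname), like "host xxx.xxx.xxx.xxx".
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        return False
    if not hostname_in_domain(hostname):
        return False
    # Step 2: forward lookup (hostname -> IPs) to rule out a spoofed PTR record.
    try:
        _, _, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    return ip in addresses
```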

Audisto and Authentication

If you want to allow Audisto to crawl a password-protected system, we recommend removing the password protection for the bot. If you are using Apache 2.2 with mod_authz_host, you can use the Allow directive for this purpose:

Allow from cl1.audisto.com

With this directive, Apache validates our bot using the same double DNS lookup already described above.

If you are using Apache 2.4 with mod_authz_host, the directive becomes:

Require host cl1.audisto.com

Each of our crawlers is assigned to a specific sub domain:

Audisto Full Crawler: cl1.audisto.com
Audisto Essential Crawler: cl2.audisto.com

Additionally, we will make requests from *.audisto.com, for example to look up a verification file you uploaded or to check for verification meta tags on your main page. We may also download a robots.txt.

To use both authentication and reverse DNS lookup in Apache 2.2, you may use code similar to this:

AuthName "Access Test Site"
AuthType Basic
AuthUserFile "{path to password file}"
Require valid-user
Order Deny,Allow
Deny from all
# For Domain Verification
Allow from audisto.com
# Audisto Full Crawler
Allow from cl1.audisto.com
# Audisto Essential Crawler
Allow from cl2.audisto.com
Satisfy any

This will allow access for our bot but ask anybody else for a user name and password.

The same configuration for Apache 2.4 looks like this:

AuthName "Access Test Site"
AuthType Basic
AuthUserFile "{path to password file}"
Require valid-user
# In Order: Domain Verification, Audisto Full Crawler,
# Audisto Essential Crawler
Require host audisto.com cl1.audisto.com cl2.audisto.com

IP-Addresses Used By Audisto Crawlers

If you are setting up rules against IPs, these are our crawlers' addresses:

Audisto Full Crawler

138.201.22.131
176.9.146.72
176.9.151.194
88.198.31.51
5.9.149.137
5.9.116.5
94.23.250.72
94.23.250.135
176.9.158.177
136.243.173.239
138.201.120.14
138.201.138.6
138.201.138.5
138.201.197.213
88.99.29.251
88.99.31.103
88.99.31.105
88.99.31.106
88.99.31.104
88.99.56.113

Audisto Essential Crawler

176.9.116.233
176.9.155.135
136.243.176.231

The IP address list is also available as JSON: https://audisto.com/ips.json
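If you maintain such rules in an automated setup, the published list can be turned into configuration lines programmatically. A small sketch that generates Apache 2.4 `Require ip` directives; the list below is just a subset of the addresses above, and in practice it could be refreshed from the JSON endpoint:

```python
# Subset of the published Audisto Full Crawler addresses (see list above).
FULL_CRAWLER_IPS = [
    "138.201.22.131",
    "176.9.146.72",
    "176.9.151.194",
]

def apache_require_lines(ips):
    # One "Require ip" directive per address.
    return ["Require ip " + ip for ip in ips]

for line in apache_require_lines(FULL_CRAWLER_IPS):
    print(line)
```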