Audisto Crawler

Indexability testing tool for index and noindex robots directives

The indexability checker helps you:

  • Analyze and test the indexability of all website documents and resources
  • Identify URLs blocked from crawling by robots.txt disallow rules
  • Debug indexation issues related to meta robots noindex and X-Robots-Tag noindex
  • Detect indexation problems caused by duplicate content, canonicals or hreflang
Sören Bendig, CEO Audisto:

"Track down and resolve indexability issues for all your important documents and resources."

Our indexability checker points out common errors with robots directives that prevent your website from being properly indexed. It also helps to identify opportunities for crawl budget optimization, as many websites waste crawl capacity because search engines crawl large numbers of non-indexable URLs.

X-Robots-Tag and meta robots directives

Example of Audisto indexability checker overview

Automated noindex discovery

Documents and resources set to noindex via a meta robots noindex directive or an X-Robots-Tag header noindex directive are automatically discovered by our noindex tester.

Conflicting robots directives

URLs with multiple conflicting robots directives are exposed by the noindex check. If one directive is set to index and another to noindex, the URL is highlighted.

Support for image, video and PDF files

For non-HTML URLs like images, videos and PDF files, the indexability is derived from the X-Robots-Tag HTTP header directives.
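All three checks can be approximated in a few lines of Python. The sketch below is illustrative only, not Audisto's implementation: it reads the X-Robots-Tag response header (the only directive source for non-HTML resources such as PDFs or images), parses meta robots tags for HTML responses, and flags noindex values as well as conflicting index/noindex combinations. The requests library and the example URL are assumptions.

```python
from html.parser import HTMLParser
import requests  # third-party HTTP client; assumed available

class RobotsMetaParser(HTMLParser):
    """Collects the content values of <meta name="robots"> tags."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "robots":
            content = a.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]

def indexability(url):
    resp = requests.get(url, timeout=10)

    # X-Robots-Tag applies to any resource type (HTML, PDF, image, video).
    header = resp.headers.get("X-Robots-Tag", "")
    directives = [d.strip().lower() for d in header.split(",") if d.strip()]

    # Meta robots directives only exist in HTML documents.
    if "text/html" in resp.headers.get("Content-Type", ""):
        parser = RobotsMetaParser()
        parser.feed(resp.text)
        directives += parser.directives

    if "index" in directives and "noindex" in directives:
        return "conflict: index and noindex are both present"
    if "noindex" in directives or "none" in directives:
        return "noindex"
    return "indexable: no blocking directive found"

print(indexability("https://example.com/"))  # hypothetical URL
```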

Check robots.txt crawling directives

Example of Audisto robots.txt analysis

Identify robots.txt parser issues

Discover whether a robots.txt file causes parser issues, e.g. due to an incorrect MIME type, invalid directives, URL encoding issues or a BOM (byte order mark) at the beginning of the file.
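A minimal sketch of two of these checks, fetching a robots.txt and inspecting its Content-Type header and a possible UTF-8 BOM; the URL is a placeholder and requests is an assumed dependency:

```python
import requests  # third-party HTTP client; assumed available

def check_robots_txt(url="https://example.com/robots.txt"):  # hypothetical URL
    resp = requests.get(url, timeout=10)
    issues = []

    # robots.txt should be served as plain text.
    ctype = resp.headers.get("Content-Type", "")
    if not ctype.startswith("text/plain"):
        issues.append(f"unexpected MIME type: {ctype!r}")

    # A UTF-8 byte order mark can make a strict parser miss the
    # first directive line (usually "User-agent").
    if resp.content.startswith(b"\xef\xbb\xbf"):
        issues.append("file starts with a UTF-8 BOM")

    return issues or ["no parser issues found"]

for issue in check_robots_txt():
    print(issue)
```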

Differences between strict and relaxed parsing

Find robots.txt files where strict and relaxed parsing produce different rule sets, a conflict that can lead to different crawling results depending on the crawler.
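To make the distinction concrete, here is a toy comparison that models just one difference, whitespace before the colon, which a strict parser rejects and a relaxed parser tolerates; real parsers differ in more ways than this:

```python
import re

RECORD = "User-agent: *\nDisallow : /private\nDisallow: /tmp\n"

def parse(text, relaxed):
    """Extract Disallow values; a relaxed parser tolerates
    whitespace before the colon, a strict one does not."""
    pattern = r"(?i)^disallow\s*:\s*(\S*)" if relaxed else r"(?i)^disallow:\s*(\S*)"
    rules = []
    for line in text.splitlines():
        m = re.match(pattern, line.strip())
        if m and m.group(1):
            rules.append(m.group(1))
    return rules

strict, relaxed = parse(RECORD, False), parse(RECORD, True)
print("strict: ", strict)    # ['/tmp']
print("relaxed:", relaxed)   # ['/private', '/tmp']
if strict != relaxed:
    print("conflict: parsers disagree, crawling results may differ")
```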

Unexpected HTTP status codes

Detect robots.txt files with an unexpected non-200 HTTP status code, such as 401 or 403, which often leads to unintended crawling results.
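The sketch below maps a robots.txt status code to the crawler behavior described by RFC 9309: 4xx responses (including 401 and 403) are treated as "no restrictions", while 5xx responses mean complete disallow. The URL is a placeholder:

```python
import requests  # third-party HTTP client; assumed available

def robots_status_effect(url="https://example.com/robots.txt"):  # hypothetical URL
    status = requests.get(url, timeout=10).status_code
    if status == 200:
        return "200: rules are parsed and applied"
    if 400 <= status < 500:
        # Per RFC 9309, 4xx (including 401/403) means "unavailable":
        # crawlers may access everything, which is rarely what a
        # 401/403 was meant to achieve.
        return f"{status}: treated as allow-all by most crawlers"
    if status >= 500:
        # Per RFC 9309, 5xx means "unreachable": assume complete disallow.
        return f"{status}: treated as full disallow"
    return f"{status}: unexpected, verify crawler behavior manually"

print(robots_status_effect())
```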

Verify blocking of correct URLs

Conveniently review all URLs blocked by robots.txt using the URL by status view. Double-check the impact of blocked URLs on your PageRank.

Canonicals, duplicate content and hreflang

Example of Audisto duplicate content analysis

Test rel=canonical link usage

Identify URLs that are likely not indexable due to having a canonical link pointing to another URL.
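A hedged sketch of such a check: it extracts rel=canonical links from a page, resolves them against the document URL, and reports whether the URL canonicalizes to itself or away to another URL. Class and function names are illustrative, not Audisto's API:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import requests  # third-party HTTP client; assumed available

class CanonicalParser(HTMLParser):
    """Collects href values of <link rel="canonical"> tags."""
    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and (a.get("rel") or "").lower() == "canonical":
            self.canonicals.append(a.get("href") or "")

def check_canonical(url):
    resp = requests.get(url, timeout=10)
    parser = CanonicalParser()
    parser.feed(resp.text)
    # Resolve relative canonicals against the document URL.
    targets = {urljoin(url, href) for href in parser.canonicals}
    if not targets:
        return "no canonical link found"
    if targets == {url}:  # naive string comparison, no URL normalization
        return "self-referencing canonical: URL is a candidate for indexing"
    return f"canonicalized away: points to {sorted(targets)}"

print(check_canonical("https://example.com/page?session=1"))  # hypothetical URL
```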


Check for duplicate content

Discover URLs with duplicate or near-duplicate content, which is often the root cause of indexation issues.
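Duplicate detection can be approximated with word shingles and Jaccard similarity; this toy sketch is not Audisto's actual algorithm, and the 0.9 threshold is an arbitrary assumption:

```python
def shingles(text, size=5):
    """Word 5-gram set used as a simple content fingerprint."""
    words = text.lower().split()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity(a, b):
    """Jaccard similarity of two shingle sets, between 0 and 1."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

page_a = "the quick brown fox jumps over the lazy dog near the river bank"
page_b = "the quick brown fox jumps over the lazy dog near the old bridge"
score = similarity(page_a, page_b)
print(f"similarity: {score:.2f}")
if score > 0.9:  # assumed duplicate threshold
    print("likely duplicate content")
```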


Test hreflang tags implementation

Explore all URLs grouped together by hreflang links and quickly uncover problems and related indexation issues.
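One of the most common hreflang problems is a missing return link: every alternate URL must link back to the page that references it. A small sketch over hypothetical, already-extracted annotations:

```python
# Hypothetical hreflang annotations per URL (lang -> alternate URL),
# as they would be extracted from <link rel="alternate" hreflang=...>.
hreflang = {
    "https://example.com/en/": {"en": "https://example.com/en/",
                                "de": "https://example.com/de/"},
    "https://example.com/de/": {"de": "https://example.com/de/"},
    # missing the return link "en" -> /en/
}

def check_reciprocity(annotations):
    problems = []
    for url, alternates in annotations.items():
        for lang, target in alternates.items():
            if target == url:
                continue  # self reference is fine and recommended
            # Every alternate must link back to the referencing URL.
            back = annotations.get(target, {})
            if url not in back.values():
                problems.append(f"{target} has no return link to {url}")
    return problems

for problem in check_reciprocity(hreflang):
    print(problem)
# -> https://example.com/de/ has no return link to https://example.com/en/
```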


Crawl budget optimization

Example of Audisto HTTP status report

Identify crawl capacity waste

Stop wasting crawl time on non-indexable URL inventory and increase your index coverage. Convenient status dashboards and historical data always provide an exact overview.

Reduce crawling errors

Improve your crawl health by preventing crawling and indexing difficulties caused by broken links or excessive HTTP errors. Catch soft 404 errors with our duplicate content analysis.

Avoid internal redirects

Unnecessary internal redirects have a negative effect on crawling. Use a variety of reports and hints to directly spot and eliminate non-indexable redirects within your website structure.
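Redirect hops can be surfaced with a simple check like the following sketch, which follows a URL and reports every internal 3xx hop on the way; the start URL is hypothetical and requests is an assumed dependency:

```python
from urllib.parse import urlparse
import requests  # third-party HTTP client; assumed available

def internal_redirects(start_url):
    """Follow a URL and report every internal redirect hop."""
    host = urlparse(start_url).netloc
    resp = requests.get(start_url, timeout=10)
    hops = []
    for hop in resp.history:  # each 3xx response on the way
        target = hop.headers.get("Location", "")
        # Relative Location headers have an empty netloc, i.e. same host.
        if urlparse(target).netloc in ("", host):
            hops.append(f"{hop.url} -> {target} ({hop.status_code})")
    return hops

# Hypothetical URL; internal links should point at the final
# destination instead of going through redirect hops.
for hop in internal_redirects("http://example.com/old-path"):
    print(hop)
```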


Analyze sitemaps

Help search engines index your content with clean and working sitemaps. Our XML sitemap analysis highlights issues with <lastmod> timestamps, duplicate URLs and incorrect content types.
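Two of those sitemap checks, duplicate <loc> entries and invalid <lastmod> values, can be sketched with the standard library; this is an illustration, not Audisto's validator:

```python
import xml.etree.ElementTree as ET
from datetime import datetime

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def check_sitemap(xml_text):
    """Flag duplicate <loc> entries and unparseable <lastmod> values."""
    issues, seen = [], set()
    root = ET.fromstring(xml_text)
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        if loc in seen:
            issues.append(f"duplicate URL: {loc}")
        seen.add(loc)
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        if lastmod:
            try:
                datetime.fromisoformat(lastmod.replace("Z", "+00:00"))
            except ValueError:
                issues.append(f"invalid lastmod {lastmod!r} for {loc}")
    return issues or ["no issues found"]

sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-13-01</lastmod></url>
  <url><loc>https://example.com/</loc></url>
</urlset>"""
print("\n".join(check_sitemap(sitemap)))
```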
