Link Crawling

State what gets crawled

The Audisto Crawler can be configured to consider or ignore several kinds of references during crawling. This is done using the Links setting when configuring a project or a crawl.

Configuration dialog for links

The crawler always follows regular anchor links (the <a> tag) and redirects. Additionally, it can include resources and follow <link> tags.
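As a rough illustration of these rules, the sketch below models the two optional switches as a small settings object and decides, per kind of reference, whether it is followed. The names `LinkSettings`, `should_follow` and `redirect_target`, the kind labels, and the set of redirect status codes are assumptions made for this example, not part of the Audisto product.

```python
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import urljoin


@dataclass
class LinkSettings:
    # Hypothetical mirror of the two optional checkboxes in the Links setting
    include_resources: bool = False
    include_link_elements: bool = False


def should_follow(kind: str, settings: LinkSettings) -> bool:
    """Decide whether a discovered reference of the given kind is crawled."""
    if kind in ("anchor", "redirect"):
        return True  # always followed, regardless of the settings
    if kind == "resource":
        return settings.include_resources
    if kind == "link":
        return settings.include_link_elements
    return False  # unknown kinds are ignored in this sketch


# Assumed set of HTTP redirect status codes that are followed
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def redirect_target(url: str, status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Return the absolute redirect target, or None if the response is not a redirect.

    headers is assumed to be a plain dict keyed by 'Location'.
    """
    location = headers.get("Location")
    if status in REDIRECT_STATUSES and location:
        # Location may be relative, so resolve it against the requested URL
        return urljoin(url, location)
    return None
```

With default settings only anchors and redirects pass `should_follow`; each checkbox independently enables one further kind.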

Crawling Resources

If Include Resources is enabled, the crawler will also download images, CSS, JavaScript and all kinds of other files linked through:

  • <img>, both the src and the srcset attribute are evaluated
  • <source>, both the src and the srcset attribute are evaluated
  • <script>
  • <frame>
  • <iframe>
  • <video>
  • <audio>
  • <object>
  • <link>, but only with a rel="stylesheet" attribute, i.e. one that defines a CSS resource

This allows site-wide checking of images, scripts, and other resources.
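A minimal sketch of how resource URLs could be pulled from a page with Python's standard `html.parser`, following the tag list above. It assumes `src` for most tags, `data` for <object> (the standard HTML attribute), and a simplified `srcset` parser that splits candidates on commas; `ResourceExtractor` and `parse_srcset` are hypothetical names, and this is not how the Audisto Crawler is implemented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tags whose src attribute is assumed to point at a resource
SRC_TAGS = {"img", "source", "script", "frame", "iframe", "video", "audio"}


def parse_srcset(value: str) -> list[str]:
    """Return the URLs from a srcset value such as 'a.jpg 1x, b.jpg 2x'.

    Simplified: splits on commas, so URLs containing commas are not supported.
    """
    urls = []
    for candidate in value.split(","):
        parts = candidate.strip().split()
        if parts:
            urls.append(parts[0])  # first token is the URL, the rest is the descriptor
    return urls


class ResourceExtractor(HTMLParser):
    """Collect absolute resource URLs in document order."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.resources: list[str] = []

    def _add(self, url):
        if url:
            self.resources.append(urljoin(self.base_url, url.strip()))

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # html.parser already lowercases tag and attribute names
        if tag in SRC_TAGS:
            self._add(a.get("src"))
        if tag in ("img", "source") and a.get("srcset"):
            for url in parse_srcset(a["srcset"]):
                self._add(url)
        if tag == "object":
            self._add(a.get("data"))
        # Only <link rel="stylesheet"> counts as a resource
        if tag == "link" and "stylesheet" in (a.get("rel") or "").lower().split():
            self._add(a.get("href"))
```

Feeding a page into `ResourceExtractor(page_url)` and reading `.resources` yields absolute URLs; a <link rel="canonical"> is deliberately skipped because it is not a resource.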

Crawling <link> tags

If Include <link> elements is checked, the crawler will discover resources linked through a <link> tag. The only exception is links with a rel="stylesheet" attribute; these are treated as resources instead.
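The stylesheet exception can be sketched as a parser that collects every <link> element except those whose rel contains stylesheet. `LinkElementExtractor` and `LinkElement` are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser
from typing import NamedTuple, Optional
from urllib.parse import urljoin


class LinkElement(NamedTuple):
    rel: str                 # normalized, space-separated rel tokens
    href: str                # absolute target URL
    hreflang: Optional[str]  # None if the attribute is absent


class LinkElementExtractor(HTMLParser):
    """Collect <link> elements, skipping rel="stylesheet" (treated as a resource)."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[LinkElement] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rel_tokens = (a.get("rel") or "").lower().split()
        href = a.get("href")
        if not href or "stylesheet" in rel_tokens:
            return  # no target, or a stylesheet that belongs to resource crawling
        self.links.append(
            LinkElement(" ".join(rel_tokens), urljoin(self.base_url, href.strip()), a.get("hreflang"))
        )
```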

Links are extracted from both:

  • HTML
  • HTTP headers (the Link header)
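On the HTTP side, the equivalent of a <link> element is the Link response header defined in RFC 8288. The following is a simplified parser under that assumption: `parse_link_header` is a hypothetical helper, and it does not handle `<` or `;` inside quoted parameter values.

```python
import re
from urllib.parse import urljoin


def parse_link_header(value: str, base_url: str) -> list[dict[str, str]]:
    """Parse a header like '<a>; rel="alternate"; hreflang="de", <b>; rel="x"'."""
    links = []
    # Each entry is '<target>' followed by its parameters up to the next '<'
    for match in re.finditer(r"<([^>]*)>([^<]*)", value):
        target, raw_params = match.groups()
        link = {"href": urljoin(base_url, target.strip())}
        for param in raw_params.split(";"):
            # Drop whitespace and the comma that separates entries
            param = param.strip().rstrip(",").strip()
            if not param:
                continue
            key, _, val = param.partition("=")
            link[key.strip().lower()] = val.strip().strip('"')
        links.append(link)
    return links
```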

Additionally, if following <link> tags is enabled, extended analysis is available for:

  • Checking hreflang: All links with a hreflang attribute are validated for correctness and completeness.
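What "correctness and completeness" covers is not spelled out here; the sketch below assumes two plausible checks: each hreflang value has a simple language or language-region shape (or is x-default), and every alternate page links back to the page that references it. `check_hreflang` and its input shape (each page URL mapped to its hreflang/target pairs) are hypothetical, and the regex is much stricter than full BCP 47 tags such as zh-Hant.

```python
import re

# Simplified assumption: 'x-default', or a two-letter language with an optional two-letter region
HREFLANG_RE = re.compile(r"^(?:x-default|[a-z]{2}(?:-[a-z]{2})?)$", re.IGNORECASE)


def check_hreflang(annotations: dict[str, dict[str, str]]) -> list[str]:
    """annotations maps each crawled page URL to its {hreflang value: target URL} pairs."""
    problems = []
    for page, alternates in annotations.items():
        for lang, target in alternates.items():
            # Correctness: the value must look like a language(-region) code
            if not HREFLANG_RE.match(lang):
                problems.append(f"{page}: invalid hreflang value {lang!r}")
            if target == page:
                continue  # self-reference, nothing to verify
            # Completeness: the target must be known and must link back
            back = annotations.get(target)
            if back is None:
                problems.append(f"{page}: target {target} was not crawled")
            elif page not in back.values():
                problems.append(f"{page}: no return link from {target}")
    return problems
```

A clean cluster yields an empty list; each entry otherwise names the page and the failed check.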