Link Crawling
State what gets crawled
The Audisto Crawler can be configured to consider or ignore several kinds of references during crawling. This is done using the Links setting when configuring a project or a crawl.
The crawler always follows regular anchor links (the <a> tag) and redirects. Additionally, it can include resources, follow <link> tags, and include links from XML sitemaps.
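The following Python sketch models how a Links setting could decide which kinds of references are followed: anchors and redirects always, everything else only when the matching option is enabled. The LinkSettings class, its field names and the kind labels are hypothetical and are not Audisto's actual configuration keys. All optional kinds default to off here purely for illustration; the real defaults may differ.

```python
from dataclasses import dataclass


# Hypothetical model of the "Links" setting. Field names are illustrative,
# not Audisto's real configuration keys, and the defaults are assumptions.
@dataclass(frozen=True)
class LinkSettings:
    include_resources: bool = False
    include_link_elements: bool = False
    include_sitemap_links: bool = False


# Reference kinds that are followed regardless of the settings.
ALWAYS_FOLLOWED = {"anchor", "redirect"}


def should_follow(kind: str, settings: LinkSettings) -> bool:
    """Return True if a reference of the given (hypothetical) kind is followed."""
    if kind in ALWAYS_FOLLOWED:
        return True
    if kind == "resource":
        return settings.include_resources
    if kind == "link_element":
        return settings.include_link_elements
    if kind == "sitemap":
        return settings.include_sitemap_links
    # Unknown kinds are ignored in this sketch.
    return False
```

Keeping the settings immutable (frozen=True) means one configuration object can be shared safely across all workers of a crawl.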
Crawling Resources
If Include resources is enabled, the crawler will download images, CSS, JavaScript and all kinds of files linked through:
- <img>: both the src and the srcset attribute are evaluated
- <source>: both the src and the srcset attribute are evaluated
- <script>
- <frame>
- <iframe>
- <video>
- <audio>
- <object>
- <link>, but only with a rel="stylesheet" attribute, defining a CSS resource
This allows for site-wide checking of images, scripts, and other resources.
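As an illustration, here is a minimal Python sketch of resource extraction with the standard library's html.parser, following the tag list above. Reading src for <script>, <frame>, <iframe>, <video> and <audio>, and data for <object>, is an assumption of this sketch, as is the simplified srcset parsing. ResourceExtractor and parse_srcset are hypothetical names, not part of the Audisto Crawler.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes read per tag. src/srcset for <img> and <source> follow the list
# above; the attribute chosen for the remaining tags is an assumption.
RESOURCE_ATTRS = {
    "img": ("src", "srcset"),
    "source": ("src", "srcset"),
    "script": ("src",),
    "frame": ("src",),
    "iframe": ("src",),
    "video": ("src",),
    "audio": ("src",),
    "object": ("data",),
}


def parse_srcset(value: str) -> list[str]:
    """Return the URL of each srcset candidate (simplified: splits on commas)."""
    urls = []
    for candidate in value.split(","):
        parts = candidate.split()
        if parts:
            urls.append(parts[0])  # drop width/density descriptors like "2x"
    return urls


class ResourceExtractor(HTMLParser):
    """Collect absolute resource URLs from an HTML document."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.resources: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link":
            # Only <link rel="stylesheet"> is treated as a resource.
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel and attrs.get("href"):
                self._add(attrs["href"])
            return
        for name in RESOURCE_ATTRS.get(tag, ()):
            value = attrs.get(name)
            if not value:
                continue
            if name == "srcset":
                for url in parse_srcset(value):
                    self._add(url)
            else:
                self._add(value)

    def _add(self, url: str):
        # Resolve relative references against the page URL.
        self.resources.append(urljoin(self.base_url, url.strip()))
```

A production parser would also respect a <base href> element and handle srcset URLs that themselves contain commas.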
Crawling <link> elements
If Include <link> elements is enabled, the crawler will discover documents and resources linked through a <link> tag, including canonical links and hreflang links. The only exceptions are URLs referenced by a <link> tag with a rel="stylesheet" attribute; these are treated as resources.
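A minimal sketch, assuming Python's html.parser, of how <link> elements could be split into followed links and stylesheet resources. LinkElementExtractor is a hypothetical name; it only inspects the rel attribute and does not attempt to reproduce the crawler's actual classification.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkElementExtractor(HTMLParser):
    """Collect <link> targets, separating stylesheets from all other links."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.documents: list[tuple[str, str]] = []  # (rel value, absolute URL)
        self.resources: list[str] = []              # stylesheet URLs

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        rel = (attrs.get("rel") or "").lower()
        url = urljoin(self.base_url, href.strip())
        if "stylesheet" in rel.split():
            # Stylesheets are treated as resources, not as linked documents.
            self.resources.append(url)
        else:
            # e.g. rel="canonical" or rel="alternate" with hreflang
            self.documents.append((rel, url))
```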
Links are extracted from both:
- the HTML
- the HTTP header (the Link header field)
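In HTTP, such links are carried in the Link header field defined in RFC 8288, for example <https://example.com/en/>; rel="alternate"; hreflang="en". The sketch below is a simplified, hypothetical parser (parse_link_header is not an Audisto API); it does not handle a "<" character inside a quoted parameter value.

```python
import re
from urllib.parse import urljoin

# One link-value: <URI-reference> followed by its ;-separated parameters,
# up to the next "<". Simplified: a "<" inside a quoted value breaks this.
LINK_VALUE = re.compile(r'<([^>]*)>([^<]*)')
# A single parameter: ; name=value or ; name="quoted value"
PARAM = re.compile(r';\s*([^\s=;,]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def parse_link_header(header: str, base_url: str) -> list[dict[str, str]]:
    """Parse a Link header into dicts holding the absolute URL and parameters."""
    links = []
    for target, params in LINK_VALUE.findall(header):
        entry = {"url": urljoin(base_url, target.strip())}
        for name, quoted, token in PARAM.findall(params):
            entry[name.lower()] = quoted if quoted else token
        links.append(entry)
    return links
```

If a response carries several Link header fields, they can be joined with ", " before parsing, since Link is a comma-separated list field.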
Crawling XML Sitemaps
If Include links from XML sitemaps is enabled, the crawler will discover URLs and resources from XML sitemap index files and XML sitemap files.
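A minimal Python sketch, using xml.etree.ElementTree, of telling a sitemap index file apart from a regular sitemap file and collecting the <loc> values from each. parse_sitemap is a hypothetical helper; it reads only the core sitemaps.org namespace and ignores sitemap extensions.

```python
import xml.etree.ElementTree as ET

# Namespace of the core sitemap protocol (sitemaps.org, version 0.9).
SM_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text: str) -> tuple[str, list[str]]:
    """Return ("index", child sitemap URLs) or ("urlset", page URLs)."""
    root = ET.fromstring(xml_text)
    if root.tag == SM_NS + "sitemapindex":
        kind, child = "index", "sitemap"
    elif root.tag == SM_NS + "urlset":
        kind, child = "urlset", "url"
    else:
        raise ValueError(f"not an XML sitemap: {root.tag}")
    locs = []
    for entry in root.findall(SM_NS + child):
        loc = entry.find(SM_NS + "loc")
        if loc is not None and loc.text:
            locs.append(loc.text.strip())
    return kind, locs
```

For an index, each returned URL is itself a sitemap that would be fetched and parsed in turn; for a urlset, the returned URLs are the pages to crawl.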
Links from the following sitemap extensions are extracted as well:
If you are interested in less technical information, please read the dedicated page for our XML sitemap checker and validator.