Validate Canonical Links Using the Audisto Crawler

Check your canonical linking implementation for inconsistencies

The Audisto Crawler includes a full-featured rel canonical validator that identifies all pages linked via canonical tags and evaluates the resulting groups.

For rel canonical to work properly, every canonical tag must point to a valid, indexable, and reachable URL.
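The three requirements can be sketched as a simple check. This is a minimal illustration, not the Audisto Crawler's implementation: the `TargetFacts` record and the `target_problems` function are hypothetical names, and the facts are assumed to have been collected by a crawl beforehand.

```python
from dataclasses import dataclass


@dataclass
class TargetFacts:
    """Hypothetical record of what a crawl learned about a canonical target."""
    url: str
    http_status: int         # e.g. 200, 301, 404
    noindex: bool            # a robots meta tag or header says noindex
    blocked_by_robots: bool  # the URL is disallowed in robots.txt


def target_problems(target):
    """Return a list of reasons why the target is a poor canonical destination."""
    problems = []
    if target.http_status != 200:
        problems.append(f"HTTP status is {target.http_status}, not 200")
    if target.noindex:
        problems.append("target is noindex")
    if target.blocked_by_robots:
        problems.append("target is blocked by robots.txt")
    return problems
```

An empty list means the target passed all three checks in this simplified model.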

A detailed explanation of canonical links can be found in our in-depth canonical guide.

Enable Canonical Validation

The Audisto Crawler automatically validates all canonical tags it encounters, provided that the inclusion of <link> elements is enabled.

Required links configuration for canonical

The setting can be found in the project and crawl settings on the "Advanced" tab, within the "Links" section.
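For context, a canonical tag is a <link> element with rel="canonical", normally placed inside <head>. The following sketch shows how such elements could be collected from a page using Python's standard `html.parser` module. The class name `CanonicalFinder` is hypothetical, and the code does not reflect how the Audisto Crawler itself parses pages.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect every <link rel="canonical"> and note whether it is inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found = []  # list of (href, inside_head) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link":
            attributes = dict(attrs)
            # rel may hold several space-separated values
            rels = (attributes.get("rel") or "").lower().split()
            if "canonical" in rels:
                self.found.append((attributes.get("href"), self.in_head))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
```

After calling `feed()` with the page source, more than one entry in `found` or an entry with `inside_head` set to `False` corresponds to the kind of situation that hints such as "Canonical found twice" or "Canonical found outside <head>" describe.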

Canonical Groups Overview

The Canonical Groups report can be found within the "Indexing" section of the "Current Crawl" menu. The report shows all Canonical Groups. A Canonical Group consists of all pages that are connected by canonical links.
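One way to picture this grouping is as connected components of a graph whose edges are canonical links. The sketch below uses a small union-find structure for that. It is an illustration under the assumption that link direction is ignored when forming groups; the function name `canonical_groups` and the input format (pairs of source URL and canonical target) are hypothetical.

```python
from collections import defaultdict


def canonical_groups(canonical_links):
    """Group URLs that are connected by canonical links.

    canonical_links is an iterable of (source_url, canonical_target) pairs.
    Direction is ignored here: two URLs share a group if any path of
    canonical links connects them.
    """
    parent = {}

    def find(url):
        parent.setdefault(url, url)
        while parent[url] != url:
            parent[url] = parent[parent[url]]  # path halving
            url = parent[url]
        return url

    for source, target in canonical_links:
        root_source, root_target = find(source), find(target)
        if root_source != root_target:
            parent[root_source] = root_target

    groups = defaultdict(set)
    for url in list(parent):
        groups[find(url)].add(url)
    return sorted(sorted(members) for members in groups.values())
```

For example, two parameterized URLs that both declare `/a` as canonical end up in one group with `/a`, while a page that only references itself forms a group of its own.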

Canonical groups overview

For each group, the report shows:

  • Title: The group's title
  • Status: The overall status
  • Members: The number of pages in this group
  • Top URLs: Some exemplary pages that are members of this group
  • Hints: Hints that were triggered for this group

Clicking on a title brings up a detailed report for a group.

Canonical Status

Each canonical group has one of the following statuses:

  • OK: No problems detected
  • Uncertain: The crawler does not have enough information, e.g. because pages are known but not crawled
  • Problem: We detected problems that might need fixing
  • Error: We detected errors that require fixing

Hints

To point out specific problems, we provide a set of hints. These are:

  • Canonical chain
  • Canonical loop
  • Conflicting canonical declarations
  • External canonical leader
  • Canonical leader weaker than member
  • Canonical leader is noindex
  • Canonical leader blocked by robots.txt
  • Canonical leader HTTP status not 200
  • Canonical leader missing self reference
  • Canonical URL redirects
  • Cross-language or region canonical mismatch
  • Canonical found twice
  • Canonical found twice and differs
  • Canonical found outside <head>
  • Invalid attributes in canonical annotation
  • Superfluous HTML attributes
  • Canonical URL is not absolute
  • Canonical contains malformed or empty href
  • Canonical URL contains fragment identifier
  • Canonical URL changed
  • Canonical target URL is a redirect
  • Cross-language canonicalization
  • Non-HTML resource to HTML canonical
  • HTML page to non-HTML resource canonical
  • URL discovered only via canonical
  • Canonical target is noindex
  • Canonical target blocked by robots.txt
  • Canonical target returns non-200 status
  • Canonical target is an external URL
  • Group partially unknown

See detailed information on hints here.
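To illustrate the first two hints, the sketch below follows canonical links from a starting URL and reports whether the path forms a chain (more than one hop) or a loop (a URL is visited twice). The function name `follow_canonical` and the `canonical_of` mapping are hypothetical, and the exact criteria the Audisto Crawler applies may differ.

```python
def follow_canonical(start, canonical_of):
    """Follow canonical links from start and classify the resulting path.

    canonical_of maps a URL to the URL its canonical tag points to.
    Returns (path, kind), where kind is "ok", "chain", or "loop".
    """
    path = [start]
    seen = {start}
    current = start
    while True:
        target = canonical_of.get(current)
        if target is None or target == current:
            break  # no canonical tag, or a self reference: the path ends here
        path.append(target)
        if target in seen:
            return path, "loop"
        seen.add(target)
        current = target
    hops = len(path) - 1
    return path, "chain" if hops > 1 else "ok"
```

In this simplified model, a page whose canonical target declares yet another canonical URL is a chain, and a pair of pages that point at each other is a loop.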

Canonical Group Report

The group report shows status information for the group and lists all hints and pages that are members of this group. Since each page has its own status and hints, it is easy to track down problems caused by individual pages.

Canonical group report

For each page in the group, the report displays:

  • URL: The URL of the page
  • Status: The status of the URL
  • HTTP Status: The HTTP status code - should be "200 - OK"
  • Indexable: Whether the page is allowed to be indexed
  • Document Language: The language of the page, as stated in the <html lang> attribute
  • Leader: Whether the URL is a designated canonical destination or not
  • Incoming Links: The number of canonical links pointing to this page
  • Outgoing Links: The canonical links pointing from this URL to other URLs
  • Hints: Specific hints related to that individual URL
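The Leader, Incoming Links, and Outgoing Links columns can be thought of as simple aggregates over the group's canonical links. The following sketch derives them from a list of (source, target) pairs. The function name `member_table` is hypothetical, and two choices are assumptions rather than documented behavior: self references are not counted as incoming or outgoing links, and any URL that is the target of a canonical link (including its own) is treated as a leader.

```python
from collections import defaultdict


def member_table(canonical_links):
    """Summarize leader flag, incoming count, and outgoing targets per URL."""
    incoming = defaultdict(int)
    outgoing = defaultdict(list)
    targets = set()
    urls = set()
    for source, target in canonical_links:
        urls.update((source, target))
        targets.add(target)
        if source == target:
            continue  # assumption: a self reference is not counted as a link
        incoming[target] += 1
        outgoing[source].append(target)
    return {
        url: {
            "leader": url in targets,  # assumption: any canonical target is a leader
            "incoming": incoming[url],
            "outgoing": outgoing[url],
        }
        for url in sorted(urls)
    }
```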

For small groups, the Audisto Crawler provides a link graph. This visualization shows canonical chains and loops spatially, making it easy to see where a redirect or canonical tag ends up pointing via an unintended path of URLs.

Validating Canonical Across Several Domains

In the case of content syndication, identical or very similar content is often distributed across multiple domains, for example:

  • example.com as the original
  • example.net as a secondary publication
  • example.org as a secondary publication

This frequently involves the use of cross-domain canonical links.
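Whether a canonical link crosses a domain boundary can be decided by comparing the hostnames of the page and its canonical target. The sketch below does this with Python's `urllib.parse`. The function name `is_cross_domain` is hypothetical, and the comparison assumes that hostnames must match exactly, so `www.example.com` and `example.com` would count as different hosts.

```python
from urllib.parse import urlsplit


def is_cross_domain(page_url, canonical_url):
    """Return True if the canonical target is on a different host than the page."""
    page_host = urlsplit(page_url).hostname or ""
    canonical_host = urlsplit(canonical_url).hostname or ""
    # urlsplit already lowercases the hostname, so a direct comparison is enough
    return page_host != canonical_host
```

With the syndication setup above, an article on example.net that declares the example.com version as canonical would be reported as cross-domain.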

To crawl and analyze all these different domains, the Audisto Crawler can be configured to crawl across domain boundaries.

Configuring domains is covered in detail in a separate help article. In short, the steps are:

  1. Open "Account/Sites" in the menu and add all your domains
  2. Verify all domains - this is important!
  3. Edit your project settings, and add all your domains as "Additional Domains"
  4. In the project or crawl settings, select which additional domains should be crawled

The crawler will now detect and analyze links between all these domains.