All 30 canonical group hints checked by the Audisto Crawler
Table of Contents
- Canonical chain
- Canonical loop
- Conflicting canonical declarations
- External canonical leader
- Canonical leader weaker than member
- Canonical leader is noindex
- Canonical leader blocked by robots.txt
- Canonical leader HTTP status not 200
- Canonical leader missing self reference
- Canonical URL redirects
- Cross-language or region canonical mismatch
- Canonical found twice
- Canonical found twice and differs
- Canonical found outside <head>
- Invalid attributes in canonical annotation
- Superfluous HTML attributes
- Canonical URL is not absolute
- Canonical contains malformed or empty href
- Canonical URL contains fragment identifier
- Canonical URL changed
- Canonical target URL is a redirect
- Cross-language canonicalization
- Non-HTML resource to HTML canonical
- HTML page to non-HTML resource canonical
- URL discovered only via canonical
- Canonical target is noindex
- Canonical target blocked by robots.txt
- Canonical target returns non-200 status
- Canonical target is an external URL
- Group partially unknown
Hints
Canonical chain
Description
This hint is set on canonical groups and canonical members where the canonical links form a chain rather than all pointing directly to the leader. In a canonical chain, URL A points to B, B points to C, and so on. While the chain eventually resolves to a final leader, this approach dilutes the consolidation signal and is inefficient. Only canonical members with both incoming and outgoing canonical links are marked with this hint.
Example
Inefficient chain (A → B → C):
<!-- A.html -->
<link rel="canonical" href="member-B.html">
<!-- B.html -->
<link rel="canonical" href="leader-C.html">
Correct approach (all point directly to C):
<!-- A.html -->
<link rel="canonical" href="leader-C.html">
<!-- B.html -->
<link rel="canonical" href="leader-C.html">
Importance
Canonical chains weaken the consolidation signal sent to search engines. When a search engine encounters a chain, it must follow it to identify the canonical leader. This reduces the effectiveness of the canonical tag and may result in unwanted rankings or even unconsolidated content. RFC 6596 considers chained canonicals to be improper, and they can lead to applications using their own heuristics.
The canonical tag is intended to consolidate all variations of a page into a single URL. This consolidation is most effective when all variants point directly to the authoritative version (canonical leader), rather than creating intermediate steps.
Operating Instruction
Review all canonical tags in the group and ensure that every member URL points directly to the final canonical leader. Remove any intermediate canonical hops. This will strengthen the consolidation signal and improve the effectiveness of your canonical implementation.
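The review described above can be sketched in Python. This is a minimal illustration, not the crawler's implementation: it assumes the canonical declarations have already been collected into a dictionary mapping each URL to its canonical target, and the helper names `resolve_leader` and `flatten` are hypothetical.

```python
def resolve_leader(canonicals, url):
    """Follow canonical links starting at `url`.

    Returns (leader, hops). The leader is None when the links loop.
    """
    seen = [url]
    current = url
    while True:
        # Assumption: a URL without a declared canonical is its own leader.
        target = canonicals.get(current, current)
        if target == current:
            return current, len(seen) - 1
        if target in seen:
            return None, len(seen) - 1
        seen.append(target)
        current = target


def flatten(canonicals):
    """Point every member directly at its final leader (loops stay untouched)."""
    flat = {}
    for url, target in canonicals.items():
        leader, _ = resolve_leader(canonicals, url)
        flat[url] = leader if leader is not None else target
    return flat


# The chain from the example: A -> B -> C
canonicals = {
    "A.html": "member-B.html",
    "member-B.html": "leader-C.html",
    "leader-C.html": "leader-C.html",
}
leader, hops = resolve_leader(canonicals, "A.html")
print(leader, hops)         # more than one hop means A sits in a chain
print(flatten(canonicals))  # every member now points straight at the leader
```

A member that needs more than one hop to reach its leader is part of a chain; the flattened mapping shows the targets each page should declare instead.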
Canonical loop
Description
This hint is set on canonical groups where canonical links form a circular reference or loop. In such cases, following the canonical chain creates an infinite loop where no definitive leader can be identified.
Example
Circular reference (A → B → C → A):
<!-- A.html -->
<link rel="canonical" href="B.html">
<!-- B.html -->
<link rel="canonical" href="C.html">
<!-- C.html -->
<link rel="canonical" href="A.html">
Importance
Circular canonical references are logical paradoxes that prevent search engines from identifying a clear canonical leader. When such loops exist, search engines must either:
- Break the loop arbitrarily and pick a leader
- Disregard all canonical signals in the group
- Treat all URLs in the loop as separate entities
None of these outcomes are desirable. Circular references indicate a fundamental problem in the canonical implementation strategy. RFC 6596 advises avoiding designating a target IRI that also specifies a canonical link relation to an IRI other than itself.
Operating Instruction
Audit all canonical tags in the affected group to identify the loop. Once identified, break the loop by changing all URLs to point to an appropriate leader. Ensure that the canonical structure forms a clear tree toward a single authoritative URL, with no circular dependencies.
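To locate the loop during such an audit, a small cycle search is usually enough. The sketch below assumes the same kind of URL-to-target dictionary as input; `find_loop` is a hypothetical helper name, not part of any tool.

```python
def find_loop(canonicals, start):
    """Return the URLs that form a canonical loop reachable from `start`.

    Returns an empty list when the links end at a leader instead.
    """
    path = []
    current = start
    # Stop at a self-referencing leader or at a URL with no known canonical.
    while current in canonicals and canonicals[current] != current:
        if current in path:
            return path[path.index(current):]
        path.append(current)
        current = canonicals[current]
    return []


# The circular reference from the example: A -> B -> C -> A
loop_group = {"A.html": "B.html", "B.html": "C.html", "C.html": "A.html"}
print(find_loop(loop_group, "A.html"))
```

Every URL returned by the search needs its canonical changed so that the group resolves to one leader.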
Conflicting canonical declarations
Description
This hint is set on canonical groups where at least one URL declares multiple different canonical targets. When a single URL specifies more than one canonical destination, the canonical signal is ambiguous. Search engines may ignore all canonical signals in the group or pick an arbitrary target.
Example
Multiple conflicting canonical declarations:
<!-- A.html -->
<link rel="canonical" href="member-B.html">
<link rel="canonical" href="member-C.html">
<!-- B.html (self reference) -->
<link rel="canonical" href="member-B.html">
<!-- C.html (self reference) -->
<link rel="canonical" href="member-C.html">
Importance
The canonical tag is designed to specify a single authoritative URL. When multiple canonical declarations exist on the same page, the intended signal becomes unclear. Search engines may either:
- Arbitrarily pick a leader
- Follow the first canonical declaration
- Ignore all canonical signals due to the ambiguity
Conflicting canonicals are a sign of deeper structural issues that must be resolved. RFC 6596 recommends specifying only one canonical link relation for a resource.
Operating Instruction
Identify all URLs in the group that declare multiple canonical targets. For each such URL, determine the single correct canonical destination. Remove all conflicting declarations and keep only one canonical tag per page.
External canonical leader
Description
This hint is set on canonical groups where the designated canonical leader is a URL on a different domain. While external canonicals are common in content syndication scenarios, they warrant special attention.
Example
Canonical pointing to an external domain:
<!-- Member page on example.com -->
<link rel="canonical" href="https://example.org/">
Importance
External canonicals are common in legitimate syndication scenarios where a partner site or mirror wants to ensure that the original publisher retains ranking power. In addition to these legitimate uses, an external canonical may surface when problems occur in multi‑domain or multi‑language setups – for example when configuration isn’t properly adjusted and multiple websites share the same code base. In such cases the wrong canonical target can be emitted for many pages at once, effectively stripping indexing from the intended host.
These situations present several risks:
- If the external canonical is removed or changed, the group structure collapses
- Misconfiguration across sites or language versions can propagate incorrect targets at scale
External canonicals should only be used when intentionally consolidating syndicated content to an external authoritative source.
Operating Instruction
Verify that external canonical declarations are intentional and part of a content syndication strategy. If this is unexpected, update the canonical target to point to an appropriate URL on your own domain. Document your canonical consolidation strategy to ensure all stakeholders understand which domain is authoritative.
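A quick way to list candidates for this verification is to compare the host of each page with the host of its canonical target. The sketch below is an assumption about how "different domain" could be tested: it compares full hostnames, so subdomains such as `www.example.com` and `shop.example.com` count as different. The function name `is_external` is hypothetical.

```python
from urllib.parse import urlsplit


def is_external(page_url, canonical_url):
    """True when the canonical target lives on a different hostname."""
    return urlsplit(page_url).hostname != urlsplit(canonical_url).hostname


print(is_external("https://example.com/article.html", "https://example.org/"))
print(is_external("https://example.com/article.html", "https://example.com/"))
```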
Canonical leader weaker than member
Description
This hint is set on canonical groups where the designated canonical leader does not receive the most internal linking power. At least one member URL of the group has a higher PageRank.
Importance
The canonical tag tells search engines which URL should be preferred in ranking, while the site's internal linking structure tells them which URLs matter within the site. When these two signals conflict, this usually indicates structural issues, and search engines may:
- Weight the internal linking signals more heavily
- Disregard or deprioritize the canonical tag
The canonical tag is most effective when the designated leader also receives the most internal linking power within the group.
Operating Instruction
Audit the internal linking structure of your canonical groups. Ensure that the canonical leader receives the most internal link power. If other members are receiving more link power, consider:
- Changing the internal link graph to link the canonical leader instead of other canonical group member pages
- Re-evaluating which URL should actually be the canonical leader based on the site's link structure
Align your canonical strategy with your site's actual prioritization as expressed through internal linking.
Canonical leader is noindex
Description
This hint is set on canonical groups where the canonical leader has a noindex robots directive set. When the leader carries a noindex directive, search engines cannot index it.
Example
Leader with noindex:
<!-- member-A.html -->
<link rel="canonical" href="leader-B.html">
<!-- leader-B.html -->
<meta name="robots" content="noindex">
<link rel="canonical" href="leader-B.html">
Importance
When this hint triggers, the canonical tag and noindex directive conflict:
- The canonical tag says: "Prefer this URL as the authoritative version of this content"
- The noindex directive says: "Do not index this URL"
Search engines might resolve this contradiction by disregarding the canonical signals entirely, effectively not consolidating duplicate content.
Operating Instruction
Identify which URL should actually be the canonical leader that search engines can index. Remove the noindex directive from the canonical leader or change the canonical target to a different URL that is indexable. Ensure that:
- The canonical leader does not have a noindex meta tag
- The canonical leader does not have a noindex HTTP header
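Both checks can be sketched with the Python standard library. The example assumes the leader's HTML and response headers are already available, that the header is stored under the exact key `X-Robots-Tag`, and that no user-agent prefix (such as `googlebot:`) is used in the header value. The names `RobotsMeta` and `is_noindex` are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMeta(HTMLParser):
    """Collect the directives of all <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "robots":
            content = a.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def is_noindex(html, headers=None):
    """True when a noindex directive appears in the meta tag or the header."""
    parser = RobotsMeta()
    parser.feed(html)
    header_value = (headers or {}).get("X-Robots-Tag", "")
    header_directives = {d.strip().lower() for d in header_value.split(",")}
    return "noindex" in (parser.directives | header_directives)


leader_html = '<meta name="robots" content="noindex"><link rel="canonical" href="leader-B.html">'
print(is_noindex(leader_html))
print(is_noindex("<title>Leader</title>", {"X-Robots-Tag": "noindex, nofollow"}))
```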
Canonical leader blocked by robots.txt
Description
This hint is set on canonical groups where the canonical leader is blocked from crawling by a robots.txt directive. When the canonical leader is disallowed in robots.txt, search engines cannot access or render the URL.
Example
Leader blocked by robots.txt:
User-agent: *
Disallow: /leader-B.html
<!-- Member pages point to blocked leader -->
<!-- Member-A.html -->
<link rel="canonical" href="leader-B.html">
Importance
For the canonical tag to function, search engines must be able to:
- Crawl the canonical leader URL
- Index the content of the canonical leader URL
When robots.txt disallows access to the canonical leader, these requirements cannot be met. Search engines might respond to inaccessible canonical targets by:
- Disregarding the canonical signal
- Treating member URLs as separate entities
- Potentially indexing multiple versions of the content
Blocking the leader in robots.txt defeats the purpose of using canonical tags and usually indicates structural issues.
Operating Instruction
Choose the resolution that best fits your situation; any of the following can restore a healthy canonical group:
- Change the canonical leader by editing the <link rel="canonical"> on member pages so they point to a different URL that is crawlable and indexable.
- Allow crawling of the canonical leader in robots.txt.
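Whether a leader is blocked can be tested offline with Python's `urllib.robotparser`. The sketch below feeds the robots.txt rules from the example into the parser. The wrapper name `is_blocked` and the `https://example.com` host are illustrative assumptions, and the standard-library parser may not interpret every rule exactly as a particular search engine does.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, url, user_agent="*"):
    """True when the given robots.txt content disallows crawling `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


robots_txt = "User-agent: *\nDisallow: /leader-B.html\n"
print(is_blocked(robots_txt, "https://example.com/leader-B.html"))
print(is_blocked(robots_txt, "https://example.com/member-A.html"))
```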
Canonical leader HTTP status not 200
Description
This hint is set on canonical groups where the canonical leader returns an HTTP status code other than 200 (OK). Common problematic status codes include:
- 4xx Client Errors: 404 Not Found, 410 Gone, 403 Forbidden
- 5xx Server Errors: 500 Internal Server Error, 503 Service Unavailable
Importance
The canonical tag consolidates content to a specific URL. That URL must be:
- Crawlable (not blocked by robots.txt)
- Accessible (HTTP 200 status)
- Indexable (not blocked by noindex robots directive)
If the canonical leader returns a non-200 HTTP status, search engines:
- May disregard the entire canonical group
- Might index member URLs as separate content instead of consolidating them
RFC 6596 advises against designating a canonical target that returns an error code, such as a 4xx response. When the canonical link relation is declared improperly, applications can use their own heuristics when processing the resource, such as ignoring the improper canonical designation.
Operating Instruction
Ensure all canonical leaders return an HTTP status 200. If a leader is genuinely gone or obsolete, update the canonical links to point to a new, accessible URL.
Canonical leader missing self reference
Description
This hint is set on canonical groups where the canonical leader lacks a self-referencing canonical tag.
Example
Leader without self-reference:
<!-- Leader.html - Missing the self-referencing canonical -->
<html>
<head>
<title>Leader</title>
<!-- no canonical tag at all -->
</head>
<body>
<h1>Content</h1>
</body>
</html>
Leader with self-reference:
<!-- Leader.html - With self-referencing canonical -->
<html>
<head>
<title>Leader</title>
<link rel="canonical" href="leader.html">
</head>
<body>
<h1>Content</h1>
</body>
</html>
Importance
While a canonical leader is technically the canonical version of itself, and specifying a self-referential canonical is optional per RFC 6596, providing a self-reference is best practice. A self-reference:
- Clarifies intent and makes the canonical structure explicit
- Helps to validate the canonical structure
- Prevents confusion about whether the leader was intentionally designated or incidentally became the target
- Mitigates effects of unintentional content duplication in the future
Operating Instruction
Review all canonical leader URLs and ensure each one includes an explicit self-referencing canonical tag.
Canonical URL redirects
Description
This hint is set on canonical groups where at least one member URL issues a server side or client side redirect instead of displaying content directly.
Examples
Server side redirect
A canonical-> B 30x redirect -> C
Client side redirect
A canonical-> B client side redirect -> C
Importance
Although the target of a canonical may be a redirect per RFC 6596 (temporary redirects are allowed, permanent ones are advised against), redirects within canonical groups should be avoided. Client-side redirects are particularly problematic, as the RFC provides no information on how to handle them. Canonical groups work best when the entire group resolves directly without additional redirects. When redirects exist within a group:
- Crawl efficiency decreases as more HTTP requests are needed to resolve the chain
- Reliability decreases as additional redirects introduce more potential failure points
While search engines attempt to follow through redirect chains, each additional hop weakens the signal and increases processing complexity.
Operating Instruction
Ideally, eliminate all redirects within canonical groups by changing the canonical links to directly point to the canonical leader.
Cross-language or region canonical mismatch
Description
This hint is set on canonical groups where URLs with different language attributes point canonically to each other.
Example
German page pointing to English leader:
<!-- de/page.html -->
<html lang="de">
<head>
<title>Inhalt (German)</title>
<link rel="canonical" href="https://example.com/en/page.html">
</head>
<body>
<h1>German Content</h1>
</body>
</html>
<!-- en/page.html -->
<html lang="en">
<head>
<title>Content (English)</title>
<link rel="canonical" href="https://example.com/en/page.html">
</head>
<body>
<h1>English Content</h1>
</body>
</html>
Importance
Canonical tags consolidate duplicate content within the same language only. They are not intended for indicating language variants or regional versions. When pages with different language or region attributes point canonically to each other, it prevents search engines from properly segmenting and ranking localized content.
A German page canonicalizing to an English page tells search engines to ignore the German version entirely. This blocks German-language users from finding the localized content, collapses language-specific rankings, and breaks the user experience.
Equally problematic is using canonical tags between regional variants of the same language. An en-US page should never canonicalize to en-GB or en-AU. Regional variants have different target audiences and may have region-specific content, pricing, or legal requirements, or may simply be hosted on a server closer to the user. Using canonical tags across these variants tells search engines to suppress one region's content in favor of another.
For all language and regional variants, whether different languages like de and en, or regional variants like en-US and en-GB, use hreflang attributes instead. hreflang properly signals that these are alternate versions for different audiences without suppressing any of them.
Operating Instruction
Audit your international site structure to ensure canonical tags stay within the same language and region, and verify that hreflang is properly configured to link all variants.
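For the audit, the lang attributes of a member and its canonical target can be compared subtag by subtag. The sketch below is deliberately simplistic: it treats the first subtag as the language and everything after it as the region, which ignores script subtags and other parts of full language tags. The function name `locale_mismatch` is hypothetical.

```python
def locale_mismatch(member_lang, leader_lang):
    """Return "language", "region", or None for two lang attribute values."""
    member = member_lang.lower().split("-")
    leader = leader_lang.lower().split("-")
    if member[0] != leader[0]:
        return "language"
    # Assumption: a missing region ("en") differs from an explicit one ("en-US").
    if member[1:] != leader[1:]:
        return "region"
    return None


print(locale_mismatch("de", "en"))
print(locale_mismatch("en-US", "en-GB"))
print(locale_mismatch("en", "EN"))
```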
Canonical found twice
Description
More than one canonical element was found, either as a <link> tag with rel="canonical" or as a corresponding Link header.
This report shows all URLs on your website with double canonical definitions that we were able to identify.
Examples
HTML Head:
<link rel="canonical" href="http://example.com/">
<link rel="canonical" href="http://example.com/">
HTTP Header:
Link: <http://example.com/>; rel="canonical"
Link: <http://example.com/>; rel="canonical"
HTML Head & HTTP Header:
<link rel="canonical" href="http://example.com/">
Link: <http://example.com/>; rel="canonical"
Importance
The canonical link relation should be unique for each resource. RFC 6596 provides the guideline that only one canonical link relation should be defined per resource to avoid confusion. Using more than one canonical link element can cause conflicting definitions or unexpected behavior when documents are available on more than one URL at a time.
Operating Instruction
We suggest that you identify all URLs that have more than one canonical link element defined. We also suggest looking for the reason behind the double definition, as this problem usually can be traced back to third-party code (plugins, extensions and add-ons of the CMS).
If canonical definitions are found twice in a document, this often occurs due to usage of multiple SEO plugins or an SEO plugin in combination with manual canonical definitions.
Canonical found twice and differs
Description
More than one canonical element has been found, either as a <link> tag with rel="canonical" or as a corresponding Link header. Additionally, they specify different targets.
This report allows you to identify all occurrences of double canonical definitions with conflicting target URLs.
Examples
HTML Head:
<link rel="canonical" href="http://example.com/">
<link rel="canonical" href="http://example.com/page1.html">
HTTP Header:
Link: <http://example.com/>; rel="canonical"
Link: <http://example.com/page1.html>; rel="canonical"
HTML Head & HTTP Header:
<link rel="canonical" href="http://example.com/">
Link: <http://example.com/page1.html>; rel="canonical"
Importance
Having more than one canonical link element with different target URLs in a document can cause search engines to ignore the canonical definitions. RFC 6596 provides the guideline that only one canonical link relation should be defined per resource to avoid confusion. This might lead to issues with duplicate content.
Operating Instruction
We suggest that you correct all conflicting canonical definitions by removing the unnecessary definition.
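Finding these conflicts can be sketched by collecting canonical targets from both the HTML and the Link headers of a response. The example below uses only the Python standard library. The regular expression handles only simple header values of the form shown in the examples, targets are compared as raw strings without URL normalization, and the names `CanonicalCollector`, `canonical_targets`, and `classify` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Simplified: matches <URL>; rel="canonical" and nothing more elaborate.
LINK_HEADER = re.compile(r'<([^>]*)>\s*;\s*rel="?canonical"?', re.IGNORECASE)


class CanonicalCollector(HTMLParser):
    """Collect the href of every <link rel="canonical"> in the markup."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and "canonical" in (a.get("rel") or "").lower().split():
            self.targets.append(a.get("href") or "")


def canonical_targets(html, link_headers=()):
    """Return all canonical targets declared in the HTML and Link headers."""
    collector = CanonicalCollector()
    collector.feed(html)
    targets = list(collector.targets)
    for header in link_headers:
        targets.extend(LINK_HEADER.findall(header))
    return targets


def classify(targets):
    """Map a list of targets to the matching hint, or "ok"."""
    if len(targets) < 2:
        return "ok"
    if len(set(targets)) > 1:
        return "found twice and differs"
    return "found twice"


html = '<link rel="canonical" href="http://example.com/">'
header = '<http://example.com/page1.html>; rel="canonical"'
print(classify(canonical_targets(html, [header])))
```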
Canonical found outside <head>
Description
This hint is set on canonical groups and canonical members where a canonical element was placed outside the <head> section and so search engines will ignore it.
This report helps you to identify all occurrences of canonical definitions that are invalid due to being placed outside the <head> tag on the crawled website.
Example
<html>
<head>
...
</head>
<body>
...
<link rel="canonical" href="http://example.com/">
...
</body>
</html>
Importance
Some search engines ignore improper canonical definitions. If canonical definitions are ignored, this can cause duplicate content issues and affect how the site is represented in search results. Per the HTML Living Standard, a link element with rel="canonical" is only allowed in the head element, not in the body element, because "canonical" is not a body-ok keyword. Placing it outside the head will cause it to be ignored by conforming parsers.
Operating Instruction
Keep your canonical definitions inside the HTML <head> tag, as the HTML Living Standard requires link elements with rel="canonical" to be placed in the head element only. Inspect the DOM in your browser to find errors based on this hint, as HTML parsers may automatically add missing <head> tags or close them prematurely.
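A first pass over raw markup can be sketched with Python's `html.parser`. Note the limitation: this tokenizer does not perform browser-style tree construction, so it only sees explicit <head> and </head> tags in the source and will not reproduce cases where a browser closes the head early. The class name `CanonicalPlacement` is hypothetical.

```python
from html.parser import HTMLParser


class CanonicalPlacement(HTMLParser):
    """Record canonical links that appear outside an explicit <head>...</head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.outside_head = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link":
            a = dict(attrs)
            is_canonical = "canonical" in (a.get("rel") or "").lower().split()
            if is_canonical and not self.in_head:
                self.outside_head.append(a.get("href"))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


page = (
    "<html><head><title>Page</title></head><body>"
    '<link rel="canonical" href="http://example.com/">'
    "</body></html>"
)
checker = CanonicalPlacement()
checker.feed(page)
print(checker.outside_head)
```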
Invalid attributes in canonical annotation
Description
The canonical link element contains attributes that change its meaning. Common problematic attributes include hreflang, lang, media, and type.
Examples
Canonical with hreflang (invalid):
<link rel="canonical" hreflang="de" href="http://example.com/de/">
Canonical with media query (invalid):
<link rel="canonical" media="screen and (max-width: 640px)" href="http://example.com/">
Canonical with type attribute (invalid):
<link rel="canonical" type="text/html" href="http://example.com/">
Correct canonical (no extra attributes):
<link rel="canonical" href="http://example.com/">
Importance
The canonical tag must be clean and contain only the rel and href attributes to function properly. Adding attributes like hreflang, lang, media, or type changes the semantic meaning of the tag per HTML specifications. While RFC 6596 does not specify attributes for the canonical relation, search engines may treat these as alternate link annotations rather than canonical directives, effectively disregarding the canonical instruction. This can lead to duplicate content issues and prevent proper consolidation of multiple URLs.
Operating Instruction
Review all canonical tags and remove any non-standard attributes. The canonical link should contain only rel="canonical" and an href attribute pointing to the target URL.
Superfluous HTML attributes
Description
The canonical link element contains non-standard HTML attributes that are not part of the canonical link specification. Common examples include custom data-* attributes, title, id, or other attributes that add metadata to the tag without serving a functional purpose.
Examples
Canonical with custom data attribute (not recommended):
<link rel="canonical" href="http://example.com/" data-id="12345">
Canonical with title attribute (not recommended):
<link rel="canonical" href="http://example.com/" title="Canonical Page">
Canonical with id attribute (not recommended):
<link rel="canonical" id="canonical-link" href="http://example.com/">
Correct minimal canonical:
<link rel="canonical" href="http://example.com/">
Importance
The canonical link specification defines a minimal set of attributes: rel and href. While RFC 6596 does not mandate attributes, additional attributes are unnecessary and can:
- Increase HTML file size unnecessarily
- Confuse automated tools and parsers that expect strict adherence to specifications
- Make the code harder to maintain
- Potentially cause issues with search engine rankings if not ignored
While modern browsers and search engines typically ignore superfluous attributes, adhering to best practices ensures compatibility and maintainability.
Operating Instruction
Review all canonical tags and remove any attributes that are not part of the canonical specification. Keep only the rel="canonical" and href attributes.
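The review can be sketched as an attribute check on each canonical link. The example below separates the attributes this page lists as meaning-changing (hreflang, lang, media, type) from any other extra attribute. The grouping into two sets and the names `CanonicalAttributes` and `check` are assumptions made for illustration.

```python
from html.parser import HTMLParser

MEANING_CHANGING = {"hreflang", "lang", "media", "type"}
EXPECTED = {"rel", "href"}


class CanonicalAttributes(HTMLParser):
    """Report unexpected attributes on every <link rel="canonical">."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag != "link" or "canonical" not in (a.get("rel") or "").lower().split():
            return
        names = set(a)
        self.findings.append({
            "invalid": sorted(names & MEANING_CHANGING),
            "superfluous": sorted(names - EXPECTED - MEANING_CHANGING),
        })


def check(html):
    """Return one finding per canonical link found in the markup."""
    parser = CanonicalAttributes()
    parser.feed(html)
    return parser.findings


print(check('<link rel="canonical" hreflang="de" href="http://example.com/de/">'))
print(check('<link rel="canonical" href="http://example.com/" data-id="12345">'))
```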
Canonical URL is not absolute
Description
If the canonical element specifies a URL relative to the document's URL, the document's URL is flagged with this hint.
This report shows all occurrences of canonical usage with URLs that are not absolute.
Examples
Absolute URL:
<link rel="canonical" href="http://example.com/folder/page.html">
Short URL:
<link rel="canonical" href="page.html">
Short URL - root folder relative:
<link rel="canonical" href="/folder/page.html">
Short URL - protocol relative:
<link rel="canonical" href="//example.com/folder/page.html">
Importance
Using shortened URLs for canonical links can lead to several kinds of duplicate content issues:
- duplicate content issues with different protocol versions
- duplicate content issues with different domains
- duplicate content issues with different folders
RFC 6596 explicitly allows relative references, but absolute references are recommended to avoid these issues.
Operating Instruction
We suggest that you use absolute URLs for canonical links.
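The following sketch classifies the four forms shown in the examples and illustrates why relative references are risky: the same href resolves to a different target depending on the URL under which the document was served. The function name `classify_href` and the document URLs are illustrative assumptions.

```python
from urllib.parse import urljoin, urlsplit


def classify_href(href):
    """Classify a canonical href by how much of the URL it spells out."""
    parts = urlsplit(href)
    if parts.scheme and parts.netloc:
        return "absolute"
    if parts.netloc:
        return "protocol-relative"
    if href.startswith("/"):
        return "root-relative"
    return "relative"


for href in (
    "http://example.com/folder/page.html",
    "page.html",
    "/folder/page.html",
    "//example.com/folder/page.html",
):
    print(href, "->", classify_href(href))

# The same relative href, served under two different document URLs:
print(urljoin("http://example.com/folder/index.html", "page.html"))
print(urljoin("https://www.example.com/other/index.html", "page.html"))
```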
Canonical contains malformed or empty href
Description
This hint identifies all occurrences of canonical elements that contain an empty or invalid target URL.
Examples
Empty canonical:
<link rel="canonical" href="">
Malformed canonical:
<link rel="canonical" href="htp://example.com/">
Importance
Malformed or empty hrefs in canonical links cause canonical definitions to be invalid and can cause issues with duplicate content when a document is available on more than one URL.
Operating Instruction
We suggest that you check for malformed or empty canonical hrefs on a regular basis.
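Such a regular check could be sketched as follows. What counts as "malformed" here is an assumption: an href that Python's `urlsplit` cannot parse, or one whose scheme is something other than http or https. Relative references pass this check. The function name `href_problem` is hypothetical.

```python
from urllib.parse import urlsplit


def href_problem(href):
    """Return "empty", "malformed", or None for a canonical href value."""
    if href is None or not href.strip():
        return "empty"
    try:
        parts = urlsplit(href.strip())
    except ValueError:
        # urlsplit raises ValueError for some broken hosts, e.g. bad IPv6 brackets.
        return "malformed"
    if parts.scheme and parts.scheme not in ("http", "https"):
        return "malformed"
    return None


print(href_problem(""))
print(href_problem("htp://example.com/"))
print(href_problem("http://example.com/"))
```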
Canonical URL contains fragment identifier
Description
The canonical link element specifies a URL that includes a fragment identifier (the part after the # symbol).
Examples
Canonical with fragment identifier (invalid):
<link rel="canonical" href="http://example.com/page.html#section-1">
Canonical with hash and parameters (invalid):
<link rel="canonical" href="http://example.com/?param=value#content">
Correct canonical without fragment:
<link rel="canonical" href="http://example.com/page.html">
Importance
Fragments serve a different purpose than canonical tags:
- Fragments tell the browser to jump to a specific section within a page after loading
- Canonicals tell search engines which URL is the authoritative version of content
When a canonical includes a fragment, search engines will strip it out but the presence of the fragment indicates a misunderstanding of how canonicals work. This can:
- Signal underlying configuration issues in your CMS or templating system
- Reduce the precision of your canonical implementation
- Indicate that the canonical is being generated incorrectly
Operating Instruction
Review all canonical tags and ensure they point to clean URLs without fragment identifiers. Remove any # and everything after it from canonical href attributes.
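Stripping the fragment is a one-liner with Python's `urllib.parse.urldefrag`, shown here on the two invalid examples. This is only a sketch of the cleanup step; the fix itself belongs in whatever template or CMS setting generates the canonical.

```python
from urllib.parse import urldefrag

for href in (
    "http://example.com/page.html#section-1",
    "http://example.com/?param=value#content",
):
    clean_url, fragment = urldefrag(href)
    # A non-empty fragment means the canonical should be regenerated without it.
    print(clean_url, repr(fragment))
```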
Canonical URL changed
Description
This hint identifies cases where the canonical URL is either changed or added dynamically to an HTML document during page rendering using JavaScript.
Examples
<script>
var can = document.createElement("link");
can.setAttribute("rel", "canonical");
can.setAttribute("href", "https://example.com/");
document.head.appendChild(can);
</script>
<script>
document.querySelector('link[rel="canonical"]').href = 'https://example.com/';
</script>
<script>
document.querySelector('link[rel="canonical"]')?.remove();
</script>
Importance
Canonical URLs are crucial for SEO as they help search engines determine the preferred version of a page, especially when duplicate content exists. However, search engines may not always execute JavaScript properly, and even if they do, the rendering process might be cut off before the script to modify the canonical link is executed. As a result, this can lead to incorrect canonicalization, which may affect your page's visibility in search rankings. If the canonical URL is changed dynamically via JavaScript, there is a risk that search engines might index the wrong page or fail to recognize the correct canonical version.
Operating Instruction
Ensure that the canonical is set within the HTML itself or via an HTTP header, not via JavaScript. If JavaScript is necessary, ensure the canonical tag is rendered early in the page load or use server-side rendering to guarantee that search engines receive the correct information.
Canonical target URL is a redirect
Description
The canonical link element specifies a target URL that responds with a redirect (HTTP 3xx status code) or contains a client side redirect instead of directly serving content. This means that after identifying the canonical target, search engines must follow an additional redirect to access the actual content.
Examples
Canonical pointing to a 301 redirect:
<!-- page-a.html -->
<link rel="canonical" href="http://example.com/old-page.html">
<!-- HTTP Response for old-page.html -->
HTTP/1.1 301 Moved Permanently
Location: http://example.com/new-page.html
Importance
When a canonical target is a redirect, several issues arise:
- Crawl efficiency decreases: Search engines must make multiple HTTP requests to resolve the full chain
- Reliability risk: Additional redirects introduce more potential failure points
- Signal clarity: The indirect path muddles the canonical signal
- Processing complexity: More hops mean more processing required to identify the true canonical target
RFC 6596 allows the canonical target to be a temporary redirect (e.g., 302, 303, 307) but advises against permanent redirects (e.g., 301). Additionally, the RFC does not provide any information about how to handle client-side redirects (such as JavaScript redirects or meta refresh). Canonical tags work best when they point directly to the final, actual content location without intermediate redirects.
Operating Instruction
Identify all canonical targets that issue redirects and update your canonical tags to point directly to the final destination.
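Finding the final destination can be sketched without any network access, assuming crawl results are available as a dictionary mapping each URL to its status code and Location header. Client-side redirects are not covered. The names `final_destination` and `REDIRECT_STATUSES` and the hop limit are illustrative assumptions.

```python
from urllib.parse import urljoin

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def final_destination(responses, url, max_hops=10):
    """Follow recorded HTTP redirects; return (final_url, hops)."""
    hops = 0
    while hops < max_hops:
        status, location = responses.get(url, (None, None))
        if status not in REDIRECT_STATUSES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
        hops += 1
    return url, hops


# The 301 from the example above
responses = {
    "http://example.com/old-page.html": (301, "http://example.com/new-page.html"),
    "http://example.com/new-page.html": (200, None),
}
print(final_destination(responses, "http://example.com/old-page.html"))
```

Any canonical target with a hop count above zero should be replaced by the final URL in the canonical tag.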
Cross-language canonicalization
Description
A page with content in one language includes a canonical tag that points to a page in a different language.
Examples
French page canonicalizing to English page (incorrect):
<!-- /fr/article.html -->
<html lang="fr">
<head>
<title>Article en Français</title>
<link rel="canonical" href="https://example.com/en/article.html">
</head>
<body>
<h1>Contenu en Français</h1>
</body>
</html>
<!-- /en/article.html -->
<html lang="en">
<head>
<title>Article in English</title>
<link rel="canonical" href="https://example.com/en/article.html">
</head>
<body>
<h1>English Content</h1>
</body>
</html>
Importance
Canonical tags consolidate content for the same language and audience. Cross-language canonicalization causes serious problems:
- Suppresses localized content: The localized page will not rank in its target language
- Breaks user experience: Users searching in French will not find the French version
- Wastes SEO potential: Effort spent creating localized content is negated
- Ignores regional differences: Content may differ between regions and languages
Language and regional variants are not duplicates—they are distinct versions for different audiences. They should never be consolidated with canonical tags.
Operating Instruction
Ensure that canonical tags only link within the same language and region.
Non-HTML resource to HTML canonical
Description
A non-HTML resource, such as a PDF document, uses an HTTP Link header to specify a canonical tag that points to an HTML equivalent of the content.
Examples
PDF with Link header canonicalizing to HTML:
HTTP/1.1 200 OK
Content-Type: application/pdf
Link: <https://example.com/article.html>; rel="canonical"
PDF file structure:
https://example.com/article.pdf
↓ (canonical)
https://example.com/article.html
Importance
When different formats of the same content exist (PDF, DOCX, HTML, etc.), using canonical tags to consolidate them to the canonical HTML version helps search engines understand which resource should be the primary version for ranking. This indicates proper handling of resource consolidation and is considered a best practice.
Operating Instruction
This hint indicates proper canonical implementation and is purely informational.
HTML page to non-HTML resource canonical
Description
An HTML page includes a canonical tag that points to a non-HTML resource, such as a PDF document.
Examples
HTML page canonicalizing to PDF (not recommended):
<!-- article.html -->
<html>
<head>
<title>Article</title>
<link rel="canonical" href="https://example.com/article.pdf">
</head>
<body>
<h1>Article Content</h1>
<p>Read the full article in PDF format.</p>
</body>
</html>
Importance
Canonicalizing from HTML to a non-HTML resource causes several problems:
- Deprioritizes HTML content: Search engines may crawl and index the less accessible PDF instead of the HTML page
- Harms user experience: Search results may link directly to PDFs instead of HTML pages
HTML should always be the canonical version when both HTML and non-HTML formats exist.
Operating Instruction
Always canonicalize from non-HTML resources to HTML, never the reverse.
URL discovered only via canonical
Description
A URL was discovered and identified as part of a canonical group exclusively through canonical tag annotations, but no internal <a href> links point to it from other pages on the site.
Importance
Pages discovered only through canonical tags may indicate:
- Orphaned content: The page was created or moved but not integrated into site navigation
- Temporary or unfinished pages: The page may not be production-ready
- Incorrect canonical setup: The page may have been accidentally excluded from the site structure
- Crawling issues: Search engines might not properly crawl or index truly isolated pages
- Maintenance challenges: Future updates to the page may be overlooked
Canonical tags are meant to enhance an existing site structure, not replace it. Important pages should be discoverable through normal navigation and internal linking by users and bots.
Operating Instruction
Add internal links to isolated pages if they should be part of your public site structure. Otherwise, remove the pages completely, along with all canonical links pointing to them.
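The condition behind this hint can be expressed as a set difference: URLs that appear as canonical targets but never as link targets from other pages. The Python sketch below assumes a hypothetical `pages` structure that maps each crawled URL to its outgoing links and its canonical URL; it illustrates the idea and is not the crawler's own logic.

```python
def only_via_canonical(pages):
    """Return URLs referenced by canonicals but never linked from another page.

    pages maps each URL to {"links": [hrefs], "canonical": url or None}
    (a hypothetical crawl result format).
    """
    linked = set()
    canonical_targets = set()
    for url, data in pages.items():
        # Self-links do not count as links "from other pages"
        linked.update(href for href in data["links"] if href != url)
        target = data.get("canonical")
        if target and target != url:
            canonical_targets.add(target)
    return sorted(canonical_targets - linked)


pages = {
    "https://example.com/a": {
        "links": ["https://example.com/b"],
        "canonical": "https://example.com/c",
    },
    "https://example.com/b": {
        "links": [],
        "canonical": "https://example.com/b",
    },
}
print(only_via_canonical(pages))
```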
Canonical target is noindex
Description
A page includes a canonical element that points to another URL which has a noindex robots directive set. When the canonical target cannot be indexed, the canonical tag becomes ineffective.
Example
Page with canonical pointing to noindex target:
<!-- article-variant.html -->
<html>
<head>
<title>Article</title>
<link rel="canonical" href="https://example.com/article-main.html">
</head>
<body>
<h1>Article</h1>
</body>
</html>
<!-- article-main.html -->
<html>
<head>
<meta name="robots" content="noindex">
<link rel="canonical" href="https://example.com/article-main.html">
</head>
<body>
<h1>Article</h1>
</body>
</html>
Importance
This configuration presents a logical contradiction:
- The canonical tag says: "Index this URL as the authoritative version"
- The noindex directive says: "Do not index this URL"
This typically indicates a configuration error or the result of multiple conflicting systems generating directives. When search engines encounter this conflict, they may:
- Index neither page, if the noindex is followed strictly
- Disregard the canonical signal and treat both URLs as separate
- Index the variant page instead of the canonical target
Operating Instruction
Resolve the conflict by choosing which signal should take precedence. Either remove the canonical or ensure that the canonical target is:
- Indexable (no noindex directive)
- Crawlable (not blocked by robots.txt)
- Returning HTTP 200 status
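The indexability check on a canonical target can be sketched with Python's standard `html.parser` module. The class and function names below are hypothetical, and the sketch only inspects the robots meta element in the HTML; a noindex sent through an `X-Robots-Tag` HTTP header would need a separate check.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collects the noindex directive and the first canonical href."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            directives = [d.strip().lower() for d in content.split(",")]
            if "noindex" in directives:
                self.noindex = True
        elif tag == "link" and self.canonical is None:
            if "canonical" in (attr.get("rel") or "").lower().split():
                self.canonical = attr.get("href")


def read_head_signals(html):
    """Return (noindex, canonical_href) for an HTML document."""
    parser = HeadSignals()
    parser.feed(html)
    return parser.noindex, parser.canonical


target_html = """<html><head>
<meta name="robots" content="noindex">
<link rel="canonical" href="https://example.com/article-main.html">
</head><body><h1>Article</h1></body></html>"""

noindex, canonical = read_head_signals(target_html)
if noindex:
    print("Canonical target is noindex:", canonical)
```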
Canonical target blocked by robots.txt
Description
A page includes a canonical tag that points to a URL which is disallowed in the site's robots.txt file. When the canonical target is blocked by robots.txt, search engines cannot crawl or render it.
Example
Canonical pointing to robots.txt blocked URL:
# robots.txt
User-agent: *
Disallow: /canonical-target.html
<!-- article-variant.html -->
<html>
<head>
<title>Article</title>
<link rel="canonical" href="https://example.com/canonical-target.html">
</head>
<body>
<h1>Article</h1>
</body>
</html>
Importance
When a canonical target is blocked by robots.txt, several problems result:
- Search engines cannot access the canonical target to understand the content
- The canonical signal becomes impossible to follow
- Search engines may treat all pages in the canonical group as separate entities
- Duplicate content consolidation fails
- Any canonical group member page may be indexed instead of the intended canonical target
This typically indicates a misconfiguration where robots.txt rules and canonical strategies are not synchronized.
Operating Instruction
Resolve this conflict by choosing one of two approaches:
- If the target should be canonical and indexed, allow it in robots.txt.
- If the target must remain blocked in robots.txt, point the canonical to a different, crawlable URL.
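Whether a canonical target is blocked can be checked with Python's standard `urllib.robotparser` module. This is a minimal sketch: the function name `canonical_target_allowed` is hypothetical, and the standard-library parser may interpret some rules (for example wildcards) differently from individual search engines.

```python
from urllib.robotparser import RobotFileParser


def canonical_target_allowed(robots_txt, target_url, user_agent="*"):
    """Return True if robots_txt permits user_agent to fetch target_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, target_url)


robots_txt = """# robots.txt
User-agent: *
Disallow: /canonical-target.html
"""

target = "https://example.com/canonical-target.html"
if not canonical_target_allowed(robots_txt, target):
    print("Canonical target is blocked by robots.txt:", target)
```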
Canonical target returns non-200 status
Description
A page includes a canonical tag that points to a URL which returns an HTTP status code other than 200 (OK). Common problematic status codes include client errors (4xx) and server errors (5xx).
Examples
Canonical pointing to 404 Not Found:
<!-- page-variant.html -->
<html>
<head>
<title>Page</title>
<link rel="canonical" href="https://example.com/missing-page.html">
</head>
</html>
<!-- HTTP Response for missing-page.html -->
HTTP/1.1 404 Not Found
Content-Type: text/html
Importance
RFC 6596 considers designating a canonical target that returns an error code, such as a 4xx response, to be an improper canonical declaration.
When a canonical target returns a non-200 status, search engines cannot properly index or cache the canonical version, making consolidation impossible.
When the canonical target is inaccessible, search engines may:
- Disregard the canonical signal
- Index variant pages instead
- Fail to consolidate content
This typically indicates a broken link, a removed page, or a server misconfiguration.
Operating Instruction
Ensure all canonical targets are valid, accessible pages that return HTTP 200 OK or remove the canonical links pointing to them.
Canonical target is an external URL
Description
A page on your site includes a canonical tag that points to a URL on a different domain (external site). This tells search engines that the content on your site is a duplicate or variant of content on another domain.
Examples
Internal page pointing to external canonical:
<!-- Article on example.com -->
<html>
<head>
<title>Article Title</title>
<link rel="canonical" href="https://example.org/article">
</head>
<body>
<h1>Article Title</h1>
<p>Article content...</p>
</body>
</html>
Importance
External canonicals are legitimate in specific scenarios (content syndication, partnerships), but they must be intentional and well-coordinated to avoid accidentally consolidating your own content to unintended sources.
External canonicals have significant implications:
- The external URL receives all ranking credit
- Search results link to the external domain instead of your content
- Configuration errors can accidentally consolidate your site to an external domain
Operating Instruction
Use external canonicals only as part of an intentional syndication or partnership arrangement.
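An audit script can flag external canonicals by comparing hostnames. The Python sketch below rests on one assumption that should be checked against your own setup: any difference in hostname counts as external, so a subdomain such as `www.example.com` would be treated as external to `example.com`. The function name `is_external_canonical` is hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def is_external_canonical(page_url, canonical_href):
    """Return True if the canonical resolves to a different hostname."""
    # Resolve relative hrefs against the page URL before comparing
    target = urljoin(page_url, canonical_href)
    return urlsplit(page_url).hostname != urlsplit(target).hostname


print(is_external_canonical("https://example.com/article",
                            "https://example.org/article"))
```

Reviewing the flagged URLs against the list of known syndication partners then separates intentional external canonicals from configuration errors.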
Group partially unknown
Description
A canonical group is not fully known when some members could not be downloaded and parsed, or when they return an explicit error status (4xx/5xx), meaning their canonical metadata cannot be processed.
Examples
For a canonical group to be complete and verifiable, all members must be downloaded and their canonical metadata fully processed. This requirement is not met when any of the following applies to a member:
- The URL is external
- The URL is blocked through robots.txt
- The URL returns an explicit error status (4xx/5xx)
- The URL cannot be downloaded due to a connection error (e.g. a timeout, DNS failure, or server refused connection)
- The URL has not been crawled, due to crawl limits
Importance
Without complete data from all group members, full canonical validation cannot be performed.
Operating Instruction
Ensure all members of a canonical group are accessible to the crawler and return valid responses with processable canonical metadata. This can be achieved by extending the crawl scope to include all group member URLs or by increasing crawl limits to allow more pages to be processed.
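The conditions listed under Examples can be sketched as a small classifier. The record fields used here (`external`, `blocked_by_robots`, `connection_error`, `status`) are hypothetical and only illustrate the decision order; they do not reflect the crawler's data model.

```python
from typing import Optional


def unknown_reason(member: dict) -> Optional[str]:
    """Return why a group member is unknown, or None if it was processed."""
    if member.get("external"):
        return "external URL"
    if member.get("blocked_by_robots"):
        return "blocked through robots.txt"
    if member.get("connection_error"):
        return "connection error"
    status = member.get("status")
    if status is None:
        # No response recorded at all, e.g. because crawl limits were hit
        return "not crawled"
    if status >= 400:
        return f"error status {status}"
    return None


def group_is_partially_unknown(members) -> bool:
    """A group is partially unknown if any member has an unknown reason."""
    return any(unknown_reason(m) is not None for m in members)


group = [{"status": 200}, {"status": 404}, {"external": True}]
print([unknown_reason(m) for m in group])
```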