Audisto <head> Error Checker

How to detect technical issues in HTML head

This group of hints highlights issues related to the <head> area of a website. Keeping control over the head area on your website is key for your SEO effort.

Errors in the <head> area of your website can cause

  • issues with indexing
  • issues with duplicate content
  • issues with relevancy
  • issues with appearance in search results

Example: Audisto <head> Error Check with the <head> hint reports for the current crawl


Here is the list of all hints related to the HTML <head> area that can be identified with the help of the Audisto Crawler.

Hints

<base> contains malformed or empty href

Description

A <base> tag was found, but its href attribute contains an invalid URL, or a URL that is neither HTTP nor HTTPS. The crawler falls back to the document's URL as base.

Examples

A base with an invalid protocol

<base href="htp://example.com">

A base with a white space in the domain name

<base href="http:// example.com">
Importance

The base tag defines the URL base for all relative links in the document. Using a malformed URL as base href can cause issues with crawling and accessing of relative links.

Using the base tag adds more complexity when parsing relative links. Poorly programmed crawlers might not understand the base tag at all and therefore show unexpected behaviour.
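
The fallback described for this hint can be sketched in Python. This is a minimal illustration under stated assumptions, not the Audisto Crawler's implementation: the helper name `effective_base` and the whitespace check are made up for the example.

```python
from urllib.parse import urljoin, urlsplit


def effective_base(document_url: str, base_href: str) -> str:
    """Return the URL to use as base for resolving relative links.

    Hypothetical helper: falls back to the document URL when the
    <base> href is empty, contains whitespace, or is not HTTP(S).
    """
    href = base_href.strip()
    if not href or any(ch.isspace() for ch in href):
        return document_url
    candidate = urljoin(document_url, href)
    parts = urlsplit(candidate)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return document_url
    return candidate


doc = "http://example.com/page.html"
# Both malformed examples fall back to the document URL.
print(effective_base(doc, "htp://example.com"))
print(effective_base(doc, "http:// example.com"))
# A valid base changes how relative links resolve.
print(urljoin(effective_base(doc, "http://example.com/directory/"), "a.html"))
```

The relative link `a.html` is only an illustrative value.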

Operating Instruction

We suggest not to use the HTML base tag at all. Remove it if possible.

Note: If there are changes regarding the base tag, all relative links in the document need to be checked and probably corrected.

<base> found

Description

If a base is set in the HTML, the URL is flagged with this hint. If a base is set, all relative links are relative to the base.

Example
<base href="http://example.com/directory/">
Importance

The base tag defines the URL base for all relative links in the document. Mistakes in usage of the base tag will lead to issues with user experience and crawling when relative links are used.

In addition the usage of the base tag often results in problems for poor HTML parsers and poorly programmed robots.

Operating Instruction

We suggest not to use the base tag at all. Remove base tags if possible.

We also suggest using absolute URLs.

<base> found more than once and differs

Description

More than one <base> directive was found, and the href attribute values differ.

Examples
<base href="http://example.com/">
<base href="http://example.com/folder/">

Note: The following base directives resolve to the same URL and would therefore not trigger this hint:

Base directives on http://example.com/page.html

<base href="http://example.com/">
<base href="/">
Importance

The base tag defines the URL base for all relative links in the document. Having more than one base tag is invalid. This may cause issues with relative links, which might affect search engines and user experience on the website.
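
Whether several base directives differ depends on the URLs they resolve to, not on their literal href values. A minimal Python sketch of that comparison, assuming a hypothetical helper `base_hrefs_differ` and plain string comparison of the resolved URLs:

```python
from urllib.parse import urljoin


def base_hrefs_differ(document_url: str, hrefs: list[str]) -> bool:
    """True if the <base> hrefs resolve to more than one distinct URL."""
    resolved = {urljoin(document_url, href) for href in hrefs}
    return len(resolved) > 1


page = "http://example.com/page.html"
print(base_hrefs_differ(page, ["http://example.com/", "http://example.com/folder/"]))
print(base_hrefs_differ(page, ["http://example.com/", "/"]))
```

The first call reports a difference, while the second does not, because "/" resolves to http://example.com/ on that page.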

Operating Instruction

We suggest not to use the HTML base tag at all. Remove it if possible. If the base tag is removed, all relative links in the document need to be checked and probably corrected.

<base> href contains a path only

Description

The <base> tag's href attribute contains a path, not an absolute URL. While this is technically allowed, it is not supported by Internet Explorer as of version 8.

Examples

Relative path by mistake

<base href="example.com/">

Relative path on purpose

<base href="/folder/">
Importance

The base tag defines the URL base for all relative links in the document. Mistakes in usage of the base tag might lead to issues with crawling when using relative links in the document. They will also result in issues with Internet Explorer as of version 8.

Operating Instruction

We suggest not to use the HTML base tag at all. Remove it if possible.

Note: If there are changes regarding the base tag, all relative links in the document need to be checked and probably corrected.

<base> is same as URL

Description

A <base> tag was found, but points to the same URL, rendering itself useless.

Example

HTML base on http://example.com/page.html

<base href="http://example.com/page.html">
Importance

The base tag defines the URL base for relative links in a document. There is no point in using the current URL as base href as it is the same as if the base tag isn't used at all, rendering it useless.

Operating Instruction

We suggest not to use the HTML base tag at all. Remove it if possible.

Note: If the base tag is removed, all relative links in the document need to be checked and probably corrected.

<base> occurs more than once

Description

More than one <base> tag was found. The Audisto Crawler uses the first valid annotation found for link resolving. Use this report to find all URLs on the crawled site that contain more than one <base> tag.

Example
<head>
...
<base href="http://example.com/">
<base href="http://example.com/">
...
</head>
Importance

The base tag defines the URL base for all relative links in the document. Having more than one base tag is invalid. This may cause issues with relative links, which might affect search engines and user experience on the website.

Operating Instruction

We suggest not to use the HTML base tag at all. Remove it if possible. If there are changes regarding the base tag, all relative links in the document need to be checked and probably corrected.

<base> points to other URL

Description

A <base> tag was found, and points to another URL.

Example

Base on http://example.com/page.html

<base href="http://example.com/page2.html">
Importance

The base tag defines the URL base for relative links in a document. Using a base tag together with fragment-only links makes those links point to the specific anchor on the URL of the base tag.

If this is unintended it leads to issues with user experience and crawling.
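
The effect on fragment-only links can be reproduced with Python's standard URL resolution. The fragment `#section` is a made-up value for illustration:

```python
from urllib.parse import urljoin

document_url = "http://example.com/page.html"
base_url = "http://example.com/page2.html"  # taken from <base href="...">

# Without a <base>, a fragment-only link stays on the current document.
print(urljoin(document_url, "#section"))

# With the <base> above, the same link targets the other URL.
print(urljoin(base_url, "#section"))
```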

Operating Instruction

We suggest not to use the HTML base tag at all. Remove it if possible.

Note: If the base tag is removed, all relative links in the document need to be checked and probably corrected.

<link rel="canonical"> URL is not absolute

Description

If the canonical element specifies a URL relative to the document's URL, the document's URL is flagged with this hint.

This report shows all occurrences of canonical usage with URLs that are not absolute.

Examples

Absolute URL

<link rel="canonical" href="http://example.com/folder/page.html">

Short URL

<link rel="canonical" href="page.html">

Short URL - root folder relative

<link rel="canonical" href="/folder/page.html">

Short URL - protocol relative

<link rel="canonical" href="//example.com/folder/page.html">
Importance

Using shortened URLs for canonical links can lead to several kinds of duplicate content issues:

  • duplicate content issues with different protocol versions
  • duplicate content issues with different domains
  • duplicate content issues with different folders
Operating Instruction

We suggest using absolute URLs for canonical links.
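
The four kinds of canonical href shown in the examples can be told apart with a few checks. This is a minimal sketch; the function name and the returned labels are assumptions for illustration, not terms used by the crawler:

```python
from urllib.parse import urlsplit


def classify_canonical_href(href: str) -> str:
    """Classify a canonical href by how much of the URL it spells out."""
    parts = urlsplit(href)
    if parts.scheme and parts.netloc:
        return "absolute"
    if parts.netloc:
        return "protocol-relative"
    if parts.path.startswith("/"):
        return "root-relative"
    return "relative"


for href in (
    "http://example.com/folder/page.html",
    "page.html",
    "/folder/page.html",
    "//example.com/folder/page.html",
):
    print(href, "->", classify_canonical_href(href))
```

Only the first category avoids depending on the protocol, domain or folder of the URL the document happens to be served from.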

<link rel="canonical"> contains malformed or empty href

Description

This hint identifies all occurrences of canonical elements that contain an empty or invalid target URL.

Examples

Empty canonical

<link rel="canonical" href="">

Malformed canonical

<link rel="canonical" href="htp://example.com/">
Importance

A malformed or empty href in a canonical link makes the canonical definition invalid and can cause issues with duplicate content when a document is available on more than one URL.

Operating Instruction

We suggest checking for malformed or empty canonical hrefs on a regular basis.

<link rel="canonical"> found

Description

A canonical element has been found, either as a <link> tag with rel="canonical" or as a corresponding Link header.

Examples

Canonical in HTML <head>

<link rel="canonical" href="http://example.com">

Canonical in HTTP header

HTTP/1.1 200 OK
Content-Type: application/pdf
Link: <http://example.com/page.html>; rel="canonical"
Content-Length: 4223
...
Importance

Canonical links are a valuable tool for webmasters to define the preferred version of a document if it is available on more than one URL at the same time.
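
To check the HTTP variant by hand, the canonical target can be pulled out of a Link header value. The following Python sketch is deliberately simplified and does not cover the full Link header grammar (for example, commas inside quoted parameter values); the function name is hypothetical:

```python
import re
from typing import Optional


def canonical_from_link_header(value: str) -> Optional[str]:
    """Return the first rel="canonical" target in a Link header value."""
    # Each link looks like: <target>; param1=...; param2=...
    for match in re.finditer(r"<([^>]*)>([^<]*)", value):
        target, params = match.group(1), match.group(2)
        rel = re.search(r'\brel\s*=\s*"?([^";,]*)"?', params, re.IGNORECASE)
        if rel and "canonical" in rel.group(1).lower().split():
            return target
    return None


print(canonical_from_link_header('<http://example.com/page.html>; rel="canonical"'))
```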

Operating Instruction

Use the "canonical found" hint report to identify all URLs that contain a canonical link definition and find out how many URLs on the crawled site have canonical link definitions.

<link rel="canonical"> found outside <head>

Description

A canonical element was placed outside of the <head> section. Search engines will ignore it.

This report helps you to identify all occurrences of canonical definitions that are invalid because they are placed outside the <head> tag on the crawled website.

Example
<html>
  <head>
    ...
  </head>
  <body>
    ...
    <link rel="canonical" href="http://example.com/">
    ...
  </body>
</html>
Importance

Some search engines ignore improper canonical designation. If canonical definitions get ignored by search engines, this might cause issues with duplicate content and representation of the site in search results.

Operating Instruction

Keep your canonical definitions inside the HTML <head> tag, so they don't get ignored by search engines.
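
A rough way to spot misplaced canonical definitions in your own markup is to track whether the parser is currently inside <head>. This Python sketch only follows explicit <head> and </head> tags and does not emulate how browsers repair broken markup; the class name is hypothetical:

```python
from html.parser import HTMLParser


class CanonicalPlacement(HTMLParser):
    """Collect canonical hrefs found inside and outside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.inside = []
        self.outside = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link":
            attr = dict(attrs)
            rels = (attr.get("rel") or "").lower().split()
            if "canonical" in rels:
                bucket = self.inside if self.in_head else self.outside
                bucket.append(attr.get("href"))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


checker = CanonicalPlacement()
checker.feed(
    "<html><head></head><body>"
    '<link rel="canonical" href="http://example.com/">'
    "</body></html>"
)
print(checker.outside)  # canonical targets that were found outside <head>
```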

<link rel="canonical"> found twice

Description

More than one canonical element was found, either as a <link> tag with rel="canonical" or as a corresponding Link header.

This report shows all URLs on your website with double canonical definitions that we were able to identify.

Examples

HTML Head

<link rel="canonical" href="http://example.com/">
<link rel="canonical" href="http://example.com/">

HTTP Header

Link: <http://example.com/>; rel="canonical"
Link: <http://example.com/>; rel="canonical"

HTML Head & HTTP Header

<link rel="canonical" href="http://example.com/">

Link: <http://example.com/>; rel="canonical"
Importance

Using more than one canonical link element can cause conflicting definitions or unexpected behaviour when documents are available on more than one URL at a time.

Operating Instruction

We suggest identifying all URLs that have more than one canonical link element defined. We also suggest looking for the reason behind the double definition, as this problem usually can be traced back to third party code (plugins, extensions and add-ons of the CMS).

Double canonical definitions on a document often result from using multiple SEO plugins, or an SEO plugin in combination with manual canonical definitions.

<link rel="canonical"> found twice and differs

Description

More than one canonical element has been found, either as a <link> tag with rel="canonical" or as a corresponding Link header. Additionally, they specify different targets.

This report allows you to identify all occurrences of double canonical definitions with conflicting target URLs.

Examples

HTML Head

<link rel="canonical" href="http://example.com/">
<link rel="canonical" href="http://example.com/page1.html">

HTTP Header

Link: <http://example.com/>; rel="canonical"
Link: <http://example.com/page1.html>; rel="canonical"

HTML Head & HTTP Header

<link rel="canonical" href="http://example.com/">

Link: <http://example.com/page1.html>; rel="canonical"
Importance

Having more than one canonical link element with different target URLs in a document can cause search engines to ignore the canonical definitions. This might lead to issues with duplicate content.

Operating Instruction

We suggest correcting all conflicting canonical definitions by removing the unnecessary definition.

<link rel="canonical"> not found

Description

No canonical element has been found, neither as a <link> tag with rel="canonical" nor as a corresponding Link header. This may be intended.

This report identifies all URLs on the crawled website that do not have a canonical link element.

Importance

Missing canonical definitions can lead to issues with duplicate content if the document is a duplicate of another document on the site. This can also happen if 3rd party sites copy your content or you syndicate your content.

Operating Instruction

We suggest using canonical definitions in any case. Canonical links should be self referencing by default.

<link rel="canonical"> points to other URL

Description

If the canonical element is found and points to a different URL, the URL is flagged with this hint.

Use this report to identify all instances of canonical elements pointing to other URLs.

Examples

The canonical link element points to the SSL version of the document:

Canonical link element for http://example.com/page.html

<link rel="canonical" href="https://example.com/page.html">

The canonical link element points to a URL without GET-parameter:

Canonical link element for http://example.com/page.html?a=1

<link rel="canonical" href="http://example.com/page.html">
Importance

The canonical link URL specifies a preferred version of a document that is available on more than one URL at a time.

By using a canonical link element pointing to another URL, you are telling search engines to prefer the target URL in search results.

If URLs that are not supposed to be shown in search results are part of the internal link graph, this can lead to waste of crawl budget.

Operating Instruction

You might consider changing internal links to point directly to the preferred version of the document to save crawl budget.

You might also want to evaluate if multiple URLs for one document are necessary at all.

<link rel="canonical"> points to same URL

Description

If the canonical element is found and points to the same URL, the URL is flagged with this hint.

This report shows all occurrences of self referencing canonical link elements on the crawled website.

Example

Canonical link element for http://example.com/page.html

<link rel="canonical" href="http://example.com/page.html">
Importance

Self referencing canonical is the suggested way to use canonical definitions. It prevents duplicate content issues if 3rd parties copy or syndicate your content.

Operating Instruction

You might want to remove the canonical element from the HTML to decrease file size. However this might lead to problems when 3rd parties copy or syndicate your content.

<link> found outside <head>

Description

A <link> tag was placed outside of the <head> section, where it may have no effect. Use this report to discover all HTML documents on the crawled website that contain link tags outside the HTML head area.

Examples

What we discover

<html>
<head>
...
</head>
<body>
...
<link rel="stylesheet" type="text/css" href="style.css">
...
</body>
</html>

How it should be

<html>
<head>
...
<link rel="stylesheet" type="text/css" href="style.css">
...
</head>
<body>
...
</body>
</html>
Importance

Placing the link tag outside the HTML <head> is not valid. This may lead to unexpected behaviour or appearance of the website with some clients. Even though modern browsers are using a range of methods to autocorrect this type of common issue, it is not suggested to rely on the browser's ability to guess what the webmaster intended to achieve.

If the link tag references an external style sheet, browsers use the linked CSS file to render the site. If style sheets have to be processed in the middle or at the end of the document, the browser has to re-render the entire document based on the changes. This can lead to a drop in performance and user experience.

If a canonical link element (<link rel="canonical" href="http://www.example.org" />) is used outside of the HTML <head>, it will be ignored by search engines. This might lead to issues with duplicate content, e.g. unexpected behaviour of the website in search results and ranking problems.

Operating Instruction

We suggest moving misplaced link tags to the <head>.

<meta description> missing or empty

Description

If the meta-description is missing or empty, the URL is flagged with this hint. Use this report to identify all URLs that are missing a proper meta description.

Example
<html>
<head>
...
<meta name="description" content="">
...
Importance

The meta description is usually the first choice for the description text in search result snippets. If the meta description is missing, you give up control over the appearance of your documents in the search results. Search engines will then use parts of the document's content as a description, which might lead to an unexpected appearance of a site's snippets in search results.

Operating Instruction

We suggest using proper meta descriptions for all documents that are supposed to be indexed by search engines.

<meta description> occurs more than once

Description

If a meta description tag is found more than once in the HTML, the URL is flagged with this hint.

Example
<html>
<head>
...
<meta name="description" content="First meta description">
<meta name="description" content="Second meta description">
...
Importance

Having more than one meta description can lead to an unexpected appearance of the URL in search results. This may result in lower user engagement and therefore a drop in user signals for your site. This may eventually hurt the rankings of the website.

Operating Instruction

There should be only one meta-description for a URL.

Error scenarios like this usually appear due to different software automatically adding meta descriptions. If this issue occurs on a large scale, check if there is a script or CMS plugin automatically adding meta descriptions to your documents.

<meta description> too long for Google snippet

Description

If the meta description is too long to be displayed in the snippet in search results, the URL is flagged with this hint.

Example
<html>
<head>
...
<meta name="description" content="Example.com - The very best long meta descriptions online - We have one of the longest meta descriptions in the internet.">
...
Importance

If the meta description is too long to be displayed in the snippet in search results, it will be shortened by the search engine. This usually results in less appealing snippets and lower user engagement. Lower user engagement might lead to negative user signals, which can be regarded as a ranking factor by modern search engines and eventually lead to worse rankings of the site in search results.

Operating Instruction

We suggest analyzing the click-through rates from search results to URLs with a meta description that is too long to be shown properly in search snippets. If a URL flagged with this hint has a low click-through rate, you may want to consider shortening the meta description.
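
The three meta description hints can be approximated with one pass over the markup. This Python sketch makes several assumptions: the names are hypothetical, the hint labels are shortened, and the length limit is a parameter because the cut-off is not given here. Search engines typically truncate by display width rather than character count, so a character limit is only a proxy.

```python
from html.parser import HTMLParser


class MetaDescriptions(HTMLParser):
    """Collect the content of every <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "description":
            self.contents.append((attr.get("content") or "").strip())


def description_hints(html: str, max_chars: int) -> list[str]:
    """Return the meta description hints that apply to the given HTML."""
    parser = MetaDescriptions()
    parser.feed(html)
    hints = []
    if not any(parser.contents):
        hints.append("missing or empty")
    if len(parser.contents) > 1:
        hints.append("occurs more than once")
    if any(len(text) > max_chars for text in parser.contents):
        hints.append("too long")
    return hints


# 155 is an arbitrary illustrative limit, not a documented threshold.
print(description_hints('<head><meta name="description" content=""></head>', 155))
```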

<meta keywords> found

Description

If the HTML contains meta keywords, the URL is flagged with this hint. Use this report to discover all HTML documents on the crawled website that contain a meta keywords tag.

Example
<meta name="keywords" content="foo, bar" />
Importance

Meta keywords have been interpreted by search engines in the early days of search technology. If a webmaster added meta keywords, it helped the search engines to determine the topical focus of a document with limited processing resources. However, major search engines don't use the meta keywords any more.

Nowadays, keywords will have no positive impact on ranking in search. In fact, they only add to the size of the document.

Operating Instruction

We suggest removing the meta keywords tags from your site unless they are required for a specific purpose other than SEO.

<meta language> found

Description

A <meta name=language> tag was found. Contrary to common belief, this does not define the language of the document. In fact, it is not even defined in any standard.

Example
<html>
<head>
    <meta name="language" content="de-DE">
    ...
</head>

Expected behaviour: Document language is detected as de-DE.

Actual behaviour: Document language is detected as browser's default language.

Correct implementation:

<html lang="de-DE">
<head>
    ...
</head>
Importance

Correct language settings can be crucial for localized content, since it allows search engines to display the URL to the matching audience.

Operating Instruction

We suggest not to use a <meta name=language> element at all. Instead, use the lang attribute on the <html> tag. For more details, see W3C's advice on language settings.

<meta refresh> found

Description

If a meta refresh is found, the URL is flagged with this hint. Use this report to discover all HTML documents on the crawled website that contain a meta refresh.

Example
<meta http-equiv="refresh" content="5; URL=http://www.example.com/">
Importance

A meta refresh is a client side redirect that triggers a GET request after a given time. It is sometimes used to automatically forward users to another URL. This method has been used widely to manipulate search engines, so these might misinterpret usage of a meta refresh as a sneaky redirect. A meta refresh with a delay of more than one second also violates the Web Content Accessibility Guidelines. Using a meta refresh might result in bad user experience and issues with rankings in search engines.

Operating Instruction

You might consider replacing it with an HTTP redirect using a 301 or 302 status code, or even a JavaScript client-side redirect, depending on requirements.
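
When reviewing flagged URLs, it helps to split the content value of a meta refresh into its delay and target. A minimal Python sketch with a hypothetical function name; it handles whole-second delays only and is not a full implementation of how browsers parse the value:

```python
import re
from typing import Optional, Tuple


def parse_meta_refresh(content: str) -> Optional[Tuple[int, Optional[str]]]:
    """Split a meta refresh content value into (delay_seconds, target_url)."""
    match = re.match(
        r"\s*(\d+)\s*(?:[;,]\s*(?:url\s*=\s*)?(.*))?$",
        content,
        re.IGNORECASE | re.DOTALL,
    )
    if not match:
        return None
    delay = int(match.group(1))
    target = (match.group(2) or "").strip().strip("'\"")
    return delay, (target or None)


print(parse_meta_refresh("5; URL=http://www.example.com/"))
```

A result without a target URL means the page reloads itself after the delay.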

<meta> found outside <head>

Description

A <meta> tag was placed outside of the <head> section, where it may have no effect. Use this report to discover all HTML documents on the crawled website that contain meta tags outside the HTML head area.

Examples

What we discover

<html>
<head>
...
</head>
<body>
...
<meta name="description" content="foo">
...
</body>
</html>

How it should be

<html>
<head>
...
<meta name="description" content="foo">
...
</head>
<body>
...
</body>
</html>
Importance

Placing the meta tag outside the HTML <head> is not valid unless

  • HTML5 is used and
  • an itemprop attribute is used.

So using a meta tag outside of the head area is only viable, if it is used to specify structured data properties. Alternatively, the itemprop attribute can be defined in other tags as well, e.g. <span>, <p>, <img>, which would offer backwards compatibility to HTML versions below HTML5.

If meta tags are not used for structured data, i.e. no itemprop attribute, they are required to be in the <head> to be considered valid. If they are placed outside of the <head> they might just get ignored by search engines.

Operating Instruction

If you find meta tags that need to be in the <head>, you should move them there. If the discovered meta tags are used for structured data, we suggest assigning the itemprop attributes to other elements in the markup. For HTML versions below HTML5 this is a requirement for valid code.
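
To separate structured-data meta tags from truly misplaced ones, a check can skip every meta tag that carries an itemprop attribute. This Python sketch is an assumption about how such a filter could look; whether the crawler applies the same exception is not stated here, and the class name is hypothetical:

```python
from html.parser import HTMLParser


class MisplacedMeta(HTMLParser):
    """Collect <meta> tags outside <head> that carry no itemprop attribute."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.misplaced = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and not self.in_head:
            attr = dict(attrs)
            if "itemprop" not in attr:
                self.misplaced.append(attr)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


finder = MisplacedMeta()
finder.feed(
    '<html><head><meta name="description" content="foo"></head><body>'
    '<meta name="description" content="bar">'
    '<meta itemprop="price" content="10">'
    "</body></html>"
)
print(finder.misplaced)  # only the description placed in <body> is reported
```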

<title> found outside <head>

Description

A <title> tag was placed outside of the <head> section, where it may have no effect. Use this report to identify all occurrences of misplaced HTML <title> tags.

Example
<html>
<head>
...
</head>
<body>
...
<title>Title of the Site</title>
...
Importance

The <title> tag is a very important element for search engine optimization and should always be set. If the title tag is placed outside the HTML <head>, it may be ignored by search engines. This may lead to issues with the search snippet and the site's ranking in search results.

Operating Instruction

If this hint shows up in your crawl report, you should move all misplaced title tags into the HTML <head> section.

<title> missing or empty

Description

If the <title> tag is missing or empty, the URL is flagged with this hint. Use this report to identify all cases of missing title tags on the crawled website.

Example

Empty Title

<html>
<head>
...
<title></title>
...
Importance

The <title> tag is a very important element for search engine optimization and should always be set. The document's title is the primary resource for the title of the snippet in search results. If the <title> tag is missing or empty, one of the most important ranking factors is basically left out. This will very likely harm the ranking in search results and can also harm the Click-through rate from search results for the given URLs.

Operating Instruction

If this hint shows up in your crawl report, you should add title tags to all found URLs.

<title> occurs more than once

Description

If the <title> tag is found more than once in the source code, the URL is flagged with this hint. There should be only one title for a document.

Example
<html>
<head>
...
<title>Welcome to Example.com</title>
<title>Example.com - Best Examples in the Internet</title>
...
Importance

The HTML <title> tag is an important ranking factor, as it is supposed to describe the content of the document. Having more than one HTML <title> tag can therefore lead to an unexpected appearance of the URL in search results. Additionally, it might harm the rankings of the document.

Operating Instruction

If this hint shows up in your crawl report, you might want to make sure you only use one title tag on all found URLs.

Error scenarios like this usually appear due to different software automatically adding HTML <title> tags. If this issue occurs on a large scale, check if there is a script or CMS plugin adding HTML <title> tags to your documents.

<title> short or single word

Description

If the title of a document has fewer than 10 characters or consists of only a single word, the URL is flagged with this hint.

Example
<html>
<head>
...
<title>Example</title>
...
Importance

The HTML <title> tag should sum up the content of the document, so it's easy to understand what the document is all about. It should be descriptive as well as appealing. Very short titles usually tend to be neither. Using very short titles may lead to lower user engagement as well as lower rankings in search results.

Operating Instruction

If this hint shows up in your crawl report, consider writing more descriptive and appealing titles for the URLs flagged with this hint.
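
The rule for this hint is simple enough to test titles before publishing. A minimal Python sketch; the function name is hypothetical, and treating whitespace-separated tokens as words is an assumption:

```python
def is_short_title(title: str, min_chars: int = 10) -> bool:
    """True if the title has fewer than min_chars characters or is one word."""
    text = " ".join(title.split())  # collapse runs of whitespace
    return len(text) < min_chars or len(text.split()) <= 1


print(is_short_title("Example"))
print(is_short_title("Welcome to Example.com"))
```

The first title is flagged, while the second passes both checks.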

<title> too long for Google snippet

Description

If the title of a document is too long to be displayed in the snippets in search results, the URL is flagged with this hint.

Example
<html>
<head>
...
<title>Example.com - Best Examples Site - We have one of the longest titles in the internet</title>
...
Importance

If the title of a document is too long to be displayed in the snippet in search results, it will be shortened by the search engine. This usually results in less appealing snippets and lower user engagement, which might eventually hurt your site's rankings in search results.

Operating Instruction

You might want to change the title so that it can be displayed in the snippet without being shortened.

Charset: Charset set in HTTP Content-Type header and in document differ.

Description

Both the document and the HTTP Content-Type header specify a charset, but these are not identical. Use this report to discover all occurrences of conflicting charset definitions on the crawled website.

Examples

HTTP header

HTTP/1.1 200 OK
Server: Apache
Date: Thu, 17 Dec 2015 15:34:23 GMT
Content-Type: text/html; charset=UTF-8
...

meta charset (HTML 5)

<meta charset="iso-8859-1">

meta http-equiv (HTML 4.01)

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

XML

<?xml version="1.0" encoding="iso-8859-1"?>
Importance

If charset definitions in the HTTP header and the document differ, the browser has to use a heuristic to guess the correct charset to display the document. This may lead to problems handling the encoding of the document and slow down the rendering time for the document.

Note: There are multiple ways to specify the charset in the document that may cause the conflict, e.g. <?xml>, <meta charset> and <meta http-equiv>.

Operating Instruction

We suggest setting a proper charset in the HTTP header and in the document to make it easy for web clients to render the document fast and as expected. Make sure the defined charsets are identical and not conflicting.
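
A comparison of the two charset sources might look like the following Python sketch. It is an illustration under assumptions: all names are hypothetical, only <meta> declarations are read (not the XML declaration), and charset aliases are normalized with Python's codec registry, which does not match browser encoding labels in every case.

```python
import codecs
import re
from html.parser import HTMLParser
from typing import Optional

CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([^\s;\"']+)", re.IGNORECASE)


def charset_from_content_type(value: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type value."""
    match = CHARSET_RE.search(value)
    return match.group(1) if match else None


class MetaCharset(HTMLParser):
    """Record the first charset declared through a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset:
            return
        attr = dict(attrs)
        if attr.get("charset"):
            self.charset = attr["charset"]
        elif (attr.get("http-equiv") or "").lower() == "content-type":
            self.charset = charset_from_content_type(attr.get("content") or "")


def normalize(name: str) -> str:
    """Map aliases such as utf8 and UTF-8 to one spelling where possible."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return name.strip().lower()


def charsets_differ(content_type_header: str, html: str) -> bool:
    parser = MetaCharset()
    parser.feed(html)
    header_charset = charset_from_content_type(content_type_header)
    if not header_charset or not parser.charset:
        return False  # one side is missing, so there is nothing to compare
    return normalize(header_charset) != normalize(parser.charset)


print(charsets_differ("text/html; charset=UTF-8", '<meta charset="iso-8859-1">'))
```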

Charset: Not set

Description

There is no charset set, neither in the Content-Type HTTP header, nor in the document, e.g. through a <meta> tag.

Importance

If there is no charset defined in the HTTP header or in the document, the browser has to detect the charset itself to display the document. If the charset has to be guessed, this may lead to problems handling the encoding of the document. Additionally, this may slow down the rendering time for the document.

Operating Instruction

We suggest setting a proper charset in the HTTP header and in the document to make it easy for web clients to render the document fast and as expected. Make sure the defined charsets are identical and not conflicting.

Charset: Not set in document

Description

There is no charset set in the document, e.g. through a <meta> tag. However, there may be a charset defined in the HTTP header. Use this report to discover all URLs on the crawled website that do not define a charset in the document.

Importance

If there is no charset defined in the document, the browser has to use the charset in the HTTP header or has to fall back to detect the charset to display the document. If the charset has to be guessed, this may lead to problems handling the encoding of the document. Additionally, this may slow down the rendering time for the document.

Operating Instruction

We suggest setting a proper charset in the HTTP header and in the document to make it easy for web clients to render the document fast and as expected. Make sure the defined charsets are identical and not conflicting.

Hreflang: Found

Description

The crawler detected <link> tags with an hreflang attribute set.

Example

The following code snippets trigger this hint. All examples are for http://de.example.com/.

<html lang="de">
<head>
    <link href="http://de.example.com/" hreflang="de" rel="alternate">
</head>
</html>
Importance

Hreflang links allow you to specify the preferred version of a URL on multilingual and multi-region websites, and help search engines to display the correct version of a URL.

Operating Instruction

Use the "Hreflang: Found" hint report to identify all URLs that contain a hreflang link definition and find out how many URLs on the crawled site have hreflang link definitions.

Hreflang: Language tags in HTML and hreflang self link differ

Description

The crawler detected <link> tags with an hreflang attribute set, and a link to the current URL. However, the languages of the document and the hreflang link differ.

Setting "x-default" as hreflang is accepted and does not trigger this hint.

Example

The following code snippets trigger this hint. All examples are for http://de.example.com/.

First example: Languages are totally different:

<html lang="de">
<head>
    <link href="http://de.example.com/" hreflang="fr" rel="alternate">
</head>
</html>

Second example: Languages differ in region or other aspects, while the main language is the same:

<html lang="de">
<head>
    <link href="http://de.example.com/" hreflang="de-DE" rel="alternate">
</head>
</html>
Importance

The languages of a document and the hreflang link should match. However, differences in region or other aspects may be desired. If hreflang linking is incomplete or erroneous, search engines may discard hreflang related information completely.

Operating Instruction

If the language of the hreflang attribute and the language of the document differ entirely, for example "de" and "fr", this should be fixed.
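
The two cases from the examples, plus the x-default exception, can be distinguished by comparing the primary language subtag first. A minimal Python sketch with a hypothetical function name and made-up result labels:

```python
def compare_languages(html_lang: str, hreflang: str) -> str:
    """Compare the <html lang> value with the hreflang of the self link."""
    if hreflang.lower() == "x-default":
        return "accepted"
    doc = html_lang.lower().split("-")
    link = hreflang.lower().split("-")
    if doc == link:
        return "match"
    if doc[0] == link[0]:
        return "differs in region or other subtags"
    return "differs"


print(compare_languages("de", "fr"))
print(compare_languages("de", "de-DE"))
```

Only the result "differs" indicates the case that should always be fixed; a difference in region may be intended.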

Hreflang: Self link found, but document has no language

Description

A <link> tag with an hreflang attribute points to the current URL, but the document itself has no language set.

Example

Example for http://de.example.com/

<html>
    <head>
        <link href="http://de.example.com/" hreflang="de" rel="alternate">
        <link href="http://en.example.com/" hreflang="en" rel="alternate">
        <link href="http://fr.example.com/" hreflang="fr" rel="alternate">
    </head>
</html>

Correct implementation:

<html lang="de">
    <head>
        <link href="http://de.example.com/" hreflang="de" rel="alternate">
        <link href="http://en.example.com/" hreflang="en" rel="alternate">
        <link href="http://fr.example.com/" hreflang="fr" rel="alternate">
    </head>
</html>
Importance

The languages of a document and the hreflang link should match. If hreflang linking is incomplete or erroneous, search engines may discard hreflang related information completely.

Operating Instruction

Always assign a language to a document when using hreflang.

Hreflang: Self link missing

Description

The crawler detected <link> tags with an hreflang attribute set, but a link to the current URL itself was missing. A self-referencing hreflang link is mandatory.

Example

Example for http://de.example.com/

<head>
    <link href="http://en.example.com/" hreflang="en" rel="alternate">
    <link href="http://fr.example.com/" hreflang="fr" rel="alternate">
</head>

Correct implementation:

<head>
    <link href="http://en.example.com/" hreflang="en" rel="alternate">
    <link href="http://fr.example.com/" hreflang="fr" rel="alternate">
    <link href="http://de.example.com/" hreflang="de" rel="alternate">
</head>
Importance

If hreflang linking is incomplete or has errors, search engines may discard hreflang related information completely. This may lead to inappropriate URLs showing up in the localized search results.

Operating Instruction

Always add a link to self, since it is required. For more details see our guide on hreflang.
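A check for a missing self link could look roughly like the following Python sketch. It uses only the standard library. The names `HreflangCollector` and `has_self_link` are hypothetical, and the exact string comparison of URLs is a simplifying assumption, since a real crawler would normalize URLs before comparing them.

```python
from html.parser import HTMLParser


class HreflangCollector(HTMLParser):
    """Collect (hreflang, href) pairs from <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "alternate" in rel and attr.get("hreflang"):
            self.links.append((attr["hreflang"], attr.get("href") or ""))


def has_self_link(html: str, page_url: str) -> bool:
    """Return True if any hreflang link points to page_url (exact match)."""
    collector = HreflangCollector()
    collector.feed(html)
    return any(href == page_url for _, href in collector.links)


html = """<head>
    <link href="http://en.example.com/" hreflang="en" rel="alternate">
    <link href="http://fr.example.com/" hreflang="fr" rel="alternate">
</head>"""
print(has_self_link(html, "http://de.example.com/"))  # False
```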

Hreflang: URL empty or malformed

Description

The crawler detected <link> tags with an hreflang attribute set. However, the href attribute contains an empty or malformed URL.

Example

Empty href

<link href="" hreflang="de" rel="alternate">

Malformed href

<link href="htp://de.example.com/" hreflang="de" rel="alternate">
Importance

A malformed or empty href in an hreflang link makes the hreflang definition invalid. Search engines may discard hreflang-related information completely. This may lead to inappropriate URLs showing up in the localized search results.

Operating Instruction

We suggest checking for malformed or empty hreflang href values on a regular basis.
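Such a regular check can be automated. The following Python sketch classifies an href value with the standard library's `urlsplit`. The function name `classify_hreflang_href` and its labels are hypothetical, and the rule "only HTTP and HTTPS URLs with a host name are valid" is an assumption made for this illustration.

```python
from urllib.parse import urlsplit


def classify_hreflang_href(href: str) -> str:
    """Return "empty", "malformed", "relative" or "ok" for an href value.

    Hypothetical helper; the validity rules are assumptions.
    """
    href = href.strip()
    if not href:
        return "empty"
    try:
        parts = urlsplit(href)
        hostname = parts.hostname
    except ValueError:
        return "malformed"
    if not parts.scheme:
        # No scheme at all: a relative URL, covered by a separate hint.
        return "relative"
    if parts.scheme not in ("http", "https") or not hostname:
        return "malformed"
    return "ok"


print(classify_hreflang_href(""))                        # empty
print(classify_hreflang_href("htp://de.example.com/"))   # malformed
print(classify_hreflang_href("http://de.example.com/"))  # ok
```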

Hreflang: URL is not absolute

Description

If an hreflang link specifies a URL relative to the current URL, it is flagged with this hint.

This report shows all occurrences of hreflang usage with URLs that are not absolute.

Example

Absolute URL

<link href="http://de.example.com/page.html" hreflang="de" rel="alternate">

Short URL

<link href="page.html" hreflang="de" rel="alternate">

Short URL - root folder relative

<link href="/page.html" hreflang="de" rel="alternate">

Short URL - protocol relative

<link href="//de.example.com/page.html" hreflang="de" rel="alternate">
Importance

Using shortened URLs for hreflang links can lead to several kinds of duplicate content issues:

  • duplicate content issues with different protocol versions
  • duplicate content issues with different domains
  • duplicate content issues with different folders
Operating Instruction

We suggest using absolute URLs for hreflang links.
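To see why shortened URLs are risky, consider how the same href resolves depending on the URL the document was served from. The Python sketch below uses `urljoin` from the standard library. The wrapper name `resolve_hreflang` and the sample page URLs are assumptions chosen for illustration.

```python
from urllib.parse import urljoin


def resolve_hreflang(page_url: str, href: str) -> str:
    """Resolve an hreflang href against the URL of the containing page."""
    return urljoin(page_url, href)


# The same relative href yields a different target for every variant
# (protocol, host name, folder) under which the page is reachable.
for page_url in (
    "http://de.example.com/",
    "https://de.example.com/shop/",
    "http://www.example.com/",
):
    print(page_url, "->", resolve_hreflang(page_url, "page.html"))
# http://de.example.com/ -> http://de.example.com/page.html
# https://de.example.com/shop/ -> https://de.example.com/shop/page.html
# http://www.example.com/ -> http://www.example.com/page.html
```

An absolute href resolves to the same target regardless of the page URL, which is why it avoids these ambiguities.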

Robots: Directives Missing

Description

No robots meta tag or X-Robots-Tag header directive was found.

Importance

If no robots meta tag or X-Robots-Tag header directive is used, this equals the "index, follow" robots directive. In this case, crawlers are not restricted by the document from crawling and indexing it.

Under these circumstances, URLs might get crawled and indexed, even though they are not supposed to be in the index. This might result in privacy issues as well as in a waste of indexing budget.

Operating Instruction

We suggest specifying robots directives for every document on your website.

Robots: Specified more than once

Description

Robots directives for a single URL were specified more than once. Use this report to identify all instances of multiple robots definitions.

Examples

Robots meta tag in HTML <head>

<meta name="robots" content="index">
<meta name="robots" content="follow">

Robots directives in X-Robots-Tag and meta tag

HTTP/1.1 200 OK
Date: Fri, 16 Oct 2015 10:01:33 GMT
X-Robots-Tag: index, follow
...

<meta name="robots" content="index, follow">
Importance

More than one instance of robots directives can lead to conflicting definitions or to a directive being left out. This may result in a range of issues with privacy, indexing in general and crawl budget, depending on the situation.

Operating Instruction

Use only one way to specify the robots directive.
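To find out whether a URL specifies robots directives more than once, the header and the HTML can be inspected together. The Python sketch below is a minimal illustration using the standard library. The names `RobotsMetaCollector` and `robots_sources` are hypothetical, and passing headers as a list of (name, value) pairs is an assumption.

```python
from html.parser import HTMLParser


class RobotsMetaCollector(HTMLParser):
    """Collect the content of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            self.values.append(attr.get("content") or "")


def robots_sources(headers, html):
    """Return all robots specifications as (source, value) pairs."""
    sources = [
        ("header", value)
        for name, value in headers
        if name.lower() == "x-robots-tag"
    ]
    collector = RobotsMetaCollector()
    collector.feed(html)
    sources += [("meta", value) for value in collector.values]
    return sources


found = robots_sources(
    [("X-Robots-Tag", "index, follow")],
    '<meta name="robots" content="index, follow">',
)
print(len(found) > 1)  # True: specified more than once
```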

Robots: follow

Description

The URL is set to "follow" by a robots directive, either per robots meta tag or the X-Robots-Tag header. If there are no robots directives specified, the URL will be regarded as if "index, follow" was specified.

Discover all occurrences of "follow" robots directives or meta tags with this hint report.

Examples

Robots meta tag in HTML <head>

<meta name="robots" content="index, follow">

Robots directives in HTTP-header X-Robots-Tag

X-Robots-Tag: index, follow
Importance

If a URL is set to "follow" and there are no robots.txt restrictions interfering, search engines will usually follow all links in the document and crawl the target URLs.

Operating Instruction

We suggest using the follow directive for robots by default.

Robots: index

Description

The URL is set to "index" by all robots directives, either per robots meta tag or the X-Robots-Tag header.

Find all URLs that are set to "index" with this report.

Examples

HTML tag

<meta name="robots" content="index, follow">

X-Robots-Tag in HTTP header

HTTP/1.1 200 OK
Date: Fri, 16 Oct 2015 10:01:33 GMT
X-Robots-Tag: index
...
Importance

The "index" directive for robots tells search engines the document is supposed to be indexed. Having URLs indexed that are not supposed to be indexed can lead to

  • Issues with privacy
  • Issues with indexing budget
  • Issues with thin content or duplicate content
Operating Instruction

We suggest checking on a regular basis whether parts of the site are set to "index" that are not supposed to end up in the search results. There are several ways to deal with this situation, depending on requirements:

  • Set URLs to "noindex" if they should not appear in search results
  • Make links to these URLs not crawlable
  • Use the URL removal tool offered by search engines like Google
  • Block crawling in robots.txt
  • Block access to the URLs

Robots: nofollow

Description

The URL is set to "nofollow" by a robots directive, either per robots meta tag or the X-Robots-Tag header.

Find all instances of "nofollow" usage that were discovered while crawling your site.

Examples

Robots meta tag in HTML <head>

<meta name="robots" content="index, nofollow">

X-Robots-Tag in HTTP-header

X-Robots-Tag: nofollow
Importance

Using the "nofollow" directive for a URL tells crawlers not to follow any links in the document. This also prevents PageRank flow to the target URLs. By using nofollow for internal links, you weaken your site.

Using the "nofollow" robots directive affects

  • PageRank flow
  • Website structure
  • Crawling
Operating Instruction

URLs that use the nofollow robots directive should be evaluated on a regular basis to prevent possible issues. Nofollow should not be used for internal linking.

Instead you should consider setting the document to robots follow.

Robots: nofollow differs across specifications

Description

There is more than one source for robots directives, either a robots meta tag or an X-Robots-Tag header, and at least one specifies "nofollow" while another does not.

Examples

Robots meta tag in HTML <head>

<meta name="robots" content="index, nofollow">
<meta name="robots" content="index, follow">

Note: a more subtle way to produce this error is a conflict caused by omitting parts of the directive, as in:

<meta name="robots" content="index, nofollow">
<meta name="robots" content="index">

Robots directives in X-Robots-Tag and meta tag differ

HTTP/1.1 200 OK
Date: Fri, 16 Oct 2015 10:01:33 GMT
X-Robots-Tag: index, nofollow
...

<meta name="robots" content="index, follow">
Importance

The "nofollow" robots directive tells crawlers not to follow the links in a document. This can be used on purpose to prevent search engines from crawling the linked URLs.

Conflicting definitions are inconclusive. Search engines will usually use the most restrictive directive they find. The Audisto Crawler adopts this behaviour.

Operating Instruction

Use only one way to specify the robots nofollow directive.
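The "most restrictive directive wins" behaviour described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Audisto Crawler's implementation. The function names are hypothetical, only the four directives discussed here are considered, and a missing directive is treated as its permissive default.

```python
def parse_directives(value: str) -> dict:
    """Parse one robots value such as "index, nofollow".

    A directive that is not mentioned counts as its permissive default.
    """
    tokens = {token.strip().lower() for token in value.split(",")}
    return {
        "index": "noindex" not in tokens,
        "follow": "nofollow" not in tokens,
    }


def merge_most_restrictive(values) -> dict:
    """Merge several robots values; any restriction overrides permission."""
    parsed = [parse_directives(value) for value in values]
    return {
        "index": all(p["index"] for p in parsed),
        "follow": all(p["follow"] for p in parsed),
        # True when the sources disagree, which is what the hints report.
        "noindex_differs": len({p["index"] for p in parsed}) > 1,
        "nofollow_differs": len({p["follow"] for p in parsed}) > 1,
    }


result = merge_most_restrictive(["index, nofollow", "index"])
print(result["follow"], result["nofollow_differs"])  # False True
```

With an empty list of values the merge yields index and follow, which matches the default that applies when no robots directives are specified.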

Robots: noindex

Description

The URL is set to "noindex" by a robots directive, either per robots meta tag or the X-Robots-Tag header, but is still part of the internal link graph.

Examples

Robots meta tag in HTML <head>

<meta name="robots" content="noindex, follow">

X-Robots-Tag in HTTP-header

X-Robots-Tag: noindex, follow
Importance

Using the "noindex" directive for a URL tells crawlers not to index the document. This also prevents the URL from showing up in the search results. If URLs with a robots "noindex" directive are part of the internal link graph, they still get crawled and consume crawl budget. This may lead to issues with the crawl rate. These URLs also bind internal link juice and, on a large scale, can harm your site's rankings.

Legit use of noindex

A legit reason to keep URLs on "noindex" would be legal restrictions, like an imprint or privacy policy.

Problematic use of noindex

Using noindex on URLs that are supposed to give structure to the site, e.g. categories, important tags, HTML sitemaps, should be avoided.

If URLs are supposed to give structure to a site in terms of SEO, then they should also add value for the user.

Operating Instruction

We strongly suggest reviewing your site's "noindex" URLs on a regular basis to prevent issues with crawl budget and internal PageRank flow. If there is no need to keep a URL that is set to noindex, it should be dropped with a 410 status code.

Try to improve structure URLs on an individual basis. If these URLs add value for the user, they will be worthwhile to get indexed as well.

If noindex is used on URLs generated by sorting and filtering options, make sure to use a PRG-pattern instead of linking to these URLs directly.

Robots: noindex differs across specifications

Description

There is more than one source for robots directives, either a robots meta tag or an X-Robots-Tag header. At least one specifies "noindex" while another does not.

Examples

Differing robots directives across specifications could look like this:

Robots Meta tag in HTML header

<meta name="robots" content="index, nofollow">
<meta name="robots" content="noindex, nofollow">

Note: a more subtle way to produce this error is a conflict caused by omitting parts of the directive, as in:

<meta name="robots" content="noindex, follow">
<meta name="robots" content="follow">

Robots directives in X-Robots-Tag and meta tag differ

HTTP/1.1 200 OK
Date: Fri, 16 Oct 2015 10:01:33 GMT
X-Robots-Tag: index, follow
...

<meta name="robots" content="noindex, follow">
Importance

The "noindex" robots directive tells crawlers not to index the current document.

Conflicting definitions are inconclusive. Search engines will usually use the most restrictive directive they find. The Audisto Crawler adopts this behaviour.

Operating Instruction

Use only one way to specify the robots noindex directive.