Audisto Technical Error Checker

How to do a short technical audit

The "Errors" hints group lists all hints from the other hints sections that are considered errors, as opposed to merely unexpected behaviour. If not fixed, these errors can harm a site's function, user experience and/or SEO.

This hints section helps you run a quick technical audit of a website and lists the different kinds of clear errors that should be fixed.

Example: Audisto Technical Error Check with the technical hint reports for the current crawl

Here is the list of all specific hints that are part of the “errors” hints section and that can be identified with the help of the Audisto Crawler.

Hints

<a> has malformed href

Description

If a malformed href attribute value is found, the URL is flagged with this hint. A malformed href is usually a URI that is not valid according to RFC 3986, or the result of a parsing error caused by invalid HTML.

Examples
<a href="htp://www.example.com">link</a>
<a href="htps://www.example..com">link</a>
<a href="http://www..example.com">link</a>
<a href="http://">link</a>
<a href="htps:// www.example.com">link</a>
<a href="://www.example.com">link</a>
Importance

A link with a malformed href cannot be parsed and will therefore not be recognized by search engines. In addition, links like this can cause issues with user experience.

Operating Instruction

Fix all malformed href attribute values on your website.
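
The following Python sketch illustrates one way to pre-check href values for the kinds of problems shown in the examples above. It is a simplified heuristic, not a full RFC 3986 validator and not the Audisto Crawler's actual logic. The function name `href_problem` and the set `KNOWN_SCHEMES` are assumptions made for this illustration.

```python
import re

# Assumption: only these schemes count as known for "scheme://host" links.
KNOWN_SCHEMES = {"http", "https", "ftp"}
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def href_problem(href):
    """Return a short reason if href looks malformed, else None."""
    href = href.strip()
    if re.search(r"\s", href):
        return "whitespace in URL"
    # The text before the first "/", "?" or "#" holds the scheme, if any.
    head = re.split(r"[/?#]", href, maxsplit=1)[0]
    rest = href[len(head):]
    if not (head.endswith(":") and rest.startswith("//")):
        # Relative links, fragments, mailto: and similar are not judged here.
        return None
    scheme = head[:-1]
    if not SCHEME_RE.fullmatch(scheme):
        return "missing or invalid scheme"
    if scheme.lower() not in KNOWN_SCHEMES:
        return "unknown scheme"
    host = re.split(r"[/?#]", rest[2:], maxsplit=1)[0]
    if not host:
        return "empty host"
    if host.startswith(".") or ".." in host:
        return "empty host label"
    return None


for example in ["htp://www.example.com", "http://www..example.com", "http://"]:
    print(example, "->", href_problem(example))
```

A check like this could run in a template test or a content pipeline, so that malformed links are caught before they are published.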

<a> links to fragment by name attribute on non-anchor

Description

The anchor contains a fragment link, but the target is defined by a name attribute and is not an anchor itself.

Example
<h1 name="top">Headline</h1>
...
<a href="#top">Go to top</a>
Importance

The name attribute is only allowed on anchor tags; using it on other tags is not valid. In addition, the name attribute has been deprecated since XHTML 1.0 and should no longer be used.

Operating Instruction

We suggest removing all instances of the "name" attribute. You should use the "id" attribute instead.

<a> links to fragment only, but <base> points to other URL

Description

An <a> element links to a fragment only, while there is a <base> pointing to another URL. Discover all URLs that contain fragment links along with a base tag pointing to another URL.

Example

Example for http://example.com/page.html

<base href="http://example.com/page2.html">
...
<a href="#top">link</a>

Expected behaviour: Browser requests http://example.com/page.html#top

Actual behaviour: Browser requests http://example.com/page2.html#top

Importance

Fragment links are relative to the URL defined in the <base> element. If the <base> element is pointing to another URL, this may lead to unexpected user experience and issues with the crawlability of the website if fragment-only links are used.

Operating Instruction

We suggest not to use a <base> element if it is possible to avoid it. We also suggest using absolute links instead of fragment only links.
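
The effect described above can be reproduced with Python's standard `urllib.parse.urljoin`, which follows the usual relative-reference resolution rules. This is a minimal sketch; the helper name `resolve` is an assumption made for this illustration.

```python
from urllib.parse import urljoin


def resolve(href, document_url, base_href=None):
    """Resolve href the way a client would, honouring an optional <base> href."""
    # The <base> href itself is resolved against the document URL first.
    effective_base = urljoin(document_url, base_href) if base_href else document_url
    return urljoin(effective_base, href)


document_url = "http://example.com/page.html"

# Without <base>, the fragment stays on the current document.
print(resolve("#top", document_url))
# With <base> pointing elsewhere, the same link targets another URL.
print(resolve("#top", document_url, "http://example.com/page2.html"))
```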

<base> contains malformed or empty href

Description

A <base> tag was found, but its href attribute contains an invalid URL, or a URL that is neither HTTP nor HTTPS. The crawler falls back to the document's URL as base.

Examples

A base with an invalid protocol

<base href="htp://example.com">

A base with a white space in the domain name

<base href="http:// example.com">
Importance

The base tag defines the URL base for all relative links in the document. Using a malformed URL as base href can cause issues with crawling and accessing of relative links.

Using the base tag adds more complexity when parsing relative links. Poorly programmed crawlers might not understand the base tag at all and therefore show unexpected behaviour.

Operating Instruction

We suggest not to use the HTML base tag at all. Remove it if possible.

Note: If there are changes regarding the base tag, all relative links in the document need to be checked and probably corrected.

<base> found more than once and differs

Description

More than one <base> tag was found, and their href attribute values differ.

Examples
<base href="http://example.com/">
<base href="http://example.com/folder/">

Note: The following base directives resolve to the same URL and would therefore not trigger this hint:

Base directives on http://example.com/page.html

<base href="http://example.com/">
<base href="/">
Importance

The base tag defines the URL base for all relative links in the document. Having more than one base tag is invalid. This may result in issues with relative links, that might impact search engines and user experience on the website.

Operating Instruction

We suggest not to use the HTML base tag at all. Remove it if possible. If the base tag is removed, all relative links in the document need to be checked and probably corrected.
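
Whether two <base> values really differ depends on what they resolve to, as the note in the examples points out. The sketch below resolves every base href against the document URL and collects the distinct results. The function name `distinct_bases` is an assumption made for this illustration, and the crawler's own comparison may work differently.

```python
from urllib.parse import urljoin


def distinct_bases(document_url, base_hrefs):
    """Return the set of absolute URLs the given <base> hrefs resolve to."""
    return {urljoin(document_url, href) for href in base_hrefs}


document_url = "http://example.com/page.html"

# Both values resolve to http://example.com/, so they do not differ.
print(distinct_bases(document_url, ["http://example.com/", "/"]))
# These two resolve to different URLs.
print(distinct_bases(document_url, ["http://example.com/", "http://example.com/folder/"]))
```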

<base> href contains a path only

Description

The <base> tag's href attribute contains a path, not an absolute URL. While this is technically allowed, it is not supported by Internet Explorer as of version 8.

Examples

Relative path by mistake

<base href="example.com/">

Relative path on purpose

<base href="/folder/">
Importance

The base tag defines the URL base for all relative links in the document. Mistakes in usage of the base tag might lead to issues with crawling when using relative links in the document. They will also result in issues with Internet Explorer as of version 8.

Operating Instruction

We suggest not to use the HTML base tag at all. Remove it if possible.

Note: If there are changes regarding the base tag, all relative links in the document need to be checked and probably corrected.

<base> occurs more than once

Description

More than one <base> tag was found. The Audisto Crawler uses the first valid annotation found for link resolving. Use this report to find all URLs on the crawled site that contain more than one <base> tag.

Example
<head>
...
<base href="http://example.com/">
<base href="http://example.com/">
...
</head>
Importance

The base tag defines the URL base for all relative links in the document. Having more than one base tag is invalid. This may result in issues with relative links, that might impact search engines and user experience on the website.

Operating Instruction

We suggest not to use the HTML base tag at all. Remove it if possible. If there are changes regarding the base tag, all relative links in the document need to be checked and probably corrected.

<h1> occurs more than once

Description

If more than one <h1> tag is found, the URL is flagged with this hint. Discover all URLs that contain more than one <h1> tag.

Examples
<h1><img src="logo.jpg" alt="Example.com"/></h1>
...
<h1>Primary Headline</h1>
Importance

The h1 is the most important headline in the document and should reflect the topic of the document. Having more than one <h1> tag is a sign of poor content structure. Content structure partly determines the content quality. While it is not a huge factor for most search engines, having more than one <h1> tag may be a negative signal in terms of content quality.

Operating Instruction

You might want to use only one <h1> tag per document.

<html> contains too many uncommon non-printable characters

Description

The HTML document contains too many uncommon non-printable characters, and not all of them will be shown in live analysis. With this report you can discover all URLs on the crawled website that contain more than 50 uncommon non-printable characters. See the corresponding hint "<html> contains uncommon non-printable characters" for further information about what counts as "uncommon".

Importance

Non-printable characters are used as control characters and may not be visible in the source code, but nonetheless impact the behaviour of the site. This might affect crawling and user experience when they are inside of an anchor's href or an image's src attribute, possibly resulting in issues with the site's structure and ranking.

Finding too many non-printable characters may indicate massive encoding issues in a document, or documents that are not HTML documents at all.

Operating Instruction

Non-printable characters should generally be encoded as HTML entities, or removed whenever possible. If an application validates transferred data, the validation should check for non-printable characters and, where appropriate, remove them.

<html> contains uncommon non-printable characters

Description

If uncommon non-printable characters are detected, the URL of the document containing the character will be flagged.

Some non-printable characters appear in almost every document, e.g. line feed (\n), carriage return (\r) and horizontal tab (\t). In addition, there are commonly used non-printable characters, e.g. BOM, soft hyphen, left-to-right mark and right-to-left mark. These characters will not cause the URL to get flagged with this hint. This hint detects all remaining non-printable characters.

Examples

Because these characters are non-printable, the live analysis shows the character codes instead of the actual characters, enclosed in brackets.

[[&#xEFBBBF;]]
Importance

Non-printable characters may not be visible in the source code, but nonetheless impact

  • the behaviour of the site, e.g. when they are inside of an anchor's href or an image's src attribute
  • the ranking of the site, e.g. when they are an invisible part of a word.

This might affect crawling and user experience, possibly resulting in issues with accessibility and ranking.

Usually this hint is triggered by messed up encoding.

Operating Instruction

Non-printable characters should generally be encoded as HTML entities, or removed whenever possible. If an application validates transferred data, the validation should check for non-printable characters and, where appropriate, remove them.
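
A validation step of this kind might look like the following Python sketch. It assumes that "non-printable" means the Unicode general categories Cc (control) and Cf (format); that definition, the `COMMON` set and the function name `uncommon_nonprintable` are assumptions for this illustration and may not match the crawler's exact rules.

```python
import unicodedata

# Characters named above as common: LF, CR, TAB, BOM, soft hyphen, LRM, RLM.
COMMON = {"\n", "\r", "\t", "\ufeff", "\u00ad", "\u200e", "\u200f"}


def uncommon_nonprintable(text):
    """Return (index, code point) pairs for uncommon non-printable characters."""
    found = []
    for index, char in enumerate(text):
        if char in COMMON:
            continue
        # Assumption: control (Cc) and format (Cf) characters are non-printable.
        if unicodedata.category(char) in ("Cc", "Cf"):
            found.append((index, f"U+{ord(char):04X}"))
    return found


# A zero width space hidden inside a word is reported; the line feed is not.
print(uncommon_nonprintable("exam\u200bple\n"))
```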

<html> starts with BOM

Description

There is a Unicode byte order mark (BOM) at the top of the HTML. Discover all URLs on the crawled website that contain a BOM.

We currently detect the BOM in the following encodings:

  • UTF-8
  • UTF-16 BE/LE
  • UTF-32 BE/LE
  • UTF-7
  • UTF-1
  • UTF-EBCDIC
  • SCSU
  • BOCU-1
  • GB-18030
Examples

Example UTF-8 BOM in HTML 5

EF BB BF<!DOCTYPE html>
<html lang="en">

How the BOM looks in different encodings and representations:

Encoding | BOM hex | BOM dec
UTF-8 | EF BB BF | 239 187 191
UTF-16 (BE) | FE FF | 254 255
UTF-16 (LE) | FF FE | 255 254
UTF-32 (BE) | 00 00 FE FF | 0 0 254 255
UTF-32 (LE) | FF FE 00 00 | 255 254 0 0
UTF-7 | 2B 2F 76 38 | 43 47 118 56
UTF-7 | 2B 2F 76 39 | 43 47 118 57
UTF-7 | 2B 2F 76 2B | 43 47 118 43
UTF-7 | 2B 2F 76 2F | 43 47 118 47
UTF-7 | 2B 2F 76 38 2D | 43 47 118 56 45
UTF-1 | F7 64 4C | 247 100 76
UTF-EBCDIC | DD 73 66 73 | 221 115 102 115
SCSU | 0E FE FF | 14 254 255
BOCU-1 | FB EE 28 | 251 238 40
GB-18030 | 84 31 95 33 | 132 49 149 51
Importance

The Unicode byte order mark is the Unicode character U+FEFF. Some text editors add it to documents. The BOM is used to signal:

  • the byte order, or endianness
  • the fact that the text is Unicode
  • the specific Unicode encoding

Having a Unicode byte order mark at the top of the HTML is valid, but might result in problems with third-party software. As of HTML5, a BOM is supposed to override the charset definition from the HTTP header. If the BOM is used for charsets that are not Unicode, this might lead to encoding problems. Encoding problems may affect the appearance of the site in browsers and search engines and therefore lead to issues with user experience.

Operating Instruction

You should consider removing the BOM and specifying the encoding in the HTTP header or as a meta tag in the HTML <head>.
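
To find out whether a file starts with a BOM, you can compare its first bytes with the signatures from the table above. The Python sketch below covers only the UTF-8, UTF-16 and UTF-32 signatures. The name `detect_bom` is an assumption for this illustration. Longer signatures are checked first, because the UTF-32 (LE) BOM begins with the same two bytes as the UTF-16 (LE) BOM.

```python
# Signatures taken from the table above; longer ones must come first.
BOMS = [
    (b"\x00\x00\xfe\xff", "UTF-32 (BE)"),
    (b"\xff\xfe\x00\x00", "UTF-32 (LE)"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16 (BE)"),
    (b"\xff\xfe", "UTF-16 (LE)"),
]


def detect_bom(data):
    """Return the encoding name if data starts with a known BOM, else None."""
    for signature, name in BOMS:
        if data.startswith(signature):
            return name
    return None


print(detect_bom(b"\xef\xbb\xbf<!DOCTYPE html>"))
print(detect_bom(b"<!DOCTYPE html>"))
```

Many editors and build tools can save files without a BOM, so this check is mainly useful for locating the affected files.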

<img> has no alt attribute

Description

If an image without an alt-attribute is found, the URL is flagged with this hint. This report helps to identify all missing alt-attributes.

Example
<img src="file.jpg"/>
Importance

In terms of HTML validation, the alt-attribute is required for images. The alt-attribute defines the alternative information that will be shown if the image file fails to load.

The alt-attribute is one of the factors that search engines use to determine the topic of the image. It also serves as a text alternative for blind users. By supplying a proper alt-attribute, you not only help search engines understand your site, but also improve accessibility for disabled users.

Operating Instruction

Add an alt-attribute wherever it is missing.

Use a descriptive alt-attribute for images that contain information. You may use an empty alt-attribute if the images are only for decoration.

<link rel="canonical"> URL is not absolute

Description

If the canonical element specifies a URL relative to the document's URL, the document's URL is flagged with this hint.

This report shows all occurrences of canonical usage with URLs that are not absolute.

Examples

Absolute URL

<link rel="canonical" href="http://example.com/folder/page.html">

Short URL

<link rel="canonical" href="page.html">

Short URL - root folder relative

<link rel="canonical" href="/folder/page.html">

Short URL - protocol relative

<link rel="canonical" href="//example.com/folder/page.html">
Importance

Using shortened URLs for canonical links can lead to several kinds of duplicate content issues:

  • duplicate content issues with different protocol versions
  • duplicate content issues with different domains
  • duplicate content issues with different folders
Operating Instruction

We suggest using absolute URLs for canonical links.
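
A simple way to test whether a canonical href is absolute is to check that it has both a scheme and a host. The sketch below uses Python's standard `urllib.parse.urlsplit`. The function name `is_absolute_url` is an assumption for this illustration.

```python
from urllib.parse import urlsplit


def is_absolute_url(href):
    """True only if href carries both a scheme and a host."""
    parts = urlsplit(href)
    return bool(parts.scheme and parts.netloc)


for href in [
    "http://example.com/folder/page.html",  # absolute
    "page.html",                            # relative to the document
    "/folder/page.html",                    # root folder relative
    "//example.com/folder/page.html",       # protocol relative
]:
    print(href, "->", is_absolute_url(href))
```

Only the first example is reported as absolute. The protocol-relative form has a host but no scheme, so it still counts as a shortened URL.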

<link rel="canonical"> contains malformed or empty href

Description

This hint identifies all occurrences of canonical elements that contain an empty or invalid target URL.

Examples

Empty canonical

<link rel="canonical" href="">

Malformed canonical

<link rel="canonical" href="htp://example.com/">
Importance

Malformed or empty href in canonical links cause canonical definitions to be invalid and can cause issues with duplicate content when a document is available on more than one URL.

Operating Instruction

We suggest checking for malformed or empty canonical href values on a regular basis.

<link rel="canonical"> found outside <head>

Description

A canonical element was placed outside of the <head> section, where search engines will ignore it.

This report helps you to identify all occurrences of canonical definitions, that are invalid due to being placed outside the <head> tag on the crawled website.

Example
<html>
  <head>
    ...
  </head>
  <body>
    ...
    <link rel="canonical" href="http://example.com/">
    ...
  </body>
</html>
Importance

Some search engines ignore improper canonical designation. If canonical definitions get ignored by search engines, this might cause issues with duplicate content and representation of the site in search results.

Operating Instruction

Keep your canonical definitions inside the HTML <head> tag, so they don't get ignored by search engines.

<link rel="canonical"> found twice

Description

More than one canonical element was found, either as a <link> tag with rel="canonical" or as a corresponding Link header.

This report shows all URLs with double canonical definitions on your website, that we were able to identify.

Examples

HTML Head

<link rel="canonical" href="http://example.com/">
<link rel="canonical" href="http://example.com/">

HTTP Header

Link: <http://example.com/>; rel="canonical"
Link: <http://example.com/>; rel="canonical"

HTML Head & HTTP Header

<link rel="canonical" href="http://example.com/">

Link: <http://example.com/>; rel="canonical"
Importance

Using more than one canonical link element can cause conflicting definitions or unexpected behaviour when documents are available on more than one URL at a time.

Operating Instruction

We suggest identifying all URLs that have more than one canonical link element defined. We also suggest looking for the reason behind the double definition, as this problem usually can be traced back to third party code (plugins, extensions and add-ons of the CMS).

If canonical definitions are found twice on a document, this often occurs due to usage of multiple SEO plugins or a SEO plugin in combination with manual canonical definitions.

<link rel="canonical"> found twice and differs

Description

More than one canonical element has been found, either as a <link> tag with rel="canonical" or as a corresponding Link header. Additionally, they specify different targets.

This report allows you to identify all occurrences of double canonical definitions with conflicting target URLs.

Examples

HTML Head

<link rel="canonical" href="http://example.com/">
<link rel="canonical" href="http://example.com/page1.html">

HTTP Header

Link: <http://example.com/>; rel="canonical"
Link: <http://example.com/page1.html>; rel="canonical"

HTML Head & HTTP Header

<link rel="canonical" href="http://example.com/">

Link: <http://example.com/page1.html>; rel="canonical"
Importance

Having more than one canonical link element with different target URLs in a document can cause search engines to ignore the canonical definitions. This might lead to issues with duplicate content.

Operating Instruction

We suggest correcting all conflicting canonical definitions by removing the unnecessary definition.

<link> found outside <head>

Description

A <link> tag was placed outside of the <head> section, where it may have no effects. Discover all HTML documents on the crawled website, that contain link tags outside the HTML head area.

Examples

What we discover

<html>
<head>
...
</head>
<body>
...
<link rel="stylesheet" type="text/css" href="style.css">
...
</body>
</html>

How it should be

<html>
<head>
...
<link rel="stylesheet" type="text/css" href="style.css">
...
</head>
<body>
...
</body>
</html>
Importance

Placing the link tag outside the HTML <head> is not valid. This may lead to unexpected behaviour or appearance of the website with some clients. Even though modern browsers are using a range of methods to autocorrect this type of common issue, it is not suggested to rely on the browser's ability to guess what the webmaster intended to achieve.

If the link tag is used to reference an external style sheet file, browsers use the information from the linked CSS file to render the site. If style sheets have to be processed in the middle or at the end of the document, the browser will have to re-render the entire document based on the given changes. This can lead to a drop in performance and user experience.

If a canonical link element (<link rel="canonical" href="http://www.example.org" />) is used outside of the HTML <head>, it will be ignored by search engines. This might lead to issues with duplicate content, e.g. unexpected behaviour of the website in search results and ranking problems.

Operating Instruction

We suggest moving misplaced link tags to the <head>.

<meta description> missing or empty

Description

If the meta-description is missing or empty, the URL is flagged with this hint. Use this report to identify all URLs that are missing a proper meta description.

Example
<html>
<head>
...
<meta name="description" content="">
...
Importance

The meta description is usually the first choice for the description text in search result snippets. If the meta description is missing, you give up control over the appearance of your documents in the search results. Search engines will then use parts of the document's content as a description, which might lead to an unexpected appearance of a site's snippets in search results.

Operating Instruction

We suggest using proper meta descriptions for all documents that are supposed to be indexed by search engines.

<meta description> occurs more than once

Description

If a meta description tag is found more than once in the HTML, the URL is flagged with this hint.

Example
<html>
<head>
...
<meta name="description" content="First meta description">
<meta name="description" content="Second meta description">
...
Importance

Having more than one meta description can lead to an unexpected appearance of the URL in search results. This may result in lower user engagement and therefore a drop in user signals for your site, which may eventually hurt the rankings of the website.

Operating Instruction

There should be only one meta-description for a URL.

Error scenarios like this usually appear due to different software automatically adding meta descriptions. If this issue occurs on a large scale, check if there is a script or CMS plugin automatically adding meta descriptions to your documents.

<meta> found outside <head>

Description

A <meta> tag was placed outside of the <head> section, where it may have no effects. Discover all HTML documents on the crawled website, that contain meta tags outside the HTML head area.

Examples

What we discover

<html>
<head>
...
</head>
<body>
...
<meta name="description" content="foo">
...
</body>
</html>

How it should be

<html>
<head>
...
<meta name="description" content="foo">
...
</head>
<body>
...
</body>
</html>
Importance

Placing the meta tag outside the HTML <head> is not valid unless

  • HTML5 is used and
  • an itemprop attribute is used.

So using a meta tag outside of the head area is only viable, if it is used to specify structured data properties. Alternatively, the itemprop attribute can be defined in other tags as well, e.g. <span>, <p>, <img>, which would offer backwards compatibility to HTML versions below HTML5.

If meta tags are not used for structured data, i.e. no itemprop attribute, they are required to be in the <head> to be considered valid. If they are placed outside of the <head> they might just get ignored by search engines.

Operating Instruction

If you find meta tags that need to be in the <head>, you should move them there. If the discovered meta tags are used for structured data, we suggest assigning the itemprop attributes to other elements in the markup. For HTML versions below HTML5 this is a requirement for valid code.
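
The rule described above (meta tags outside the <head> are only acceptable when they carry an itemprop attribute) can be sketched with Python's standard `html.parser` module. The class name `MetaOutsideHead` and the helper `misplaced_meta` are assumptions for this illustration. The sketch relies on explicit <head> and </head> tags and does not infer missing ones the way a full HTML5 parser would.

```python
from html.parser import HTMLParser


class MetaOutsideHead(HTMLParser):
    """Collect <meta> tags found outside <head> that have no itemprop."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.misplaced = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and not self.in_head:
            attributes = dict(attrs)
            # Structured-data meta tags (itemprop) are tolerated in the body.
            if "itemprop" not in attributes:
                self.misplaced.append(attributes)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def misplaced_meta(html):
    parser = MetaOutsideHead()
    parser.feed(html)
    parser.close()
    return parser.misplaced


page = (
    '<html><head><meta name="robots" content="index"></head><body>'
    '<meta name="description" content="foo">'
    '<meta itemprop="price" content="9">'
    "</body></html>"
)
print(misplaced_meta(page))
```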

<title> found outside <head>

Description

A <title> tag was placed outside of the <head> section, where it may have no effect. Use this report to identify all occurrences of misplaced HTML <title> tags.

Example
<html>
<head>
...
</head>
<body>
...
<title>Title of the Site</title>
...
Importance

The <title> tag is a very important element for search engine optimization and should always be set. If the title tag is placed outside the HTML <head>, it may be ignored by search engines. This may lead to issues with the search snippet and the site's ranking in search results.

Operating Instruction

If this hint shows up in your crawl report, you should move all misplaced title tags into the HTML <head> section.

<title> missing or empty

Description

If the <title> tag is missing or empty, the URL is flagged with this hint. Use this report to identify all cases of missing title tags on the crawled website.

Example

Empty Title

<html>
<head>
...
<title></title>
...
Importance

The <title> tag is a very important element for search engine optimization and should always be set. The document's title is the primary source for the title of the snippet in search results. If the <title> tag is missing or empty, one of the most important ranking factors is left out. This will very likely harm the ranking in search results and can also harm the click-through rate from search results for the given URLs.

Operating Instruction

If this hint shows up in your crawl report, you should add title tags to all found URLs.

<title> occurs more than once

Description

If the <title> tag is found more than once in the source code, the URL is flagged with this hint. There should be only one title for a document.

Example
<html>
<head>
...
<title>Welcome to Example.com</title>
<title>Example.com - Best Examples in the Internet</title>
...
Importance

The HTML <title> tag is an important ranking factor, as it is supposed to describe the content of the document. Having more than one HTML <title> tag can therefore lead to an unexpected appearance of the URL in search results. Additionally, it might harm the rankings of the document.

Operating Instruction

If this hint shows up in your crawl report, you might want to make sure you only use one title tag on all found URLs.

Error scenarios like this usually appear due to different software automatically adding HTML <title> tags. If this issue occurs on a large scale, check if there is a script or CMS plugin adding HTML <title> tags to your documents.

Charset: Invalid charset in Content-Type HTTP header

Description

The Content-Type HTTP header specifies an invalid charset. Discover all occurrences of invalid charset definitions in Content-Type HTTP headers on the crawled website.

Examples
HTTP/1.1 200 OK
Server: Apache
Date: Thu, 17 Dec 2015 15:34:23 GMT
Content-Type: text/html; charset=foo-bar
...
Importance

If there is no valid charset defined in the HTTP header, the browser has to use the charset specified in the document or has to fall back to detect the charset to display the document. If the charset has to be guessed, this may lead to problems handling the encoding of the document. Additionally, this may slow down the rendering time for the document.

Operating Instruction

We suggest setting a proper charset in the HTTP header and in the document to make it easy for web clients to render the document quickly and as expected. Make sure the defined charsets are identical and do not conflict.
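
The following Python sketch extracts the charset from a Content-Type header value and checks whether it is a known encoding name. It uses Python's own codec registry as a stand-in for "valid", which is an assumption: browsers follow their own list of encoding labels, so the results can differ in edge cases. The function names `header_charset` and `charset_is_known` are made up for this illustration.

```python
import codecs
from email.message import Message


def header_charset(content_type):
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    message = Message()
    message["Content-Type"] = content_type
    return message.get_content_charset()


def charset_is_known(content_type):
    """True if the header names a charset that Python's codec registry knows."""
    charset = header_charset(content_type)
    if charset is None:
        return False
    try:
        codecs.lookup(charset)
    except LookupError:
        return False
    return True


print(charset_is_known("text/html; charset=utf-8"))
print(charset_is_known("text/html; charset=foo-bar"))
```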

Content-Security-Policy HTTP Header missing

Linking: Follow link to a so far no-follow URL

Description

A follow link was found, linking to a URL that was previously linked "nofollow" only.

The linking URL will be flagged with this hint and the target URL will be flagged as "No-Follow linking revoked later on".

This report helps to identify inconsistency in the usage of rel=nofollow.

Example
Linking URL | Target URL | Link Relation
http://example.com/page.html | http://example.com/target.html | nofollow
http://example.com/page2.html | http://example.com/target.html | follow
Importance

A single follow link will allow the target URL to be crawled, even though it was previously forbidden by nofollow links.

Inconsistency in usage of rel=nofollow can lead to unexpected behaviour depending on your situation.

  • If the target URL is supposed to be recognized by search engines, nofollow linking will weaken the URL. Removing nofollow from internal links can lead to an uplift in ranking.
  • If the target URL is not supposed to be recognized by search engines, a single follow link will allow the target URL to be crawled.

Note: In a document that is using the robots directive "nofollow" a link with "rel=follow" is identified as "follow".

Operating Instruction

If you encounter this hint when crawling your website, we suggest to:

  • Evaluate if rel=nofollow is needed
  • Make sure to use follow or nofollow consistently
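
If you have a list of internal links, for example from a crawl export, a consistency check can be sketched as follows. The tuple layout (linking URL, target URL, relation) mirrors the example table above. The function name `inconsistent_targets` and the input format are assumptions for this illustration, not an Audisto export format.

```python
from collections import defaultdict


def inconsistent_targets(links):
    """Return target URLs that are linked both as follow and as nofollow."""
    relations = defaultdict(set)
    for _linking_url, target_url, relation in links:
        relations[target_url].add(relation)
    return sorted(url for url, seen in relations.items() if len(seen) > 1)


links = [
    ("http://example.com/page.html", "http://example.com/target.html", "nofollow"),
    ("http://example.com/page2.html", "http://example.com/target.html", "follow"),
    ("http://example.com/page.html", "http://example.com/other.html", "follow"),
]
print(inconsistent_targets(links))
```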

Linking: Nofollow link to a follow URL

Description

A link with rel="nofollow" was found, linking to a URL that was previously linked "follow" already.

This report helps to identify inconsistency in the usage of rel=nofollow.

Example
Linking URL | Target URL | Link Relation
http://example.com/page.html | http://example.com/target.html | follow
http://example.com/page2.html | http://example.com/target.html | nofollow
Importance

Inconsistency in usage of rel=nofollow can lead to unexpected behaviour depending on your situation.

  • If the target URL is supposed to be recognized by search engines, nofollow linking will weaken the URL. Removing nofollow from internal links can lead to an uplift in ranking.
  • If the target URL is not supposed to be recognized by search engines, a single follow link will allow the target URL to be crawled.
Operating Instruction

If you encounter this hint when crawling your website, we suggest to:

  • Evaluate if rel=nofollow is needed
  • Make sure to use follow or nofollow consistently

Linking: Nofollow linking revoked later on

Description

A URL that was first linked as nofollow has later been linked as follow, either on the same level or on a deeper one. By removing the initial nofollow directive, this URL may be lifted up some levels.

This report helps to identify inconsistency in the usage of rel=nofollow.

Example
Linking URL | Target URL | Link Relation
http://example.com/page.html | http://example.com/target.html | nofollow
http://example.com/page2.html | http://example.com/target.html | follow
Importance

Inconsistency in usage of rel=nofollow can lead to unexpected behaviour depending on your situation.

  • If the target URL is supposed to be recognized by search engines, nofollow linking will weaken the URL. Removing nofollow from internal links can lead to an uplift in ranking.
  • If the target URL is not supposed to be recognized by search engines, a single follow link will allow the target URL to be crawled.
Operating Instruction

If you encounter this hint when crawling your website, we suggest to:

  • Evaluate if rel=nofollow is needed
  • Make sure to use follow or nofollow consistently

Redirect Loops: Redirect loop starts here

Description

This URL is the first element in a redirect loop. A redirect loop is a chain of redirects that ultimately redirects to this URL again.

This hints report shows all redirect loops that could be identified with the crawl.

Example
Level | URL | Redirect Target
1st | http://example.com/page.html | http://example.com/page2.html
2nd | http://example.com/page2.html | http://example.com/page.html
Importance

Redirect loops cause issues with user experience by making a URL unusable. Redirect loops can also waste crawl budget.

Operating Instruction

We suggest identifying all redirect loops on your site with this report and fixing them. One solution might be to define a target URL that delivers a status code 200. You might also consider removing internal links to the redirected URL.
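
Given a mapping from each redirecting URL to its redirect target, a loop that starts at a given URL can be detected by following the chain until it returns to the start. The sketch below assumes such a mapping is available, for example from server configuration or a crawl export. The function name `find_redirect_loop` is an assumption for this illustration.

```python
def find_redirect_loop(start, redirects):
    """Return the chain of URLs if the redirects lead back to start, else None."""
    chain = [start]
    seen = {start}
    url = start
    while url in redirects:
        url = redirects[url]
        if url == start:
            return chain
        if url in seen:
            # There is a loop further down the chain, but it does not start here.
            return None
        seen.add(url)
        chain.append(url)
    return None


redirects = {
    "http://example.com/page.html": "http://example.com/page2.html",
    "http://example.com/page2.html": "http://example.com/page.html",
}
print(find_redirect_loop("http://example.com/page.html", redirects))
```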

Robots: Specified more than once

Description

Robots directives for a single URL were specified more than once. Use this report to identify all instances of multiple robots definitions.

Examples

Robots meta tag in HTML <head>

<meta name="robots" content="index">
<meta name="robots" content="follow">

Robots directives in X-Robots-Tag and meta tag

HTTP/1.1 200 OK
Date: Fri, 16 Oct 2015 10:01:33 GMT
X-Robots-Tag: index, follow
...

<meta name="robots" content="index, follow">
Importance

More than one instance of robots directives can lead to conflicting definitions or to a directive being left out. Depending on the situation, this may result in a range of issues with privacy, indexing in general and crawl budget.

Operating Instruction

Use only one way to specify the robots directive.

Robots: nofollow differs across specifications

Description

There is more than one source for robots directives, either a robots meta tag or an X-Robots-Tag header, and at least one specifies "nofollow" while another does not.

Examples

Robots meta tag in HTML <head>

<meta name="robots" content="index, nofollow">
<meta name="robots" content="index, follow">

Note: a more subtle way to produce this error is to create conflicting definitions by omitting parts of the directive, as in:

<meta name="robots" content="index, nofollow">
<meta name="robots" content="index">

Robots directives in X-Robots-Tag and meta tag differ

HTTP/1.1 200 OK
Date: Fri, 16 Oct 2015 10:01:33 GMT
X-Robots-Tag: index, nofollow
...

<meta name="robots" content="index, follow">
Importance

The "nofollow" robots directive tells crawlers not to follow the links in a document. This can be used on purpose to prevent search engines from crawling the linked URLs.

Conflicting definitions are inconclusive. Search engines will usually use the most restrictive directive they find. The Audisto Crawler adopts this behaviour.

Operating Instruction

Use only one way to specify the robots nofollow directive.

Robots: noindex differs across specifications

Description

There is more than one source for robots directives, either a robots meta tag or an X-Robots-Tag header. At least one specifies "noindex" while another does not.

Examples

Differing robots directives across specifications could look like this:

Robots Meta tag in HTML header

<meta name="robots" content="index, nofollow">
<meta name="robots" content="noindex, nofollow">

Note: a more subtle way to produce this error is to create conflicting definitions by omitting parts of the directive, as in:

<meta name="robots" content="noindex, follow">
<meta name="robots" content="follow">

Robots directives in X-Robots-Tag and meta tag differ

HTTP/1.1 200 OK
Date: Fri, 16 Oct 2015 10:01:33 GMT
X-Robots-Tag: index, follow
...

<meta name="robots" content="noindex, follow">
Importance

The "noindex" robots directive tells crawlers not to index the current document.

Conflicting definitions are inconclusive. Search engines will usually use the most restrictive directive they find. The Audisto Crawler adopts this behaviour.

Operating Instruction

Use only one way to specify the robots noindex directive.
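
The "most restrictive directive wins" rule mentioned above can be sketched in Python as follows. Each source is the content of one robots meta tag or one X-Robots-Tag header. The function names `merge_robots` and `conflicting_directives` are assumptions for this illustration, and the sketch only looks at noindex and nofollow (it ignores other values such as "none").

```python
def _tokens(source):
    """Split a robots value such as 'index, nofollow' into lower-case tokens."""
    return {token.strip().lower() for token in source.split(",") if token.strip()}


def merge_robots(*sources):
    """Combine several robots sources; any noindex or nofollow wins."""
    combined = set()
    for source in sources:
        combined |= _tokens(source)
    return {"index": "noindex" not in combined, "follow": "nofollow" not in combined}


def conflicting_directives(*sources):
    """List the restrictive directives that some sources set and others do not."""
    parsed = [_tokens(source) for source in sources]
    conflicts = []
    for directive in ("noindex", "nofollow"):
        if len({directive in tokens for tokens in parsed}) > 1:
            conflicts.append(directive)
    return conflicts


print(merge_robots("index, follow", "noindex, follow"))
print(conflicting_directives("noindex, follow", "follow"))
```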

Safe HTTPS URL loads unsafe resource

Description

If an HTTPS URL contains an unsafe resource that is loaded using HTTP, it is flagged with this hint.

Example

Example code for https://www.example.com/

<script type="text/javascript" src="http://www.example.com/file.js"></script>
Importance

All files that get loaded while opening a document over HTTPS, e.g. images, fonts, stylesheets and JavaScript files, should be requested over the HTTPS protocol as well. If elements are loaded using an unsafe HTTP connection, they might get compromised by a man-in-the-middle attack while being loaded. This can compromise the security of the SSL-secured request.

If this happens, the increased risk will be reflected in the SSL symbol in all modern browsers. Instead of displaying a green SSL lock, it would be yellow, orange or red to highlight loading of unsafe resources.

Operating Instruction

In documents only available over HTTPS, you should only include files loaded via the HTTPS protocol.
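
To look for such references in a page's source, you can scan the tags that load resources for URLs that start with http://. The Python sketch below uses the standard `html.parser` module. The class name `InsecureResources`, the helper `insecure_resources` and the selection of tags in `RESOURCE_ATTRS` are assumptions for this illustration; a real check would cover more cases, such as CSS url() references and srcset.

```python
from html.parser import HTMLParser

# Assumption: these tag/attribute pairs are the resource references to check.
RESOURCE_ATTRS = {"script": "src", "img": "src", "iframe": "src", "link": "href"}


class InsecureResources(HTMLParser):
    """Collect resources that an HTTPS page would load over plain HTTP."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        attribute = RESOURCE_ATTRS.get(tag)
        if attribute is None:
            return
        attributes = dict(attrs)
        # Only stylesheet links load a resource; canonical links etc. do not.
        if tag == "link" and "stylesheet" not in (attributes.get("rel") or "").lower():
            return
        value = attributes.get(attribute) or ""
        if value.lower().startswith("http://"):
            self.insecure.append((tag, value))


def insecure_resources(html):
    parser = InsecureResources()
    parser.feed(html)
    parser.close()
    return parser.insecure


print(insecure_resources(
    '<script type="text/javascript" src="http://www.example.com/file.js"></script>'
))
```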

URL has // in path

Description

If the URL contains two consecutive slashes, it is flagged with this hint.

Example
http://example.com//page.html
http://example.com/directory//page.html
Importance

Two consecutive slashes are valid but usually not wanted in a URL. Any occurrence might indicate issues with relative linking and/or the URL base. This may lead to issues with duplicate content if the CMS delivers the same content for both variants, e.g. for http://example.com//page.html and http://example.com/page.html.

Operating Instruction

We suggest not using consecutive slashes. Analyze all occurrences of consecutive slashes and fix the reason why they occur.
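
When checking a list of URLs for this problem, only the path should be inspected, so that the "//" after the scheme or a URL passed in a query string does not cause false positives. A minimal Python sketch, with the function name `has_double_slash_in_path` being an assumption for this illustration:

```python
from urllib.parse import urlsplit


def has_double_slash_in_path(url):
    """True if the path component of url contains two consecutive slashes."""
    return "//" in urlsplit(url).path


for url in [
    "http://example.com//page.html",
    "http://example.com/directory//page.html",
    "http://example.com/page.html",
    "http://example.com/?next=http://example.com/",
]:
    print(url, "->", has_double_slash_in_path(url))
```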

URL too long for some browsers

Description

If a URL longer than 2000 characters is encountered, it is flagged with this hint.

Example

Long URLs are often generated dynamically in scenarios like:

  • a form posts data from input fields or a textarea via GET-method to the form action URL
  • GET-parameters from complex filter combinations in faceted search
Importance

Long URLs might cause problems: some browsers are unable to handle URLs of this length, and some web applications might fail to resolve them or might shorten them automatically, causing issues with access to these URLs.

Operating Instruction

While there is theoretically no limit on the length of a URL, you should stay below 2000 characters so that your URLs remain accessible to a large number of clients and web applications.