Audisto Technical Error Checker

How to do a short technical audit

The "Errors" hints group lists all hints from the other hints sections that are considered errors, as opposed to merely unexpected behaviour. If not fixed, these errors can harm a site's function, user experience and / or SEO.

This hints section helps to do a quick technical audit on a website and lists different kinds of clear errors that should be fixed.

Example: Audisto Technical Error Check with the technical hint reports for the current crawl


Here is the list of all specific hints that are part of the “errors” hints section and that can be identified with the help of the Audisto Crawler.


Hints

<a> has malformed href

Description

If a malformed href attribute value is found, the URL is flagged with this hint. A malformed href is usually a URI that is not valid according to RFC3986, or a result of a parsing error due to invalid HTML.

Examples
<a href="htp://www.example.com">link</a>
<a href="htps://www.example..com">link</a>
<a href="http://www..example.com">link</a>
<a href="http://">link</a>
<a href="htps:// www.example.com">link</a>
<a href="://www.example.com">link</a>
Importance

A link with a malformed href cannot be parsed and will therefore not be recognized by search engines. In addition, links like this can cause issues with user experience.

Operating Instruction

Fix all malformed href attribute values on your website.
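
The kind of check described in this hint can be approximated with a few lines of Python. This is only a rough sketch and not the Audisto Crawler's actual logic: the function name `looks_malformed`, the restriction to http/https and the host rules are assumptions made for illustration.

```python
import re
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumption: only web links are judged
# Matches hrefs that look absolute: "scheme://", "://" or "//".
ABSOLUTE_LIKE = re.compile(r"([A-Za-z][A-Za-z0-9+.-]*)?:?//")


def looks_malformed(href: str) -> bool:
    """Heuristic: True if an absolute-looking href seems broken."""
    href = href.strip()
    if any(ch.isspace() for ch in href):
        return True  # whitespace inside a URL is not valid
    if not ABSOLUTE_LIKE.match(href):
        return False  # relative paths, fragments, mailto: etc. are not judged
    try:
        parts = urlsplit(href)
    except ValueError:
        return True
    if not href.startswith("//") and parts.scheme.lower() not in ALLOWED_SCHEMES:
        return True  # e.g. "htp", "htps", or no scheme at all
    host = parts.hostname or ""
    # An empty host or an empty label ("www..example.com") is treated as broken.
    return host == "" or ".." in host
```

Applied to the example links above, every one of them is reported as malformed, while a plain `http://www.example.com/page.html` or a relative path is not.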

<a> links to fragment by name attribute on non-anchor

Description

The anchor contains a fragment link, but the target is defined by a name attribute and is not an anchor itself.

Example
<h1 name="top">Headline</h1>
...
<a href="#top">Go to top</a>
Importance

Using the name attribute on non-anchor tags is not valid. Using the name attribute is only allowed for anchor tags. In addition, the name attribute has been deprecated since XHTML 1.0 and should not be used any more.

Operating Instruction

We suggest removing all instances of the name attribute. You should use the id attribute instead.

<a> links to fragment only, but <base> points to another URL

Description

An <a> element links to a fragment only, while there is a <base> pointing to another URL. Discover all URLs that contain fragment links along with a base tag pointing to another URL.

Example

Example for http://example.com/page.html:

<base href="http://example.com/page2.html">
...
<a href="#top">link</a>

Expected behaviour: Browser requests http://example.com/page.html#top

Actual behaviour: Browser requests http://example.com/page2.html#top

Importance

Fragment links are relative to the URL defined in the <base> element. If the <base> element is pointing to another URL, this may lead to unexpected user experience and issues with the crawlability of the website if fragment-only links are used.
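
The resolution rule can be reproduced with Python's standard `urljoin`. The helper name `resolve_link` is hypothetical; the sketch only illustrates how a fragment-only link ends up on the URL given in the <base> element.

```python
from urllib.parse import urljoin


def resolve_link(document_url, href, base_href=None):
    """Resolve href the way a client would, honouring an optional <base href>."""
    # The <base> href is itself resolved against the document URL first.
    base = urljoin(document_url, base_href) if base_href else document_url
    return urljoin(base, href)


# Without <base>, the fragment stays on the current page.
print(resolve_link("http://example.com/page.html", "#top"))
# With <base>, the same link points at the other document.
print(resolve_link("http://example.com/page.html", "#top",
                   base_href="http://example.com/page2.html"))
```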

Operating Instruction

We suggest that you not use a <base> element if it is possible to avoid it. We also suggest using absolute links instead of fragment-only links.

<base> contains malformed or empty href

Description

A <base> tag was found, but its href attribute contains an invalid URL, or a URL that is neither HTTP nor HTTPS. The crawler falls back to using the document's URL as the base.

Examples

A base with an invalid protocol:

<base href="htp://example.com">

A base with a white space in the domain name:

<base href="http:// example.com">
Importance

The base tag defines the URL base for all relative links in the document. Using a malformed URL as base href can cause issues with crawling and accessing of relative links.

Using the base tag adds more complexity when parsing relative links. Poorly programmed crawlers might not understand the base tag at all and therefore show unexpected behaviour.

Operating Instruction

We suggest that you not use the HTML base tag at all. Remove it if possible.

Note: If there are changes related to the base tag, all relative links in the document need to be checked and probably corrected.

<base> found more than once and differs

Description

More than one <base> tag was found, and their href attribute values differ.

Examples
<base href="http://example.com/">
<base href="http://example.com/folder/">

Note: The following base directives resolve to the same URL and would therefore not trigger this hint:

Base directives on http://example.com/page.html

<base href="http://example.com/">
<base href="/">
Importance

The base tag defines the URL base for all relative links in a document. Having more than one base tag is invalid. This may result in issues with relative links that might impact search engines and the user experience on the website.

Operating Instruction

We suggest that you not use the HTML base tag at all. Remove it if possible. If the base tag is removed, all relative links in the document need to be checked and probably corrected.
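
Whether two base directives "differ" depends on what they resolve to, as the note in the examples shows. A minimal sketch of that comparison, assuming each href is resolved against the document URL (the function name `distinct_bases` is made up for this illustration):

```python
from urllib.parse import urljoin


def distinct_bases(document_url, base_hrefs):
    """Return the set of absolute URLs that the given <base> hrefs resolve to."""
    return {urljoin(document_url, href) for href in base_hrefs}


doc = "http://example.com/page.html"
# Both directives resolve to http://example.com/, so they do not differ.
same = distinct_bases(doc, ["http://example.com/", "/"])
# These resolve to two different URLs and would trigger the hint.
different = distinct_bases(doc, ["http://example.com/", "http://example.com/folder/"])
print(len(same), len(different))
```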

<base> href contains a path only

Description

The <base> tag's href attribute contains a path, not an absolute URL. While this is technically allowed, it is not supported by Internet Explorer as of version 8.

Examples

Relative path by mistake:

<base href="example.com/">

Relative path on purpose:

<base href="/folder/">
Importance

The base tag defines the URL base for all relative links in the document. Mistakes in usage of the base tag might lead to issues with crawling when using relative links in the document. They will also result in issues with Internet Explorer as of version 8.

Operating Instruction

We suggest that you not use the HTML base tag at all. Remove it if possible.

Note: If you make changes to the base tag, all relative links in the document need to be checked and probably corrected.

<base> occurs more than once

Description

More than one <base> tag was found. The Audisto Crawler uses the first valid annotation found for link resolving. Use this report to find all URLs on the crawled website that contain more than one <base> tag.

Example
<head>
...
<base href="http://example.com/">
<base href="http://example.com/">
...
</head>
Importance

The base tag defines the URL base for all relative links in the document. Having more than one base tag is invalid. This may result in issues with relative links that might impact search engines and user experience on the website.

Operating Instruction

We suggest that you not use the HTML base tag at all. Remove it if possible. If there are changes regarding the base tag, all relative links in the document need to be checked and probably corrected.

<h1> occurs more than once

Description

If more than one <h1> tag is found, the URL is flagged with this hint. Discover all URLs that contain more than one <h1> tag.

Examples
<h1><img src="logo.jpg" alt="Example.com"/></h1>
...
<h1>Primary Headline</h1>
Importance

The h1 is the most important heading in the document and should reflect the topic of the document. Having more than one <h1> tag is a sign of poor content structure. Content structure partly determines the content quality. While it is not a huge factor for most search engines, having more than one <h1> tag may be a negative signal in terms of content quality.

Operating Instruction

You might want to use only one <h1> tag per document.

<html> contains too many uncommon non-printable characters

Description

The HTML document contains too many uncommon non-printable characters, and not all will be shown in live analysis. With this report you can discover all URLs on the crawled website that contain more than 50 uncommon non-printable characters. See the corresponding hint "<html> contains uncommon non-printable characters" for further information about what is "uncommon".

Importance

Non-printable characters are used as control characters and may not be visible in the source code, but nonetheless impact the behavior of the site. This might affect crawling and the user experience when they are inside of an anchor's href or an image's src attribute, possibly resulting in issues with the site's structure and ranking.

Finding too many non-printable characters may be an indication of massive encoding issues in a document, or of documents that are not HTML documents.

Operating Instruction

Non-printable characters generally should be encoded as HTML entities and removed whenever possible. If validating transferred data in an application, the validation should check for non-printable characters and probably remove them.

<html> contains uncommon non-printable characters

Description

If uncommon non-printable characters are detected, the URL of the document containing the character is flagged.

There are non-printable characters that will appear in almost every document, e.g. line feed (\n), carriage return (\r), horizontal tab (\t). In addition, there are commonly used non-printable characters, e.g. BOM, soft hyphen, left-to-right mark and right-to-left mark. These characters do not cause the URL to get flagged with this hint. This hint detects all other non-printable characters.
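
A rough approximation of this distinction in Python is shown here. It assumes that "non-printable" means the Unicode categories Cc (control) and Cf (format); the crawler's exact definition is not stated in this document, so treat the category choice and the name `uncommon_non_printable` as assumptions.

```python
import unicodedata

# Characters the text above lists as common: LF, CR, TAB, BOM,
# soft hyphen, left-to-right mark, right-to-left mark.
COMMON = {"\n", "\r", "\t", "\ufeff", "\u00ad", "\u200e", "\u200f"}


def uncommon_non_printable(text):
    """List control/format characters that are not in the common set."""
    return [
        ch for ch in text
        if unicodedata.category(ch) in ("Cc", "Cf") and ch not in COMMON
    ]


sample = "a\x00b\u200bc\n\t\u00ad"
found = uncommon_non_printable(sample)
# Show code points instead of the invisible characters themselves.
print([f"U+{ord(ch):04X}" for ch in found])
```

Comparing `len(found)` against a threshold of 50 would mirror the related "too many" hint.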

Examples

Due to the non-printable nature of these characters, you'll find the character codes instead of the actual characters in the live analysis, enclosed by brackets:

[[&#xEFBBBF;]]
Importance

Non-printable characters may not be visible in the source code, but nonetheless impact:

  • the behaviour of the site, e.g. when they are inside of an anchor's href or an image's src attribute
  • the ranking of the site, e.g. when they are an invisible part of a word

This might affect crawling and user experience, possibly resulting in issues with accessibility and ranking.

Usually this hint is triggered by problematic encoding.

Operating Instruction

Non-printable characters should generally be encoded as HTML entities and removed whenever possible. If validating transferred data in an application, the validation should check for non-printable characters and probably remove them.

<html> starts with BOM

Description

There is a Unicode byte order mark (BOM) at the top of the HTML. Discover all URLs on the crawled website that contain a BOM.

We currently detect BOMs in the following encodings:

  • UTF-8
  • UTF-16 BE/LE
  • UTF-32 BE/LE
  • UTF-7
  • UTF-1
  • UTF-EBCDIC
  • SCSU
  • BOCU-1
  • GB-18030
Examples

Example UTF-8 BOM in HTML 5:

EF BB BF<!DOCTYPE html>
<html lang="en">

How BOM looks in different encoding and representations:

Encoding | BOM hex | BOM dec
UTF-8 | EF BB BF | 239 187 191
UTF-16 (BE) | FE FF | 254 255
UTF-16 (LE) | FF FE | 255 254
UTF-32 (BE) | 00 00 FE FF | 0 0 254 255
UTF-32 (LE) | FF FE 00 00 | 255 254 0 0
UTF-7 | 2B 2F 76 38 | 43 47 118 56
UTF-7 | 2B 2F 76 39 | 43 47 118 57
UTF-7 | 2B 2F 76 2B | 43 47 118 43
UTF-7 | 2B 2F 76 2F | 43 47 118 47
UTF-7 | 2B 2F 76 38 2D | 43 47 118 56 45
UTF-1 | F7 64 4C | 247 100 76
UTF-EBCDIC | DD 73 66 73 | 221 115 102 115
SCSU | 0E FE FF | 14 254 255
BOCU-1 | FB EE 28 | 251 238 40
GB-18030 | 84 31 95 33 | 132 49 149 51
Importance

The Unicode BOM is the Unicode character U+FEFF. Some text editors add it to documents. The BOM is used to signal:

  • the byte order, or endianness
  • the fact that the text is Unicode
  • the specific Unicode encoding

Having a Unicode BOM at the top of the HTML is valid but might result in problems with third-party software. As of HTML5, a BOM is supposed to override the charset definition from the HTTP header. If the BOM is used for charsets that are not Unicode, this might lead to encoding problems. Encoding problems may lead to issues with the appearance of the site in browsers and search engines and therefore lead to issues with the user experience.

Operating Instruction

You should consider removing the BOM and specifying the encoding in the HTTP header or as a meta tag in the HTML <head>.
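
To check your own files for a BOM before publishing, a simple prefix comparison against the byte sequences from the table is usually enough. The following is a sketch, not the crawler's implementation; `detect_bom` is a hypothetical helper, and longer signatures are tested first so that UTF-32 (LE) is not mistaken for UTF-16 (LE).

```python
# Signatures as hex bytes, taken from the BOM table in this section.
BOM_SIGNATURES = {
    "EF BB BF": "UTF-8",
    "FE FF": "UTF-16 (BE)",
    "FF FE": "UTF-16 (LE)",
    "00 00 FE FF": "UTF-32 (BE)",
    "FF FE 00 00": "UTF-32 (LE)",
    "2B 2F 76 38": "UTF-7",
    "2B 2F 76 39": "UTF-7",
    "2B 2F 76 2B": "UTF-7",
    "2B 2F 76 2F": "UTF-7",
    "2B 2F 76 38 2D": "UTF-7",
    "F7 64 4C": "UTF-1",
    "DD 73 66 73": "UTF-EBCDIC",
    "0E FE FF": "SCSU",
    "FB EE 28": "BOCU-1",
    "84 31 95 33": "GB-18030",
}

# Longest signatures first, so a short one never shadows a longer one.
_ORDERED = sorted(
    ((bytes.fromhex(hex_str), name) for hex_str, name in BOM_SIGNATURES.items()),
    key=lambda item: len(item[0]),
    reverse=True,
)


def detect_bom(data: bytes):
    """Return the encoding name of a leading BOM, or None if there is none."""
    for signature, name in _ORDERED:
        if data.startswith(signature):
            return name
    return None


print(detect_bom(bytes.fromhex("EF BB BF") + b"<!DOCTYPE html>"))
```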

<img> has no alt attribute

Description

If an image without an alt attribute is found, the URL is flagged with this hint. This report helps to identify all missing alt attributes.

Example
<img src="file.jpg"/>
Importance

In terms of HTML validation, alt attributes are required for images. The alt attribute defines the alternative information that will be shown if the image file fails to load.

The alt attribute is one of the factors that search engines use to determine the topic of the image. It also provides an alternative for blind users. By supplying a proper alt attribute, you not only help search engines understand your site, but also enhance accessibility for disabled users.

Operating Instruction

Add alt attributes in all cases where they are missing.

Use a descriptive alt-attribute for images that contain information. You may use an empty alt-attribute if the images are only for decoration.
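
If you want to check a template locally, images without an alt attribute can be found with Python's built-in HTML parser. This is a minimal sketch with hypothetical names (`MissingAltFinder`, `images_without_alt`); note that an empty `alt=""` counts as present, matching the advice for decorative images.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collects the src of every <img> that has no alt attribute at all."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        attributes = dict(attrs)
        if tag == "img" and "alt" not in attributes:
            self.missing.append(attributes.get("src"))


def images_without_alt(html):
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


print(images_without_alt('<img src="file.jpg"/><img src="deco.png" alt="">'))
```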

<link rel=canonical> URL is not absolute

Description

If the canonical element specifies a URL relative to the document's URL, the document's URL is flagged with this hint.

This report shows all occurrences of canonical usage with URLs that are not absolute.

Examples

Absolute URL:

<link rel="canonical" href="http://example.com/folder/page.html">

Short URL:

<link rel="canonical" href="page.html">

Short URL - root folder relative:

<link rel="canonical" href="/folder/page.html">

Short URL - protocol relative:

<link rel="canonical" href="//example.com/folder/page.html">
Importance

Using shortened URLs for canonical links can lead to several kinds of duplicate content issues:

  • duplicate content issues with different protocol versions
  • duplicate content issues with different domains
  • duplicate content issues with different folders
Operating Instruction

We suggest that you use absolute URLs for canonical links.
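
The difference between absolute and shortened canonical URLs can be sketched with Python's URL helpers. The function name `is_absolute` and the sample document URLs are assumptions for illustration only; the point is that a shortened canonical resolves differently depending on where the document is served from.

```python
from urllib.parse import urljoin, urlsplit


def is_absolute(href):
    """True only if the href carries both a scheme and a host."""
    parts = urlsplit(href.strip())
    return bool(parts.scheme and parts.netloc)


for href in ("http://example.com/folder/page.html", "page.html",
             "/folder/page.html", "//example.com/folder/page.html"):
    print(href, is_absolute(href))

# A protocol-relative canonical inherits the protocol of the current document.
print(urljoin("https://www.example.com/folder/page.html",
              "//example.com/folder/page.html"))
# A short canonical inherits the folder of the current document.
print(urljoin("http://example.com/other/page.html?x=1", "page.html"))
```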

<link rel=canonical> contains malformed or empty href

Description

This hint identifies all occurrences of canonical elements that contain an empty or invalid target URL.

Examples

Empty canonical:

<link rel="canonical" href="">

Malformed canonical:

<link rel="canonical" href="htp://example.com/">
Importance

Malformed or empty hrefs in canonical links cause canonical definitions to be invalid and can cause issues with duplicate content when a document is available on more than one URL.

Operating Instruction

We suggest that you check for malformed or empty canonical hrefs on a regular basis.

<link rel=canonical> found outside <head>

Description

A canonical element was placed outside of the <head> section and so search engines will ignore it.

This report helps you to identify all occurrences of canonical definitions that are invalid due to being placed outside the <head> tag on the crawled website.

Example
<html>
  <head>
    ...
  </head>
  <body>
    ...
    <link rel="canonical" href="http://example.com/">
    ...
  </body>
</html>
Importance

Some search engines ignore improper canonical definitions. If canonical definitions get ignored by search engines, this might cause issues with duplicate content and with the representation of the site in search results.

Operating Instruction

Keep your canonical definitions inside the HTML <head> tag, so they don't get ignored by search engines.

<link rel=canonical> found twice

Description

More than one canonical element was found, either as a <link> tag with rel="canonical" or as a corresponding Link header.

This report shows all URLs with double canonical definitions that we were able to identify on your website.

Examples

HTML Head:

<link rel="canonical" href="http://example.com/">
<link rel="canonical" href="http://example.com/">

HTTP Header:

Link: <http://example.com/>; rel="canonical"
Link: <http://example.com/>; rel="canonical"

HTML Head & HTTP Header:

<link rel="canonical" href="http://example.com/">

Link: <http://example.com/>; rel="canonical"
Importance

Using more than one canonical link element can cause conflicting definitions or unexpected behaviour when documents are available on more than one URL at a time.

Operating Instruction

We suggest that you identify all URLs that have more than one canonical link element defined. We also suggest looking for the reason behind the double definition, as this problem usually can be traced back to third-party code (plugins, extensions and add-ons of the CMS).

If canonical definitions are found twice in a document, this often occurs due to usage of multiple SEO plugins or an SEO plugin in combination with manual canonical definitions.

<link rel=canonical> found twice and differs

Description

More than one canonical element has been found, either as a <link> tag with rel="canonical" or as a corresponding Link header. Additionally, they specify different targets.

This report allows you to identify all occurrences of double canonical definitions with conflicting target URLs.

Examples

HTML Head:

<link rel="canonical" href="http://example.com/">
<link rel="canonical" href="http://example.com/page1.html">

HTTP Header:

Link: <http://example.com/>; rel="canonical"
Link: <http://example.com/page1.html>; rel="canonical"

HTML Head & HTTP Header:

<link rel="canonical" href="http://example.com/">

Link: <http://example.com/page1.html>; rel="canonical"
Importance

Having more than one canonical link element with different target URLs in a document can cause search engines to ignore the canonical definitions. This might lead to issues with duplicate content.

Operating Instruction

We suggest that you correct all conflicting canonical definitions by removing the unnecessary definition.

<link> found outside <head>

Description

A <link> tag was placed outside of the <head> section where it may have no effect. Discover all HTML documents on the crawled website that contain link tags outside the HTML head area.

Examples

What we discover:

<html>
<head>
...
</head>
<body>
...
<link rel="stylesheet" type="text/css" href="style.css">
...
</body>
</html>

How it should be:

<html>
<head>
...
<link rel="stylesheet" type="text/css" href="style.css">
...
</head>
<body>
...
</body>
</html>
Importance

Placing the link tag outside the HTML <head> is not valid. This may lead to unexpected behaviour or appearance of the website with some clients. Even though modern browsers are using a range of methods to autocorrect this type of common issue, it is not suggested to rely on the browser's ability to guess what the webmaster intended to achieve.

If the link tag is used to reference an external stylesheet file, browsers use the information from the linked CSS file to render the site based on that information. If stylesheets have to be processed in the middle, or at the end of the document, the browser will have to re-render the entire document based on the given changes. This can lead to a drop in performance and user experience.

If a canonical link element (<link rel="canonical" href="http://www.example.org" />) is used outside of the HTML <head>, it will be ignored by search engines. This might lead to issues with duplicate content, e.g. unexpected behaviour of the website in search results and ranking problems.

Operating Instruction

We suggest that you move misplaced link tags to the <head>.
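
A simple way to spot misplaced tags in your own markup is to track whether the parser is currently inside <head>. The sketch below uses Python's built-in parser and hypothetical names (`OutsideHeadFinder`, `tags_outside_head`). Unlike a browser, it does not repair invalid markup and relies on explicit <head> and </head> tags; the itemprop exception for <meta> follows the rule described for meta tags in this document.

```python
from html.parser import HTMLParser

HEAD_ONLY = {"link", "meta", "title"}  # tags this sketch expects inside <head>


class OutsideHeadFinder(HTMLParser):
    """Records head-only tags that appear while the parser is not in <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.misplaced = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag in HEAD_ONLY and not self.in_head:
            # <meta itemprop=...> is accepted outside <head> (structured data).
            if tag == "meta" and "itemprop" in dict(attrs):
                return
            self.misplaced.append(tag)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def tags_outside_head(html):
    finder = OutsideHeadFinder()
    finder.feed(html)
    finder.close()
    return finder.misplaced


page = ("<html><head><link rel='stylesheet' href='style.css'></head>"
        "<body><link rel='canonical' href='http://example.com/'>"
        "<meta itemprop='name' content='x'><title>T</title></body></html>")
print(tags_outside_head(page))
```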

<meta description> missing or empty

Description

If the meta description is missing or empty, the URL is flagged with this hint. Use this report to identify all URLs that are missing a proper meta description.

Example
<html>
<head>
...
<meta name="description" content="">
...
Importance

The meta description is usually the first choice for the description text that appears in search result snippets. If the meta description is missing, you give up control over the appearance of your pages in search results. Search engines will instead use parts of the page content as a description, which might lead to unexpected appearance of a site's snippets in search results.

Operating Instruction

We suggest that you use proper meta descriptions for all pages that are supposed to be indexed by search engines.

<meta description> occurs more than once

Description

If a meta description is found more than once in the HTML, the URL is flagged with this hint.

Example
<html>
<head>
...
<meta name="description" content="First meta description">
<meta name="description" content="Second meta description">
...
Importance

Having more than one meta description can lead to unpredictable display of the document in search results. This may result in lower user engagement and therefore a drop in user signals for your website. This may eventually hurt the rankings of your website.

Operating Instruction

There should only be one meta description for a document.

Error scenarios like this usually appear due to different software automatically adding meta descriptions. If this issue occurs on a large scale, check if there is a script or CMS plugin automatically adding meta descriptions to your webpages.

<meta> found outside <head>

Description

A <meta> tag was placed outside of the <head> section where it may have no effect. Discover all HTML documents on the crawled website that contain meta tags outside the HTML head area.

Examples

What we discover:

<html>
<head>
...
</head>
<body>
...
<meta name="description" content="foo">
...
</body>
</html>

How it should be:

<html>
<head>
...
<meta name="description" content="foo">
...
</head>
<body>
...
</body>
</html>
Importance

Placing the meta tag outside of the HTML <head> is not valid unless:

  • HTML5 is used and
  • an itemprop attribute is used

So, using a meta tag outside of the head area is only viable if it is used to specify structured data properties. Alternatively, the itemprop attribute can be defined in other tags as well, e.g. <span>, <p>, <img>, which would offer backwards compatibility to HTML versions below HTML5.

If meta tags are not used for structured data, i.e. no itemprop attribute, they are required to be in the <head> to be considered valid. If they are placed outside of the <head> they might just get ignored by search engines.

Operating Instruction

If you find meta tags that need to be in the <head>, you should move them there. In cases where the discovered meta tags are used for structured data, we suggest that you assign the itemprop attributes to other elements in the markup. For HTML versions below HTML5, this is a requirement for valid code.

<title> found outside <head>

Description

A <title> tag was placed outside of the <head> section, where it may have no effect. Use this report to identify all occurrences of misplaced HTML <title> tags.

Example
<html>
<head>
...
</head>
<body>
...
<title>Title of the Site</title>
...
Importance

The <title> tag is a very important element for search engine optimization and should always be set. If the title tag is placed outside the HTML <head>, it may be ignored by search engines. This may lead to issues with the search snippet and the site's ranking in search results.

Operating Instruction

If this hint shows up in your crawl report, you should move all misplaced title tags into the HTML <head> section.

<title> missing or empty

Description

If the <title> tag is missing or empty, the URL is flagged with this hint. Use this report to identify all cases of missing or empty <title> tags on the crawled website.

Example

This is an example of an empty title tag:

<html>
<head>
...
<title></title>
...
Importance

The <title> tag is a very important element for search engine optimization and should always be set. The document's title is the primary resource for the title of the snippet in search results. If the <title> tag is missing or empty, one of the most important ranking factors is basically left out. This will very likely harm the ranking in search results and can also harm the click-through rate from search results for the given URLs.

Operating Instruction

If this hint shows up in your crawl report, you should add <title> tags to all discovered URLs.

<title> occurs more than once

Description

If the <title> tag is found more than once, the URL is flagged with this hint. There should only be one title per page.

Example
<html>
<head>
...
<title>Welcome to Example.com</title>
<title>Example.com - Best Examples on the Internet</title>
...
Importance

The HTML <title> tag is an important ranking factor, as it is literally supposed to describe the content of a page. Having more than one <title> tag can therefore lead to unpredictable displays of the webpage in search results. Additionally, it may harm the rankings of the page.

Operating Instruction

If this hint shows up in your crawl report, you might want to make sure you only use one <title> tag on all discovered URLs.

Error scenarios like this usually appear due to different software automatically adding <title> tags. If this issue occurs on a large scale, check if there is a script or CMS plugin adding <title> tags to your pages.

Charset: Invalid charset in Content-Type HTTP header

Description

The Content-Type HTTP header specifies an invalid charset. Discover all occurrences of invalid charset definitions in Content-Type HTTP headers.

Examples
HTTP/1.1 200 OK
Server: Apache
Date: Thu, 17 Dec 2015 15:34:23 GMT
Content-Type: text/html; charset=foo-bar
...
Importance

If there is no valid charset defined in the HTTP header, the browser has to use the charset specified in the document or has to fall back to detect the charset to display the document. If the charset has to be guessed, this may lead to problems handling the encoding of the document. Additionally, this may slow down the rendering time for the document.

Operating Instruction

We suggest that you set a proper charset in the HTTP header and in the document to make it easy for web clients to render the document quickly and as expected. Make sure the defined charsets are identical and not conflicting.
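
To get a feeling for what counts as an invalid charset, the header value can be parsed and the charset name looked up. The sketch below uses Python's codec registry as a stand-in for the list of valid charsets, which is an approximation and not necessarily the list the crawler uses; `charset_status` is a hypothetical helper.

```python
import codecs
from email.message import Message


def charset_status(content_type):
    """Classify the charset of a Content-Type value: missing, invalid or valid."""
    message = Message()
    message["Content-Type"] = content_type
    charset = message.get_content_charset()  # lower-cased name or None
    if charset is None:
        return "missing"
    try:
        codecs.lookup(charset)  # assumption: Python's registry approximates validity
    except LookupError:
        return "invalid"
    return "valid"


print(charset_status("text/html; charset=foo-bar"))
print(charset_status("text/html; charset=UTF-8"))
```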

Content-Security-Policy HTTP header missing

Description

If the Content-Security-Policy HTTP header is missing, the URL is flagged with this hint. This report indicates that your site is not properly protected from Cross Site Scripting (XSS) and similar security attacks.

Example

The simplest form of Content-Security-Policy directive is one that restricts access to your own domain:

Content-Security-Policy: default-src 'self'

The directive can also be used to restrict access to only the https version of your domain:

Content-Security-Policy: default-src https://example.com
Importance

Content Security Policy is an additional level of security that can be added to your site to further protect it from attacks, most notably from Cross Site Scripting (XSS). Content Security Policy is implemented with parameters in the Content-Security-Policy HTTP header directive. Essentially, the parameters allow you to define what gets loaded from where, or in more technical terms, which domains and which resources are allowed to load.

Operating Instruction

We suggest that you use this report to identify web pages that are missing a Content Security Policy header and then add one to each of those pages.
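
If you already have the response headers of your pages at hand, for example from your own monitoring, the check itself is a case-insensitive header lookup. A minimal sketch, assuming a hypothetical mapping of URL to header dictionary:

```python
def urls_missing_csp(responses):
    """Return the URLs whose headers lack Content-Security-Policy."""
    missing = []
    for url, headers in responses.items():
        # HTTP header names are case-insensitive, so compare in lower case.
        names = {name.lower() for name in headers}
        if "content-security-policy" not in names:
            missing.append(url)
    return missing


responses = {
    "https://example.com/": {"Content-Type": "text/html",
                             "content-security-policy": "default-src 'self'"},
    "https://example.com/page.html": {"Content-Type": "text/html"},
}
print(urls_missing_csp(responses))
```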

Language: Invalid

Description

The language specification of the document does not follow some basic rules for language tags.

Example
<html lang="en_US">

Correct implementation:

<html lang="en-US">
Importance

Correct language settings can be crucial for localized content, since it allows search engines to display the best results for the matching audience.

Operating Instruction

We suggest that you carefully read the W3C's advisory on language settings and use the lang attribute and the xml:lang attribute to specify the language of your document.
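
The basic shape of a language tag can be checked with a small regular expression. This is a deliberately simplified sketch: it only tests for a 2 to 3 letter primary subtag followed by hyphen-separated subtags and does not implement the full BCP 47 grammar or look anything up in the subtag registry. The name `looks_like_language_tag` is hypothetical.

```python
import re

# Simplified shape: primary language subtag, then optional "-subtag" parts.
LANG_TAG = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z0-9]{1,8})*")


def looks_like_language_tag(value):
    """True if the value has the basic hyphen-separated form of a language tag."""
    return LANG_TAG.fullmatch(value.strip()) is not None


print(looks_like_language_tag("en_US"))  # underscore is not a valid separator
print(looks_like_language_tag("en-US"))
```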

Language: Set multiple times and differs

Description

If the document language was set multiple times and differs, the URL is flagged with this hint.

Example

XHTML:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="en" xml:lang="fr" xmlns="http://www.w3.org/1999/xhtml">

HTML5 Polyglot:

<!DOCTYPE html>
<html lang="en" xml:lang="fr" xmlns="http://www.w3.org/1999/xhtml">
Importance

Correct language settings can be crucial for localized content, since it allows search engines to display the best results for the matching audience. In case of multiple conflicting language specifications the correct language has to be guessed.

Operating Instruction

We suggest that you carefully read the W3C's advisory on language settings and use the lang attribute and the xml:lang attribute to specify the language of your document. Make sure you don't have conflicting specifications.

Linking: Follow link to a so far no-follow URL

Description

A follow link was found, linking to a URL that was previously linked "nofollow" only.

The source URL will be flagged with this hint and the target URL will be flagged as "No-Follow linking revoked later on".

This report helps to identify inconsistency in usage of rel=nofollow.

Example
Source URL | Target URL | Link Relation
http://example.com/page.html | http://example.com/target.html | nofollow
http://example.com/page2.html | http://example.com/target.html | follow
Importance

A single follow link will allow the target URL to be crawled, even though it was previously forbidden by nofollow links.

Inconsistency in usage of rel=nofollow can lead to unexpected behaviour depending on your situation.

  • If the target URL is supposed to be recognized by search engines, nofollow linking will weaken the URL. Removing nofollow from internal links can lead to an uplift in ranking.
  • If the target URL is not supposed to be recognized by search engines, a single follow link will allow the target URL to be crawled.

Note: In a document that uses the robots directive "nofollow", a link with "rel=follow" is identified as "follow".

Operating Instruction

If you encounter this hint when crawling your website, we suggest that you:

  • Evaluate if rel=nofollow is needed.
  • Make sure to use follow or nofollow consistently.
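
Given an export of internal links, targets that are linked both follow and nofollow can be found by grouping the link relations per target URL. The sketch assumes a hypothetical list of (source, target, relation) tuples and is not tied to any particular export format.

```python
from collections import defaultdict


def inconsistent_targets(links):
    """Return target URLs that receive both follow and nofollow links."""
    relations = defaultdict(set)
    for _source, target, relation in links:
        relations[target].add(relation)
    return sorted(
        target for target, seen in relations.items()
        if {"follow", "nofollow"} <= seen
    )


links = [
    ("http://example.com/page.html", "http://example.com/target.html", "nofollow"),
    ("http://example.com/page2.html", "http://example.com/target.html", "follow"),
    ("http://example.com/page.html", "http://example.com/other.html", "follow"),
]
print(inconsistent_targets(links))
```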

Linking: nofollow link to a follow URL

Description

A link with rel="nofollow" was found, linking to a URL that was previously linked "follow" already.

This report helps to identify inconsistency in usage of rel=nofollow.

Example
Source URL | Target URL | Link Relation
http://example.com/page.html | http://example.com/target.html | follow
http://example.com/page2.html | http://example.com/target.html | nofollow
Importance

Inconsistency in usage of rel=nofollow can lead to unexpected behaviour depending on your situation:

  • If the target URL is supposed to be recognized by search engines, nofollow linking will weaken the URL. Removing nofollow from internal links can lead to an uplift in ranking.
  • If the target URL is not supposed to be recognized by search engines, a single follow link will allow the target URL to be crawled.
Operating Instruction

If you encounter this hint when crawling your website, we suggest that you:

  • Evaluate if rel=nofollow is needed.
  • Make sure to use follow or nofollow consistently.

Linking: nofollow linking revoked later on

Description

A URL that has been linked nofollow has later (i.e. on the same or a deeper level) been linked to as follow. By removing the initial nofollow directive, this URL may be lifted up some levels.

This report helps to identify inconsistency in usage of rel=nofollow.

Example
Source URL | Target URL | Link Relation
http://example.com/page.html | http://example.com/target.html | nofollow
http://example.com/page2.html | http://example.com/target.html | follow
Importance

Inconsistency in usage of rel=nofollow can lead to unexpected behavior depending on your situation.

  • If the target URL is supposed to be recognized by search engines, nofollow linking will weaken the URL. Removing nofollow from internal links can lead to an uplift in ranking.
  • If the target URL is not supposed to be recognized by search engines, a single follow link will allow the target URL to be crawled.
Operating Instruction

If you encounter this hint when crawling your website, we suggest that you:

  • Evaluate if rel=nofollow is needed.
  • Make sure to use follow or nofollow consistently.

Redirect Loops: Redirect loop starts here

Description

This URL is the first element in a redirect loop. A redirect loop is a chain of redirects that ultimately redirects to this URL again.

This hints report shows all redirect loops that could be identified by the crawl.

Example
Level   URL                             Redirect Target
1st     http://example.com/page.html    http://example.com/page2.html
2nd     http://example.com/page2.html   http://example.com/page.html
Importance

Redirect loops cause issues with user experience by making a URL unusable. Redirect loops can also waste crawl budget.

Operating Instruction

We suggest that you identify all redirect loops on your site with this report and fix them. One solution might be defining a target URL that delivers a status code 200. You might also consider removing internal links to the redirected URL.
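
A redirect loop can be found by following the redirect targets and stopping as soon as a URL appears a second time. The sketch below assumes the redirects are available as a simple mapping from URL to redirect target; the mapping and the function name are illustrative assumptions.

```python
def find_redirect_loop(start_url, redirects):
    """Follow redirects from start_url and return the loop as a list of URLs.

    `redirects` maps a URL to its redirect target (assumed input format).
    Returns None if the chain ends at a URL that does not redirect.
    """
    position = {}  # URL -> index in chain at which it was first visited
    chain = []
    url = start_url
    while url in redirects:
        if url in position:
            # Seen before: everything from the first visit onwards is the loop.
            return chain[position[url]:]
        position[url] = len(chain)
        chain.append(url)
        url = redirects[url]
    return None


redirects = {
    "http://example.com/page.html": "http://example.com/page2.html",
    "http://example.com/page2.html": "http://example.com/page.html",
}
loop = find_redirect_loop("http://example.com/page.html", redirects)
```

If the start URL only leads into a loop without being part of it, the returned list contains just the looping URLs.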

Robots: Specified more than once

Description

Robots directives for a single URL were specified more than once. Use this report to identify all instances of multiple robots definitions.

Examples

Robots meta tag in HTML <head>:

<meta name="robots" content="index">
<meta name="robots" content="follow">

Robots directives in X-Robots-Tag and meta tag:

HTTP/1.1 200 OK
Date: Fri, 16 Oct 2015 10:01:33 GMT
X-Robots-Tag: index, follow
...

<meta name="robots" content="index, follow">
Importance

More than one instance of a robots directive can lead to conflicting definitions or to a directive being left out. This may result in a range of issues with privacy, indexing in general and crawl budget, depending on the situation.

Operating Instruction

Use only one way to specify the robots directive.
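
To verify this for a single response, the robots sources can be collected from both the HTTP headers and the HTML. The following is a minimal sketch using only the Python standard library; it assumes the response headers are given as a dict and the body as a string, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaCollector(HTMLParser):
    """Collect the content of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "meta" and (attr_map.get("name") or "").lower() == "robots":
            self.contents.append(attr_map.get("content") or "")


def collect_robots_sources(headers, html):
    """Return a list of (source, value) pairs for all robots specifications."""
    sources = []
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            sources.append(("header", value))
    parser = RobotsMetaCollector()
    parser.feed(html)
    sources.extend(("meta", content) for content in parser.contents)
    return sources


headers = {"X-Robots-Tag": "index, follow"}
html = '<html><head><meta name="robots" content="index, follow"></head></html>'
sources = collect_robots_sources(headers, html)
specified_more_than_once = len(sources) > 1
```

More than one entry in the result means the robots directives were specified more than once, even if the values happen to agree.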

Robots: nofollow differs across specifications

Description

There is more than one source for robots directives, either a robots meta tag or an X-Robots-Tag header, and at least one specifies "nofollow" while another does not.

Examples

Robots meta tag in HTML <head>:

<meta name="robots" content="index, nofollow">
<meta name="robots" content="index, follow">

Note: a more subtle way to produce this error would be conflicting definitions by omitting parts of the directive, such as:

<meta name="robots" content="index, nofollow">
<meta name="robots" content="index">

Robots directives in X-Robots-Tag and meta tag differ:

HTTP/1.1 200 OK
Date: Fri, 16 Oct 2015 10:01:33 GMT
X-Robots-Tag: index, nofollow
...

<meta name="robots" content="index, follow">
Importance

The "nofollow" robots directive tells crawlers not to follow the links in a document. This can be used on purpose to prevent search engines from crawling the linked URLs.

Having conflicting definitions is inconclusive. Search engines will usually use the most restrictive directive they find. The Audisto Crawler adopts this behaviour.

Operating Instruction

Use only one way to specify the robots nofollow directive.
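
The check described above, and the "most restrictive wins" rule, can be sketched as follows. This is an illustration under simplifying assumptions, not the Audisto Crawler's implementation: each source is given as its raw content string, and user-agent prefixes in X-Robots-Tag headers are not handled. The function names are hypothetical.

```python
def directive_flags(robots_values, directive):
    """Return, per source, whether `directive` is present in its value."""
    flags = []
    for value in robots_values:
        tokens = {token.strip().lower() for token in value.split(",")}
        flags.append(directive in tokens)
    return flags


def directive_differs(robots_values, directive):
    """True if at least one source specifies `directive` and another does not."""
    flags = directive_flags(robots_values, directive)
    return any(flags) and not all(flags)


def most_restrictive(robots_values, directive):
    """True if any source specifies the restrictive `directive`."""
    return any(directive_flags(robots_values, directive))


explicit_conflict = directive_differs(["index, nofollow", "index, follow"], "nofollow")
subtle_conflict = directive_differs(["index, nofollow", "index"], "nofollow")
effective_nofollow = most_restrictive(["index, nofollow", "index"], "nofollow")
```

The same functions can be called with "noindex" to check that directive instead.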

Robots: noindex differs across specifications

Description

There is more than one source for robots directives, either a robots meta tag or an X-Robots-Tag header. At least one specifies "noindex" while another does not.

Examples

Differing robots directives across specifications could look like this:

Robots meta tag in HTML <head>:

<meta name="robots" content="index, nofollow">
<meta name="robots" content="noindex, nofollow">

Note: a more subtle way to produce this error would be conflicting definitions by omitting parts of the directive, such as:

<meta name="robots" content="noindex, follow">
<meta name="robots" content="follow">

Robots directives in X-Robots-Tag and meta tag differ:

HTTP/1.1 200 OK
Date: Fri, 16 Oct 2015 10:01:33 GMT
X-Robots-Tag: index, follow
...

<meta name="robots" content="noindex, follow">
Importance

The "noindex" robots directive tells crawlers not to index the current document.

Having conflicting definitions is inconclusive. Search engines will usually use the most restrictive directive they find. The Audisto Crawler adopts this behaviour.

Operating Instruction

Use only one way to specify the robots noindex directive.

Safe HTTPS webpage loads unsafe resource

Description

If an HTTPS webpage contains an unsafe resource that is loaded using HTTP, it is flagged with this hint.

Example

Example code for https://www.example.com/:

<script type="text/javascript" src="http://www.example.com/file.js"></script>
Importance

All files that get loaded while opening a document over HTTPS, e.g. images, fonts, stylesheets and JavaScript files, should be requested over the HTTPS protocol as well. If elements are loaded using an unsafe HTTP connection, these might get compromised by a man-in-the-middle attack while being loaded. This can compromise the security of the SSL secured request.

If this happens, the increased risk will be reflected in the SSL symbol in all modern browsers. Instead of displaying a green SSL lock, it would be yellow, orange or red to highlight loading of unsafe resources.

Operating Instruction

In documents only available over HTTPS, you should only include files loaded via the HTTPS protocol.
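
A page can be checked for such resources by resolving every resource reference against the page URL and looking at the resulting scheme. The sketch below uses only the Python standard library and is deliberately limited: it looks at script, img, iframe and stylesheet link tags only, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class InsecureResourceFinder(HTMLParser):
    """Collect resources that would be loaded over plain HTTP."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag in ("script", "img", "iframe"):
            ref = attr_map.get("src")
        elif tag == "link" and "stylesheet" in (attr_map.get("rel") or "").lower():
            ref = attr_map.get("href")
        else:
            return
        if ref:
            # Relative and protocol-relative references inherit the page scheme.
            absolute = urljoin(self.page_url, ref)
            if urlsplit(absolute).scheme == "http":
                self.insecure.append(absolute)


def find_insecure_resources(page_url, html):
    """Return HTTP resources referenced by an HTTPS page (empty for HTTP pages)."""
    if urlsplit(page_url).scheme != "https":
        return []
    finder = InsecureResourceFinder(page_url)
    finder.feed(html)
    return finder.insecure


html = (
    '<script type="text/javascript" src="http://www.example.com/file.js"></script>'
    '<img src="/logo.png">'
    '<script src="//cdn.example.com/lib.js"></script>'
)
insecure = find_insecure_resources("https://www.example.com/", html)
```

Resources referenced from CSS or loaded by scripts at runtime are not covered by this kind of static check.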

URL has // in path

Description

If the URL contains two consecutive slashes, it is flagged with this hint.

Example
http://example.com//page.html
http://example.com/directory//page.html
Importance

Two consecutive slashes are valid but usually not wanted in a URL. Any occurrence might indicate issues with relative linking and/or the URL base. This may lead to issues with duplicate content if the CMS delivers the same content for both variants, e.g. for http://example.com//page.html and http://example.com/page.html.

Operating Instruction

We suggest that you do not use consecutive slashes. Analyze all occurrences of consecutive slashes and fix the reason why they occur.
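
When analyzing a list of URLs, only the path component should be inspected, because the "//" after the scheme is always present. The following minimal sketch uses Python's urllib.parse; the function names are hypothetical, and collapsing the slashes is shown only to find the likely intended URL, not as a replacement for fixing the link source.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def has_double_slash_in_path(url):
    """True if the path component contains two consecutive slashes."""
    return "//" in urlsplit(url).path


def collapse_slashes(url):
    """Return the URL with runs of slashes in the path reduced to one."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(parts._replace(path=path))


flagged = has_double_slash_in_path("http://example.com/directory//page.html")
intended = collapse_slashes("http://example.com//page.html")
```

Comparing the content of the flagged URL and the collapsed variant shows whether duplicate content is actually being delivered.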

URL too long for some browsers

Description

If a URL longer than 2,000 characters is encountered, it is flagged with this hint.

Example

Long URLs are often generated dynamically in scenarios like:

  • a form submits data from input fields or a textarea via the GET method to the form action URL
  • complex filter combinations in a faceted search are passed as GET parameters
Importance

Long URLs might cause problems.

Some browsers are unable to handle URLs of this length. Some web applications might not be able to resolve the URLs and/or shorten them automatically, causing issues with access to these URLs.

Operating Instruction

While theoretically there is no limit on the length of a URL, you should stay below 2,000 characters to be accessible by a large number of clients and web applications.
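
The length check itself is a simple comparison against the 2,000 character threshold. The sketch below also shows how a form submitted via GET can exceed it; the search URL and parameter names are made up for illustration.

```python
from urllib.parse import urlencode

MAX_URL_LENGTH = 2000  # threshold used by this hint


def is_too_long(url, limit=MAX_URL_LENGTH):
    """True if the URL is longer than `limit` characters."""
    return len(url) > limit


# A short query stays well below the limit.
short_url = "http://example.com/search?" + urlencode({"q": "shoes"})

# A textarea submitted via GET can easily exceed it.
long_url = "http://example.com/search?" + urlencode({"comment": "x" * 2500})
```

Submitting large form fields via POST instead of GET is one way to keep such URLs short.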