Audisto HTML Markup Error Checker

How to detect issues with HTML on your website

Invalid HTML markup can lead to situations in which search engines fail to understand the site's structure as intended. It can also lead to issues with the rendering of the website in browsers.

Example: Audisto HTML Markup Error Check with the HTML Markup hint reports for the current crawl

Here is the list of all specific hints related to HTML markup errors that can be identified with the help of the Audisto Crawler.

Hints

<a> has both href and onclick attribute

Description

If a link with a href attribute and an onclick attribute is found, the URL is flagged with this hint.

Example

Link calling a JavaScript function with the onclick event.

<a href="http://example.com/page.html" onclick="alert('hello world');">Link</a>

Link calling a JavaScript redirect with the onclick event.

<a href="http://example.com/page.html" onclick="window.location.href='http://example.com/page1.html';">Link</a>
Importance

The onclick attribute defines a JavaScript action that runs when the "onclick" event for the link is triggered, i.e. when the user clicks the link.

This may lead to unexpected behaviour and user experience issues for users with and without JavaScript activated.

Be aware that modern search engines like Google follow JavaScript links like these. If the JavaScript redirects lead to a different target than the HTML link, the search engine might start to mistrust the links.

Operating Instruction

We suggest checking all instances of onclick attributes on HTML links and deciding whether the onclick usage is required. Remove any onclick attribute that is not needed.

<a> has malformed href

Description

If a malformed href attribute value is found, the URL is flagged with this hint. A malformed href is usually a URI that is not valid according to RFC 3986, or the result of a parsing error caused by invalid HTML.

Examples
<a href="htp://www.example.com">link</a>
<a href="htps://www.example..com">link</a>
<a href="http://www..example.com">link</a>
<a href="http://">link</a>
<a href="htps:// www.example.com">link</a>
<a href="://www.example.com">link</a>
Importance

A link with a malformed href cannot be parsed and will therefore not be recognized by search engines. In addition, links like this can cause issues with user experience.

Operating Instruction

Fix all malformed href attribute values on your website.
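
As a rough illustration of the kind of check described above, the following Python sketch flags the example hrefs from this section. It is not the Audisto Crawler's implementation. The function name is_malformed_href and the KNOWN_SCHEMES set are assumptions made for this example, and the sketch treats every scheme other than http and https as malformed, which matches the examples here but is stricter than a full RFC 3986 validation would be.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: only these schemes count as well-formed.
KNOWN_SCHEMES = {"http", "https"}


def is_malformed_href(href: str) -> bool:
    """Heuristically decide whether an absolute href looks malformed."""
    href = href.strip()
    # Hrefs without "://" are treated as relative and are not checked here.
    if "://" not in href:
        return False
    # Whitespace inside the URL is never valid.
    if any(ch.isspace() for ch in href):
        return True
    try:
        parts = urlsplit(href)
    except ValueError:
        return True
    if parts.scheme not in KNOWN_SCHEMES:
        return True
    host = parts.hostname or ""
    # An empty host or an empty label ("www..example.com") is invalid.
    return host == "" or any(label == "" for label in host.split("."))


for candidate in ("htp://www.example.com", "http://", "http://example.com/page.html"):
    print(candidate, is_malformed_href(candidate))
```

A production check would also need to cover ports, percent-encoding, and internationalized host names, which this sketch ignores.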

<a> has no content

Description

If a link without content is found, the URL is flagged with this hint.

Example
<a href="http://example.com/"></a>
Importance

If an anchor tag is empty, it is not clickable for the user. This might cause issues with user experience. In addition, links without content cannot pass anchor text to the target URL.

Operating Instruction

We suggest evaluating all anchor tags that have no content and fixing all occurrences that are not intended as internal anchors.

<a> href attribute has leading or trailing whitespace characters

Description

If leading or trailing whitespace is discovered in the href attribute of an HTML anchor tag, the linking URL is flagged with this hint. Use this report to discover all URLs that link to other documents with leading or trailing whitespace in the href.

Examples
<a href=" http://www.example.com">link</a>

<a href="http://www.example.com/ ">link</a>

<a href=" http://www.example.com/ ">link</a>
Importance

Leading and trailing whitespace in the href attribute is usually trimmed by browsers. Nonetheless, it is better to remove it. The whitespace may also indicate a problem with the code that generates the site.

Operating Instruction

We suggest removing unnecessary leading or trailing whitespace. Reviewing the code that generates the link might be necessary.
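
To locate such links in your own templates or rendered pages, a small script can help. The sketch below uses Python's standard html.parser module. The class name HrefWhitespaceFinder and the helper find_padded_hrefs are hypothetical names chosen for this example, and the approach is only one possible way to run such a check.

```python
from html.parser import HTMLParser


class HrefWhitespaceFinder(HTMLParser):
    """Collect <a> href values that start or end with whitespace."""

    def __init__(self):
        super().__init__()
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # The parser keeps attribute values as written, including padding.
        if href and href != href.strip():
            self.flagged.append(href)


def find_padded_hrefs(html: str) -> list[str]:
    finder = HrefWhitespaceFinder()
    finder.feed(html)
    finder.close()
    return finder.flagged


sample = '<a href=" http://www.example.com">link</a>'
print(find_padded_hrefs(sample))
```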

<a> link uses an unknown protocol

Description

An anchor's href uses a protocol that is unknown to our crawler. This may be caused by a misspelling, e.g. httpa:// instead of https://, but also may be a valid link.

Examples
<a href="httpa://example.com">link</a>

<a href="irc://example.com">link</a>

<a href="whatsapp://example.com">link</a>
Importance

If an unknown or invalid protocol is used, the href attribute will be considered invalid by search engines. This may lead to a multitude of follow up problems like crawling issues, ranking issues and user experience issues.

Operating Instruction

We suggest evaluating all links that use a seemingly invalid protocol and fixing those that turn out to be mistakes.

<a> links to fragment by name attribute on non-anchor

Description

The anchor contains a fragment link, but the target is defined by a name attribute and is not an anchor itself.

Example
<h1 name="top">Headline</h1>
...
<a href="#top">Go to top</a>
Importance

Using the name attribute on non-anchor tags is not valid. Using the name attribute is only allowed for anchor tags. In addition, the name attribute is deprecated since XHTML 1.0 and should not be used any more.

Operating Instruction

We suggest removing all instances of the "name" attribute. You should use the "id" attribute instead.

<a> links to fragment by name attribute, not id

Description

The anchor contains a fragment link, like "#top", but the target is defined by a name attribute. The name attribute is deprecated and won't be supported in upcoming versions of HTML anymore.

Examples
<span name="top"></span>
...
<a href="#top">link</a>

<a name="top"></a>
...
<a href="#top">link</a>
Importance

The name attribute is deprecated since XHTML 1.0 and should not be used any more. In addition, it was only allowed on anchor tags.

Operating Instruction

We suggest removing all instances of the "name" attribute. You should use the "id" attribute instead.

<a> links to fragment that was not found

Description

The anchor contains a fragment link, like "#top", but no corresponding id or name attribute was found in the document.

This report helps to identify all URLs that contain fragment-only links, where the fragment is not present in the document.

Example
<a href="#top">link to fragment that does not exist</a>
Importance

If a document contains fragment-only links whose targets are not present in the document, this may hurt the user experience and might result in worse user signals.

Operating Instruction

We suggest fixing all occurrences of broken fragment-only links.
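
The check described in this section can be approximated with a short script. The following Python sketch collects all id and name values in a document and compares them with the fragments used by fragment-only links. The names FragmentChecker and missing_fragments are made up for this example, and the logic is a simplified assumption, not the crawler's actual algorithm.

```python
from html.parser import HTMLParser


class FragmentChecker(HTMLParser):
    """Record fragment targets and fragment-only links in one pass."""

    def __init__(self):
        super().__init__()
        self.targets = set()   # values of id and name attributes
        self.fragments = []    # fragments used by fragment-only links

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        for key in ("id", "name"):
            if attr_map.get(key):
                self.targets.add(attr_map[key])
        href = attr_map.get("href") or ""
        if tag == "a" and href.startswith("#") and len(href) > 1:
            self.fragments.append(href[1:])


def missing_fragments(html: str) -> list[str]:
    """Return fragments that are linked to but never defined."""
    checker = FragmentChecker()
    checker.feed(html)
    checker.close()
    return [f for f in checker.fragments if f not in checker.targets]


page = '<h1 id="top">Headline</h1><a href="#top">up</a><a href="#bottom">down</a>'
print(missing_fragments(page))
```

Because the comparison runs after the whole document has been parsed, links that appear before their target are handled correctly.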

<base> contains malformed or empty href

Description

A <base> tag was found, but its href attribute contains an invalid URL, or a URL that is neither HTTP nor HTTPS. The crawler falls back to the document's URL as base.

Examples

A base with an invalid protocol

<base href="htp://example.com">

A base with a white space in the domain name

<base href="http:// example.com">
Importance

The base tag defines the URL base for all relative links in the document. Using a malformed URL as base href can cause issues with crawling and accessing of relative links.

Using the base tag adds more complexity when parsing relative links. Poorly programmed crawlers might not understand the base tag at all and therefore show unexpected behaviour.

Operating Instruction

We suggest not to use the HTML base tag at all. Remove it if possible.

Note: If there are changes regarding the base tag, all relative links in the document need to be checked and probably corrected.

<base> found more than once and differs

Description

More than one <base> directive was found, and the href attribute values differ.

Examples
<base href="http://example.com/">
<base href="http://example.com/folder/">

Note: The following base directives resolve to the same URL and would therefore not trigger this hint:

Base directives on http://example.com/page.html

<base href="http://example.com/">
<base href="/">
Importance

The base tag defines the URL base for all relative links in the document. Having more than one base tag is invalid. This may result in issues with relative links, that might impact search engines and user experience on the website.

Operating Instruction

We suggest not to use the HTML base tag at all. Remove it if possible. If the base tag is removed, all relative links in the document need to be checked and probably corrected.

<base> href contains a path only

Description

The <base> tag's href attribute contains a path, not an absolute URL. While this is technically allowed, it is not supported by Internet Explorer as of version 8.

Examples

Relative path by mistake

<base href="example.com/">

Relative path on purpose

<base href="/folder/">
Importance

The base tag defines the URL base for all relative links in the document. Mistakes in usage of the base tag might lead to issues with crawling when using relative links in the document. They will also result in issues with Internet Explorer as of version 8.

Operating Instruction

We suggest not to use the HTML base tag at all. Remove it if possible.

Note: If there are changes regarding the base tag, all relative links in the document need to be checked and probably corrected.

<base> is same as URL

Description

A <base> tag was found, but points to the same URL, rendering itself useless.

Example

HTML base on http://example.com/page.html

<base href="http://example.com/page.html">
Importance

The base tag defines the URL base for relative links in a document. There is no point in using the current URL as base href as it is the same as if the base tag isn't used at all, rendering it useless.

Operating Instruction

We suggest not to use the HTML base tag at all. Remove it if possible.

Note: If the base tag is removed, all relative links in the document need to be checked and probably corrected.

<base> occurs more than once

Description

More than one <base> tag was found. The Audisto Crawler uses the first valid annotation found for link resolving. Use this report to find all URLs on the crawled site that contain more than one <base> tag.

Example
<head>
...
<base href="http://example.com/">
<base href="http://example.com/">
...
</head>
Importance

The base tag defines the URL base for all relative links in the document. Having more than one base tag is invalid. This may result in issues with relative links, that might impact search engines and user experience on the website.

Operating Instruction

We suggest not to use the HTML base tag at all. Remove it if possible. If there are changes regarding the base tag, all relative links in the document need to be checked and probably corrected.

<base> points to other URL

Description

A <base> tag was found, and points to another URL.

Example

Base on http://example.com/page.html

<base href="http://example.com/page2.html">
Importance

The base tag defines the URL base for relative links in a document. When a base tag is used together with fragment-only links, those links point to the specific anchor on the URL of the base tag, not on the current document.

If this is unintended it leads to issues with user experience and crawling.

Operating Instruction

We suggest not to use the HTML base tag at all. Remove it if possible.

Note: If the base tag is removed, all relative links in the document need to be checked and probably corrected.
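
The effect of a base tag on relative and fragment-only links can be reproduced with Python's urllib.parse.urljoin. The helper resolve_link below is a hypothetical name used only for this illustration. It assumes the usual resolution order: the base href is first resolved against the document URL, then the link is resolved against the result.

```python
from urllib.parse import urljoin


def resolve_link(link, document_url, base_href=None):
    """Resolve a link the way a client would, with or without a <base> href."""
    # A relative base href such as "/" is itself resolved against the document URL.
    base = urljoin(document_url, base_href) if base_href else document_url
    return urljoin(base, link)


doc = "http://example.com/page.html"

# Without a base tag, "#top" stays on the current document.
print(resolve_link("#top", doc))

# With the base tag from the example, the same link leaves the document.
print(resolve_link("#top", doc, "http://example.com/page2.html"))
```

Running a check like this over the relative links of a page before removing or changing a base tag shows which targets would change.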

<h1> not found

Description

If the HTML does not contain a h1-headline, the URL is flagged with this hint. Discover all HTML documents on the crawled website, that are missing a proper h1 definition.

Example
<h1>Headline</h1>
Importance

Content structure partly determines content quality. Headlines add to structure in a document. Missing headlines are indicators for poorly structured content and therefore indicate lower content quality. This may lead to lower rankings and less user interaction. The h1 is the most important headline in the document. The h1 usually corresponds with the document title.

Operating Instruction

You might want to add a h1 to your HTML.

<h1> occurs more than once

Description

If more than one <h1> tag is found, the URL is flagged with this hint. Discover all URLs that contain more than one <h1> tag.

Examples
<h1><img src="logo.jpg" alt="Example.com"/></h1>
...
<h1>Primary Headline</h1>
Importance

The h1 is the most important headline in the document and should reflect the topic of the document. Having more than one <h1> tag is a sign of poor content structure. Content structure partly determines the content quality. While it is not a huge factor for most search engines, having more than one <h1> tag may be a negative signal in terms of content quality.

Operating Instruction

You might want to use only one <h1> tag per document.

<h1>-<h6> chain broken

Description

If the HTML contains a broken <h1>-<h6> chain, the URL is flagged with this hint. Discover all URLs on the crawled website, that have a broken headline chain.

Examples
<h1>headline</h1>
...
<h3>headline</h3>
...
<h4>headline</h4>
Importance

A headline chain is considered broken when it is not strictly hierarchical, i.e. when a headline level is skipped. A broken headline chain can therefore be an indicator of poor content structure. This is important because content structure partly determines content quality. While this is a minor factor for most search engines, positive content structure signals may add to the ranking of a document.

Operating Instruction

We suggest keeping a strictly hierarchical headline chain. Evaluate the occurrences of broken headline chains and consider adding the missing headlines.
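
A headline chain can be checked with a few lines of Python. The sketch below assumes that "strictly hierarchical" means a headline may be at most one level deeper than the headline before it, and that the first headline must be an h1. Both rules are assumptions made for this example, as are the names HeadingCollector and chain_breaks.

```python
import re
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Record the level of every <h1>..<h6> start tag in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if re.fullmatch(r"h[1-6]", tag):
            self.levels.append(int(tag[1]))


def chain_breaks(html: str) -> list[tuple[int, int]]:
    """Return (previous, current) level pairs where a level was skipped."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    breaks = []
    previous = 0  # document start counts as level 0, so the first headline must be h1
    for level in collector.levels:
        if level > previous + 1:
            breaks.append((previous, level))
        previous = level
    return breaks


print(chain_breaks("<h1>headline</h1><h3>headline</h3><h4>headline</h4>"))
```

Moving back up the hierarchy, for example from h3 to h2, is not counted as a break in this sketch.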

<html> contains too many uncommon non-printable characters

Description

The HTML document contains too many uncommon non-printable characters, and not all of them will be shown in live analysis. With this report you can discover all URLs on the crawled website that contain more than 50 uncommon non-printable characters. See the corresponding hint "<html> contains uncommon non-printable characters" for further information about what counts as "uncommon".

Importance

Non-printable characters are used as control characters and may not be visible in the source code, but nonetheless impact the behaviour of the site. This might affect crawling and user experience when they are inside of an anchor's href or an image's src attribute, possibly resulting in issues with the site's structure and ranking.

Finding too many non-printable characters may hint at massive encoding issues in a document, or at documents that are not HTML documents at all.

Operating Instruction

Non-printable characters should generally be removed whenever possible, and encoded as HTML entities where they are required. If an application validates transferred data, the validation should check for non-printable characters and, where appropriate, remove them.

<html> contains uncommon non-printable characters

Description

If uncommon non-printable characters are detected, the URL of the document containing the character will be flagged.

Some non-printable characters appear in almost every document, e.g. line feed (\n), carriage return (\r) and horizontal tab (\t). In addition, there are commonly used non-printable characters, e.g. BOM, soft hyphen, Left-To-Right-Mark and Right-To-Left-Mark. These characters will not cause the URL to be flagged with this hint. This hint detects all remaining non-printable characters.

Examples

Due to the non-printable nature of these characters, the live analysis shows the character codes instead of the actual characters, enclosed in brackets.

[[&#xEFBBBF;]]
Importance

Non-printable characters may not be visible in the source code, but nonetheless impact

  • the behaviour of the site, e.g. when they are inside of an anchor's href or an image's src attribute
  • the ranking of the site, e.g. when they are an invisible part of a word.

This might affect crawling and user experience, possibly resulting in issues with accessibility and ranking.

Usually this hint is triggered by messed up encoding.

Operating Instruction

Non-printable characters should generally be removed whenever possible, and encoded as HTML entities where they are required. If an application validates transferred data, the validation should check for non-printable characters and, where appropriate, remove them.
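
For validating data in your own application, a check along these lines may be a starting point. The Python sketch below approximates "non-printable" with the Unicode general categories Cc (control) and Cf (format), which is an assumption and may not match the crawler's definition exactly. The allow-list COMMON mirrors the common characters named in the description above, and the function name uncommon_nonprintable is hypothetical.

```python
import unicodedata

# Characters the description above lists as common; they are not reported.
COMMON = {"\n", "\r", "\t", "\ufeff", "\u00ad", "\u200e", "\u200f"}


def uncommon_nonprintable(text: str) -> list[tuple[int, str]]:
    """Return (offset, 'U+XXXX') for each uncommon control or format character."""
    # Assumption: categories Cc and Cf stand in for "non-printable".
    return [
        (offset, f"U+{ord(ch):04X}")
        for offset, ch in enumerate(text)
        if ch not in COMMON and unicodedata.category(ch) in ("Cc", "Cf")
    ]


# A zero width space hidden inside a word, followed by a NUL character.
print(uncommon_nonprintable("exam\u200bple\x00"))
```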

<html> contains unencoded Left-To-Right-Mark or Right-To-Left-Mark

Description

A Left-To-Right- or Right-To-Left-Mark was found, but it is unescaped. Discover all URLs that contain an unescaped Left-To-Right-Mark or an unescaped Right-To-Left-Mark.

Examples
Character Name | Detected Character | HTML Entity (named) | HTML Entity (decimal) | HTML Entity (hex)
Left-To-Right-Mark | U+200E | &lrm; | &#8206; | &#x200e;
Right-To-Left-Mark | U+200F | &rlm; | &#8207; | &#x200f;
Importance

The Left-To-Right- and Right-To-Left-Mark are non-printable characters used for typesetting of bi-directional text. The Left-To-Right- or Right-To-Left mark are not visible and, if used without being properly escaped, may lead to a range of unexpected problems that are hard to track down due to the invisible nature of these characters:

  • Issues with the appearance of the website
  • Issues with characters ending up to be used in a URL
Operating Instruction

If unencoded Left-To-Right- or Right-To-Left-Marks are discovered,

  • escape them or
  • remove them completely and
  • if the functionality is required, switch to a CSS solution

whenever possible.

If escaping the characters, you should prefer the named HTML entities (&lrm; and &rlm;) over decimal or hex HTML entities.

<html> contains unencoded soft hyphen (SHY)

Description

If an unescaped soft hyphen was found, the URL is flagged with this hint. Discover all URLs on the crawled website, that contain unencoded soft hyphen.

Examples
Character Name | Detected Character | HTML Entity (named) | HTML Entity (decimal) | HTML Entity (hex)
soft hyphen | U+00AD | &shy; | &#173; | &#xad;
Importance

The unencoded soft hyphen is a character that is used for hyphenation of words. The soft hyphen is only visible if the word needs to be hyphenated on a line break. That characteristic can lead to

  • unexpected hyphenation
  • hard to track down issues with the appearance of the website
  • the character ending up in a URL
Operating Instruction

If unencoded soft hyphens are discovered, escape them or remove them completely if they are not required.

If escaping the characters, you should prefer the named HTML entity (&shy;) over the decimal or hex HTML entity.

<html> starts with BOM

Description

There is a Unicode byte order mark (BOM) at the top of the HTML. Use this report to discover all URLs on the crawled website that contain a BOM.

We currently detect the BOM in the following encodings:

  • UTF-8
  • UTF-16 BE/LE
  • UTF-32 BE/LE
  • UTF-7
  • UTF-1
  • UTF-EBCDIC
  • SCSU
  • BOCU-1
  • GB-18030
Examples

Example UTF-8 BOM in HTML 5

EF BB BF<!DOCTYPE html>
<html lang="en">

How BOM looks in different encoding and representations:

Encoding | BOM hex | BOM dec
UTF-8 | EF BB BF | 239 187 191
UTF-16 (BE) | FE FF | 254 255
UTF-16 (LE) | FF FE | 255 254
UTF-32 (BE) | 00 00 FE FF | 0 0 254 255
UTF-32 (LE) | FF FE 00 00 | 255 254 0 0
UTF-7 | 2B 2F 76 38 | 43 47 118 56
UTF-7 | 2B 2F 76 39 | 43 47 118 57
UTF-7 | 2B 2F 76 2B | 43 47 118 43
UTF-7 | 2B 2F 76 2F | 43 47 118 47
UTF-7 | 2B 2F 76 38 2D | 43 47 118 56 45
UTF-1 | F7 64 4C | 247 100 76
UTF-EBCDIC | DD 73 66 73 | 221 115 102 115
SCSU | 0E FE FF | 14 254 255
BOCU-1 | FB EE 28 | 251 238 40
GB-18030 | 84 31 95 33 | 132 49 149 51
Importance

The Unicode byte order mark is the Unicode character U+FEFF. Some text editors add it to documents. The BOM is used to signal:

  • the byte order, or endianness
  • the fact that the text is unicode
  • the specific unicode encoding

Having a Unicode byte order mark at the top of the HTML is valid, but might result in problems with third-party software. As of HTML5, a BOM is supposed to override the charset definition from the HTTP header. If the BOM is used for charsets that are not Unicode, this might lead to encoding problems. Encoding problems may affect the appearance of the site in browsers and search engines and therefore lead to issues with user experience.

Operating Instruction

You should consider removing the BOM and specifying the encoding in the HTTP header or as a meta tag in the HTML <head>.
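
To check files before they are deployed, the first bytes can be compared with the known BOM signatures. The Python sketch below covers only the UTF-8, UTF-16 and UTF-32 signatures that the standard codecs module provides as constants, not the full list of encodings above. The names BOMS, detect_bom and strip_bom are chosen for this example.

```python
import codecs

# Longer signatures come first: the UTF-32 (LE) BOM begins with the UTF-16 (LE) BOM.
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32 (BE)"),
    (codecs.BOM_UTF32_LE, "UTF-32 (LE)"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16 (BE)"),
    (codecs.BOM_UTF16_LE, "UTF-16 (LE)"),
]


def detect_bom(data: bytes):
    """Return the encoding name of a leading BOM, or None if there is none."""
    for signature, name in BOMS:
        if data.startswith(signature):
            return name
    return None


def strip_bom(data: bytes) -> bytes:
    """Return the data without a leading BOM, if one of the known BOMs is present."""
    for signature, _name in BOMS:
        if data.startswith(signature):
            return data[len(signature):]
    return data


raw = b"\xef\xbb\xbf<!DOCTYPE html>"
print(detect_bom(raw), strip_bom(raw))
```

Stripping the bytes only makes sense if the encoding is declared elsewhere, as suggested above.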

<img> alt attribute exists but empty

Description

If an image with an empty alt attribute is found, the URL is flagged with this hint.

Example
<img src="image.jpg" alt="" />
Importance

The alt-attribute defines the alternative information that will be shown, if the image file fails to load.

The alt-attribute is one of the factors that are used by search engines to determine the topic of the image. It also represents an alternative to blind users. By supplying a proper alt-attribute, you not only help search engines understand your site and rank your images, but also add to accessibility for disabled users.

Operating Instruction

If the image is not of decorative nature and contains information, you should consider adding a value for the alt-attribute.

If the image is of a decorative nature (e.g. spacer images, backgrounds, style enhancements with images in general) or a functional nature (e.g. a tracking pixel), you should consider using a more advanced technical solution like CSS or JavaScript.

<img> has no alt attribute

Description

If an image without an alt-attribute is found, the URL is flagged with this hint. This report helps to identify all missing alt-attributes.

Example
<img src="file.jpg"/>
Importance

In terms of HTML validation, the alt-attribute is required for images. The alt-attribute defines the alternative information that will be shown if the image file fails to load.

The alt-attribute is one of the factors that are used by search engines to determine the topic of the image. It also represents an alternative to blind users. By supplying a proper alt-attribute, you not only help search engines understand your site, but also add to accessibility for disabled users.

Operating Instruction

Add alt-attributes in all cases where it is missing.

Use a descriptive alt-attribute for images that contain information. You may use an empty alt-attribute if the images are only for decoration.
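
The two image hints, a missing alt-attribute and an empty alt-attribute, can be told apart with a small parser. The following Python sketch is one possible way to do this with the standard html.parser module. AltAudit and audit_alt are hypothetical names, and an alt-attribute that contains only whitespace is treated as empty, which is an assumption of this example.

```python
from html.parser import HTMLParser


class AltAudit(HTMLParser):
    """Sort <img> tags into 'alt missing' and 'alt present but empty'."""

    def __init__(self):
        super().__init__()
        self.missing = []  # src of images without any alt attribute
        self.empty = []    # src of images whose alt attribute has no text

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        src = attr_map.get("src") or ""
        if "alt" not in attr_map:
            self.missing.append(src)
        elif not (attr_map["alt"] or "").strip():
            self.empty.append(src)


def audit_alt(html: str) -> tuple[list[str], list[str]]:
    """Return (missing, empty) lists of image sources."""
    audit = AltAudit()
    audit.feed(html)
    audit.close()
    return audit.missing, audit.empty


page = '<img src="file.jpg"/><img src="image.jpg" alt="" />'
print(audit_alt(page))
```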

<link rel="canonical"> URL is not absolute

Description

If the canonical element specifies a URL relative to the document's URL, the document's URL is flagged with this hint.

This report shows all occurrences of canonical usage with URLs that are not absolute.

Examples

Absolute URL

<link rel="canonical" href="http://example.com/folder/page.html">

Short URL

<link rel="canonical" href="page.html">

Short URL - root folder relative

<link rel="canonical" href="/folder/page.html">

Short URL - protocol relative

<link rel="canonical" href="//example.com/folder/page.html">
Importance

Using shortened URLs for canonical links can lead to several kinds of duplicate content issues:

  • duplicate content issues with different protocol versions
  • duplicate content issues with different domains
  • duplicate content issues with different folders
Operating Instruction

We suggest using absolute URLs for canonical links.
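
Whether a canonical href is absolute can be tested by checking that it carries both a scheme and a host. The Python sketch below applies this rule to the four examples from this section. The function name is_absolute_url is an assumption made for this illustration.

```python
from urllib.parse import urlsplit


def is_absolute_url(href: str) -> bool:
    """True only if the URL has both a scheme and a network location."""
    parts = urlsplit(href)
    return bool(parts.scheme) and bool(parts.netloc)


examples = [
    "http://example.com/folder/page.html",  # absolute
    "page.html",                            # relative to the document
    "/folder/page.html",                    # root folder relative
    "//example.com/folder/page.html",       # protocol relative
]
for href in examples:
    print(href, is_absolute_url(href))
```

Only the first example passes. The protocol-relative form has a host but no scheme, so it inherits the protocol of whichever version of the document is requested.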

<link rel="canonical"> contains malformed or empty href

Description

This hint identifies all occurrences of canonical elements that contain an empty or invalid target URL.

Examples

Empty canonical

<link rel="canonical" href="">

Malformed canonical

<link rel="canonical" href="htp://example.com/">
Importance

Malformed or empty href in canonical links cause canonical definitions to be invalid and can cause issues with duplicate content when a document is available on more than one URL.

Operating Instruction

We suggest checking for malformed or empty canonical hrefs on a regular basis and fixing them where they occur.

<link rel="canonical"> found outside <head>

Description

A canonical element was placed outside of the <head> section, so search engines will ignore it.

This report helps you to identify all occurrences of canonical definitions, that are invalid due to being placed outside the <head> tag on the crawled website.

Example
<html>
  <head>
    ...
  </head>
  <body>
    ...
    <link rel="canonical" href="http://example.com/">
    ...
  </body>
</html>
Importance

Some search engines ignore improper canonical designation. If canonical definitions get ignored by search engines, this might cause issues with duplicate content and representation of the site in search results.

Operating Instruction

Keep your canonical definitions inside the HTML <head> tag, so they don't get ignored by search engines.

<link rel="canonical"> found twice

Description

More than one canonical element was found, either as a <link> tag with rel="canonical" or as a corresponding Link header.

This report shows all URLs with double canonical definitions on your website, that we were able to identify.

Examples

HTML Head

<link rel="canonical" href="http://example.com/">
<link rel="canonical" href="http://example.com/">

HTTP Header

Link: <http://example.com/>; rel="canonical"
Link: <http://example.com/>; rel="canonical"

HTML Head & HTTP Header

<link rel="canonical" href="http://example.com/">

Link: <http://example.com/>; rel="canonical"
Importance

Using more than one canonical link element can cause conflicting definitions or unexpected behaviour when documents are available on more than one URL at a time.

Operating Instruction

We suggest identifying all URLs that have more than one canonical link element defined. We also suggest looking for the reason behind the double definition, as this problem usually can be traced back to third party code (plugins, extensions and add-ons of the CMS).

If canonical definitions are found twice on a document, this often occurs due to usage of multiple SEO plugins or a SEO plugin in combination with manual canonical definitions.

<link rel="canonical"> found twice and differs

Description

More than one canonical element was found, either as a <link> tag with rel="canonical" or as a corresponding Link header. Additionally, they specify different targets.

This report allows you to identify all occurrences of double canonical definitions with conflicting target URLs.

Examples

HTML Head

<link rel="canonical" href="http://example.com/">
<link rel="canonical" href="http://example.com/page1.html">

HTTP Header

Link: <http://example.com/>; rel="canonical"
Link: <http://example.com/page1.html>; rel="canonical"

HTML Head & HTTP Header

<link rel="canonical" href="http://example.com/">

Link: <http://example.com/page1.html>; rel="canonical"
Importance

Having more than one canonical link element with different target URLs in a document can cause search engines to ignore the canonical definitions. This might lead to issues with duplicate content.

Operating Instruction

We suggest correcting all conflicting canonical definitions by removing the unnecessary definition.

<link> found outside <head>

Description

A <link> tag was placed outside of the <head> section, where it may have no effects. Discover all HTML documents on the crawled website, that contain link tags outside the HTML head area.

Examples

What we discover

<html>
<head>
...
</head>
<body>
...
<link rel="stylesheet" type="text/css" href="style.css">
...
</body>
</html>

How it should be

<html>
<head>
...
<link rel="stylesheet" type="text/css" href="style.css">
...
</head>
<body>
...
</body>
</html>
Importance

Placing the link tag outside the HTML <head> is not valid. This may lead to unexpected behaviour or appearance of the website with some clients. Even though modern browsers are using a range of methods to autocorrect this type of common issue, it is not suggested to rely on the browser's ability to guess what the webmaster intended to achieve.

If the link tag is used to reference an external style sheet, browsers use the information from the linked CSS file to render the site. If style sheets have to be processed in the middle or at the end of the document, the browser has to re-render the entire document based on the changes. This can lead to a drop in performance and user experience.

If a canonical link element (<link rel="canonical" href="http://www.example.org" />) is used outside of the HTML <head>, it will be ignored by search engines. This might lead to issues with duplicate content, e.g. unexpected behaviour of the website in search results and ranking problems.

Operating Instruction

We suggest moving misplaced link tags to the <head>.

<meta description> missing or empty

Description

If the meta-description is missing or empty, the URL is flagged with this hint. Use this report to identify all URLs that are missing a proper meta description.

Example
<html>
<head>
...
<meta name="description" content="">
...
Importance

The meta description is usually the first choice for the description text in search results snippets. If the meta description is missing, you give up control over the appearance of your documents in the search results. Search engines will then use parts of the documents content as a description, which might lead to unexpected appearance of a site's snippets in search results.

Operating Instruction

We suggest using proper meta descriptions for all documents that are supposed to be indexed by search engines.

<meta description> occurs more than once

Description

If a meta description tag is found more than once in the HTML, the URL is flagged with this hint.

Example
<html>
<head>
...
<meta name="description" content="First meta description">
<meta name="description" content="Second meta description">
...
Importance

Having more than one meta description can lead to unexpected appearance of the URL in search results. This may result in lower user engagement and therefore a drop in user signals for your site, which may eventually hurt the rankings of the website.

Operating Instruction

There should be only one meta-description for a URL.

Error scenarios like this usually appear due to different software automatically adding meta descriptions. If this issue occurs on a large scale, check if there is a script or CMS plugin automatically adding meta descriptions to your documents.

<meta description> too long for Google snippet

Description

If the meta description is too long to be displayed in the snippet in search results, the URL is flagged with this hint.

Example
<html>
<head>
...
<meta name="description" content="Example.com - The very best long meta descriptions online - We have one of the longest meta descriptions in the internet.">
...
Importance

If the meta description is too long to be displayed in the snippet in search results, it will be shortened by the search engine. This usually results in less appealing snippets and lower user engagement. Lower user engagement might lead to negative user signals, which can be regarded as a ranking factor by modern search engines and eventually lead to worse rankings of the site in search results.

Operating Instruction

We suggest analyzing the click-through rates from search results to URLs whose meta description is too long to be shown properly in search snippets. If a URL flagged with this hint performs poorly in terms of click-through rate, you may want to consider shortening the meta description.

<meta keywords> found

Description

If the HTML contains meta-keywords the URL is flagged with this hint. Discover all HTML documents on the crawled website, that contain a meta keywords tag.

Example
<meta name="keywords" content="foo, bar" />
Importance

Meta keywords have been interpreted by search engines in the early days of search technology. If a webmaster added meta keywords, it helped the search engines to determine the topical focus of a document with limited processing resources. However, major search engines don't use the meta keywords any more.

Nowadays, keywords will have no positive impact on ranking in search. In fact, they only add to the size of the document.

Operating Instruction

We suggest removing the meta keywords tags from your site unless they are required for a specific purpose other than SEO.

<meta language> found

Description

A meta tag <meta name=language> was found. Contrary to common belief, this does not define the language of the document. In fact, it is not even defined in any standard.

Example
<html>
<head>
    <meta name="language" content="de-DE">
    ...
</head>

Expected behaviour: Document language is detected as de-DE.

Actual behaviour: Document language is detected as browser's default language.

Correct implementation:

<html lang="de-DE">
<head>
    ...
</head>
Importance

Correct language settings can be crucial for localized content, since they allow search engines to display the URL to the matching audience.

Operating Instruction

We suggest not using a <meta name=language> element at all. Instead, use the lang attribute on the <html> tag. For more details, see W3C's advice on language settings.
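
To see which documents rely on <meta name=language> instead of the lang attribute, both values can be read from the markup. The following Python sketch is only an illustration; `LanguageParser` and `language_report` are hypothetical names, not part of any Audisto tooling.

```python
from html.parser import HTMLParser


class LanguageParser(HTMLParser):
    """Records the lang attribute of <html> and any <meta name="language">."""

    def __init__(self):
        super().__init__()
        self.html_lang = None
        self.meta_language = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "html" and self.html_lang is None:
            self.html_lang = attr.get("lang")
        elif tag == "meta" and (attr.get("name") or "").lower() == "language":
            self.meta_language = attr.get("content")


def language_report(html):
    """Return both language declarations found in the document."""
    parser = LanguageParser()
    parser.feed(html)
    return {"html_lang": parser.html_lang, "meta_language": parser.meta_language}
```

A document that reports a `meta_language` but no `html_lang` is a candidate for moving the value to the <html> tag.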

<meta refresh> found

Description

If a meta refresh is found, the URL is flagged with this hint. Use this report to discover all HTML documents on the crawled website that contain a meta refresh.

Example
<meta http-equiv="refresh" content="5; URL=http://www.example.com/">
Importance

A meta refresh is a client-side redirect that triggers a GET request after a given time. It is sometimes used to automatically forward users to another URL. This method has been widely used to manipulate search engines, so search engines might misinterpret the use of a meta refresh as a sneaky redirect. A meta refresh with a delay of more than one second also violates the Web Content Accessibility Guidelines. Using a meta refresh might result in a bad user experience and issues with rankings in search engines.

Operating Instruction

Consider replacing it with an HTTP redirect using a 301 or 302 status code, or even a JavaScript client-side redirect, depending on your requirements.
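
Before replacing a meta refresh, it helps to know its delay and target. The Python sketch below splits the content attribute into these two parts. It is a simplified illustration and the function name `parse_meta_refresh` is hypothetical; browsers accept more variations of the syntax than this handles.

```python
import re


def parse_meta_refresh(content):
    """Split a meta refresh content value into (delay_seconds, target_url).

    Returns None if the value does not start with a number.
    The target is None if the page only reloads itself.
    """
    delay_part, _, rest = content.partition(";")
    try:
        delay = float(delay_part.strip())
    except ValueError:
        return None
    rest = rest.strip()
    # Strip an optional "URL=" prefix, ignoring case and whitespace.
    match = re.match(r"url\s*=\s*", rest, re.IGNORECASE)
    if match:
        rest = rest[match.end():]
    target = rest.strip("'\"") or None
    return delay, target
```

For the example above, `parse_meta_refresh("5; URL=http://www.example.com/")` yields a delay of 5 seconds and the target `http://www.example.com/`, which is the URL a server-side redirect would need to point to.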

<meta> found outside <head>

Description

A <meta> tag was placed outside of the <head> section, where it may have no effect. Use this report to discover all HTML documents on the crawled website that contain meta tags outside the HTML head area.

Examples

What we discover

<html>
<head>
...
</head>
<body>
...
<meta name="description" content="foo">
...
</body>
</html>

How it should be

<html>
<head>
...
<meta name="description" content="foo">
...
</head>
<body>
...
</body>
</html>
Importance

Placing the meta tag outside the HTML <head> is not valid unless

  • HTML5 is used and
  • an itemprop attribute is used.

So using a meta tag outside of the head area is only viable if it is used to specify structured data properties. Alternatively, the itemprop attribute can be defined on other tags as well, e.g. <span>, <p>, <img>, which would offer backwards compatibility with HTML versions below HTML5.

If meta tags are not used for structured data, i.e. no itemprop attribute, they are required to be in the <head> to be considered valid. If they are placed outside of the <head> they might just get ignored by search engines.

Operating Instruction

If you find meta tags that need to be in the <head>, you should move them there. If the discovered meta tags are used for structured data, we suggest assigning the itemprop attributes to other elements in the markup. For HTML versions below HTML5 this is a requirement for valid code.
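
The rule described above, meta tags outside the <head> are only acceptable with an itemprop attribute, can be sketched in Python as follows. This is an illustration with hypothetical names (`MisplacedMetaParser`, `misplaced_meta_tags`). It tracks the literal <head> and </head> tags and assumes both are present in the source, so it does not reproduce how a browser repairs invalid markup.

```python
from html.parser import HTMLParser


class MisplacedMetaParser(HTMLParser):
    """Collects <meta> tags outside <head> that carry no itemprop attribute."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.misplaced = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and not self.in_head:
            attr = dict(attrs)
            # Meta tags with itemprop are allowed in the body (structured data).
            if "itemprop" not in attr:
                self.misplaced.append(attr)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def misplaced_meta_tags(html):
    """Return the attributes of all meta tags that should be moved to <head>."""
    parser = MisplacedMetaParser()
    parser.feed(html)
    return parser.misplaced
```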

<title> found outside <head>

Description

A <title> tag was placed outside of the <head> section, where it may have no effect. Use this report to identify all occurrences of misplaced HTML <title> tags.

Example
<html>
<head>
...
</head>
<body>
...
<title>Title of the Site</title>
...
Importance

The <title> tag is a very important element for search engine optimization and should always be set. If the title tag is placed outside the HTML <head>, it may be ignored by search engines. This may lead to issues with the search snippet and the site's ranking in search results.

Operating Instruction

If this hint shows up in your crawl report, you should move all misplaced title tags into the HTML <head> section.

<title> missing or empty

Description

If the <title> tag is missing or empty, the URL is flagged with this hint. Use this report to identify all cases of missing or empty title tags on the crawled website.

Example

Empty Title

<html>
<head>
...
<title></title>
...
Importance

The <title> tag is a very important element for search engine optimization and should always be set. The document's title is the primary resource for the title of the snippet in search results. If the <title> tag is missing or empty, one of the most important ranking factors is basically left out. This will very likely harm the ranking in search results and can also harm the Click-through rate from search results for the given URLs.

Operating Instruction

If this hint shows up in your crawl report, you should add title tags to all found URLs.

<title> occurs more than once

Description

If the <title> tag is found more than once in the source code, the URL is flagged with this hint. There should be only one title for a document.

Example
<html>
<head>
...
<title>Welcome to Example.com</title>
<title>Example.com - Best Examples in the Internet</title>
...
Importance

The HTML <title> tag is an important ranking factor, as it is literally supposed to describe the content of the document. Having more than one HTML <title> tag can therefore lead to an unexpected appearance of the URL in search results. Additionally, it might harm the rankings of the document.

Operating Instruction

If this hint shows up in your crawl report, you might want to make sure you only use one title tag on all found URLs.

Error scenarios like this usually appear due to different software automatically adding HTML <title> tags. If this issue occurs on a large scale, check if there is a script or CMS plugin adding HTML <title> tags to your documents.
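
To verify a fix across many templates, the number of <title> tags per document can be counted automatically. The following Python sketch is a simple illustration; `TitleCollector` and `collect_titles` are hypothetical names.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Collects the text of every <title> element in a document."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data


def collect_titles(html):
    """Return the stripped text of all <title> elements, in document order."""
    parser = TitleCollector()
    parser.feed(html)
    parser.close()
    return [title.strip() for title in parser.titles]
```

A result with more than one entry corresponds to this hint. Comparing the entries often shows which script or plugin produced the additional title.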

<title> too long for Google snippet

Description

If the title of a document is too long to be displayed in the snippets in search results, the URL is flagged with this hint.

Example
<html>
<head>
...
<title>Example.com - Best Examples Site - We have one of the longest titles in the internet</title>
...
Importance

If the title of a document is too long to be displayed in the snippet in search results, it will be shortened by the search engine. This usually results in less appealing snippets and lower user engagement, which might eventually hurt your site's rankings in search results.

Operating Instruction

You might want to shorten the title so that it can be displayed in the snippet without being cut off.

Duplicate id attributes

Description

The same ID was assigned to several elements. Use this report to identify all URLs on the crawled website that use duplicate IDs.

Example
<h1 id="headline">headline</h1>
<h2 id="headline">headline</h2>
Importance

It is not valid to use the exact same id more than once in a document. This may cause unexpected behaviour when used with fragment links and when accessing elements via ID in JavaScript.

Operating Instruction

Only use unique IDs to identify elements in a document.

If the id is used to style groups of elements, classes should be used instead.
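
Duplicate IDs are easy to find by counting every id attribute in a document. The Python sketch below illustrates the idea; `IdCollector` and `duplicate_ids` are hypothetical names.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Counts how often each id attribute value occurs."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids[value] += 1


def duplicate_ids(html):
    """Return a mapping of every id used more than once to its count."""
    parser = IdCollector()
    parser.feed(html)
    return {value: count for value, count in parser.ids.items() if count > 1}
```

For the example above, the result is `{"headline": 2}`.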

Hreflang: Found

Description

The crawler detected <link> tags with an hreflang attribute set.

Example

The following code snippets trigger this hint. All examples are for http://de.example.com/.

<html lang="de">
<head>
    <link href="http://de.example.com/" hreflang="de" rel="alternate">
</head>
</html>
Importance

Hreflang links let you specify the preferred version of a URL on multilingual and multi-region websites, and help search engines to display the correct version of a URL.

Operating Instruction

Use the "Hreflang: Found" hint report to identify all URLs that contain an hreflang link definition and to find out how many URLs on the crawled site have hreflang link definitions.

Hreflang: Language tags in HTML and hreflang self link differ

Description

The crawler detected <link> tags with an hreflang attribute set, and a link to the current URL. However, the languages of the document and the hreflang link differ.

Setting "x-default" as hreflang is accepted and does not trigger this hint.

Example

The following code snippets trigger this hint. All examples are for http://de.example.com/.

First example: Languages are totally different:

<html lang="de">
<head>
    <link href="http://de.example.com/" hreflang="fr" rel="alternate">
</head>
</html>

Second example: Languages differ in region or other aspects, while the main language is the same:

<html lang="de">
<head>
    <link href="http://de.example.com/" hreflang="de-DE" rel="alternate">
</head>
</html>
Importance

The languages of a document and the hreflang link should match. However, differences in region or other aspects may be desired. If hreflang linking is incomplete or erroneous, search engines may discard hreflang related information completely.

Operating Instruction

If the language of the hreflang attribute and the language of the document differ completely, for example "de" and "fr", this should be fixed.
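
The distinction between a completely different language and a difference in region or other subtags can be sketched as a comparison of the primary language subtag. The Python function below is an illustration only; the name `compare_languages` and the returned labels are hypothetical and do not reflect how the Audisto Crawler reports the result.

```python
def compare_languages(document_lang, hreflang):
    """Compare the lang attribute of the document with a self-link hreflang."""
    if hreflang.lower() == "x-default":
        # "x-default" is accepted and never counts as a mismatch.
        return "accepted"
    doc = document_lang.lower()
    link = hreflang.lower()
    if doc == link:
        return "match"
    # The primary language is the part before the first hyphen.
    if doc.split("-")[0] == link.split("-")[0]:
        return "subtag differs"
    return "language differs"
```

With the two examples above, `compare_languages("de", "fr")` returns "language differs", which should be fixed, while `compare_languages("de", "de-DE")` returns "subtag differs", which may be intended.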

Hreflang: Self link found, but document has no language

Description

A <link> tag with an hreflang attribute points to the current URL, but the document itself has no language set.

Example

Example for http://de.example.com/

<html>
    <head>
        <link href="http://de.example.com/" hreflang="de" rel="alternate">
        <link href="http://en.example.com/" hreflang="en" rel="alternate">
        <link href="http://fr.example.com/" hreflang="fr" rel="alternate">
    </head>
</html>

Correct implementation:

<html lang="de">
    <head>
        <link href="http://de.example.com/" hreflang="de" rel="alternate">
        <link href="http://en.example.com/" hreflang="en" rel="alternate">
        <link href="http://fr.example.com/" hreflang="fr" rel="alternate">
    </head>
</html>
Importance

The languages of a document and the hreflang link should match. If hreflang linking is incomplete or erroneous, search engines may discard hreflang related information completely.

Operating Instruction

Always assign a language to a document when using hreflang.

Hreflang: Self link missing

Description

The crawler detected <link> tags with an hreflang attribute set, but none of them links to the current URL. Such a link to self is mandatory.

Example

Example for http://de.example.com/

<head>
    <link href="http://en.example.com/" hreflang="en" rel="alternate">
    <link href="http://fr.example.com/" hreflang="fr" rel="alternate">
</head>

Correct implementation:

<head>
    <link href="http://en.example.com/" hreflang="en" rel="alternate">
    <link href="http://fr.example.com/" hreflang="fr" rel="alternate">
    <link href="http://de.example.com/" hreflang="de" rel="alternate">
</head>
Importance

If hreflang linking is incomplete or has errors, search engines may discard hreflang related information completely. This may lead to inappropriate URLs showing up in the localized search results.

Operating Instruction

Always add a link to self, since it is required. For more details see our guide on hreflang.
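
A check for the self link can be sketched by collecting all hreflang links of a document and comparing their targets with the URL of the document itself. The Python example below is an illustration with hypothetical names (`HreflangCollector`, `has_self_link`). It compares URLs as plain strings, which is an assumption; a real check would likely normalize the URLs first.

```python
from html.parser import HTMLParser


class HreflangCollector(HTMLParser):
    """Collects (href, hreflang) pairs from <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "alternate" in rel and attr.get("hreflang"):
            self.links.append((attr.get("href") or "", attr["hreflang"]))


def has_self_link(page_url, html):
    """Return True if any hreflang link points to page_url (exact match)."""
    parser = HreflangCollector()
    parser.feed(html)
    return any(href == page_url for href, _ in parser.links)
```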

Hreflang: URL empty or malformed

Description

The crawler detected <link> tags with an hreflang attribute set. However, the href attribute contains an empty or malformed URL.

Example

Empty href

<link href="" hreflang="de" rel="alternate">

Malformed href

<link href="htp://de.example.com/" hreflang="de" rel="alternate">
Importance

Malformed or empty href in hreflang links cause hreflang definitions to be invalid. Search engines may discard hreflang related information completely. This may lead to inappropriate URLs showing up in the localized search results.

Operating Instruction

We suggest checking for malformed or empty hreflang href values on a regular basis.

Hreflang: URL is not absolute

Description

If the hreflang element specifies a URL relative to the current URL, it is flagged with this hint.

This report shows all occurrences of hreflang usage with URLs that are not absolute.

Example

Absolute URL

<link href="http://de.example.com/page.html" hreflang="de" rel="alternate">

Short URL

<link href="page.html" hreflang="de" rel="alternate">

Short URL - root folder relative

<link href="/page.html" hreflang="de" rel="alternate">

Short URL - protocol relative

<link href="//de.example.com/page.html" hreflang="de" rel="alternate">
Importance

Using shortened URLs for hreflang links can lead to several kinds of duplicate content issues:

  • duplicate content issues with different protocol versions
  • duplicate content issues with different domains
  • duplicate content issues with different folders
Operating Instruction

We suggest using absolute URLs for hreflang links.
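
The four URL forms from the example can be told apart with Python's standard `urllib.parse.urlsplit`. The function name `classify_href` and the returned labels are hypothetical and only serve as an illustration.

```python
from urllib.parse import urlsplit


def classify_href(href):
    """Classify an hreflang href by how much of the URL it spells out."""
    parts = urlsplit(href)
    if parts.scheme and parts.netloc:
        return "absolute"
    if parts.netloc:
        # Starts with "//": the host is given, the protocol is inherited.
        return "protocol relative"
    if parts.path.startswith("/"):
        return "root relative"
    return "relative"
```

Only hrefs classified as "absolute" state protocol, domain and folder explicitly and therefore avoid the issues listed above.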

Robots: Specified more than once

Description

Robots directives for a single URL were specified more than once. Use this report to identify all instances of multiple robots definitions.

Examples

Robots meta tag in HTML <head>

<meta name="robots" content="index">
<meta name="robots" content="follow">

Robots directives in X-Robots-Tag and meta tag

HTTP/1.1 200 OK
Date: Fri, 16 Oct 2015 10:01:33 GMT
X-Robots-Tag: index, follow
...

<meta name="robots" content="index, follow">
Importance

More than one instance of robots directives can lead to conflicting definitions or to a directive being left out. This may result in a range of issues with privacy, indexing in general and crawl budget, depending on the situation.

Operating Instruction

Use only one way to specify the robots directive.

Robots: nofollow differs across specifications

Description

There is more than one source for robots directives, either a robots meta tag or an X-Robots-Tag header, and at least one specifies "nofollow" while another does not.

Examples

Robots meta tag in HTML <head>

<meta name="robots" content="index, nofollow">
<meta name="robots" content="index, follow">

Note: a more subtle way to produce this error is a conflict caused by omitting parts of the directive, as in:

<meta name="robots" content="index, nofollow">
<meta name="robots" content="index">

Robots directives in X-Robots-Tag and meta tag differ

HTTP/1.1 200 OK
Date: Fri, 16 Oct 2015 10:01:33 GMT
X-Robots-Tag: index, nofollow
...

<meta name="robots" content="index, follow">
Importance

The "nofollow" robots directive tells crawlers not to follow the links in a document. This can be used on purpose to prevent search engines from crawling the linked URLs.

Having conflicting definitions is inconclusive. Search engines will usually use the most restrictive directive they find. The Audisto Crawler adopts this behaviour.

Operating Instruction

Use only one way to specify the robots nofollow directive.
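
The "most restrictive directive wins" behaviour described above can be sketched as follows. This is an illustration under assumptions, not the Audisto implementation: the function names are hypothetical, each source (meta tag or X-Robots-Tag header) is passed in as a plain directive string, and only index/follow handling is covered. The same merge applies to "noindex".

```python
def parse_directives(value):
    """Split a robots value such as "index, nofollow" into lowercase tokens."""
    return {token.strip().lower() for token in value.split(",") if token.strip()}


def effective_robots(sources):
    """Merge all sources, letting the most restrictive directive win."""
    tokens = set()
    for value in sources:
        tokens |= parse_directives(value)
    if "none" in tokens:
        # "none" is shorthand for "noindex, nofollow".
        tokens |= {"noindex", "nofollow"}
    return {"index": "noindex" not in tokens, "follow": "nofollow" not in tokens}


def nofollow_conflict(sources):
    """Return True if some sources forbid following links and others do not."""
    flags = set()
    for value in sources:
        tokens = parse_directives(value)
        flags.add("nofollow" in tokens or "none" in tokens)
    return len(flags) > 1
```

For the first example above, `effective_robots(["index, nofollow", "index, follow"])` results in links not being followed, and `nofollow_conflict` reports the conflict, including the subtle case where one tag simply omits "follow".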

Robots: noindex differs across specifications

Description

There is more than one source for robots directives, either a robots meta tag or an X-Robots-Tag header. At least one specifies "noindex" while another does not.

Examples

Differing robots directives across specifications could look like this:

Robots Meta tag in HTML header

<meta name="robots" content="index, nofollow">
<meta name="robots" content="noindex, nofollow">

Note: a more subtle way to produce this error is a conflict caused by omitting parts of the directive, as in:

<meta name="robots" content="noindex, follow">
<meta name="robots" content="follow">

Robots directives in X-Robots-Tag and meta tag differ

HTTP/1.1 200 OK
Date: Fri, 16 Oct 2015 10:01:33 GMT
X-Robots-Tag: index, follow
...

<meta name="robots" content="noindex, follow">
Importance

The "noindex" robots directive tells crawlers not to index the current document.

Having conflicting definitions is inconclusive. Search engines will usually use the most restrictive directive they find. The Audisto Crawler adopts this behaviour.

Operating Instruction

Use only one way to specify the robots noindex directive.

Safe HTTPS URL loads unsafe resource

Description

If an HTTPS URL contains an unsafe resource that is loaded using HTTP, it is flagged with this hint.

Example

Example code for https://www.example.com/

<script type="text/javascript" src="http://www.example.com/file.js"></script>
Importance

All files that get loaded while opening a document over HTTPS, e.g. images, fonts, stylesheets and JavaScript files, should be requested over the HTTPS protocol as well. If elements are loaded using an unsafe HTTP connection, they might get compromised by a man-in-the-middle attack while being loaded. This can compromise the security of the SSL secured request.

If this happens, the increased risk will be reflected in the SSL symbol in all modern browsers. Instead of displaying a green SSL lock, it would be yellow, orange or red to highlight loading of unsafe resources.

Operating Instruction

In documents only available over HTTPS, you should only include files loaded via the HTTPS protocol.
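
A simple scan for resources requested over HTTP can be sketched in Python as shown below. It is an illustration only: the names `InsecureResourceCollector` and `insecure_resources` are hypothetical, and the list of tags treated as loaded resources is an assumption and far from complete (fonts and images referenced from CSS, for example, are not covered).

```python
from html.parser import HTMLParser

# Tag -> attribute holding the resource URL; an assumed, incomplete list.
RESOURCE_ATTRS = {"script": "src", "img": "src", "iframe": "src", "link": "href"}


class InsecureResourceCollector(HTMLParser):
    """Collects resource URLs that are requested over plain HTTP."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return
        attr = dict(attrs)
        # Only stylesheet links are loaded; other <link> types are skipped here.
        if tag == "link" and "stylesheet" not in (attr.get("rel") or "").lower().split():
            return
        url = (attr.get(wanted) or "").strip()
        if url.lower().startswith("http://"):
            self.insecure.append(url)


def insecure_resources(html):
    """Return all HTTP resource URLs found in a document served over HTTPS."""
    parser = InsecureResourceCollector()
    parser.feed(html)
    return parser.insecure
```

Plain hyperlinks (<a href>) are ignored on purpose, since following a link is a new request and not a resource loaded into the secure document.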

URL too long for some browsers

Description

If a URL longer than 2000 characters is encountered, it is flagged with this hint.

Example

Long URLs are often generated dynamically in scenarios like:

  • a form posts data from input fields or a textarea via GET-method to the form action URL
  • GET-parameters from complex filter combinations in faceted search
Importance

Long URLs might cause problems.

Some browsers are unable to handle URLs of this length. Some web applications might not be able to resolve the URLs and/or shorten them automatically, causing issues with access to these URLs.

Operating Instruction

While theoretically there is no limit on the length of a URL, you should stay below 2000 characters so that the URL remains accessible to a large number of clients and web applications.