Audisto HTML Markup Error Checker

How to detect issues with HTML on your website

Invalid HTML markup can lead to situations in which search engines fail to understand the site's structure as intended. It can also lead to issues with the rendering of the website in browsers.

Example: Audisto HTML Markup Error Check with the HTML Markup hint reports for the current crawl

Here is the list of all hints related to HTML markup errors that can be identified with the help of the Audisto Crawler.

Hints

<a> has both href and onclick attributes

Description

If a link with an href attribute and an onclick attribute is found, the URL is flagged with this hint.

Example

Link calling a JavaScript function with the onclick event:

<a href="http://example.com/page.html" onclick="alert('hello world');">Link</a>

Link calling a JavaScript redirect with the onclick event:

<a href="http://example.com/page.html" onclick="window.location.href='http://example.com/page1.html';">Link</a>
Importance

The onclick attribute defines a JavaScript action to happen when the "onclick" event for the link is triggered, i.e. the user clicks the link.

This may lead to unexpected behaviour and user experience issues for users with and without JavaScript activated.

Be aware that modern search engines like Google follow JavaScript links like these. If the JavaScript redirect leads to a different target than the HTML link, the search engine might start to mistrust the links.

Operating Instruction

We suggest that you check instances of onclick attributes in HTML links and decide if the onclick usage is required. Remove any onclick attributes that are not needed.
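To review such links outside of a crawl, a small script can list them. The following is a minimal sketch using Python's standard `html.parser` module; the class and function names are hypothetical, and it is not how the Audisto Crawler itself detects this hint.

```python
from html.parser import HTMLParser


class OnclickLinkFinder(HTMLParser):
    """Collects <a> tags that carry both an href and an onclick attribute."""

    def __init__(self):
        super().__init__()
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag != "a":
            return
        names = {name for name, _ in attrs}
        if "href" in names and "onclick" in names:
            self.flagged.append(dict(attrs))


def find_onclick_links(html):
    """Return the attributes of every link that has both href and onclick."""
    finder = OnclickLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.flagged
```

Each returned dictionary contains the href and the onclick value, so you can compare the HTML target with what the script does.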

<a> has malformed href

Description

If a malformed href attribute value is found, the URL is flagged with this hint. A malformed href is usually a URI that is not valid according to RFC 3986, or the result of a parsing error due to invalid HTML.

Examples
<a href="htp://www.example.com">link</a>
<a href="htps://www.example..com">link</a>
<a href="http://www..example.com">link</a>
<a href="http://">link</a>
<a href="htps:// www.example.com">link</a>
<a href="://www.example.com">link</a>
Importance

A link with a malformed href cannot be parsed and will therefore not be recognized by search engines. In addition, links like this can result in issues with user experience.

Operating Instruction

Fix all malformed href attribute values on your website.

<a> has no content

Description

If a link without content is found, the URL is flagged with this hint.

Example
<a href="http://example.com/"></a>
Importance

If an anchor tag is empty, it is not clickable for the user. This might cause issues with the user experience. In addition, links without content can't pass anchor text to the target URL.

Operating Instruction

We suggest that you evaluate all anchor tags that have no content and fix all occurrences that are not intended as internal anchors.

<a> href attribute has leading or trailing whitespace characters

Description

If leading or trailing whitespace is discovered in the href attribute of an HTML anchor tag, the linking URL is flagged with this hint. Discover all URLs that link to other documents with leading or trailing whitespace.

Examples
<a href=" http://www.example.com">link</a>

<a href="http://www.example.com/ ">link</a>

<a href=" http://www.example.com/ ">link</a>
Importance

Leading and trailing whitespace in an href attribute usually gets trimmed by browsers. Nonetheless, it is better to remove it. It may also indicate a problem with the code that generates the site.

Operating Instruction

We suggest removing unnecessary leading or trailing whitespace. Reviewing the code that generates the link might be necessary.
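When reviewing the code that generates links, a simple check on the raw attribute value is usually enough. The following is a minimal sketch in Python; the function name `href_whitespace_issue` is hypothetical and the check operates on href values you have already extracted.

```python
def href_whitespace_issue(href):
    """Return 'leading', 'trailing', 'both' or None for a raw href value."""
    leading = href != href.lstrip()
    trailing = href != href.rstrip()
    if leading and trailing:
        return "both"
    if leading:
        return "leading"
    if trailing:
        return "trailing"
    return None
```

Applied to the three examples above, the function reports leading, trailing and both, in that order.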

<a> link uses an unknown protocol

Description

An anchor's href uses a protocol that is unknown to our crawler. This may be caused by a misspelling, e.g. httpa:// instead of https://, but it may also be a valid link.

Examples
<a href="httpa://example.com">link</a>

<a href="irc://example.com">link</a>

<a href="whatsapp://example.com">link</a>
Importance

If an unknown or invalid protocol is used, the href attribute will be considered invalid by search engines. This may lead to a multitude of follow up problems like crawling issues, ranking issues and user experience issues.

Operating Instruction

We suggest that you evaluate and fix all seemingly invalid protocols used in links on your site.
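To separate misspellings from intentional schemes, you can compare each href's scheme against a list of schemes you expect on your site. The following is a minimal sketch in Python. The allow-list `KNOWN_SCHEMES` is an assumption made for illustration and is not the list the Audisto Crawler uses.

```python
from urllib.parse import urlsplit

# Assumption: an illustrative allow-list, not the crawler's actual list.
KNOWN_SCHEMES = {"http", "https", "mailto", "tel", "ftp"}


def has_unknown_scheme(href):
    """Return True if the href names a scheme that is not in KNOWN_SCHEMES."""
    # urlsplit returns the scheme in lower case, or '' for relative links.
    scheme = urlsplit(href.strip()).scheme
    return bool(scheme) and scheme not in KNOWN_SCHEMES
```

Schemes such as irc or whatsapp are reported by this sketch as well; whether they are valid for your site is a decision you have to make per link.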

<a> links to fragment by name attribute on non-anchor

Description

The anchor contains a fragment link, but the target is defined by a name attribute and is not an anchor itself.

Example
<h1 name="top">Headline</h1>
...
<a href="#top">Go to top</a>
Importance

Using the name attribute on non-anchor tags is not valid. Using the name attribute is only allowed for anchor tags. In addition, the name attribute has been deprecated since XHTML 1.0 and should not be used any more.

Operating Instruction

We suggest removing all instances of the name attribute. You should use the id attribute instead.

<a> links to fragment by name attribute, not id

Description

The anchor contains a fragment link, like #top, but the target is defined by a name attribute. The name attribute is deprecated and won't be supported in upcoming versions of HTML.

Examples
<span name="top"></span>
...
<a href="#top">link</a>

<a name="top"></a>
...
<a href="#top">link</a>
Importance

The name attribute has been deprecated since XHTML 1.0 and should not be used anymore. In addition, it was only allowed on anchor tags.

Operating Instruction

We suggest removing all instances of the name attribute. You should use the id attribute instead.

<a> links to fragment that was not found

Description

The anchor contains a fragment link, like #content, but there was no corresponding id or name attribute found in the document.

This report helps to identify all URLs that contain fragment-only links, where the fragment is not present in the document.

Note that the fragment #top is always defined as pointing to the top of the document, even if there is no element with a name or id attribute of "top". It therefore never triggers this hint.

Example
<a href="#content">link to fragment that does not exist</a>
Importance

If there are anchors to internal fragment-only links that are not present in the document, this may hurt the user experience and might result in worse user signals.

Operating Instruction

We suggest that you fix all occurrences of broken fragment-only links.
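The check described above can be approximated for a single document. The following is a minimal sketch using Python's standard `html.parser` module; the names `FragmentCollector` and `missing_fragments` are hypothetical, and the sketch only looks at fragment-only hrefs on anchor tags.

```python
from html.parser import HTMLParser


class FragmentCollector(HTMLParser):
    """Collects id/name targets and fragment-only links of a document."""

    def __init__(self):
        super().__init__()
        self.targets = set()   # values of id and name attributes
        self.fragments = []    # fragment-only hrefs, without the '#'

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for key in ("id", "name"):
            if attrs.get(key):
                self.targets.add(attrs[key])
        href = attrs.get("href") if tag == "a" else None
        if href and href.startswith("#") and len(href) > 1:
            self.fragments.append(href[1:])


def missing_fragments(html):
    """Return fragments that are linked to but not defined in the document."""
    collector = FragmentCollector()
    collector.feed(html)
    collector.close()
    # '#top' always points to the top of the document, so it is never reported.
    return [f for f in collector.fragments
            if f not in collector.targets and f != "top"]
```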

<base> contains malformed or empty href

Description

A <base> tag was found, but its href attribute contains an invalid URL, or a URL that is neither HTTP nor HTTPS. The crawler falls back to using the document's URL as the base.

Examples

A base with an invalid protocol:

<base href="htp://example.com">

A base with a white space in the domain name:

<base href="http:// example.com">
Importance

The base tag defines the URL base for all relative links in the document. Using a malformed URL as base href can cause issues with crawling and accessing of relative links.

Using the base tag adds more complexity when parsing relative links. Poorly programmed crawlers might not understand the base tag at all and therefore show unexpected behaviour.

Operating Instruction

We suggest that you not use the HTML base tag at all. Remove it if possible.

Note: If there are changes related to the base tag, all relative links in the document need to be checked and probably corrected.

<base> found more than once and differs

Description

More than one <base> directive was found, and their href attribute values differ.

Examples
<base href="http://example.com/">
<base href="http://example.com/folder/">

Note: The following base directives resolve to the same URL and would therefore not trigger this hint:

Base directives on http://example.com/page.html

<base href="http://example.com/">
<base href="/">
Importance

The base tag defines the URL base for all relative links in a document. Having more than one base tag is invalid. This may result in issues with relative links that might impact search engines and the user experience on the website.

Operating Instruction

We suggest that you not use the HTML base tag at all. Remove it if possible. If the base tag is removed, all relative links in the document need to be checked and probably corrected.
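Whether two base directives differ depends on the URLs they resolve to, not on the literal attribute values. The following is a minimal sketch of that comparison using `urljoin` from Python's standard library; the function names are hypothetical and the sketch makes no claim about the crawler's internal logic.

```python
from urllib.parse import urljoin


def resolved_bases(document_url, base_hrefs):
    """Resolve each <base> href against the URL of the document."""
    return [urljoin(document_url, href) for href in base_hrefs]


def bases_differ(document_url, base_hrefs):
    """Return True if the base hrefs resolve to more than one distinct URL."""
    return len(set(resolved_bases(document_url, base_hrefs))) > 1
```

For the document http://example.com/page.html, the values "http://example.com/" and "/" resolve to the same URL, while "http://example.com/" and "http://example.com/folder/" do not.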

<base> href contains a path only

Description

The <base> tag's href attribute contains a path, not an absolute URL. While this is technically allowed, it is not supported by Internet Explorer as of version 8.

Examples

Relative path by mistake:

<base href="example.com/">

Relative path on purpose:

<base href="/folder/">
Importance

The base tag defines the URL base for all relative links in the document. Mistakes in usage of the base tag might lead to issues with crawling when using relative links in the document. They will also result in issues with Internet Explorer as of version 8.

Operating Instruction

We suggest that you not use the HTML base tag at all. Remove it if possible.

Note: If you make changes to the base tag, all relative links in the document need to be checked and probably corrected.

<base> is same as URL

Description

A <base> tag was found, but it points to the same URL, thereby rendering itself useless.

Example

HTML base on http://example.com/page.html

<base href="http://example.com/page.html">
Importance

The base tag defines the URL base for relative links in a document. There is no point in using the current URL as base href as it is the same as if the base tag isn't used at all, rendering it useless.

Operating Instruction

We suggest that you not use the HTML base tag at all. Remove it if possible.

Note: If the base tag is removed, all relative links in the document need to be checked and probably corrected.

<base> occurs more than once

Description

More than one <base> tag was found. The Audisto Crawler uses the first valid annotation found for link resolving. Use this report to find all URLs on the crawled website that contain more than one <base> tag.

Example
<head>
...
<base href="http://example.com/">
<base href="http://example.com/">
...
</head>
Importance

The base tag defines the URL base for all relative links in the document. Having more than one base tag is invalid. This may result in issues with relative links that might impact search engines and user experience on the website.

Operating Instruction

We suggest that you not use the HTML base tag at all. Remove it if possible. If there are changes regarding the base tag, all relative links in the document need to be checked and probably corrected.

<base> points to other URL

Description

A <base> tag was found and it points to another URL.

Example

Base on http://example.com/page.html:

<base href="http://example.com/page2.html">
Importance

The base tag defines the URL base for relative links in a document. When a base tag is used together with fragment-only links, those links point to the specific anchor on the URL of the base tag instead of the current document.

If this is unintended, it leads to issues with the user experience and crawling.

Operating Instruction

We suggest that you not use the HTML base tag at all. Remove it if possible.

Note: If the base tag is removed, all relative links in the document need to be checked and probably corrected.

<h1> not found

Description

If the HTML does not contain an h1 heading, the URL is flagged with this hint. Discover all HTML documents on the crawled website that are missing a proper h1 definition.

Example

Example of a proper h1 heading:

<h1>Heading</h1>
Importance

Content structure partly determines content quality. Headings add to structure in a document. Missing headings are indicators of poorly structured content and therefore indicate lower content quality. This may lead to lower rankings and less user interaction. The h1 is the most important heading in a document. The h1 usually corresponds to the document title.

Operating Instruction

You might want to add an h1 to your HTML.

<h1> occurs more than once

Description

If more than one <h1> tag is found, the URL is flagged with this hint. Discover all URLs that contain more than one <h1> tag.

Examples
<h1><img src="logo.jpg" alt="Example.com"/></h1>
...
<h1>Primary Headline</h1>
Importance

The h1 is the most important heading in the document and should reflect the topic of the document. Having more than one <h1> tag is a sign of poor content structure. Content structure partly determines the content quality. While it is not a huge factor for most search engines, having more than one <h1> tag may be a negative signal in terms of content quality.

Operating Instruction

You might want to use only one <h1> tag per document.

<h1>-<h6> chain broken

Description

If the HTML contains a broken <h1>-<h6> chain, the URL is flagged with this hint. Discover all URLs on the crawled website that have a broken heading chain.

Examples
<h1>heading</h1>
...
<h3>heading</h3>
...
<h4>heading</h4>
Importance

A heading chain is considered broken when it is not hierarchically strict, for example when an <h1> is followed directly by an <h3> without an <h2> in between. A broken heading chain can therefore be an indicator of poor content structure. This is important because content structure partly determines content quality. While this is a minor factor for most search engines, positive content structure signals may add to the ranking of a document.

Operating Instruction

We suggest that you keep a strictly hierarchical headline chain. Evaluate the occurrences of broken heading chains and consider adding the missing headings.
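To find the exact place where a chain breaks, you can list the heading levels in document order and look for skipped levels. The following is a minimal sketch in Python. It assumes that "broken" means a heading that is more than one level deeper than the heading before it; the names `HeadingCollector` and `broken_steps` are hypothetical.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Records the levels of <h1> to <h6> tags in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def broken_steps(html):
    """Return (previous, current) level pairs where a level is skipped."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    levels = collector.levels
    # Assumption: moving back up (e.g. h3 -> h2) is fine, skipping down is not.
    return [(prev, cur) for prev, cur in zip(levels, levels[1:]) if cur > prev + 1]
```

For the example above, the sketch reports the step from h1 to h3 as the break.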

<html> contains too many uncommon non-printable characters

Description

The HTML document contains too many uncommon non-printable characters, and not all will be shown in live analysis. With this report you can discover all URLs on the crawled website that contain more than 50 uncommon non-printable characters. See the corresponding hint "<html> contains uncommon non-printable characters" for further information about what is "uncommon".

Importance

Non-printable characters are used as control characters and may not be visible in the source code, but nonetheless impact the behavior of the site. This might affect crawling and the user experience when they are inside of an anchor's href or an image's src attribute, possibly resulting in issues with the site's structure and ranking.

Finding too many non-printable characters may be an indication of massive encoding issues in a document, or of documents that are not HTML documents.

Operating Instruction

Non-printable characters generally should be encoded as HTML entities and removed whenever possible. If validating transferred data in an application, the validation should check for non-printable characters and probably remove them.

<html> contains uncommon non-printable characters

Description

If uncommon non-printable characters are detected, the URL of the document containing the character is flagged.

There are non-printable characters that will appear in almost every document, e.g. line feed (\n), carriage return (\r) and horizontal tab (\t). In addition, there are commonly used non-printable characters, e.g. BOM, soft hyphen, Left-To-Right-Mark and Right-To-Left-Mark. These characters do not cause the URL to get flagged with this hint. This hint detects all other non-printable characters.

Examples

Due to the non-printable nature of these characters, you'll find the character codes instead of the actual characters in the live analysis, enclosed by brackets:

[[&#x2060;]]
Importance

Non-printable characters may not be visible in the source code, but nonetheless impact:

  • the behaviour of the site, e.g. when they are inside of an anchor's href or an image's src attribute
  • the ranking of the site, e.g. when they are an invisible part of a word

This might affect crawling and user experience, possibly resulting in issues with accessibility and ranking.

Usually this hint is triggered by problematic encoding.

Operating Instruction

Non-printable characters should generally be encoded as HTML entities and removed whenever possible. If validating transferred data in an application, the validation should check for non-printable characters and probably remove them.
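A validation step like the one suggested above could look as follows. This is a minimal sketch in Python. It assumes that "non-printable" can be approximated by the Unicode categories Cc (control) and Cf (format), which may not match the crawler's definition exactly; the names `COMMON` and `uncommon_non_printable` are hypothetical.

```python
import unicodedata

# Characters the text above treats as common: line feed, carriage return,
# horizontal tab, BOM, soft hyphen, Left-To-Right-Mark, Right-To-Left-Mark.
COMMON = {"\n", "\r", "\t", "\ufeff", "\u00ad", "\u200e", "\u200f"}


def uncommon_non_printable(text):
    """Return (index, 'U+XXXX') for each uncommon control or format character."""
    found = []
    for index, char in enumerate(text):
        # Assumption: categories Cc and Cf approximate "non-printable".
        if char not in COMMON and unicodedata.category(char) in ("Cc", "Cf"):
            found.append((index, f"U+{ord(char):04X}"))
    return found
```

The positions in the result make it easier to locate characters that are invisible in an editor.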

<html> contains unencoded Left-To-Right-Mark or Right-To-Left-Mark

Description

A Left-To-Right or Right-To-Left-Mark was found, but it is unescaped. Discover all URLs that contain an unescaped Left-To-Right-Mark or an unescaped Right-To-Left-Mark.

Examples
| Character Name | Detected Character | HTML Entity (named) | HTML Entity (decimal) | HTML Entity (hex) |
| Left-To-Right-Mark | U+200E | &lrm; | &#8206; | &#x200e; |
| Right-To-Left-Mark | U+200F | &rlm; | &#8207; | &#x200f; |
Importance

The Left-To-Right-Mark and Right-To-Left-Mark are non-printable characters used for typesetting bi-directional text. They are not visible and, if used without being properly escaped, may lead to a range of unexpected problems that are hard to track down due to the invisible nature of these characters:

  • Issues with the appearance of the website
  • Issues with characters ending up being used in a URL
Operating Instruction

If unencoded Left-To-Right or Right-To-Left-Marks are discovered, escape them or remove them completely. If the functionality is required, switch to a CSS solution whenever possible.

If escaping the characters, the named HTML entities (&lrm; and &rlm;) are recommended over decimal or hex HTML entities.

<html> contains unencoded soft hyphen (SHY)

Description

If an unescaped soft hyphen is found, the URL is flagged with this hint. Discover all URLs on the crawled website that contain unencoded soft hyphens.

Examples
| Character Name | Detected Character | HTML Entity (named) | HTML Entity (decimal) | HTML Entity (hex) |
| soft hyphen | U+00AD | &shy; | &#173; | &#xad; |
Importance

The unencoded soft hyphen is a character that is used for hyphenation of words. The soft hyphen is only visible if the word needs to be hyphenated on a line break. This characteristic can lead to:

  • unexpected hyphenation
  • hard to track down issues with the appearance of the website
  • the character ending up in a URL
Operating Instruction

If unencoded soft hyphens are discovered, escape them or remove them completely if they are not required.

If escaping the characters, using the named HTML entity (&shy;) is recommended over the decimal or hex HTML entity.

<html> starts with BOM

Description

There is a Unicode byte order mark (BOM) at the top of the HTML. Discover all URLs on the crawled website that contain a BOM.

We currently detect the BOM in the following encodings:

  • UTF-8
  • UTF-16 BE/LE
  • UTF-32 BE/LE
  • UTF-7
  • UTF-1
  • UTF-EBCDIC
  • SCSU
  • BOCU-1
  • GB-18030
Examples

Example UTF-8 BOM in HTML 5:

EF BB BF<!DOCTYPE html>
<html lang="en">

How the BOM looks in different encodings and representations:

| Encoding | BOM hex | BOM dec |
| UTF-8 | EF BB BF | 239 187 191 |
| UTF-16 (BE) | FE FF | 254 255 |
| UTF-16 (LE) | FF FE | 255 254 |
| UTF-32 (BE) | 00 00 FE FF | 0 0 254 255 |
| UTF-32 (LE) | FF FE 00 00 | 255 254 0 0 |
| UTF-7 | 2B 2F 76 38 | 43 47 118 56 |
| UTF-7 | 2B 2F 76 39 | 43 47 118 57 |
| UTF-7 | 2B 2F 76 2B | 43 47 118 43 |
| UTF-7 | 2B 2F 76 2F | 43 47 118 47 |
| UTF-7 | 2B 2F 76 38 2D | 43 47 118 56 45 |
| UTF-1 | F7 64 4C | 247 100 76 |
| UTF-EBCDIC | DD 73 66 73 | 221 115 102 115 |
| SCSU | 0E FE FF | 14 254 255 |
| BOCU-1 | FB EE 28 | 251 238 40 |
| GB-18030 | 84 31 95 33 | 132 49 149 51 |
Importance

The Unicode BOM is the Unicode character U+FEFF. Some text editors add it to documents. The BOM is used to signal:

  • the byte order, or endianness
  • the fact that the text is Unicode
  • the specific Unicode encoding

Having a Unicode BOM at the top of the HTML is valid but might result in problems with third-party software. As of HTML5, a BOM is supposed to override the charset definition from the HTTP header. If the BOM is used for charsets that are not Unicode, this might lead to encoding problems. Encoding problems may lead to issues with the appearance of the site in browsers and search engines and therefore lead to issues with the user experience.

Operating Instruction

You should consider removing the BOM and specifying the encoding in the HTTP header or as a meta tag in the HTML <head>.
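To check a single file or response body for a BOM, you can compare its first bytes with the byte sequences listed in the table above. The following is a minimal sketch in Python that covers only the UTF-8, UTF-16 and UTF-32 variants, using the constants from the standard `codecs` module; the names `BOMS` and `detect_bom` are hypothetical.

```python
import codecs

# UTF-32 is tested before UTF-16 because the UTF-32 (LE) BOM (FF FE 00 00)
# starts with the same two bytes as the UTF-16 (LE) BOM (FF FE).
BOMS = [
    ("UTF-32 (BE)", codecs.BOM_UTF32_BE),
    ("UTF-32 (LE)", codecs.BOM_UTF32_LE),
    ("UTF-8", codecs.BOM_UTF8),
    ("UTF-16 (BE)", codecs.BOM_UTF16_BE),
    ("UTF-16 (LE)", codecs.BOM_UTF16_LE),
]


def detect_bom(raw):
    """Return the encoding name of a leading BOM in raw bytes, or None."""
    for name, bom in BOMS:
        if raw.startswith(bom):
            return name
    return None
```

The function expects the undecoded bytes of the document, since most decoders silently drop or transform the BOM.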

<img> alt attribute exists but empty

Description

If an image with an empty alt attribute is found, the URL is flagged with this hint.

Example
<img src="image.jpg" alt="" />
Importance

The alt attribute defines the alternative information that will be shown if the image file fails to load.

The alt attribute is one of the factors that is used by search engines to determine the topic of the image. It also provides an alternative for blind users. By supplying a proper alt attribute, you not only help search engines understand your site and rank your images, but also add to accessibility for disabled users.

Operating Instruction

If the image is not of a decorative nature and contains information, you should consider adding a value for the alt attribute.

If the image is of a decorative nature (e.g. spacer images, backgrounds, style enhancements with images in general) or functional nature (e.g. a tracking pixel), you should consider using a more advanced technical solution like CSS or JavaScript.

<img> has no alt attribute

Description

If an image without an alt attribute is found, the URL is flagged with this hint. This report helps to identify all missing alt attributes.

Example
<img src="file.jpg"/>
Importance

In terms of HTML validation, alt attributes are required for images. The alt attribute defines the alternative information that will be shown if the image file fails to load.

The alt attribute is one of the factors that is used by search engines to determine the topic of the image. It also provides an alternative for blind users. By supplying a proper alt attribute, you not only help search engines understand your site, but also enhance accessibility for disabled users.

Operating Instruction

Add alt attributes in all cases where they are missing.

Use a descriptive alt attribute for images that contain information. You may use an empty alt attribute if the images are only for decoration.

<link rel=canonical> URL is not absolute

Description

If the canonical element specifies a URL relative to the document's URL, the document's URL is flagged with this hint.

This report shows all occurrences of canonical usage with URLs that are not absolute.

Examples

Absolute URL:

<link rel="canonical" href="http://example.com/folder/page.html">

Short URL:

<link rel="canonical" href="page.html">

Short URL - root folder relative:

<link rel="canonical" href="/folder/page.html">

Short URL - protocol relative:

<link rel="canonical" href="//example.com/folder/page.html">
Importance

Using shortened URLs for canonical links can lead to several kinds of duplicate content issues:

  • duplicate content issues with different protocol versions
  • duplicate content issues with different domains
  • duplicate content issues with different folders
Operating Instruction

We suggest that you use absolute URLs for canonical links.
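The duplicate content risk comes from the fact that a relative canonical is resolved against whatever URL the document was served from. The following is a minimal sketch in Python that illustrates this with the standard `urllib.parse` module; the function names are hypothetical and the www.example.com URLs in the usage note are made up for illustration.

```python
from urllib.parse import urljoin, urlsplit


def is_absolute(href):
    """Return True if the href contains both a scheme and a host."""
    parts = urlsplit(href)
    return bool(parts.scheme) and bool(parts.netloc)


def resolve_canonical(document_url, href):
    """Resolve a canonical href the way a relative link would be resolved."""
    return urljoin(document_url, href)
```

For a document served at https://www.example.com/shop/item.html, the href "page.html" resolves to https://www.example.com/shop/page.html, and the protocol-relative href "//example.com/folder/page.html" resolves to https://example.com/folder/page.html. The same markup served over another protocol, domain or folder yields a different canonical target.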

<link rel=canonical> contains malformed or empty href

Description

This hint identifies all occurrences of canonical elements that contain an empty or invalid target URL.

Examples

Empty canonical:

<link rel="canonical" href="">

Malformed canonical:

<link rel="canonical" href="htp://example.com/">
Importance

Malformed or empty hrefs in canonical links cause canonical definitions to be invalid and can cause issues with duplicate content when a document is available on more than one URL.

Operating Instruction

We suggest that you check for malformed or empty canonical hrefs on a regular basis.

<link rel=canonical> found outside <head>

Description

A canonical element was placed outside of the <head> section and so search engines will ignore it.

This report helps you to identify all occurrences of canonical definitions that are invalid due to being placed outside the <head> tag on the crawled website.

Example
<html>
  <head>
    ...
  </head>
  <body>
    ...
    <link rel="canonical" href="http://example.com/">
    ...
  </body>
</html>
Importance

Some search engines ignore improper canonical definitions. If canonical definitions get ignored by search engines, this might cause issues with duplicate content and with the representation of the site in search results.

Operating Instruction

Keep your canonical definitions inside the HTML <head> tag, so they don't get ignored by search engines.

<link rel=canonical> found twice

Description

More than one canonical element was found, either as a <link> tag with rel="canonical" or as a corresponding Link header.

This report shows all URLs with double canonical definitions on your website that we were able to identify.

Examples

HTML Head:

<link rel="canonical" href="http://example.com/">
<link rel="canonical" href="http://example.com/">

HTTP Header:

Link: <http://example.com/>; rel="canonical"
Link: <http://example.com/>; rel="canonical"

HTML Head & HTTP Header:

<link rel="canonical" href="http://example.com/">

Link: <http://example.com/>; rel="canonical"
Importance

Using more than one canonical link element can cause conflicting definitions or unexpected behaviour when documents are available on more than one URL at a time.

Operating Instruction

We suggest that you identify all URLs that have more than one canonical link element defined. We also suggest looking for the reason behind the double definition, as this problem usually can be traced back to third-party code (plugins, extensions and add-ons of the CMS).

If canonical definitions are found twice in a document, this often occurs due to usage of multiple SEO plugins or an SEO plugin in combination with manual canonical definitions.

<link rel=canonical> found twice and differs

Description

More than one canonical element has been found, either as a <link> tag with rel="canonical" or as a corresponding Link header. Additionally, they specify different targets.

This report allows you to identify all occurrences of double canonical definitions with conflicting target URLs.

Examples

HTML Head:

<link rel="canonical" href="http://example.com/">
<link rel="canonical" href="http://example.com/page1.html">

HTTP Header:

Link: <http://example.com/>; rel="canonical"
Link: <http://example.com/page1.html>; rel="canonical"

HTML Head & HTTP Header:

<link rel="canonical" href="http://example.com/">

Link: <http://example.com/page1.html>; rel="canonical"
Importance

Having more than one canonical link element with different target URLs in a document can cause search engines to ignore the canonical definitions. This might lead to issues with duplicate content.

Operating Instruction

We suggest that you correct all conflicting canonical definitions by removing the unnecessary definition.

<link> found outside <head>

Description

A <link> tag was placed outside of the <head> section where it may have no effect. Discover all HTML documents on the crawled website that contain link tags outside the HTML head area.

Examples

What we discover:

<html>
<head>
...
</head>
<body>
...
<link rel="stylesheet" type="text/css" href="style.css">
...
</body>
</html>

How it should be:

<html>
<head>
...
<link rel="stylesheet" type="text/css" href="style.css">
...
</head>
<body>
...
</body>
</html>
Importance

Placing the link tag outside the HTML <head> is not valid. This may lead to unexpected behaviour or appearance of the website with some clients. Even though modern browsers are using a range of methods to autocorrect this type of common issue, it is not suggested to rely on the browser's ability to guess what the webmaster intended to achieve.

If the link tag is used to reference an external stylesheet file, browsers use the information from the linked CSS file to render the site based on that information. If stylesheets have to be processed in the middle, or at the end of the document, the browser will have to re-render the entire document based on the given changes. This can lead to a drop in performance and user experience.

If a canonical link element (<link rel="canonical" href="http://www.example.org" />) is used outside of the HTML <head>, it will be ignored by search engines. This might lead to issues with duplicate content, e.g. unexpected behaviour of the website in search results and ranking problems.

Operating Instruction

We suggest that you move misplaced link tags to the <head>.

<meta description> missing or empty

Description

If the meta description is missing or empty, the URL is flagged with this hint. Use this report to identify all URLs that are missing a proper meta description.

Example
<html>
<head>
...
<meta name="description" content="">
...
Importance

The meta description is usually the first choice for the description text that appears in search result snippets. If the meta description is missing, you give up control over the appearance of your pages in search results. Search engines will instead use parts of the page content as a description, which might lead to unexpected appearance of a site's snippets in search results.

Operating Instruction

We suggest that you use proper meta descriptions for all pages that are supposed to be indexed by search engines.

<meta description> occurs more than once

Description

If a meta description is found more than once in the HTML, the URL is flagged with this hint.

Example
<html>
<head>
...
<meta name="description" content="First meta description">
<meta name="description" content="Second meta description">
...
Importance

Having more than one meta description can lead to unpredictable display of the document in search results. This may result in lower user engagement and therefore a drop in user signals for your website. This may eventually hurt the rankings of your website.

Operating Instruction

There should only be one meta description for a document.

Error scenarios like this usually appear due to different software automatically adding meta descriptions. If this issue occurs on a large scale, check if there is a script or CMS plugin automatically adding meta descriptions to your webpages.

<meta description> too long for Google snippet

Description

If the meta description is too long to be displayed in the snippet in search results, the URL is flagged with this hint.

Example
<html>
<head>
...
<meta name="description" content="Example.com - The very best long meta descriptions online - We have one of the longest meta descriptions on the internet.">
...
Importance

If the meta description is too long to be displayed in search result snippets, it will be shortened by the search engine. This usually results in less appealing snippets and lower user engagement. Lower user engagement might lead to negative user signals, which can be regarded as a ranking factor by modern search engines and eventually lead to worse rankings of the site in search results.

Operating Instruction

We suggest that you analyze the click-through rates from search results to URLs that have a meta description which is too long to be shown properly in search snippets. If a URL flagged with this hint performs poorly in terms of click-through rate, you may want to consider shortening the meta description.

<meta keywords> found

Description

If the HTML contains meta keywords, the URL is flagged with this hint. Discover all HTML documents on the crawled website that contain a meta keywords tag.

Example
<meta name="keywords" content="foo, bar" />
Importance

Meta keywords were interpreted by search engines in the early days of search technology. If a webmaster added meta keywords, it helped the search engines to determine the topical focus of a document with limited processing resources. However, major search engines don't use the meta keywords any more.

Today, keywords will have no positive impact on ranking in search. In fact, they only add to the size of the document.

Operating Instruction

We suggest that you remove the meta keywords tags from your site unless they are required for a specific purpose other than SEO.

<meta language> found

Description

A meta tag <meta name=language> was found. Contrary to common belief, this does not define the language of the document. In fact, it is not even defined in any standard.

Example
<html>
<head>
    <meta name="language" content="de-DE">
    ...
</head>

Expected behaviour: Document language is detected as de-DE.

Actual behaviour: Document language is detected as browser's default language.

Correct implementation:

<html lang="de-DE">
<head>
    ...
</head>
Importance

Correct language settings can be crucial for localized content, since it allows search engines to display the best results for the matching audience.

Operating Instruction

We suggest that you do not use a <meta name=language> element at all. Instead, use the lang attribute in the <html> tag. For more details, see W3C's advisory on language settings.

<meta refresh> found

Description

If a meta refresh is found, the URL is flagged with this hint. Discover all HTML documents on the crawled website that contain a meta refresh.

Example
<meta http-equiv="refresh" content="5; URL=http://www.example.com/">
Importance

A meta refresh is a client side redirect that triggers a GET request after a given amount of time. It is sometimes used to automatically forward users to another URL. This method has been used widely to manipulate search engines, so search engines might misinterpret usage of a meta refresh as a sneaky redirect. A meta refresh with a delay of more than one second also violates the Web Content Accessibility Guidelines. Using a meta refresh might result in a poor user experience as well as issues with rankings in search engines.

Operating Instruction

You might consider replacing meta refresh with an HTTP-redirect using a 301 or 302 status code, or even a JavaScript client-side redirect, depending on the requirements.
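
To decide which replacement fits, it helps to know the delay and target of each meta refresh. The following is a minimal sketch in Python of how the content attribute could be split into those two parts. The function name parse_meta_refresh and the pattern are assumptions made for illustration; they are not part of the Audisto Crawler.

```python
import re

# Hypothetical pattern: "<seconds>", optionally followed by ";" or ","
# and an optional "url=" prefix before the target.
REFRESH_PATTERN = re.compile(
    r"^\s*(\d+(?:\.\d*)?)\s*(?:[;,]\s*(?:url\s*=\s*)?(.*?))?\s*$",
    re.IGNORECASE,
)


def parse_meta_refresh(content):
    """Return (delay_in_seconds, target_or_None), or None if not parseable."""
    match = REFRESH_PATTERN.match(content)
    if match is None:
        return None
    delay = float(match.group(1))
    # Strip optional quotes around the target; an empty target becomes None.
    target = (match.group(2) or "").strip("'\" ") or None
    return delay, target
```

A result with a target and a delay of zero behaves like an immediate redirect, which is the case most easily replaced by an HTTP redirect.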

<meta> found outside <head>

Description

A <meta> tag was placed outside of the <head> section where it may have no effect. Discover all HTML documents on the crawled website that contain meta tags outside the HTML head area.

Examples

What we discover:

<html>
<head>
...
</head>
<body>
...
<meta name="description" content="foo">
...
</body>
</html>

How it should be:

<html>
<head>
...
<meta name="description" content="foo">
...
</head>
<body>
...
</body>
</html>
Importance

Placing the meta tag outside of the HTML <head> is not valid unless:

  • HTML5 is used and
  • an itemprop attribute is used

So, using a meta tag outside of the head area is only viable if it is used to specify structured data properties. Alternatively, the itemprop attribute can be defined in other tags as well, e.g. <span>, <p>, <img>, which would offer backwards compatibility to HTML versions below HTML5.

If meta tags are not used for structured data, i.e. no itemprop attribute, they are required to be in the <head> to be considered valid. If they are placed outside of the <head> they might just get ignored by search engines.

Operating Instruction

If you find meta tags that need to be in the <head>, you should move them there. In cases where the discovered meta tags are used for structured data, we suggest that you assign the itemprop attributes to other elements in the markup. For HTML versions below HTML5, this is a requirement for valid code.
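
The rule described above (a meta tag outside the <head> is only acceptable when it carries an itemprop attribute) can be sketched in Python as follows. This is an illustration under assumptions: the names MisplacedMetaFinder and find_misplaced_meta are hypothetical, and the standard library's html.parser reports tags as written instead of building a browser-style document tree.

```python
from html.parser import HTMLParser


class MisplacedMetaFinder(HTMLParser):
    """Collects <meta> tags that appear outside <head> without itemprop."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.misplaced = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and not self.in_head:
            attributes = dict(attrs)
            # A <meta> with itemprop is structured data and is tolerated.
            if "itemprop" not in attributes:
                self.misplaced.append(attributes)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_misplaced_meta(markup):
    finder = MisplacedMetaFinder()
    finder.feed(markup)
    return finder.misplaced
```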

<title> found outside <head>

Description

A <title> tag was placed outside of the <head> section, where it may have no effect. Use this report to identify all occurrences of misplaced HTML <title> tags.

Example
<html>
<head>
...
</head>
<body>
...
<title>Title of the Site</title>
...
Importance

The <title> tag is a very important element for search engine optimization and should always be set. If the title tag is placed outside the HTML <head>, it may be ignored by search engines. This may lead to issues with the search snippet and the site's ranking in search results.

Operating Instruction

If this hint shows up in your crawl report, you should move all misplaced title tags into the HTML <head> section.

<title> missing or empty

Description

If the <title> tag is missing or empty, the URL is flagged with this hint. Use this report to identify all cases of missing or empty <title> tags on the crawled website.

Example

This is an example of an empty title tag:

<html>
<head>
...
<title></title>
...
Importance

The <title> tag is a very important element for search engine optimization and should always be set. The document's title is the primary resource for the title of the snippet in search results. If the <title> tag is missing or empty, one of the most important ranking factors is basically left out. This will very likely harm the ranking in search results and can also harm the click-through rate from search results for the given URLs.

Operating Instruction

If this hint shows up in your crawl report, you should add <title> tags to all discovered URLs.

<title> occurs more than once

Description

If the <title> tag is found more than once, the URL is flagged with this hint. There should only be one title per page.

Example
<html>
<head>
...
<title>Welcome to Example.com</title>
<title>Example.com - Best Examples on the Internet</title>
...
Importance

The HTML <title> tag is an important ranking factor, as it is literally supposed to describe the content of a page. Having more than one <title> tag can therefore lead to unpredictable displays of the webpage in search results. Additionally, it may harm the rankings of the page.

Operating Instruction

If this hint shows up in your crawl report, you might want to make sure you only use one <title> tag on all discovered URLs.

Error scenarios like this usually appear due to different software automatically adding <title> tags. If this issue occurs on a large scale, check if there is a script or CMS plugin adding <title> tags to your pages.
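
When checking templates or plugin output by hand, a small script can list every title a document contains. The following Python sketch is one possible approach, assuming the standard library's html.parser is sufficient for the markup at hand; TitleCollector and collect_titles are hypothetical names.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Collects the text of every <title> element in a document."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._inside_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._inside_title = True
            self.titles.append("")

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the open title.
        if self._inside_title:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        if tag == "title":
            self._inside_title = False


def collect_titles(markup):
    collector = TitleCollector()
    collector.feed(markup)
    collector.close()
    return [title.strip() for title in collector.titles]
```

An empty list corresponds to a missing title, a list containing only an empty string to an empty title, and a list with more than one entry to the situation described in this hint.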

<title> too long for Google snippet

Description

If the title of a document is too long to be displayed in search results snippets, the URL is flagged with this hint.

Example
<html>
<head>
...
<title>Example.com - Best Examples Site - We have one of the longest titles on the internet</title>
...
Importance

If the title of a document is too long to be displayed in search result snippets, it will be shortened by the search engine. This usually results in less appealing snippets and lower user engagement, which might eventually hurt your site's rankings in search results.

Operating Instruction

You might want to change the title so that it is displayed in the snippet without being shortened.

Duplicate id attributes

Description

The same ID was assigned to several elements. Use this report to identify all URLs on the crawled website that use duplicate IDs.

Example
<h1 id="heading">heading</h1>
<h2 id="heading">heading</h2>
Importance

It is not valid to use the exact same id more than once in a document. This may cause unexpected behaviour when used with fragment links and when accessing elements via ID in JavaScript.

Operating Instruction

Only use unique IDs to identify elements in a document.

If the id is used to define stylesheets for groups of elements, classes should be used instead.
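
A duplicate check of this kind amounts to counting id values. The sketch below shows one way to do that in Python with the standard library; IdCollector and find_duplicate_ids are hypothetical names chosen for this illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Counts how often each id attribute value occurs."""

    def __init__(self):
        super().__init__()
        self.id_counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.id_counts[value] += 1


def find_duplicate_ids(markup):
    """Return the sorted list of id values used more than once."""
    collector = IdCollector()
    collector.feed(markup)
    return sorted(i for i, count in collector.id_counts.items() if count > 1)
```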

Hreflang: Found

Description

The crawler detected <link> tags with an hreflang attribute set.

Example

The following code snippet triggers this hint. The example is for http://de.example.com/.

<html lang="de">
<head>
    <link href="http://de.example.com/" hreflang="de" rel="alternate">
</head>
</html>
Importance

Hreflang links allow you to specify a preferred version of a webpage on multilingual and multi-region websites, and help search engines to display the correct version of a webpage.

Operating Instruction

Use the "Hreflang: Found" hint report to identify all URLs that contain an hreflang link definition and find out how many URLs on the crawled site have hreflang link definitions.

Hreflang: Language tags in HTML and hreflang self link differ

Description

The crawler detected <link> tags with an hreflang attribute set, and a link to the current URL. However, the languages of the document and the hreflang link differ.

Setting "x-default" as hreflang is accepted and does not trigger this hint.

Example

The following code snippets trigger this hint. All examples are for http://de.example.com/.

First example - Languages are totally different:

<html lang="de">
<head>
    <link href="http://de.example.com/" hreflang="fr" rel="alternate">
</head>
</html>

Second example - Languages differ in region or other aspects, while the main language is the same:

<html lang="de">
<head>
    <link href="http://de.example.com/" hreflang="de-DE" rel="alternate">
</head>
</html>
Importance

The languages of a document and the hreflang link should match. However, differences in region or other aspects may be desired. If hreflang linking is incomplete or erroneous, search engines may discard hreflang related information completely.

Operating Instruction

If the language of the hreflang attribute and the language of the document totally differ, for example "de" and "fr", this should be fixed.
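
The distinction between a total mismatch and a difference in region can be sketched by comparing the primary language subtag first. This Python example is an illustration only: the function name and the labels "match", "partial" and "different" are assumptions, not terms used by the Audisto Crawler.

```python
def compare_language_tags(document_lang, hreflang):
    """Classify two language tags as "match", "partial" or "different"."""
    if hreflang.lower() == "x-default":
        # x-default is accepted and does not trigger the hint.
        return "match"
    document_parts = document_lang.lower().split("-")
    link_parts = hreflang.lower().split("-")
    if document_parts == link_parts:
        return "match"
    if document_parts[0] == link_parts[0]:
        # Same main language, but region or other subtags differ.
        return "partial"
    return "different"
```

With this classification, only the "different" case needs to be fixed in every instance, while "partial" results can be reviewed to see whether the difference is intended.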

Hreflang: Self link found, but document has no language

Description

A <link> tag with hreflang attribute points to the current URL, but the document itself has no language set.

Example

Example for http://de.example.com/:

<html>
    <head>
        <link href="http://de.example.com/" hreflang="de" rel="alternate">
        <link href="http://en.example.com/" hreflang="en" rel="alternate">
        <link href="http://fr.example.com/" hreflang="fr" rel="alternate">
    </head>
</html>

Correct implementation:

<html lang="de">
    <head>
        <link href="http://de.example.com/" hreflang="de" rel="alternate">
        <link href="http://en.example.com/" hreflang="en" rel="alternate">
        <link href="http://fr.example.com/" hreflang="fr" rel="alternate">
    </head>
</html>
Importance

The languages of a document and the hreflang link should match. If hreflang linking is incomplete or erroneous, search engines may discard hreflang related information completely.

Operating Instruction

Always assign a language to a document when using hreflang.

Hreflang: Self link missing

Description

The crawler detected <link> tags with an hreflang attribute set, but a link to the current URL itself (self link) was missing. A self link is mandatory.

Example

Example for http://de.example.com/:

<head>
    <link href="http://en.example.com/" hreflang="en" rel="alternate">
    <link href="http://fr.example.com/" hreflang="fr" rel="alternate">
</head>

Correct implementation:

<head>
    <link href="http://en.example.com/" hreflang="en" rel="alternate">
    <link href="http://fr.example.com/" hreflang="fr" rel="alternate">
    <link href="http://de.example.com/" hreflang="de" rel="alternate">
</head>
Importance

If hreflang linking is incomplete or has errors, search engines may discard hreflang related information completely. This may lead to inappropriate URLs showing up in the localized search results.

Operating Instruction

Always add a link to self, since it is required. For more details see our guide on hreflang.
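
A self-link check can be sketched as collecting all hreflang links and looking for the current URL among them. The Python example below assumes an exact string comparison of URLs, which is a simplification; HreflangCollector and has_self_link are hypothetical names.

```python
from html.parser import HTMLParser


class HreflangCollector(HTMLParser):
    """Collects (hreflang, href) pairs from <link> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and attributes.get("hreflang"):
            self.links.append((attributes["hreflang"], attributes.get("href") or ""))


def has_self_link(markup, current_url):
    """Return True if any hreflang link points to current_url."""
    collector = HreflangCollector()
    collector.feed(markup)
    # Assumption: URLs are compared as plain strings, without normalization.
    return any(href == current_url for _, href in collector.links)
```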

Hreflang: URL empty or malformed

Description

The crawler detected <link> tags with an hreflang attribute set. However, the href attribute contains an empty or malformed URL.

Example

Empty href:

<link href="" hreflang="de" rel="alternate">

Malformed href:

<link href="htp://de.example.com/" hreflang="de" rel="alternate">
Importance

A malformed or empty href in an hreflang link makes the hreflang definition invalid. Search engines may discard hreflang related information completely. This may lead to inappropriate URLs showing up in the localized search results.

Operating Instruction

We suggest that you check for malformed or empty hreflang hrefs on a regular basis.

Hreflang: URL is not absolute

Description

If the hreflang element specifies a URL relative to the current URL, it is flagged with this hint.

This report shows all occurrences of hreflang usage with URLs that are not absolute.

Example

Absolute URL:

<link href="http://de.example.com/page.html" hreflang="de" rel="alternate">

Short URL:

<link href="page.html" hreflang="de" rel="alternate">

Short URL - root folder relative:

<link href="/page.html" hreflang="de" rel="alternate">

Short URL - protocol relative:

<link href="//de.example.com/page.html" hreflang="de" rel="alternate">
Importance

Using shortened URLs for hreflang links can lead to several kinds of duplicate content issues:

  • duplicate content issues with different protocol versions
  • duplicate content issues with different domains
  • duplicate content issues with different folders
Operating Instruction

We suggest that you use absolute URLs for hreflang links.
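
Whether an href is absolute can be checked by looking for both a scheme and a host. The following Python sketch uses urlsplit from the standard library; the function name is_absolute_url is an assumption made for this example.

```python
from urllib.parse import urlsplit


def is_absolute_url(href):
    """Return True only if href has both a scheme and a host."""
    parts = urlsplit(href)
    return bool(parts.scheme) and bool(parts.netloc)
```

All three short forms shown in the example section fail this check: page.html and /page.html lack scheme and host, and the protocol-relative form lacks the scheme. Because such values are resolved against the URL of the document that contains them, the same href can end up pointing to different protocols, domains or folders.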

Language: Content-Language http-equiv meta element found

Description

If a Content-Language declaration is found as an http-equiv meta element, the URL is flagged with this hint.

Example

Http-equiv meta element:

<head>
...
<meta http-equiv="Content-Language" content="en">
...
</head>
Importance

Contrary to common belief, Content-Language describes the language of the intended audience of a page and not the language of the content of the page. Due to long-standing confusion and inconsistent implementations of this element, the HTML5 specification made it non-conforming in HTML. You should no longer use Content-Language as an http-equiv meta element.

Operating Instruction

Remove the Content-Language http-equiv meta element. If you intended to specify the language of the content of the page we suggest that you carefully read the W3C's advisory on language settings and use the lang attribute and the xml:lang attribute to specify the language of your document.

Language: Invalid

Description

The language specification of the document does not follow some basic rules for language tags.

Example

Empty language definition:

<html lang="">

Incorrect language definition:

<html lang="en_US">

Correct implementation:

<html lang="en-US">
Importance

Correct language settings can be crucial for localized content, since it allows search engines to display the best results for the matching audience.

Operating Instruction

We suggest that you carefully read the W3C's advisory on language settings and use the lang attribute and the xml:lang attribute to specify the language of your document.
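
As a rough pre-check before consulting the specification, the shape of a language tag can be tested with a simple pattern. The Python sketch below is a deliberate simplification and an assumption: it only requires letter-based primary subtags and hyphen-separated alphanumeric subtags, it does not implement the full language tag grammar, and it is not the rule set used by the Audisto Crawler.

```python
import re

# Simplified shape: 2-8 letters, then optional "-subtag" parts of
# 1-8 letters or digits. Private-use tags such as "x-..." are not covered.
BASIC_LANGUAGE_TAG = re.compile(r"[A-Za-z]{2,8}(?:-[A-Za-z0-9]{1,8})*")


def looks_like_language_tag(value):
    """Return True if value has the basic shape of a language tag."""
    return BASIC_LANGUAGE_TAG.fullmatch(value) is not None
```

The empty value and the underscore variant from the examples are rejected, while en-US passes.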

Language: Not set on document

Description

If no proper language definition was found for the HTML document, the URL is flagged with this hint.

Example

Attribute "lang" is empty:

<html lang="">

Attribute "lang" is not set:

<html>

Instead use the lang attribute for HTML documents:

<html lang="en">

Use the lang attribute and the xml:lang attribute for XHTML documents:

<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
Importance

Correct language settings can be crucial for localized content, since it allows search engines to display the best results for the matching audience.

Operating Instruction

We suggest that you use the lang attribute and the xml:lang attribute to specify the language of your document. For more details, see W3C's advisory on language settings.

Language: Set multiple times and differs

Description

If the document language was set multiple times and differs, the URL is flagged with this hint.

Example

XHTML:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang="en" xml:lang="fr" xmlns="http://www.w3.org/1999/xhtml">

HTML5 Polyglot:

<!DOCTYPE html>
<html lang="en" xml:lang="fr" xmlns="http://www.w3.org/1999/xhtml">
Importance

Correct language settings can be crucial for localized content, since it allows search engines to display the best results for the matching audience. In case of multiple conflicting language specifications the correct language has to be guessed.

Operating Instruction

We suggest that you carefully read the W3C's advisory on language settings and use the lang attribute and the xml:lang attribute to specify the language of your document. Make sure you don't have conflicting specifications.
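
A conflict between lang and xml:lang on the <html> element can be detected by collecting both values and comparing them. The Python sketch below assumes a case-insensitive comparison of the complete tag, so "en" and "en-US" would count as different; RootLanguageCollector and languages_conflict are hypothetical names.

```python
from html.parser import HTMLParser


class RootLanguageCollector(HTMLParser):
    """Records the lang and xml:lang values found on the <html> element."""

    def __init__(self):
        super().__init__()
        self.declared = {}

    def handle_starttag(self, tag, attrs):
        if tag == "html":
            for name, value in attrs:
                if name in ("lang", "xml:lang") and value:
                    self.declared[name] = value.lower()


def languages_conflict(markup):
    """Return True if the <html> element declares differing languages."""
    collector = RootLanguageCollector()
    collector.feed(markup)
    return len(set(collector.declared.values())) > 1
```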

Links: Non-standard rel attribute value found

Description

If any of a URL's outgoing links has a rel attribute that is not an official link type, the URL is flagged with this hint.

Possible link types are defined in several HTML specs. Mozilla gives a comprehensive list. We additionally support the common values apple-touch-icon and apple-touch-startup-image.

Example

Non-standard rel attribute:

<head>
...
<link rel="me" href="https://www.facebook.com/Audisto.GmbH/">
...
</head>
Importance

Predefined link types are tied to specific behavior of browsers and provide important information for search engines. Non-standard values on rel attributes do not provide any information, so they can be considered useless.

Operating Instruction

If you use non-standard rel attribute values, check if they can be omitted. If they are used by some JavaScript code, try to rewrite them into data attributes. If you use them in the context of microformats, try to port your code to schema.org vocabulary.

Robots: Directive probably misspelled

Description

We found a meta tag that has a name similar to "robots", but not exactly "robots" (e.g. "robot"). Use this report to identify all URLs that use meta tags similar to "robots".

This hint will trigger for all tag names that can be transformed into "robots" with up to two edits.

Examples
<meta name="robot" content="index,follow">
<meta name="robost" content="index,follow">
<meta name="bots" content="index,follow">
Importance

Some robots might be fault tolerant when handling the robots meta tag and therefore accept misspelled versions.

Misspelled versions are not guaranteed to work for all search engine robots, which can lead to unexpected behaviour.

Operating Instruction

We suggest that you specify your robots directives with the correctly spelled version:

<meta name="robots" content="index,follow">
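
The "up to two edits" rule from the description can be illustrated with an edit distance calculation. The following Python sketch uses the classic Levenshtein distance (insertions, deletions and substitutions), which is an assumption; the source does not state which distance measure the crawler applies. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def is_probable_robots_typo(name, max_edits=2):
    """True for names within max_edits of "robots", but not "robots" itself."""
    distance = edit_distance(name.lower(), "robots")
    return 0 < distance <= max_edits
```

Under this measure, all three names from the examples ("robot", "robost" and "bots") are within two edits of "robots".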

Robots: Specified more than once

Description

Robots directives for a single URL were specified more than once. Use this report to identify all instances of multiple robot definitions.

Examples

Robots meta tag in HTML <head>:

<meta name="robots" content="index">
<meta name="robots" content="follow">

Robots directives in X-Robots-Tag and meta tag:

HTTP/1.1 200 OK
Date: Fri, 16 Oct 2015 10:01:33 GMT
X-Robots-Tag: index, follow
...

<meta name="robots" content="index, follow">
Importance

More than one instance of robots directives can lead to conflicting definitions or to a directive being left out. This may result in a range of issues with privacy, indexing in general and crawl budget, depending on the situation.

Operating Instruction

Use only one way to specify the robots directive.

Robots: nofollow differs across specifications

Description

There is more than one source for robots directives, either a robots meta tag or an X-Robots-Tag header, and at least one specifies "nofollow" while another does not.

Examples

Robots meta tag in HTML <head>:

<meta name="robots" content="index, nofollow">
<meta name="robots" content="index, follow">

Note: a more subtle way to produce this error would be conflicting definitions caused by omitting parts of the directive, such as:

<meta name="robots" content="index, nofollow">
<meta name="robots" content="index">

Robots directives in X-Robots-Tag and meta tag differ:

HTTP/1.1 200 OK
Date: Fri, 16 Oct 2015 10:01:33 GMT
X-Robots-Tag: index, nofollow
...

<meta name="robots" content="index, follow">
Importance

The "nofollow" robots directive tells crawlers not to follow the links in a document. This can be used on purpose to prevent search engines from crawling the linked URLs.

Conflicting definitions are inconclusive. Search engines will usually use the most restrictive directive they find. The Audisto Crawler adopts this behaviour.

Operating Instruction

Use only one way to specify the robots nofollow directive.

Robots: noindex differs across specifications

Description

There is more than one source for robots directives, either a robots meta tag or an X-Robots-Tag header. At least one specifies "noindex" while another does not.

Examples

Differing robots directives across specifications could look like this:

Robots meta tag in HTML <head>:

<meta name="robots" content="index, nofollow">
<meta name="robots" content="noindex, nofollow">

Note: a more subtle way to produce this error would be conflicting definitions caused by omitting parts of the directive, such as:

<meta name="robots" content="noindex, follow">
<meta name="robots" content="follow">

Robots directives in X-Robots-Tag and meta tag differ:

HTTP/1.1 200 OK
Date: Fri, 16 Oct 2015 10:01:33 GMT
X-Robots-Tag: index, follow
...

<meta name="robots" content="noindex, follow">
Importance

The "noindex" robots directive tells crawlers not to index the current document.

Conflicting definitions are inconclusive. Search engines will usually use the most restrictive directive they find. The Audisto Crawler adopts this behaviour.

Operating Instruction

Use only one way to specify the robots noindex directive.
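
The most-restrictive rule described for conflicting noindex and nofollow values can be sketched as follows. This is a minimal illustration in Python, not the Audisto Crawler's implementation: the function names are hypothetical, each source (meta tag content or X-Robots-Tag value) is passed as a plain string, and only the noindex and nofollow tokens are evaluated.

```python
def parse_directives(value):
    """Split a directive string such as "index, nofollow" into tokens."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


def differs(directive, *sources):
    """True if at least one source contains the directive and another does not."""
    flags = [directive in parse_directives(source) for source in sources]
    return any(flags) and not all(flags)


def most_restrictive(*sources):
    """Merge all sources; a single noindex or nofollow wins."""
    merged = set().union(*(parse_directives(source) for source in sources))
    return {
        "index": "noindex" not in merged,
        "follow": "nofollow" not in merged,
    }
```

Note that differs also catches the subtle case in which one source simply omits the directive, since an omitted noindex or nofollow counts as not specified.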

Safe HTTPS webpage loads unsafe resource

Description

If an HTTPS webpage contains an unsafe resource that is loaded using HTTP, it is flagged with this hint.

Example

Example code for https://www.example.com/:

<script type="text/javascript" src="http://www.example.com/file.js"></script>
Importance

All files that get loaded while opening a document over HTTPS, e.g. images, fonts, stylesheets, JavaScripts, should be requested over the HTTPS protocol as well. If elements are loaded using an unsafe HTTP connection, these might get compromised by a man-in-the-middle attack while being loaded. This can compromise the security of the SSL-secured request.

If this happens, the increased risk will be reflected in the SSL symbol in all modern browsers. Instead of displaying a green SSL lock, it would be yellow, orange or red to highlight loading of unsafe resources.

Operating Instruction

In documents only available over HTTPS, you should only include files loaded via the HTTPS protocol.
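
To find such includes in a template, the markup can be scanned for resource URLs that start with http://. The Python sketch below covers only an illustrative subset of resource types (script, img, iframe and stylesheet links), which is an assumption; InsecureResourceFinder and find_insecure_resources are hypothetical names.

```python
from html.parser import HTMLParser


class InsecureResourceFinder(HTMLParser):
    """Collects resources that a page would load over plain HTTP."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("script", "img", "iframe"):
            url = attributes.get("src")
        elif tag == "link" and "stylesheet" in (attributes.get("rel") or "").lower().split():
            url = attributes.get("href")
        else:
            return  # ordinary links such as <a href> do not load a resource
        if url and url.lower().startswith("http://"):
            self.insecure.append((tag, url))


def find_insecure_resources(markup):
    finder = InsecureResourceFinder()
    finder.feed(markup)
    return finder.insecure
```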

URL too long for some browsers

Description

If a URL longer than 2,000 characters is encountered, it is flagged with this hint.

Example

Long URLs are often generated dynamically in scenarios like:

  • a form posts data from input fields or a textarea via GET-method to the form action URL
  • GET-parameters from complex filter combinations in faceted search
Importance

Long URLs can cause problems in two ways. Some browsers are unable to handle URLs of this length. Some web applications might not be able to resolve the URLs and/or shorten them automatically, causing issues with access to these URLs.

Operating Instruction

While theoretically there is no limit on the length of a URL, you should stay below 2,000 characters to be accessible by a large number of clients and web applications.
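
The length limit can be checked before a URL is published, for example when a form or a faceted search builds its GET parameters. The following Python sketch uses the 2,000 character threshold stated above; the function name is_too_long and the constant name are assumptions made for this example.

```python
from urllib.parse import urlencode

MAX_URL_LENGTH = 2000  # threshold named in this hint


def is_too_long(url, limit=MAX_URL_LENGTH):
    """Return True if the URL is longer than the given limit."""
    return len(url) > limit


# Hypothetical example: a textarea value submitted via the GET method.
form_data = {"comment": "x" * 2500}
long_url = "http://example.com/search?" + urlencode(form_data)
```

Here is_too_long(long_url) returns True, while a URL of exactly 2,000 characters still passes, matching the "longer than 2,000 characters" wording of the description.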