Audisto Link Error Checker

How to detect issues with links on your website

Links are amongst the most essential elements on a website. Links are the foundation for crawling: search engines follow links on websites to discover new URLs, which they can crawl later on. Link data is also used to determine the relevance and importance of documents.

A poor internal link structure can cause crawling issues, e.g. the crawler runs in circles, resulting in a low crawl rate for URLs deeper in the site’s structure. It can also lead to a poor structure in general, with many orphan pages and many hierarchy levels.

Ultimately, the combination of these problems may lead to ranking issues, hurting the site's presence in search results.

With this section of hints, you can identify the most common link-related issues on a website.

Example: Audisto Link Error Check with the links hint reports for the current crawl

Here is the list of all link-related hints that can be identified with the help of the Audisto Crawler.

Hints

<a> has both href and onclick attributes

Description

If a link with an href attribute and an onclick attribute is found, the URL is flagged with this hint.

Example

Link calling a JavaScript function with the onclick event:

<a href="http://example.com/page.html" onclick="alert('hello world');">Link</a>

Link calling a JavaScript redirect with the onclick event:

<a href="http://example.com/page.html" onclick="window.location.href='http://example.com/page1.html';">Link</a>
Importance

The onclick attribute defines a JavaScript action to happen when the "onclick" event for the link is triggered, i.e. the user clicks the link.

This may lead to unexpected behaviour and user experience issues for users with and without JavaScript enabled.

Be aware that modern search engines such as Google may follow JavaScript links like these. If the JavaScript redirect leads to a different target than the HTML link, the search engine might start to mistrust the links.

Operating Instruction

We suggest that you check instances of onclick attributes in HTML links and decide if the onclick usage is required. Remove any onclick attributes that are not needed.
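To find these links in your own templates before a crawl, a minimal sketch using Python's standard `html.parser` can list every `<a>` tag that carries both attributes. The names `OnclickLinkFinder` and `find_onclick_links` are hypothetical, and this is not the Audisto Crawler's implementation.

```python
from html.parser import HTMLParser


class OnclickLinkFinder(HTMLParser):
    """Collects <a> tags that carry both an href and an onclick attribute."""

    def __init__(self):
        super().__init__()
        self.flagged = []  # list of (href, onclick) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)  # html.parser lowercases attribute names
        if "href" in attributes and "onclick" in attributes:
            self.flagged.append((attributes["href"], attributes["onclick"]))


def find_onclick_links(html):
    """Return (href, onclick) pairs for every <a> that uses both attributes."""
    finder = OnclickLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.flagged


if __name__ == "__main__":
    page = """
    <a href="http://example.com/page.html" onclick="alert('hello world');">Link</a>
    <a href="http://example.com/other.html">Plain link</a>
    """
    for href, onclick in find_onclick_links(page):
        print(href, "->", onclick)
```

On the sample markup only the first link is reported, since the second one has no onclick attribute.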

<a> has malformed href

Description

If a malformed href attribute value is found, the URL is flagged with this hint. A malformed href is usually a URI that is not valid according to RFC3986, or a result of a parsing error due to invalid HTML.

Examples
<a href="htp://www.example.com">link</a>
<a href="htps://www.example..com">link</a>
<a href="http://www..example.com">link</a>
<a href="http://">link</a>
<a href="htps:// www.example.com">link</a>
<a href="://www.example.com">link</a>
Importance

A link with a malformed href cannot be parsed and will therefore not be recognized by search engines. In addition, links like this can cause user experience issues.

Operating Instruction

Fix all malformed href attribute values on your website.
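The patterns above can be approximated with a heuristic check. The sketch below assumes Python's `urllib.parse`; `malformed_reason` and the `KNOWN_SCHEMES` allowlist are hypothetical, and it is not a full RFC 3986 validator. Note that a misspelled scheme such as `htp:` could equally be reported as an unknown protocol.

```python
import re
from urllib.parse import urlsplit

# Assumption: schemes treated as legitimate in "scheme://" style links.
KNOWN_SCHEMES = {"http", "https", "ftp", "sftp", "ws", "wss", "file"}


def malformed_reason(href):
    """Return a short reason if href looks malformed, otherwise None."""
    value = href.strip()  # leading/trailing whitespace is reported by a separate hint
    if re.search(r"\s", value):
        return "whitespace inside the URL"
    if value.startswith(":"):
        return "missing scheme"
    try:
        parts = urlsplit(value)
    except ValueError:  # e.g. an unbalanced "[" in an IPv6 host
        return "unparseable URL"
    has_authority = parts.scheme and value.lower().startswith(parts.scheme + "://")
    if has_authority and parts.scheme not in KNOWN_SCHEMES:
        return "unknown or misspelled scheme"
    if parts.scheme in ("http", "https"):
        host = parts.hostname or ""
        if not host:
            return "missing host"
        if ".." in host or host.startswith("."):
            return "empty label in host name"
    return None


if __name__ == "__main__":
    for href in ["htp://www.example.com", "http://www..example.com", "http://",
                 "htps:// www.example.com", "://www.example.com",
                 "http://www.example.com/page.html"]:
        print(f"{href!r:32} {malformed_reason(href)}")
```

Relative links and schemes without `//` (such as `mailto:`) pass this check; they are covered by other hints.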

<a> has no content

Description

If a link without content is found, the URL is flagged with this hint.

Example
<a href="http://example.com/"></a>
Importance

If an anchor tag is empty, it is not clickable for the user. This might cause issues with the user experience. In addition, links without content can't pass anchor text to the target URL.

Operating Instruction

We suggest that you evaluate all anchor tags that have no content and fix all occurrences that are not intended as internal anchors.
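A minimal sketch (Python standard library, hypothetical names) of how empty links can be found: an `<a>` with an href counts as empty if it contains neither visible text nor a child element such as an `<img>`. Anchors without href are skipped, since they are usually intended as jump marks.

```python
from html.parser import HTMLParser


class EmptyLinkFinder(HTMLParser):
    """Finds <a href="..."> elements that contain neither text nor child elements."""

    def __init__(self):
        super().__init__()
        self.empty_links = []
        self._open_link = None  # attributes of the <a> currently being read
        self._has_content = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._open_link = dict(attrs)
            self._has_content = False
        elif self._open_link is not None:
            self._has_content = True  # a child element such as <img> counts as content

    def handle_data(self, data):
        # str.strip() also removes "\xa0", so a lone &nbsp; still counts as empty.
        if self._open_link is not None and data.strip():
            self._has_content = True

    def handle_endtag(self, tag):
        if tag == "a" and self._open_link is not None:
            # <a> tags without href are usually jump marks and are skipped here.
            if "href" in self._open_link and not self._has_content:
                self.empty_links.append(self._open_link["href"])
            self._open_link = None


def find_empty_links(html):
    finder = EmptyLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.empty_links
```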

<a> has no href attribute

Description

If an anchor tag without an href attribute is found, the URL is flagged with this hint. Use this report to identify all URLs that contain <a> tags without an href attribute.

Example
<a id="internal-anchor">Anchor</a>
Importance

HTML anchor tags without an href attribute are usually internal jump marks (anchors), which is a valid use.

The error case is when a hyperlink was intended but the href attribute was not set.

Operating Instruction

Evaluate all <a> tags without an href attribute. If the intention was to set a hyperlink, then fix the markup. If the intention was to set an anchor, you might consider adding an ID to another element instead of the <a> tag that was used before, e.g. a span, a div or a headline. Any tag can be defined as a jump mark by defining the id attribute.

<a> href attribute has leading or trailing whitespace characters

Description

If the href attribute of an HTML anchor tag has leading or trailing whitespace, the linking URL is flagged with this hint. Use this report to discover all URLs that link to other documents with leading or trailing whitespace.

Examples
<a href=" http://www.example.com">link</a>

<a href="http://www.example.com/ ">link</a>

<a href=" http://www.example.com/ ">link</a>
Importance

Browsers usually trim leading and trailing whitespace in an href attribute. Nonetheless, it is better to remove it, as it may also indicate a problem with the code that generates the site.

Operating Instruction

We suggest removing unnecessary leading or trailing whitespace. Reviewing the code that generates the link might be necessary.
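To spot these links in your own markup, the sketch below collects raw href values with Python's `html.parser`, which keeps attribute values untrimmed, and reports those that differ from their trimmed form. It trims only the ASCII whitespace characters HTML defines (space, tab, line feed, form feed, carriage return); the names `HrefCollector` and `hrefs_with_surrounding_whitespace` are hypothetical.

```python
from html.parser import HTMLParser

# The characters HTML treats as whitespace (space, tab, LF, FF, CR).
ASCII_WHITESPACE = " \t\n\f\r"


class HrefCollector(HTMLParser):
    """Collects raw href values of <a> tags; html.parser does not trim them."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def hrefs_with_surrounding_whitespace(html):
    """Return every href that starts or ends with HTML whitespace."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return [h for h in collector.hrefs if h != h.strip(ASCII_WHITESPACE)]
```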

<a> href is empty

Description

If a link with an empty href attribute or an href that contains only whitespace is found, the URL is flagged with this hint.

Examples
<a href="">link</a>
<a href="      ">link</a>
Importance

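An empty href attribute usually indicates technical issues in the coding of the website. Note that browsers resolve an empty href to the current document (or to the <base> URL, if one is set), so such a link points to itself.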

Operating Instruction

We suggest that you evaluate the reasons for empty href attributes and fix the underlying issues.
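The following sketch (Python, hypothetical function names) shows why such links are easy to overlook: after trimming whitespace, an empty href resolves to the linking document itself. It assumes no `<base>` tag is present.

```python
from urllib.parse import urljoin

# The characters HTML treats as whitespace (space, tab, LF, FF, CR).
ASCII_WHITESPACE = " \t\n\f\r"


def is_empty_href(href):
    """True for href="" and for hrefs consisting only of whitespace."""
    return href.strip(ASCII_WHITESPACE) == ""


def resolve_href(document_url, href):
    """Resolve an href against the document URL after trimming whitespace."""
    return urljoin(document_url, href.strip(ASCII_WHITESPACE))


if __name__ == "__main__":
    page_url = "http://example.com/page.html"
    for href in ["", "      ", "/other.html"]:
        print(repr(href), is_empty_href(href), resolve_href(page_url, href))
```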

<a> link contains user and password

Description

An anchor's href contains a user name and password, such as http://user:password@example.com. This may not be desired, and it is also not supported by Internet Explorer as of version 10.

Example
http://user:password@example.com
Importance

This hint points out a serious security issue. Login data should not usually be linked to directly on a website. The login data may be crawled with malicious intent and abused later on.

Operating Instruction

If instances of this hint are discovered, we suggest removing all links from your website that contain login data.
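To find such links, it is enough to parse each href and check for a user or password component. A minimal sketch with Python's `urllib.parse`; `has_credentials` and `strip_credentials` are hypothetical helpers, the latter useful for keeping credentials out of your own reports.

```python
from urllib.parse import urlsplit


def has_credentials(href):
    """True if the URL carries a user name and/or password before the host."""
    try:
        parts = urlsplit(href.strip())
    except ValueError:  # e.g. an unbalanced "[" in an IPv6 host
        return False
    return parts.username is not None or parts.password is not None


def strip_credentials(href):
    """Return the URL without its "user:password@" part, e.g. for safe reporting."""
    parts = urlsplit(href.strip())
    host_and_port = parts.netloc.rpartition("@")[2]
    return parts._replace(netloc=host_and_port).geturl()
```

A `mailto:` address is not reported, because its `@` is part of the path rather than of a host component.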

<a> link uses a websocket protocol

Description

An anchor's href uses a websocket protocol (either ws: or wss:).

Example
<a href="ws://example.com/">link</a>
<a href="wss://example.com/">link</a>
Importance

Using the websocket protocol can be useful for full-duplex (bi-directional) communication channels, e.g. chat applications. However, linking to websocket URLs from a website is rather uncommon, especially linking directly to files via the websocket protocol.

Operating Instruction

We suggest that you evaluate if there are good reasons to use the websocket protocol. If this is not the case, use HTTP or HTTPS instead.

<a> link uses an unknown protocol

Description

An anchor's href uses a protocol that is unknown to our crawler. This may be caused by a misspelling, e.g. httpa:// instead of https://, but it may also be a valid link.

Examples
<a href="httpa://example.com">link</a>

<a href="irc://example.com">link</a>

<a href="whatsapp://example.com">link</a>
Importance

If an unknown or invalid protocol is used, the href attribute will be considered invalid by search engines. This may lead to a number of follow-up problems, such as crawling issues, ranking issues and user experience issues.

Operating Instruction

We suggest that you evaluate and fix all seemingly invalid protocols used in links on your site.
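The protocol hints in this section all depend on the scheme of the href. The following sketch maps a scheme to the corresponding hint; the `SCHEME_HINTS` table and `protocol_hint` are hypothetical and only illustrate the idea, and the list of schemes the Audisto Crawler actually knows may differ.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from URL scheme to the protocol hints in this section.
SCHEME_HINTS = {
    "ws": "websocket protocol",
    "wss": "websocket protocol",
    "data": "data: protocol",
    "file": "file: protocol",
    "ftp": "ftp: protocol",
    "javascript": "javascript: protocol",
    "mailto": "mailto: protocol",
    "sftp": "sftp: protocol",
}


def protocol_hint(href):
    """Return the protocol hint for an href, or None for HTTP(S) and relative links."""
    try:
        scheme = urlsplit(href.strip()).scheme  # urlsplit lowercases the scheme
    except ValueError:
        return None  # unparseable hrefs are a case for the malformed-href hint
    if scheme in ("", "http", "https"):
        return None
    return SCHEME_HINTS.get(scheme, "unknown protocol")
```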

<a> link uses data: protocol

Description

An anchor's href uses the data: protocol. Detect all documents on the crawled website that contain links using the data: protocol.

Example
<a href="data:image/png;base64,(...base64-encoded png data...)"></a>
Importance

The data: protocol can be used to provide inline data, e.g. images, fonts, scripts and other files, without requiring an additional request. It is not fully supported by Internet Explorer. This might cause issues with accessibility and user experience.

Operating Instruction

Evaluate all cases in which a link uses the data: protocol on the crawled website and decide whether it has to be replaced.

<a> link uses file: protocol

Description

An anchor's href uses the file: protocol, which is used to open files on the user's computer.

Example
<a href="file:///C:/programs/filename.html">link</a>
Importance

The file: protocol is used to reference local files. If used on a public website, it might lead to unexpected issues with the user experience.

Operating Instruction

We suggest removing all references with the file: protocol on your website.

<a> link uses ftp: protocol

Description

An anchor's href uses the ftp: protocol to link to an FTP server.

Example
<a href="ftp://example.com/file.zip">download</a>
Importance

Using the FTP protocol in an HTML link will cause the link to be opened with the default FTP client, if one is installed on the user's system. If there is no dedicated FTP client, this might also be the browser. While this is valid, it can lead to unexpected user experiences, as further behaviour depends entirely on the user's default FTP client.

Operating Instruction

To avoid any kind of issue with FTP clients, we suggest evaluating all instances of links that use the FTP protocol. We suggest that you provide an HTTP/HTTPS download whenever possible, so that the links stay within the site or within the control of the webmaster.

<a> link uses javascript: protocol

Description

An anchor's href uses the javascript: protocol, which may execute arbitrary JavaScript code.

Example
<a href="javascript:alert('hello world');">link</a>
Importance

Using the javascript: protocol might be a security issue if malicious code was injected. It might also cause unexpected user experience issues.

Operating Instruction

You should consider validating all links that use the javascript: protocol. In particular, focus on links that were provided in user-generated content or by any other third party.

<a> link uses mailto: protocol

Description

An anchor's href uses the mailto: protocol.

Example
<a href="mailto:info@example.com">link</a>
Importance

Using the mailto: protocol in an HTML link will cause the link to open the default email client installed on the user's system.

Spam bots use links like this to harvest email addresses for their spam campaigns.

Operating Instruction

We suggest not having this kind of link on your website. Links with the mailto protocol should be replaced by contact forms to avoid email spam.

<a> link uses sftp: protocol

Description

An anchor's href uses the sftp: protocol to link to an SFTP server.

Example
<a href="sftp://example.com/file.zip">download</a>
Importance

Using the SFTP protocol in an HTML link will cause the link to be opened with the default SFTP client, if one is installed on the user's system. If there is no dedicated SFTP client, the browser might open the link; otherwise the link will not work directly, e.g. the operating system will ask the user which program to use for opening links with this protocol. While this is valid, it can lead to unexpected user experiences, as further behaviour depends entirely on the user's default SFTP client.

Operating Instruction

To avoid any kind of issue with SFTP clients, we suggest evaluating all instances of links that use the SFTP protocol. We suggest that you provide an HTTPS download instead of an SFTP download whenever possible, so that the links stay within the site or within the control of the webmaster.

<a> linking relative to base

Description

If links relative to the base are found, the URL is flagged with this hint. Use this report to discover documents that contain links relative to the base.

Examples

URL of the linking document:

http://example.com/folder/folder2/index.html

Examples relative to folder:

<a href="../page.html">link</a>
leads to
http://example.com/folder/page.html

<a href="folder/folder2/page.html">link</a>
leads to
http://example.com/folder/folder2/folder/folder2/page.html

Note: There are relative links that will not get detected by this hint. These are:

Relative to the host:

<a href="/folder/page.html">link</a>
leads to
http://example.com/folder/page.html

Relative to the protocol:

<a href="//example.net/page.html">link</a>
leads to
http://example.net/page.html

Fragment link / anchor links within the same document:

<a href="#top">link</a>
leads to
http://example.com/folder/folder2/index.html#top
Importance

The base is the URL that is used to resolve all relative links in the document. Links relative to the base may cause unexpected errors and behaviour. If errors occur, a search engine's ability to crawl the site might be harmed and user experience may suffer. In particular, this can happen if you are using links relative to the base by mistake.

Operating Instruction

While relative links are not a bad thing, we suggest that you only use links relative to the host / protocol or absolute links.
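The categories above can be told apart by the first characters of the href, and Python's `urljoin` resolves them the same way as in the examples. A minimal sketch; `link_kind` is a hypothetical name, and the sketch ignores any `<base>` tag, i.e. the document URL serves as the base.

```python
from urllib.parse import urljoin, urlsplit


def link_kind(href):
    """Classify an href into the categories used in the examples above."""
    if href.startswith("#"):
        return "fragment"
    if href.startswith("//"):
        return "relative to protocol"
    if href.startswith("/"):
        return "relative to host"
    if urlsplit(href).scheme:
        return "absolute"
    return "relative to base"  # the only kind this hint reports


if __name__ == "__main__":
    document_url = "http://example.com/folder/folder2/index.html"
    for href in ["../page.html", "folder/folder2/page.html",
                 "/folder/page.html", "//example.net/page.html", "#top"]:
        print(f"{href:28} {link_kind(href):22} {urljoin(document_url, href)}")
```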

<a> links to fragment by name attribute on non-anchor

Description

The anchor contains a fragment link, but the target is defined by a name attribute and is not an anchor itself.

Example
<h1 name="top">Headline</h1>
...
<a href="#top">Go to top</a>
Importance

Using the name attribute to define a fragment target on elements other than anchor tags is not valid; for this purpose, the name attribute was only allowed on anchor tags. In addition, the name attribute has been deprecated since XHTML 1.0 and should not be used anymore.

Operating Instruction

We suggest removing all instances of the name attribute. You should use the id attribute instead.

<a> links to fragment by name attribute, not id

Description

The anchor contains a fragment link, like #top, but the target is defined by a name attribute. The name attribute is deprecated and won't be supported in upcoming versions of HTML.

Examples
<span name="top"></span>
...
<a href="#top">link</a>

<a name="top"></a>
...
<a href="#top">link</a>
Importance

The name attribute has been deprecated since XHTML 1.0 and should not be used anymore. In addition, it was only allowed on anchor tags.

Operating Instruction

We suggest removing all instances of the name attribute. You should use the id attribute instead.

<a> links to fragment only

Description

The anchor tag contains a fragment-only link target. This report lists all URLs that use at least one <a> tag that is linking to a fragment-only link target.

Example
<a href="#top">Back to top</a>
Importance

Fragment links are relative links. Either they are relative to the current document URL or to the URL specified in the <base> tag, if there is a <base> tag.

If used with a <base> tag, fragment links refer to the base URL instead of the current document URL. This may lead to unexpected behaviour in user experience and crawling.

Operating Instruction

You may want to consider using full URL links instead of fragment links, especially when working with a <base> tag.

<a> links to fragment only, but <base> points to another URL

Description

An <a> element links to a fragment only, while there is a <base> pointing to another URL. Discover all URLs that contain fragment links along with a base tag pointing to another URL.

Example

Example for http://example.com/page.html:

<base href="http://example.com/page2.html">
...
<a href="#top">link</a>

Expected behaviour: the browser navigates to http://example.com/page.html#top
Actual behaviour: the browser navigates to http://example.com/page2.html#top

Importance

Fragment links are relative to the URL defined in the <base> element. If the <base> element is pointing to another URL, this may lead to unexpected user experience and issues with the crawlability of the website if fragment-only links are used.

Operating Instruction

We suggest that you not use a <base> element if it is possible to avoid it. We also suggest using absolute links instead of fragment-only links.
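A minimal sketch of the resolution rule described above, using Python's `urllib.parse`: the fragment-only href is resolved against the `<base>` URL when one is set. The helper names `fragment_link_target` and `leaves_document` are hypothetical.

```python
from urllib.parse import urldefrag, urljoin


def fragment_link_target(document_url, base_href, href):
    """Resolve a fragment-only href against <base> if present, as browsers do."""
    base_url = urljoin(document_url, base_href) if base_href else document_url
    return urljoin(base_url, href)


def leaves_document(document_url, base_href, href):
    """True if following the fragment link would load a different document."""
    target = fragment_link_target(document_url, base_href, href)
    return urldefrag(target).url != urldefrag(document_url).url
```

With the example above, `fragment_link_target("http://example.com/page.html", "http://example.com/page2.html", "#top")` yields the page2.html URL, so `leaves_document` reports the link.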

<a> links to fragment that was not found

Description

The anchor contains a fragment link, like #top, but there was no corresponding id or name attribute found in the document.

This report helps to identify all URLs that contain fragment-only links, where the fragment is not present in the document.

Example
<a href="#top">link to fragment that does not exist</a>
Importance

If there are anchors to internal fragment-only links that are not present in the document, this may hurt the user experience and might result in worse user signals.

Operating Instruction

We suggest that you fix all occurrences of broken fragment-only links.
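As a sketch, a broken fragment link can be detected by collecting all `id` and `name` values of a document and comparing them with the fragments of fragment-only links. The names `FragmentChecker` and `missing_fragments` are hypothetical; the sketch assumes that a bare `#` is a placeholder and does not report it.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class FragmentChecker(HTMLParser):
    """Collects fragment-only hrefs and all id/name values of a document."""

    def __init__(self):
        super().__init__()
        self.targets = set()   # values of id and name attributes
        self.fragments = []    # fragments referenced by href="#..."

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if name in ("id", "name"):
                self.targets.add(value)
            elif tag == "a" and name == "href" and value.startswith("#"):
                # Compare the percent-decoded fragment, e.g. #caf%C3%A9 matches id="café".
                self.fragments.append(unquote(value[1:]))


def missing_fragments(html):
    """Return the fragments of fragment-only links that have no matching id or name."""
    checker = FragmentChecker()
    checker.feed(html)
    checker.close()
    # Assumption: a bare "#" is treated as a placeholder and not reported here.
    return [f for f in checker.fragments if f and f not in checker.targets]
```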

<a> more than 100 links

Description

If more than 100 unique links are found, the URL is flagged with this hint.

Example

Resources of the following categories often trigger this hint:

  • HTML Sitemaps
  • Link listing pages
  • Category pages
  • Archive pages
  • Filter navigation
Importance

Search engines suggest not exceeding a reasonable number of links in a single document. Too many links affect the file size and usability of a document. That said, 100 links on a webpage are not necessarily a problem: 100 was the maximum number of links suggested by Google some years ago, but Google has since refrained from communicating an exact number, because a reasonable number of links depends on the document and its context.

Operating Instruction

Evaluate the internal links in the documents found by this report. Consider removing links that don't add value and/or don't get clicked by users.

If this report contains a large percentage of all crawled URLs, you might consider removing links from elements that are present on all webpages, e.g. top navigation, sidebar, footer.
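To check your own pages against this threshold, the sketch below counts unique link targets with Python's standard library. It assumes that links are resolved against the document URL (ignoring `<base>`), that fragment-only links are not counted, and that URLs differing only in their fragment count once; Audisto's exact counting rules may differ, and the names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

LINK_LIMIT = 100  # threshold of this hint


class LinkCounter(HTMLParser):
    """Collects the distinct, absolute targets of all <a href> links."""

    def __init__(self, document_url):
        super().__init__()
        self.document_url = document_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = (dict(attrs).get("href") or "").strip()
        if not href or href.startswith("#"):
            return  # empty and fragment-only links are not counted
        absolute = urljoin(self.document_url, href)
        self.links.add(urldefrag(absolute).url)  # /a and /a#x count once


def count_unique_links(document_url, html):
    counter = LinkCounter(document_url)
    counter.feed(html)
    counter.close()
    return len(counter.links)


if __name__ == "__main__":
    page = "".join(f'<a href="/p{i}.html">p{i}</a>' for i in range(120))
    n = count_unique_links("http://example.com/", page)
    print(n, "unique links", "-> flagged" if n > LINK_LIMIT else "-> ok")
```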

<a> no outgoing internal links

Description

If the HTML contains no internal links, the URL is flagged with this hint. Use this report to identify all documents that have no links to URLs on the same host.

Importance

Documents that are part of the internal link graph but don't link to other documents on the same host are dead ends for the flow of PageRank inside your architecture. This may lead to structural issues as well as issues with properly crawling the website.

Operating Instruction

We suggest avoiding parts of a website that don't contain internal links. Evaluate why these documents are not linking to any other documents.
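A minimal sketch of the check: resolve every href against the document URL and test whether any HTTP(S) target is on the same host. The names `InternalLinkFinder` and `has_outgoing_internal_links` are hypothetical, and the sketch assumes that `<base>` is not used and that hosts must match exactly (so `www.example.com` and `example.com` count as different hosts).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class InternalLinkFinder(HTMLParser):
    """Collects links from a document to other URLs on the same host."""

    def __init__(self, document_url):
        super().__init__()
        self.document_url = document_url
        self.host = urlsplit(document_url).hostname
        self.internal_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = (dict(attrs).get("href") or "").strip()
        if not href or href.startswith("#"):
            return  # these stay within the current document
        target = urlsplit(urljoin(self.document_url, href))
        if target.scheme in ("http", "https") and target.hostname == self.host:
            self.internal_links.append(target.geturl())


def has_outgoing_internal_links(document_url, html):
    finder = InternalLinkFinder(document_url)
    finder.feed(html)
    finder.close()
    return bool(finder.internal_links)
```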

<a> no target=_blank to external

Description

If a hyperlink has a target to an external domain that does NOT include target=_blank, the URL is flagged with this hint. An external domain is one that does not belong to you and that is not included in the crawl.

Example

If you have the following hyperlink on https://example.com/index.html:

<a href="https://example.org/new.html">External Domain</a>

As a result, no new tab is opened in the browser; instead, the content from the new.html page of example.org replaces your content (i.e. https://example.com/index.html) in the current browser tab.

To target a new tab, it needs to look like this:

<a href="https://example.org/new.html" target="_blank">External Domain</a>
Importance

Using a value of _blank with the target attribute of the <a> tag can be useful because it opens a new browser tab, in which the user can view the new content. This keeps the current browser tab open with the current content so that you do not lose the user. From a marketing and sales perspective, you are sending customers away from your site when you do not open a new tab, which is not generally a good idea. So using target=_blank keeps your customers on your site, or at least keeps your site in a tab on their browser.

However, there are some downsides to opening a new browser tab. For example, it breaks the standard user experience since the Back button does not work in the new tab. It may also have an adverse impact on the accessibility of your site. See the W3C Working Group Note for more information.

Operating Instruction

We suggest that you use this report to identify pages that have external hyperlinks that do not use target=_blank and then develop a consistent policy regarding targeting, and re-run this report to enforce your policy.

<a> no target=_blank to your other domains

Description

If a hyperlink has a target to another domain that is included in the crawl and it does NOT include target=_blank, the URL is flagged with this hint.

Example

If you have the following hyperlink on https://example.com/index.html:

<a href="https://example.org/new.html">My Other Domain</a>

As a result, no new tab is opened in the browser; instead, the content from the new.html page of example.org replaces your content (i.e. https://example.com/index.html) in the current browser tab.

To target a new tab, it needs to look like this:

<a href="https://example.org/new.html" target="_blank">My Other Domain</a>
Importance

Using a value of _blank with the target attribute of the <a> tag can be useful because it opens a new browser tab, in which the user can view the new content. This keeps the current browser tab open with the current content so that you do not lose the user. From a marketing and sales perspective, you are sending customers away from your site when you do not open a new tab, which is not generally a good idea. So using target=_blank keeps your customers on your site, or at least keeps your site in a tab on their browser.

However, there are some downsides to opening a new browser tab. For example, it breaks the standard user experience since the Back button does not work in the new tab. It may also have an adverse impact on the accessibility of your site. See the W3C Working Group Note for more information.

Operating Instruction

We suggest that you use this report to identify pages that have hyperlinks to other domains in your crawl that do not use target=_blank, and then develop a consistent policy regarding targeting, and re-run this report to enforce your policy.

<a> target=_blank to external

Description

If a hyperlink has a target to an external domain that includes target=_blank, the URL is flagged with this hint. An external domain is one that does not belong to you and that is not included in the crawl.

Example

If you have the following hyperlink on https://example.com/index.html:

<a href="https://example.org/new.html" target="_blank">External Domain</a>

As a result, a new tab is opened in the browser and the content from the new.html page of example.org is displayed in the new tab.

To not open a new tab, just remove the target attribute:

<a href="https://example.org/new.html">External Domain</a>
Importance

Using a value of _blank with the target attribute of the <a> tag can be useful because it opens a new browser tab, in which the user can view the new content. This keeps the current browser tab open with the current content so that you do not lose the user. From a marketing and sales perspective, you are sending customers away from your site when you do not open a new tab, which is not generally a good idea. So using target=_blank keeps your customers on your site, or at least keeps your site in a tab on their browser.

On the other hand, there are some downsides to opening a new browser tab. For example, it breaks the standard user experience since the Back button does not work in the new tab. It may also have an adverse impact on the accessibility of your site. See the W3C Working Group Note for more information.

Operating Instruction

We suggest that you use this report to identify pages that have external hyperlinks that use target=_blank and then develop a consistent policy regarding targeting, and re-run this report to enforce your policy.

<a> target=_blank to same domain

Description

If a hyperlink to a URL in the same domain includes target=_blank, the source URL is flagged with this hint.

Example

If you have the following hyperlink on https://example.com/index.html:

<a href="https://example.com/new.html" target="_blank">Same Domain</a>

As a result, a new tab is opened in the browser and the content from the new.html page of example.com is displayed in the new tab.

To not open a new tab, simply remove the target attribute:

<a href="https://example.com/new.html">Same Domain</a>
Importance

Using a value of _blank with the target attribute of the <a> tag can be useful because it opens a new browser tab, in which the user can view the new content. This keeps the current browser tab open with the current content so that you do not lose the user. From a marketing and sales perspective, you are sending customers away from your site when you do not open a new tab, which is not generally a good idea. So using target=_blank keeps your customers on your site, or at least keeps your site in a tab on their browser.

However, there are some downsides to opening a new browser tab. For example, it breaks the standard user experience since the Back button does not work in the new tab. It may also have an adverse impact on the accessibility of your site. See the W3C Working Group Note for more information.

Operating Instruction

We suggest that you use this report to identify pages that have hyperlinks to the same domain and that use target=_blank, and then develop a consistent policy regarding targeting, and re-run this report to enforce your policy.

<a> target=_blank to your other domains

Description

If a hyperlink has a target to another domain that is included in the crawl and it includes target=_blank, the URL is flagged with this hint.

Example

If you have the following hyperlink on https://example.com/index.html:

<a href="https://example.org/new.html" target="_blank">My Other Domain</a>

As a result, a new tab is opened in the browser and the content from the new.html page of example.org is displayed in the new tab.

To not open a new tab, simply remove the target attribute:

<a href="https://example.org/new.html">My Other Domain</a>
Importance

Using a value of _blank with the target attribute of the <a> tag can be useful because it opens a new browser tab, in which the user can view the new content. This keeps the current browser tab open with the current content so that you do not lose the user. From a marketing and sales perspective, you are sending customers away from your site when you do not open a new tab, which is not generally a good idea. So using target=_blank keeps your customers on your site, or at least keeps your site in a tab on their browser.

However, there are some downsides to opening a new browser tab. For example, it breaks the standard user experience since the Back button does not work in the new tab. It may also have an adverse impact on the accessibility of your site. See the W3C Working Group Note for more information.

Operating Instruction

We suggest that you use this report to identify pages that have hyperlinks to other domains in your crawl that use target=_blank, and then develop a consistent policy regarding targeting, and re-run this report to enforce your policy.
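The four target hints above differ only in where the link points and whether target=_blank is set. The sketch below expresses that decision table in Python. `target_hint` and its `other_domains` parameter (the hosts of your other domains included in the crawl) are hypothetical; host names are compared exactly, and the `_blank` keyword is compared case-insensitively, as browsers do.

```python
from urllib.parse import urljoin, urlsplit


def target_hint(document_url, href, target, other_domains):
    """Return which target=_blank hint a link would raise, or None.

    other_domains: hypothetical set of host names that belong to you and are
    included in the crawl, apart from the host of document_url.
    """
    parts = urlsplit(urljoin(document_url, href.strip()))
    if parts.scheme not in ("http", "https"):
        return None  # mailto:, javascript: etc. are covered by other hints
    blank = (target or "").strip().lower() == "_blank"
    if parts.hostname == urlsplit(document_url).hostname:
        return "target=_blank to same domain" if blank else None
    if parts.hostname in other_domains:
        return ("target=_blank to your other domains" if blank
                else "no target=_blank to your other domains")
    return "target=_blank to external" if blank else "no target=_blank to external"
```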

<base> contains malformed or empty href

Description

A <base> tag was found, but its href attribute contains an invalid URL, or a URL that is neither HTTP nor HTTPS. The crawler falls back to using the document's URL as the base.

Examples

A base with an invalid protocol:

<base href="htp://example.com">

A base with a white space in the domain name:

<base href="http:// example.com">
Importance

The base tag defines the URL base for all relative links in the document. Using a malformed URL as base href can cause issues with crawling and resolving relative links.

Using the base tag adds more complexity when parsing relative links. Poorly programmed crawlers might not understand the base tag at all and therefore show unexpected behaviour.

Operating Instruction

We suggest not using the HTML base tag at all. Remove it if possible.

Note: If there are changes related to the base tag, all relative links in the document need to be checked and probably corrected.
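The fallback described above can be sketched as follows: the `<base>` href is resolved against the document URL and is only accepted if the result is a well-formed HTTP or HTTPS URL. `effective_base` is a hypothetical helper, and the validity check is deliberately simple.

```python
import re
from urllib.parse import urljoin, urlsplit


def effective_base(document_url, base_href):
    """Return the URL that relative links are resolved against.

    If the <base> href is missing, empty, malformed or not HTTP(S),
    fall back to the document URL.
    """
    if base_href is None:
        return document_url
    candidate = base_href.strip()
    if not candidate or re.search(r"\s", candidate):
        return document_url
    try:
        resolved = urljoin(document_url, candidate)
        parts = urlsplit(resolved)
    except ValueError:  # e.g. an unbalanced "[" in the host
        return document_url
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return document_url
    return resolved
```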

<base> found

Description

If a base is set in the HTML, the URL is flagged with this hint. If a base is set, all relative links are relative to the base.

Example
<base href="http://example.com/directory/">
Importance

The base tag defines the URL base for all relative links in a document. Mistakes in usage of the base tag will lead to user experience and crawling issues when relative links are used.

In addition, the use of the base tag often results in problems for poor HTML parsers and poorly programmed robots.

Operating Instruction

We suggest that you not use the base tag at all. Remove base tags if possible.

We also suggest that you use absolute URLs.

<base> found more than once and differs

Description

More than one <base> directive was found, with differing href attribute values.

Examples
<base href="http://example.com/">
<base href="http://example.com/folder/">

Note: The following base directives resolve to the same URL and would therefore not trigger this hint:

Base directives on http://example.com/page.html

<base href="http://example.com/">
<base href="/">
Importance

The base tag defines the URL base for all relative links in a document. Having more than one base tag is invalid. This may result in issues with relative links that might impact search engines and the user experience on the website.

Operating Instruction

We suggest that you not use the HTML base tag at all. Remove it if possible. If the base tag is removed, all relative links in the document need to be checked and probably corrected.

<base> href contains a path only

Description

The <base> tag's href attribute contains a path, not an absolute URL. While this is technically allowed, it is not supported by Internet Explorer as of version 8.

Examples

Relative path by mistake:

<base href="example.com/">

Relative path on purpose:

<base href="/folder/">
Importance

The base tag defines the URL base for all relative links in the document. Mistakes in usage of the base tag might lead to issues with crawling when using relative links in the document. They will also result in issues with Internet Explorer as of version 8.

Operating Instruction

We suggest that you not use the HTML base tag at all. Remove it if possible.

Note: If you make changes to the base tag, all relative links in the document need to be checked and probably corrected.

<base> is same as URL

Description

A <base> tag was found, but it points to the same URL, thereby rendering itself useless.

Example

HTML base on http://example.com/page.html

<base href="http://example.com/page.html">
Importance

The base tag defines the URL base for relative links in a document. There is no point in using the current URL as base href as it is the same as if the base tag isn't used at all, rendering it useless.

Operating Instruction

We suggest that you not use the HTML base tag at all. Remove it if possible.

Note: If the base tag is removed, all relative links in the document need to be checked and probably corrected.

<base> occurs more than once

Description

More than one <base> tag was found. The Audisto Crawler uses the first valid annotation found for link resolving. Use this report to find all URLs on the crawled website that contain more than one <base> tag.

Example
<head>
...
<base href="http://example.com/">
<base href="http://example.com/">
...
</head>
Importance

The base tag defines the URL base for all relative links in the document. Having more than one base tag is invalid. This may result in issues with relative links that might impact search engines and user experience on the website.

Operating Instruction

We suggest that you not use the HTML base tag at all. Remove it if possible. If there are changes regarding the base tag, all relative links in the document need to be checked and probably corrected.

<base> points to other URL

Description

A <base> tag was found and it points to another URL.

Example

Base on http://example.com/page.html:

<base href="http://example.com/page2.html">
Importance

The base tag defines the URL base for relative links in a document. When a base tag is used, fragment-only links (such as href="#top") point to that anchor on the URL of the base tag instead of on the current document.

If this is unintended, it leads to issues with the user experience and crawling.
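The effect on fragment-only links can be sketched in Python with urljoin and urldefrag: each "#..." link is resolved against the base URL, and links whose target is no longer the current document are reported. The function name and sample links are hypothetical.

```python
from urllib.parse import urldefrag, urljoin


def fragment_links_leaving_page(document_url, base_href, hrefs):
    """List fragment-only links that a <base> tag sends to another URL."""
    base_url = urljoin(document_url, base_href)
    leaving = []
    for href in hrefs:
        if not href.startswith("#"):
            continue  # only fragment-only links are checked here
        target = urljoin(base_url, href)
        # Compare the target without its fragment to the document URL.
        if urldefrag(target).url != urldefrag(document_url).url:
            leaving.append(target)
    return leaving


print(fragment_links_leaving_page(
    "http://example.com/page.html",
    "http://example.com/page2.html",
    ["#top", "other.html", "#comments"],
))
# ['http://example.com/page2.html#top', 'http://example.com/page2.html#comments']
```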

Operating Instruction

We suggest that you not use the HTML base tag at all. Remove it if possible.

Note: If the base tag is removed, all relative links in the document need to be checked and probably corrected.

Linked as a resource

Description

The URL was linked to as a resource, such as an image, a CSS file, a script, or the source of a frame. Use this report to discover all URLs on the crawled website that are linked to as a resource.

Examples
<script type="text/javascript" src="http://example.com/script.js"></script>

<img src="http://example.com/image.png">

<link rel="stylesheet" href="http://example.com/stylesheet.css">

<iframe src="http://example.com/iframe.html"></iframe>
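A simplified Python sketch of how resource links like the ones above could be collected with html.parser. The element-to-attribute mapping is an assumption covering only these examples, and <base> tags are deliberately ignored when resolving relative URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed mapping from resource elements to the attribute holding the URL.
RESOURCE_ATTRIBUTES = {"script": "src", "img": "src", "iframe": "src", "frame": "src"}


class ResourceCollector(HTMLParser):
    """Collects (tag, absolute URL) pairs for resources linked in a page."""

    def __init__(self, document_url):
        super().__init__()
        self.document_url = document_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        url = None
        if tag in RESOURCE_ATTRIBUTES:
            url = attributes.get(RESOURCE_ATTRIBUTES[tag])
        elif tag == "link":
            # rel holds a space-separated list of case-insensitive tokens.
            rel_tokens = (attributes.get("rel") or "").lower().split()
            if "stylesheet" in rel_tokens:
                url = attributes.get("href")
        if url:
            # Simplification: a <base> tag is ignored when resolving.
            self.resources.append((tag, urljoin(self.document_url, url)))


collector = ResourceCollector("http://example.com/page.html")
collector.feed(
    '<script type="text/javascript" src="script.js"></script>'
    '<img src="/image.png">'
    '<link rel="stylesheet" href="http://example.com/stylesheet.css">'
    '<iframe src="http://example.com/iframe.html"></iframe>'
)
collector.close()
for tag, url in collector.resources:
    print(tag, url)
```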
Importance

Resources can cause problems with performance and user experience. In some cases they can even lead to legal issues, e.g. when copyrighted images are still used after the usage license has expired.

It is useful to keep track of the resources used on a website to discover and prevent the usage of outdated or otherwise unwanted scripts, images, stylesheets, and frames.

Operating Instruction

We suggest that you keep an eye on the resources used on the crawled website to avoid issues of any kind related to the use of unwanted resources.

Linking: Follow link to a so far no-follow URL

Description

A follow link was found, linking to a URL that had previously been linked only with "nofollow".

The source URL will be flagged with this hint and the target URL will be flagged as "nofollow linking revoked later on".

This report helps to identify inconsistent usage of rel=nofollow.
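A minimal sketch, assuming the crawler records links as (source, target, relation) tuples in the order it discovers them. It remembers which relations each target has received so far and emits the hint names used in this section. The function and its input format are hypothetical, not Audisto's implementation.

```python
def flag_inconsistent_links(links):
    """links: (source, target, relation) tuples in discovery order;
    relation is "follow" or "nofollow"."""
    seen = {}  # target URL -> set of relations seen so far
    hints = []
    for source, target, relation in links:
        relations = seen.setdefault(target, set())
        if relation == "follow" and relations == {"nofollow"}:
            hints.append((source, "Follow link to a so far no-follow URL"))
            hints.append((target, "nofollow linking revoked later on"))
        elif relation == "nofollow" and "follow" in relations:
            hints.append((source, "nofollow link to a follow URL"))
        relations.add(relation)
    return hints


crawl = [
    ("http://example.com/page.html", "http://example.com/target.html", "nofollow"),
    ("http://example.com/page2.html", "http://example.com/target.html", "follow"),
]
for url, hint in flag_inconsistent_links(crawl):
    print(url, "->", hint)
```

Reversing the order of the two links instead produces the "nofollow link to a follow URL" hint for the nofollow source.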

Example
Source URL                      Target URL                        Link Relation
http://example.com/page.html    http://example.com/target.html    nofollow
http://example.com/page2.html   http://example.com/target.html    follow
Importance

A single follow link will allow the target URL to be crawled, even though all previous links to it were nofollow.

Inconsistent usage of rel=nofollow can lead to unexpected behaviour, depending on your situation:

  • If the target URL is supposed to be recognized by search engines, nofollow linking will weaken the URL. Removing nofollow from internal links can lead to an uplift in ranking.
  • If the target URL is not supposed to be recognized by search engines, a single follow link will allow the target URL to be crawled.

Note: In a document that uses the robots directive "nofollow", a link with "rel=follow" is identified as "follow".

Operating Instruction

If you encounter this hint when crawling your website, we suggest that you:

  • Evaluate if rel=nofollow is needed.
  • Make sure to use follow or nofollow consistently.

Linking: nofollow link to a follow URL

Description

A link with rel="nofollow" was found, linking to a URL that had already been linked with "follow".

This report helps to identify inconsistent usage of rel=nofollow.

Example
Source URL                      Target URL                        Link Relation
http://example.com/page.html    http://example.com/target.html    follow
http://example.com/page2.html   http://example.com/target.html    nofollow
Importance

Inconsistent usage of rel=nofollow can lead to unexpected behaviour, depending on your situation:

  • If the target URL is supposed to be recognized by search engines, nofollow linking will weaken the URL. Removing nofollow from internal links can lead to an uplift in ranking.
  • If the target URL is not supposed to be recognized by search engines, a single follow link will allow the target URL to be crawled.
Operating Instruction

If you encounter this hint when crawling your website, we suggest that you:

  • Evaluate if rel=nofollow is needed.
  • Make sure to use follow or nofollow consistently.

Linking: nofollow link to network URL

Description

If a hyperlink is set to nofollow and has a target within the scope of the crawl (network), the URL is flagged with this hint.

Example

If you have the following hyperlink on https://example.com/:

<a rel="nofollow" href="https://example.com/nofollow.html">Nofollow Link</a>
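A rough Python sketch of this check: it finds <a> elements whose rel tokens include nofollow and whose resolved target host belongs to the crawl scope. Modelling the "network" as a plain set of host names is an assumption, and the class name is hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class NofollowNetworkLinkFinder(HTMLParser):
    """Finds <a rel="nofollow"> links whose target host is inside the crawl."""

    def __init__(self, document_url, network_hosts):
        super().__init__()
        self.document_url = document_url
        self.network_hosts = {host.lower() for host in network_hosts}
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if not href or "nofollow" not in rel_tokens:
            return
        target = urljoin(self.document_url, href)
        # urlsplit().hostname is lowercased and excludes any port.
        if urlsplit(target).hostname in self.network_hosts:
            self.hits.append(target)


finder = NofollowNetworkLinkFinder("https://example.com/", {"example.com"})
finder.feed(
    '<a rel="nofollow" href="https://example.com/nofollow.html">Nofollow Link</a>'
    '<a rel="nofollow" href="https://example.org/">External</a>'
    '<a href="/follow.html">Follow Link</a>'
)
finder.close()
print(finder.hits)  # ['https://example.com/nofollow.html']
```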
Importance

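Using nofollow for internal links weakens your website. Before 2008, internal nofollow links were used for PageRank sculpting. However, Google changed its behaviour: nofollow links still pass neither anchor text nor PageRank, but they now count toward the number of outgoing links, so the PageRank they would have received is lost instead of being redistributed to the followed links.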

Now let’s talk about the rel=nofollow attribute. Nofollow is method (introduced in 2005 and supported by multiple search engines) to annotate a link to tell search engines “I can’t or don’t want to vouch for this link.” In Google, nofollow links don’t pass PageRank and don’t pass anchortext [*].

So what happens when you have a page with “ten PageRank points” and ten outgoing links, and five of those links are nofollowed? Let’s leave aside the decay factor to focus on the core part of the question. Originally, the five links without nofollow would have flowed two points of PageRank each (in essence, the nofollowed links didn’t count toward the denominator when dividing PageRank by the outdegree of the page). More than a year ago, Google changed how the PageRank flows so that the five links without nofollow would flow one point of PageRank each. -- Matt Cutts, June 15, 2009
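The arithmetic in the quote can be reproduced with a small Python function. Like the quote, it ignores the decay factor; the function name and the flag for the old and new behaviour are illustrative only.

```python
def flow_per_followed_link(pagerank, total_links, nofollow_links, nofollow_in_denominator):
    """PageRank passed through each followed link, ignoring the decay factor."""
    followed_links = total_links - nofollow_links
    if followed_links == 0:
        return 0.0
    denominator = total_links if nofollow_in_denominator else followed_links
    return pagerank / denominator


before = flow_per_followed_link(10, 10, 5, nofollow_in_denominator=False)
after = flow_per_followed_link(10, 10, 5, nofollow_in_denominator=True)
print(before, after)  # 2.0 1.0
print("lost through nofollow links:", 10 - 5 * after)  # lost through nofollow links: 5.0
```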

In addition, nofollow links do not guarantee that a URL will not be crawled. A single internal or external follow link can result in the URL being crawled.

Operating Instruction

Do not use the nofollow attribute for internal links. If you want to prevent URLs from being crawled, use robots.txt instead. If you want to mask outgoing links or tracking, consider techniques like the PRG (Post/Redirect/Get) pattern.
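As a sketch of the robots.txt alternative, Python's standard urllib.robotparser can check which URLs a given rule set blocks. The Disallow rule and the URLs below are hypothetical examples.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers out of /internal/.
robots_txt = """
User-agent: *
Disallow: /internal/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("*", "https://example.com/internal/report.html"))  # False
print(parser.can_fetch("*", "https://example.com/products.html"))  # True
```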

Linking: nofollow linking revoked later on

Description

A URL that was first linked with nofollow was later (i.e. on the same or a deeper level) linked with follow. Removing the initial nofollow may lift this URL up by some levels.

This report helps to identify inconsistent usage of rel=nofollow.
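A minimal sketch of the "lifted up some levels" effect, assuming click depth is measured by breadth-first search from the start page and that nofollow links are not followed. The small link graph is hypothetical and mirrors the example below.

```python
from collections import deque


def click_depths(links, start, include_nofollow):
    """Breadth-first click depth of every URL reachable from start.

    links: (source, target, relation) tuples, relation "follow" or "nofollow".
    """
    graph = {}
    for source, target, relation in links:
        if relation == "follow" or include_nofollow:
            graph.setdefault(source, []).append(target)
    depths = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for target in graph.get(url, []):
            if target not in depths:
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths


site = [
    ("/", "/page.html", "follow"),
    ("/page.html", "/target.html", "nofollow"),
    ("/page.html", "/page2.html", "follow"),
    ("/page2.html", "/target.html", "follow"),
]
current = click_depths(site, "/", include_nofollow=False)["/target.html"]
without_nofollow = click_depths(site, "/", include_nofollow=True)["/target.html"]
print(current, without_nofollow)  # 3 2
```

Here, removing nofollow from the link on /page.html lifts /target.html by one level.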

Example
Source URL                      Target URL                        Link Relation
http://example.com/page.html    http://example.com/target.html    nofollow
http://example.com/page2.html   http://example.com/target.html    follow
Importance

Inconsistent usage of rel=nofollow can lead to unexpected behaviour, depending on your situation:

  • If the target URL is supposed to be recognized by search engines, nofollow linking will weaken the URL. Removing nofollow from internal links can lead to an uplift in ranking.
  • If the target URL is not supposed to be recognized by search engines, a single follow link will allow the target URL to be crawled.
Operating Instruction

If you encounter this hint when crawling your website, we suggest that you:

  • Evaluate if rel=nofollow is needed.
  • Make sure to use follow or nofollow consistently.

Robots: Specified more than once

Description

Robots directives for a single URL were specified more than once. Use this report to identify all instances of multiple robots definitions.

Examples

Robots meta tag in HTML <head>:

<meta name="robots" content="index">
<meta name="robots" content="follow">

Robots directives in X-Robots-Tag and meta tag:

HTTP/1.1 200 OK
Date: Fri, 16 Oct 2015 10:01:33 GMT
X-Robots-Tag: index, follow
...

<meta name="robots" content="index, follow">
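A rough Python sketch that gathers every robots specification for a URL from X-Robots-Tag headers and robots meta tags, so that more than one source can be reported. Passing headers as a list of (name, value) pairs is an assumption (a response may repeat the header), and user-agent-specific X-Robots-Tag values are not handled.

```python
from html.parser import HTMLParser


class RobotsMetaCollector(HTMLParser):
    """Collects the content of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            self.contents.append(attributes.get("content") or "")


def robots_specifications(headers, html):
    """Return every robots specification from X-Robots-Tag headers and meta tags."""
    # headers is a list of (name, value) pairs, because a response may carry
    # several X-Robots-Tag headers.
    specs = [value for name, value in headers if name.lower() == "x-robots-tag"]
    collector = RobotsMetaCollector()
    collector.feed(html)
    collector.close()
    return specs + collector.contents


specs = robots_specifications(
    [("Content-Type", "text/html"), ("X-Robots-Tag", "index, follow")],
    '<meta name="robots" content="index, follow">',
)
print(specs, "specified more than once:", len(specs) > 1)
```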
Importance

Multiple robots directives can lead to conflicting definitions or to a directive being left out. This may result in a range of issues with privacy, indexing in general and crawl budget, depending on the situation.

Operating Instruction

Use only one way to specify the robots directive.

Robots: follow

Description

The URL is set to follow by a robots directive, either by the robots meta tag or the X-Robots-Tag header. If no robots directives are specified, the URL is treated as if "index, follow" were specified.

Use this report to discover all occurrences of the follow robots directive, whether in robots meta tags or in X-Robots-Tag headers.

Examples

Robots meta tag in HTML <head>:

<meta name="robots" content="index, follow">

Robots directives in HTTP-header X-Robots-Tag:

X-Robots-Tag: index, follow
Importance

If a URL is set to follow and there are no robots.txt restrictions interfering, search engines will usually follow all links in the document and crawl the target URLs.

Operating Instruction

We suggest that you use the follow directive for robots by default.

Robots: nofollow

Description

The webpage is set to "nofollow" by a robots directive, either by the robots meta tag or the X-Robots-Tag header.

Find all instances of "nofollow" usage that have been discovered while crawling your website.

Examples

Robots meta tag in HTML <head>:

<meta name="robots" content="index, nofollow">

X-Robots-Tag in HTTP-header:

X-Robots-Tag: nofollow
Importance

Using the "nofollow" directive for a URL tells crawlers not to follow any links in the document. This also prevents the PageRank flow to the target URLs. By using nofollow for internal links, you weaken your site.

Using the "nofollow" robots directive affects:

  • PageRank flow
  • Website structure
  • Crawling
Operating Instruction

URLs that use the nofollow robots directive should be evaluated on a regular basis to prevent possible issues. Nofollow should not be used for internal linking.

Instead, you should consider setting the document to robots follow.

Robots: nofollow differs across specifications

Description

There is more than one source of robots directives (robots meta tags and/or an X-Robots-Tag header), and at least one of them specifies "nofollow" while another does not.

Examples

Robots meta tag in HTML <head>:

<meta name="robots" content="index, nofollow">
<meta name="robots" content="index, follow">

Note: A more subtle way to produce this error is to omit parts of the directive. Since follow is the default, a tag that omits it implicitly specifies follow, for example:

<meta name="robots" content="index, nofollow">
<meta name="robots" content="index">

Robots directives in X-Robots-Tag and meta tag differ:

HTTP/1.1 200 OK
Date: Fri, 16 Oct 2015 10:01:33 GMT
X-Robots-Tag: index, nofollow
...

<meta name="robots" content="index, follow">
Importance

The "nofollow" robots directive tells crawlers not to follow the links in a document. This can be used on purpose to prevent search engines from crawling the linked URLs.

Conflicting definitions are inconclusive. Search engines will usually apply the most restrictive directive they find, and the Audisto Crawler adopts this behaviour.
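A minimal Python sketch of the "most restrictive wins" rule for nofollow, assuming each specification is a plain comma-separated string without a user-agent prefix. The helper names are hypothetical; the default of follow when nothing is specified follows the behaviour described for the follow hint.

```python
def has_nofollow(spec):
    """True if a comma-separated robots specification contains nofollow."""
    return "nofollow" in {token.strip().lower() for token in spec.split(",")}


def nofollow_differs(specs):
    """True if some specifications say nofollow and others do not."""
    return len({has_nofollow(spec) for spec in specs}) > 1


def effective_follow(specs):
    """Most restrictive wins: any nofollow disables following.
    With no specification at all, a URL behaves as "index, follow"."""
    return not any(has_nofollow(spec) for spec in specs)


specs = ["index, nofollow", "index"]  # conflicting meta tags from the note
print(nofollow_differs(specs), effective_follow(specs))  # True False
```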

Operating Instruction

Use only one way to specify the robots nofollow directive.

URL too long for some browsers

Description

If a URL longer than 2,000 characters is encountered, it is flagged with this hint.

Example

Long URLs are often generated dynamically in scenarios like:

  • a form posts data from input fields or a textarea via the GET method to the form action URL
  • GET parameters from complex filter combinations in faceted search
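The faceted-search scenario above can be sketched in Python: urlencode builds a GET query string from many filter values, and a length check applies this hint's 2,000-character threshold. The search URL and the 200 color filters are hypothetical.

```python
from urllib.parse import urlencode

MAX_URL_LENGTH = 2000  # threshold used by this hint


def url_too_long(url):
    """True if the URL is longer than 2,000 characters."""
    return len(url) > MAX_URL_LENGTH


# Hypothetical faceted-search URL: 200 filter values submitted via GET.
params = [("color", "color-%d" % i) for i in range(200)]
url = "https://example.com/search?" + urlencode(params)
print(len(url), url_too_long(url))  # prints a length above 2000 and True
```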
Importance

Long URLs might cause problems.

Some browsers are unable to handle URLs of this length. Some web applications might not be able to resolve such URLs or might truncate them automatically, causing issues with access to these URLs.

Operating Instruction

While in theory there is no limit on the length of a URL, you should stay below 2,000 characters to remain accessible to a large number of clients and web applications.