Audisto Website Structure Checker

How to detect structure-related issues on your website

A website's structure influences the user experience, the crawling, and ultimately the ranking of the website. One of the most important goals of website optimization is to achieve a structure that provides easy crawlability and a good user experience, which very likely results in a better appearance as well as better rankings in search results.

With the Audisto structure hints, you can identify possible issues with a website's structure.

Example: Audisto Website Structure Check with the structure hint reports for the current crawl

Here is the list of all specific hints related to a website's structure that can be identified with the help of the Audisto Crawler.

Hints

<a> more than 100 links

Description

If more than 100 unique links are found, the URL is flagged with this hint.

Example

Resources of the following categories often trigger this hint:

  • HTML Sitemaps
  • Link listing pages
  • Category pages
  • Archive pages
  • Filter navigation
Importance

Search engines suggest not exceeding a reasonable number of links on a single document. Too many links in a document affect its file size and usability. 100 links on a URL are not necessarily a problem; 100 links was the maximum number suggested by Google some years ago. However, Google has refrained from communicating an exact number, because what counts as a reasonable number of links depends on the document and its context.

Operating Instruction

Evaluate the internal links in the documents found by this report. Consider removing links that don't add value and/or don't get clicked by users.

If this report contains a large percentage of all crawled URLs, you might consider removing links from elements that are present on all URLs, e.g. the top navigation, sidebar, or footer.

<a> no outgoing internal links

Description

If the HTML contains no internal links, the URL is flagged with this hint. Use this report to identify all occurrences of URLs that have no links to URLs on the same host.
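
Example

A minimal sketch of a document that would trigger this hint, assuming it is served on example.com: every link points to an external host, so there are no outgoing internal links.

<html>
<head><title>Dead end document</title></head>
<body>
<p>Further reading is available on <a href="https://www.w3.org/">w3.org</a> and <a href="https://developer.mozilla.org/">developer.mozilla.org</a>.</p>
</body>
</html>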

Importance

URLs that are part of the internal link graph but don't link to other documents within the host are dead ends for the flow of PageRank inside your architecture. This may lead to structural issues as well as issues with the proper crawling of the website.

Operating Instruction

We suggest not having parts of a website that don't contain internal links. Evaluate why these documents are not linking to any other documents.

<form> method is GET

Description

If a form using the GET method is found, the URL is flagged with this hint. This report shows all URLs on the crawled website that contain forms using the GET method.

Example
<form method="GET" action="/submit.php">
...
</form>
Importance

If forms are submitted with the GET method, the form input data becomes part of the requested URL. These URLs are accessible to everyone who knows them, e.g. users, bots, and search engines. This may lead to unexpected direct requests to these URLs.

If GET URLs are unexpectedly requested out of context, this may result in wasted crawl budget and issues with the crawl rate. Furthermore, the GET method is defined as cacheable by default. Any URL generated by a form that uses the GET method may therefore be cached unless indicated otherwise, e.g. by status code or cache policy.

It is also noteworthy that any form that uses the GET method along with freely definable, unrestricted input fields allows an unlimited number of unique URLs to be generated. If these get crawled and indexed, this might lead to issues with the crawl budget as well as with the indexing budget. This situation can be exploited for negative SEO.
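
For illustration, assume a form like the one in the example above with a free-text input (the field name q is hypothetical):

<form method="GET" action="/submit.php">
<input type="text" name="q">
</form>

Every distinct value submitted or linked for q creates a new unique URL, e.g. /submit.php?q=foo, /submit.php?q=bar, and so on.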

Operating Instruction

If it is not your intention to generate URLs that can be accessed out of the form context, switch the form method to POST. Internal search forms often trigger this hint. You might also consider blocking bots from crawling the form action URL in the robots.txt.
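
For example, a robots.txt rule like the following would keep compliant bots from crawling the form action URL of the example above (/submit.php):

User-agent: *
Disallow: /submit.php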

To avoid user experience problems after switching to the POST method, you might want to utilize the PRG (Post/Redirect/Get) pattern. This avoids problems when reloading the document: if the PRG pattern is not used, the browser will ask the user to resubmit the form when the URL is reloaded.
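
A minimal sketch of the PRG pattern, assuming a hypothetical /submit.php handler and a hypothetical confirmation page /thanks.html: the form is submitted via POST, and the server answers with a redirect that the browser then requests via GET.

<form method="POST" action="/submit.php">
...
</form>

HTTP/1.1 303 See Other
Location: https://example.com/thanks.html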

<frameset> found, but <noframes> missing

Description

A <frameset> was found, but a <noframes> element is missing. Discover all URLs on the crawled website that contain a frameset without a <noframes> tag.

Example
<frameset cols="10%, *, 25%">
<frame src="https://example.com/frame1.html">
<frame src="https://example.com/frame2.html">
<frame src="https://example.com/frame3.html">
</frameset>
Importance

Framesets are outdated and a potential source of errors in terms of search engine optimization. A frameset consists of several frames, each of which has a different URL. While search engines try to associate the framed content with the URL containing the frames, this is not guaranteed. Frames do not correspond to the conceptual model of the web, where one URL displays exactly one resource.

The <noframes> tag allows you to provide content for clients that do not support frames. Back when framesets were used a lot more often, it was suggested to use the <noframes> area to provide a non-framed version of the content or a link to a non-framed version.

With framesets, search engines might index the individual frame URLs and list them in the search results. This may result in users accessing frame URLs out of the frameset context.

This may lead to

  • unexpected user experience due to incomplete functionality and missing context
  • unexpected appearance of the site in search results
Operating Instruction

We suggest refraining from using framesets.

If framesets need to be used, make sure to offer a <noframes> tag that includes the markup for a non-framed version of the document or a link to a non-framed version of the document. It is also possible to set the robots directive for the frame URLs to noindex in order to prevent frame URLs from appearing in the search results.
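
A sketch of the frameset from the example above with a <noframes> fallback added; the linked noframes.html is a hypothetical non-framed version of the document:

<frameset cols="10%, *, 25%">
<frame src="https://example.com/frame1.html">
<frame src="https://example.com/frame2.html">
<frame src="https://example.com/frame3.html">
<noframes>
<body><p>This document uses frames. A <a href="https://example.com/noframes.html">non-framed version</a> is available.</p></body>
</noframes>
</frameset>

To keep the individual frame URLs out of the search results, a robots noindex directive can be placed in the <head> of each frame URL:

<meta name="robots" content="noindex">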

Recrawl the site after fixing the frameset. By default, the Audisto Crawler will not follow the source attributes of frames.

<link rel="canonical"> points to other URL

Description

If the canonical element is found and points to a different URL, the URL is flagged with this hint.

Use this report to identify all instances of canonical elements pointing to other URLs.

Examples

The canonical link element points to the SSL version of the document:

Canonical link element for http://example.com/page.html

<link rel="canonical" href="https://example.com/page.html">

The canonical link element points to a URL without GET-parameter:

Canonical link element for http://example.com/page.html?a=1

<link rel="canonical" href="http://example.com/page.html">
Importance

The canonical link element specifies a preferred version of a document that is available on more than one URL at a time.

By using a canonical link element pointing to another URL, you are telling search engines to prefer the target URL in search results.

If URLs that are not supposed to be shown in search results are part of the internal link graph, this can lead to waste of crawl budget.

Operating Instruction

You might consider changing internal links to point directly to the preferred version of the document to save crawl budget.

You might also want to evaluate if multiple URLs for one document are necessary at all.
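
Building on the second example above, an internal link would then point directly to the canonical target instead of the parameterized URL (the link text is illustrative):

<a href="http://example.com/page.html">Page</a>

instead of

<a href="http://example.com/page.html?a=1">Page</a>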

<meta refresh> found

Description

If a meta refresh is found, the URL is flagged with this hint. Discover all HTML documents on the crawled website that contain a meta refresh.

Example
<meta http-equiv="refresh" content="5; URL=http://www.example.com/">
Importance

A meta refresh is a client-side redirect that triggers a GET request after a given time. It is sometimes used to automatically forward users to another URL. Because this method has been used widely to manipulate search engines, they might misinterpret a meta refresh as a sneaky redirect. A meta refresh with a delay of more than one second also violates the Web Content Accessibility Guidelines. Using a meta refresh might therefore result in a bad user experience and issues with rankings in search engines.

Operating Instruction

You might consider replacing it with an HTTP redirect using a 301 or 302 status code, or even a JavaScript client-side redirect, depending on your requirements.
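
A server-side redirect that replaces the meta refresh from the example above might look like this in the HTTP response:

HTTP/1.1 301 Moved Permanently
Location: http://www.example.com/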

Linking: Follow link to a so far no-follow URL

Description

A follow link was found, linking to a URL that was previously linked "nofollow" only.

The linking URL will be flagged with this hint, and the target URL will be flagged as "Nofollow linking revoked later on".

This report helps to identify inconsistency in the usage of rel=nofollow.

Example
Linking URL                     Target URL                       Link Relation
http://example.com/page.html    http://example.com/target.html   nofollow
http://example.com/page2.html   http://example.com/target.html   follow
Importance

A single follow link will allow the target URL to be crawled, even though it was previously forbidden by nofollow links.

Inconsistency in usage of rel=nofollow can lead to unexpected behaviour depending on your situation.

  • If the target URL is supposed to be recognized by search engines, nofollow linking will weaken the URL. Removing nofollow from internal links can lead to an uplift in ranking.
  • If the target URL is not supposed to be recognized by search engines, a single follow link will allow the target URL to be crawled.

Note: In a document that uses the robots directive "nofollow", a link with rel="follow" is identified as "follow".

Operating Instruction

If you encounter this hint when crawling your website, we suggest that you:

  • Evaluate if rel=nofollow is needed
  • Make sure to use follow or nofollow consistently
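
For illustration, consistent linking to the target URL from the example above would use the same link relation in every document, e.g. either

<a href="http://example.com/target.html">Target</a>

on every linking URL, or

<a rel="nofollow" href="http://example.com/target.html">Target</a>

on every linking URL (the link text "Target" is illustrative).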

Linking: Nofollow link to a follow URL

Description

A link with rel="nofollow" was found, linking to a URL that was previously linked "follow" already.

This report helps to identify inconsistency in the usage of rel=nofollow.

Example
Linking URL                     Target URL                       Link Relation
http://example.com/page.html    http://example.com/target.html   follow
http://example.com/page2.html   http://example.com/target.html   nofollow
Importance

Inconsistency in usage of rel=nofollow can lead to unexpected behaviour depending on your situation.

  • If the target URL is supposed to be recognized by search engines, nofollow linking will weaken the URL. Removing nofollow from internal links can lead to an uplift in ranking.
  • If the target URL is not supposed to be recognized by search engines, a single follow link will allow the target URL to be crawled.
Operating Instruction

If you encounter this hint when crawling your website, we suggest that you:

  • Evaluate if rel=nofollow is needed
  • Make sure to use follow or nofollow consistently

Linking: Nofollow linking revoked later on

Description

A URL that has been linked nofollow has later, that is on the same or a deeper level, also been linked follow. By removing the initial nofollow links, this URL may be lifted up some levels.

This report helps to identify inconsistency in the usage of rel=nofollow.

Example
Linking URL                     Target URL                       Link Relation
http://example.com/page.html    http://example.com/target.html   nofollow
http://example.com/page2.html   http://example.com/target.html   follow
Importance

Inconsistency in usage of rel=nofollow can lead to unexpected behaviour depending on your situation.

  • If the target URL is supposed to be recognized by search engines, nofollow linking will weaken the URL. Removing nofollow from internal links can lead to an uplift in ranking.
  • If the target URL is not supposed to be recognized by search engines, a single follow link will allow the target URL to be crawled.
Operating Instruction

If you encounter this hint when crawling your website, we suggest that you:

  • Evaluate if rel=nofollow is needed
  • Make sure to use follow or nofollow consistently

Robots: follow

Description

The URL is set to "follow" by a robots directive, either via a robots meta tag or the X-Robots-Tag header. If no robots directives are specified, the URL will be regarded as if "index, follow" was specified.

Discover all occurrences of "follow" robots directives or meta tags with this hint report.

Examples

Robots meta tag in HTML <head>

<meta name="robots" content="index, follow">

Robots directives in HTTP-header X-Robots-Tag

X-Robots-Tag: index, follow
Importance

If a URL is set to "follow" and there are no robots.txt restrictions interfering, search engines will usually follow all links in the document and crawl the target URLs.

Operating Instruction

We suggest using the follow directive for robots by default.

Robots: nofollow

Description

The URL is set to "nofollow" by a robots directive, either via a robots meta tag or the X-Robots-Tag header.

Find all instances of "nofollow" usage that have been discovered while crawling your site.

Examples

Robots meta tag in HTML <head>

<meta name="robots" content="index, nofollow">

X-Robots-Tag in HTTP-header

X-Robots-Tag: nofollow
Importance

Using the "nofollow" directive for a URL tells crawlers not to follow any links in the document. This also prevents PageRank flow to the target URLs. By using nofollow for internal links, you weaken your site.

Using the "nofollow" robots directive affects

  • PageRank flow
  • Website structure
  • Crawling
Operating Instruction

URLs that use the nofollow robots directive should be evaluated on a regular basis to prevent possible issues. Nofollow should not be used for internal linking.

Instead, you should consider setting the document to robots follow.