Audisto Website Structure Checker

How to detect structure related issues on your website

A website's structure influences the user experience, the crawling, and ultimately the ranking of a website. One of the most important goals of website optimization is to achieve a structure that provides easy crawlability and a decent user experience, which is very likely to result in a better appearance and better rankings in search results.

With the Audisto structure hints, you can identify possible issues with a website's structure.

Example: Audisto Website Structure Check with the structure hint reports for the current crawl

Here is the list of all specific hints related to a website's structure that can be identified with the help of the Audisto Crawler.

Table Of Contents

Hints

<a> more than 100 links

Description

If more than 100 unique links are found, the URL is flagged with this hint.

Example

Resources of the following categories often trigger this hint:

  • HTML Sitemaps
  • Link listing pages
  • Category pages
  • Archive pages
  • Filter navigation
Importance

Search engines suggest not exceeding a reasonable number of links in a single document. Too many links affect the file size and usability of a document. 100 links on a webpage are not necessarily a problem; 100 was the maximum number of links suggested by Google some years ago. However, Google has since refrained from communicating an exact number, because a reasonable number of links depends on the document and its context.

Operating Instruction

Evaluate the internal links in the documents found by this report. Consider removing links that don't add value and/or don't get clicked by users.

If this report contains a large percentage of all crawled URLs, you might consider removing links from elements that are present on all webpages, e.g. top navigation, sidebar, footer.

<a> no outgoing internal links

Description

If the HTML contains no internal links, the URL is flagged with this hint. Use this report to identify all occurrences of documents that have no links to URLs on the same host.

Importance

Documents that are part of the internal link graph but don't link to other documents within the host are dead ends for the PageRank flow inside your architecture. This may lead to structure issues as well as issues with the proper crawling of the website.

Operating Instruction

We suggest not having parts of a website that don't contain internal links. Evaluate why these documents are not linking to any other documents.

<form> method is GET

Description

If a form using the GET method is found, the URL is flagged with this hint. This report shows all URLs on the crawled website that contain forms using the GET method.

Example
<form method="GET" action="/submit.php">
...
</form>
Importance

If forms are submitted with the GET method, the form input data becomes part of the requested URL. The resulting URLs are accessible to everyone who knows them, e.g. users, bots, and search engines. This may lead to unexpected direct requests to these URLs.
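For illustration, submitting the example form above with a single input field named q (the field name is an assumption) would result in a request like this, with the input data visible in the URL:

GET /submit.php?q=some+value HTTP/1.1
Host: example.com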

If GET URLs are unexpectedly called out of context, this may result in wasted crawl budget and issues with the crawl rate. Furthermore, the GET method is defined as cacheable by default. Any URL generated by a form that uses the GET method may therefore be cached if not indicated otherwise, e.g. by status code or cache policy.

It is also noteworthy that any form that uses the GET method along with freely definable input fields without restrictions allows the generation of an unlimited number of unique URLs. If these get crawled and indexed, this might lead to issues with the crawl budget as well as with the indexing budget. This situation can be exploited for negative SEO.

Operating Instruction

If it is not your intention to generate URLs that can be accessed out of the form context, switch the form method to POST. Internal search forms often trigger this hint. You might consider blocking bots in the robots.txt from crawling the form action URL.
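If you decide to block the form action URL for bots, a minimal robots.txt rule could look like this (the path is taken from the example above):

User-agent: *
Disallow: /submit.php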

To avoid user experience problems after switching to the POST method, you might want to utilize the PRG pattern. This avoids problems when reloading the document: if the PRG pattern is not used, the browser will ask the user to resubmit the form when the URL is reloaded.
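As a rough sketch of the PRG pattern, assuming a hypothetical /search endpoint: the form is submitted via POST, the server answers with a redirect (often a 303 See Other), and the browser then requests the result page via GET, so reloading the result page does not resubmit the form.

<form method="POST" action="/search">
...
</form>

POST /search HTTP/1.1
Host: example.com

HTTP/1.1 303 See Other
Location: https://example.com/search/results

GET /search/results HTTP/1.1
Host: example.com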

<frameset> found, but <noframes> missing

Description

A <frameset> was found, but a <noframes> tag is missing. Discover all URLs on the crawled website that contain a frameset without a <noframes> tag.

Example
<frameset cols="10%, *, 25%">
<frame src="https://example.com/frame1.html">
<frame src="https://example.com/frame2.html">
<frame src="https://example.com/frame3.html">
</frameset>
Importance

Framesets are outdated and a potential error source in terms of search engine optimization. A frameset consists of several frames. Each of the frames has a different URL. While search engines try to associate the framed content with the URL containing the frames, this is not guaranteed. Frames do not correspond with the conceptual model of the web, where one URL displays only one resource.

The <noframes> tag allows you to provide content for clients that do not support frames. Back when framesets were used a lot more often, it was suggested to use the <noframes> area to provide a non-framed version or a link to a non-framed version.

With framesets, search engines might index the individual frame URLs and list them in the search results. This may result in users accessing frame URLs out of the frameset context.

This may lead to:

  • unexpected user experience due to incomplete functionality and missing context
  • unexpected appearance of the site in search results
Operating Instruction

We suggest that you refrain from using framesets.

If framesets need to be used, make sure to offer a <noframes> tag that includes the markup for a non-framed version of the document or a link to a non-framed version of the document. You might also set the robots directive for the frame URLs to noindex in order to avoid frame URLs appearing in the search results.
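For illustration, the frameset from the example above could be extended with a <noframes> fallback like this (the link target is an assumption):

<frameset cols="10%, *, 25%">
<frame src="https://example.com/frame1.html">
<frame src="https://example.com/frame2.html">
<frame src="https://example.com/frame3.html">
<noframes>
<body>
<p><a href="https://example.com/noframes.html">View the non-framed version of this page</a></p>
</body>
</noframes>
</frameset>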

Recrawl the site after fixing the frameset. By default, the Audisto Crawler will not follow the source attributes of frames.

<link rel=canonical> points to other URL

Description

If the canonical element is found and it points to a different URL, the URL is flagged with this hint.

Use this report to identify all instances of canonical elements pointing to other URLs.

Examples

In this first example, the canonical link element on http://example.com/page.html points to the SSL version of the document:

<link rel="canonical" href="https://example.com/page.html">

In this second example, the canonical link element on http://example.com/page.html?a=1 points to a URL without a GET parameter:

<link rel="canonical" href="http://example.com/page.html">
Importance

The canonical link URL specifies a preferred version of a document that is available on more than one URL at a time.

By using a canonical link element pointing to another URL, you are telling search engines to prefer the target URL in search results.

If URLs that are not supposed to be shown in search results are part of the internal link graph, this can lead to wasted crawl budget.

Operating Instruction

You might consider changing internal links to point directly to the preferred version of the document to save crawl budget.
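For illustration, based on the second example above, an internal link such as (the anchor text is an assumption)

<a href="http://example.com/page.html?a=1">Page</a>

could be changed to point directly to the preferred version:

<a href="http://example.com/page.html">Page</a>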

You might also want to evaluate if multiple URLs for one document are necessary at all.

<meta refresh> found

Description

If a meta refresh is found, the URL is flagged with this hint. Discover all HTML documents on the crawled website that contain a meta refresh.

Example
<meta http-equiv="refresh" content="5; URL=http://www.example.com/">
Importance

A meta refresh is a client-side redirect that triggers a GET request after a given amount of time. It is sometimes used to automatically forward users to another URL. This method has been widely used to manipulate search engines, so search engines might misinterpret the usage of a meta refresh as a sneaky redirect. A meta refresh with a delay of more than one second also violates the Web Content Accessibility Guidelines. Using a meta refresh might result in a poor user experience as well as issues with rankings in search engines.

Operating Instruction

You might consider replacing the meta refresh with an HTTP redirect using a 301 or 302 status code, or even a JavaScript client-side redirect, depending on the requirements.
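For illustration, a server-side redirect that could replace the meta refresh from the example above might look like this:

HTTP/1.1 301 Moved Permanently
Location: http://www.example.com/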

Linking: Follow link to a so far no-follow URL

Description

A follow link was found, linking to a URL that was previously linked "nofollow" only.

The source URL will be flagged with this hint and the target URL will be flagged as "No-Follow linking revoked later on".

This report helps to identify inconsistency in the usage of rel=nofollow.

Example
Source URL                      Target URL                       Link Relation
http://example.com/page.html    http://example.com/target.html   nofollow
http://example.com/page2.html   http://example.com/target.html   follow
Importance

A single follow link will allow the target URL to be crawled, even though it was previously forbidden by nofollow links.

Inconsistency in usage of rel=nofollow can lead to unexpected behaviour depending on your situation.

  • If the target URL is supposed to be recognized by search engines, nofollow linking will weaken the URL. Removing nofollow from internal links can lead to an uplift in ranking.
  • If the target URL is not supposed to be recognized by search engines, a single follow link will allow the target URL to be crawled.

Note: In a document that uses the robots directive "nofollow", a link with "rel=follow" is identified as "follow".

Operating Instruction

If you encounter this hint when crawling your website, we suggest that you:

  • Evaluate if rel=nofollow is needed.
  • Make sure to use follow or nofollow consistently.

Linking: nofollow link to a follow URL

Description

A link with rel="nofollow" was found, linking to a URL that was previously linked "follow" already.

This report helps to identify inconsistency in usage of rel=nofollow.

Example
Source URL                      Target URL                       Link Relation
http://example.com/page.html    http://example.com/target.html   follow
http://example.com/page2.html   http://example.com/target.html   nofollow
Importance

Inconsistency in usage of rel=nofollow can lead to unexpected behaviour depending on your situation:

  • If the target URL is supposed to be recognized by search engines, nofollow linking will weaken the URL. Removing nofollow from internal links can lead to an uplift in ranking.
  • If the target URL is not supposed to be recognized by search engines, a single follow link will allow the target URL to be crawled.
Operating Instruction

If you encounter this hint when crawling your website, we suggest that you:

  • Evaluate if rel=nofollow is needed.
  • Make sure to use follow or nofollow consistently.

Linking: nofollow link to network URL

Description

If a hyperlink is set to nofollow and has a target within the scope of the crawl (network), the URL is flagged with this hint.

Example

If you have the following hyperlink on https://example.com/:

<a rel="nofollow" href="https://example.com/nofollow.html">Nofollow Link</a>
Importance

Using nofollow for internal links will weaken your website. Before 2008, internal nofollow links were used for PageRank sculpting; however, Google changed its behaviour, and nofollow links no longer pass anchor text or PageRank.

Now let's talk about the rel=nofollow attribute. Nofollow is a method (introduced in 2005 and supported by multiple search engines) to annotate a link to tell search engines "I can't or don't want to vouch for this link." In Google, nofollow links don't pass PageRank and don't pass anchortext [*].

So what happens when you have a page with "ten PageRank points" and ten outgoing links, and five of those links are nofollowed? Let's leave aside the decay factor to focus on the core part of the question. Originally, the five links without nofollow would have flowed two points of PageRank each (in essence, the nofollowed links didn't count toward the denominator when dividing PageRank by the outdegree of the page). More than a year ago, Google changed how the PageRank flows so that the five links without nofollow would flow one point of PageRank each. -- Matt Cutts, June 15, 2009
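To make the arithmetic from the quote explicit (leaving the decay factor aside, as in the quote):

Before the change: 10 PageRank points / 5 followed links   = 2 points per followed link
After the change:  10 PageRank points / 10 outgoing links  = 1 point per followed link

The 5 points assigned to the nofollowed links no longer flow anywhere.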

In addition, nofollow links are no guarantee that a URL will not be crawled. A single internal or external follow link can result in the URL being crawled.

Operating Instruction

Do not use the nofollow attribute for internal links. If you want to prevent crawling of URLs, use the robots.txt instead. If you want to mask outgoing links or tracking, consider using techniques like the PRG pattern.

Linking: nofollow linking revoked later on

Description

A URL that has been linked nofollow has later (i.e. on the same or a deeper level) also been linked with a follow link. If the initial nofollow linking were removed, this URL might be lifted up some levels.

This report helps to identify inconsistency in the usage of rel=nofollow.

Example
Source URL                      Target URL                       Link Relation
http://example.com/page.html    http://example.com/target.html   nofollow
http://example.com/page2.html   http://example.com/target.html   follow
Importance

Inconsistency in usage of rel=nofollow can lead to unexpected behavior depending on your situation.

  • If the target URL is supposed to be recognized by search engines, nofollow linking will weaken the URL. Removing nofollow from internal links can lead to an uplift in ranking.
  • If the target URL is not supposed to be recognized by search engines, a single follow link will allow the target URL to be crawled.
Operating Instruction

If you encounter this hint when crawling your website, we suggest that you:

  • Evaluate if rel=nofollow is needed.
  • Make sure to use follow or nofollow consistently.

Robots: follow

Description

The URL is set to follow by a robots directive, either by the robots meta tag or the X-Robots-Tag header. If no robots directives are specified, the URL is treated as if "index, follow" was specified.

Discover all occurrences of follow robots directives or meta tags with this hint report.

Examples

Robots meta tag in HTML <head>:

<meta name="robots" content="index, follow">

Robots directives in HTTP-header X-Robots-Tag:

X-Robots-Tag: index, follow
Importance

If a URL is set to follow and there are no robots.txt restrictions interfering, search engines will usually follow all links in the document and crawl the target URLs.

Operating Instruction

We suggest that you use the follow directive for robots by default.

Robots: nofollow

Description

The webpage is set to "nofollow" by a robots directive, either by the robots meta tag or the X-Robots-Tag header.

Find all instances of "nofollow" usage that have been discovered while crawling your website.

Examples

Robots meta tag in HTML <head>:

<meta name="robots" content="index, nofollow">

X-Robots-Tag in HTTP-header:

X-Robots-Tag: nofollow
Importance

Using the "nofollow" directive for a URL tells crawlers not to follow any links in the document. This also prevents the PageRank flow to the target URLs. By using nofollow for internal links, you weaken your site.

Using the "nofollow" robots directive affects:

  • PageRank flow
  • Website structure
  • Crawling
Operating Instruction

URLs that use the nofollow robots directive should be evaluated on a regular basis to prevent possible issues. Nofollow should not be used for internal linking.

Instead, you should consider setting the document to robots follow.