Audisto URL Checker

How to detect URL related issues on your website

The URL structure can be a deciding factor for the success of an SEO campaign on a website. Poorly formed URLs might result in issues with crawling, duplicate content and user experience.

This hints section helps to identify common issues with the URLs of a website.

Example: Audisto URL Check with the URL hint reports for the current crawl

Here is the list of all URL-related hints that can be identified with the help of the Audisto Crawler.

Hints

URL contains escaped characters in path

Description

If the path of the URL contains escaped (percent-encoded) characters, which usually stand for non-ASCII characters, it is flagged with this hint. Escaped characters are detected by looking for a % character in the path. Use this report to find all URLs with non-ASCII characters in the path.

Examples
Displayed URL                   | Properly Escaped URL
http://example.com/entré.html   | http://example.com/entr%C3%A9.html
http://example.com/münchen.html | http://example.com/m%C3%BCnchen.html
Importance

If non-ASCII characters are used within the path of a URL, they will be URL-encoded (escaped) by web clients. RFC 3986 states that non-ASCII characters should first be encoded according to UTF-8 and then percent-encoded. Unexpected issues may occur if a web client does not follow this encoding procedure correctly.

This often happens in systems that don't use UTF-8 by default.

Operating Instruction

We suggest evaluating whether the URLs were escaped correctly. We also suggest avoiding non-ASCII characters in URLs if possible.
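As an illustration of the escaping procedure, here is a minimal Python sketch using the standard library. The helper names `escape_path` and `decodes_as_utf8` are made up for this example and are not part of Audisto. It also shows how a client that encodes as Latin-1 instead of UTF-8 ends up with a different URL.

```python
from urllib.parse import quote, unquote


def escape_path(path: str, encoding: str = "utf-8") -> str:
    """Percent-encode a URL path, leaving the '/' separators untouched."""
    return quote(path, safe="/", encoding=encoding)


def decodes_as_utf8(path: str) -> bool:
    """Return True if the escaped bytes in the path form valid UTF-8."""
    try:
        unquote(path, encoding="utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


print(escape_path("/entré.html"))             # /entr%C3%A9.html
print(escape_path("/münchen.html"))           # /m%C3%BCnchen.html
# A client encoding as Latin-1 produces a different, non-UTF-8 escape:
print(escape_path("/entré.html", "latin-1"))  # /entr%E9.html
print(decodes_as_utf8("/entr%E9.html"))       # False
```

A check like `decodes_as_utf8` is one possible way to review the URLs listed in this report for escapes that were not built from UTF-8.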

URL has & in path

Description

If the URL has an ampersand (&) in the path, this hint is triggered.

Example
http://example.com/seo&more.html
http://example.com/&a=1?b=1
Importance

An ampersand is a reserved character that is distinguishable from other data within a URL. It can be used as a delimiter. If it is not used as a delimiter, it needs to be URL-encoded as "%26".

Since ampersands are usually used in the query part of a URL, an ampersand in the path may indicate problems in building a correct URL.

Operating Instruction

Evaluate all occurrences of ampersand in the path. If the ampersand is not intended to be a delimiter, use proper URL-encoding.
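The check and the fix can be sketched in Python as follows. This is only an approximation of the hint for illustration; the function name `has_ampersand_in_path` is an assumption and not Audisto's implementation.

```python
from urllib.parse import quote, urlsplit


def has_ampersand_in_path(url: str) -> bool:
    """Return True if the path component (not the query) contains '&'."""
    return "&" in urlsplit(url).path


print(has_ampersand_in_path("http://example.com/seo&more.html"))      # True
print(has_ampersand_in_path("http://example.com/&a=1?b=1"))           # True
print(has_ampersand_in_path("http://example.com/page.html?a=1&b=2"))  # False

# Encoding an ampersand that is meant as data, not as a delimiter:
print(quote("seo&more.html", safe=""))  # seo%26more.html
```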

URL has // in path

Description

If the URL contains two consecutive slashes, it is flagged with this hint.

Example
http://example.com//page.html
http://example.com/directory//page.html
Importance

Two consecutive slashes are valid but usually not wanted in a URL. Any occurrence might indicate issues with relative linking and/or the URL base. This may lead to issues with duplicate content if the CMS delivers the same content for both variants, e.g. for http://example.com//page.html and http://example.com/page.html.

Operating Instruction

We suggest not using consecutive slashes. Analyze all occurrences of consecutive slashes and fix the reason why they occur.
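A minimal Python sketch of how such URLs could be detected and normalized is shown below. The function names are hypothetical, and collapsing the slashes is only appropriate once it is confirmed that both variants deliver the same content.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def has_double_slash_in_path(url: str) -> bool:
    """Return True if the path contains two or more consecutive slashes."""
    return "//" in urlsplit(url).path


def collapse_slashes(url: str) -> str:
    """Replace runs of slashes in the path with a single slash."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(parts._replace(path=path))


print(has_double_slash_in_path("http://example.com//page.html"))   # True
print(collapse_slashes("http://example.com/directory//page.html"))
# http://example.com/directory/page.html
```

The `//` that follows the scheme is not part of the path, so it is not counted.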

URL has 3 or more query parameters

Description

If the URL contains a query string with 3 or more parameters, it is flagged with this hint.

Example
http://example.com/page.html?a=1&b=2&c=3&d=4
Importance

URLs with 3 or more parameters are considered to be highly dynamic. Highly dynamic URLs often indicate poor quality to search engines, e.g. URLs for filter combinations in faceted search. Changing the order of the parameters can lead to a very high number of URLs, and the content shown will usually be similar or duplicate content.

The result can be crawl budget issues as well as serious issues with duplicate content.

Operating Instruction

Use this report to find highly dynamic URLs on the crawled website. You might want to reduce the number of parameters within URLs.

Consider reducing the number of URLs with GET-parameters by using the Post/Redirect/Get (PRG) pattern.
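To illustrate, the following Python sketch counts query parameters and shows one possible way to avoid URL variants that differ only in parameter order, namely sorting the parameters. Both function names are assumptions made for this example; sorting is a suggestion of this sketch, not something the hint itself performs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def count_query_parameters(url: str) -> int:
    """Count the name/value pairs in the query string."""
    return len(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def sort_query_parameters(url: str) -> str:
    """Rebuild the URL with its query parameters in sorted order."""
    parts = urlsplit(url)
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(parts._replace(query=urlencode(pairs)))


url = "http://example.com/page.html?a=1&b=2&c=3&d=4"
print(count_query_parameters(url))       # 4
print(count_query_parameters(url) >= 3)  # True, so the hint would apply
print(sort_query_parameters("http://example.com/page.html?b=2&a=1"))
# http://example.com/page.html?a=1&b=2
```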

URL has ? in query

Description

If the URL has a question mark (?) in the query, this hint is triggered.

Example
http://example.com/?foo=bar?
http://example.com/?foo=bar?foo=bar
Importance

The query starts with a question mark. Using a second question mark in the URL is valid, but you may want to refrain from doing so, because poorly implemented clients might handle such data incorrectly.

In addition, queries with two question marks often indicate technical problems in building URLs.

Operating Instruction

We suggest evaluating the reasons for using more than one question mark in a query. Even with valid usage you might consider taking a different technical approach that does not produce two question marks.
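Since the query begins after the first question mark, any further question mark ends up inside the query string itself. A short Python sketch of this check, with a function name chosen for this example only:

```python
from urllib.parse import urlsplit


def has_question_mark_in_query(url: str) -> bool:
    """Return True if the query string itself contains a '?'."""
    # urlsplit() separates the query at the first '?', so any
    # additional '?' remains part of the query component.
    return "?" in urlsplit(url).query


print(has_question_mark_in_query("http://example.com/?foo=bar?"))         # True
print(has_question_mark_in_query("http://example.com/?foo=bar?foo=bar"))  # True
print(has_question_mark_in_query("http://example.com/?foo=bar"))          # False
```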

URL has more than 115 characters

Description

If the URL is more than 115 characters long, it is flagged with this hint.

Example
http://subdomain.example.com/folder1/folder2/folder3/folder4/folder5/folder6/very_long_page_filename.html?a=1&b=2&c=3&d=4
Importance

Long URLs are hard to read and are often shortened when displayed, e.g. in snippets in search results, posts in bulletin boards or on social media websites.

115 was the maximum number of characters that Google displayed in their search result snippets some time ago. Google no longer uses a fixed number of characters, but a pixel length instead.

Operating Instruction

If you encounter occurrences of this hint, we suggest using shorter URLs so they can be displayed properly.

This may be done by:

  • Reducing the number of GET-parameters
  • Reducing the number of folders in the path
  • Reducing the number of characters in the filename
  • Using IDs instead of speaking URLs

URL has non-ascii, non-lowercase elements in path

Description

If the URL contains characters which are non-ASCII or non-lowercase, the URL is flagged with this hint. This hint has been split into two distinct hints as of Audisto version 0.9.92 and is therefore no longer triggered. See the other hints for more advanced insights.

URL has non-lowercase elements in path

Description

If the URL contains non-lowercase characters, it is flagged with this hint. Use this report to identify all occurrences of non-lowercase characters in the path of a URL.

Example
http://example.com/Page.html
Importance

URLs that contain non-lowercase elements are often a source of errors. If an application does not expect a non-lowercase URL, it might automatically convert it to all lowercase. This might cause issues with duplicate content or accessibility (404 status codes), depending on whether the webserver handles URLs case-sensitively or not.

Operating Instruction

We suggest sticking to lowercase characters in paths.
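A rough Python sketch of such a check follows. The function name is hypothetical. Note that this simple comparison would also flag uppercase hex digits in percent-escapes such as %C3; whether Audisto treats those as non-lowercase is not stated here, so that behaviour is an assumption of the sketch.

```python
from urllib.parse import urlsplit


def has_non_lowercase_path(url: str) -> bool:
    """Return True if lowercasing the path would change it."""
    path = urlsplit(url).path
    return path != path.lower()


print(has_non_lowercase_path("http://example.com/Page.html"))  # True
print(has_non_lowercase_path("http://example.com/page.html"))  # False
# Only the path is inspected, so an uppercase host name is ignored:
print(has_non_lowercase_path("http://Example.com/page.html"))  # False
```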

URL has query

Description

If the URL contains a query string, it is flagged with this hint. This report identifies all URLs that contain a query string.

Example
http://example.com/foo/?a=1&b=2
Importance

Query strings usually contain dynamic name/value pairs that might affect the content returned. A parameter typically does one of the following to the content:

  • Sorts
  • Narrows
  • Specifies
  • Translates
  • Paginates
Operating Instruction

We suggest using this report to get an overview of GET-parameter usage on the crawled website. We also suggest keeping the number of parameters used to a minimum.

URL has repetitive elements in path

Description

If the URL contains repeating elements like /foo/foo/ or /foo/bar/foo/bar/ in the path, it triggers this hint. Use this report to identify all occurrences of repeating elements in URLs.

Examples
http://example.com/a/a/
http://example.com/a/b/a/b/

This hint does not find the pattern:

/a/b/c/a/b/
Importance

Repetitive path segments can be a hint for issues with relative URLs as well as for a poor folder structure.

Operating Instruction

We suggest re-evaluating your folder structure based on the URLs shown in this report. Try to avoid repeating folder names at different levels of the hierarchy if possible.
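The examples above suggest that the hint looks for a sequence of path segments that is immediately followed by the same sequence. The Python sketch below implements this reading. It is one interpretation that is consistent with the examples given, not Audisto's actual algorithm, and the function name is hypothetical.

```python
from urllib.parse import urlsplit


def has_repetitive_path(url: str) -> bool:
    """Return True if a block of path segments is directly repeated."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    # Try every block length n and every start position i, and compare
    # the block with the block that immediately follows it.
    for n in range(1, len(segments) // 2 + 1):
        for i in range(len(segments) - 2 * n + 1):
            if segments[i:i + n] == segments[i + n:i + 2 * n]:
                return True
    return False


print(has_repetitive_path("http://example.com/a/a/"))        # True
print(has_repetitive_path("http://example.com/a/b/a/b/"))    # True
print(has_repetitive_path("http://example.com/a/b/c/a/b/"))  # False
```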

URL has repetitive parameters in query

Description

If the URL contains repeating parameters like ?a=1&a=1 or ?b=1&b=2 in the query, it triggers this hint. Use this report to identify all occurrences of directly linked URLs with repetitive parameters in the query.

Example
http://example.com/foo/?a=1&a=1
Importance

While repetitive parameters do not make a URL invalid, software might handle such URLs in different ways, depending on the implementation. Sometimes repetitive parameters are consolidated by name. This might lead to a loss of information if the values differ. If the values do not differ, it might be safe to consolidate parameters by name.

Repetitive parameters also create unnecessarily long URLs and might indicate an issue with the software that generates the GET-parameters.

Operating Instruction

We suggest checking for reasons that might cause repetitive usage of GET-parameters and fixing them to avoid unexpected behaviour of software parsing or handling the URL.

URL has repetitive parameters in query, values differ

Description

If the URL contains repeating parameters in the query, and the values are different, like ?a=1&a=2, it triggers this hint. Use this report to identify all occurrences of directly linked URLs with repetitive query parameters that have differing values.

Example
http://example.com/foo/?a=1&a=2
Importance

While repetitive parameters do not make a URL invalid, software might handle such URLs in different ways, depending on the implementation. If repetitive parameters with differing values are consolidated by name, information is lost. Differing values for repetitive parameters might also indicate a problem with the logic of the software that generates the URLs.

Repetitive parameters create unnecessarily long URLs and might indicate issues with the software that generates the GET-parameters.

Operating Instruction

We suggest checking for reasons that might cause the usage of repetitive parameters with different values and fixing the underlying issues.
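The difference between repeated parameters in general and repeated parameters with differing values can be sketched in Python as follows. The function names are assumptions made for this example. The last line shows how consolidating by name, here simply by building a dict, silently drops one of the differing values.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit


def repeated_parameters(url: str) -> dict:
    """Map each parameter name that occurs more than once to its values."""
    values = defaultdict(list)
    for name, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        values[name].append(value)
    return {name: vals for name, vals in values.items() if len(vals) > 1}


def has_repeated_parameters(url: str) -> bool:
    return bool(repeated_parameters(url))


def has_repeated_parameters_with_differing_values(url: str) -> bool:
    return any(len(set(v)) > 1 for v in repeated_parameters(url).values())


same = "http://example.com/foo/?a=1&a=1"
differ = "http://example.com/foo/?a=1&a=2"
print(has_repeated_parameters(same))                          # True
print(has_repeated_parameters_with_differing_values(same))    # False
print(has_repeated_parameters_with_differing_values(differ))  # True
# Consolidating by name keeps only the last value:
print(dict(parse_qsl(urlsplit(differ).query)))                # {'a': '2'}
```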

URL too long for some browsers

Description

If a URL longer than 2000 characters is encountered, it is flagged with this hint.

Example

Long URLs are often generated dynamically in scenarios like:

  • a form posts data from input fields or a textarea via GET-method to the form action URL
  • GET-parameters from complex filter combinations in faceted search
Importance

Very long URLs might cause problems. Some browsers are unable to handle URLs of this length. Some web applications might not be able to resolve the URLs, or might shorten them automatically, causing issues with access to these URLs.

Operating Instruction

While theoretically there is no limit on the length of a URL, you should stay below 2000 characters so that your URLs remain accessible to a large number of clients and web applications.